Why ‘AI’ can’t do data

Hey ChatGPT, build me a SOTA predictive model please!

Author

Nick Plummer

Published

May 26, 2025

The other day, my colleague asked, “Why can’t I just give our dataset to ChatGPT and ask it to build a predictive model?”

To be honest, it’s a good question. With the incredible pace at which the AI sphere is moving, it feels like we should be able to do just that.

Large Language Models (LLMs) like ChatGPT, Gemini, and Claude have shown an almost magical ability to understand and generate human-like text, write reasonably good code, summarise documents, and even help with some data analysis tasks (but more on that later). So, the idea of simply handing over terabytes of data and asking for a top-tier predictive model to be built, trained, and deployed is incredibly appealing.

But they can’t do it. And it’s a fairly straightforward no.

Why LLMs can’t do data

When most people think of “AI”, what they mean (at least these days) are LLMs.

While LLMs are undeniably powerful (and can indeed assist in many parts of the machine learning lifecycle), they are not currently designed to independently take a large, raw dataset and autonomously build, train, and deploy a predictive model in the way a dedicated data scientist or specialised AutoML platform would.

Why? The clue is in the name. LLMs are language models, not numerical ones. Although there’s a lot more to it in terms of how the model learns, contextualises your questions, and rationalises its answers, these models still share an awful lot, under the hood, with the auto-complete on your phone.

Great for words, but awful with a CSV.

I asked ChatGPT to imagine itself struggling with big data…

Think of an LLM as an incredibly versatile language assistant. It can draft emails, write useful scripts, research topics, and explain concepts. But you probably wouldn’t ask it to single-handedly design, engineer, and launch a mission-critical rocket, because that involves handling numbers, simulation, and a deep understanding of the domain.

Building SOTA predictive models on “big” datasets is a similarly complex, specialised endeavour, and LLMs in particular struggle with:

  1. “Context windows” (or the data is too big): Most LLMs have limitations on the amount of data they can “see” or process at once (their “context window”). “Big Data” is, by definition, big! You can’t simply paste terabytes of data into a prompt. While techniques exist to work with larger data in chunks for analysis, training a complex predictive model on the entirety of a massive dataset through this interface is currently impractical (see the token-counting sketch after this list).

  2. Generative vs predictive power: LLMs are fundamentally generative models. They excel at predicting the next word or code token in a sequence, while predictive models (like those for fraud detection, sales forecasting, or customer churn) are often discriminative models or other complex regressors that learn intricate patterns and relationships within structured numerical and categorical data. While LLMs can generate code for these models, they don’t, and indeed can’t, themselves perform the iterative training, deep feature engineering, and nuanced hyperparameter optimisation required for SOTA results on specific datasets.

  3. Data preprocessing and feature engineering: Real-world datasets are messy. Building a SOTA model requires significant data cleaning, preprocessing, transformation, and, crucially, feature engineering. This is often an iterative, domain-knowledge-intensive process. While LLMs can assist with some basic text preprocessing, they generally cannot perform complex data cleaning, transformations and validations that are standard in a robust ML pipeline. The quality of these steps directly impacts model performance.

  4. Specialised algorithm optimisation: SOTA performance often comes from highly specialised algorithms (e.g., XGBoost, LightGBM, custom deep learning architectures) meticulously tuned for the specific problem and data. LLMs can write code to implement these, but they don’t possess the intrinsic capability to select the optimal architecture and tune it from scratch for your unique, large dataset in the same way an AutoML system or a data scientist does (a minimal tuning sketch also follows this list).

  5. Computational demands: LLMs are enormous models themselves, requiring substantial computational resources. Using an LLM to then train another potentially large and complex predictive model on Big Data would layer on significant computational overhead, if it were even architecturally feasible for the LLM to manage this directly.

  6. “Hallucinations” and reliability: Amongst the major limitations facing LLMs is their tendency to generate incorrect or nonsensical information (“hallucinations”). For critical predictive models that drive business decisions, or in our case potentially put patients at risk, the reliability, explainability, and robustness of the underlying model are paramount. Handing this entire process over to a general-purpose LLM without deep oversight could be risky.

  7. End-to-end MLOps: Building a model is just one part of the equation. Deploying it, monitoring its performance, retraining it as data drifts, and managing the entire MLOps (Machine Learning Operations) lifecycle are specialised tasks that are currently beyond the scope of what an LLM can autonomously handle.
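
To make point 1 concrete, here’s a rough token-counting sketch. It assumes the tiktoken tokeniser library, and sales.csv is a hypothetical file standing in for your dataset:

```python
# Rough token estimate for a dataset pasted into a prompt.
# Assumes the tiktoken library; "sales.csv" is a hypothetical file.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a widely used OpenAI tokeniser

with open("sales.csv") as f:
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{n_tokens:,} tokens")
```

At roughly 3-4 characters per token, a 1 GB CSV works out to hundreds of millions of tokens: thousands of times more than even a 128k-token context window.

And for point 4, here’s a minimal sketch of the kind of iterative tuning loop an LLM can happily write but can’t execute itself, using scikit-learn’s gradient boosting on synthetic data as a stand-in for XGBoost or LightGBM:

```python
# Hyperparameter search: many real model fits, not text generation.
# Synthetic data stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=5_000, n_features=20, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "n_estimators": [100, 300, 500],
        "max_depth": [2, 3, 5],
        "learning_rate": [0.01, 0.05, 0.1],
    },
    n_iter=10,
    cv=3,
    scoring="roc_auc",
    random_state=0,
)
search.fit(X, y)  # dozens of model fits behind this one call
print(search.best_params_, round(search.best_score_, 3))
```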

This isn’t to say LLMs aren’t useful in the world of predictive modelling. They can be powerful assistants to data scientists and analysts working with these datasets, with utility in:

  • Writing, explaining, and debugging modelling code
  • Basic text preprocessing and generating features from free text
  • Drafting natural-language descriptions of data and desired outcomes
  • Producing intuitive explanations of concepts and model results

AutoML: ML, made easy (ish)

Having said all this, if the goal is to empower people without extensive coding or data science backgrounds to build powerful predictive models on large datasets, there’s a whole category of tools designed specifically for this: Automated Machine Learning (AutoML) platforms.

AutoML platforms aim to automate the time-consuming, iterative tasks of machine learning model development. Such tools can handle some or all of the following:

  • Data preprocessing and preparation
  • Feature engineering and selection
  • Algorithm selection
  • Hyperparameter optimisation (finding the best settings for those models)
  • Model evaluation and comparison
  • Even assistance with deployment and monitoring

Many of these platforms are designed to handle big data, and strive to produce SOTA or near-SOTA results. They offer graphical user interfaces or low-code APIs, making the power of machine learning much more accessible. Low-code libraries like PyCaret also significantly simplify the ML workflow for those comfortable with writing some of their own Python code.
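
As a flavour of how low-code this can be, here’s a minimal PyCaret sketch; customers.csv and its binary churned column are hypothetical:

```python
# Minimal PyCaret classification workflow.
# "customers.csv" and the "churned" target column are hypothetical.
import pandas as pd
from pycaret.classification import setup, compare_models, finalize_model

df = pd.read_csv("customers.csv")

s = setup(data=df, target="churned", session_id=0)  # automated preprocessing
best = compare_models()       # cross-validates and ranks many algorithms
final = finalize_model(best)  # refit the winning model on the full dataset
```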

Interestingly, some AutoML platforms are beginning to integrate LLMs to enhance the user experience, for example, by allowing users to describe their data or desired outcomes in natural language, or by generating more intuitive explanations of model results.

Ultimately, LLMs are revolutionary tools that are transforming many industries. However, for the specific, complex task of building, training, and deploying SOTA predictive models on big data, they are not yet a replacement for dedicated data science expertise or specialised AutoML platforms.

Think of it this way: an LLM can help you write a recipe and even explain the chemistry of cooking, but a fully equipped professional kitchen (or a sophisticated automated food processor, in our AutoML analogy) is what you need to produce a Michelin-star meal at scale.

What analytics can LLMs do?

All that said, there are emerging examples and ongoing research into how LLMs can be applied to specific data analysis tasks like anomaly detection and certain types of prediction, even if they aren’t yet building full SOTA models on massive, diverse datasets independently.

  1. Text-based anomaly detection: LLMs excel at understanding and processing text. This makes them promising for anomaly detection in textual data, such as monitoring system logs or network traffic, as well as identifying unusual language or patterns in financial narratives, insurance claims, or customer communications that could signal fraudulent activity.

  2. Time-series anomaly detection (with a twist): LLMs are being explored to “describe” or “caption” time-series data segments. Anomalies can then be detected if the LLM generates a description that indicates a deviation from normal patterns, or if it struggles to describe a segment coherently using its learned patterns. LLMs can also be trained on large-scale, diverse time-series data to perform zero-shot time-series forecasting and anomaly detection by representing time-series as token sequences, and identify “unknown unknowns” or novel anomalies that rule-based systems might miss.

  3. Prediction tasks: Text-based predictions are a natural fit for LLMs. This includes using sentiment analysis to predict trends, or even predicting the next action in a user session based on sequences of textual or text-like event logs. In addition, for certain types of classification or regression tasks, especially where the input can be re-framed in a way an LLM understands (e.g., tabular data described in text), LLMs can perform surprisingly well with minimal or no specific training examples for those tasks (see the sketch after this list).
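
As a sketch of that last re-framing trick: a single tabular row can be serialised into a short natural-language prompt. The feature names here are hypothetical, and llm() is a placeholder for whichever client you use:

```python
# Serialise a tabular row into a zero-shot classification prompt.
# Feature names are hypothetical; llm() is a placeholder client call.
def row_to_prompt(row: dict) -> str:
    facts = "; ".join(f"{k} = {v}" for k, v in row.items())
    return (
        "Given this customer record, answer 'yes' or 'no': "
        f"is the customer likely to churn?\n{facts}"
    )

row = {"tenure_months": 2, "monthly_spend": 89.50, "support_tickets": 7}
print(row_to_prompt(row))
# answer = llm(row_to_prompt(row))  # placeholder: call your LLM here
```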

The main problem remains that while LLMs can process large amounts of textual data, directly feeding a massive, structured numerical dataset into a standard LLM prompt for it to autonomously build a predictive model is still generally impractical due to context window limits and computational architecture.

The examples above all involve the same pattern of behaviours:

  • Analysing summaries or metadata of large datasets.
  • Processing streams of textual data (logs, social media).
  • Fine-tuning models on specific types of textual or sequential data for a particular task.
  • Using frameworks designed to handle certain data types (like the time-series LLMs).

However, in many successful uses for prediction on larger datasets, the LLM isn’t the predictive model itself but rather a component in a larger pipeline, as discussed above (e.g., generating features from text that are then fed into a traditional ML model, or generating code for such a model).
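
A minimal sketch of that pattern, assuming the sentence-transformers library and some made-up labelled texts: the language model supplies embedding features, and a traditional classifier does the actual predicting:

```python
# LLM-as-component: embeddings become features for a classical model.
# Assumes sentence-transformers; the texts and labels are made up.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

texts = [
    "refund still not processed after three weeks",
    "great service, arrived early, thanks!",
    "app crashes every time I try to pay",
    "easy to set up and works perfectly",
]
labels = [1, 0, 1, 0]  # 1 = complaint

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small embedding model
X = encoder.encode(texts)                          # numeric feature vectors

clf = LogisticRegression().fit(X, labels)          # the actual predictive model
```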

One key part of this is how the data is represented to the LLM. Transforming numerical or tabular data into a format (like natural language descriptions or sequences of tokens) that an LLM can effectively process is an active area of research.
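
One trick from the time-series LLM literature, for instance, is to rescale a numeric series and spell it out digit by digit so it tokenises cleanly. A toy sketch with made-up readings (real encoding schemes vary by paper):

```python
# Encode a numeric series as space-separated digits for an LLM prompt.
# The readings are made up; real encoding schemes vary by paper.
values = [101.3, 102.8, 101.9, 250.0]

scaled = [round(v / max(values) * 999) for v in values]  # map to 0-999
encoded = " , ".join(" ".join(str(s)) for s in scaled)   # digits as tokens
print(encoded)  # -> 4 0 5 , 4 1 1 , 4 0 7 , 9 9 9
```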

So although LLMs are definitely starting to show capabilities in data analysis tasks like anomaly detection and certain kinds of prediction, they are particularly strong only when text or sequential data is involved and when contextual understanding offers an advantage. For general-purpose predictive modelling on large, structured datasets, such as large medical datasets, they are more often assistants or components within a broader workflow than standalone, end-to-end SOTA model builders… although the field is rapidly evolving!