What Is Overfitting?
Memorisation is not learning. A model that memorises training examples can reproduce them perfectly but has learned nothing it can apply to new data. That is overfitting: strong training performance, poor generalisation.
To measure this, you split your data into two sets before training begins. The training set is what the model learns from. The test set is held back entirely — the model never sees it during training. After training, you measure error on both:
- Train error — how wrong the model is on the examples it was trained on. A low train error means the model fits the training data well.
- Test error — how wrong the model is on unseen examples. A low test error means the model generalises — it has learned something real, not just memorised the training data.
The gap between the two is the signal. A model with low train error and high test error has memorised rather than learned.
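The split-and-compare procedure can be sketched with NumPy alone. The data here is synthetic (the linear relationship, noise level, and degree-15 fit are all invented for illustration), but the mechanics are the real ones: fit on the training set only, then measure error on both sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a true linear relationship plus noise.
x = rng.uniform(0, 10, 40)
y = 2.0 * x + 3.0 + rng.normal(0, 2.0, 40)

# Hold the last 10 points back as a test set before any fitting happens.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Rescale x to roughly [-1, 1] so the high-degree fit is numerically stable.
z_train, z_test = x_train / 5 - 1, x_test / 5 - 1

for deg in (1, 15):
    coeffs = np.polyfit(z_train, y_train, deg)
    print(f"degree {deg:2d}: train {mse(coeffs, z_train, y_train):.2f}  "
          f"test {mse(coeffs, z_test, y_test):.2f}")
```

The flexible degree-15 fit always achieves a lower train error than the straight line; the question that matters is what happens to its test error.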
Every model faces a choice between two failure modes:
| Failure | Cause | Symptom | Name |
|---|---|---|---|
| Too simple | Ignored real patterns | High error on both train and test | Underfitting — high bias |
| Too complex | Memorised training noise | Low train error, high test error | Overfitting — high variance |
The goal is a model that sits between these extremes — one that learns the real pattern without memorising the noise.
Seeing It in Regression — Crop Yield
Say you have measurements from eight farms: rainfall (mm) and crop yield (t/ha). One farm recorded an unusually high yield at moderate rainfall — a fluke caused by an exceptional soil batch that season. You want a model that predicts yield for new farms you have not seen yet.
Three models are possible:
- Underfit (high bias) — a nearly flat line. The model barely responds to rainfall. It fits the training farms poorly and fits new farms equally poorly — it is wrong everywhere, not just on unseen data.
- Good fit — a straight line with the right slope. It fits the training farms well, and it fits new farms well too — the train error and test error are both low. It correctly ignores the anomaly because that spike was a fluke, not a real pattern.
- Overfit (high variance) — a high-degree polynomial that bends to hit every training point, including the anomaly. It fits the training farms perfectly (near-zero train error), but for a new farm at 130 mm rainfall it predicts ~6.8 t/ha — the fluke value, not the trend. Test error is much higher than train error. The model memorised the specific examples it was trained on, not the pattern.
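The three fits can be sketched with NumPy. The eight farms' numbers below are invented to match the scenario, with the 130 mm farm carrying the 6.8 t/ha fluke; degree 0, 1, and 7 polynomials stand in for the three models:

```python
import numpy as np

# Invented measurements for the eight farms; the 130 mm farm is the fluke.
rain = np.array([60, 80, 100, 120, 130, 150, 170, 190], float)   # mm
crop = np.array([2.1, 2.8, 3.4, 4.0, 6.8, 5.0, 5.6, 6.3])       # t/ha

# Standardise rainfall so the high-degree fit is numerically stable.
z = (rain - rain.mean()) / rain.std()

fits = {
    "underfit": np.polyfit(z, crop, 0),   # flat line: ignores rainfall
    "good fit": np.polyfit(z, crop, 1),   # straight line with a slope
    "overfit":  np.polyfit(z, crop, 7),   # one bend per farm: hits every point
}

q = (130 - rain.mean()) / rain.std()      # the fluke farm's rainfall
for name, c in fits.items():
    train_mse = np.mean((np.polyval(c, z) - crop) ** 2)
    pred = np.polyval(c, q)
    print(f"{name:8s}  train MSE {train_mse:.3f}  predicts {pred:.1f} t/ha at 130 mm")
```

The degree-7 curve interpolates all eight farms, so its prediction at 130 mm is the fluke yield itself — exactly the memorisation the text describes.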
The names come from the type of error each failure produces. Bias is a systematic error — an underfit model carries a built-in bias toward a fixed, oversimplified prediction. A flat line always predicts near the same value no matter how much rainfall there was. It is not responding to the input; it has already decided what to predict. That stubborn pull toward one answer is the "bias."

Variance is sensitivity to the training set — an overfit model's predictions vary wildly when the training data changes slightly. Swap a few training examples and you get a completely different curve. The model learned which specific examples were in this training set, not the underlying pattern, so any change to the set changes the model.
The overfit model performs perfectly on the training farms. On any new farm, it performs worse than the simple straight line. Strong training performance is not the goal — generalisation is.
Why Overfitting Is Bad — High Variance
A model that overfits is highly sensitive to small changes in the training set. If you remove one farm from the dataset, re-train, and get a completely different curve — that is high variance. The model learned the specific noise in this particular sample of farms, not the relationship between rainfall and yield that holds across all farms.
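The remove-one-farm-and-refit experiment can be run directly. Using the same invented farm data as the regression example (130 mm is the fluke), refit with each farm left out in turn and watch how far the prediction for a hypothetical new farm at 110 mm moves:

```python
import numpy as np

# Invented farm data from the regression example; 130 mm is the fluke.
rain = np.array([60, 80, 100, 120, 130, 150, 170, 190], float)
crop = np.array([2.1, 2.8, 3.4, 4.0, 6.8, 5.0, 5.6, 6.3])
z = (rain - rain.mean()) / rain.std()
q = (110 - rain.mean()) / rain.std()   # a new farm at 110 mm rainfall

def prediction_spread(deg):
    """Refit with each farm left out in turn; return how far the
    prediction for the new farm moves across the refits."""
    preds = []
    for i in range(len(rain)):
        keep = np.arange(len(rain)) != i
        coeffs = np.polyfit(z[keep], crop[keep], deg)
        preds.append(float(np.polyval(coeffs, q)))
    return max(preds) - min(preds)

print("spread, straight line (degree 1):", round(prediction_spread(1), 2))
print("spread, wiggly curve  (degree 6):", round(prediction_spread(6), 2))
```

The straight line barely moves when one farm is dropped; the degree-6 curve swings by whole tonnes per hectare. That swing is high variance made visible.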
| Model | Train error | Test error | Changes a lot if you swap training data? |
|---|---|---|---|
| Underfit | High | High | No — equally wrong everywhere |
| Good fit | Low | Low | No — learned the real pattern |
| Overfit | Very low | Much higher | Yes — memorised this specific sample |
You train two models on crop yield data. Model A has train error 0.4 and test error 0.42. Model B has train error 0.05 and test error 1.8. Which one is overfitting?
Overfitting in Classification — Logistic Regression
Overfitting is not limited to regression. It happens in classification too. In logistic regression, an overfit model draws a decision boundary that perfectly separates all training patients — including the ambiguous boundary cases — by bending the boundary in complex ways.
Consider the diabetes prediction problem with two features: glucose level and BMI. Most non-diabetic patients cluster in the low-glucose, low-BMI region. Most diabetic patients cluster in the high-glucose, high-BMI region. A few patients sit in the boundary zone — slightly elevated glucose but low BMI, or moderate glucose but higher BMI.
Three classifiers are possible:
- Underfit (high bias) — a horizontal boundary. It only uses BMI to classify patients and completely ignores glucose. It misclassifies most of the diabetic patients who have moderate BMI.
- Good fit — a diagonal boundary using both features. It handles the main clusters well. It misses a few boundary zone patients — correctly, because those cases are genuinely ambiguous.
- Overfit (high variance) — a wiggly boundary that perfectly separates all training patients, including the ambiguous ones. On a new patient near the boundary zone, it will give an unreliable prediction.
Perfect training accuracy in classification is a warning sign, not a goal. It almost always means the model has memorised the training labels rather than learning the pattern that generates them.
The Bias-Variance Tradeoff
As you increase model complexity, two things happen simultaneously. Bias falls — the model can fit more complex real patterns. Variance rises — the model becomes more sensitive to noise in the specific training sample. Total expected error is roughly bias squared plus variance, plus irreducible noise. The sweet spot is the complexity level where total error is lowest.
What does 'high variance' mean in the context of overfitting?
