What Is Overfitting?
Memorisation is not learning. A model that memorises training examples can reproduce them perfectly but has learned nothing it can apply to new data. That is overfitting: strong training performance, poor generalisation.
To measure this, you split your data into two sets before training begins. The training set is what the model learns from. The test set is held back entirely — the model never sees it during training. After training, you measure error on both:
- Train error — how wrong the model is on the examples it was trained on. A low train error means the model fits the training data well.
- Test error — how wrong the model is on unseen examples. A low test error means the model generalises — it has learned something real, not just memorised the training data.
The gap between the two is the signal. A model with low train error and high test error has memorised rather than learned.
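The split-and-compare procedure can be sketched with NumPy alone. The data here is synthetic (the linear relationship, noise level, and degree-15 fit are all invented for illustration), but the mechanics are the real ones: fit on the training set only, then measure error on both sets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a true linear relationship plus noise.
x = rng.uniform(0, 10, 40)
y = 2.0 * x + 3.0 + rng.normal(0, 2.0, 40)

# Hold the last 10 points back as a test set before any fitting happens.
x_train, y_train = x[:30], y[:30]
x_test, y_test = x[30:], y[30:]

def mse(coeffs, xs, ys):
    """Mean squared error of a fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

# Rescale x to roughly [-1, 1] so the high-degree fit is numerically stable.
z_train, z_test = x_train / 5 - 1, x_test / 5 - 1

for deg in (1, 15):
    coeffs = np.polyfit(z_train, y_train, deg)
    print(f"degree {deg:2d}: train {mse(coeffs, z_train, y_train):.2f}  "
          f"test {mse(coeffs, z_test, y_test):.2f}")
```

The flexible degree-15 fit always achieves a lower train error than the straight line; the question that matters is what happens to its test error.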
Every model faces a choice between two failure modes:
| Failure | Cause | Symptom | Name |
|---|---|---|---|
| Too simple | Ignored real patterns | High error on both train and test | Underfitting — high bias |
| Too complex | Memorised training noise | Low train error, high test error | Overfitting — high variance |
The goal is a model that sits between these extremes — one that learns the real pattern without memorising the noise.
Seeing It in Regression — Crop Yield
Say you have measurements from eight farms: rainfall (mm) and crop yield (t/ha). One farm recorded an unusually high yield at moderate rainfall — a fluke caused by an exceptional soil batch that season. You want a model that predicts yield for new farms you have not seen yet.
Three models are possible:
- Underfit (high bias) — a nearly flat line. The model barely responds to rainfall. It fits the training farms poorly and fits new farms equally poorly — it is wrong everywhere, not just on unseen data.
- Good fit — a straight line with the right slope. It fits the training farms well, and it fits new farms well too — the train error and test error are both low. It correctly ignores the anomaly because that spike was a fluke, not a real pattern.
- Overfit (high variance) — a high-degree polynomial that bends to hit every training point, including the anomaly. It fits the training farms perfectly (near-zero train error), but for a new farm at 130 mm rainfall it predicts ~6.8 t/ha — the fluke value, not the trend. Test error is much higher than train error. The model memorised the specific examples it was trained on, not the pattern.
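The three fits can be sketched with NumPy. The eight farms' numbers below are invented to match the scenario, with the 130 mm farm carrying the 6.8 t/ha fluke; degree 0, 1, and 7 polynomials stand in for the three models:

```python
import numpy as np

# Invented measurements for the eight farms; the 130 mm farm is the fluke.
rain = np.array([60, 80, 100, 120, 130, 150, 170, 190], float)   # mm
crop = np.array([2.1, 2.8, 3.4, 4.0, 6.8, 5.0, 5.6, 6.3])       # t/ha

# Standardise rainfall so the high-degree fit is numerically stable.
z = (rain - rain.mean()) / rain.std()

fits = {
    "underfit": np.polyfit(z, crop, 0),   # flat line: ignores rainfall
    "good fit": np.polyfit(z, crop, 1),   # straight line with a slope
    "overfit":  np.polyfit(z, crop, 7),   # one bend per farm: hits every point
}

q = (130 - rain.mean()) / rain.std()      # the fluke farm's rainfall
for name, c in fits.items():
    train_mse = np.mean((np.polyval(c, z) - crop) ** 2)
    pred = np.polyval(c, q)
    print(f"{name:8s}  train MSE {train_mse:.3f}  predicts {pred:.1f} t/ha at 130 mm")
```

The degree-7 curve interpolates all eight farms, so its prediction at 130 mm is the fluke yield itself — exactly the memorisation the text describes.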
The names come from the type of error each failure produces. Bias is a systematic error — an underfit model carries a built-in bias toward a fixed, oversimplified prediction. A flat line always predicts near the same value no matter how much rainfall there was. It is not responding to the input; it has already decided what to predict. That stubborn pull toward one answer is the "bias."

Variance is sensitivity to the training set — an overfit model's predictions vary wildly when the training data changes slightly. Swap a few training examples and you get a completely different curve. The model learned which specific examples were in this training set, not the underlying pattern, so any change to the set changes the model.
The overfit model performs perfectly on the training farms. On any new farm, it performs worse than the simple straight line. Strong training performance is not the goal — generalisation is.
Why Overfitting Is Bad — High Variance
A model that overfits is highly sensitive to small changes in the training set. If you remove one farm from the dataset, re-train, and get a completely different curve — that is high variance. The model learned the specific noise in this particular sample of farms, not the relationship between rainfall and yield that holds across all farms.
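The remove-one-farm-and-refit experiment can be run directly. Using the same invented farm data as the regression example (130 mm is the fluke), refit with each farm left out in turn and watch how far the prediction for a hypothetical new farm at 110 mm moves:

```python
import numpy as np

# Invented farm data from the regression example; 130 mm is the fluke.
rain = np.array([60, 80, 100, 120, 130, 150, 170, 190], float)
crop = np.array([2.1, 2.8, 3.4, 4.0, 6.8, 5.0, 5.6, 6.3])
z = (rain - rain.mean()) / rain.std()
q = (110 - rain.mean()) / rain.std()   # a new farm at 110 mm rainfall

def prediction_spread(deg):
    """Refit with each farm left out in turn; return how far the
    prediction for the new farm moves across the refits."""
    preds = []
    for i in range(len(rain)):
        keep = np.arange(len(rain)) != i
        coeffs = np.polyfit(z[keep], crop[keep], deg)
        preds.append(float(np.polyval(coeffs, q)))
    return max(preds) - min(preds)

print("spread, straight line (degree 1):", round(prediction_spread(1), 2))
print("spread, wiggly curve  (degree 6):", round(prediction_spread(6), 2))
```

The straight line barely moves when one farm is dropped; the degree-6 curve swings by whole tonnes per hectare. That swing is high variance made visible.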
| Model | Train error | Test error | Changes a lot if you swap training data? |
|---|---|---|---|
| Underfit | High | High | No — equally wrong everywhere |
| Good fit | Low | Low | No — learned the real pattern |
| Overfit | Very low | Much higher | Yes — memorised this specific sample |
You train two models on crop yield data. Model A has train error 0.4 and test error 0.42. Model B has train error 0.05 and test error 1.8. Which one is overfitting?
Overfitting in Classification — Logistic Regression
Overfitting is not limited to regression. It happens in classification too. In logistic regression, an overfit model draws a decision boundary that perfectly separates all training patients — including the ambiguous boundary cases — by bending the boundary in complex ways.
Consider the diabetes prediction problem with two features: glucose level and BMI. Most non-diabetic patients cluster in the low-glucose, low-BMI region. Most diabetic patients cluster in the high-glucose, high-BMI region. A few patients sit in the boundary zone — slightly elevated glucose but low BMI, or moderate glucose but higher BMI.
Three classifiers are possible:
- Underfit (high bias) — a horizontal boundary. It only uses BMI to classify patients and completely ignores glucose. It misclassifies most of the diabetic patients who have moderate BMI.
- Good fit — a diagonal boundary using both features. It handles the main clusters well. It misses a few boundary zone patients — correctly, because those cases are genuinely ambiguous.
- Overfit (high variance) — a wiggly boundary that perfectly separates all training patients, including the ambiguous ones. On a new patient near the boundary zone, it will give an unreliable prediction.
Perfect training accuracy in classification is a warning sign, not a goal. It almost always means the model has memorised the training labels rather than learning the pattern that generates them.
The Bias-Variance Tradeoff
As you increase model complexity, two things happen simultaneously. Bias falls — the model can fit more complex real patterns. Variance rises — the model becomes more sensitive to noise in the specific training sample. Total expected error is roughly bias squared plus variance, plus irreducible noise. The sweet spot is the complexity level where total error is lowest.
What does 'high variance' mean in the context of overfitting?
