Deep Learning · Intermediate

Overfitting in Logistic Regression

Tags: overfitting, underfitting, bias-variance tradeoff, logistic regression, regularization

What Is Overfitting?

Memorisation is not learning. A model that memorises training examples can reproduce them perfectly but has learned nothing it can apply to new data. That is overfitting: strong training performance, poor generalisation.

To measure this, you split your data into two sets before training begins. The training set is what the model learns from. The test set is held back entirely — the model never sees it during training. After training, you measure error on both:

  • Train error — how wrong the model is on the examples it was trained on. A low train error means the model fits the training data well.
  • Test error — how wrong the model is on unseen examples. A low test error means the model generalises — it has learned something real, not just memorised the training data.

The gap between the two is the signal. A model with low train error and high test error has memorised rather than learned.
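The split-and-compare workflow can be sketched in a few lines. This is a minimal illustration using synthetic rainfall/yield data (the dataset, noise level, and model here are assumptions, not the course's actual numbers):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 200, size=(80, 1))          # e.g. rainfall in mm
y = 0.02 * X[:, 0] + rng.normal(0, 0.5, 80)    # yield with measurement noise

# Hold the test set back entirely; the model never sees it during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
train_err = mean_squared_error(y_train, model.predict(X_train))
test_err = mean_squared_error(y_test, model.predict(X_test))

# A small gap suggests generalisation; a large gap suggests memorisation
print(f"train error: {train_err:.3f}, test error: {test_err:.3f}")
```

Because the model matches the true (linear) pattern here, the two errors come out close — the diagnostic gap only opens up once the model starts memorising.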

Every model faces a choice between two failure modes:

Failure       Cause                      Symptom                             Name
Too simple    Ignored real patterns      High error on both train and test   Underfitting — high bias
Too complex   Memorised training noise   Low train error, high test error    Overfitting — high variance

The goal is a model that sits between these extremes — one that learns the real pattern without memorising the noise.

Seeing It in Regression — Crop Yield

Say you have measurements from eight farms: rainfall (mm) and crop yield (t/ha). One farm recorded an unusually high yield at moderate rainfall — a fluke caused by an exceptional soil batch that season. You want a model that predicts yield for new farms you have not seen yet.

Three models are possible:

  1. Underfit (high bias) — a nearly flat line. The algorithm barely responds to rainfall. It fits the training farms poorly and fits new farms equally poorly — it is wrong everywhere, not just on unseen data.
  2. Good fit — a straight line with the right slope. It fits the training farms well, and it fits new farms well too — the train error and test error are both low. It correctly ignores the anomaly because that spike was a fluke, not a real pattern.
  3. Overfit (high variance) — a high-degree polynomial that bends to hit every training point, including the anomaly. It fits the training farms perfectly (near-zero train error), but for new farms near 130mm rainfall it predicts ~6.8 t/ha — wildly wrong. Test error is much higher than train error. The model memorised the specific examples it was trained on, not the pattern.

The names come from the type of error each failure produces. Bias is a systematic error — an underfit model carries a built-in bias toward a fixed, oversimplified prediction. A flat line always predicts near the same value no matter how much rainfall there was. It is not responding to the input; it has already decided what to predict. That stubborn pull toward one answer is the "bias."

Variance is sensitivity to the training set — an overfit model's predictions vary wildly when the training data changes slightly. Swap a few training examples and you get a completely different curve. The model learned which specific examples were in this training set, not the underlying pattern, so any change to the set changes the model.
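The three fits can be reproduced numerically. The eight-farm dataset below is assumed for illustration (a clean 0.02 t/ha-per-mm trend with one fluke at 130mm), not the exact values behind the diagram:

```python
import numpy as np
from numpy.polynomial import Polynomial

rainfall = np.array([70.0, 85, 100, 115, 130, 150, 175, 200])
yields = np.array([1.4, 1.7, 2.0, 2.3, 4.5, 3.0, 3.5, 4.0])  # 130mm farm is the fluke

flat = Polynomial.fit(rainfall, yields, deg=0)    # underfit: one constant prediction
line = Polynomial.fit(rainfall, yields, deg=1)    # good fit: straight line
wiggly = Polynomial.fit(rainfall, yields, deg=7)  # overfit: one coefficient per farm

def train_mse(model):
    return float(np.mean((model(rainfall) - yields) ** 2))

print(f"flat   train MSE: {train_mse(flat):.3f}")    # high: ignores rainfall entirely
print(f"line   train MSE: {train_mse(line):.3f}")    # low but nonzero: skips the fluke
print(f"wiggly train MSE: {train_mse(wiggly):.4f}")  # effectively zero: memorised every farm
# Between training points the degree-7 curve oscillates, so predictions
# for new farms near 130mm swing far from the real trend.
```

The degree-7 polynomial has as many coefficients as there are farms, so it can pass through every point — near-zero train error, exactly the memorisation failure described above.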

[Diagram: Crop Yield vs Rainfall — Three Model Fits. Axes: Rainfall (mm) vs Yield (t/ha), with the anomalous farm marked.]
Three fits to the same crop yield data. The overfit curve spikes to include the anomaly — a fluke data point. On new farms it will give wildly wrong predictions near 130mm rainfall.

The overfit model performs perfectly on the training farms. On any new farm, it performs worse than the simple straight line. Strong training performance is not the goal — generalisation is.

Why Overfitting Is Bad — High Variance

A model that overfits is highly sensitive to small changes in the training set. If you remove one farm from the dataset, re-train, and get a completely different curve — that is high variance. The model learned the specific noise in this particular sample of farms, not the relationship between rainfall and yield that holds across all farms.
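The "remove one farm and re-train" experiment can be run directly. Using the same assumed eight-farm data as above, refit with each farm left out in turn and measure how much the prediction at a fixed rainfall moves:

```python
import numpy as np
from numpy.polynomial import Polynomial

rainfall = np.array([70.0, 85, 100, 115, 130, 150, 175, 200])
yields = np.array([1.4, 1.7, 2.0, 2.3, 4.5, 3.0, 3.5, 4.0])  # 130mm farm is the fluke

def loo_prediction_spread(deg, x_new=140.0):
    """Refit with each farm left out; return the std of predictions at x_new."""
    preds = []
    for i in range(len(rainfall)):
        keep = np.arange(len(rainfall)) != i
        model = Polynomial.fit(rainfall[keep], yields[keep], deg=deg)
        preds.append(model(x_new))
    return float(np.std(preds))

# The straight line barely moves when the sample changes;
# the high-degree fit swings as each point enters or leaves.
print(f"spread of line predictions at 140mm:  {loo_prediction_spread(1):.3f}")
print(f"spread of deg-6 predictions at 140mm: {loo_prediction_spread(6):.3f}")
```

The spread of the leave-one-out predictions is a direct, hands-on reading of variance: a high-variance model's answer depends on which specific farms happened to be in the sample.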

Model      Train error   Test error    Changes a lot if you swap training data?
Underfit   High          High          No — equally wrong everywhere
Good fit   Low           Low           No — learned the real pattern
Overfit    Very low      Much higher   Yes — memorised this specific sample
Quick Check

You train two models on crop yield data. Model A has train error 0.4 and test error 0.42. Model B has train error 0.05 and test error 1.8. Which one is overfitting?

Overfitting in Classification — Logistic Regression

Overfitting is not limited to regression; it happens in classification too. In logistic regression, an overfit model draws a decision boundary that perfectly separates every training example — including the ambiguous boundary cases — by bending the boundary in complex ways.

Consider the diabetes prediction problem with two features: glucose level and BMI. Most non-diabetic patients cluster in the low-glucose, low-BMI region. Most diabetic patients cluster in the high-glucose, high-BMI region. A few patients sit in the boundary zone — slightly elevated glucose but low BMI, or moderate glucose but higher BMI.

Three classifiers are possible:

  1. Underfit (high bias) — a horizontal boundary. It only uses BMI to classify patients and completely ignores glucose. It misclassifies most of the diabetic patients who have moderate BMI.
  2. Good fit — a diagonal boundary using both features. It handles the main clusters well. It misses a few boundary zone patients — correctly, because those cases are genuinely ambiguous.
  3. Overfit (high variance) — a wiggly boundary that perfectly separates all training patients, including the ambiguous ones. On a new patient near the boundary zone, it will give an unreliable prediction.
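The linear-vs-wiggly contrast can be sketched with scikit-learn. The glucose/BMI dataset below is synthetic (generated from an assumed linear rule plus label noise, not the course's actual patients), and the "wiggly" model is simulated by giving logistic regression high-degree polynomial features with weak regularisation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
n = 100
glucose = rng.uniform(80, 200, n)
bmi = rng.uniform(20, 40, n)
X = np.column_stack([glucose, bmi])
# Diabetic roughly when glucose and BMI are jointly high, plus label noise
# that creates a few genuinely ambiguous boundary-zone patients
y = (0.02 * glucose + 0.1 * bmi + rng.normal(0, 0.4, n) > 5.8).astype(int)

linear = LogisticRegression(max_iter=5000).fit(X, y)
flexible = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=6),               # lets the boundary bend
    LogisticRegression(C=1e5, max_iter=20000),  # weak regularisation
).fit(X, y)

# The flexible boundary typically chases the ambiguous training patients
print(f"linear   train accuracy: {linear.score(X, y):.2f}")
print(f"flexible train accuracy: {flexible.score(X, y):.2f}")
```

The flexible model's higher training accuracy is exactly the trap: the extra points it "wins" are the noisy boundary-zone labels, which will not repeat in new patients.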
[Diagram: Diabetes Classification — Three Decision Boundaries. Axes: Glucose (mg/dL) vs BMI, with non-diabetic (y=0), diabetic (y=1), and crossover cases marked.]
Three decision boundaries on the same diabetes training data. The overfit boundary contorts to perfectly classify every training patient — but its complex shape will misclassify new patients near the boundary zone.

Perfect training accuracy in classification is a warning sign, not a goal. It almost always means the model has memorised the training labels rather than learning the pattern that generates them.

The Bias-Variance Tradeoff

As you increase model complexity, two things happen simultaneously. Bias falls — the model can fit more complex real patterns. Variance rises — the model becomes more sensitive to noise in the specific training sample. Expected total error is bias² plus variance (plus irreducible noise no model can remove). The sweet spot is the complexity level where total error is lowest.
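The tradeoff curve can be traced by sweeping complexity on synthetic data. Here the "real pattern" is an assumed sine wave, and complexity is polynomial degree — purely an illustration of the U-shape, not the course's figure:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0, 1, 20))
x_test = np.sort(rng.uniform(0, 1, 200))

def true_fn(x):
    return np.sin(2 * np.pi * x)  # assumed "real pattern"

y_train = true_fn(x_train) + rng.normal(0, 0.25, x_train.size)
y_test = true_fn(x_test) + rng.normal(0, 0.25, x_test.size)

results = {}
for deg in (0, 1, 3, 9, 17):
    model = Polynomial.fit(x_train, y_train, deg=deg)
    tr = float(np.mean((model(x_train) - y_train) ** 2))
    te = float(np.mean((model(x_test) - y_test) ** 2))
    results[deg] = (tr, te)
    print(f"degree {deg:2d}: train {tr:.3f}  test {te:.3f}")
# Train error keeps falling as degree grows; test error typically bottoms
# out at a moderate degree and climbs once the model starts fitting noise.
```

Reading the printout top to bottom reproduces the diagram: the low-degree rows are the high-bias side, the high-degree rows the high-variance side, and the lowest test error marks the sweet spot.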

[Diagram: The Bias-Variance Tradeoff. Error vs model complexity, showing Bias², Variance, and Total error curves, with the sweet spot between underfitting (high bias) and overfitting (high variance).]
As complexity increases, bias falls but variance rises. The sweet spot minimises total error — the point where the model is complex enough to capture the real pattern but not so complex that it memorises noise.
Quick Check

What does 'high variance' mean in the context of overfitting?

Test Your Knowledge

Ready to check how much you remember? Take the quiz for Overfitting in Logistic Regression and see your score on the leaderboard.


Up next

Next, we cover regularization — the set of techniques that fix overfitting by constraining what the model is allowed to learn.

Regularization for Logistic Regression