Train Error and Test Error
You already know that supervised learning splits data into a training set and a test set. The training set is what the model learns from. The test set is held back and used to measure real-world performance.
These two sets give you two error numbers:
- Train error — the model's error on the examples it was trained on. Measures how well the model fits the data it has already seen.
- Test error — the model's error on the held-out examples. Measures how well it generalises to new, unseen data.
The gap between the two is the most useful signal in model evaluation. A small gap alongside low errors means the model learned something real. A large gap means it memorised the training data without learning the underlying pattern.
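The gap is easy to measure directly. Here is a minimal sketch on a synthetic noisy sine dataset (the data, polynomial degree, and split are all illustrative choices, not a recipe): fit on the training half only, then score both halves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: a noisy sine curve, split into train and test halves.
x = rng.uniform(0, 3, 200)
y = np.sin(2 * x) + rng.normal(0, 0.2, 200)
x_train, y_train = x[:100], y[:100]
x_test, y_test = x[100:], y[100:]

# Fit a fairly flexible model (degree-9 polynomial) on the training half only.
coeffs = np.polyfit(x_train, y_train, deg=9)

def mse(xs, ys):
    # Mean squared error of the fitted polynomial on the given examples.
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

train_error = mse(x_train, y_train)
test_error = mse(x_test, y_test)
gap = test_error - train_error
print(f"train MSE: {train_error:.3f}, test MSE: {test_error:.3f}, gap: {gap:.3f}")
```

The same two numbers drive every diagnosis in the rest of this article: it is the pair, not either one alone, that tells you what is wrong.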
What Is the Bias-Variance Tradeoff?
Every supervised model makes errors. Those errors come from two distinct sources: bias and variance. Understanding which one is hurting your model tells you exactly what to fix.
- Bias — error from wrong assumptions. A high-bias model is too simple; it cannot capture the true pattern in the data and performs poorly even on training data.
- Variance — error from sensitivity to noise. A high-variance model is too complex; it memorises the training data (including its noise) and fails on new examples.
The tradeoff is that reducing one tends to increase the other. Increasing model complexity lowers bias but raises variance. The goal is the sweet spot where total error is minimised.
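The sweet spot can be found empirically by sweeping model complexity and watching test error. This sketch (synthetic data again, with polynomial degree standing in for "complexity") shows test error falling while bias dominates, then rising as variance takes over.

```python
import numpy as np

rng = np.random.default_rng(4)

# Noisy sine data, split alternately into train and test halves.
x = np.sort(rng.uniform(0, 3, 120))
y = np.sin(2 * x) + rng.normal(0, 0.25, 120)
x_tr, y_tr = x[::2], y[::2]
x_te, y_te = x[1::2], y[1::2]

# Sweep complexity (polynomial degree) and record held-out error for each.
test_errors = {}
for degree in range(1, 13):
    c = np.polyfit(x_tr, y_tr, deg=degree)
    test_errors[degree] = np.mean((np.polyval(c, x_te) - y_te) ** 2)

# The degree with the lowest test error is the sweet spot for this dataset.
best = min(test_errors, key=test_errors.get)
print(f"sweet spot at degree {best}")
```

Degree 1 underfits a sine curve, so the minimum lands at some higher degree; on a larger or noisier dataset the exact sweet spot would move.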
High Bias (underfitting): model too simple → high train error AND high test error
High Variance (overfitting): model too complex → low train error BUT high test error
Diagnosing Bias vs Variance
The fastest diagnostic is to compare training error and test error side by side.
| Symptom | Train Error | Test Error | Diagnosis |
|---|---|---|---|
| Both errors high | High | High | High bias — underfitting |
| Big gap between them | Low | High | High variance — overfitting |
| Both errors low | Low | Low | Well-fitted model |
| Both errors medium | Medium | Medium | May need more data or a better model |
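The table above can be sketched as a small helper function. The thresholds here (0.1 for "high error", 0.05 for "large gap") are hypothetical; in practice they depend entirely on your task and metric.

```python
# Minimal sketch of the diagnostic table. Thresholds are illustrative only.
def diagnose(train_error, test_error, high=0.1, gap=0.05):
    if train_error > high and test_error > high:
        return "high bias (underfitting)"
    if test_error - train_error > gap:
        return "high variance (overfitting)"
    return "well-fitted"

print(diagnose(0.30, 0.32))  # both errors high
print(diagnose(0.01, 0.39))  # big gap between them
print(diagnose(0.02, 0.04))  # both errors low
```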
For example, a model that scores 99% on training data but only 61% on the test set has low bias but high variance — it memorised the training set instead of learning generalisable patterns.
A model scores 99% on training data but only 61% on the test set. What is the most likely problem?
What Causes Each?
High Bias (Underfitting)
A model underfits when it is not expressive enough to represent the true relationship in the data. Common causes include choosing a model that is too simple for the problem, insufficient training time, or over-aggressive regularisation that constrains the model too tightly.
- For example, fitting a straight line (linear regression) to data that follows a curved pattern will always underfit — the model cannot bend to match the data no matter how much it trains.
- Fixes: use a more complex model, add more features, reduce regularisation strength.
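The straight-line example is easy to reproduce. Below, a linear fit to synthetic quadratic data underfits badly, while a slightly more expressive model (one of the fixes above) captures the curve. The data and degrees are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Curved ground truth (y = x^2) plus a little noise.
x = np.linspace(-3, 3, 200)
y = x**2 + rng.normal(0, 0.3, 200)

def train_mse(degree):
    # Fit a polynomial of the given degree and score it on the same data.
    coeffs = np.polyfit(x, y, deg=degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

linear_err = train_mse(1)  # straight line: cannot bend to match the curve
cubic_err = train_mse(3)   # more complex model: captures the pattern
print(f"linear train MSE: {linear_err:.2f}, cubic train MSE: {cubic_err:.2f}")
```

Note that the linear model's error is high on the *training* data itself, which is the signature of bias: no amount of extra training fixes a model that cannot represent the pattern.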
High Variance (Overfitting)
A model overfits when it learns the training data too well — including random noise that does not reflect the real-world pattern. This happens with very complex models trained on small datasets.
- For example, a deep decision tree with no depth limit will perfectly classify every training example but fail badly on new data because it has essentially memorised the training set.
- Fixes: regularisation (L1/L2, dropout), more training data, early stopping, cross-validation to detect it early.
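The overfitting pattern can be sketched without any ML library. Here a very high-degree polynomial plays the role of the unlimited-depth tree: on a small noisy dataset it can pass through every training point, while a capped degree plays the same role as a depth limit. All sizes and degrees are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small noisy dataset: the setting where complex models memorise noise.
x = np.sort(rng.uniform(0, 1, 30))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)
x_train, y_train = x[::2], y[::2]   # 15 training points
x_test, y_test = x[1::2], y[1::2]   # 15 held-out points

def errors(degree):
    # Train and test MSE for a polynomial of the given degree.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# Degree 14 can thread through all 15 training points: near-zero train
# error, but wild oscillations between them inflate test error.
over_train, over_test = errors(14)
# A capacity limit (degree 3) is the analogue of a tree depth limit.
simple_train, simple_test = errors(3)
print(f"deg 14: train {over_train:.4f}, test {over_test:.4f}")
print(f"deg  3: train {simple_train:.4f}, test {simple_test:.4f}")
```

The complex model "wins" on train error and loses on test error, which is exactly the big-gap row of the diagnostic table.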
A linear model achieves 58% accuracy on training data and 57% on the test set. What does this suggest?
The Total Error Equation
Total error can be decomposed into three parts:
Total Error = Bias² + Variance + Irreducible Noise
- Bias² — how far the average prediction is from the truth.
- Variance — how much predictions scatter around their average.
- Irreducible noise — randomness inherent in the data that no model can remove.
The irreducible noise sets a floor on how well any model can do. Optimising means reducing bias² + variance — and because they pull in opposite directions, there is always a tradeoff.
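The decomposition can be checked numerically with a Monte Carlo sketch: train the same model class on many independent noisy training sets and look at its predictions at one test point. The true function, noise level, and model (a straight-line fit) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# True function f(x) = sin(2x); observations add N(0, sigma^2) noise,
# so the irreducible noise is sigma^2. The model is a degree-1 fit.
f = lambda x: np.sin(2 * x)
sigma = 0.3
x0 = 1.0                      # the single test point we decompose at
x = np.linspace(0, 3, 40)

preds = []
for _ in range(2000):         # many independent training sets
    y = f(x) + rng.normal(0, sigma, x.size)
    coeffs = np.polyfit(x, y, deg=1)
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - f(x0)) ** 2  # (average prediction - truth)^2
variance = preds.var()                 # scatter around the average
noise = sigma ** 2                     # floor no model can remove

# Expected squared error against fresh noisy observations at x0 should
# match bias^2 + variance + noise.
y0 = f(x0) + rng.normal(0, sigma, preds.size)
total = np.mean((preds - y0) ** 2)
print(f"bias^2={bias_sq:.4f} variance={variance:.4f} noise={noise:.4f}")
print(f"sum={bias_sq + variance + noise:.4f} empirical total={total:.4f}")
```

For a straight line fitted to a sine curve, bias² dominates; swapping in a higher-degree model would shrink bias² and grow variance, which is the tradeoff in action.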
Which component of total error cannot be reduced no matter how good the model is?
