
The Bias-Variance Tradeoff


Train Error and Test Error

You already know that supervised learning splits data into a training set and a test set. The training set is what the model learns from. The test set is held back and used to measure real-world performance.

These two sets give you two error numbers:

  • Train error — the model's error on the examples it was trained on. Measures how well the model fits the data it has already seen.
  • Test error — the model's error on the held-out examples. Measures how well it generalises to new, unseen data.

The gap between the two is the most useful signal in model evaluation. A small gap means the model learned something real. A large gap means it memorised the training data without learning the underlying pattern.
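This gap is easy to produce on purpose. The following is a minimal sketch using NumPy on synthetic data (the quadratic ground truth and the degree-14 polynomial are illustrative choices, not part of the lesson): a model flexible enough to pass through every training point memorises the noise and fails on held-out points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: quadratic ground truth plus noise
x_train = np.linspace(-3, 3, 15)
y_train = x_train**2 + rng.normal(0, 1, 15)
x_test = rng.uniform(-3, 3, 30)
y_test = x_test**2 + rng.normal(0, 1, 30)

# A degree-14 polynomial through 15 points can match the
# training data (noise included) almost exactly.
coeffs = np.polyfit(x_train, y_train, deg=14)

def mse(xs, ys):
    """Mean squared error of the fitted polynomial on (xs, ys)."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

train_error = mse(x_train, y_train)
test_error = mse(x_test, y_test)
print(f"train error: {train_error:.3f}")  # near zero: memorised
print(f"test error:  {test_error:.3f}")   # large: did not generalise
```

The train error is near zero while the test error is large: exactly the "large gap" signal described above.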

What Is the Bias-Variance Tradeoff?

Every supervised model makes errors. Those errors come from two distinct sources: bias and variance. Understanding which one is hurting your model tells you exactly what to fix.

  • Bias — error from wrong assumptions. A high-bias model is too simple; it cannot capture the true pattern in the data and performs poorly even on training data.
  • Variance — error from sensitivity to noise. A high-variance model is too complex; it memorises the training data (including its noise) and fails on new examples.

The tradeoff is that reducing one tends to increase the other. Increasing model complexity lowers bias but raises variance. The goal is the sweet spot where total error is minimised.
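The sweet spot can be seen numerically. This is a hedged sketch (synthetic quadratic data and hand-picked polynomial degrees, chosen for illustration) comparing an underfit, a well-matched, and an overly flexible model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: quadratic ground truth plus noise
x = rng.uniform(-3, 3, 80)
y = x**2 + rng.normal(0, 1, 80)
x_train, y_train = x[:50], y[:50]
x_test, y_test = x[50:], y[50:]

def errors(degree):
    """Train and test MSE for a polynomial fit of the given degree."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    return train, test

for degree in (1, 2, 9):  # too simple / matches the truth / too flexible
    train, test = errors(degree)
    print(f"degree {degree}: train={train:.2f}, test={test:.2f}")
```

Raising the degree always lowers the train error, but the test error is lowest near the degree that matches the true pattern: more complexity past the sweet spot buys nothing but variance.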

Diagram: bias², variance, and total error plotted against model complexity. As model complexity grows, bias falls but variance rises. Total error is minimised at the sweet spot between underfitting (high bias) and overfitting (high variance).

  • High bias (underfitting): model too simple → high train error AND high test error
  • High variance (overfitting): model too complex → low train error BUT high test error

Diagnosing Bias vs Variance

The fastest diagnostic is to compare training error and test error side by side.

Symptom                 Train Error   Test Error   Diagnosis
Both errors high        High          High         High bias — underfitting
Big gap between them    Low           High         High variance — overfitting
Both errors low         Low           Low          Well-fitted model
Both errors medium      Medium        Medium       May need more data or a better model

For example, a model that scores 99% on training data but only 61% on the test set has low bias but high variance: it memorised the training set instead of learning generalisable patterns.
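The diagnostic logic above can be captured in a small helper. The thresholds below (`low`, `high`, `gap`) are illustrative assumptions, not universal constants; errors are expressed as fractions, e.g. 99% accuracy means a 0.01 error.

```python
def diagnose(train_err, test_err, low=0.1, high=0.3, gap=0.1):
    """Map a train/test error pair to a likely diagnosis.

    Thresholds are illustrative and should be tuned per problem.
    """
    # A big train/test gap means the model fails on unseen data.
    if test_err - train_err >= gap:
        return "high variance (overfitting)"
    # High error even on training data means the model is too simple.
    if train_err >= high:
        return "high bias (underfitting)"
    # Both errors low and close together: the model generalises.
    if train_err <= low and test_err <= low:
        return "well fitted"
    # Both errors middling: no single clear culprit.
    return "inconclusive: may need more data or a better model"

print(diagnose(0.01, 0.39))  # the 99% / 61% example above
```

On the 99%/61% example, `diagnose(0.01, 0.39)` reports overfitting, matching the table's second row.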

Quick Check

A model scores 99% on training data but only 61% on the test set. What is the most likely problem?

What Causes Each?

High Bias (Underfitting)

A model underfits when it is not expressive enough to represent the true relationship in the data. Common causes include choosing a model that is too simple for the problem, insufficient training time, or over-aggressive regularisation that constrains the model too tightly.

  • For example, fitting a straight line (linear regression) to data that follows a curved pattern will always underfit — the model cannot bend to match the data no matter how much it trains.
  • Fixes: use a more complex model, add more features, reduce regularisation strength.
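The first two fixes can be sketched together. Assuming synthetic quadratic data (an illustrative choice), adding an x² feature lets an otherwise-linear least-squares model bend to match the curve:

```python
import numpy as np

rng = np.random.default_rng(2)

# Curved ground truth: a straight line cannot represent it
x = rng.uniform(-3, 3, 100)
y = x**2 + rng.normal(0, 0.5, 100)

def fit_mse(X, y):
    """Least-squares fit; return the training MSE."""
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ coeffs - y) ** 2))

# Straight line: intercept + x only
linear = np.column_stack([np.ones_like(x), x])
# Add a squared feature: the model can now represent the curve
quadratic = np.column_stack([np.ones_like(x), x, x**2])

err_linear = fit_mse(linear, y)
err_quadratic = fit_mse(quadratic, y)
print(f"linear:    {err_linear:.2f}")
print(f"quadratic: {err_quadratic:.2f}")
```

The error drops sharply once the feature set can express the true pattern, which is the point of "add more features" as an underfitting fix.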

High Variance (Overfitting)

A model overfits when it learns the training data too well — including random noise that does not reflect the real-world pattern. This happens with very complex models trained on small datasets.

  • For example, a deep decision tree with no depth limit will perfectly classify every training example but fail badly on new data because it has essentially memorised the training set.
  • Fixes: regularisation (L1/L2, dropout), more training data, early stopping, cross-validation to detect it early.
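As a sketch of the L2 fix (synthetic data; the degree-12 feature set and λ = 1.0 are arbitrary illustrative choices): ridge regularisation penalises large coefficients, shrinking the wild swings an unregularised overfit model uses to chase noise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Small noisy dataset with a smooth ground truth
x = rng.uniform(-1, 1, 30)
y = np.sin(3 * x) + rng.normal(0, 0.3, 30)

# Degree-12 polynomial features: flexible enough to memorise noise
X = np.vander(x, 13)

# Ordinary least squares: no penalty on coefficient size
ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge (L2): minimise squared error + lam * ||coeffs||^2
lam = 1.0
ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

print(f"OLS coefficient norm:   {np.linalg.norm(ols):.2f}")
print(f"ridge coefficient norm: {np.linalg.norm(ridge):.2f}")
```

The penalty trades a little extra bias for a large reduction in variance, which is why it moves an overfit model back toward the sweet spot.
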

Quick Check

A linear model achieves 58% accuracy on training data and 57% on the test set. What does this suggest?

The Total Error Equation

Total error can be decomposed into three parts:

Total Error = Bias² + Variance + Irreducible Noise

  • Bias² — how far the average prediction is from the truth.
  • Variance — how much predictions scatter around their average.
  • Irreducible noise — randomness inherent in the data that no model can remove.

The irreducible noise sets a floor on how well any model can do. Optimising means reducing bias² + variance — and because they pull in opposite directions, there is always a tradeoff.
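The decomposition can be checked empirically: train the same (deliberately too simple) model on many independently sampled datasets and compare the pieces at a single point. A NumPy sketch, with a synthetic quadratic ground truth as an assumption; the noise term is excluded here because predictions are compared to the noiseless truth:

```python
import numpy as np

rng = np.random.default_rng(4)
x0, true_value = 2.0, 4.0  # the quadratic truth x^2 evaluated at x0

# Fit a straight line (too simple for a quadratic truth) on 200
# independent noisy datasets; record each prediction at x0.
preds = []
for _ in range(200):
    x = rng.uniform(-3, 3, 30)
    y = x**2 + rng.normal(0, 1, 30)
    coeffs = np.polyfit(x, y, 1)
    preds.append(np.polyval(coeffs, x0))
preds = np.array(preds)

bias_sq = (preds.mean() - true_value) ** 2   # average miss, squared
variance = preds.var()                        # scatter of predictions
mean_sq_error = np.mean((preds - true_value) ** 2)

print(f"bias^2:   {bias_sq:.3f}")
print(f"variance: {variance:.3f}")
print(f"sum:      {bias_sq + variance:.3f}  vs  MSE: {mean_sq_error:.3f}")
```

The sum of bias² and variance matches the mean squared error exactly, because the decomposition is an algebraic identity once irreducible noise is set aside.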

Quick Check

Which component of total error cannot be reduced no matter how good the model is?

Test Your Knowledge

Ready to check how much you remember? Take the quiz for The Bias-Variance Tradeoff and see your score on the leaderboard.

Take the Quiz

Up next

In the next module, we cover overfitting and regularisation — why models memorise training noise, the three approaches to fix it, and the regularised cost function.

Overfitting & Regularization