What Is Regression?
Regression is a type of supervised learning in which the output y is a continuous numerical value: a real number rather than a category. The algorithm learns a function f(x) that maps the inputs x to a number that can fall anywhere within a continuous range.
The defining question for regression is: are we predicting a numerical quantity rather than a category? If yes, it's regression.
Regression predicts a number. The output y can take any value in a continuous range, such as 2.3 tonnes, 340,000 dollars, or 21.7°C. There are no fixed categories.
The Crop Yield Problem
Suppose a farmer wants to predict, before the harvest, how many tonnes of wheat per hectare a field will produce. Several measurable factors are available as inputs.
The inputs — also called features — are the variables the algorithm uses to make its prediction:
| Feature (x) | What it measures |
|---|---|
| Soil quality | Nutrient content, pH level |
| Rainfall | Total mm of rain in the growing season |
| Temperature | Average °C during crop growth |
| Fertilizer | Amount applied in kg/hectare |
The output y is the crop yield in tonnes per hectare, which is a number. It could be 3.2, 5.7, or 8.1 depending on the conditions. This is why the problem is regression, not classification: we are not predicting a category like "good crop / bad crop"; we are predicting a numerical value on a continuous scale.
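To make the input and output shapes concrete, here is a minimal sketch in Python with NumPy. The field records, the weights, the bias, and the function name `predict_yield` are all invented for illustration; a real model would learn its parameters from data, and nothing here says the relationship is actually linear. Soil quality is represented by pH alone to keep the example small.

```python
import numpy as np

# Hypothetical field records, invented for illustration only.
# Columns: soil pH, rainfall (mm), average temperature (°C), fertilizer (kg/hectare)
X = np.array([
    [6.5, 420.0, 18.0, 110.0],
    [5.8, 310.0, 21.5, 90.0],
    [7.0, 510.0, 16.5, 130.0],
])

# Placeholder parameters (assumed values); a real model would learn these.
weights = np.array([0.30, 0.005, 0.02, 0.01])
bias = 0.5

def predict_yield(features):
    """Linear f(x): weighted sum of the features plus a bias, in tonnes per hectare."""
    return float(features @ weights + bias)

predictions = [predict_yield(row) for row in X]
print(predictions)  # each entry is a real number, not a category label
```

Each prediction is a floating-point number, which is the mark of a regression output. A classifier fed the same features would instead return one label from a fixed set.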
Why is crop yield prediction a regression problem and not a classification problem?
Which Function Fits the Data?
We have the inputs and the correct outputs from historical farm records. The algorithm's job is to find a function f(x) that maps those inputs to the yield.
But which kind of function? A straight line through the data? A curve? Something more complex?
This is the central task of regression: choosing the right function family, fitting it to the data, and evaluating how well it generalises to new fields the model has never seen. We will cover this in detail later in the track, including linear regression, polynomial regression, and how to measure which fit is actually best.
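As a preview of those three steps, the sketch below fits a straight line and a quadratic curve to the same data and compares their errors on fields held out from training. It is a simplified illustration, not the method the track will develop: the data is synthetic, it uses a single feature (rainfall), and the names `held_out_mse` and `mse_by_degree` are made up for this example. The fitting itself uses NumPy's `polyfit` and `polyval`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, invented for illustration: one feature (rainfall in mm)
# and a yield (tonnes per hectare) that rises and then levels off, plus noise.
rainfall = rng.uniform(200.0, 600.0, size=60)
yield_t = 8.0 - 0.00005 * (rainfall - 500.0) ** 2 + rng.normal(0.0, 0.3, size=60)

# Hold out the last 15 fields to stand in for fields the model has never seen.
x_train, x_test = rainfall[:45], rainfall[45:]
y_train, y_test = yield_t[:45], yield_t[45:]

def held_out_mse(degree):
    """Fit a polynomial of the given degree and return its error on held-out fields."""
    # Rescale rainfall to keep the least-squares fit well conditioned.
    coeffs = np.polyfit(x_train / 100.0, y_train, deg=degree)
    predictions = np.polyval(coeffs, x_test / 100.0)
    return float(np.mean((predictions - y_test) ** 2))

mse_by_degree = {degree: held_out_mse(degree) for degree in (1, 2)}
for degree, mse in mse_by_degree.items():
    print(f"degree {degree}: held-out MSE = {mse:.3f}")
```

The point of measuring error on held-out fields is that a more flexible function always fits the training data at least as well, so training error alone cannot tell you which function family to choose.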
A model is trained to predict house prices. The output is a dollar amount. Which type of supervised learning is this?
