From Learning Paradigms to a Real Model
In the previous modules you saw that supervised learning maps inputs to outputs using labeled examples. Regression is the type of supervised learning where the output is a continuous number. You have the theory; now let us build an actual model.
What Is Linear Regression?
Linear regression fits a straight line through your training data. It then uses that line to predict a number for any new input. It is the simplest regression model and the starting point for almost everything that follows.
Do not worry if you did not fully understand that definition. It is just the technical description. We will break it down piece by piece using interactive visuals to help you understand the core concepts of linear regression. Let's start with an example:
The Crop Yield Problem
Say you own farmland in your state. You have 20 years of rainfall and yield data from every major plot in your area. It is sitting right there — ready to be used.
Before the next harvest, you want to know: how much wheat will this plot produce? You want to plan your revenue. You know the rainfall forecast. You just need a model to turn that number into a prediction.
That is exactly what linear regression does. One input — rainfall in millimetres. One output — crop yield in tonnes per hectare.
Input x → Model f → Output ŷ (y-hat)
- x: rainfall in millimetres (what you measure)
- ŷ (y-hat): predicted crop yield in tonnes per hectare (what you want to know)

One input, one output — the simplest form of a regression problem.
Let's go through your dataset. Each row is one growing season. It has one input — the rainfall — and one output — the yield you actually measured. Let's have a look at a small sample of the data below:
| Season | Rainfall x (mm) | Yield y (t/ha) |
|---|---|---|
| 1 | 80 | 2.2 |
| 2 | 120 | 3.9 |
| 3 | 145 | 5.0 |
| 4 | 160 | 5.4 |
| 5 | 185 | 6.1 |
| 6 | 100 | 3.1 |
This collection of (x, y) pairs is your dataset. You hand it to the learning algorithm. You can already see a pattern: more rainfall means higher yield.
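Concretely, the sample above can be held as a plain list of (x, y) pairs. The sketch below is one minimal way to represent the data in Python before any learning happens; no libraries are needed at this stage.

```python
# Sample of the crop records as (rainfall mm, yield t/ha) pairs,
# in the order the seasons appear in the table above.
samples = [
    (80, 2.2),
    (120, 3.9),
    (145, 5.0),
    (160, 5.4),
    (185, 6.1),
    (100, 3.1),
]

# Each pair couples one input with the yield actually measured that season.
for rainfall, crop_yield in samples:
    print(f"{rainfall} mm of rain -> {crop_yield} t/ha")
```

Sorting the pairs by rainfall makes the pattern visible: every step up in rainfall comes with a step up in yield.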
Looking at the dataset, what does each row represent?
Fitting a Line
So you have the data. What next? You fit the simplest possible function to it: a linear equation, which expresses the output as a linear function of the input.
The line will not pass through every point exactly. That is expected. The goal is not to memorise your past seasons — it is to predict new ones.
A Complete Prediction Example
You have a new plot. Forecast rainfall: 130 mm. Wheat is selling at $500 per tonne. You want a prediction before the harvest begins.
Feed x = 130 into your model: f(130) = 0.032 × 130 − 0.34 ≈ 3.82 t/ha.
| Input | Predicted Yield | Price per Tonne | Expected Revenue |
|---|---|---|---|
| x = 130 mm | ŷ (y-hat) = 3.82 t/ha | $500 | $1,910 |
You have a yield estimate and a revenue figure. Before a single grain is harvested. That is the value of a trained model — it takes your historical records and answers questions about inputs it has never seen.
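The arithmetic above is small enough to check by hand, but here is the same calculation as a few lines of Python, using the parameter values w = 0.032 and b = −0.34 from this example:

```python
# Worked prediction for the new plot, using the example parameters.
w, b = 0.032, -0.34      # slope and intercept from the text
x = 130                  # forecast rainfall in mm
price_per_tonne = 500    # wheat price in dollars

y_hat = w * x + b        # model prediction f(130), about 3.82 t/ha
revenue = y_hat * price_per_tonne

print(f"Predicted yield: {y_hat:.2f} t/ha")
print(f"Expected revenue: ${revenue:,.0f}")
```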
Using the model f(x) = 0.032x − 0.34, what is the predicted yield for a field with 160 mm of rainfall?
Why Supervised? Why Regression?
Why supervised? Your records contain the actual measured yield for each rainfall value. Those correct labels supervise the learning. Without them, the algorithm has nothing to learn from.
Why regression? The output y is a continuous number. Yield can be 3.1, 4.78, 6.1 — any value on a scale. If you were predicting a category like "good harvest" or "poor harvest", it would be classification instead.
Supervised: you train on (x, y) pairs where y is the known correct answer. Regression: the output y is a continuous number, not a fixed category. These two facts together make this problem supervised regression.
Machine Learning Terminology
Every practitioner uses a standard vocabulary. Here it is, grounded in your crop yield records.
The full collection of (x, y) pairs you train on is called the training set. After training, you no longer need it. The model f captures everything the algorithm learned.
| Term | Symbol | Also called | Example |
|---|---|---|---|
| Input variable | x | feature, input feature | Rainfall in mm |
| Output variable | y | target variable, target | Crop yield in t/ha |
| Number of examples | m | — | hundreds (20 years of records) |
| One training example | (x, y) | — | (145, 5.0) |
| The i-th example | (x(i), y(i)) | — | (x(3), y(3)) = (145, 5.0) |
| Prediction | ŷ | y-hat | ŷ = 4.78 for x = 160 |
For example, (x(3), y(3)) = (145, 5.0) is the third row in your table — 145 mm of rain, 5.0 t/ha of yield. The superscript is an index, not a power.
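One practical wrinkle: the notation counts examples from 1, but Python lists are zero-indexed, so the i-th training example lives at position i − 1 in code. A small sketch (the helper `example` is hypothetical, introduced here just to illustrate the offset):

```python
# Training rows in the order they appear in the table above.
training_set = [
    (80, 2.2),    # (x(1), y(1))
    (120, 3.9),   # (x(2), y(2))
    (145, 5.0),   # (x(3), y(3))
    (160, 5.4),   # (x(4), y(4))
    (185, 6.1),   # (x(5), y(5))
    (100, 3.1),   # (x(6), y(6))
]

def example(i):
    """Return the i-th training example (x(i), y(i)), counting from 1."""
    return training_set[i - 1]

print(example(3))  # (145, 5.0): 145 mm of rain, 5.0 t/ha of yield
```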
In the notation (x(i), y(i)), what does the superscript i indicate?
How Training Leads to Prediction
All the pieces connect in one pipeline. You feed your training set to the learning algorithm — every (x(i), y(i)) pair. The algorithm produces a function f. That function is your trained model.
Once f is trained, you no longer need the training set. You pass a new x to f and get a prediction ŷ (y-hat) back.
y is the actual yield — the number you measured in a past season. ŷ (y-hat) is the model's prediction. y is real. ŷ is an estimate.
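The distinction is easy to see in code. The sketch below compares the measured yield y with the model's estimate ŷ for one past season, using the example parameters from this module:

```python
w, b = 0.032, -0.34   # example parameters from this module

# Season 3 from the table: 145 mm of rain, 5.0 t/ha actually measured.
x, y = 145, 5.0       # y is real: it was measured in the field
y_hat = w * x + b     # y_hat is an estimate produced by the model

print(f"Actual yield y     = {y} t/ha")
print(f"Predicted yield ŷ  = {y_hat:.2f} t/ha")
print(f"Difference (y - ŷ) = {y - y_hat:+.2f} t/ha")
```

Even on a season the model trained on, y and ŷ need not match exactly. The gap between them is what the next module will learn to measure.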
The Model: f(x) = wx + b
The simplest function f is a straight line. In machine learning, you write it as:
| Symbol | Name | Also called | Role |
|---|---|---|---|
| x | input feature | — | The value you feed in (e.g., rainfall in mm) |
| w | weight | slope, coefficient | Controls how steeply the line rises |
| b | bias | intercept | Shifts the entire line up or down |
| f(x) | model output | ŷ (y-hat) | The predicted value for input x |
| w, b | parameters | — | What the learning algorithm tunes from data |
For your crop yield records: w ≈ 0.032, b ≈ −0.34. So f(130) = 0.032 × 130 − 0.34 = 3.82 t/ha.
The algorithm's job is to find the w and b that best fit your data. Different (w, b) pairs give different lines. Some fit well. Others miss badly. This is linear regression with one variable — one input, one output, one line.
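To make "some fit well, others miss badly" concrete, here is a rough sketch that scores two candidate lines by their average absolute miss on the sample data. The error measure here is an informal stand-in of my choosing; the formal cost function comes in the next module.

```python
training_set = [(80, 2.2), (100, 3.1), (120, 3.9),
                (145, 5.0), (160, 5.4), (185, 6.1)]

def average_miss(w, b):
    """Mean absolute gap between the line wx + b and the measured yields."""
    return sum(abs((w * x + b) - y) for x, y in training_set) / len(training_set)

# Two candidate lines: the parameters used in this module vs. an arbitrary guess.
print(f"w=0.032, b=-0.34 -> average miss {average_miss(0.032, -0.34):.2f} t/ha")
print(f"w=0.010, b= 1.00 -> average miss {average_miss(0.010, 1.00):.2f} t/ha")
```

The first pair hugs the data closely; the arbitrary guess misses by nearly two tonnes per hectare on average. Choosing between candidate lines is exactly the job the learning algorithm automates.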
In f_w,b(x) = wx + b, what role does b play?
Explore the Model: Adjust w and b
Use the sliders below to change w (slope) and b (intercept). Watch how the line shifts and tilts in real time. Try to pass the line as close to the purple data points as possible.
Notice: w and b work together. Changing w tilts the line. Changing b slides it up or down. Finding the right combination by hand takes trial and error. In the next module, you will learn a formal way to measure how wrong any given line is — the first step toward finding the best one automatically.
Implementing the Model in Python
In the next modules, we will dive deep into how the optimal values of w and b are found in machine learning — but first, let us look at the Python code for linear regression.
Below is a complete working implementation. It defines the model, runs it on your crop yield data, and prints a prediction.
```python
def predict(x, w, b):
    """
    Linear regression model: f_w,b(x) = wx + b

    Parameters
    ----------
    x : float — input feature (rainfall in mm)
    w : float — weight / slope (learned parameter)
    b : float — bias / intercept (learned parameter)

    Returns
    -------
    float — predicted crop yield in tonnes per hectare
    """
    return w * x + b


# Training set: (rainfall mm, yield t/ha)
training_set = [
    (80, 2.2),
    (100, 3.1),
    (120, 3.9),
    (145, 5.0),
    (160, 5.4),
    (185, 6.1),
]

# Parameters learned from the training set
w = 0.032   # slope
b = -0.34   # intercept

# --- Predict for a new field: 130 mm forecast rainfall ---
x_new = 130
y_hat = predict(x_new, w, b)
print(f"Predicted yield: {y_hat:.2f} t/ha")

# Expected revenue at $500 per tonne
price_per_tonne = 500
revenue = y_hat * price_per_tonne
print(f"Expected revenue: ${revenue:,.0f}")

# Output:
# Predicted yield: 3.82 t/ha
# Expected revenue: $1,910
```

The function predict is our model f. The values w and b are the parameters the learning algorithm would find during training. In the next module, we define a cost function — a way to measure how well any given (w, b) pair fits the data — which is the key to finding these parameters automatically.
