What Is Classification?
Classification is a type of supervised learning where the output y is a label: one choice from a fixed, predefined set of categories. The algorithm does not predict a quantity on a continuous scale; it predicts which group the input belongs to.
The defining question for classification is: are we predicting a category from a limited set? If yes, it's classification.
Classification predicts a category. The output y is always one label from a fixed set. For example, an email is either Spam or Not Spam; there is no value in between.
The Fixed Set of Categories
This is what separates classification from regression. In regression, the output y can be any number on a continuous scale. In classification, the output y must be exactly one of the predefined labels; nothing else is possible.
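To make the contrast concrete, here is a minimal Python sketch with two hand-written stand-in "models". The names `predict_price` and `predict_spam`, the house-price example, and the rules inside them are hypothetical. They exist only to show the shape of each output: a float for regression, one entry from a fixed tuple of labels for classification.

```python
# Hypothetical toy models with hard-coded rules; nothing here is learned.
LABELS = ("Spam", "Not Spam")  # the fixed set, chosen before training


def predict_price(square_metres: float) -> float:
    """Regression: the output can be any number on a continuous scale."""
    return 1500.0 * square_metres + 20000.0  # made-up coefficients


def predict_spam(num_links: int) -> str:
    """Classification: the output is always one label from LABELS."""
    return LABELS[0] if num_links > 5 else LABELS[1]  # made-up rule


print(predict_price(72.5))  # 128750.0
print(predict_spam(8))      # Spam
```

Whatever number of links is passed in, `predict_spam` can only ever return one of the two strings in `LABELS`.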
| Type | Output space | Example |
|---|---|---|
| Binary classification | Exactly 2 categories | Spam / Not Spam, Fraud / Legitimate |
| Multi-class classification | 3 or more categories | Cat / Dog / Bird, digit 0–9 |
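Both rows of the table share the same prediction step: the model scores every label in the fixed set and returns the single best one. The sketch below assumes the model has already produced one score per label (the scores are made up) and uses a hypothetical helper, `pick_label`, to take the argmax. Binary classification is simply the case where the label set has two entries.

```python
from typing import Sequence

FRAUD_LABELS = ("Fraud", "Legitimate")     # binary: exactly 2 labels
DIGITS = tuple(str(d) for d in range(10))  # multi-class: 10 labels, "0".."9"


def pick_label(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the one label with the highest score (argmax over the fixed set)."""
    if len(scores) != len(labels):
        raise ValueError("need exactly one score per label")
    best = max(range(len(labels)), key=lambda i: scores[i])
    return labels[best]


# Made-up scores: the model leans towards 3 but also considers 4.
digit_scores = [0.1, 0.0, 0.2, 2.3, 1.9, 0.0, 0.1, 0.0, 0.3, 0.2]
print(pick_label(digit_scores, DIGITS))      # 3
print(pick_label([0.2, 0.8], FRAUD_LABELS))  # Legitimate
```

Even when two scores are close, the result is exactly one label from the set, never something in between.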
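For example, a spam filter does not output "47% spam"; its final answer is either Spam or Not Spam. (Many models compute a score such as 0.47 internally, but it is converted into one of the labels before being returned.) A digit recogniser does not output "between 3 and 4"; it outputs exactly one digit from 0 to 9. The set of categories is defined before training and stays the same when the model makes predictions.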
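Many binary classifiers (logistic regression is a common example) compute a probability internally and only then turn it into a label by comparing it with a threshold. A minimal sketch of that last step, assuming a hypothetical `to_label` helper and the commonly used default threshold of 0.5:

```python
def to_label(p_spam: float, threshold: float = 0.5) -> str:
    """Convert an internal spam probability into one of the two fixed labels."""
    if not 0.0 <= p_spam <= 1.0:
        raise ValueError("p_spam must be between 0 and 1")
    return "Spam" if p_spam >= threshold else "Not Spam"


print(to_label(0.47))                 # Not Spam
print(to_label(0.47, threshold=0.4))  # Spam
```

Moving the threshold changes which label is returned, but the output is still one of the two labels.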
A model predicts whether a bank transaction is 'Fraud' or 'Legitimate'. How many output categories does this classifier have?
Real-World Examples
Every classification problem supplies labeled examples during training, then predicts the correct category for new inputs.
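The train-then-predict pattern can be sketched with one of the simplest classifiers, a nearest-centroid classifier. It is chosen here only for brevity; the article does not prescribe a particular algorithm. Training averages the feature vectors of each label, and prediction returns the label whose average is closest. The two features (links per email, exclamation marks per email) and the four labeled examples are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train(examples):
    """Training: compute one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Prediction: return the label whose centroid is closest to the new input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled examples: ([links, exclamation marks], label)
examples = [
    ([9, 6], "Spam"),
    ([7, 8], "Spam"),
    ([0, 1], "Not Spam"),
    ([1, 0], "Not Spam"),
]
model = train(examples)
print(model)                   # {'Spam': [8.0, 7.0], 'Not Spam': [0.5, 0.5]}
print(predict(model, [6, 5]))  # Spam
```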
Spam detection
- Input x: email text, sender, subject line, links in the body.
- Output y: Spam or Not Spam.
- Training data: thousands of emails already labeled by humans.
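Before training, each email has to be turned into an input x the model can work with. The sketch below pairs hypothetical numeric features (links, exclamation marks, a few trigger words, and whether the sender is outside a trusted domain) with human-assigned labels. The feature choices, the word list, and the `example.com` trusted domain are illustrative assumptions, not a recommended feature set.

```python
import re

LABELS = {"Spam", "Not Spam"}
TRIGGER_WORDS = ("free", "winner", "urgent")  # hypothetical word list


def extract_features(subject: str, body: str, sender: str) -> list[float]:
    """Turn one email into a numeric input vector x (illustrative features)."""
    text = f"{subject} {body}".lower()
    return [
        float(len(re.findall(r"https?://", body))),         # links in the body
        float(text.count("!")),                             # exclamation marks
        float(sum(word in text for word in TRIGGER_WORDS)),  # trigger words (substring match)
        0.0 if sender.endswith("@example.com") else 1.0,    # sender outside trusted domain
    ]


# Each training example is an (x, y) pair; y was assigned by a human reviewer.
training_data = [
    (extract_features("You are a WINNER!",
                      "Claim your FREE prize: http://x.test http://y.test",
                      "promo@deals.test"), "Spam"),
    (extract_features("Meeting notes",
                      "Notes from today are attached.",
                      "alice@example.com"), "Not Spam"),
]
for x, y in training_data:
    print(x, y)
```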
Fraud detection
- Input x: transaction amount, location, merchant, time of day, spending history.
- Output y: Fraud or Legitimate.
- Training data: historical transactions labeled by fraud analysts.
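One simple way to use analyst-labeled history is k-nearest neighbours: a new transaction gets the majority label of the k most similar past transactions. This is a swapped-in illustration, not a description of how production fraud systems work. The six transactions, the two features (amount and hour of day), and k = 3 are made up. The features are also left unscaled, so the amount dominates the distance; a real system would normally scale them first.

```python
from collections import Counter
from math import dist

# Hypothetical history: ((amount in dollars, hour of day), analyst label)
HISTORY = [
    ((12.50, 13), "Legitimate"),
    ((40.00, 18), "Legitimate"),
    ((25.00, 9), "Legitimate"),
    ((950.00, 3), "Fraud"),
    ((1200.00, 2), "Fraud"),
    ((800.00, 4), "Fraud"),
]


def classify(transaction, history=HISTORY, k=3):
    """k-nearest neighbours: majority label among the k closest past transactions."""
    nearest = sorted(history, key=lambda item: dist(item[0], transaction))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((1000.00, 3)))  # Fraud
print(classify((30.00, 12)))   # Legitimate
```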
Recommendation systems
- Input x: user viewing history, ratings, demographics.
- Output y: Will click or Will not click (the predicted click probabilities can also be used to rank items).
- Training data: past user interactions with known outcomes.
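Framed as binary classification, the recommender predicts a click probability for each candidate item, thresholds it into Will click or Will not click, and can also sort candidates by that probability. The logistic model below uses made-up weights and two hypothetical per-item features derived from the user's history (minutes watched in the item's genre, average rating the user gave that genre). It is a sketch, not a trained model.

```python
from math import exp


def click_probability(minutes_in_genre: float, avg_rating_in_genre: float) -> float:
    """Hypothetical logistic model; the weights are made up, not learned."""
    z = 0.05 * minutes_in_genre + 0.8 * avg_rating_in_genre - 5.0
    return 1.0 / (1.0 + exp(-z))


def to_click_label(p: float, threshold: float = 0.5) -> str:
    """Map a probability to one of the two fixed labels."""
    return "Will click" if p >= threshold else "Will not click"


# Hypothetical candidates: title -> (minutes watched in genre, avg rating in genre)
candidates = {
    "Documentary A": (120.0, 4.5),
    "Comedy B": (10.0, 3.0),
    "Thriller C": (60.0, 4.0),
}
scores = {title: click_probability(*x) for title, x in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # highest probability first
for title in ranked:
    print(title, to_click_label(scores[title]))
```

Each item still receives exactly one of the two labels; the ranking is a by-product of the same probabilities.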
A model classifies images into: Cat, Dog, or Bird. A new image is fed in. What are the possible outputs?
