What Is Machine Learning?
Machine learning is the practice of having a computer teach itself patterns from data rather than follow hand-written rules. The key phrase in that definition is teaches itself. No programmer writes the rules explicitly — the algorithm finds them in the data.
In traditional software, a developer writes rules and the computer applies them. In supervised machine learning, the developer provides data and labeled outputs, and the algorithm learns a function that maps inputs to predictions. In unsupervised learning, there are no labels — the algorithm finds structure on its own.
This course covers how machines learn and the three main paradigms. It pairs with Gradient Descent & Optimization and Overfitting & Regularization in this track.
- Traditional programming: Data + Rules → Output
- Supervised ML: Data + Labels → Model (learned function)
- Unsupervised ML: Data (no labels) → Patterns
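The contrast can be made concrete with a small sketch. The spam examples and word-scoring scheme below are hypothetical — real spam filters are far more sophisticated — but they show the shift from hand-written rules to rules derived from labeled data:

```python
# Traditional programming: a developer hand-writes the rule.
def rule_based_spam(email: str) -> bool:
    return "free money" in email.lower()

# Supervised ML: the "rule" (here, a per-word score) is derived from labeled data.
def learn_spam_words(labeled_emails):
    """Score each word by how often it appears in spam vs. non-spam."""
    scores = {}
    for text, is_spam in labeled_emails:
        for word in text.lower().split():
            scores[word] = scores.get(word, 0) + (1 if is_spam else -1)
    return scores

def learned_spam(email: str, scores) -> bool:
    # Classify by summing the learned word scores.
    total = sum(scores.get(w, 0) for w in email.lower().split())
    return total > 0

data = [
    ("free money now", True),
    ("claim free prize", True),
    ("meeting at noon", False),
    ("lunch at noon tomorrow", False),
]
scores = learn_spam_words(data)
print(learned_spam("free prize inside", scores))   # True — learned, not hand-coded
```

Note that `learned_spam` classifies an email it never saw during training; the mapping came from the data, not from the programmer.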
The Three Learning Paradigms
Supervised Learning
In supervised learning, the model is trained on labeled examples — input-output pairs where the correct answer is known. The model learns to map inputs to outputs by minimising the error between its predictions and the true labels.
- Classification: predict a category, e.g. spam or not spam, cat or dog.
- Regression: predict a continuous value, e.g. a house price or temperature forecast.
- Common algorithms: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines, Neural Networks.
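A minimal supervised example: fitting a linear model y = w·x + b by gradient descent on a toy dataset (assumed here to follow y = 2x + 1), minimising squared error between predictions and labels:

```python
# Toy labeled data: inputs and their "correct answers".
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

w, b = 0.0, 0.0   # model parameters, initially arbitrary
lr = 0.02         # learning rate (a hyperparameter)

for _ in range(2000):
    # Gradient of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w   # step downhill on the loss surface
    b -= lr * grad_b

print(round(w, 2), round(b, 2))   # close to 2.0 and 1.0: the learned mapping
```

The loop never sees the rule y = 2x + 1; it recovers it from the labeled pairs alone.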
Unsupervised Learning
Unsupervised learning discovers structure in unlabeled data. There are no correct answers provided — the algorithm finds patterns on its own.
- Clustering: group similar data points together, e.g. K-Means grouping customers by purchase behaviour.
- Dimensionality reduction: compress data to fewer dimensions while preserving structure, e.g. PCA and t-SNE.
- Generative models: learn the data distribution to generate new samples, e.g. VAEs and GANs for image synthesis. Whether these belong under unsupervised learning or their own generative category is debated — the field has not settled on a single classification.
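K-Means itself fits in a few lines. This sketch clusters hypothetical one-dimensional purchase totals with k = 2; note that no labels are given — the alternating assignment and update steps discover the two groups:

```python
# Hypothetical purchase totals: two natural groups around 1 and 8.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centroids = [0.0, 5.0]   # arbitrary starting centroids, k = 2

for _ in range(10):       # a few assignment/update rounds
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    centroids = [sum(c) / len(c) if c else centroids[i]
                 for i, c in enumerate(clusters)]

print(centroids)   # two cluster centres, one per customer group
```

Real implementations work in many dimensions and restart from several random initialisations, since K-Means can get stuck in poor local optima.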
Reinforcement Learning
Reinforcement learning trains an agent to make sequential decisions in an environment. The agent receives a reward signal after each action and learns a policy that maximises long-term reward.
- Key components: Agent, Environment, State, Action, Reward, Policy.
- Famous applications: AlphaZero (pure self-play RL), ChatGPT fine-tuned via RLHF, robot locomotion. Note: the original AlphaGo combined supervised learning from human game records with RL — it was not pure reinforcement learning.
- Core algorithms: Q-Learning, PPO, A3C.
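Tabular Q-Learning is small enough to sketch end to end. The environment below is a made-up five-state corridor: the agent starts in state 0, can step left or right, and receives reward 1 only on reaching state 4. The Q-table maps each (state, action) pair to an estimated long-term reward:

```python
import random

random.seed(0)
n_states, actions = 5, [-1, +1]              # move left or right
Q = [[0.0, 0.0] for _ in range(n_states)]    # Q[state][action index]
alpha, gamma, eps = 0.5, 0.9, 0.1            # learning rate, discount, exploration

for _ in range(500):                         # episodes
    s = 0
    while s != 4:
        # Epsilon-greedy: usually exploit the best known action, sometimes explore.
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max(range(2), key=lambda i: Q[s][i])
        s2 = min(max(s + actions[a], 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0
        # Q-learning update: bootstrap from the best action in the next state.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(range(2), key=lambda i: Q[s][i]) for s in range(n_states)]
print(policy)   # action index 1 ("right") in states 0-3
```

Note the supervision signal here: no labeled answers, only a delayed reward that the discount factor propagates backward through the Q-table.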
A spam filter is trained on 10,000 emails labelled 'spam' or 'not spam'. Which learning paradigm is this?
The Bias-Variance Tradeoff
A model with high bias is too simple — it fails to capture the true pattern and performs poorly on both training and new data. A model with high variance is too complex — it fits training data perfectly but fails on unseen examples. The goal is a model complex enough to capture true patterns but not so complex that it fits noise.
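Both failure modes can be shown with two deliberately bad models on hypothetical data (y ≈ x²): a constant predictor (high bias) and a lookup table that memorises the training set (high variance):

```python
# Hypothetical 1-D data: y = x^2 plus a little noise.
train = [(1, 1.1), (2, 3.9), (3, 9.2), (4, 15.8)]
test = [(5, 25.1), (6, 36.2)]

# High bias: predict the training mean everywhere (too simple).
mean_y = sum(y for _, y in train) / len(train)
bias_model = lambda x: mean_y

# High variance: memorise training pairs exactly, default to 0 elsewhere
# (fits training data perfectly, generalises not at all).
table = dict(train)
variance_model = lambda x: table.get(x, 0.0)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

for name, model in [("high bias", bias_model), ("high variance", variance_model)]:
    print(name, "train MSE:", round(mse(model, train), 1),
          "test MSE:", round(mse(model, test), 1))
```

The memoriser scores a perfect 0.0 on training data yet the worst test error of the two, while the constant model is mediocre on both — exactly the two ends of the tradeoff.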
A model scores 99% on training data but only 61% on the test set. What is the most likely problem?
Key Concepts
- Training set: data used to fit model parameters.
- Validation set: data used to tune hyperparameters and catch overfitting during development.
- Test set: held-out data used once at the end to estimate real-world performance.
- Hyperparameters: configuration values set before training, e.g. learning rate, number of layers, regularization strength.
- Cross-validation: estimate generalisation by rotating which portion of data is held out.
- Feature engineering: creating or transforming input variables to improve model performance.
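Cross-validation in particular is easy to picture once the splitting is spelled out. A minimal k-fold sketch (round-robin assignment of items to folds, each fold held out exactly once):

```python
def k_fold_splits(data, k):
    """Yield (train, validation) pairs; each fold is held out exactly once."""
    folds = [data[i::k] for i in range(k)]   # round-robin split into k folds
    for i in range(k):
        held_out = folds[i]
        rest = [x for j, f in enumerate(folds) if j != i for x in f]
        yield rest, held_out

data = list(range(10))
for train, val in k_fold_splits(data, k=5):
    print("train:", train, "val:", val)
```

Averaging a model's score across the k validation folds gives a steadier estimate of generalisation than a single split. Production code would also shuffle before splitting (or stratify by class); this sketch omits that for clarity.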
Classification Metrics
Accuracy alone can be misleading. For example, a classifier that always predicts "not fraud" on a dataset where only 5% of cases are fraud will score 95% accuracy while catching zero fraud cases.
- Precision: of all predicted positives, what fraction are truly positive? TP / (TP + FP).
- Recall: of all actual positives, what fraction did the model catch? TP / (TP + FN).
- F1 Score: harmonic mean of precision and recall — balances both.
- ROC-AUC: area under the ROC curve, measuring overall discrimination ability.
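The first three metrics follow directly from confusion-matrix counts. A small sketch with made-up fraud-detector counts (tp = true positives, fp = false positives, fn = false negatives):

```python
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp)                          # of flagged, how many real?
    recall = tp / (tp + fn)                             # of real, how many caught?
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

# Hypothetical counts: 80 frauds caught, 20 false alarms, 40 frauds missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
print(p, r, round(f1, 3))   # precision 0.8, recall ~0.667, F1 ~0.727
```

The harmonic mean punishes imbalance: a model with precision 1.0 but recall 0.1 gets an F1 of only ~0.18, which is why F1 is preferred over a plain average when both errors matter.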
A fraud detection model flags 100 transactions as fraudulent. 80 are genuinely fraudulent, 20 are not. What is the model's precision?