Machine Learning · Beginner

Introduction to Machine Learning

Tags: machine learning, supervised learning, unsupervised learning, reinforcement learning, AI basics

What Is Machine Learning?

Diagram: Cambridge Dictionary defines machine learning (/məˈʃiːn ˈlɜːnɪŋ/, noun) as "the process by which a computer teaches itself to do something, usually by finding patterns in data." What this means in practice: (1) data in: examples are provided, not rules; (2) patterns found: the algorithm learns a function from the data; (3) predictions out: the model generalises to new inputs.
Cambridge Dictionary's definition of machine learning, and what that means in practice.

The key word in that definition is "teaches itself": no programmer writes the rules explicitly; the algorithm finds them in the data.

In traditional software, a developer writes rules and the computer applies them. In supervised machine learning, the developer provides data and labeled outputs, and the algorithm learns a function that maps inputs to predictions. In unsupervised learning, there are no labels — the algorithm finds structure on its own.

This course covers how machines learn and the three main paradigms. It pairs with Gradient Descent & Optimization and Overfitting & Regularization in this track.

Diagram: traditional programming (Data + Rules → Program → Output; the programmer writes the rules) versus supervised ML (Data + Labels → Algorithm → Model, a learned function; the algorithm learns the rules from labeled data).
Traditional programming gives the computer rules; supervised ML lets the algorithm discover them from labeled data.

  • Traditional programming: Data + Rules → Output
  • Supervised ML: Data + Labels → Model (learned function)
  • Unsupervised ML: Data (no labels) → Patterns
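The contrast above can be sketched in a few lines of Python. The spam example, word counts, and threshold range below are all made up for illustration; the point is only that one rule is written by a human while the other is recovered from labeled examples:

```python
# Traditional programming: the programmer writes the rule by hand.
def is_spam_rule(flagged_words: int) -> bool:
    return flagged_words > 3            # threshold chosen by a human

# Supervised ML, reduced to its simplest form: learn the threshold
# from labeled (input, output) pairs instead of hard-coding it.
examples = [(0, False), (1, False), (2, False), (5, True), (6, True), (8, True)]

def learn_threshold(data):
    # Pick the integer threshold with the fewest misclassifications.
    return min(range(10), key=lambda t: sum((x > t) != y for x, y in data))

t = learn_threshold(examples)
print(t)  # a threshold that separates the labeled examples
```

Real algorithms learn far richer functions than a single threshold, but the division of labour is the same: the developer supplies data and labels, and the fitting procedure supplies the rule.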

The Three Learning Paradigms

Supervised Learning

In supervised learning, the model is trained on labeled examples — input-output pairs where the correct answer is known. The model learns to map inputs to outputs by minimising the error between its predictions and the true labels.

  • Classification: predict a category, e.g. spam or not spam, cat or dog.
  • Regression: predict a continuous value, e.g. a house price or temperature forecast.
  • Common algorithms: Linear Regression, Logistic Regression, Decision Trees, Support Vector Machines, Neural Networks.
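The regression case can be sketched with an ordinary least-squares fit on toy data (numpy only; the house sizes and prices below are invented, and deliberately noise-free so the learned line is exact):

```python
import numpy as np

# Toy supervised regression: labeled (size, price) pairs.
sizes = np.array([50.0, 80.0, 100.0, 120.0, 150.0])    # inputs
prices = np.array([150.0, 240.0, 300.0, 360.0, 450.0])  # labels (price = 3 * size)

# Fit y = w * x + b by minimising squared error between predictions and labels.
X = np.column_stack([sizes, np.ones_like(sizes)])
w, b = np.linalg.lstsq(X, prices, rcond=None)[0]

# The learned function generalises to an input it never saw.
predicted = w * 90.0 + b
print(round(predicted, 1))  # → 270.0 on this noise-free toy data
```

This is the whole supervised loop in miniature: labeled examples in, error minimised, a function out that maps new inputs to predictions.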

Unsupervised Learning

Unsupervised learning discovers structure in unlabeled data. There are no correct answers provided — the algorithm finds patterns on its own.

  • Clustering: group similar data points together, e.g. K-Means grouping customers by purchase behaviour.
  • Dimensionality reduction: compress data to fewer dimensions while preserving structure, e.g. PCA and t-SNE.
  • Generative models: learn the data distribution to generate new samples, e.g. VAEs and GANs for image synthesis. Whether these belong under unsupervised learning or in their own generative category is debated; the field has not settled on a single classification.
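Clustering can be sketched with a minimal K-Means loop. The one-dimensional "customer spend" values and starting centroids below are made up, chosen so two groups are obvious; note that no labels appear anywhere:

```python
import numpy as np

# Unlabeled data: two obvious groups of customer-spend values.
data = np.array([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])

centroids = np.array([0.0, 5.0])  # arbitrary starting centroids
for _ in range(10):
    # Assignment step: each point joins its nearest centroid.
    labels = np.abs(data[:, None] - centroids[None, :]).argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points.
    centroids = np.array([data[labels == k].mean() for k in range(2)])

print(centroids)  # one centroid near 1.0, one near 9.1
```

Real K-Means works in many dimensions and handles empty clusters and restarts, but the assign-then-update alternation is exactly this.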

Reinforcement Learning

Reinforcement learning trains an agent to make sequential decisions in an environment. The agent receives a reward signal after each action and learns a policy that maximises long-term reward.

  • Key components: Agent, Environment, State, Action, Reward, Policy.
  • Famous applications: AlphaZero (pure self-play RL), ChatGPT fine-tuned via RLHF, robot locomotion. Note: the original AlphaGo combined supervised learning from human game records with RL — it was not pure reinforcement learning.
  • Core algorithms: Q-Learning, PPO, A3C.
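Tabular Q-Learning can be sketched on a hypothetical 4-state corridor (the environment, reward of 1 at the goal state, and all hyperparameter values below are illustrative):

```python
import numpy as np

# States 0..3; actions 0 = left, 1 = right; reaching state 3 ends the
# episode with reward 1. Everything here is a toy illustration.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 3:
        # Epsilon-greedy: mostly exploit current Q, occasionally explore.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 3 else 0.0
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:3])  # greedy policy: right (1) in every non-terminal state
```

All the key components from the list appear here: the loop body is the agent, `s` the state, `a` the action, `r` the reward, and the greedy argmax over `Q` is the learned policy.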
Diagram: the three paradigms branching from machine learning: supervised learning (labeled data → predictions; spam filter, house prices; classification and regression), unsupervised learning (unlabeled data → patterns; clustering, dimensionality reduction), and reinforcement learning (agent + reward → policy; AlphaZero, ChatGPT RLHF).
The three core machine learning paradigms and their typical use cases.
Quick Check

A spam filter is trained on 10,000 emails labelled 'spam' or 'not spam'. Which learning paradigm is this?

The Bias-Variance Tradeoff

A model with high bias is too simple — it fails to capture the true pattern and performs poorly on both training and new data. A model with high variance is too complex — it fits training data perfectly but fails on unseen examples. The goal is a model complex enough to capture true patterns but not so complex that it fits noise.
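The tradeoff can be seen directly by fitting polynomials of increasing degree to noisy samples of a sine wave and comparing training error with held-out error (a toy numpy sketch; the sine curve, noise level, and degrees are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)

x_tr, y_tr = x[::2], y[::2]      # even indices as the training set
x_te, y_te = x[1::2], y[1::2]    # odd indices as the held-out set

results = {}
for degree in (1, 3, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)           # least-squares fit
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    results[degree] = (mse_tr, mse_te)
    print(f"degree {degree:2d}: train MSE {mse_tr:.3f}, held-out MSE {mse_te:.3f}")
```

Degree 1 is the high-bias regime (poor on both sets: a line cannot follow a sine wave), while a very high degree drives training error down by fitting the noise; the intermediate degree is the sweet spot.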

Diagram: bias², variance, and total error plotted against model complexity, with underfitting (high bias) on the left, overfitting (high variance) on the right, and a sweet spot between them.
As model complexity grows, bias falls but variance rises. Total error is minimised at the sweet spot between underfitting and overfitting.
Quick Check

A model scores 99% on training data but only 61% on the test set. What is the most likely problem?

Key Concepts

  • Training set: data used to fit model parameters.
  • Validation set: data used to tune hyperparameters and catch overfitting during development.
  • Test set: held-out data used once at the end to estimate real-world performance.
  • Hyperparameters: configuration values set before training, e.g. learning rate, number of layers, regularization strength.
  • Cross-validation: estimate generalisation by rotating which portion of data is held out.
  • Feature engineering: creating or transforming input variables to improve model performance.
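The splits above can be sketched in a few lines of numpy (the dataset size and the 60/20/20 proportions are illustrative, not a rule):

```python
import numpy as np

rng = np.random.default_rng(42)
indices = rng.permutation(100)    # shuffle before splitting

train_idx = indices[:60]          # fit model parameters
val_idx = indices[60:80]          # tune hyperparameters, watch for overfitting
test_idx = indices[80:]           # touch once, at the very end

# 5-fold cross-validation: rotate which fifth of the data is held out.
folds = np.array_split(indices, 5)
for k, held_out in enumerate(folds):
    train_folds = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # ... fit on train_folds, evaluate on held_out, average the 5 scores ...
```

Shuffling first matters: if the data is ordered (by date, by class), a naive contiguous split would give the three sets different distributions.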
Diagram: the seven-step ML workflow: (1) define the problem and metric, (2) collect: gather and label data, (3) prepare: clean and transform, (4) train: fit parameters, (5) evaluate: measure performance, (6) deploy: serve predictions, (7) monitor: track and retrain. Most projects cycle through steps 2–5 several times before deployment.
A standard ML workflow: raw data flows through preprocessing and training before evaluation and deployment.

Classification Metrics

Accuracy alone can be misleading. For example, a classifier that always predicts "not fraud" on a dataset where only 5% of cases are fraud will score 95% accuracy while catching zero fraud cases.

  • Precision: of all predicted positives, what fraction are truly positive? TP / (TP + FP).
  • Recall: of all actual positives, what fraction did the model catch? TP / (TP + FN).
  • F1 Score: harmonic mean of precision and recall — balances both.
  • ROC-AUC: area under the ROC curve, measuring overall discrimination ability.
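The first three metrics are simple arithmetic on the confusion matrix. The counts below are a made-up example (a 1,000-example test set with 60 actual positives), chosen to show accuracy and precision/recall disagreeing:

```python
# Hypothetical confusion-matrix counts on a 1,000-example test set.
tp, fp, fn, tn = 40, 10, 20, 930

accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of predicted positives, fraction truly positive
recall = tp / (tp + fn)      # of actual positives, fraction the model caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy {accuracy:.2f}, precision {precision:.2f}, "
      f"recall {recall:.2f}, F1 {f1:.2f}")
# → accuracy 0.97, precision 0.80, recall 0.67, F1 0.73
```

Note that a model predicting "negative" for everything would score 0.94 accuracy on this same set with zero recall, which is exactly why precision and recall are reported alongside accuracy on imbalanced data.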
Quick Check

A fraud detection model flags 100 transactions as fraudulent. 80 are genuinely fraudulent, 20 are not. What is the model's precision?

Test Your Knowledge

Ready to check how much you remember? Take the quiz for Introduction to Machine Learning and see your score on the leaderboard.

Take the Quiz