
AI Ethics and Limitations

Tags: AI ethics, bias in AI, AI limitations, responsible AI, fairness

Why Ethics Matters in AI

A spreadsheet formula that contains an error affects a single output. An AI model with a systematic flaw is deployed to millions of decisions — loan applications, hiring filters, medical diagnoses, bail risk scores — and replicates that flaw at scale before anyone notices.

The speed and scale of AI deployment mean that getting it wrong has larger consequences than most software failures. The technical foundations behind these failures — how training data shapes model behaviour, why models fail to generalise, and why neural networks are difficult to explain — are covered in detail in Introduction to Machine Learning, Overfitting and Regularisation, and Neural Networks Explained. This course focuses on what those failures mean in practice.

What AI Actually Cannot Do

Before discussing ethics, it helps to be precise about the technical limitations that underpin many ethical failures.

Diagram: Current limitations of AI
  1. No understanding — predicts likely patterns, not meanings or truth.
  2. Fails outside training — reliable within its distribution, unreliable beyond it.
  3. No novel reasoning — interpolates between seen examples; cannot reason from first principles.
  4. No common sense — exploits spurious correlations rather than real concepts.
The four core limitations of current AI systems — each one a source of real-world ethical risk.

AI does not understand. Language models predict statistically likely next tokens. Image classifiers match pixel patterns to categories. Neither has any model of the world. For example, a chatbot that confidently states a wrong medical fact is not lying; it has no concept of truth. It is completing a pattern.

AI fails outside its training distribution. Models are reliable within the conditions they were trained on and unreliable beyond them. For example, a fraud model trained on 2019 transactions will struggle with 2024 fraud patterns, and a model trained on data from one country may fail in another. When the world changes, models degrade silently unless monitored.
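The "degrades silently unless monitored" point can be made concrete. A minimal sketch of one common monitoring idea — flag a feature whose live data has drifted far from its training distribution. The feature (transaction amounts), thresholds, and values are all hypothetical:

```python
from statistics import mean, stdev

def drift_alert(train_values, live_values, z_threshold=3.0):
    """Flag when the live mean sits far from the training mean,
    measured in training standard deviations (a crude drift check)."""
    mu, sigma = mean(train_values), stdev(train_values)
    z = abs(mean(live_values) - mu) / sigma
    return z > z_threshold

# Transaction amounts: training data vs. two live streams.
train = [20, 25, 30, 22, 28, 24, 26, 23]
live_ok = [21, 27, 25, 24]          # similar distribution: no alert
live_shifted = [90, 110, 95, 105]   # the world has changed: alert

print(drift_alert(train, live_ok))       # False
print(drift_alert(train, live_shifted))  # True
```

Real monitoring systems use more robust statistics over many features, but the principle is the same: compare production inputs against the training distribution continuously, not just at launch.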

AI cannot reason reliably about novel situations. Humans apply general principles to problems they have never seen before. Current AI systems cannot — they interpolate between examples they have encountered. Push them into genuinely new territory and performance collapses.

AI has no common sense. Without common sense, models exploit spurious correlations rather than learning genuine concepts. For example, an image classifier once labelled a photo of a husky in a snowy background as a wolf — it had learned that snow correlates with wolves in training data, not what a wolf actually looks like.

Bias: Where It Comes From

Bias in AI does not appear from nowhere. It enters through specific, traceable mechanisms:

  • Biased training data. Models learn from historical decisions made by humans. For example, a hiring algorithm trained on ten years of a company's past decisions will encode whoever that company historically hired — if it historically hired fewer women for engineering roles, the model will learn to downrank women's applications.

  • Unrepresentative data. For example, a skin cancer detection model trained predominantly on images of lighter skin tones will perform worse on darker skin tones — not because of intent, but because the training set did not represent all patients.

  • Label bias. Labels are assigned by humans, and humans apply subjective judgements inconsistently. Those inconsistencies are baked into every model trained on those labels.

  • Proxy discrimination. Removing a protected attribute does not prevent a model from learning a proxy for it. For example, a model prohibited from using race may still discriminate via zip code, which correlates strongly with race due to historical housing segregation.
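The proxy-discrimination mechanism above can be shown on a toy example. The sketch below uses entirely hypothetical data: a model that sees only zip codes (never the protected attribute) still produces different approval rates per group, because zip code correlates with group membership:

```python
# Hypothetical applicants; the model is never shown "group".
applicants = [
    {"zip": "10001", "group": "A", "historically_approved": True},
    {"zip": "10001", "group": "A", "historically_approved": True},
    {"zip": "10002", "group": "B", "historically_approved": False},
    {"zip": "10002", "group": "B", "historically_approved": False},
    {"zip": "10001", "group": "B", "historically_approved": True},
    {"zip": "10002", "group": "A", "historically_approved": False},
]

def zip_rate(z):
    """Historical approval rate within a zip code."""
    rows = [a for a in applicants if a["zip"] == z]
    return sum(a["historically_approved"] for a in rows) / len(rows)

def model(applicant):
    """'Race-blind' model: approve if the zip's historical rate > 50%."""
    return zip_rate(applicant["zip"]) > 0.5

# Approval rate per protected group, despite the model never seeing it:
for g in ("A", "B"):
    rows = [a for a in applicants if a["group"] == g]
    print(g, sum(model(a) for a in rows) / len(rows))
```

On this data the model approves group A at twice the rate of group B, even though the group attribute was removed — the zip code carried the information anyway.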

Quick Check

A bank trains a loan approval model on 20 years of historical loan decisions. The model is found to approve loans at lower rates for applicants from certain neighbourhoods. What is the most likely cause?

The Black-Box Problem

Many high-performing AI models — particularly large neural networks — are opaque. You can observe their inputs and outputs but cannot easily explain why a specific decision was made. This creates real problems in high-stakes domains:

  • A rejected loan applicant has a legal right in many jurisdictions to know why
  • A doctor cannot safely act on a diagnosis if they do not understand its reasoning
  • A company cannot audit a system it cannot explain

Interpretability research attempts to address this. For example, some techniques highlight which input features most influenced a decision; others generate counterfactual explanations such as "your application would have been approved if your income were 10% higher." These techniques are approximate and contested — the explanations may not reflect how the model actually works internally.
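A counterfactual explanation of the kind quoted above is easy to sketch for a simple model. This example uses a hypothetical linear approval score with illustrative weights and threshold — real counterfactual methods search a trained black-box model, but the idea is the same:

```python
# Hypothetical linear approval score; weights are illustrative only.
def score(income, debt):
    return 0.00005 * income - 0.0001 * debt

THRESHOLD = 2.0

def counterfactual_income(income, debt, step=1000):
    """Smallest income (in `step` increments) at which the same
    applicant would be approved: a counterfactual explanation."""
    needed = income
    while score(needed, debt) <= THRESHOLD:
        needed += step
    return needed

income, debt = 30_000, 5_000
print(score(income, debt) > THRESHOLD)      # False: rejected
print(counterfactual_income(income, debt))  # 51000: income that flips it
```

The output is a human-readable statement — "you would have been approved at an income of 51,000" — without claiming to describe the model's internal mechanics, which is precisely why such explanations are both useful and contested.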

High accuracy and interpretability often trade off against each other. Simpler models are easy to explain. Complex models are often more accurate but harder to explain. Choosing between them is partly an ethical decision, not just a technical one.

Privacy

AI models require data to train, and in many domains that data is sensitive. Privacy concerns arise in several ways:

  • Data collection. Collecting data users did not meaningfully consent to, or repurposing data collected for one purpose for another, violates user trust and often the law.

  • Memorisation. Large models can memorise and later reproduce specific training examples. For example, a model trained on healthcare records may reveal patient details when prompted in a certain way.

  • Re-identification. Even anonymised datasets can sometimes be de-anonymised. For example, a dataset of movie ratings with names removed was famously re-identified by cross-referencing it with publicly available reviews.

Quick Check

A large language model is found to reproduce verbatim excerpts from private emails that appeared in its training data. What property of the model does this demonstrate?

Responsible AI in Practice

None of these problems are fully solved, but responsible development teams address them systematically:

  • Diverse data collection. Actively audit training datasets for representation across demographic groups, use cases, and edge cases. If data is missing for a group, either collect it or explicitly limit the model's scope to exclude that group.

  • Fairness metrics alongside accuracy. Define fairness criteria before training — equal accuracy across groups, equal false positive rates, equal opportunity — and measure them explicitly. Aggregate accuracy can hide group-level failures.

  • Red-teaming. Before deployment, dedicated teams attempt to find failure modes, elicit harmful outputs, and probe edge cases. This is standard practice for AI products with large user bases.

  • Human oversight. High-stakes decisions in medical, legal, and financial domains should keep humans meaningfully in the loop. AI as a filter or a second opinion is far safer than AI as the sole decision-maker.

  • Monitoring in production. Model performance on fairness metrics can degrade as the population of users changes. Continuous monitoring — not just at launch — is necessary.

  • Clear documentation. Model cards and datasheets document what a model was trained on, where it performs poorly, and what uses it was not designed for. They allow downstream users to make informed decisions.
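The "fairness metrics alongside accuracy" practice above can be sketched in a few lines. The data here is a hypothetical set of (group, true label, predicted label) triples, chosen to show how an aggregate number hides group-level failure:

```python
from collections import defaultdict

def group_metrics(records):
    """Per-group accuracy and false-positive rate.
    Each record: (group, true_label, predicted_label)."""
    stats = defaultdict(lambda: {"correct": 0, "n": 0, "fp": 0, "neg": 0})
    for group, y, pred in records:
        s = stats[group]
        s["n"] += 1
        s["correct"] += (y == pred)
        if not y:                 # true negatives are where false
            s["neg"] += 1         # positives can occur
            s["fp"] += pred
    return {g: {"accuracy": s["correct"] / s["n"],
                "fpr": s["fp"] / s["neg"] if s["neg"] else 0.0}
            for g, s in stats.items()}

# Aggregate accuracy is 62.5% -- but it is 100% for group A
# and only 25% for group B, with every B negative misclassified.
data = [("A", 1, 1), ("A", 0, 0), ("A", 0, 0), ("A", 1, 1),
        ("B", 1, 1), ("B", 0, 1), ("B", 0, 1), ("B", 1, 0)]
print(group_metrics(data))
```

Defining and computing metrics like these before deployment — and re-running them in production monitoring — is what turns "fairness" from an aspiration into something measurable.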

The Broader Question

Technology amplifies human intent — for good and for bad. AI does not introduce new ethical problems so much as it accelerates and scales existing ones: discrimination, surveillance, misinformation, concentration of power.

The harder questions — who AI is built for, who bears the risks, who has recourse when it goes wrong — are social and political. They require broad participation, not just engineers deciding among themselves.

Understanding AI's limitations and ethical dimensions is not separate from understanding AI technically. It is part of the same discipline.

Test Your Knowledge

Ready to check how much you remember? Take the quiz for AI Ethics and Limitations and see your score on the leaderboard.

Take the Quiz