Content#

This notebook explains Machine Learning (ML) and Deep Learning (DL) in a simple, intuitive way.

You do not need advanced math.
The goal is to understand ideas, not formulas.

📺 Watch first: Hiker in the Fog — ML Analogy Video (recommended)

One Big Idea to Remember#

Machine learning means adjusting numbers to make predictions less wrong.

Companion Resources#

Hiker’s Cheat Sheet — Maps analogy terms to technical terms
Knowledge Checks — Test your understanding

Part 0 — The AI Family Tree: AI → ML → DL → LLMs#

Before diving in, let’s understand how these terms relate:

┌─────────────────────────────────────────────────────────────┐
│  ARTIFICIAL INTELLIGENCE (AI)                               │
│  Any system that mimics human-like intelligence             │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  MACHINE LEARNING (ML)                              │   │
│  │  AI that learns patterns from data                  │   │
│  │                                                     │   │
│  │  ┌─────────────────────────────────────────────┐   │   │
│  │  │  DEEP LEARNING (DL)                         │   │   │
│  │  │  ML using neural networks with many layers  │   │   │
│  │  │                                             │   │   │
│  │  │  ┌─────────────────────────────────────┐   │   │   │
│  │  │  │  LLMs (Large Language Models)       │   │   │   │
│  │  │  │  DL models trained on text          │   │   │   │
│  │  │  │  Examples: GPT, Claude, Llama       │   │   │   │
│  │  │  └─────────────────────────────────────┘   │   │   │
│  │  └─────────────────────────────────────────────┘   │   │
│  └─────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Key Relationships#

Term	What it is	Example
AI	Broad field of intelligent systems	Chess engines, Siri, self-driving cars
ML	Subset of AI that learns from data	Spam filters, recommendation systems
DL	Subset of ML using neural networks	Image recognition, speech-to-text
LLMs	DL models for language (Generative AI)	ChatGPT, Claude, code assistants

Why This Matters in Enterprise#

In banking and enterprise settings, understanding this hierarchy helps you:

Choose the right tool: Not every problem needs an LLM
Understand limitations: Each inner layer inherits the limitations of the layers that contain it (an LLM has every limitation of DL and ML)
Manage risk: LLMs add language-specific risks (hallucinations) on top of ML risks (overfitting)
Communicate clearly: Executives often confuse these terms

Part 1 — The Hiker in the Fog#

Imagine a hiker standing on a mountain covered in thick fog.

The hiker cannot see far.
The hiker does not know where the lowest point is.
The hiker can only feel whether the ground goes up or down.

The hiker’s goal is simple:

Reach the lowest point.

This is how machine learning works:

Start with wrong guesses
Make small changes
Slowly improve

What the model does NOT know

It does not know the global optimum

It does not know whether a better solution exists elsewhere

It only reacts to local feedback (loss and gradient)

Story	Meaning
Hiker	The model
Height	How wrong the model is
Fog	Not being able to see where the best answer is
Step	Small change to the model
Lowest point	Best possible model
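To make the analogy concrete, here is a minimal sketch of a hiker who can only "feel the slope". The `height` function, the starting position, and the step size are all made up for illustration; a real model has many weights, and frameworks compute the slope exactly rather than by probing.

```python
# Hypothetical "mountain": height (loss) as a function of one position (weight).
def height(position):
    return (position - 3.0) ** 2 + 1.0   # lowest point is at position 3.0

def feel_slope(position, eps=1e-4):
    """Estimate the slope by probing a tiny step to each side."""
    return (height(position + eps) - height(position - eps)) / (2 * eps)

position = 0.0     # the hiker starts somewhere arbitrary
step_size = 0.1    # how far to move per step

for _ in range(50):
    slope = feel_slope(position)
    # Positive slope means the ground rises to the right, so step left (and vice versa).
    position -= step_size * slope

print(f"Hiker ends near position {position:.3f}, height {height(position):.3f}")
```

The hiker never sees the whole mountain. Each step uses only the local slope, yet the position still drifts toward the lowest point.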

Part 2 — What Is a Model?#

A model is a rule that turns inputs into outputs.

Example:

Input: hours studied
Output: exam score

def predict(hours, weight, bias):
    return weight * hours + bias

weight = 1.0
bias = 0.0

print("Prediction for 5 hours of study:", predict(5, weight, bias))

Prediction for 5 hours of study: 5.0

Part 3 — Weights#

Weights are numbers inside the model.

They control predictions
They start as guesses
Learning means changing them

Let’s see how different weights change predictions:

# Same input, different weights = different predictions
hours = 5

# Try different weights
for weight in [1.0, 5.0, 10.0, 15.0]:
    prediction = weight * hours
    print(f"Weight = {weight:4.1f}  →  Prediction = {prediction:5.1f}")

print("\nThe RIGHT weight depends on the actual data!")
print("If students who study 5 hours score ~75, weight ≈ 15 is best.")

Weight =  1.0  →  Prediction =   5.0
Weight =  5.0  →  Prediction =  25.0
Weight = 10.0  →  Prediction =  50.0
Weight = 15.0  →  Prediction =  75.0

The RIGHT weight depends on the actual data!
If students who study 5 hours score ~75, weight ≈ 15 is best.

Part 4 — Loss#

Loss tells us how wrong a prediction is.

Big loss = very wrong
Small loss = almost right

A common loss for numeric predictions is squared error: (predicted - actual)²

# Calculate loss for different predictions
actual_score = 80

print("If actual score is 80:")
print("-" * 40)

for predicted in [60, 70, 75, 80, 85]:
    loss = (predicted - actual_score) ** 2
    print(f"Predicted: {predicted}  →  Loss: {loss:4d}  {'← Perfect!' if loss == 0 else ''}")

If actual score is 80:
----------------------------------------
Predicted: 60  →  Loss:  400  
Predicted: 70  →  Loss:  100  
Predicted: 75  →  Loss:   25  
Predicted: 80  →  Loss:    0  ← Perfect!
Predicted: 85  →  Loss:   25  

Part 5 — Learning by Small Steps#

The model changes its weights a little at a time.

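If the loss gets smaller, the change was good. If the loss gets bigger, the step went the wrong way or was too large. In practice the model does not need trial and error: the gradient (the slope of the loss) tells it which direction reduces the loss.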

If you’ve watched the “Hiker in the Fog” video for this module, this is exactly what’s happening: the model can’t see the best solution, only the slope right under its feet.

# Gradient Descent: Finding the best weight step by step
# Goal: predict exam scores from hours studied

# Our "training data" - one student
actual_hours = 5
actual_score = 75

# Start with a wrong guess
weight = 1.0
learning_rate = 0.01  # Small steps! (0.1 would overshoot badly)

print("Gradient Descent in Action")
print("=" * 60)
print(f"Goal: Find weight so that {actual_hours} hours → {actual_score} points")
print(f"Perfect weight would be: {actual_score/actual_hours} (since {actual_score}/{actual_hours} = {actual_score/actual_hours})")
print(f"Starting weight: {weight} (way too low!)")
print()

for step in range(8):
    # 1. Make prediction with current weight
    prediction = weight * actual_hours
    
    # 2. Calculate loss (how wrong are we?)
    loss = (prediction - actual_score) ** 2
    
    # 3. Calculate gradient (which direction to go, and how steep)
    #    Negative gradient means we need to INCREASE the weight
    gradient = 2 * (prediction - actual_score) * actual_hours
    
    # 4. Update weight (take a small step in the right direction)
    old_weight = weight
    weight = weight - learning_rate * gradient
    
    direction = "↑" if weight > old_weight else "↓"
    print(f"Step {step}: pred={prediction:5.1f}, loss={loss:8.1f}, weight {old_weight:.2f}→{weight:.2f} {direction}")

print()
print(f"Final weight: {weight:.2f} (target was {actual_score/actual_hours})")
print(f"Final prediction: {weight * actual_hours:.1f} (target was {actual_score})")
print("✓ Loss decreased at every step - the model improved!")

Gradient Descent in Action
============================================================
Goal: Find weight so that 5 hours → 75 points
Perfect weight would be: 15.0 (since 75/5 = 15.0)
Starting weight: 1.0 (way too low!)

Step 0: pred=  5.0, loss=  4900.0, weight 1.00→8.00 ↑
Step 1: pred= 40.0, loss=  1225.0, weight 8.00→11.50 ↑
Step 2: pred= 57.5, loss=   306.2, weight 11.50→13.25 ↑
Step 3: pred= 66.2, loss=    76.6, weight 13.25→14.12 ↑
Step 4: pred= 70.6, loss=    19.1, weight 14.12→14.56 ↑
Step 5: pred= 72.8, loss=     4.8, weight 14.56→14.78 ↑
Step 6: pred= 73.9, loss=     1.2, weight 14.78→14.89 ↑
Step 7: pred= 74.5, loss=     0.3, weight 14.89→14.95 ↑

Final weight: 14.95 (target was 15.0)
Final prediction: 74.7 (target was 75)
✓ Loss decreased at every step - the model improved!
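The code comment above says that a learning rate of 0.1 would overshoot. As a rough illustration, the sketch below reruns the same one-student update with three learning rates. The helper name `run_gradient_descent` is an assumption introduced here, not part of the original notebook.

```python
def run_gradient_descent(learning_rate, steps=8, hours=5, target=75, weight=1.0):
    """Repeat the one-weight update and return the weight after each step."""
    history = []
    for _ in range(steps):
        prediction = weight * hours
        gradient = 2 * (prediction - target) * hours
        weight = weight - learning_rate * gradient
        history.append(weight)
    return history

for lr in [0.01, 0.02, 0.1]:
    final_weight = run_gradient_descent(lr)[-1]
    print(f"learning_rate = {lr}: weight after 8 steps = {final_weight:,.2f}")
```

With these particular numbers, 0.01 approaches the target weight of 15 gradually, 0.02 happens to land on it almost immediately, and 0.1 jumps past the target by a larger amount on every step, so the weight moves further away instead of closer.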

Backpropagation: How Errors Flow Backward#

In the example above, we had one weight. But real networks have millions of weights across many layers. How do we know which weights to change?

Backpropagation (“backward propagation of errors”):

Forward pass: Input flows through the network → prediction
Calculate loss: Compare prediction to actual answer
Backward pass: Error signal flows backward through each layer
Update weights: Each weight gets adjusted based on how much it contributed to the error

FORWARD PASS (make prediction):
Input → Layer 1 → Layer 2 → Layer 3 → Prediction

BACKWARD PASS (assign blame):
          ← Layer 1 ← Layer 2 ← Layer 3 ← Loss
          (how much did each layer contribute to the error?)

Key insight: Weights that contributed more to the error get changed more.

This is why frameworks like PyTorch are valuable—they compute backpropagation automatically!
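For intuition, the backward pass can be written out by hand for a tiny two-layer chain with one weight per layer. This is a minimal sketch with made-up numbers (`x`, `target`, `w1`, `w2`), not how a framework implements it internally; it applies the chain rule step by step and checks one result against a numerical estimate.

```python
def forward(x, w1, w2):
    hidden = w1 * x         # layer 1
    output = w2 * hidden    # layer 2
    return hidden, output

def loss_fn(output, target):
    return (output - target) ** 2

x, target = 2.0, 10.0       # one made-up training example
w1, w2 = 0.5, 1.5           # made-up starting weights

# Forward pass
hidden, output = forward(x, w1, w2)
loss = loss_fn(output, target)

# Backward pass: apply the chain rule from the loss back toward the input
d_loss_d_output = 2 * (output - target)
d_loss_d_w2 = d_loss_d_output * hidden     # blame assigned to w2
d_loss_d_hidden = d_loss_d_output * w2     # error signal passed back to layer 1
d_loss_d_w1 = d_loss_d_hidden * x          # blame assigned to w1

# Sanity check: nudge w1 slightly and measure how the loss changes
eps = 1e-6
numeric_w1 = (loss_fn(forward(x, w1 + eps, w2)[1], target)
              - loss_fn(forward(x, w1 - eps, w2)[1], target)) / (2 * eps)

print(f"loss = {loss:.2f}")
print(f"gradient for w2 = {d_loss_d_w2:.2f}")
print(f"gradient for w1 = {d_loss_d_w1:.2f} (numerical check: {numeric_w1:.2f})")
```

Here the gradient for `w1` is larger in magnitude than the one for `w2`, so `w1` would receive the bigger update.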

Convergence: When Training Stops Improving#

Convergence means the model has stopped improving—the loss has settled to a stable value.

But convergence has an important limitation:

Loss
  │ ╲                      ╱
  │  ╲      ╱╲            ╱
  │   ╲    ╱  ╲          ╱
  │    ╲  ╱    ╲        ╱
  │     ╲╱      ╲      ╱
  │     •        ╲    ╱
  │ Local minimum ╲  ╱
  │                ╲╱
  │                ★ Global minimum
  └──────────────────────────── Weights

Term	Meaning	Implication
Local minimum	A low point with higher points on both sides	Model might get “stuck” here
Global minimum	The absolute lowest point	What we ideally want
Convergence	Loss stopped decreasing	Does NOT mean we found the best solution!

Why this matters for enterprise:

A “converged” model might still be suboptimal
Different random starting weights can lead to different final models
This is why ML teams train multiple models and compare them
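The effect of the starting point can be reproduced with a single weight. The sketch below uses a made-up loss curve with two valleys (`w**4 - 3*w**2 + w`, chosen only for illustration) and runs the same gradient descent from two different starting weights.

```python
def loss(w):
    return w**4 - 3 * w**2 + w       # hypothetical curve with two valleys

def gradient(w):
    return 4 * w**3 - 6 * w + 1      # derivative of the curve above

def descend(start, learning_rate=0.01, steps=500):
    """Plain gradient descent from a given starting weight."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

for start in [2.0, -2.0]:
    w = descend(start)
    print(f"start = {start:+.1f}  ->  converged at w = {w:+.3f}, loss = {loss(w):.3f}")
```

Both runs converge, since the weight stops changing, but the run that starts at +2.0 settles in the shallower valley and ends with a higher loss than the run that starts at -2.0.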

Part 6 — Training#

Training means:

Make a prediction (forward pass)
Measure how wrong it is (loss)
Calculate gradients (backward pass)
Adjust weights
Repeat

One full pass through all training data is called an epoch.

Multiple epochs = multiple passes through the same data, refining the model each time.

Part 7 — Common Problems and Mitigations#

Overfitting#

The model memorizes the training data instead of learning general patterns.

Signs of overfitting:

Training loss is very low
Validation/test loss is much higher
Model fails on new data it hasn’t seen
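These signs can be reproduced on a small scale. The sketch below uses invented data (15 points per hour plus alternating noise of ±6) and compares a straight line with a polynomial that passes through every training point exactly. The Lagrange polynomial is a stand-in for any overly flexible model; all names and numbers are assumptions for illustration.

```python
# Hypothetical data: scores follow 15 points per hour, plus alternating noise.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [15 * x + noise for x, noise in zip(train_x, [6, -6, 6, -6, 6, -6])]
test_x = [1.5, 2.5, 3.5, 4.5, 5.5]     # unseen inputs
test_y = [15 * x for x in test_x]      # noise-free "true" scores

def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def memorizer(x, xs, ys):
    """Lagrange polynomial that passes through every training point exactly."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

slope, intercept = fit_line(train_x, train_y)

def simple(x):
    return slope * x + intercept

def memorized(x):
    return memorizer(x, train_x, train_y)

for name, model in [("Simple line", simple), ("Memorizing polynomial", memorized)]:
    print(f"{name:22s} train loss = {mse(model, train_x, train_y):7.1f}, "
          f"test loss = {mse(model, test_x, test_y):7.1f}")
```

The memorizing polynomial reaches a training loss of zero, yet its test loss is far higher than the straight line's. That is the overfitting pattern: very low training loss, much higher loss on unseen data.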

Underfitting#

The model is too simple and fails to capture the patterns in the data.

Signs of underfitting:

Both training and test loss are high
Model makes poor predictions on everything

Mitigations#

Problem	Mitigation	How it helps
Overfitting	More training data	Harder to memorize larger datasets
	Regularization (L1/L2)	Penalizes large weights, forces simplicity
	Dropout	Randomly ignores neurons during training
	Early stopping	Stop training when validation loss stops improving
	Data augmentation	Create variations of training data
Underfitting	More complex model	Add more layers/neurons
	Train longer	More epochs to learn patterns
	Better features	Provide more relevant input data
	Reduce regularization	Allow model more flexibility
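As one example from the table, early stopping can be sketched in a few lines. The function name, the `patience` parameter, and the list of validation losses are hypothetical; real frameworks offer their own versions of this idea.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the index of the epoch whose weights should be kept.

    Training stops once validation loss has failed to improve
    for `patience` epochs in a row.
    """
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break   # validation loss keeps rising: stop training
    return best_epoch

# Made-up validation losses: improving at first, then getting worse.
validation_losses = [90.0, 60.0, 45.0, 40.0, 42.0, 47.0, 55.0]
best = early_stopping_epoch(validation_losses)
print(f"Keep the weights from epoch {best} (validation loss {validation_losses[best]})")
```

The training loss would typically keep falling after this point; it is the validation loss turning upward that signals the model has started to memorize.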

Why This Matters in Financial Models#

In banking and finance, overfitting is particularly dangerous:

A model might appear to predict market movements perfectly on historical data
But fail completely when deployed on new, real-world data
This can lead to significant financial losses
Regulators (such as the US Federal Reserve and the UK PRA) expect independent model validation, which includes checking for overfitting

# Complete Training Loop with Multiple Data Points
# Training data: hours studied → exam scores
training_data = [
    (1, 20),   # 1 hour  → 20 points
    (2, 35),   # 2 hours → 35 points
    (3, 50),   # 3 hours → 50 points
    (5, 75),   # 5 hours → 75 points
    (7, 90),   # 7 hours → 90 points
]

# Initialize weight
weight = 0.0
learning_rate = 0.01

print("Training over 3 epochs (3 passes through all data)")
print("=" * 55)

for epoch in range(3):
    total_loss = 0
    
    for hours, actual in training_data:
        # Forward pass: make prediction
        prediction = weight * hours
        
        # Calculate loss
        loss = (prediction - actual) ** 2
        total_loss += loss
        
        # Backward pass: calculate gradient and update
        gradient = 2 * (prediction - actual) * hours
        weight = weight - learning_rate * gradient
    
    avg_loss = total_loss / len(training_data)
    print(f"Epoch {epoch + 1}: avg_loss = {avg_loss:8.1f}, weight = {weight:.2f}")

print(f"\nFinal model: score = {weight:.1f} × hours")
print(f"Prediction for 4 hours: {weight * 4:.0f} points")

Training over 3 epochs (3 passes through all data)
=======================================================
Epoch 1: avg_loss =   1366.2, weight = 12.79
Epoch 2: avg_loss =     78.3, weight = 12.89
Epoch 3: avg_loss =     76.8, weight = 12.89

Final model: score = 12.9 × hours
Prediction for 4 hours: 52 points

import matplotlib.pyplot as plt

# Demonstrate overfitting vs good fit
# Training data (what the model sees)
train_hours = [1, 2, 3, 5, 7]
train_scores = [20, 35, 50, 75, 90]

# Test data (new, unseen data)
test_hours = [4, 6]
test_scores = [62, 82]  # Actual scores

# Good model: simple linear fit (generalizes well)
good_weight = 12.5

# Overfit model: stands in for a model that memorized the exact training points.
# It is only sketched by hand in the second plot; no high-degree polynomial
# is actually fitted here.

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Plot 1: Good Fit
ax1 = axes[0]
ax1.scatter(train_hours, train_scores, color='blue', s=100, label='Training data', zorder=5)
ax1.scatter(test_hours, test_scores, color='green', s=100, marker='s', label='Test data (unseen)', zorder=5)
x_line = range(0, 9)
y_line = [good_weight * x for x in x_line]
ax1.plot(x_line, y_line, 'b-', linewidth=2, label=f'Model: {good_weight}×hours')
ax1.set_xlabel('Hours Studied')
ax1.set_ylabel('Exam Score')
ax1.set_title('GOOD FIT\n(Generalizes to new data)')
ax1.legend()
ax1.grid(True, alpha=0.3)
ax1.set_xlim(0, 8)
ax1.set_ylim(0, 100)

# Plot 2: Overfitting
ax2 = axes[1]
ax2.scatter(train_hours, train_scores, color='blue', s=100, label='Training data', zorder=5)
ax2.scatter(test_hours, test_scores, color='green', s=100, marker='s', label='Test data (unseen)', zorder=5)
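# Illustration only: this line simply connects the training points, standing in
# for a model that passes through every one of them. The red × is a hand-placed
# example of a bad prediction, not the output of a fitted model.
ax2.plot(train_hours, train_scores, 'r-', linewidth=2, label='Overfit model')
ax2.scatter([4], [45], color='red', s=100, marker='x', label='Bad prediction (illustrative)', zorder=5)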
ax2.set_xlabel('Hours Studied')
ax2.set_ylabel('Exam Score')
ax2.set_title('OVERFITTING\n(Memorized training, fails on new data)')
ax2.legend()
ax2.grid(True, alpha=0.3)
ax2.set_xlim(0, 8)
ax2.set_ylim(0, 100)

plt.tight_layout()
plt.show()

print("Key insight: Overfitting = perfect on training, poor on new data")

Part 8 — Deep Learning#

Deep Learning uses many simple models together.

Each small part is called a neuron.

Together, they can learn complex patterns.

Neural Network Architecture: Layers#

A neural network organizes neurons into layers:

INPUT LAYER          HIDDEN LAYER(S)         OUTPUT LAYER
    ○                    ○                       ○
    ○  ──────────────►   ○   ──────────────►    
    ○                    ○                       
                         ○

Layer	Purpose	Example
Input Layer	Receives raw data	Pixel values, hours studied, sensor readings
Hidden Layer(s)	Learns patterns	Combines inputs in useful ways
Output Layer	Produces final answer	Classification, score prediction

“Deep” Learning = Many Hidden Layers

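There is no official cutoff, but as a rough guide:
1-2 hidden layers = “shallow” network
10+ hidden layers = “deep” network
Large language models stack far more: GPT-3 has 96 layers. GPT-4’s layer count has not been published; figures such as ~120 are unofficial estimates.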

A Single Neuron#

A neuron does three things:

Multiply inputs by weights
Add a bias
Apply an activation function (adds non-linearity)

import matplotlib.pyplot as plt

# A single neuron with ReLU activation
def relu(x):
    """ReLU: if negative, output 0. Otherwise, output x."""
    return max(0, x)

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum + bias + activation"""
    # Step 1: weighted sum
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    
    # Step 2: add bias
    with_bias = weighted_sum + bias
    
    # Step 3: apply activation (ReLU)
    output = relu(with_bias)
    
    return output

# Example: 2 inputs (like hours studied, hours slept)
inputs = [5, 8]  # 5 hours studied, 8 hours slept
weights = [10, 5]  # studying matters more than sleep
bias = -20

result = neuron(inputs, weights, bias)
print(f"Inputs: {inputs}")
print(f"Weights: {weights}")
print(f"Bias: {bias}")
print(f"Weighted sum: {inputs[0]}×{weights[0]} + {inputs[1]}×{weights[1]} = {sum(i*w for i,w in zip(inputs, weights))}")
print(f"With bias: {sum(i*w for i,w in zip(inputs, weights))} + {bias} = {sum(i*w for i,w in zip(inputs, weights)) + bias}")
print(f"After ReLU: {result}")

# Visualize ReLU
print("\n--- ReLU Activation Function ---")
x_vals = list(range(-5, 6))
y_vals = [relu(x) for x in x_vals]

plt.figure(figsize=(8, 3))
plt.plot(x_vals, y_vals, 'b-', linewidth=2)
plt.axhline(y=0, color='gray', linestyle='--', alpha=0.5)
plt.axvline(x=0, color='gray', linestyle='--', alpha=0.5)
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU: Negative → 0, Positive → unchanged')
plt.grid(True, alpha=0.3)
plt.show()
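Putting several such neurons side by side gives a layer, and feeding one layer's outputs into the next gives the input → hidden → output structure described earlier in this part. The following is a minimal sketch of one forward pass; the weights and biases are hand-picked for illustration (a trained network would learn them), and the `layer` helper is an assumption introduced here.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def layer(inputs, weight_rows, biases, activation):
    """One layer: each row of weights plus one bias defines one neuron."""
    return [
        activation(sum(i * w for i, w in zip(inputs, weights)) + bias)
        for weights, bias in zip(weight_rows, biases)
    ]

# Hypothetical parameters: 2 inputs -> 3 hidden neurons -> 1 output neuron
hidden_weights = [[1.0, 0.5], [-1.0, 2.0], [0.5, -0.5]]
hidden_biases = [0.0, -1.0, 1.0]
output_weights = [[2.0, 1.0, -1.0]]
output_biases = [0.5]

inputs = [5.0, 8.0]                                   # input layer: raw data
hidden = layer(inputs, hidden_weights, hidden_biases, relu)
output = layer(hidden, output_weights, output_biases, identity)

print("Hidden layer outputs:", hidden)
print("Network output:", output[0])
```

The third hidden neuron computes a negative value, so ReLU sets it to zero and it contributes nothing to the output for this input.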

Part 9 — Training vs Using a Model#

Training: weights change
Using the model: weights stay the same

# Training vs Inference: The Two Phases

class SimpleModel:
    def __init__(self):
        self.weight = 1.0  # Start with a guess
        
    def predict(self, hours):
        return self.weight * hours
    
    def train_step(self, hours, actual_score, learning_rate=0.01):
        """Training: weights CHANGE"""
        prediction = self.predict(hours)
        gradient = 2 * (prediction - actual_score) * hours
        self.weight = self.weight - learning_rate * gradient
        return prediction

# Create model
model = SimpleModel()

print("=" * 50)
print("PHASE 1: TRAINING (weights change)")
print("=" * 50)
training_examples = [(3, 45), (5, 75), (7, 105)]

for hours, score in training_examples:
    old_weight = model.weight
    pred = model.train_step(hours, score)
    print(f"Input: {hours}h → Actual: {score}, Predicted: {pred:.0f}")
    print(f"  Weight changed: {old_weight:.2f} → {model.weight:.2f}")

print()
print("=" * 50)
print("PHASE 2: INFERENCE (weights frozen)")
print("=" * 50)
print(f"Final trained weight: {model.weight:.2f}")
print()

# Now use the trained model (no more training)
for hours in [1, 4, 6, 10]:
    prediction = model.predict(hours)
    print(f"Inference: {hours} hours → predicted score: {prediction:.0f}")

Part 10 — LLMs: How Language Models Work (and Why They Hallucinate)#

Now that you understand ML/DL fundamentals, let’s see how they apply to Large Language Models (LLMs) like GPT and Claude.

How LLMs Are Trained#

LLMs learn through self-supervised learning on massive text datasets:

Training data: Billions of documents from the internet, books, code
Task: Predict the next word (token) given previous words
Process: Same gradient descent we learned—adjust weights to reduce prediction error

Input:  "The capital of France is ___"
Target: "Paris"

Model predicts → Calculates loss → Backpropagation → Updates weights
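The "calculates loss" step can be illustrated with a toy vocabulary. Language models are commonly trained with cross-entropy loss, which is low when the model assigns high probability to the correct next token. The three-word vocabulary and the score values below are invented for this sketch.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(scores, vocab, target):
    """Cross-entropy: -log(probability assigned to the correct token)."""
    probs = softmax(scores)
    return -math.log(probs[vocab.index(target)])

vocab = ["Paris", "London", "pizza"]    # toy vocabulary
untrained_scores = [0.0, 0.0, 0.0]      # no preference yet
trained_scores = [4.0, 1.0, -2.0]       # strongly prefers "Paris"

untrained_loss = next_token_loss(untrained_scores, vocab, "Paris")
trained_loss = next_token_loss(trained_scores, vocab, "Paris")
print(f"Loss before training: {untrained_loss:.3f}")
print(f"Loss after training:  {trained_loss:.3f}")
```

Training adjusts the weights that produce these scores so that, across the whole dataset, the loss on the correct next token goes down.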

Next-Token Prediction#

LLMs don’t “understand” like humans. They learn statistical patterns:

What LLM sees	What it learns
“The sky is ___”	“blue” often follows
“Once upon a ___”	“time” often follows
“SELECT * FROM ___”	Table names often follow

Key insight: LLMs are pattern completion machines. They predict what text typically follows, based on training data.
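A very small version of pattern completion is a bigram model, which only counts which word follows which. Real LLMs are neural networks that take much longer contexts into account, so treat this as an analogy only; the tiny corpus is made up.

```python
from collections import Counter, defaultdict

corpus = "the sky is blue . the sea is blue . the grass is green ."
tokens = corpus.split()

# Count which token follows each token in the training text.
following = defaultdict(Counter)
for current, nxt in zip(tokens, tokens[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Return the most frequent follower and its estimated probability."""
    counts = following[token]
    best, count = counts.most_common(1)[0]
    return best, count / sum(counts.values())

word, prob = predict_next("is")
print(f'After "is" the model predicts "{word}" (p = {prob:.2f})')
```

The model predicts "blue" after "is" because that is the most frequent continuation in its training text, even if the sentence being completed is about grass.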

Why Hallucinations Are Expected#

Hallucination = LLM produces confident-sounding but incorrect information.

This isn’t a bug—it’s a direct consequence of how LLMs work:

ML Concept	LLM Behavior
Pattern completion	Generates plausible-sounding text even when facts are wrong
Training data bias	Repeats errors or biases present in training data
No fact-checking	Model doesn’t verify claims—just predicts likely text
Over-generalization	Applies patterns to situations where they don’t apply

Example: If asked about a fictional event, an LLM might generate a detailed, confident-sounding description—because that’s what text about events typically looks like.

Why This Matters for Enterprise#

In banking and enterprise settings, hallucinations are a serious risk:

Compliance: LLM might cite non-existent regulations
Financial advice: LLM might invent statistics or market data
Legal: LLM might fabricate case law (this has happened!)
Reputation: Incorrect information damages trust

The Solution: Grounding and RAG#

RAG (Retrieval-Augmented Generation) mitigates hallucinations by:

Retrieving relevant documents from a trusted knowledge base
Providing these documents as context to the LLM
Grounding the response in actual evidence

WITHOUT RAG:
User question → LLM → Potentially hallucinated answer

WITH RAG:
User question → Search knowledge base → Retrieve relevant docs 
             → LLM + docs → Answer grounded in evidence

Think of it as giving the hiker (model) a map and signposts, rather than relying only on memory of terrain from past walks.
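The retrieve-then-prompt flow can be sketched without calling any LLM. Everything below is hypothetical: the knowledge base entries are invented, and retrieval is done by simple word overlap, whereas production systems typically use a search index or embedding similarity.

```python
import re

# Hypothetical trusted knowledge base (in practice: policies, manuals, filings).
knowledge_base = [
    "Wire transfers above 10,000 USD require a second approver.",
    "Customer passwords must be rotated every 90 days.",
    "Branch opening hours are 9:00 to 17:00 on weekdays.",
]

def words(text):
    """Lowercase word set, used for a crude similarity measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    ranked = sorted(documents,
                    key=lambda doc: len(words(question) & words(doc)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Place the retrieved text in front of the question."""
    context = "\n".join(f"- {doc}" for doc in documents)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

question = "Do wire transfers need a second approver?"
docs = retrieve(question, knowledge_base)
prompt = build_prompt(question, docs)
print(prompt)   # this prompt would then be sent to an LLM
```

The LLM still generates text by pattern completion, but the relevant facts are now in front of it, and a reviewer can check the answer against the retrieved document.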

Key Takeaways#

LLMs are pattern completion machines, not knowledge databases
Hallucinations are expected behavior, not bugs
Training data quality directly affects output quality
Enterprise use requires grounding (RAG) and human oversight
Never trust LLM outputs for facts without verification

Bonus: PyTorch Preview#

In real ML projects, you use frameworks like PyTorch or TensorFlow/Keras.

They do the same things we did above, but:

Handle gradients automatically (no manual math!)
Run on GPUs for speed
Provide building blocks for complex models

Here’s what our training loop looks like in PyTorch:

# PyTorch version of our training loop
# (This is what real ML code looks like!)

try:
    import torch
    import torch.nn as nn
    
    # Training data as PyTorch tensors
    X = torch.tensor([[1.0], [2.0], [3.0], [5.0], [7.0]])  # hours
    y = torch.tensor([[20.0], [35.0], [50.0], [75.0], [90.0]])  # scores
    
    # Define a simple model (1 input → 1 output)
    model = nn.Linear(1, 1)
    
    # Loss function and optimizer (same concepts!)
    loss_fn = nn.MSELoss()  # Mean Squared Error
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # Gradient Descent
    
    print("PyTorch Training Loop")
    print("=" * 40)
    
    # Training loop - same 4 steps!
    for epoch in range(100):
        # 1. Forward pass (predict)
        predictions = model(X)
        
        # 2. Calculate loss
        loss = loss_fn(predictions, y)
        
        # 3. Backward pass (gradients calculated automatically!)
        optimizer.zero_grad()
        loss.backward()
        
        # 4. Update weights
        optimizer.step()
        
        if epoch % 20 == 0:
            print(f"Epoch {epoch:3d}: loss = {loss.item():.2f}")
    
    # Show learned parameters
    weight = model.weight.item()
    bias = model.bias.item()
    print(f"\nLearned: score = {weight:.1f} × hours + {bias:.1f}")
    
    # Inference
    with torch.no_grad():  # No gradients needed for inference
        test_hours = torch.tensor([[4.0]])
        prediction = model(test_hours)
        print(f"Prediction for 4 hours: {prediction.item():.0f} points")

except ImportError:
    print("PyTorch not installed in this environment.")
    print("This is just a preview - the same concepts apply!")
    print()
    print("Key PyTorch concepts:")
    print("  - torch.tensor() → data containers")
    print("  - nn.Linear() → a layer with weights")
    print("  - loss.backward() → automatic gradient calculation")
    print("  - optimizer.step() → update weights")

Part 11 — Final Summary#

The Core ML Loop#

Machine learning works like this:

Start with random guesses (weights)
Make predictions
Measure how wrong they are (loss)
Adjust weights using gradients (backpropagation)
Repeat until convergence

Machine learning is learning by gradual improvement.

Key Concepts Covered#

Concept	What it means
AI → ML → DL → LLM	Nested hierarchy of technologies
Weights & Biases	The learnable numbers in a model
Loss Function	Measures how wrong predictions are
Gradient Descent	Method to minimize loss
Backpropagation	How errors flow backward to update weights
Convergence	When training stabilizes (possibly at a local minimum)
Overfitting	Memorizing training data instead of learning general patterns
Layers	Input → Hidden → Output structure
Activation Functions	Add non-linearity (e.g., ReLU)
Hallucinations	LLMs generating plausible but false information
RAG	Grounding LLM outputs with retrieved evidence

Enterprise Implications#

Understanding these concepts helps you:

Evaluate ML/AI vendors critically
Identify risks in AI-powered systems
Communicate with technical teams
Make informed decisions about AI adoption
Ensure compliance with regulatory requirements

End-of-Module Resources#

Hiker’s Cheat Sheet — Quick reference
Knowledge Checks — Test yourself