Module 7 Assessment — RAG Pipelines#

This assessment tests both your practical skills (coding tasks) and conceptual understanding (written task).

Assessment Structure#

  • 5 Coding Tasks (80 points): Implement RAG pipeline components

  • 1 Written Task (20 points): Explain RAG failure modes and guardrails

Instructions#

  • Coding tasks: Complete the code cells with the exact variable names shown

  • Written task: Fill in the string variable with full sentences

  • Do not rename variables

  • Ensure the notebook runs top-to-bottom without errors

  • You may use the module content for reference


Setup#

Run this cell first to install required packages and set up the environment.

!pip -q install sentence-transformers scikit-learn faiss-cpu

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import faiss
from typing import List, Tuple
from dataclasses import dataclass

# Load the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Setup complete!")

Knowledge Base for All Tasks#

Use this knowledge base for all coding tasks. Do not modify it.

# Knowledge base (do not modify)
knowledge_base = [
    {
        "id": "doc_001",
        "text": "The central bank raised interest rates by 25 basis points to combat inflation.",
        "source": "monetary_policy.pdf"
    },
    {
        "id": "doc_002",
        "text": "Higher borrowing costs are expected to slow consumer spending and reduce inflationary pressure.",
        "source": "monetary_policy.pdf"
    },
    {
        "id": "doc_003",
        "text": "Mortgage rates have risen to their highest level in two decades.",
        "source": "housing_report.pdf"
    },
    {
        "id": "doc_004",
        "text": "The Federal Reserve's dual mandate requires balancing employment with price stability.",
        "source": "fed_overview.pdf"
    },
    {
        "id": "doc_005",
        "text": "Bank earnings improved as net interest margins widened due to higher rates.",
        "source": "earnings_summary.pdf"
    },
    {
        "id": "doc_006",
        "text": "The championship football match ended in a dramatic penalty shootout.",
        "source": "sports_news.pdf"
    }
]

# Document embeddings (computed once here and reused by all tasks)
texts = [doc["text"] for doc in knowledge_base]
doc_embeddings = model.encode(texts, normalize_embeddings=True)

print(f"Knowledge base loaded: {len(knowledge_base)} documents")
print(f"Embedding shape: {doc_embeddings.shape}")

Data Class Definition#

Use this data class for Tasks 1–4.

@dataclass
class RetrievedChunk:
    """A chunk retrieved from the knowledge base."""
    text: str
    score: float
    source: str
    doc_id: str

print("RetrievedChunk class defined")

Task 1 — Implement Retriever Function (20 points) [Coding]#

Implement a retriever function that returns the top-k most similar documents.

Your function must:

  1. Encode the query using the provided model with normalize_embeddings=True

  2. Calculate cosine similarity between the query and all document embeddings

  3. Return a list of RetrievedChunk objects, sorted by score (highest first)

Function signature:

def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:

Hints (a toy similarity-and-sort illustration follows this list):

  • Use model.encode([query], normalize_embeddings=True) to get the query embedding as a 2-D array (cosine_similarity() expects 2-D inputs)

  • Use cosine_similarity() from sklearn to compute similarities

  • Use np.argsort() to get indices sorted by similarity (ascending; reverse it so the highest score comes first)

  • Remember to access knowledge_base for document metadata
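
The cell below is a toy illustration of the similarity-and-sort pattern from the hints, run on invented 2-D vectors and assuming the setup cell above has been run; it is not the Task 1 solution.

# Toy illustration (not the Task 1 solution): rank items by cosine similarity
toy_docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])   # pretend document embeddings (unit length)
toy_query = np.array([[0.8, 0.6]])                          # pretend query embedding (one row, 2-D)
sims = cosine_similarity(toy_query, toy_docs)[0]            # similarity of the query to each item
ranked = np.argsort(sims)[::-1]                             # indices from most to least similar
print(sims.round(2), ranked[:2])                            # scores and the top-2 indices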

# Task 1: Implement the retriever function

def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:
    """Retrieve top-k most similar documents for the query.
    
    Args:
        query: The user's question
        k: Number of documents to retrieve
        
    Returns:
        List of RetrievedChunk objects, sorted by score (highest first)
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if retrieve_top_k is not None:
    test_chunks = retrieve_top_k("Why did the central bank raise rates?", k=3)
    if test_chunks:
        print(f"Retrieved {len(test_chunks)} chunks")
        print(f"Top chunk score: {test_chunks[0].score:.3f}")
        print(f"Top chunk text: {test_chunks[0].text[:50]}...")

Task 2 — Build RAG Prompt (15 points) [Coding]#

Implement a function that builds an evidence-first RAG prompt.

Your function must:

  1. Format retrieved chunks with numbered labels (e.g., [1], [2], [3])

  2. Include explicit grounding instructions

  3. Include permission to refuse if context is insufficient

  4. Structure: Instructions → Context → Question → Answer prompt

Function signature:

def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:

Required elements in the prompt (a small formatting illustration follows this list):

  • “CONTEXT” section with numbered chunks

  • “QUESTION” section with the user’s question

  • Instruction to answer ONLY using the provided context

  • Permission to say “I don’t have enough information” if needed
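
As a formatting aid only (not the Task 2 solution), the cell below shows one way to produce the numbered [1], [2] labels with enumerate; the snippet strings are invented.

# Illustration only: numbered [1], [2] labels with enumerate
snippets = ["First retrieved passage.", "Second retrieved passage."]
numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
print(numbered)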

# Task 2: Build RAG prompt

def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:
    """Build an evidence-first RAG prompt.
    
    Args:
        chunks: List of retrieved chunks
        question: The user's question
        
    Returns:
        A formatted prompt string
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if build_rag_prompt is not None and test_chunks:
    test_prompt = build_rag_prompt(test_chunks, "Why did the central bank raise rates?")
    if test_prompt:
        print("Prompt built successfully!")
        print(f"Prompt length: {len(test_prompt)} characters")
        print(f"Contains CONTEXT: {'CONTEXT' in test_prompt}")
        print(f"Contains QUESTION: {'QUESTION' in test_prompt}")

Task 3 — Implement Guardrails (20 points) [Coding]#

Implement a retrieval function with guardrails that can refuse to answer.

Your function must:

  1. Retrieve top-k chunks using your retrieve_top_k function

  2. Filter chunks to only include those with score >= min_score

  3. If fewer than min_chunks remain after filtering, return a refusal

  4. Otherwise, return the filtered chunks

Function signature:

def retrieve_with_guardrails(
    query: str, 
    k: int = 5, 
    min_score: float = 0.3, 
    min_chunks: int = 2
) -> dict:

Return format:

# If sufficient chunks:
{"chunks": [...], "refused": False}

# If insufficient chunks:
{"chunks": [], "refused": True, "reason": "..."}
# Task 3: Implement guardrails

def retrieve_with_guardrails(
    query: str, 
    k: int = 5, 
    min_score: float = 0.3, 
    min_chunks: int = 2
) -> dict:
    """Retrieve with score threshold and minimum chunk guardrails.
    
    Args:
        query: The user's question
        k: Number of chunks to initially retrieve
        min_score: Minimum similarity score threshold
        min_chunks: Minimum number of chunks required after filtering
        
    Returns:
        dict with 'chunks', 'refused', and optionally 'reason'
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if retrieve_with_guardrails is not None:
    # Test with a relevant query (should NOT refuse)
    result1 = retrieve_with_guardrails("Why did interest rates increase?", min_score=0.3)
    if result1:
        print(f"Test 1 - Relevant query: refused={result1.get('refused', 'N/A')}")
    
    # Test with an irrelevant query (should refuse with high threshold)
    result2 = retrieve_with_guardrails("What is the best pizza topping?", min_score=0.5)
    if result2:
        print(f"Test 2 - Irrelevant query: refused={result2.get('refused', 'N/A')}")

Task 4 — Complete RAG Pipeline (15 points) [Coding]#

Implement a complete RAG pipeline that combines retrieval, prompt building, and guardrails.

Your function must:

  1. Use retrieve_with_guardrails to get chunks (or refuse)

  2. If refused, return a dict with refused=True and a refusal message

  3. If not refused, build a prompt using build_rag_prompt

  4. Return the prompt (we won’t call an actual LLM in this assessment)

Function signature:

def rag_pipeline(question: str, min_score: float = 0.3) -> dict:

Return format:

# If refused:
{"refused": True, "message": "I don't have enough information..."}

# If successful:
{"refused": False, "prompt": "...", "num_chunks": 3}
# Task 4: Complete RAG pipeline

def rag_pipeline(question: str, min_score: float = 0.3) -> dict:
    """Complete RAG pipeline with guardrails.
    
    Args:
        question: The user's question
        min_score: Minimum similarity score threshold
        
    Returns:
        dict with 'refused', and either 'message' (if refused) or 'prompt' and 'num_chunks'
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if rag_pipeline is not None:
    # Test with a relevant query
    result1 = rag_pipeline("Why did the central bank raise interest rates?")
    if result1:
        print(f"Test 1: refused={result1.get('refused', 'N/A')}, num_chunks={result1.get('num_chunks', 'N/A')}")
    
    # Test with an irrelevant query
    result2 = rag_pipeline("What is quantum computing?", min_score=0.5)
    if result2:
        print(f"Test 2: refused={result2.get('refused', 'N/A')}")

Task 5 — Evaluate Retrieval Quality (10 points) [Coding]#

Implement a function that calculates Precision@k for retrieval evaluation.

Precision@k = (number of relevant chunks in top-k) / k

A chunk is considered “relevant” if its text contains ANY of the expected keywords (case-insensitive).

Function signature:

def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:

Example (a toy calculation follows this list):

  • Query: “Why did rates increase?”

  • Expected keywords: [“rate”, “interest”, “inflation”]

  • If 2 out of 3 retrieved chunks contain at least one keyword, Precision@3 = 2/3 = 0.667
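
The cell below sketches the keyword check and the Precision@k arithmetic on invented texts; it is not the Task 5 solution.

# Illustration only: keyword-based relevance check and Precision@k arithmetic
toy_texts = [
    "Interest rates rose to fight inflation.",
    "The match ended in a penalty shootout.",
    "Higher rates slow consumer spending.",
]
keywords = ["rate", "interest", "inflation"]
relevant = sum(any(kw.lower() in t.lower() for kw in keywords) for t in toy_texts)
print(relevant / len(toy_texts))   # 2 of 3 texts contain a keyword -> 0.667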

# Task 5: Evaluate retrieval quality

def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:
    """Calculate Precision@k for retrieval evaluation.
    
    Args:
        query: The query to evaluate
        expected_keywords: Keywords that indicate relevance
        k: Number of chunks to evaluate
        
    Returns:
        Precision@k as a float between 0.0 and 1.0
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if precision_at_k is not None:
    p_at_k = precision_at_k(
        "Why did interest rates increase?",
        ["rate", "interest", "inflation", "central bank"],
        k=3
    )
    if p_at_k is not None:
        print(f"Precision@3: {p_at_k:.2%}")

Task 6 — RAG Failure Modes and Guardrails Explanation (20 points) [Written]#

Prompt: Explain RAG failure modes and why guardrails are essential for production systems.

Include in your response:

  • What is a “near-miss” in RAG and why is it dangerous?

  • Why should RAG systems sometimes refuse to answer?

  • What guardrails would you implement in a production RAG system?

  • How does RAG shift (not eliminate) hallucination risk?

Write 6–10 sentences in your own words.

# Task 6: Written explanation

rag_failure_explanation = """

"""

Submission#

Before submitting:

  1. Restart the kernel and run all cells to ensure everything works

  2. Verify all coding tasks produce the expected outputs

  3. Verify your written explanation is complete and in your own words

  4. Save the notebook

How to Download from Colab#

  1. Go to File → Download → Download .ipynb

  2. The file will download to your computer

  3. Do not rename the file — keep it as Module7_Assessment.ipynb

Submit#

Upload your completed notebook via the Module 7 Assessment Form.

Submission Checklist#

  • All coding functions are implemented and working

  • Written explanation is thoughtful and in your own words

  • Notebook runs top-to-bottom without errors

  • Downloaded as .ipynb (not edited in a text editor)

  • File not renamed