Module 7 Assessment — RAG Pipelines#

This assessment tests both your practical skills (coding tasks) and conceptual understanding (written task).

Assessment Structure#

  • 5 Coding Tasks (80 points): Implement RAG pipeline components

  • 1 Written Task (20 points): Explain RAG failure modes and guardrails

Instructions#

  • Coding tasks: Complete the code cells with the exact variable names shown

  • Written task: Fill in the string variable with full sentences

  • Do not rename variables

  • Ensure the notebook runs top-to-bottom without errors

  • You may use the module content for reference


Setup#

Run this cell first to install required packages and set up the environment.

!pip -q install sentence-transformers scikit-learn faiss-cpu

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import faiss
from typing import List, Tuple
from dataclasses import dataclass

# Load the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Setup complete!")

Knowledge Base for All Tasks#

Use this knowledge base for all coding tasks. Do not modify it.

# Knowledge base (do not modify)
knowledge_base = [
    {
        "id": "doc_001",
        "text": "The central bank raised interest rates by 25 basis points to combat inflation.",
        "source": "monetary_policy.pdf"
    },
    {
        "id": "doc_002",
        "text": "Higher borrowing costs are expected to slow consumer spending and reduce inflationary pressure.",
        "source": "monetary_policy.pdf"
    },
    {
        "id": "doc_003",
        "text": "Mortgage rates have risen to their highest level in two decades.",
        "source": "housing_report.pdf"
    },
    {
        "id": "doc_004",
        "text": "The Federal Reserve's dual mandate requires balancing employment with price stability.",
        "source": "fed_overview.pdf"
    },
    {
        "id": "doc_005",
        "text": "Bank earnings improved as net interest margins widened due to higher rates.",
        "source": "earnings_summary.pdf"
    },
    {
        "id": "doc_006",
        "text": "The championship football match ended in a dramatic penalty shootout.",
        "source": "sports_news.pdf"
    }
]

# Document embeddings (computed once here and reused by all tasks)
texts = [doc["text"] for doc in knowledge_base]
doc_embeddings = model.encode(texts, normalize_embeddings=True)

print(f"Knowledge base loaded: {len(knowledge_base)} documents")
print(f"Embedding shape: {doc_embeddings.shape}")

Data Class Definition#

Use this data class for Tasks 1–4.

@dataclass
class RetrievedChunk:
    """A chunk retrieved from the knowledge base."""
    text: str
    score: float
    source: str
    doc_id: str

print("RetrievedChunk class defined")

Task 1 — Implement Retriever Function (20 points) [Coding]#

Implement a retriever function that returns the top-k most similar documents.

Your function must:

  1. Encode the query using the provided model with normalize_embeddings=True

  2. Calculate cosine similarity between the query and all document embeddings

  3. Return a list of RetrievedChunk objects, sorted by score (highest first)

Function signature:

def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:

Hints (a toy similarity-and-sort illustration follows this list):

  • Use model.encode([query], normalize_embeddings=True) to get the query embedding as a 2-D array (cosine_similarity() expects 2-D inputs)

  • Use cosine_similarity() from sklearn to compute similarities

  • Use np.argsort() to get indices sorted by similarity (ascending; reverse it so the highest score comes first)

  • Remember to access knowledge_base for document metadata
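
The cell below is a toy illustration of the similarity-and-sort pattern from the hints, run on invented 2-D vectors and assuming the setup cell above has been run; it is not the Task 1 solution.

# Toy illustration (not the Task 1 solution): rank items by cosine similarity
toy_docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])   # pretend document embeddings (unit length)
toy_query = np.array([[0.8, 0.6]])                          # pretend query embedding (one row, 2-D)
sims = cosine_similarity(toy_query, toy_docs)[0]            # similarity of the query to each item
ranked = np.argsort(sims)[::-1]                             # indices from most to least similar
print(sims.round(2), ranked[:2])                            # scores and the top-2 indices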

# Task 1: Implement the retriever function

def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:
    """Retrieve top-k most similar documents for the query.
    
    Args:
        query: The user's question
        k: Number of documents to retrieve
        
    Returns:
        List of RetrievedChunk objects, sorted by score (highest first)
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if retrieve_top_k is not None:
    test_chunks = retrieve_top_k("Why did the central bank raise rates?", k=3)
    if test_chunks:
        print(f"Retrieved {len(test_chunks)} chunks")
        print(f"Top chunk score: {test_chunks[0].score:.3f}")
        print(f"Top chunk text: {test_chunks[0].text[:50]}...")

Task 2 — Build RAG Prompt (15 points) [Coding]#

Implement a function that builds an evidence-first RAG prompt.

Your function must:

  1. Format retrieved chunks with numbered labels (e.g., [1], [2], [3])

  2. Include explicit grounding instructions

  3. Include permission to refuse if context is insufficient

  4. Structure: Instructions → Context → Question → Answer prompt

Function signature:

def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:

Required elements in the prompt (a small formatting illustration follows this list):

  • “CONTEXT” section with numbered chunks

  • “QUESTION” section with the user’s question

  • Instruction to answer ONLY using the provided context

  • Permission to say “I don’t have enough information” if needed
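
As a formatting aid only (not the Task 2 solution), the cell below shows one way to produce the numbered [1], [2] labels with enumerate; the snippet strings are invented.

# Illustration only: numbered [1], [2] labels with enumerate
snippets = ["First retrieved passage.", "Second retrieved passage."]
numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(snippets, start=1))
print(numbered)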

# Task 2: Build RAG prompt

def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:
    """Build an evidence-first RAG prompt.
    
    Args:
        chunks: List of retrieved chunks
        question: The user's question
        
    Returns:
        A formatted prompt string
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if build_rag_prompt is not None and test_chunks:
    test_prompt = build_rag_prompt(test_chunks, "Why did the central bank raise rates?")
    if test_prompt:
        print("Prompt built successfully!")
        print(f"Prompt length: {len(test_prompt)} characters")
        print(f"Contains CONTEXT: {'CONTEXT' in test_prompt}")
        print(f"Contains QUESTION: {'QUESTION' in test_prompt}")

Task 3 — Implement Guardrails (20 points) [Coding]#

Implement a retrieval function with guardrails that can refuse to answer.

Your function must:

  1. Retrieve top-k chunks using your retrieve_top_k function

  2. Filter chunks to only include those with score >= min_score

  3. If fewer than min_chunks remain after filtering, return a refusal

  4. Otherwise, return the filtered chunks

Function signature:

def retrieve_with_guardrails(
    query: str, 
    k: int = 5, 
    min_score: float = 0.3, 
    min_chunks: int = 2
) -> dict:

Return format:

# If sufficient chunks:
{"chunks": [...], "refused": False}

# If insufficient chunks:
{"chunks": [], "refused": True, "reason": "..."}
# Task 3: Implement guardrails

def retrieve_with_guardrails(
    query: str, 
    k: int = 5, 
    min_score: float = 0.3, 
    min_chunks: int = 2
) -> dict:
    """Retrieve with score threshold and minimum chunk guardrails.
    
    Args:
        query: The user's question
        k: Number of chunks to initially retrieve
        min_score: Minimum similarity score threshold
        min_chunks: Minimum number of chunks required after filtering
        
    Returns:
        dict with 'chunks', 'refused', and optionally 'reason'
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if retrieve_with_guardrails is not None:
    # Test with a relevant query (should NOT refuse)
    result1 = retrieve_with_guardrails("Why did interest rates increase?", min_score=0.3)
    if result1:
        print(f"Test 1 - Relevant query: refused={result1.get('refused', 'N/A')}")
    
    # Test with an irrelevant query (should refuse with high threshold)
    result2 = retrieve_with_guardrails("What is the best pizza topping?", min_score=0.5)
    if result2:
        print(f"Test 2 - Irrelevant query: refused={result2.get('refused', 'N/A')}")

Task 4 — Complete RAG Pipeline (15 points) [Coding]#

Implement a complete RAG pipeline that combines retrieval, prompt building, and guardrails.

Your function must:

  1. Use retrieve_with_guardrails to get chunks (or refuse)

  2. If refused, return a dict with refused=True and a refusal message

  3. If not refused, build a prompt using build_rag_prompt

  4. Return the prompt (we won’t call an actual LLM in this assessment)

Function signature:

def rag_pipeline(question: str, min_score: float = 0.3) -> dict:

Return format:

# If refused:
{"refused": True, "message": "I don't have enough information..."}

# If successful:
{"refused": False, "prompt": "...", "num_chunks": 3}
# Task 4: Complete RAG pipeline

def rag_pipeline(question: str, min_score: float = 0.3) -> dict:
    """Complete RAG pipeline with guardrails.
    
    Args:
        question: The user's question
        min_score: Minimum similarity score threshold
        
    Returns:
        dict with 'refused', and either 'message' (if refused) or 'prompt' and 'num_chunks'
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if rag_pipeline is not None:
    # Test with a relevant query
    result1 = rag_pipeline("Why did the central bank raise interest rates?")
    if result1:
        print(f"Test 1: refused={result1.get('refused', 'N/A')}, num_chunks={result1.get('num_chunks', 'N/A')}")
    
    # Test with an irrelevant query
    result2 = rag_pipeline("What is quantum computing?", min_score=0.5)
    if result2:
        print(f"Test 2: refused={result2.get('refused', 'N/A')}")

Task 5 — Evaluate Retrieval Quality (10 points) [Coding]#

Implement a function that calculates Precision@k for retrieval evaluation.

Precision@k = (number of relevant chunks in top-k) / k

A chunk is considered “relevant” if its text contains ANY of the expected keywords (case-insensitive).

Function signature:

def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:

Example (a toy calculation follows this list):

  • Query: “Why did rates increase?”

  • Expected keywords: [“rate”, “interest”, “inflation”]

  • If 2 out of 3 retrieved chunks contain at least one keyword, Precision@3 = 2/3 = 0.667
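
The cell below sketches the keyword check and the Precision@k arithmetic on invented texts; it is not the Task 5 solution.

# Illustration only: keyword-based relevance check and Precision@k arithmetic
toy_texts = [
    "Interest rates rose to fight inflation.",
    "The match ended in a penalty shootout.",
    "Higher rates slow consumer spending.",
]
keywords = ["rate", "interest", "inflation"]
relevant = sum(any(kw.lower() in t.lower() for kw in keywords) for t in toy_texts)
print(relevant / len(toy_texts))   # 2 of 3 texts contain a keyword -> 0.667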

# Task 5: Evaluate retrieval quality

def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:
    """Calculate Precision@k for retrieval evaluation.
    
    Args:
        query: The query to evaluate
        expected_keywords: Keywords that indicate relevance
        k: Number of chunks to evaluate
        
    Returns:
        Precision@k as a float between 0.0 and 1.0
    """
    # YOUR CODE HERE
    pass


# Verification (do not modify)
if precision_at_k is not None:
    p_at_k = precision_at_k(
        "Why did interest rates increase?",
        ["rate", "interest", "inflation", "central bank"],
        k=3
    )
    if p_at_k is not None:
        print(f"Precision@3: {p_at_k:.2%}")

Task 6 — RAG Failure Modes and Guardrails Explanation (20 points) [Written]#

Prompt: Explain RAG failure modes and why guardrails are essential for production systems.

Include in your response:

  • What is a “near-miss” in RAG and why is it dangerous?

  • Why should RAG systems sometimes refuse to answer?

  • What guardrails would you implement in a production RAG system?

  • How does RAG shift (not eliminate) hallucination risk?

Write 6–10 sentences in your own words.

# Task 6: Written explanation

rag_failure_explanation = """

"""

Submission#

Before submitting:

  1. Restart the kernel and run all cells to ensure everything works

  2. Verify all coding tasks produce the expected outputs

  3. Verify your written explanation is complete and in your own words

  4. Save the notebook

How to Download from Colab#

  1. Go to File → Download → Download .ipynb

  2. The file will download to your computer

  3. Do not rename the file — keep it as Module7_Assessment.ipynb

Submit#

Upload your completed notebook via the Module 7 Assessment Form.

Submission Checklist#

  • All coding functions are implemented and working

  • Written explanation is thoughtful and in your own words

  • Notebook runs top-to-bottom without errors

  • Downloaded as .ipynb (not edited in a text editor)

  • File not renamed