Module 7 Assessment — RAG Pipelines#
This assessment tests both your practical skills (coding tasks) and conceptual understanding (written task).
Assessment Structure#
5 Coding Tasks (80 points): Implement RAG pipeline components
1 Written Task (20 points): Explain RAG failure modes and guardrails
Instructions#
Coding tasks: Complete the code cells with the exact variable names shown
Written task: Fill in the string variable with full sentences
Do not rename variables
Ensure the notebook runs top-to-bottom without errors
You may use the module content for reference
Setup#
Run this cell first to install required packages and set up the environment.
!pip -q install sentence-transformers scikit-learn faiss-cpu
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import faiss
from typing import List, Tuple
from dataclasses import dataclass
# Load the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Setup complete!")
Knowledge Base for All Tasks#
Use this knowledge base for all coding tasks. Do not modify it.
# Knowledge base (do not modify)
knowledge_base = [
{
"id": "doc_001",
"text": "The central bank raised interest rates by 25 basis points to combat inflation.",
"source": "monetary_policy.pdf"
},
{
"id": "doc_002",
"text": "Higher borrowing costs are expected to slow consumer spending and reduce inflationary pressure.",
"source": "monetary_policy.pdf"
},
{
"id": "doc_003",
"text": "Mortgage rates have risen to their highest level in two decades.",
"source": "housing_report.pdf"
},
{
"id": "doc_004",
"text": "The Federal Reserve's dual mandate requires balancing employment with price stability.",
"source": "fed_overview.pdf"
},
{
"id": "doc_005",
"text": "Bank earnings improved as net interest margins widened due to higher rates.",
"source": "earnings_summary.pdf"
},
{
"id": "doc_006",
"text": "The championship football match ended in a dramatic penalty shootout.",
"source": "sports_news.pdf"
}
]
# Pre-computed embeddings (provided for consistency)
texts = [doc["text"] for doc in knowledge_base]
doc_embeddings = model.encode(texts, normalize_embeddings=True)
print(f"Knowledge base loaded: {len(knowledge_base)} documents")
print(f"Embedding shape: {doc_embeddings.shape}")
Data Class Definition#
Use this data class for Tasks 1-4.
@dataclass
class RetrievedChunk:
"""A chunk retrieved from the knowledge base."""
text: str
score: float
source: str
doc_id: str
print("RetrievedChunk class defined")
Task 1 — Implement Retriever Function (20 points) [Coding]#
Implement a retriever function that returns the top-k most similar documents.
Your function must:
Encode the query using the provided model with normalize_embeddings=True
Calculate cosine similarity between the query and all document embeddings
Return a list of RetrievedChunk objects, sorted by score (highest first)
Function signature:
def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:
Hints (a short illustrative sketch follows this list):
Use model.encode(query, normalize_embeddings=True) to get the query embedding
Use cosine_similarity() from sklearn to compute similarities
Use np.argsort() to get indices sorted by similarity
Remember to access knowledge_base for document metadata
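The snippet below is a minimal, non-graded sketch of how those calls fit together; the variable names (query_emb, sims, ranked) are illustrative, not required.
# Illustrative sketch only - variable names are examples, not requirements
query_emb = model.encode("example query", normalize_embeddings=True).reshape(1, -1)
sims = cosine_similarity(query_emb, doc_embeddings)[0]  # one score per document
ranked = np.argsort(sims)[::-1]                         # indices, highest score first
# knowledge_base[ranked[0]] then gives the metadata of the best match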
# Task 1: Implement the retriever function
def retrieve_top_k(query: str, k: int = 3) -> List[RetrievedChunk]:
"""Retrieve top-k most similar documents for the query.
Args:
query: The user's question
k: Number of documents to retrieve
Returns:
List of RetrievedChunk objects, sorted by score (highest first)
"""
# YOUR CODE HERE
pass
# Verification (do not modify)
if retrieve_top_k is not None:
test_chunks = retrieve_top_k("Why did the central bank raise rates?", k=3)
if test_chunks:
print(f"Retrieved {len(test_chunks)} chunks")
print(f"Top chunk score: {test_chunks[0].score:.3f}")
print(f"Top chunk text: {test_chunks[0].text[:50]}...")
Task 2 — Build RAG Prompt (15 points) [Coding]#
Implement a function that builds an evidence-first RAG prompt.
Your function must:
Format retrieved chunks with numbered labels (e.g., [1], [2], [3])
Include explicit grounding instructions
Include permission to refuse if context is insufficient
Structure: Instructions → Context → Question → Answer prompt
Function signature:
def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:
Required elements in the prompt:
“CONTEXT” section with numbered chunks
“QUESTION” section with the user’s question
Instruction to answer ONLY using the provided context
Permission to say “I don’t have enough information” if needed
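As a non-graded illustration, one possible skeleton is sketched below; it assumes chunks and question are the function's parameters, and the exact wording is up to you.
# Illustrative sketch only - wording and variable names are examples
context = "\n".join(f"[{i + 1}] {c.text}" for i, c in enumerate(chunks))
prompt_skeleton = (
    "Answer ONLY using the context below. If it is insufficient, "
    "say \"I don't have enough information.\"\n\n"
    f"CONTEXT:\n{context}\n\nQUESTION: {question}\n\nANSWER:"
)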
# Task 2: Build RAG prompt
def build_rag_prompt(chunks: List[RetrievedChunk], question: str) -> str:
"""Build an evidence-first RAG prompt.
Args:
chunks: List of retrieved chunks
question: The user's question
Returns:
A formatted prompt string
"""
# YOUR CODE HERE
pass
# Verification (do not modify)
if build_rag_prompt is not None and test_chunks:
test_prompt = build_rag_prompt(test_chunks, "Why did the central bank raise rates?")
if test_prompt:
print("Prompt built successfully!")
print(f"Prompt length: {len(test_prompt)} characters")
print(f"Contains CONTEXT: {'CONTEXT' in test_prompt}")
print(f"Contains QUESTION: {'QUESTION' in test_prompt}")
Task 3 — Implement Guardrails (20 points) [Coding]#
Implement a retrieval function with guardrails that can refuse to answer.
Your function must:
Retrieve top-k chunks using your retrieve_top_k function
Filter chunks to only include those with score >= min_score
If fewer than min_chunks remain after filtering, return a refusal
Otherwise, return the filtered chunks
Function signature:
def retrieve_with_guardrails(
query: str,
k: int = 5,
min_score: float = 0.3,
min_chunks: int = 2
) -> dict:
Return format:
# If sufficient chunks:
{"chunks": [...], "refused": False}
# If insufficient chunks:
{"chunks": [], "refused": True, "reason": "..."}
# Task 3: Implement guardrails
def retrieve_with_guardrails(
query: str,
k: int = 5,
min_score: float = 0.3,
min_chunks: int = 2
) -> dict:
"""Retrieve with score threshold and minimum chunk guardrails.
Args:
query: The user's question
k: Number of chunks to initially retrieve
min_score: Minimum similarity score threshold
min_chunks: Minimum number of chunks required after filtering
Returns:
dict with 'chunks', 'refused', and optionally 'reason'
"""
# YOUR CODE HERE
pass
# Verification (do not modify)
if retrieve_with_guardrails is not None:
# Test with a relevant query (should NOT refuse)
result1 = retrieve_with_guardrails("Why did interest rates increase?", min_score=0.3)
if result1:
print(f"Test 1 - Relevant query: refused={result1.get('refused', 'N/A')}")
# Test with an irrelevant query (should refuse with high threshold)
result2 = retrieve_with_guardrails("What is the best pizza topping?", min_score=0.5)
if result2:
print(f"Test 2 - Irrelevant query: refused={result2.get('refused', 'N/A')}")
Task 4 — Complete RAG Pipeline (15 points) [Coding]#
Implement a complete RAG pipeline that combines retrieval, prompt building, and guardrails.
Your function must:
Use retrieve_with_guardrails to get chunks (or refuse)
If refused, return a dict with refused=True and a refusal message
If not refused, build a prompt using build_rag_prompt
Return the prompt (we won’t call an actual LLM in this assessment)
Function signature:
def rag_pipeline(question: str, min_score: float = 0.3) -> dict:
Return format:
# If refused:
{"refused": True, "message": "I don't have enough information..."}
# If successful:
{"refused": False, "prompt": "...", "num_chunks": 3}
# Task 4: Complete RAG pipeline
def rag_pipeline(question: str, min_score: float = 0.3) -> dict:
"""Complete RAG pipeline with guardrails.
Args:
question: The user's question
min_score: Minimum similarity score threshold
Returns:
dict with 'refused', and either 'message' (if refused) or 'prompt' and 'num_chunks'
"""
# YOUR CODE HERE
pass
# Verification (do not modify)
if rag_pipeline is not None:
# Test with a relevant query
result1 = rag_pipeline("Why did the central bank raise interest rates?")
if result1:
print(f"Test 1: refused={result1.get('refused', 'N/A')}, num_chunks={result1.get('num_chunks', 'N/A')}")
# Test with an irrelevant query
result2 = rag_pipeline("What is quantum computing?", min_score=0.5)
if result2:
print(f"Test 2: refused={result2.get('refused', 'N/A')}")
Task 5 — Evaluate Retrieval Quality (10 points) [Coding]#
Implement a function that calculates Precision@k for retrieval evaluation.
Precision@k = (number of relevant chunks in top-k) / k
A chunk is considered “relevant” if its text contains ANY of the expected keywords (case-insensitive).
Function signature:
def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:
Example:
Query: “Why did rates increase?”
Expected keywords: [“rate”, “interest”, “inflation”]
If 2 out of 3 retrieved chunks contain at least one keyword, Precision@3 = 2/3 = 0.667
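A minimal, non-graded sketch of the case-insensitive keyword check is shown below; the query, keywords, and variable names are illustrative only.
# Illustrative sketch only - names and values are examples
chunks = retrieve_top_k("Why did rates increase?", k=3)
keywords = ["rate", "interest", "inflation"]
relevant = sum(
    1 for c in chunks
    if any(kw.lower() in c.text.lower() for kw in keywords)
)
precision = relevant / 3  # divide by k, not by the number of relevant chunks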
# Task 5: Evaluate retrieval quality
def precision_at_k(query: str, expected_keywords: List[str], k: int = 3) -> float:
"""Calculate Precision@k for retrieval evaluation.
Args:
query: The query to evaluate
expected_keywords: Keywords that indicate relevance
k: Number of chunks to evaluate
Returns:
Precision@k as a float between 0.0 and 1.0
"""
# YOUR CODE HERE
pass
# Verification (do not modify)
if precision_at_k is not None:
p_at_k = precision_at_k(
"Why did interest rates increase?",
["rate", "interest", "inflation", "central bank"],
k=3
)
if p_at_k is not None:
print(f"Precision@3: {p_at_k:.2%}")
Task 6 — RAG Failure Modes and Guardrails Explanation (20 points) [Written]#
Prompt: Explain RAG failure modes and why guardrails are essential for production systems.
Include in your response:
What is a “near-miss” in RAG and why is it dangerous?
Why should RAG systems sometimes refuse to answer?
What guardrails would you implement in a production RAG system?
How does RAG shift (not eliminate) hallucination risk?
Write 6–10 sentences in your own words.
# Task 6: Written explanation
rag_failure_explanation = """
"""
Submission#
Before submitting:
Restart kernel and Run All Cells to ensure everything works
Verify all coding tasks produce the expected outputs
Verify your written explanation is complete and in your own words
Save the notebook
How to Download from Colab#
Go to File → Download → Download .ipynb
The file will download to your computer
Do not rename the file — keep it as Module7_Assessment.ipynb
Submit#
Upload your completed notebook via the Module 7 Assessment Form.
Submission Checklist#
All coding functions are implemented and working
Written explanation is thoughtful and in your own words
Notebook runs top-to-bottom without errors
Downloaded as .ipynb (not edited in a text editor)
File not renamed