Module 5 Assessment — Embeddings & Vector Databases#

This assessment tests both your practical skills (coding tasks) and conceptual understanding (written task).

Assessment Structure#

  • 5 Coding Tasks (80 points): Implement embedding and retrieval operations

  • 1 Written Task (20 points): Explain RAG and grounding

Instructions#

  • Coding tasks: Complete the code cells with the exact variable names shown

  • Written task: Fill in the string variable with full sentences

  • Do not rename variables

  • Ensure the notebook runs top-to-bottom without errors

  • You may use the module content for reference


Setup#

Run this cell first to install required packages and set up the environment.

!pip -q install sentence-transformers scikit-learn faiss-cpu

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
import faiss

# Load the embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Setup complete!")

Corpus for All Tasks#

Use this corpus for all coding tasks. Do not modify it.

# Corpus of documents (do not modify)
corpus = [
    "Interest rates were increased by the central bank to control inflation.",
    "The central bank raised borrowing costs to fight rising prices.",
    "Quarterly earnings improved as net interest margin widened.",
    "The Federal Reserve announced a 25 basis point rate hike.",
    "Mortgage rates have reached their highest level in 20 years.",
    "Football is a popular sport played across Europe.",
    "The team won the championship after a dramatic penalty shootout.",
    "Basketball players competed in the international tournament."
]

print(f"Corpus loaded: {len(corpus)} documents")

Task 1 — Generate Embeddings (15 points) [Coding]#

Generate normalized embeddings for the entire corpus.

The embedding model is already loaded in the Setup cell as model. Use it to encode the corpus.

Store the result in corpus_embeddings. It should be a numpy array with shape (8, 384).

Hint:

corpus_embeddings = model.encode(corpus, normalize_embeddings=True)

The normalize_embeddings=True parameter scales each vector to length 1.0. With unit-length vectors, a plain dot product equals cosine similarity, which is what the FAISS inner-product index in Task 4 relies on.
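
A minimal illustration of that equivalence (a sketch only; it assumes corpus_embeddings has been filled in as in the hint above):

# Illustrative check only: for unit-length vectors, the dot product IS the cosine similarity
a, b = corpus_embeddings[0], corpus_embeddings[1]
print(f"dot(doc0, doc1) = {float(np.dot(a, b)):.3f}")  # same value cosine_similarity would give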

# Task 1: Generate normalized embeddings for the corpus
# Store the result in corpus_embeddings

corpus_embeddings = None  # YOUR CODE HERE

# Verification (do not modify)
if corpus_embeddings is not None:
    print(f"Shape: {corpus_embeddings.shape}")
    print(f"First vector norm: {np.linalg.norm(corpus_embeddings[0]):.4f} (should be ~1.0)")

Task 2 — Calculate Similarity Scores (15 points) [Coding]#

Calculate cosine similarity between the query and all corpus documents.

Query: "Why did the central bank increase rates?"

Store the similarity scores in similarity_scores. It should be a 1D array of length 8.

Hint:

  1. First encode the query with normalize_embeddings=True

  2. Use cosine_similarity() from sklearn (a toy shape example follows below)
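
Note that cosine_similarity() expects 2D inputs and returns a 2D result, so you will need to flatten it to get a 1D array of length 8. A toy example of the shape behavior (illustrative only, not the task data):

# Toy data (not the task data): one 3-dim "query" vs. two 3-dim "documents"
toy_query = np.array([[1.0, 0.0, 0.0]])
toy_docs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(cosine_similarity(toy_query, toy_docs))       # 2D result, shape (1, 2)
print(cosine_similarity(toy_query, toy_docs)[0])    # flattened to 1D, length 2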

# Task 2: Calculate similarity scores
query = "Why did the central bank increase rates?"

similarity_scores = None  # YOUR CODE HERE

# Verification (do not modify)
if similarity_scores is not None:
    print(f"Scores shape: {similarity_scores.shape}")
    print(f"Score range: {similarity_scores.min():.3f} to {similarity_scores.max():.3f}")

Task 3 — Top-K Retrieval (15 points) [Coding]#

Retrieve the indices of the top 3 most similar documents.

Store the result in top_3_indices. It should be an array of 3 indices, sorted by similarity (highest first).

Hint: Use np.argsort(), which sorts in ascending order, so reverse the result to put the highest similarity first (a toy example follows below).
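
A toy illustration of that behavior (illustrative only, not the task data):

# np.argsort returns indices in ascending order of value
toy_scores = np.array([0.2, 0.9, 0.5])
print(np.argsort(toy_scores))         # [0 2 1] -> lowest score first
print(np.argsort(toy_scores)[::-1])   # [1 2 0] -> highest score first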

# Task 3: Get indices of top 3 most similar documents
# Use the similarity_scores from Task 2

top_3_indices = None  # YOUR CODE HERE

# Verification (do not modify)
if top_3_indices is not None:
    print(f"Top 3 indices: {top_3_indices}")
    print("\nTop 3 documents:")
    for i, idx in enumerate(top_3_indices):
        print(f"  {i+1}. [{similarity_scores[idx]:.3f}] {corpus[idx][:50]}...")

Task 4 — FAISS Index and Search (20 points) [Coding]#

Build a FAISS index and search for the top 3 documents.

Steps:

  1. Create a FAISS IndexFlatIP index with dimension 384

  2. Add the corpus embeddings to the index (convert to float32 first)

  3. Search for the query and get top 3 results

Store the search results in:

  • faiss_distances: The similarity scores from FAISS (shape: (1, 3))

  • faiss_indices: The document indices from FAISS (shape: (1, 3))

Hints:

# Step 1: Create index
faiss_index = faiss.IndexFlatIP(384)

# Step 2: Add embeddings (must be float32)
faiss_index.add(corpus_embeddings.astype('float32'))

# Step 3: Search (query must be 2D array with shape (1, 384))
faiss_distances, faiss_indices = faiss_index.search(query_embedding, k=3)

Important: FAISS requires the query to be a 2D array. The query embedding is already prepared for you below.

# Task 4: Build FAISS index and search

# Step 1: Create FAISS index (dimension 384 for all-MiniLM-L6-v2)
faiss_index = None  # YOUR CODE HERE

# Step 2: Add embeddings to index (must be float32)
# YOUR CODE HERE

# Step 3: Search for top 3
# Query embedding is prepared for you (2D array required by FAISS)
query_embedding = model.encode(
    query,
    convert_to_numpy=True,
    normalize_embeddings=True
).astype('float32').reshape(1, -1)  # reshape to (1, 384)

faiss_distances, faiss_indices = None, None  # YOUR CODE HERE

# Verification (do not modify)
if faiss_indices is not None:
    print(f"FAISS indices shape: {faiss_indices.shape}")
    print(f"Top 3 indices from FAISS: {faiss_indices[0]}")

Task 5 — Build RAG Prompt (15 points) [Coding]#

Build a RAG prompt that could be sent to an LLM for a grounded answer.

Steps:

  1. Create a prompt that instructs the LLM to answer based ONLY on the provided context

  2. Include the top 3 retrieved documents as context (use top_3_indices from Task 3)

  3. Include the original query

Store the result in:

  • rag_prompt: The prompt string you build

Prompt format:

Answer the question based ONLY on the following context.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
- [document 1]
- [document 2]
- [document 3]

Question: [query]

Answer:
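
If you are unsure how to assemble the string, here is one possible way to build the Context section (a sketch only; the Bonus cell later in this notebook uses the same pattern). It assumes top_3_indices from Task 3 is filled in:

# Sketch: turn the retrieved documents into the "Context:" lines of the prompt
context_block = "\n".join(f"- {corpus[idx]}" for idx in top_3_indices)
# rag_prompt should then combine the instruction text, context_block, and the query
# following the format shown above
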
# Task 5: Build RAG prompt
# Use top_3_indices from Task 3 to build a prompt for the LLM

rag_prompt = None  # YOUR CODE HERE

# Verification (do not modify)
if rag_prompt is not None:
    print("=" * 60)
    print("RAG PROMPT:")
    print("=" * 60)
    print(rag_prompt)

Bonus — See RAG in Action (Not Graded)#

Run this cell to see a complete RAG pipeline working. This demonstrates how your code from Tasks 1-5 comes together.

Note: This cell is for enrichment only and is not graded. LLM responses vary each time.

# BONUS: Complete RAG Pipeline Demo (Not Graded)
# This shows how all the pieces fit together

import requests

# --- LLM Configuration ---
# ------ OPTION A: Pinggy Tunnel (for Colab) ------
# LLM_BASE_URL = "https://your-pinggy-url.a.pinggy.io"
# LLM_API_KEY = None

# ------ OPTION B: JBChat Server ------
LLM_BASE_URL = "https://jbchat.jonbowden.com.ngrok.app"
LLM_API_KEY = "<provided-by-instructor>"  # Get from instructor
DEFAULT_MODEL = "llama3.1:8b"

def call_llm(prompt, model=DEFAULT_MODEL):
    """Send prompt to LLM. Auto-detects Ollama vs JBChat."""
    headers = {
        "Content-Type": "application/json",
        "ngrok-skip-browser-warning": "true",
    }

    use_jbchat = LLM_API_KEY and LLM_API_KEY != "<provided-by-instructor>"

    if use_jbchat:
        headers["X-API-Key"] = LLM_API_KEY
        endpoint = f"{LLM_BASE_URL}/chat/direct"
        payload = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,
            "stream": False
        }
    else:
        endpoint = f"{LLM_BASE_URL}/api/chat"
        payload = {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False
        }

    try:
        response = requests.post(endpoint, headers=headers, json=payload, timeout=60)
        response.raise_for_status()
        return response.json()["message"]["content"]
    except Exception as e:
        return f"LLM Error: {e}"

# --- Run the RAG Demo ---
if top_3_indices is not None:
    demo_prompt = """Answer the question based ONLY on the following context.
If the context doesn't contain the answer, say "I don't have enough information."

Context:
"""
    for idx in top_3_indices:
        demo_prompt += f"- {corpus[idx]}\n"
    
    demo_prompt += f"\nQuestion: {query}\n\nAnswer:"
    
    print("=" * 60)
    print("SENDING TO LLM...")
    print("=" * 60)
    
    demo_answer = call_llm(demo_prompt)
    
    print("\n" + "=" * 60)
    print("LLM RESPONSE:")
    print("=" * 60)
    print(demo_answer)
    print("\n" + "=" * 60)
    print("This is RAG: Retrieved documents + LLM = Grounded answer!")
    print("=" * 60)
else:
    print("Complete Tasks 1-3 first to retrieve documents.")

Task 6 — RAG and Grounding Explanation (20 points) [Written]#

Prompt: Explain why the top result is the most similar and how retrieval enables RAG.

Include:

  • Why embeddings place semantically similar text close together

  • Why the top result matches the query (shared concepts, not just keywords)

  • How RAG uses retrieval to ground LLM answers in real documents

  • Why grounding reduces hallucinations in enterprise settings

Write 6–10 sentences.

# Task 6: Written explanation

rag_explanation = """

"""

Submission#

Before submitting:

  1. Restart kernel and Run All Cells to ensure everything works

  2. Verify all coding tasks produce the expected outputs

  3. Verify your written explanation is complete and in your own words

  4. Save the notebook

How to Download from Colab#

  1. Go to File → Download → Download .ipynb

  2. The file will download to your computer

  3. Do not rename the file — keep it as Module5_Assessment.ipynb

Submit#

Upload your completed notebook via the Module 5 Assessment Form.

Submission Checklist#

  • All coding variables are filled with working code

  • Written explanation is thoughtful and in your own words

  • Notebook runs top-to-bottom without errors

  • Downloaded as .ipynb (not edited in a text editor)

  • File not renamed