Module 5: Embeddings & Vector Databases

CodeVision Python Training

Contents

  • Group 1: Understanding Embeddings (Sections 5.1-5.5)

  • Group 2: Vector Databases (Sections 5.6-5.10)

  • Group 3: Similarity Search (Sections 5.11-5.15)

  • Group 4: RAG & Grounding (Sections 5.16-5.20)


Welcome to Module 5

This module explains how embeddings and vector databases work, and how they enable semantic search and Retrieval-Augmented Generation (RAG).

The module is conceptual first, with real code examples to ground the ideas. By the end of this module, you will understand how modern AI systems retrieve information and ground their answers in it.

This module builds directly on:

  • Module 1: Python fundamentals (functions, JSON, notebooks)

  • Module 2: Data work with Pandas and visualisation

  • Module 3: LLM Fundamentals (inference, hallucinations, constraints)

  • Module 4: ML & Deep Learning Foundations (neural networks, training)


What You Will Learn

| Topic | Why It Matters |
| --- | --- |
| What embeddings are | Understand how text becomes vectors |
| Generating embeddings | Use SentenceTransformers and APIs |
| Vector similarity | Grasp cosine similarity and distance metrics |
| Vector databases | Understand FAISS, Pinecone, ChromaDB |
| Indexing strategies | Know how to scale vector search |
| RAG fundamentals | Connect retrieval to generation |
| Grounding techniques | Reduce hallucinations with context |
| Enterprise RAG patterns | Build reliable AI applications |
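As a small taste of the similarity topic above, here is a minimal sketch of cosine similarity using NumPy. The three-dimensional vectors are made up for illustration; real embedding models produce vectors with hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" chosen by hand for illustration only.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.8, 0.2, 0.1])
car = np.array([0.0, 0.1, 0.9])

print(cosine_similarity(cat, kitten))  # high: similar meaning
print(cosine_similarity(cat, car))     # low: unrelated meaning
```

The module covers how real embedding models assign such vectors so that semantically similar text ends up pointing in similar directions.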


Prerequisites

Before starting this module, ensure you have:

  • Completed Module 1 (Python Foundations)

  • Completed Module 2 (Python for Data Work)

  • Completed Module 3 (LLM Fundamentals)

  • Completed Module 4 (ML & Deep Learning Foundations)


Before You Start: Hugging Face Token Required

This module downloads models from Hugging Face. To avoid download limits and warnings, you must set up your own free Hugging Face token before running the notebook.

Quick Setup (2-3 minutes):

  1. Go to huggingface.co/settings/tokens

  2. Click New token → Name it anything → Select Read access → Create

  3. Copy the token (you won’t see it again)

  4. In Google Colab, click the 🔑 Secrets icon in the left sidebar

  5. Add a secret named exactly HF_TOKEN with your token as the value

  6. Turn ON “Available to all notebooks” → Restart the runtime

⚠️ Do not share your token or hard-code it in notebooks.

See Resources for detailed instructions and troubleshooting.
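Once the secret is stored, the notebook can read it without hard-coding the token. This is a minimal sketch, assuming Colab's `google.colab.userdata` API and the `HF_TOKEN` environment variable that `huggingface_hub` recognises; the fallback branch lets the same code run outside Colab.

```python
import os

def get_hf_token():
    """Read the HF_TOKEN Colab secret, or fall back to an environment variable."""
    try:
        from google.colab import userdata  # only available inside Colab
        return userdata.get("HF_TOKEN")
    except Exception:  # not running in Colab, or the secret is missing
        return os.environ.get("HF_TOKEN")

token = get_hf_token()
if token:
    # huggingface_hub picks up HF_TOKEN from the environment automatically,
    # so downloads in the rest of the notebook are authenticated.
    os.environ["HF_TOKEN"] = token
```

Keeping the token in Secrets (rather than pasting it into a cell) means it never appears in the notebook you share or commit.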


Module 5 Learning Path

  1. Content - Work through the interactive notebook

  2. Quiz - Test your understanding (auto-graded)

  3. Assessment - Coding tasks (embeddings, similarity, FAISS, RAG prompt) + written explanation (auto-graded)

  4. Resources - Additional learning materials


End of Module 5 Introduction

Click Content in the navigation to begin the interactive lesson.