Module 5: Embeddings & Vector Databases#
CodeVision Python Training
Contents#
Group 1: Understanding Embeddings (Sections 5.1-5.5)
Group 2: Vector Databases (Sections 5.6-5.10)
Group 3: Similarity Search (Sections 5.11-5.15)
Group 4: RAG & Grounding (Sections 5.16-5.20)
Welcome to Module 5#
This module explains how embeddings and vector databases work, and how they enable semantic search and Retrieval-Augmented Generation (RAG).
It is concept-first, with real code to ground each idea. By the end of this module, you will understand how modern AI systems retrieve information and ground their answers in it.
This module builds directly on:
Module 1: Python fundamentals (functions, JSON, notebooks)
Module 2: Data work with Pandas and visualisation
Module 3: LLM Fundamentals (inference, hallucinations, constraints)
Module 4: ML & Deep Learning Foundations (neural networks, training)
What You Will Learn#
| Topic | Why It Matters |
|---|---|
| What embeddings are | Understand how text becomes vectors |
| Generating embeddings | Use SentenceTransformers and APIs |
| Vector similarity | Grasp cosine similarity and distance metrics |
| Vector databases | Understand FAISS, Pinecone, ChromaDB |
| Indexing strategies | Know how to scale vector search |
| RAG fundamentals | Connect retrieval to generation |
| Grounding techniques | Reduce hallucinations with context |
| Enterprise RAG patterns | Build reliable AI applications |
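As a small preview of the "Vector similarity" topic, the sketch below computes cosine similarity in plain Python. The three-dimensional vectors and their names (`cat`, `kitten`, `invoice`) are hypothetical toy values chosen for illustration; they are not output from any real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (assumed values, not from a real model).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# Vectors pointing in similar directions score close to 1;
# vectors pointing in unrelated directions score close to 0.
print(cosine_similarity(cat, kitten))
print(cosine_similarity(cat, invoice))
```

With these toy values, the `cat`/`kitten` pair scores much higher than the `cat`/`invoice` pair, which is the intuition behind semantic search: nearby vectors stand for related meanings.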
Prerequisites#
Before starting this module, ensure you have:
Completed Module 1 (Python Foundations)
Completed Module 2 (Python for Data Work)
Completed Module 3 (LLM Fundamentals)
Completed Module 4 (ML & Deep Learning Foundations)
Before You Start: Hugging Face Token Required#
This module downloads models from Hugging Face. To avoid download limits and warnings, you must set up your own free Hugging Face token before running the notebook.
Quick Setup (2-3 minutes):
Click New token → Name it anything → Select Read access → Create
Copy the token (you won’t see it again)
In Google Colab, click the 🔑 Secrets icon in the left sidebar
Add a secret named exactly HF_TOKEN, with your token as the value
Turn ON “Available to all notebooks” → Restart the runtime
⚠️ Do not share your token or hard-code it in notebooks.
See Resources for detailed instructions and troubleshooting.
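Once the secret is saved, a notebook can read it without the token ever appearing in a cell. The following is a minimal sketch, not necessarily how the course notebook does it: the helper name `load_hf_token` is hypothetical, and the fallback to an environment variable is an assumption added so the same code also runs outside Colab.

```python
import os


def load_hf_token(name="HF_TOKEN"):
    """Return the token from Colab Secrets, or from the environment otherwise.

    `load_hf_token` is a hypothetical helper written for illustration.
    """
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata

        return userdata.get(name)
    except Exception:
        # Not in Colab, or the secret is missing or not shared with this
        # notebook: fall back to an environment variable (may be None).
        return os.environ.get(name)


token = load_hf_token()
# Report only whether a token was found; never print the token itself.
print("Token found" if token else "No token found")
```

If this reports that no token was found in Colab, check that the secret name is exactly HF_TOKEN and that notebook access is turned on, then restart the runtime.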
Module 5 Learning Path#
Content - Work through the interactive notebook
Quiz - Test your understanding (auto-graded)
Assessment - Coding tasks (embeddings, similarity, FAISS, RAG prompt) + written explanation (auto-graded)
Resources - Additional learning materials
End of Module 5 Introduction#
Click Content in the navigation to begin the interactive lesson.