Module 6 Resources — LLM APIs (Python)#
CodeVision Academy
This page provides additional resources for deepening your understanding of building robust LLM API integrations.
Official Documentation#
Python HTTP Libraries#
Requests Library Documentation — The de facto standard for HTTP in Python. Essential reading for understanding timeouts, sessions, and error handling.
HTTPX Documentation — Modern async-capable HTTP client. Good for high-concurrency LLM applications.
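HTTPX's async client is what makes high-concurrency LLM calls practical. Below is a minimal sketch of firing several prompts concurrently with `httpx.AsyncClient`; the endpoint URL and the `prompt`/`text` payload fields are hypothetical placeholders, so adapt them to whichever provider you use.

```python
import asyncio

import httpx

API_URL = "https://api.example.com/v1/generate"  # hypothetical endpoint


async def call_llm(client: httpx.AsyncClient, prompt: str) -> str:
    # Explicit timeouts: 5 s to connect, 60 s for a slow generation.
    resp = await client.post(
        API_URL,
        json={"prompt": prompt},
        timeout=httpx.Timeout(60.0, connect=5.0),
    )
    resp.raise_for_status()
    return resp.json().get("text", "")


async def main() -> None:
    prompts = ["Summarize A", "Summarize B", "Summarize C"]
    async with httpx.AsyncClient() as client:
        # gather() overlaps the network waits instead of running them one by one.
        results = await asyncio.gather(*(call_llm(client, p) for p in prompts))
    for prompt, result in zip(prompts, results):
        print(prompt, "->", result)


if __name__ == "__main__":
    asyncio.run(main())
```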
LLM Provider APIs#
OpenAI API Reference — The most widely used LLM API format. Many providers follow this schema.
Ollama API Documentation — Local LLM server API used in this course.
Anthropic Claude API — Alternative LLM provider with similar patterns.
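Since this course runs against a local Ollama server, here is a minimal non-streaming call sketch, assuming Ollama is listening on its default port 11434 and a model such as `llama3` has already been pulled:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama port


def ask_ollama(prompt: str, model: str = "llama3") -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=(5, 120),  # connect quickly, but allow a slow generation
    )
    resp.raise_for_status()
    # Non-streaming responses return the full completion in the "response" field.
    return resp.json()["response"]


if __name__ == "__main__":
    print(ask_ollama("Explain exponential backoff in one sentence."))
```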
Reliability Patterns#
Retry & Backoff#
Exponential Backoff (Wikipedia) — Conceptual overview of the retry strategy.
Tenacity Library — Production-ready retry library for Python with decorators.
Backoff Library — Lightweight alternative for retry with backoff.
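As a quick illustration of the decorator style these libraries offer, here is a sketch using Tenacity; the endpoint, model name, and retry budget are illustrative assumptions, not fixed recommendations.

```python
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


# Retry only transient network errors; give up after 4 attempts total.
@retry(
    retry=retry_if_exception_type(requests.exceptions.RequestException),
    wait=wait_exponential(multiplier=1, min=1, max=30),  # exponential backoff, capped at 30 s
    stop=stop_after_attempt(4),
    reraise=True,  # surface the original exception instead of a RetryError
)
def call_llm(prompt: str) -> dict:
    resp = requests.post(
        "http://localhost:11434/api/generate",  # assumed local Ollama endpoint
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=(5, 60),
    )
    resp.raise_for_status()
    return resp.json()
```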
Circuit Breakers#
Circuit Breaker Pattern (Martin Fowler) — Pattern for preventing cascade failures in distributed systems.
PyBreaker Library — Python implementation of the circuit breaker pattern.
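A sketch of how PyBreaker might wrap an LLM call is shown below; the failure thresholds, endpoint, and model name are assumptions you would tune for your own service.

```python
from typing import Optional

import pybreaker
import requests

# After 5 consecutive failures the breaker opens; calls then fail fast for 60 s
# before a single trial call is allowed through (half-open state).
llm_breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)


@llm_breaker
def call_llm(prompt: str) -> dict:
    resp = requests.post(
        "http://localhost:11434/api/generate",  # assumed local Ollama endpoint
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=(5, 60),
    )
    resp.raise_for_status()
    return resp.json()


def safe_call(prompt: str) -> Optional[dict]:
    try:
        return call_llm(prompt)
    except pybreaker.CircuitBreakerError:
        # Breaker is open: skip the request instead of piling on a failing service.
        return None
```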
JSON & Validation#
JSON Schema#
JSON Schema — Formal specification for validating JSON structure.
jsonschema Library — Python library for JSON Schema validation.
Pydantic Documentation — Data validation using Python type hints. Excellent for LLM response validation.
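For example, a Pydantic model can reject malformed LLM output in a single call. The sketch below assumes Pydantic v2 (`model_validate_json`); with v1 you would use `parse_raw` instead, and the schema itself is only an example.

```python
from pydantic import BaseModel, Field, ValidationError


class Sentiment(BaseModel):
    # The schema the LLM is asked to follow; validation rejects anything else.
    label: str = Field(pattern="^(positive|negative|neutral)$")
    confidence: float = Field(ge=0.0, le=1.0)


raw = '{"label": "positive", "confidence": 0.92}'  # e.g. an LLM's JSON reply

try:
    result = Sentiment.model_validate_json(raw)  # Pydantic v2 API
    print(result.label, result.confidence)
except ValidationError as exc:
    # Treat schema violations like any other bad response: log, then retry or fall back.
    print("Invalid LLM output:", exc)
```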
Structured Output from LLMs#
OpenAI JSON Mode — Native JSON mode support in OpenAI API.
Instructor Library — Library for extracting structured data from LLMs using Pydantic.
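As a rough sketch of JSON mode with the official OpenAI SDK (Instructor layers Pydantic models on top of the same idea): the model name here is an assumption, and JSON mode only guarantees syntactically valid JSON, not a particular schema.

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute whatever your account offers
    response_format={"type": "json_object"},  # JSON mode: output parses as JSON
    messages=[
        {
            "role": "system",
            "content": "Reply with a JSON object containing 'label' and 'confidence'.",
        },
        {"role": "user", "content": "Classify the sentiment of: 'Great product!'"},
    ],
)

data = json.loads(resp.choices[0].message.content)
print(data)
```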
Testing#
Mocking in Python#
unittest.mock Documentation — Python’s built-in mocking library.
pytest-mock — Pytest plugin for mocking.
responses Library — Mock HTTP requests in Python tests.
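A small sketch of the `responses` library in action: it intercepts `requests` calls during a test, so no real LLM server is needed. The URL and payload mirror the (assumed) Ollama endpoint used elsewhere in this module.

```python
import requests
import responses


@responses.activate
def test_client_parses_completion():
    # Register a canned reply for the URL the client will call.
    responses.add(
        responses.POST,
        "http://localhost:11434/api/generate",
        json={"response": "mocked completion"},
        status=200,
    )

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "hi", "stream": False},
        timeout=(5, 60),
    )
    assert resp.json()["response"] == "mocked completion"
```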
Testing Best Practices#
Testing External APIs (Real Python) — Guide to testing code that calls external APIs.
Production Concerns#
Logging#
Python Logging Documentation — Standard library logging module.
structlog Library — Structured logging for Python applications.
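A minimal structlog sketch for LLM calls, emitting key-value pairs (model, prompt size, latency) instead of free-form strings; the field names and model are illustrative choices, not a required schema.

```python
import time

import structlog

log = structlog.get_logger()


def call_llm(prompt: str) -> str:
    start = time.monotonic()
    # ... perform the real API request here ...
    reply = "stub reply"  # placeholder for the provider's response
    log.info(
        "llm_call",                      # event name
        model="llama3",                  # assumed model
        prompt_chars=len(prompt),
        latency_ms=round((time.monotonic() - start) * 1000, 1),
    )
    return reply
```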
Monitoring & Observability#
OpenTelemetry Python — Observability framework for tracing API calls.
Prometheus Python Client — Metrics collection for monitoring.
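A minimal Prometheus sketch instrumenting an LLM client with a latency histogram and an error counter; the metric names and the scrape port are illustrative assumptions.

```python
from prometheus_client import Counter, Histogram, start_http_server

LLM_LATENCY = Histogram("llm_request_seconds", "LLM request latency in seconds")
LLM_ERRORS = Counter("llm_request_errors_total", "Failed LLM requests")


def call_llm(prompt: str) -> str:
    with LLM_LATENCY.time():  # records elapsed time into the histogram
        try:
            # ... real HTTP call goes here ...
            return "stub reply"
        except Exception:
            LLM_ERRORS.inc()
            raise


if __name__ == "__main__":
    start_http_server(8000)  # exposes /metrics for Prometheus to scrape
    call_llm("hello")
```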
Cost Management#
OpenAI Tokenizer — Tool for counting tokens to estimate costs.
tiktoken Library — Fast BPE tokenizer for token counting in Python.
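A short tiktoken sketch for estimating prompt cost before sending a request; the model name and the per-token price are placeholders, so check your provider's tokenizer and current pricing.

```python
import tiktoken

# encoding_for_model maps an OpenAI model name to its tokenizer.
try:
    enc = tiktoken.encoding_for_model("gpt-4o-mini")  # example model name
except KeyError:
    enc = tiktoken.get_encoding("cl100k_base")  # fall back to a common encoding

prompt = "Summarize the following support ticket in two sentences: ..."
n_tokens = len(enc.encode(prompt))

# Hypothetical price, purely for illustration.
price_per_1k_input_tokens = 0.0005
print(f"{n_tokens} tokens ≈ ${n_tokens / 1000 * price_per_1k_input_tokens:.6f}")
```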
Enterprise Patterns#
API Design#
REST API Design Best Practices — Principles that apply to LLM API client design.
API Gateway Patterns — Patterns for managing API access at scale.
Security#
OWASP API Security — Security considerations for API integrations.
Python Secrets Management — Secure handling of API keys.
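A minimal sketch of keeping API keys out of source code by reading them from the environment; the variable name `LLM_API_KEY` is a placeholder.

```python
import os


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    # Keep keys out of source control: read them from the environment
    # (populated by a .env file, CI secret store, or deployment platform).
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it before running the client.")
    return key
```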
Books & In-Depth Reading#
“Designing Data-Intensive Applications” by Martin Kleppmann — Chapter on distributed systems reliability applies directly to LLM integration.
“Release It!” by Michael Nygard — Patterns for building resilient production systems.
“AI Engineering: Building Applications with Foundation Models” by Chip Huyen — Modern guide to LLM application development.
Tools Used in This Module#
| Tool | Purpose | Installation |
|---|---|---|
| `requests` | HTTP client | `pip install requests` |
| `json` | JSON parsing | Built-in |
| `dataclasses` | Structured data | Built-in (Python 3.7+) |
| `typing` | Type hints | Built-in |
Practice Projects#
Build a CLI LLM client — Create a command-line tool that takes prompts and returns structured responses with proper error handling.
Implement a caching layer — Build a response cache with TTL (time-to-live) expiration for LLM calls; a minimal TTL-cache sketch appears after this list.
Create a mock server — Build a simple Flask/FastAPI mock server that simulates LLM API responses for testing.
Add monitoring — Instrument an LLM client with metrics (latency histograms, error rates, token counts).
Build a RAG pipeline — Combine Module 5 (embeddings) and Module 6 (LLM API) skills to build a complete question-answering system.
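For the caching-layer project, a minimal in-memory TTL cache (pure standard library) could look like the sketch below; a production version would add size limits, persistence, and thread safety.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache keyed by prompt, with per-entry expiration."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # prompt -> (expiry, reply)

    def get_or_call(self, prompt: str, fn: Callable[[str], str]) -> str:
        now = time.monotonic()
        cached = self._store.get(prompt)
        if cached and cached[0] > now:
            return cached[1]  # still fresh: skip the API call entirely
        reply = fn(prompt)
        self._store[prompt] = (now + self.ttl, reply)
        return reply


# Usage: cache = TTLCache(ttl_seconds=600); cache.get_or_call(prompt, call_llm)
```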
Quick Reference#
Retry Logic Template#
```python
import time


def retry_with_backoff(fn, max_retries=3, base_delay=1.0):
    """Call fn(), retrying failures with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: re-raise the last error
            time.sleep(base_delay * (2 ** attempt))
```
JSON Validation Template#
```python
import json


def validate_response(text, required_fields):
    """Parse an LLM reply as JSON and check that all required keys are present."""
    data = json.loads(text.strip())  # raises json.JSONDecodeError on malformed output
    missing = [f for f in required_fields if f not in data]
    if missing:
        raise ValueError(f"Missing fields: {missing}")
    return data
```
Timeout Pattern#
```python
import requests

response = requests.post(
    url,               # endpoint defined elsewhere
    json=payload,      # request body defined elsewhere
    timeout=(5, 60),   # (connect timeout, read timeout) in seconds
)
```