Module 6 Quiz — LLM APIs (Python)#
CodeVision Academy
This quiz tests your understanding of building reliable Python systems around LLM APIs.
Format: 32 Multiple Choice Questions + 18 Written Questions
Randomized: 3 MCQ + 2 Written per attempt
Multiple Choice Questions (32)#
Group 1: Service Mindset (6.1-6.3)#
MCQ-01. What is the fundamental difference between calling a local function and calling an LLM API?
A) LLM APIs are faster than local functions B) LLM APIs are remote, rate-limited, and non-deterministic services C) Local functions cannot process text D) LLM APIs always return the same result for the same input
Answer: B
MCQ-02. Which of the following is NOT a typical challenge when working with LLM APIs?
A) Latency (1-30+ seconds per call) B) Rate limiting C) Guaranteed structured output D) Non-deterministic responses
Answer: C
MCQ-03. What is the recommended mindset when engineering LLM systems?
A) Assume success, handle errors as exceptions B) Assume failure, engineer for correctness C) Ignore errors since LLMs are highly reliable D) Only test in production
Answer: B
MCQ-04. Why should configuration values like API keys and base URLs not be hardcoded?
A) It makes the code run slower B) Hardcoded values cannot be changed without code edits, and secrets may leak C) Python cannot read hardcoded strings D) Hardcoded values use more memory
Answer: B
MCQ-05. What is the typical structure of an LLM API request?
A) Plain text over FTP B) Binary data over WebSocket C) JSON over HTTP D) XML over SMTP
Answer: C
MCQ-06. Which component in an LLM API payload controls the randomness of the output?
A) max_tokens B) model C) temperature D) messages
Answer: C
MCQ-07. What does temperature=0.0 typically mean for LLM output?
A) The model will refuse to respond B) The output will be maximally random C) The output will be more deterministic/predictable D) The model will generate longer responses
Answer: C
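For reference, a minimal sketch of the JSON payload behind MCQ-05 through MCQ-07, assuming an OpenAI-style chat-completions schema (the model name and exact field names are illustrative, not prescribed by the module):

```python
# Hypothetical chat-completions payload (JSON over HTTP); field names assumed.
payload = {
    "model": "example-model",            # which model to call
    "messages": [
        {"role": "user", "content": "Summarize this support ticket."}
    ],
    "temperature": 0.0,                  # 0.0 -> more deterministic output
    "max_tokens": 256,                   # cap on generated length
}
```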
Group 2: Building Robust Clients (6.4-6.8)#
MCQ-08. Why is wrapping LLM calls in a client class recommended over raw requests?
A) Classes are always faster than functions B) Encapsulation, centralized config, retry logic, and testability C) Python requires classes for HTTP requests D) Raw requests cannot handle JSON
Answer: B
MCQ-09. What is the purpose of the timeout parameter (5, 60) in a requests call?
A) Retry 5 times with 60 second delays B) Connect timeout of 5 seconds, read timeout of 60 seconds C) Wait 5 minutes then fail after 60 attempts D) Send request at 5:60 PM
Answer: B
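A sketch of how that tuple is passed with the requests library; the URL and the environment variable name are placeholders:

```python
import os

import requests

api_key = os.environ["LLM_API_KEY"]      # assumed env var, never hardcoded
payload = {"model": "example-model", "messages": [{"role": "user", "content": "Hi"}]}

# timeout=(5, 60): 5 s to establish the connection, up to 60 s to read the response.
response = requests.post(
    "https://api.example.com/v1/chat/completions",   # placeholder URL
    headers={"Authorization": f"Bearer {api_key}"},
    json=payload,
    timeout=(5, 60),
)
response.raise_for_status()
```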
MCQ-10. Which HTTP status code indicates rate limiting?
A) 200 B) 401 C) 429 D) 500
Answer: C
MCQ-11. What should you do when receiving HTTP 401 or 403 errors?
A) Retry immediately B) Retry with exponential backoff C) Don’t retry - fix the authentication configuration D) Increase the timeout
Answer: C
MCQ-12. What is exponential backoff?
A) Retrying immediately without any delay B) Waiting progressively longer between each retry attempt C) Sending multiple requests simultaneously D) Reducing the request payload size exponentially
Answer: B
MCQ-13. In exponential backoff with base delay of 1 second, what is the delay before the 4th attempt?
A) 1 second B) 3 seconds C) 4 seconds D) 8 seconds
Answer: C (with a 1-second base and no delay before attempt 1, the waits are 1s before attempt 2, 2s before attempt 3, and 2^2 = 4s before attempt 4)
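The same schedule written out as a small sketch; the 30-second cap is an assumption added for illustration, not part of the question:

```python
BASE_DELAY = 1.0   # seconds
MAX_DELAY = 30.0   # assumed cap so delays don't grow without bound

for attempt in range(1, 5):
    if attempt == 1:
        delay = 0.0                                            # first attempt runs immediately
    else:
        delay = min(BASE_DELAY * 2 ** (attempt - 2), MAX_DELAY)
    print(f"attempt {attempt}: wait {delay:.0f}s first")
# attempt 1: 0s, attempt 2: 1s, attempt 3: 2s, attempt 4: 4s
```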
MCQ-14. Which type of failure should trigger a retry?
A) HTTP 401 Unauthorized B) HTTP 403 Forbidden C) HTTP 429 Too Many Requests D) Invalid API key configuration
Answer: C
MCQ-15. What is a transient failure?
A) A permanent configuration error B) A temporary issue that may resolve itself (network hiccup, brief overload) C) An authentication failure D) A syntax error in the code
Answer: B
Group 3: Structured Output & Validation (6.9-6.11)#
MCQ-16. Why is structured output (JSON) preferred over free-form text from LLMs?
A) JSON is smaller than text B) LLMs cannot produce free-form text C) JSON is parseable, predictable, and can be validated programmatically D) Free-form text is always incorrect
Answer: C
MCQ-17. Which prompt pattern is MOST likely to get reliable JSON output from an LLM?
A) “Return JSON” B) “Return ONLY valid JSON. No other text. Schema: {…}” C) “Maybe return some JSON if you want” D) “Output data”
Answer: B
MCQ-18. What is the first step in validating LLM JSON output?
A) Check if the values make business sense B) Verify required fields are present C) Parse the text as valid JSON syntax D) Check the data types
Answer: C
MCQ-19. Why might you need to strip markdown code blocks from LLM responses?
A) Markdown is illegal in JSON B) LLMs often wrap JSON in ```json blocks even when asked not to C) Code blocks make responses longer D) Python cannot read markdown
Answer: B
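One possible cleanup helper (a sketch, not the only way to do it):

```python
def strip_code_fences(text: str) -> str:
    """Remove a leading ```json (or ```) fence and a trailing ``` fence, if present."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        lines = cleaned.splitlines()
        lines = lines[1:]                          # drop the opening fence line
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]                     # drop the closing fence line
        cleaned = "\n".join(lines).strip()
    return cleaned
```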
MCQ-20. Which validation layer checks if values are within acceptable ranges?
A) Syntax validation B) Schema validation C) Type validation D) Value/business logic validation
Answer: D
MCQ-21. What should happen when LLM output fails validation?
A) Use the output anyway B) Silently ignore the error C) Raise an exception or trigger error handling D) Delete the LLM client
Answer: C
MCQ-22. What is schema validation?
A) Checking if JSON is syntactically correct B) Checking if required fields exist in the response C) Checking if the server is online D) Checking the API key format
Answer: B
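Putting MCQ-18 through MCQ-22 together, a layered-validation sketch for a hypothetical `{"label", "confidence"}` schema (the schema itself is an assumption for illustration):

```python
import json

def validate_sentiment(raw: str) -> dict:
    """Validate an LLM response in layers, from syntax up to business logic (sketch)."""
    data = json.loads(raw)                                    # 1. syntax: must parse as JSON
    for field in ("label", "confidence"):                     # 2. schema: required fields exist
        if field not in data:
            raise ValueError(f"missing field: {field}")
    if not isinstance(data["confidence"], (int, float)):      # 3. types
        raise ValueError("confidence must be a number")
    if not 0.0 <= data["confidence"] <= 1.0:                  # 4. value / business logic
        raise ValueError("confidence out of range")
    return data
```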
Group 4: Testing & Determinism (6.12-6.13)#
MCQ-23. Why should unit tests NOT call live LLM APIs?
A) LLMs don’t work with test frameworks B) Cost, speed, flakiness, availability issues, and rate limits C) Tests are not important for LLM code D) Python cannot test API calls
Answer: B
MCQ-24. What is mocking in the context of LLM testing?
A) Making fun of the LLM’s responses B) Replacing the real LLM call with a predictable fake C) Calling the LLM multiple times D) Testing with production data
Answer: B
MCQ-25. Which technique provides PERFECT determinism for testing LLM-integrated code?
A) Setting temperature to 0 B) Using the seed parameter C) Mocking the LLM client D) Requesting JSON output
Answer: C
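A minimal mock client along these lines; it also records the calls it receives, which is what MCQ-28 asks about (the method name `complete` is an assumption):

```python
class MockLLMClient:
    """Test double: returns a canned response and records every prompt it receives."""

    def __init__(self, canned_response: str):
        self.canned_response = canned_response
        self.calls = []                          # prompts/parameters for later assertions

    def complete(self, prompt: str, **params) -> str:
        self.calls.append({"prompt": prompt, **params})
        return self.canned_response
```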
MCQ-26. What is the benefit of caching LLM responses?
A) Makes LLMs smarter B) Determinism for repeated calls, reduced costs, faster responses C) Improves the quality of responses D) Increases rate limits
Answer: B
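A sketch of prompt-keyed caching wrapped around an existing client (the wrapped client's interface is assumed):

```python
import hashlib

class CachingClient:
    """Cache responses keyed by a hash of the prompt; identical prompts hit the cache."""

    def __init__(self, client):
        self.client = client
        self._cache = {}

    def complete(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self.client.complete(prompt)   # only call through on a miss
        return self._cache[key]
```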
MCQ-27. When testing a function that parses LLM JSON output, what should you test?
A) Only that the LLM returns valid JSON B) The parsing logic with mock data, without calling the LLM C) Nothing - parsing cannot be tested D) Only production responses
Answer: B
MCQ-28. What should a mock LLM client track for test verification?
A) The weather B) The calls it received (prompts, parameters) C) The user’s location D) Network latency
Answer: B
Group 5: Production Concerns (6.14-6.15)#
MCQ-29. What should be logged for every LLM API call in production?
A) Only successful responses B) Request ID, timestamp, latency, success/failure, and content previews C) Nothing - logging is not important D) Only the full prompt text
Answer: B
MCQ-30. Why is audit logging important for enterprise LLM applications?
A) It makes the LLM faster B) Cost tracking, debugging, compliance, and security C) It is required by Python D) It improves response quality
Answer: B
MCQ-31. In a RAG system, what role does Module 6 (LLM APIs) play?
A) Generating embeddings B) Vector search C) Calling the LLM with context and handling the response D) Storing documents
Answer: C
MCQ-32. What is the purpose of a request ID in LLM call logging?
A) To make requests faster B) To uniquely identify and trace each request through the system C) To authenticate the user D) To increase rate limits
Answer: B
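A sketch of one log record covering the fields from MCQ-29 plus the request ID from MCQ-32 (field names are illustrative):

```python
import logging
import time
import uuid

logger = logging.getLogger("llm")

def log_llm_call(prompt: str, response: str, latency_s: float, ok: bool) -> None:
    """Emit one structured record per LLM call (sketch)."""
    logger.info({
        "request_id": str(uuid.uuid4()),      # unique ID to trace the call end-to-end
        "timestamp": time.time(),
        "latency_s": round(latency_s, 3),
        "success": ok,
        "prompt_preview": prompt[:100],       # previews, not full content
        "response_preview": response[:100],
    })
```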
Written Questions (18)#
Group 1: Service Mindset (6.1-6.3)#
WQ-01. Explain why the statement “An LLM is not a function call” is important for software engineers. What are three key differences between calling a local function and calling an LLM API?
Expected themes: Remote service, latency, non-determinism, rate limits, cost, potential failures, no guaranteed output format
WQ-02. What does “failure is normal, correctness is engineered” mean in the context of LLM integration? Give an example of how you would apply this principle.
Expected themes: Defensive programming, expecting failures, validation, retry logic, graceful degradation
WQ-03. Why is configuration discipline important when building LLM clients? What problems can occur if API keys and URLs are hardcoded?
Expected themes: Security risks, environment flexibility, secrets exposure, maintainability, testability
Group 2: Building Robust Clients (6.4-6.8)#
WQ-04. Describe the benefits of encapsulating LLM API calls in a client class rather than making raw HTTP requests throughout your codebase.
Expected themes: Encapsulation, single responsibility, easier testing, centralized configuration, retry logic, logging
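To make these themes concrete, a compact client sketch that centralizes configuration, timeouts, retries, and error handling; the endpoint shape, response format, and environment variable names are all assumptions, not the module's required design:

```python
import os
import time

import requests

class LLMClient:
    """Encapsulated LLM client sketch: config, timeouts, and retry logic in one place."""

    def __init__(self, base_url: str | None = None, max_retries: int = 3):
        self.base_url = base_url or os.environ["LLM_BASE_URL"]   # centralized config
        self.api_key = os.environ["LLM_API_KEY"]                  # secret from the environment
        self.max_retries = max_retries

    def complete(self, prompt: str) -> str:
        payload = {
            "model": "example-model",
            "messages": [{"role": "user", "content": prompt}],
        }
        for attempt in range(1, self.max_retries + 1):
            try:
                resp = requests.post(
                    f"{self.base_url}/chat/completions",
                    headers={"Authorization": f"Bearer {self.api_key}"},
                    json=payload,
                    timeout=(5, 60),                              # connect, read
                )
                if resp.status_code in (429, 500, 502, 503):      # transient: worth retrying
                    raise RuntimeError(f"retryable status {resp.status_code}")
                resp.raise_for_status()                           # 401/403 etc. propagate, no retry
                return resp.json()["choices"][0]["message"]["content"]
            except (requests.Timeout, requests.ConnectionError, RuntimeError):
                if attempt == self.max_retries:
                    raise
                time.sleep(2 ** (attempt - 1))                    # 1s, 2s, 4s, ...
        raise RuntimeError("unreachable")
```

Because all calls go through one class, tests can swap in a mock client and retry or logging behavior can change in a single place.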
WQ-05. Explain exponential backoff and why it’s the preferred retry strategy for API calls. Include an example of the timing for 4 retry attempts.
Expected themes: Progressive delays (1s, 2s, 4s, 8s), giving server time to recover, not overwhelming, max delay caps
WQ-06. What is the difference between a transient failure and a persistent failure? How should your code handle each type differently?
Expected themes: Transient = temporary (retry helps), persistent = config/auth errors (retry won’t help), different handling strategies
WQ-07. Why do LLM API clients typically use two separate timeout values (connect timeout and read timeout)? What would appropriate values be and why?
Expected themes: Connect = network issues (should be short, ~5s), read = response generation (can be long, ~60s), LLM responses take time
Group 3: Structured Output & Validation (6.9-6.11)#
WQ-08. Why is requesting JSON output from LLMs important for software systems? What problems arise from using free-form text responses?
Expected themes: Parseability, predictability, type safety, automation, avoiding regex parsing, validation
WQ-09. Describe the layers of validation you should apply to LLM JSON responses, from basic to comprehensive.
Expected themes: Syntax (json.loads), schema (required fields), types (isinstance), values (ranges), semantic (business logic)
WQ-10. Write a prompt pattern that maximizes the likelihood of getting valid JSON from an LLM. Explain why each part of your pattern helps.
Expected themes: Clear instruction, schema specification, “no other text” constraint, example format, rules section
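One possible pattern (the schema and wording are illustrative, not a required template):

```python
review_text = "The battery lasts all day and setup took two minutes."

# Sketch of a JSON-only prompt: explicit instruction, schema, rules, then the input.
prompt = (
    "Classify the sentiment of the review below.\n"
    "Return ONLY valid JSON. No markdown fences, no extra text.\n"
    'Schema: {"label": "positive" | "negative" | "neutral", "confidence": <number between 0 and 1>}\n'
    "Rules:\n"
    "- Use exactly the field names shown in the schema.\n"
    "- Do not add any other fields.\n\n"
    f"Review: {review_text}"
)
```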
WQ-11. What common artifacts might appear in LLM responses that need to be cleaned before JSON parsing? How would you handle them?
Expected themes: Markdown code blocks (```json), extra whitespace, trailing explanations, strip/clean functions
Group 4: Testing & Determinism (6.12-6.13)#
WQ-12. Explain why unit tests should never call live LLM APIs. What are at least four problems this causes?
Expected themes: Cost, speed, flakiness/non-determinism, availability, rate limits, CI/CD issues
WQ-13. Describe how you would test a function that uses an LLM to classify sentiment, without calling the actual LLM.
Expected themes: Mock client, predefined responses, test the classification logic separately, verify prompts
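A sketch of such a test with a hand-rolled mock; `classify_sentiment` and the JSON schema are assumptions for illustration:

```python
import json

class MockClient:
    """Minimal test double: canned JSON response, records prompts."""

    def __init__(self, response: str):
        self.response = response
        self.calls = []

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self.response

def classify_sentiment(client, text: str) -> str:
    raw = client.complete(f"Classify the sentiment of: {text}. Return ONLY JSON.")
    return json.loads(raw)["label"]

def test_classify_sentiment_positive():
    client = MockClient('{"label": "positive", "confidence": 0.9}')
    assert classify_sentiment(client, "Great product!") == "positive"
    assert "Great product!" in client.calls[0]   # verify the prompt that was sent
```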
WQ-14. What is response caching for LLM calls and what benefits does it provide? Are there any situations where caching would be inappropriate?
Expected themes: Cache by prompt hash, determinism, cost savings, speed; inappropriate for time-sensitive or user-specific data
Group 5: Production Concerns (6.14-6.15)#
WQ-15. Design a logging strategy for production LLM calls. What fields would you capture and why?
Expected themes: Request ID, timestamp, model, prompt preview, response preview, latency, success/error, tokens used
WQ-16. How do the skills from Module 5 (embeddings, vector search) and Module 6 (LLM APIs) combine to create a RAG system? Describe the flow.
Expected themes: Embed question, vector search for context, build prompt with context, call LLM with retry, validate response
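A high-level sketch of that flow; the embedder, index, and LLM client interfaces are assumed, not prescribed:

```python
import json

def answer_with_rag(question: str, embedder, index, llm_client) -> dict:
    """Combine Module 5 retrieval with Module 6 LLM calling (sketch)."""
    query_vec = embedder.embed(question)               # Module 5: embed the question
    chunks = index.search(query_vec, top_k=3)           # Module 5: vector search for context
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using ONLY the context below. Return ONLY valid JSON "
        '{"answer": "...", "sources_used": <int>}.\n\n'
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    raw = llm_client.complete(prompt)                    # Module 6: client handles retries/timeouts
    return json.loads(raw)                               # Module 6: parse/validate the response
```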
WQ-17. What enterprise concerns (beyond just “getting an answer”) must be addressed when deploying LLM-based systems? Name at least four.
Expected themes: Cost control, reliability/uptime, compliance/audit, security, testability, monitoring, rate limiting
WQ-18. You’re building a customer support bot that uses an LLM. Describe how you would implement graceful degradation when the LLM API is unavailable.
Expected themes: Fallback responses, cached FAQ answers, escalation to human, status page/messaging, retry queues
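A minimal fallback sketch; the client interface, FAQ cache, and wording are assumptions:

```python
FALLBACK = "Our assistant is temporarily unavailable. A human agent will follow up shortly."

def answer_ticket(llm_client, faq_cache: dict, question: str) -> str:
    """Degrade gracefully when the LLM call fails (sketch)."""
    try:
        return llm_client.complete(question)
    except Exception:                              # e.g. timeouts or 429/5xx after retries
        cached = faq_cache.get(question.strip().lower())
        if cached:
            return cached                          # serve a cached FAQ answer if one matches
        return FALLBACK                            # otherwise a static message plus escalation
```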
Answer Key Summary#
MCQ Answers#
| Q | A | Q | A | Q | A | Q | A |
|---|---|---|---|---|---|---|---|
| 01 | B | 09 | B | 17 | B | 25 | C |
| 02 | C | 10 | C | 18 | C | 26 | B |
| 03 | B | 11 | C | 19 | B | 27 | B |
| 04 | B | 12 | B | 20 | D | 28 | B |
| 05 | C | 13 | C | 21 | C | 29 | B |
| 06 | C | 14 | C | 22 | B | 30 | B |
| 07 | C | 15 | B | 23 | B | 31 | C |
| 08 | B | 16 | C | 24 | B | 32 | B |
Written Question Themes#
All written questions should demonstrate understanding of:
LLM APIs as unreliable external services
Defensive programming and validation
Testing without live API calls
Production-grade logging and monitoring
Enterprise concerns (cost, compliance, reliability)