Challenge 002: Fix a Broken RAG Pipeline

Level: L200 | Type: Challenge | Time: ~75 min | 💰 Cost: Free (local)

Scenario

OutdoorGear has a local RAG prototype for support questions. It should retrieve the right policy or product guide, then answer using only the retrieved context. The current prototype is broken in three ways: retrieval ranks irrelevant chunks first, answers ignore the evidence, and evaluation reports misleading metrics.

Your job is to fix the pipeline without using an LLM, vector database, or RAG framework.

Objective

Implement the missing or broken logic in `starter_rag_pipeline.py` so that the pipeline retrieves the right source documents, produces grounded answers, and reports correct evaluation metrics. Once it does, `validate_rag_pipeline.py` can generate your completion code.

Your final pipeline should:

- Normalize query and document text for retrieval
- Chunk documents while preserving source metadata
- Rank chunks by relevance
- Produce concise answers from retrieved evidence
- Evaluate top-1 retrieval accuracy and required-term answer coverage
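The ranking step can be sketched with the standard library as IDF-weighted term overlap. Terms that appear in fewer chunks count more, and repeated terms are damped with a log. This is one reasonable lexical scorer, not the required method. The sketch assumes each chunk is a dict with `title` and `text` keys. `tokenize` and `rank_chunks` are hypothetical names, and the starter file's public functions may be named and shaped differently. The example chunks are invented and do not come from `documents.json`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical helper: lowercase and keep runs of letters/digits (punctuation separates).
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_chunks(query: str, chunks: list[dict]) -> list[tuple[float, dict]]:
    """Hypothetical ranker: return (score, chunk) pairs, best first, by IDF-weighted overlap."""
    # Score title and body together so a matching title also helps.
    chunk_terms = [Counter(tokenize(c["title"] + " " + c["text"])) for c in chunks]
    n = len(chunks)
    # Document frequency: iterating a Counter yields each distinct term once per chunk.
    df = Counter(term for terms in chunk_terms for term in terms)
    query_terms = set(tokenize(query))
    scored = []
    for chunk, terms in zip(chunks, chunk_terms):
        score = 0.0
        for term in query_terms:
            if term in terms:
                idf = math.log((n + 1) / (df[term] + 0.5))  # > 0 because df <= n
                score += idf * (1 + math.log(terms[term]))  # damp repeated terms
        scored.append((score, chunk))
    # Sort on the score only; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Invented example chunks for illustration.
chunks = [
    {"chunk_id": "returns-0", "doc_id": "returns", "title": "Return Policy",
     "text": "Unused items can be returned within 30 days."},
    {"chunk_id": "tent-guide-0", "doc_id": "tent-guide", "title": "Tent Care Guide",
     "text": "Dry the tent fully before storage."},
]
best_score, best_chunk = rank_chunks("How many days do I have to return an item?", chunks)[0]
print(best_chunk["doc_id"])  # returns
```

Here "days" and "return" (from the title) match only the return-policy chunk, so it ranks first. IDF keeps words that occur everywhere from dominating, but the scorer still works best on normalized tokens.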

Starter Files

Save these files in one folder named `challenge-002/`:

| File | Purpose |
| --- | --- |
| `documents.json` | Mock OutdoorGear knowledge base |
| `queries.json` | Evaluation queries and expected evidence |
| `starter_rag_pipeline.py` | Broken RAG pipeline |
| `test_rag_pipeline.py` | Acceptance tests |
| `validate_rag_pipeline.py` | Generates the final completion code |

Challenge Brief

You receive a tiny knowledge base, a set of evaluation queries, and a broken local RAG pipeline. There is no walkthrough: decide how to chunk, score, retrieve, answer, and evaluate so the system behaves like a reliable grounded support assistant.

Constraints

- Use only the Python standard library in `starter_rag_pipeline.py`.
- Do not call an LLM API.
- Do not use embeddings or a vector database.
- Do not hardcode answers for individual query IDs.
- Use retrieved evidence in `answer_question()`.
- Preserve the public function names used by the tests.

Acceptance Criteria

Your solution is complete when:

- `python -m pytest test_rag_pipeline.py` passes
- Chunk metadata preserves `chunk_id`, `doc_id`, `title`, and `text`
- The top-ranked document for each fixture query is correct
- Answers include the required evidence terms
- Evaluation reports `top1_accuracy == 1.0`
- Evaluation reports `required_coverage == 1.0`

Validation

When your implementation is ready, run:

```shell
python -m pytest test_rag_pipeline.py
python validate_rag_pipeline.py
```

Enter the completion code printed by `validate_rag_pipeline.py` to complete the challenge.

Hints

Hint 1 — Retrieval quality starts with normalization

Punctuation, case, and common stop words can dominate a small lexical retriever if you do not normalize them.
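Here is a minimal normalization sketch. Apply it identically to queries and chunk text: lowercase, turn punctuation into spaces, split on whitespace, and drop stop words. The `normalize` name and the `STOP_WORDS` set are illustrative assumptions, not part of the challenge files. A real stop-word list should be checked against the fixture queries so it does not discard meaningful terms.

```python
import string

# Illustrative stop-word list (an assumption, not taken from the challenge files).
STOP_WORDS = {
    "a", "an", "and", "are", "can", "do", "does", "for", "how", "i", "in",
    "is", "it", "my", "of", "on", "or", "the", "to", "what", "with",
}

# Map every ASCII punctuation character to a space so "trail-boots?" splits cleanly.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> list[str]:
    """Hypothetical normalizer: lowercase, punctuation to spaces, split, drop stop words."""
    tokens = text.lower().translate(_PUNCT_TO_SPACE).split()
    return [token for token in tokens if token not in STOP_WORDS]


print(normalize("What is the return window for Trail-Boots?"))
# ['return', 'window', 'trail', 'boots']
```

Because the same function runs on both sides, "Returns" and "returns" now match. "return" and "returns" still do not, because no stemming is applied.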

Hint 2 — Chunking is part of retrieval

A chunk should be small enough to score precisely but still carry enough source metadata to explain where the answer came from.
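One way to keep chunks small while preserving provenance is to pack whole sentences into a chunk up to a word budget, then copy the parent document's metadata onto each chunk. The sketch assumes each document is a dict with `doc_id`, `title`, and `text` keys. The function name `chunk_document`, the `max_words` budget, and the `<doc_id>-<n>` chunk ID format are all hypothetical choices. The example document is invented.

```python
import re


def chunk_document(doc: dict, max_words: int = 40) -> list[dict]:
    """Hypothetical chunker: pack whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words still becomes its own (oversized) chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc["text"]) if s.strip()]
    texts, current, current_words = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Close the current chunk if this sentence would push it over budget.
        if current and current_words + words > max_words:
            texts.append(" ".join(current))
            current, current_words = [], 0
        current.append(sentence)
        current_words += words
    if current:
        texts.append(" ".join(current))
    # Every chunk carries its own (hypothetical-format) ID plus the source metadata.
    return [
        {"chunk_id": f"{doc['doc_id']}-{i}", "doc_id": doc["doc_id"],
         "title": doc["title"], "text": text}
        for i, text in enumerate(texts)
    ]


# Invented example document for illustration.
doc = {
    "doc_id": "returns",
    "title": "Return Policy",
    "text": "Unused items can be returned within 30 days. Worn footwear is not "
            "eligible. Refunds go to the original payment method.",
}
for chunk in chunk_document(doc, max_words=15):
    print(chunk["chunk_id"], "|", chunk["text"])
```

With a 15-word budget, the first two sentences (8 + 5 words) share `returns-0` and the third sentence starts `returns-1`. Sentence boundaries keep each chunk readable as a quotation.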

Hint 3 — Answer from evidence, not from the query

If a required term is not present in the retrieved context, the answer should not invent it.
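An extractive answer is one simple way to stay grounded. It copies the retrieved sentences that best overlap the query and adds nothing else, so a term can appear in the answer only if it appears in the evidence. The sketch assumes the retrieved chunks are dicts with a `text` key, listed in rank order. `answer_from_evidence` and its fallback message are hypothetical and do not define what `answer_question()` must return.

```python
import re


def _terms(text: str) -> set[str]:
    # Lowercased alphanumeric tokens; a stop-word-aware normalizer would sharpen this.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_from_evidence(query: str, retrieved: list[dict], max_sentences: int = 2) -> str:
    """Hypothetical answerer: quote retrieved sentences that share the most query terms."""
    query_terms = _terms(query)
    candidates = []
    for rank, chunk in enumerate(retrieved):
        sentences = re.split(r"(?<=[.!?])\s+", chunk["text"])
        for position, sentence in enumerate(sentences):
            overlap = len(query_terms & _terms(sentence))
            if overlap:
                # Sort key: most overlap first, then higher-ranked chunk, then reading order.
                candidates.append((-overlap, rank, position, sentence.strip()))
    if not candidates:
        # Nothing in the evidence matches: say so instead of inventing an answer.
        return "I could not find this in the retrieved documents."
    candidates.sort()
    return " ".join(sentence for *_, sentence in candidates[:max_sentences])


# Invented example evidence for illustration.
retrieved = [{"chunk_id": "returns-0", "doc_id": "returns", "title": "Return Policy",
              "text": "Unused items can be returned within 30 days. Worn footwear is not eligible."}]
print(answer_from_evidence("Is worn footwear eligible for return?", retrieved))
# Worn footwear is not eligible.
```

The answer can only contain wording that already appears in the retrieved chunks. If a required term is missing from the answer, the problem therefore lies in retrieval or chunking, not in answer generation.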

Hint 4 — Metrics need the right denominator

Top-1 accuracy and coverage are per-query metrics. Check what you are dividing by.
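Here is a sketch of query-level evaluation, assuming one result record per query. Each record holds the expected `doc_id`, the retrieved `doc_id`s in rank order, the answer, and the query's required terms. These field names are hypothetical. The sketch also assumes coverage means the average, over queries, of the fraction of required terms found in the answer. Check the starter file and tests for the exact definition. Both metrics divide by the number of queries.

```python
def evaluate(results: list[dict]) -> dict[str, float]:
    """Hypothetical evaluator: top-1 accuracy and required-term coverage, averaged per query."""
    if not results:
        return {"top1_accuracy": 0.0, "required_coverage": 0.0}
    top1_hits = 0
    coverage_sum = 0.0
    for r in results:
        ranked = r["retrieved_doc_ids"]
        if ranked and ranked[0] == r["expected_doc_id"]:
            top1_hits += 1
        answer = r["answer"].lower()
        terms = r["required_terms"]
        found = sum(1 for term in terms if term.lower() in answer)
        # Fraction of this query's terms present; a query with no terms counts as covered.
        coverage_sum += (found / len(terms)) if terms else 1.0
    n_queries = len(results)  # the denominator: queries, not chunks or terms
    return {
        "top1_accuracy": top1_hits / n_queries,
        "required_coverage": coverage_sum / n_queries,
    }


# Invented example records for illustration.
results = [
    {"expected_doc_id": "returns", "retrieved_doc_ids": ["returns", "tent-guide"],
     "answer": "Unused items can be returned within 30 days.",
     "required_terms": ["30 days", "unused"]},
    {"expected_doc_id": "tent-guide", "retrieved_doc_ids": ["returns", "tent-guide"],
     "answer": "Dry the tent fully before storage.",
     "required_terms": ["dry", "storage", "mildew"]},
]
print(evaluate(results))
# top1_accuracy is 0.5 (1 of 2 queries); required_coverage is about 0.833, i.e. (1 + 2/3) / 2
```

Pooling all five required terms across queries would report 4/5 = 0.8 for coverage here instead. Dividing by the query count gives every query the same weight, however many terms it has.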

Rubric

| Area | Points | What good looks like |
| --- | --- | --- |
| Retrieval | 35 | Correct top document for each query |
| Chunking | 20 | Metadata preserved and chunk sizes controlled |
| Grounded answers | 20 | Answers include evidence from retrieved chunks |
| Evaluation | 15 | Metrics reflect query-level performance |
| Simplicity | 10 | No framework or hardcoded query-specific answers |

Stretch Goals

- Add reciprocal rank fusion over title and body scores
- Return citations with chunk IDs in the answer
- Add a "not enough evidence" answer when retrieval confidence is low
- Add one new query to `queries.json` and update the validator payload locally
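For the reciprocal rank fusion (RRF) stretch goal, note that RRF combines rankings rather than raw scores. Each list adds 1 / (k + rank) for every ID it contains, so a chunk that both the title scorer and the body scorer rank well rises to the top. The sketch assumes you already have two ranked lists of chunk IDs. `k = 60` is a commonly used default. The function name and example IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Hypothetical helper: score(id) = sum over lists of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented rankings for illustration: one from a title-only scorer, one from a body-only scorer.
title_ranking = ["tent-guide-0", "returns-0", "boots-0"]
body_ranking = ["returns-0", "boots-0", "tent-guide-0"]
fused = reciprocal_rank_fusion([title_ranking, body_ranking])
print([chunk_id for chunk_id, _ in fused])
# ['returns-0', 'tent-guide-0', 'boots-0']
```

Because only ranks enter the formula, the title and body scores never need to be on the same scale.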