Challenge 004: Optimize a Token Budget
Scenario
OutdoorGear has a support assistant that sends too much context to the model. It includes long legal text and brand-story content even when a concise policy summary would answer the question. The team wants shorter prompts without losing answer quality.
Your job is to fix a local prompt-budget optimizer that selects compact relevant context and keeps every prompt under a strict token limit.
Objective
Fix starter_token_budget.py so the optimizer selects the right context document for each request, avoids irrelevant context, builds compact prompts, reports budget compliance, and generates a validation code.
Your final optimizer should:
- Estimate prompt tokens deterministically
- Normalize keywords for lexical matching
- Prefer concise relevant summaries over long noisy documents
- Select context within a context-token budget
- Build prompts that stay under each request's total budget
- Report selected documents and budget compliance
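As an illustration of the keyword-normalization requirement above, here is a minimal sketch. The stopword list, the crude plural stripping, and the function names `normalize_keywords` and `overlap` are assumptions made for this example; the starter file may define its own helpers and the tests may expect different behavior.

```python
import re

# Hypothetical stopword list, chosen only for this sketch.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "for", "and",
             "or", "in", "on", "my", "i", "can", "do", "how", "what"}

def normalize_keywords(text: str) -> set[str]:
    """Lowercase, split on non-alphanumerics, drop stopwords, strip a plural 's'."""
    keywords = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word in STOPWORDS:
            continue
        # Crude singularization so that "returns" matches "return".
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        keywords.add(word)
    return keywords

def overlap(query: str, doc_text: str) -> int:
    """Number of normalized keywords shared by the query and the document."""
    return len(normalize_keywords(query) & normalize_keywords(doc_text))
```

With this normalization, a question containing "return" and a made-up summary containing "Returns" share one keyword even though the surface forms differ.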
Starter Files
Save these files in one folder named challenge-004/:
| File | Purpose | Download |
|---|---|---|
| `context_docs.json` | Compact and verbose support context | Download |
| `requests.json` | Budgeted support requests | Download |
| `starter_token_budget.py` | Broken budget optimizer | Download |
| `test_token_budget.py` | Acceptance tests | Download |
| `validate_token_budget.py` | Generates the final completion code | Download |
Challenge Brief
You receive compact docs, verbose docs, budgeted requests, and a broken optimizer. There is no walkthrough: decide how to estimate tokens, rank context, select evidence, and build prompts that stay within the budget.
Constraints
- Use only the Python standard library in `starter_token_budget.py`.
- Do not call an LLM API.
- Do not hardcode behavior by request ID.
- Do not include irrelevant docs just because there is room.
- Keep prompt structure short and grounded.
Acceptance Criteria
Your solution is complete when:
- `python -m pytest test_token_budget.py` passes
- The concise expected document is selected for each request
- Verbose or irrelevant documents are not selected over concise matches
- Every prompt is within its `max_prompt_tokens`
- Evaluation reports `within_budget is True`
- Evaluation reports `all_prompts_under_budget is True`
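The compliance criteria above could be reported along the following lines. This is a sketch under assumptions: only `max_prompt_tokens`, `within_budget`, and `all_prompts_under_budget` come from this page; the other keys (`request_id`, `selected_doc`, `expected_doc`, `prompt_tokens`, `all_docs_correct`) are hypothetical, and the real report shape is whatever `test_token_budget.py` checks.

```python
def evaluate(results: list[dict]) -> dict:
    """Summarize per-request results into a compliance report.

    Each result is assumed to carry the hypothetical keys request_id,
    selected_doc, expected_doc, and prompt_tokens, plus max_prompt_tokens.
    """
    rows = []
    for r in results:
        rows.append({
            "request_id": r["request_id"],
            "selected_doc": r["selected_doc"],
            "correct_doc": r["selected_doc"] == r["expected_doc"],
            "prompt_tokens": r["prompt_tokens"],
            "within_budget": r["prompt_tokens"] <= r["max_prompt_tokens"],
        })
    return {
        "requests": rows,
        # bool(rows) guards against an empty run being reported as a success.
        "all_docs_correct": bool(rows) and all(row["correct_doc"] for row in rows),
        "all_prompts_under_budget": bool(rows) and all(row["within_budget"] for row in rows),
    }
```

Deriving `within_budget` from the stored token counts, rather than trusting a flag set earlier, keeps the report honest if the prompt builder changes.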
Validation
When your implementation is ready, run `validate_token_budget.py` and enter the completion code it prints.
Hints
Hint 1: Token estimation does not need to be perfect
It only needs to be deterministic and close enough to compare prompt sizes.
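One deterministic, word-like estimator might look like the sketch below. The counting rule (each word and each punctuation mark counts as one token) is an assumption chosen for illustration; it does not match any real tokenizer, and the acceptance tests may expect a different formula.

```python
import re

def estimate_tokens(text: str) -> int:
    """Count runs of word characters and standalone punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))

print(estimate_tokens("Returns accepted within 30 days."))  # 5 words + 1 period = 6
```

Because the estimate depends only on the input string, the same prompt always gets the same count, which is what makes comparisons between candidate prompts meaningful.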
Hint 2: Relevance is not enough
A long document with the right words may be less useful than a concise summary with the same evidence.
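One way to express this trade-off is to divide the keyword match count by a length-dependent factor, so that two documents with the same evidence are separated by size. The scoring formula, the `length_penalty` value, the `id` and `text` keys, and the sample documents are all assumptions made for this sketch, not the contents of `context_docs.json`.

```python
import re

def keywords(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def estimate_tokens(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))

def score(query: str, doc_text: str, length_penalty: float = 0.05) -> float:
    """Keyword matches discounted by document length; 0.0 when nothing matches."""
    matches = len(keywords(query) & keywords(doc_text))
    if matches == 0:
        return 0.0
    return matches / (1 + length_penalty * estimate_tokens(doc_text))

def rank(query: str, docs: list[dict]) -> list[tuple[float, str]]:
    """Return (score, id) pairs, best first; ties are broken by id for determinism."""
    scored = [(score(query, d["text"]), d["id"]) for d in docs]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Made-up documents for illustration only.
SUMMARY = "Returns: unused gear can be returned within 30 days with a receipt."
DOCS = [
    {"id": "returns_legal", "text": SUMMARY + " This clause is provided for legal completeness only." * 20},
    {"id": "returns_summary", "text": SUMMARY},
    {"id": "shipping", "text": "Shipping takes five business days."},
]

ranked = rank("return unused gear receipt", DOCS)
print([doc_id for _, doc_id in ranked])
```

Both return documents match the same three query keywords, so the shorter one ranks first, while the shipping document scores zero and can be excluded outright.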
Hint 3: Budget at selection time
It is easier to stay under budget if you reject context before building the final prompt.
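A greedy selection pass that enforces the context budget before any prompt is built could be sketched as follows. The `(score, doc)` input shape, the `id` and `text` keys, and the `max_docs` cap of one document per request are assumptions for this example.

```python
import re

def estimate_tokens(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))

def select_context(ranked_docs, context_budget: int, max_docs: int = 1):
    """Pick relevant documents, best first, without exceeding the context budget.

    ranked_docs is a list of (score, doc) pairs sorted best first, where each
    doc is a dict with hypothetical "id" and "text" keys.
    """
    selected, used = [], 0
    for doc_score, doc in ranked_docs:
        if len(selected) >= max_docs:
            break
        if doc_score <= 0:
            continue  # irrelevant: never include it just because there is room
        cost = estimate_tokens(doc["text"])
        if used + cost > context_budget:
            continue  # does not fit: reject now instead of trimming the prompt later
        selected.append(doc)
        used += cost
    return selected, used
```

Rejecting documents at this stage means the prompt builder only ever sees context that is already known to fit.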
Hint 4: Prompt template matters
A verbose instruction template can consume the same budget you are trying to save.
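To make the template cost visible, the sketch below measures the fixed overhead of a short template and checks the finished prompt against the request's `max_prompt_tokens`. The template wording, the `build_prompt` name, and the returned keys other than `within_budget` are assumptions for illustration.

```python
import re

def estimate_tokens(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))

# Hypothetical compact template: one grounding instruction, then the slots.
TEMPLATE = "Answer using only the context.\nContext: {context}\nQuestion: {question}\nAnswer:"

# Tokens spent on the template alone, before any context or question is added.
TEMPLATE_OVERHEAD = estimate_tokens(TEMPLATE.format(context="", question=""))

def build_prompt(question: str, context: str, max_prompt_tokens: int) -> dict:
    """Fill the template and report whether the result fits the total budget."""
    prompt = TEMPLATE.format(context=context, question=question)
    tokens = estimate_tokens(prompt)
    return {
        "prompt": prompt,
        "prompt_tokens": tokens,
        "within_budget": tokens <= max_prompt_tokens,
    }
```

Subtracting `TEMPLATE_OVERHEAD` and the question's estimate from `max_prompt_tokens` gives the context budget that selection can actually spend.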
Rubric
| Area | Points | What good looks like |
|---|---|---|
| Token estimation | 20 | Deterministic, word-like, useful for budgets |
| Context ranking | 30 | Selects concise relevant docs |
| Prompt construction | 25 | Stays grounded and under budget |
| Evaluation | 15 | Reports correct selected docs and compliance |
| Simplicity | 10 | Local deterministic logic, no over-engineering |
Stretch Goals
- Add separate budgets for system prompt, context, and answer
- Report token savings versus the verbose baseline
- Add a fallback when no document fits the budget
- Add a new request with a tighter budget and update the validator payload locally
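Two of these stretch goals, the savings report and the no-fit fallback, can be sketched with small helpers. The function names, the returned keys, and the truncation strategy (keeping leading words until the estimate would exceed the budget) are assumptions for this example; other fallbacks, such as sending no context at all, are equally reasonable.

```python
import re

def estimate_tokens(text: str) -> int:
    return len(re.findall(r"\w+|[^\w\s]", text))

def token_savings(compact_prompt: str, verbose_prompt: str) -> dict:
    """Compare a compact prompt with the verbose baseline for the same request."""
    compact = estimate_tokens(compact_prompt)
    verbose = estimate_tokens(verbose_prompt)
    saved = verbose - compact
    percent = round(100 * saved / verbose, 1) if verbose else 0.0
    return {"compact_tokens": compact, "verbose_tokens": verbose,
            "saved_tokens": saved, "saved_percent": percent}

def truncate_to_budget(text: str, budget: int) -> str:
    """Fallback when no document fits: keep leading words while the estimate fits."""
    kept = []
    for word in text.split():
        candidate = " ".join(kept + [word])
        if estimate_tokens(candidate) > budget:
            break
        kept.append(word)
    return " ".join(kept)
```

Truncation can cut a sentence mid-thought, so a report that flags when the fallback was used helps reviewers spot answers built on partial evidence.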