Challenge 006: Investigate an Agent Observability Incident
Scenario
OutdoorGear's support agent had a latency spike and one failed request. You receive a small trace export with root agent spans and child tool/LLM spans. The current incident analyzer calculates metrics over the wrong spans and reports the wrong root cause.
Your job is to fix the analyzer so an on-call engineer can identify the failed trace, dependency root cause, error rate, and p95 latency.
Objective
Fix starter_observability.py so it summarizes the incident correctly and generates a validation code.
Your final analyzer should:
- Isolate root agent request spans
- Compute error rate over root requests only
- Compute nearest-rank p95 latency over root requests
- Identify the failed root trace
- Attribute root cause to the failing dependency span
Starter Files
Save these files in one folder named challenge-006/:
| File | Purpose | Download |
|---|---|---|
| `traces.json` | Mock agent trace export | Download |
| `starter_observability.py` | Broken incident analyzer | Download |
| `test_observability.py` | Acceptance tests | Download |
| `validate_observability.py` | Generates the final completion code | Download |
Challenge Brief
You receive trace spans and a broken analyzer. There is no walkthrough: decide which spans count as requests, how to compute SRE metrics, and how to attribute the incident to the right child dependency.
Constraints
- Use only the Python standard library in `starter_observability.py`.
- Do not hardcode the final summary dictionary.
- Metrics must be computed from spans.
- Child spans can explain root cause, but they should not inflate request counts.
Acceptance Criteria
Your solution is complete when:
- `python -m pytest test_observability.py` passes
- Root requests are `tr-001`, `tr-002`, `tr-003`, and `tr-004`
- Error rate is `25.0`
- p95 latency is `2200`
- Incident trace is `tr-003`
- Root cause is `inventory_api_timeout`
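
As a sanity check on the expected values, the arithmetic can be sketched as follows. It uses only numbers stated in this brief: four root requests, one of which (`tr-003`) failed. The variable names are illustrative and are not taken from the starter file.

```python
import math

root_requests = 4    # tr-001 through tr-004, per the acceptance criteria
failed_requests = 1  # tr-003 is the only failed root request

# Error rate as a percentage of root requests.
error_rate = 100.0 * failed_requests / root_requests

# Nearest-rank p95: 1-based position in the sorted root latencies.
p95_rank = math.ceil(95 * root_requests / 100)
```

Here `error_rate` is `25.0` and `p95_rank` is `4`, so with four root requests the nearest-rank p95 is simply the slowest root request, which the criteria give as `2200`.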
Validation
When your implementation is ready, run validate_observability.py and enter the completion code it prints.
Hints
Hint 1: Separate request spans from child spans
Root request metrics should not count every tool or LLM span.
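
A minimal sketch of the idea, assuming each span carries `trace_id`, `span_id`, `parent_span_id`, and `status` fields, and that a root span is one whose `parent_span_id` is `None`. These field names and the sample data are hypothetical; check `traces.json` for the actual schema.

```python
# Hypothetical span records; the real traces.json schema may differ.
spans = [
    {"trace_id": "t-a", "span_id": "s1", "parent_span_id": None, "status": "ok"},
    {"trace_id": "t-a", "span_id": "s2", "parent_span_id": "s1", "status": "ok"},
    {"trace_id": "t-b", "span_id": "s3", "parent_span_id": None, "status": "error"},
    {"trace_id": "t-b", "span_id": "s4", "parent_span_id": "s3", "status": "error"},
    {"trace_id": "t-b", "span_id": "s5", "parent_span_id": "s3", "status": "ok"},
]


def root_spans(spans):
    """Return spans with no parent (assumed here to mark root requests)."""
    return [s for s in spans if s.get("parent_span_id") is None]


def error_rate(request_spans):
    """Percentage of the given spans whose status is 'error'."""
    if not request_spans:
        return 0.0
    failed = sum(1 for s in request_spans if s["status"] == "error")
    return round(100.0 * failed / len(request_spans), 1)


roots = root_spans(spans)
root_rate = error_rate(roots)   # 1 of 2 root requests failed
naive_rate = error_rate(spans)  # 2 of 5 spans failed, children included
```

In this toy data the root-only rate is `50.0`, while counting every span gives `40.0`. The gap shows how child spans change the denominator and distort request-level metrics.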
Hint 2: Root cause is usually below the root
The agent span tells you the request failed; the child dependency often tells you why.
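
One possible way to express this, assuming child spans expose hypothetical `name` and `error_type` fields and that the root-cause label is built by joining them. Both the schema and the label format are assumptions for illustration; the real export may store the cause differently.

```python
# Hypothetical spans; field names and values are illustrative assumptions.
spans = [
    {"trace_id": "t-b", "span_id": "s3", "parent_span_id": None,
     "name": "support_agent", "status": "error", "error_type": None},
    {"trace_id": "t-b", "span_id": "s4", "parent_span_id": "s3",
     "name": "llm_call", "status": "ok", "error_type": None},
    {"trace_id": "t-b", "span_id": "s5", "parent_span_id": "s3",
     "name": "payments_api", "status": "error", "error_type": "timeout"},
]


def find_root_cause(spans, trace_id):
    """Return a label for the first failing child span in the trace, or None."""
    for span in spans:
        is_child = span.get("parent_span_id") is not None
        if span["trace_id"] == trace_id and is_child and span["status"] == "error":
            return f'{span["name"]}_{span["error_type"]}'
    return None


cause = find_root_cause(spans, "t-b")  # "payments_api_timeout"
```

The root span is skipped on purpose: it records that the request failed, while the failing child names the dependency responsible. If a trace can contain several failing children, a tie-breaking rule (for example, earliest start time) would be needed; this sketch just takes the first match.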
Hint 3: p95 has a definition
Use nearest-rank p95 for this challenge.
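
A short sketch of the nearest-rank method: sort the values, compute the 1-based rank `ceil(p / 100 * n)`, and return the value at that rank with no interpolation. The function name and sample latencies are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the nearest-rank percentile of values (pct in the range 0-100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # 1-based rank; multiply before dividing to limit floating-point drift.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies = [120, 340, 200, 900, 150]  # hypothetical root latencies
p95 = nearest_rank_percentile(latencies, 95)  # rank ceil(4.75) = 5
p50 = nearest_rank_percentile(latencies, 50)  # rank ceil(2.5) = 3
```

With these five values, `p95` is `900` and `p50` is `200`. Because nearest-rank always returns an observed value, it can differ from interpolating methods such as the default in `statistics.quantiles`.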
Rubric
| Area | Points | What good looks like |
|---|---|---|
| Span filtering | 25 | Root agent requests are isolated correctly |
| Metrics | 30 | Error rate and p95 are computed over root requests only |
| Root cause | 25 | Dependency failure is identified correctly |
| Incident summary | 10 | Output is concise and actionable |
| Simplicity | 10 | Local deterministic analysis |