
Challenge 006: Investigate an Agent Observability Incident

Level: L300 | Type: Challenge | Time: ~60 min | 💰 Cost: Free (local)

Scenario

OutdoorGear's support agent had a latency spike and one failed request. You receive a small trace export with root agent spans and child tool/LLM spans. The current incident analyzer calculates metrics over the wrong spans and reports the wrong root cause.

Your job is to fix the analyzer so an on-call engineer can identify the failed trace, dependency root cause, error rate, and p95 latency.


Objective

Fix starter_observability.py so it summarizes the incident correctly and generates a validation code.

Your final analyzer should:

  • Isolate root agent request spans
  • Compute error rate over root requests only
  • Compute nearest-rank p95 latency over root requests
  • Identify the failed root trace
  • Attribute root cause to the failing dependency span

Starter Files

Save these files in one folder named challenge-006/:

| File | Purpose |
| --- | --- |
| traces.json | Mock agent trace export |
| starter_observability.py | Broken incident analyzer |
| test_observability.py | Acceptance tests |
| validate_observability.py | Generates the final completion code |

Challenge Brief

You receive trace spans and a broken analyzer. There is no walkthrough: decide which spans count as requests, how to compute SRE metrics, and how to attribute the incident to the right child dependency.


Constraints

  • Use only the Python standard library in starter_observability.py.
  • Do not hardcode the final summary dictionary.
  • Metrics must be computed from spans.
  • Child spans can explain root cause, but they should not inflate request counts.
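The last constraint can be illustrated with a minimal sketch. The span fields (`trace_id`, `span_id`, `parent_id`, `status`) and the sample values are assumptions made for illustration; the actual schema in traces.json may differ, so adapt the predicate to the real export.

```python
# Hypothetical spans: field names and values are illustrative assumptions,
# not the contents of traces.json.
spans = [
    {"trace_id": "t-a", "span_id": "a1", "parent_id": None, "status": "ok"},
    {"trace_id": "t-a", "span_id": "a2", "parent_id": "a1", "status": "ok"},
    {"trace_id": "t-b", "span_id": "b1", "parent_id": None, "status": "error"},
    {"trace_id": "t-b", "span_id": "b2", "parent_id": "b1", "status": "error"},
    {"trace_id": "t-b", "span_id": "b3", "parent_id": "b1", "status": "ok"},
]


def error_rate(items):
    """Percentage of items whose status is "error", rounded to one decimal."""
    if not items:
        return 0.0
    failed = sum(1 for s in items if s["status"] == "error")
    return round(100.0 * failed / len(items), 1)


# Assumption: a root request span is one without a parent.
roots = [s for s in spans if s["parent_id"] is None]

rate_over_roots = error_rate(roots)      # 1 failed request out of 2 requests
rate_over_all_spans = error_rate(spans)  # 2 error spans out of 5 spans
```

In this sample only one request failed, yet the two denominators give different rates (50.0 over root requests, 40.0 over all spans). The same arithmetic explains the expected result for this challenge: one failed request out of four root requests is 25.0.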

Acceptance Criteria

Your solution is complete when:

  • python -m pytest test_observability.py passes
  • Root requests are tr-001, tr-002, tr-003, and tr-004
  • Error rate is 25.0
  • p95 latency is 2200
  • Incident trace is tr-003
  • Root cause is inventory_api_timeout

Validation

When your implementation is ready, run:

python -m pytest test_observability.py
python validate_observability.py

Then enter the completion code printed by validate_observability.py to complete the challenge.


Hints

Hint 1: Separate request spans from child spans

Root request metrics should not count every tool or LLM span.
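One way to make that separation explicit is to partition the export once, before computing any metric. The sketch below assumes each span is a dict with `trace_id` and `parent_id` keys and that a root span has `parent_id` set to `None`; these names, and the helper `split_spans`, are hypothetical and may not match traces.json.

```python
from collections import defaultdict


def split_spans(spans):
    """Return (roots, children_by_trace) for a flat list of span dicts.

    Assumes a root span has parent_id None (an assumption about the export).
    """
    roots = []
    children_by_trace = defaultdict(list)
    for span in spans:
        if span.get("parent_id") is None:
            roots.append(span)
        else:
            children_by_trace[span["trace_id"]].append(span)
    return roots, dict(children_by_trace)


# Hypothetical sample, not the real export.
sample = [
    {"trace_id": "t-a", "span_id": "a1", "parent_id": None, "name": "agent"},
    {"trace_id": "t-a", "span_id": "a2", "parent_id": "a1", "name": "llm"},
    {"trace_id": "t-a", "span_id": "a3", "parent_id": "a1", "name": "tool"},
    {"trace_id": "t-b", "span_id": "b1", "parent_id": None, "name": "agent"},
]

roots, children_by_trace = split_spans(sample)
root_trace_ids = sorted(r["trace_id"] for r in roots)
```

Request-level metrics then read only from `roots`, while `children_by_trace` stays available for explaining why a given request failed.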

Hint 2: Root cause is usually below the root

The agent span tells you the request failed; the child dependency often tells you why.
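A minimal sketch of that attribution step follows. It assumes spans carry `status`, `start_ms`, and `error_type` fields and treats the earliest failing child as the likely origin; the field names, the helper `find_root_cause`, and the sample values are all hypothetical, and the heuristic is one reasonable choice rather than the only one.

```python
def find_root_cause(failed_root, spans):
    """Return the error_type of the earliest failing child span, or None.

    Looks only at non-root spans in the same trace as failed_root.
    """
    trace_id = failed_root["trace_id"]
    failing_children = [
        s for s in spans
        if s["trace_id"] == trace_id
        and s.get("parent_id") is not None
        and s.get("status") == "error"
    ]
    if not failing_children:
        return None
    # Later failures may only be consequences of the first one.
    earliest = min(failing_children, key=lambda s: s.get("start_ms", 0))
    return earliest.get("error_type")


# Hypothetical sample, not the real export.
sample = [
    {"trace_id": "t-a", "span_id": "a1", "parent_id": None,
     "status": "ok", "start_ms": 0},
    {"trace_id": "t-b", "span_id": "b1", "parent_id": None,
     "status": "error", "start_ms": 0},
    {"trace_id": "t-b", "span_id": "b2", "parent_id": "b1",
     "status": "ok", "start_ms": 5},
    {"trace_id": "t-b", "span_id": "b3", "parent_id": "b1",
     "status": "error", "start_ms": 40,
     "error_type": "example_dependency_timeout"},
]

failed_roots = [
    s for s in sample if s["parent_id"] is None and s["status"] == "error"
]
incident = failed_roots[0]
cause = find_root_cause(incident, sample)
```

The failed root identifies which trace to report; the label comes from the child span, so the summary names the dependency rather than just repeating that the agent request failed.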

Hint 3: p95 has a definition

Use nearest-rank p95 for this challenge.
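Under the nearest-rank definition, the p-th percentile of n sorted values is the value at 1-based rank ceil(p/100 × n), with no interpolation. The sketch below shows that rule; the function name and the sample latencies are illustrative assumptions, not values from traces.json.

```python
import math


def nearest_rank_percentile(values, percentile):
    """Nearest-rank percentile: value at 1-based rank ceil(p * n / 100)."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing to avoid float surprises such as 0.95 * n.
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, not taken from traces.json.
latencies_ms = [310, 120, 980, 450]
p95 = nearest_rank_percentile(latencies_ms, 95)
p50 = nearest_rank_percentile(latencies_ms, 50)
```

With four root requests the rank is ceil(0.95 × 4) = 4, so nearest-rank p95 is simply the largest root latency. Interpolating percentile implementations can return a lower value for the same data, which is why the definition matters here.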


Rubric

| Area | Points | What good looks like |
| --- | --- | --- |
| Span filtering | 25 | Root agent requests are isolated correctly |
| Metrics | 30 | Error rate and p95 use the right denominator |
| Root cause | 25 | Dependency failure is identified correctly |
| Incident summary | 10 | Output is concise and actionable |
| Simplicity | 10 | Local deterministic analysis |