Why-Engine trace

Every Legal surface that calls the AI stamps a Why trace — retrieval breakdown, model, per-event timing, cost, payload diagnostics. Open the trace panel by clicking Why? on any assistant message.

What gets recorded

WhyTrace
├─ id                  unique uuid
├─ tenant_id           tenant scope (404 if not yours)
├─ satellite           "legal"
├─ surface             "assistant" | "workflows" | "templates" |
│                      "custom_workflows" | "analytics" | "matter_dashboard"
├─ status              open | ok | error | aborted
├─ total_latency_ms
├─ total_cost_usd
└─ events[]
    ├─ kind            "retrieval" | "llm_call" | "citation"
    ├─ name            "legal_qa.retrieve", "analytics.synthesize", ...
    ├─ model           model id (or fine-tuned id if active)
    ├─ latency_ms
    ├─ payload         {chunks_returned, by_source: {vault, caselaw, ...},
    │                   scoped_classifications, fine_tuned, ...}
    └─ error           non-null on failed events
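
If you consume traces in your own tooling, the tree above could be mirrored as typed records. The following is a minimal sketch, assuming the JSON returned by the API uses the same field names as the tree. The class names, the `parse_trace` helper, and the field types (for example `error` as a string) are illustrative assumptions, not part of the product.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class WhyEvent:
    kind: str                              # "retrieval" | "llm_call" | "citation"
    name: str                              # e.g. "legal_qa.retrieve"
    model: Optional[str] = None
    latency_ms: Optional[float] = None
    payload: dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None            # assumed to be a string when present


@dataclass
class WhyTrace:
    id: str
    tenant_id: str
    satellite: str
    surface: str
    status: str                            # open | ok | error | aborted
    total_latency_ms: Optional[float] = None
    total_cost_usd: Optional[float] = None
    events: list[WhyEvent] = field(default_factory=list)


def parse_trace(raw: dict[str, Any]) -> WhyTrace:
    """Build a WhyTrace from a decoded JSON object (hypothetical helper)."""
    events = [
        WhyEvent(
            kind=e["kind"],
            name=e["name"],
            model=e.get("model"),
            latency_ms=e.get("latency_ms"),
            payload=e.get("payload") or {},
            error=e.get("error"),
        )
        for e in raw.get("events") or []
    ]
    return WhyTrace(
        id=raw["id"],
        tenant_id=raw["tenant_id"],
        satellite=raw["satellite"],
        surface=raw["surface"],
        status=raw["status"],
        total_latency_ms=raw.get("total_latency_ms"),
        total_cost_usd=raw.get("total_cost_usd"),
        events=events,
    )
```

Optional fields default to `None` so that a trace still in the `open` state, which may not have totals yet, can be parsed without special handling.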

Reading the panel

The drawer opens on the right edge, showing the trace for the assistant message you clicked. Three sections:

Header — satellite/surface/status, total latency, total cost.
Events list — each event with kind, name, model, latency, and a collapsible payload JSON.
Click an event to expand its payload. The retrieval event shows how many chunks came from each source, whether the practice agent’s scope was broadened, and whether a fine-tuned model was used.

API

GET /api/v1/admin/legal/why-traces/{trace_id}

Tenant-scoped: a trace from a different tenant returns 404 (not 403) to avoid revealing existence. Useful for ad-hoc audits or bulk-export into your own observability stack.
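
For an ad-hoc audit, fetching a single trace might look like the sketch below. Only the endpoint path comes from this page. The base URL, the bearer-token header, and the function names are assumptions for illustration, so substitute whatever authentication your deployment uses.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Any, Optional

TRACE_PATH = "/api/v1/admin/legal/why-traces/{trace_id}"


def trace_url(base_url: str, trace_id: str) -> str:
    """Join the base URL and the endpoint path, escaping the trace id."""
    quoted = urllib.parse.quote(trace_id, safe="")
    return base_url.rstrip("/") + TRACE_PATH.format(trace_id=quoted)


def fetch_trace(
    base_url: str, trace_id: str, token: str, timeout: float = 10.0
) -> Optional[dict[str, Any]]:
    """Return the decoded trace, or None when the API answers 404.

    A 404 covers both "no such trace" and "trace belongs to another
    tenant"; the API does not let callers tell the two apart.
    """
    request = urllib.request.Request(
        trace_url(base_url, trace_id),
        headers={
            # Assumption: bearer-token auth. Not documented on this page.
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response)
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None
        raise  # other statuses (401, 500, ...) are left to the caller
```

Treating 404 as "not available to this tenant" rather than as an error keeps bulk exports from aborting on ids that have been evicted or that belong elsewhere.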

Retention

Traces are held in an in-memory store with a configurable rolling window (why_trace_retention_in_memory in settings, default ~10k traces). Once the window is full, the oldest traces are evicted first (FIFO). For long-term retention, poll the getter endpoint and sink the results to your audit pipeline.
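
To make the eviction behaviour concrete, here is a small illustration of a FIFO rolling window. It is not the product's implementation. `RollingTraceStore` and its methods are hypothetical names, and the sketch only shows why a trace should be exported before enough newer traces arrive to push it out.

```python
from collections import OrderedDict
from typing import Any, Optional


class RollingTraceStore:
    """Illustrative FIFO store: keeps at most max_traces entries."""

    def __init__(self, max_traces: int = 10_000) -> None:
        if max_traces < 1:
            raise ValueError("max_traces must be at least 1")
        self._max = max_traces
        self._traces = OrderedDict()

    def put(self, trace_id: str, trace: dict[str, Any]) -> None:
        # Re-assigning an existing key keeps its original insertion
        # position, so updating a trace does not extend its lifetime.
        self._traces[trace_id] = trace
        while len(self._traces) > self._max:
            self._traces.popitem(last=False)  # drop the oldest entry

    def get(self, trace_id: str) -> Optional[dict[str, Any]]:
        return self._traces.get(trace_id)

    def __len__(self) -> int:
        return len(self._traces)
```

With a window of two, storing traces `a`, `b`, then `c` leaves only `b` and `c` retrievable; a later lookup of `a` finds nothing.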

Common diagnostics

Symptom	Look for
“No relevant material” answer	`payload.chunks_returned = 0`; the Vault may be empty or the query may have been mis-embedded
Slow response	Compare `latency_ms` per event; `llm_call` usually dominates. If `retrieval` is slow, check pgvector index health
Wrong corpus answered	`payload.scoped_classifications` shows the practice-agent filter; `broadened=true` means the scoped search returned nothing and the scope was widened
Caselaw not appearing	`payload.by_source.caselaw = 0`; CourtListener may be unreachable, so check `payload.error`
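
The checks in the table can also be run programmatically over a fetched trace. The sketch below assumes the trace is a decoded JSON object shaped like the WhyTrace tree, and that the `broadened` flag sits directly inside the retrieval payload. The `diagnose` function and its message strings are hypothetical.

```python
from typing import Any


def diagnose(trace: dict[str, Any]) -> list[str]:
    """Return human-readable findings for one trace, following the table."""
    findings: list[str] = []
    events = trace.get("events") or []

    for event in events:
        payload = event.get("payload") or {}
        name = event.get("name", "?")

        if event.get("kind") == "retrieval":
            if payload.get("chunks_returned") == 0:
                findings.append(f"{name}: no chunks returned")
            # Assumption: the flag is a top-level payload key.
            if payload.get("broadened") is True:
                findings.append(f"{name}: scope was broadened")
            by_source = payload.get("by_source") or {}
            if by_source.get("caselaw") == 0:
                findings.append(f"{name}: no caselaw chunks")

        if event.get("error") is not None:
            findings.append(f"{name}: event failed: {event['error']}")

    # Report the slowest event so retrieval and llm_call can be compared.
    timed = [e for e in events if isinstance(e.get("latency_ms"), (int, float))]
    if timed:
        slowest = max(timed, key=lambda e: e["latency_ms"])
        findings.append(
            f"slowest event: {slowest.get('name', '?')} at {slowest['latency_ms']} ms"
        )
    return findings
```

Missing keys are treated as "nothing to report" rather than as errors, since payload contents vary by event kind.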