Observability as Correctness: End-to-End Correlation Plan
Date: 2026-03-03
Why this change
The current telemetry is useful but event-centric. We can debug individual failures, but we do not yet have a full lifecycle trace for each chat/edit request. AI editing failures are often non-obvious:
- schema rejects
- partial apply failures
- repair/retry loops
- model behavior drift
- preview sync mismatches
Current state in repo
Already implemented
- Structured chat telemetry phases in orchestrator:
  - received
  - forced_plan
  - deterministic_plan_generated
  - plan_attempt_failed
  - plan_generated
  - plan_apply_failed
  - repair_attempt
  - repair_generated
  - result
- NDJSON persistence + API endpoints:
  - GET /telemetry/chat
  - GET /telemetry/chat/review
- Per-request traceId already included in chat debug payloads.
- Preview patch transport includes txId + patchAck handshake.
Gaps
- No parent/child span model (flat events only).
- No standardized duration per lifecycle stage.
- No shared trace context across orchestrator + editor + preview bridge.
- Preview patch ack latency is not linked to server request trace.
- No first-class rollback span when progressive apply fails.
Target model
Every chat/edit request is one root trace: chat.request.
Child spans:
- intent.detect
- plan.generate (attempt-aware)
- plan.normalize
- repair.attempt
- repair.generate
- plan.validate
- ops.apply
- ops.rollback (when needed)
- preview.sync (patch ack timing)
- response.finalize
Span attributes:
- traceId
- spanId
- parentSpanId
- session
- siteId
- requestedSlug
- effectiveSlug
- provider
- modelKey
- modelUsed
- promptHash
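The target model can be sketched as a small set of types. This is only an illustration: the span and attribute names come from the lists in this section, while the value types, the SpanRecord shape, and the childOf helper are assumptions rather than existing code.

```typescript
// Span names taken from the target model in this section.
type SpanName =
  | "chat.request"
  | "intent.detect"
  | "plan.generate"
  | "plan.normalize"
  | "repair.attempt"
  | "repair.generate"
  | "plan.validate"
  | "ops.apply"
  | "ops.rollback"
  | "preview.sync"
  | "response.finalize";

// Attribute names mirror the plan; the string types are assumptions.
interface SpanAttributes {
  traceId: string;
  spanId: string;
  parentSpanId?: string; // absent on the root span
  session: string;
  siteId: string;
  requestedSlug: string;
  effectiveSlug: string;
  provider: string;
  modelKey: string;
  modelUsed: string;
  promptHash: string;
}

interface SpanRecord {
  name: SpanName;
  attrs: SpanAttributes;
}

// Hypothetical helper: a child span inherits the parent's trace context
// and points back at the parent through parentSpanId.
function childOf(parent: SpanRecord, name: SpanName, spanId: string): SpanRecord {
  return {
    name,
    attrs: { ...parent.attrs, spanId, parentSpanId: parent.attrs.spanId },
  };
}

const rootSpan: SpanRecord = {
  name: "chat.request",
  attrs: {
    traceId: "trace-1",
    spanId: "span-1",
    session: "session-1",
    siteId: "site-1",
    requestedSlug: "home",
    effectiveSlug: "home",
    provider: "example-provider",
    modelKey: "example-model-key",
    modelUsed: "example-model",
    promptHash: "0000abcd",
  },
};

const applySpan = childOf(rootSpan, "ops.apply", "span-2");
```

Copying the shared attributes onto every child keeps each span searchable on its own, at the cost of some duplication in the stored entries.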
Proposed implementation
Phase 1: Span model on top of existing telemetry
Goal: no behavior change, just richer telemetry.
- Add telemetry tracing helper in orchestrator (example: src/telemetry/trace.ts):
  - startChatTrace(...)
  - startSpan(name, attrs)
  - endSpan(status, attrs?)
  - recordException(error, attrs?)
- Extend telemetry entry shape with optional:
  - traceId
  - spanId
  - parentSpanId
  - durationMs
  - attempt
- Keep existing phase events for backward compatibility.
- In runChatPipeline(...), wrap each existing stage with span boundaries.
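A minimal sketch of what such a helper could look like, assuming an in-memory design where each span is a handle. The function names follow the list above, but the signatures, the SpanEntry shape, the sink callback, and the injectable clock are all assumptions made for illustration, not the planned API.

```typescript
type SpanStatus = "ok" | "error";
type Attrs = Record<string, string | number | boolean>;

// Assumed entry shape: the optional fields named in the plan plus name/status.
interface SpanEntry {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  name: string;
  status: SpanStatus;
  durationMs: number;
  attempt?: number;
  attrs: Attrs;
}

interface SpanHandle {
  readonly spanId: string;
  startSpan(name: string, attrs?: Attrs): SpanHandle;
  recordException(error: unknown, attrs?: Attrs): void;
  endSpan(status: SpanStatus, attrs?: Attrs): SpanEntry;
}

// Opens the root chat.request span. `sink` receives one entry per ended span;
// `now` is injectable so durations can be tested deterministically.
function startChatTrace(
  traceId: string,
  sink: (entry: SpanEntry) => void,
  now: () => number = Date.now,
): SpanHandle {
  let nextId = 0;
  function open(name: string, parentSpanId: string | undefined, attrs: Attrs): SpanHandle {
    const spanId = `${traceId}-${nextId++}`;
    const startedAt = now();
    const merged: Attrs = { ...attrs };
    return {
      spanId,
      startSpan: (childName, childAttrs = {}) => open(childName, spanId, childAttrs),
      recordException: (error, extra = {}) => {
        const message = error instanceof Error ? error.message : String(error);
        Object.assign(merged, extra, { "exception.message": message });
      },
      endSpan: (status, extra = {}) => {
        Object.assign(merged, extra);
        const attempt = merged["attempt"];
        const entry: SpanEntry = {
          traceId,
          spanId,
          name,
          status,
          durationMs: now() - startedAt,
          attrs: merged,
          ...(parentSpanId !== undefined ? { parentSpanId } : {}),
          ...(typeof attempt === "number" ? { attempt } : {}),
        };
        sink(entry);
        return entry;
      },
    };
  }
  return open("chat.request", undefined, {});
}

// Usage with a fake clock: one planning span, then a failed apply span.
const entries: SpanEntry[] = [];
let clock = 1000;
const root = startChatTrace("trace-1", (entry) => { entries.push(entry); }, () => clock);
const plan = root.startSpan("plan.generate", { attempt: 1 });
clock += 40;
plan.endSpan("ok");
const apply = root.startSpan("ops.apply");
clock += 5;
apply.recordException(new Error("op 2 failed"));
apply.endSpan("error");
root.endSpan("error");
```

Because entries are only emitted through the sink, the existing phase events can keep flowing unchanged next to them, which matches the backward-compatibility goal of this phase.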
Phase 2: OpenTelemetry exporter and resource context
Goal: interoperable telemetry backend support.
- Add dependencies in orchestrator:
  - @opentelemetry/api
  - @opentelemetry/sdk-node
  - @opentelemetry/exporter-trace-otlp-http
  - @opentelemetry/resources
- Configure resource attributes:
  - service.name=ai-site-editor-orchestrator
  - service.version=<git sha or package version>
  - deployment.environment=<env>
- Enable with env switch:
  - OTEL_ENABLED=1
  - OTEL_EXPORTER_OTLP_ENDPOINT=...
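The env switch and resource attributes could be resolved in one place before any SDK is initialized. The sketch below deliberately avoids importing the OpenTelemetry packages and only builds a plain config object. The resolveOtelConfig name, the OtelConfig shape, the use of NODE_ENV for deployment.environment, and the rule that an empty endpoint disables export are all assumptions.

```typescript
type Env = Record<string, string | undefined>;

interface OtelConfig {
  enabled: boolean;
  endpoint: string; // empty string when unset
  resourceAttributes: Record<string, string>;
}

// Hypothetical resolver: export is on only when OTEL_ENABLED=1 and an
// endpoint is configured, so a half-configured environment stays silent.
function resolveOtelConfig(env: Env, version: string): OtelConfig {
  const endpoint = env["OTEL_EXPORTER_OTLP_ENDPOINT"] ?? "";
  const enabled = env["OTEL_ENABLED"] === "1" && endpoint !== "";
  return {
    enabled,
    endpoint,
    resourceAttributes: {
      "service.name": "ai-site-editor-orchestrator",
      "service.version": version,
      // Assumption: NODE_ENV stands in for the deployment environment.
      "deployment.environment": env["NODE_ENV"] ?? "development",
    },
  };
}

const stagingConfig = resolveOtelConfig(
  {
    OTEL_ENABLED: "1",
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://localhost:4318/v1/traces",
    NODE_ENV: "staging",
  },
  "abc1234",
);
const localConfig = resolveOtelConfig({}, "abc1234");
```

Passing the environment in as an argument, rather than reading process.env inside the function, keeps the switch easy to cover in unit tests.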
Phase 3: Cross-app correlation to preview ack
Goal: close the loop from plan/apply to user-visible preview sync.
- Include traceId + operation index in op_applied SSE payload.
- In editor usePreviewBridge, measure:
  - patch send timestamp
  - patch ack timestamp
  - ackMs
- Add endpoint POST /telemetry/preview-ack in orchestrator to ingest:
  - traceId
  - txId
  - opIndex
  - ackMs
  - accepted
  - reason
- Emit preview.sync child span from this payload.
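As a rough sketch of the two ends of this loop: the editor computes ackMs from the send and ack timestamps, and the orchestrator validates the untrusted body before turning it into a preview.sync span. The field names come from the list above. The value types, the optionality of reason, and the buildPreviewAck and parsePreviewAck helpers are assumptions.

```typescript
interface PreviewAckReport {
  traceId: string;
  txId: string;
  opIndex: number;
  ackMs: number;
  accepted: boolean;
  reason?: string; // assumed optional, e.g. only set when the patch is rejected
}

// Editor side (hypothetical): derive ackMs from the two measured timestamps.
function buildPreviewAck(
  ctx: { traceId: string; txId: string; opIndex: number },
  sentAt: number,
  ackedAt: number,
  accepted: boolean,
  reason?: string,
): PreviewAckReport {
  return {
    ...ctx,
    ackMs: Math.max(0, ackedAt - sentAt), // clamp in case of clock adjustments
    accepted,
    ...(reason !== undefined ? { reason } : {}),
  };
}

// Orchestrator side (hypothetical): return null for malformed bodies so the
// route can drop them instead of emitting a broken span.
function parsePreviewAck(body: unknown): PreviewAckReport | null {
  if (typeof body !== "object" || body === null) return null;
  const { traceId, txId, opIndex, ackMs, accepted, reason } = body as Record<string, unknown>;
  if (typeof traceId !== "string" || traceId === "") return null;
  if (typeof txId !== "string" || txId === "") return null;
  if (typeof opIndex !== "number" || !Number.isInteger(opIndex) || opIndex < 0) return null;
  if (typeof ackMs !== "number" || !Number.isFinite(ackMs) || ackMs < 0) return null;
  if (typeof accepted !== "boolean") return null;
  const reasonText = typeof reason === "string" ? reason : undefined;
  if (reason !== undefined && reasonText === undefined) return null;
  return {
    traceId,
    txId,
    opIndex,
    ackMs,
    accepted,
    ...(reasonText !== undefined ? { reason: reasonText } : {}),
  };
}

const ackReport = buildPreviewAck({ traceId: "trace-1", txId: "tx-9", opIndex: 2 }, 1000, 1042, true);
const parsedAck = parsePreviewAck(JSON.parse(JSON.stringify(ackReport)));
const rejectedAck = parsePreviewAck({ traceId: "trace-1" });
```

Validating on ingest matters here because the payload crosses an app boundary: a bad report should cost one dropped span, not a failed request.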
Phase 4: Correctness-oriented metrics and SLOs
Metrics:
- Histograms:
  - chat.plan.duration_ms
  - chat.apply.duration_ms
  - chat.preview_ack.duration_ms
- Counters:
  - chat.retry.count
  - chat.repair.count
  - chat.rollback.count
  - chat.schema_reject.count
  - chat.partial_apply.count
SLOs:
- p95 plan.generate latency
- p95 preview.sync ack latency
- repair rate
- rollback rate
- schema rejection rate
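For illustration, the p95 and rate figures could be derived from raw duration samples and counters as sketched below. The nearest-rank percentile definition and the choice of total requests as the rate denominator are assumptions; a real metrics backend would normally compute these from histogram buckets instead.

```typescript
// Nearest-rank percentile (1-based rank = ceil(p/100 * n)); returns null when empty.
function percentile(samples: number[], p: number): number | null {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1] ?? null;
}

// Assumed counter snapshot over some window; field names are illustrative.
interface ChatCounters {
  requests: number;
  repairs: number;
  rollbacks: number;
  schemaRejects: number;
}

// Rates are expressed per request; an empty window yields 0 rather than NaN.
function rate(count: number, total: number): number {
  return total === 0 ? 0 : count / total;
}

function summarize(planDurationsMs: number[], counters: ChatCounters) {
  return {
    planP95Ms: percentile(planDurationsMs, 95),
    repairRate: rate(counters.repairs, counters.requests),
    rollbackRate: rate(counters.rollbacks, counters.requests),
    schemaRejectRate: rate(counters.schemaRejects, counters.requests),
  };
}

// 20 samples: 100, 200, ..., 2000 ms.
const planSamples = Array.from({ length: 20 }, (_, i) => (i + 1) * 100);
const summary = summarize(planSamples, {
  requests: 100,
  repairs: 5,
  rollbacks: 1,
  schemaRejects: 0,
});
```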
Suggested code touchpoints
- apps/orchestrator/src/chat/chat-pipeline.ts
  - root trace and child spans around each lifecycle stage
- apps/orchestrator/src/telemetry/chat-telemetry.ts
  - entry schema enrichment for span metadata and durations
- apps/orchestrator/src/routes/chat.ts
  - include trace context in SSE op events
- apps/editor/src/hooks/useChatEngine.ts
  - carry trace context through streaming apply path
- apps/editor/src/hooks/usePreviewBridge.ts
  - measure and report patch ack timing
- packages/preview-adapter/src/preview-bridge.tsx
  - keep ack semantics stable; optional payload enrichment
Rollout strategy
- Ship Phase 1 behind CHAT_TRACE_SPANS=1 and keep existing telemetry output unchanged.
- Validate in local + integration tests (chat-pipeline-integration.test.ts).
- Enable Phase 2 in staging only; verify trace volume and cardinality.
- Add Phase 3 preview ack ingestion; ensure no UI regression when endpoint unavailable.
- Start alerting on correctness metrics (repair/rollback/schema reject trends).
Risks and mitigations
- Risk: telemetry cardinality explosion.
- Mitigation: cap high-cardinality attributes; hash long text; avoid raw prompts.
- Risk: frontend reporting failures.
- Mitigation: fire-and-forget preview ack endpoint; never block user flow.
- Risk: migration breaks existing telemetry consumers.
- Mitigation: additive schema only; keep old fields/phases.
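The cardinality mitigation could look roughly like the sketch below, which replaces long attribute values with a short hash before they are attached to a span. The 64-character cap, the capAttributes helper, and the use of 32-bit FNV-1a over UTF-16 code units are assumptions chosen to keep the example dependency-free. FNV-1a is not cryptographic, so a real promptHash would more likely use something like SHA-256.

```typescript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

const MAX_ATTR_LENGTH = 64; // assumed cap, not taken from the plan

// Hypothetical sanitizer: short values pass through, long ones are hashed
// so raw prompts never reach the telemetry backend.
function capAttributes(attrs: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(attrs)) {
    out[key] = value.length > MAX_ATTR_LENGTH ? `hash:${fnv1a(value)}` : value;
  }
  return out;
}

const capped = capAttributes({
  siteId: "site-1",
  prompt: "Rewrite the hero section ".repeat(20),
});
```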
Definition of done
- Every chat/edit request has one root trace with child spans for planning, validation, apply, and preview sync.
- Failed edits are searchable by traceId with clear stage failure location.
- Repair/retry/rollback rates are measurable over time.
- Preview ack latency is visible and attributable to specific edit traces.