# Chat Behavior Troubleshooting Playbook
This playbook is for investigating prompt failures, wrong operations, and regressions in the site editor assistant. For background on the editing pipeline, see How It Works. For the full telemetry event reference, see Chat Telemetry Events.

## What telemetry is captured
The orchestrator now emits structured chat telemetry events for each request:

- `received`
- `forced_plan`
- `plan_attempt_failed`
- `plan_generated`
- `plan_apply_failed`
- `repair_attempt`
- `repair_generated`
- `result`

Each event records:
- request id
- session
- requested/effective slug
- model key/model used
- planner source (`openai` or `demo`)
- prompt hash (stable fingerprint)
- prompt excerpt (short preview)
- prompt length
- intent/op types/op count (when available)
- outcome + error category (when available)
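When parsing events downstream, it can help to pin the shape down. Below is a minimal TypeScript sketch of an event record; the field names are assumptions inferred from the list above, not the orchestrator's actual schema (see Chat Telemetry Events for the authoritative reference).

```ts
// Hypothetical event shape. Field names are assumptions inferred from the
// bullet list above, not the orchestrator's actual schema.
type ChatTelemetryPhase =
  | "received"
  | "forced_plan"
  | "plan_attempt_failed"
  | "plan_generated"
  | "plan_apply_failed"
  | "repair_attempt"
  | "repair_generated"
  | "result";

interface ChatTelemetryEvent {
  requestId: string;
  session: string;
  phase: ChatTelemetryPhase;
  requestedSlug?: string;
  effectiveSlug?: string;
  modelKey?: string;
  plannerSource?: "openai" | "demo";
  promptHash: string;     // stable fingerprint
  promptExcerpt: string;  // short preview
  promptLength: number;
  intent?: string;        // when available
  opTypes?: string[];     // when available
  opCount?: number;       // when available
  outcome?: string;       // when available
  errorCategory?: string; // when available
}
```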
## Persistence
Telemetry is persisted to NDJSON so it survives restarts.

- Default file: `.data/chat-telemetry.ndjson`
- Env override: `CHAT_TELEMETRY_FILE`
- Disable persistence: `CHAT_TELEMETRY_PERSIST=0`
- In-memory buffer size: `CHAT_TELEMETRY_LIMIT` (default `500`)
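Because the file is NDJSON (one JSON object per line), it can be inspected offline with a few lines of code. A minimal sketch, assuming Node 18+ and the default file location:

```ts
import { readFileSync } from "node:fs";

// Read the NDJSON telemetry file: one JSON event per line.
const path = process.env.CHAT_TELEMETRY_FILE ?? ".data/chat-telemetry.ndjson";
const events = readFileSync(path, "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line));

console.log(`loaded ${events.length} telemetry events`);
```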
## APIs for debugging
### 1) Raw telemetry stream (filtered)

`GET /telemetry/chat`

Query params:

- `limit` (default `100`, max `1000`)
- `session`
- `phase`
- `outcome`
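For example, pulling the most recent events for one session, filtered to a single outcome (a sketch only: the base URL, the `failed` outcome value, and the response shape are assumptions):

```ts
// Sketch: base URL, "failed" outcome value, and response shape are assumptions.
const params = new URLSearchParams({
  limit: "200",
  session: "manual-2026-03-01-a",
  outcome: "failed",
});
const res = await fetch(`http://localhost:3000/telemetry/chat?${params}`);
const body = await res.json();
console.log(body);
```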
### 2) Review summary for manual test runs

`GET /telemetry/chat/review`

Query params:

- `limit` (default `300`, max `2000`)
- `session`

The summary includes:
- analyzed count
- applied/failed counts
- failure rate
- failure breakdown by outcome
- failure breakdown by reason category
- top failed prompts (grouped by prompt hash)
- automatic recommendations
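A sketch of pulling the summary for a dedicated test session (the base URL and response field names are assumptions; check the actual payload in your environment):

```ts
// Sketch: base URL and response field names are assumptions.
const res = await fetch(
  "http://localhost:3000/telemetry/chat/review?session=manual-2026-03-01-a"
);
const summary = await res.json();
console.log("analyzed:", summary.analyzed);
console.log("failure rate:", summary.failureRate);
console.log("top failed prompts:", summary.topFailedPrompts);
```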
## UI debug mode (for screenshots)
Enable debug metadata directly in assistant response cards:

- Open Content Studio settings (gear icon).
- Enable `Debug mode`.
- Run the prompt and capture a screenshot.

Each response card then shows a debug panel with:

- `traceId`, `promptHash`, `outcome`, `reason`, `category`, `intent`, `opCount`, and `ops`
- prompt excerpt
## Standard workflow after manual UI testing
- Use a dedicated session id for a test batch (for example: `manual-2026-03-01-a`).
- Run your manual prompts in the UI.
- Pull the review summary (`GET /telemetry/chat/review` with your session id).
- Inspect top failures:
  - high `schema_violation`: normalization/repair gaps
  - high `not_found`: slug/block resolution gaps
  - high `ambiguity`: clarification prompts too weak
- Drill into raw events for one problematic prompt hash (see the first sketch after this list).
- Convert top failed prompts into regression tests in `apps/orchestrator/src/nlp-ops.test.ts` (a test skeleton follows below).
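The drill-down step can be scripted against the raw stream. Since prompt hash is not a documented query param, the sketch below fetches the session's events and filters client-side (the base URL, response shape, and field names are assumptions):

```ts
// Sketch: base URL, response shape, and field names are assumptions.
const session = "manual-2026-03-01-a";
const targetHash = "abc123"; // hypothetical prompt hash taken from the review summary

const res = await fetch(
  `http://localhost:3000/telemetry/chat?session=${session}&limit=1000`
);
const { events } = await res.json();

for (const ev of events.filter((e: any) => e.promptHash === targetHash)) {
  console.log(ev.phase, ev.outcome, ev.promptExcerpt);
}
```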
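For the final step, a regression test skeleton might look like the following. Everything here is an assumption: the test framework (Vitest is assumed), the `planNlpOps` helper, and the expected op shape; mirror the helpers the existing suite in `apps/orchestrator/src/nlp-ops.test.ts` already uses.

```ts
// Sketch only: Vitest, the planNlpOps helper, and the op shape are all
// hypothetical -- adapt to the existing suite's conventions instead.
import { describe, it, expect } from "vitest";
import { planNlpOps } from "./nlp-ops"; // hypothetical import

describe("regression: recurring failed prompt family", () => {
  it("resolves the hero block when asked to change its heading", async () => {
    const plan = await planNlpOps("change the hero heading to 'Welcome'");
    expect(plan.ops.length).toBeGreaterThan(0);
    expect(plan.ops[0].type).toBe("update_block"); // hypothetical op type
  });
});
```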
## Recommended operating rules
- Always run manual tests with an explicit session id.
- Keep a fixed prompt suite for weekly regression checks.
- Add a test for every new recurring failed prompt family.
- Track the failure rate trend (`/telemetry/chat/review`) before and after changes.
## Quick checklist when behavior is wrong
- Did the model fail to produce valid schema?
- Did normalization repair aliases correctly?
- Was deterministic fallback expected but not triggered?
- Was the apply step blocked by not-found or no-effective-change?
- Did clarification context leak across unrelated intents?