Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.avocadostudio.dev/llms.txt

Use this file to discover all available pages before exploring further.

Overview

Avocado Studio has two entry points for images: the Asset Manager modal in the editor, and the image.generate tool that the chat planner can call during natural-language edits. Both paths share the same providers and the same on-disk storage. The modal is defined in apps/editor/src/components/ImagePickerModal.tsx. The multi-turn chat view in apps/editor/src/components/ImageGenerateChat.tsx is an assistant-style chat that talks to POST /image/generate/chat when Gemini is configured. The image.generate tool manifest lives in apps/orchestrator/src/tools/builtins/image-generate.ts.

Image sources (modal tabs)

Which tabs appear depends on orchestrator env vars and the CMS media config the site passes to the editor. Tab selection is computed in ImagePickerModal.tsx (availableTabs).
TabEnabled whenBacking source
GenerateOPENAI_API_KEY or GOOGLE_GENAI_API_KEY setAI image generation (OpenAI or Gemini)
DriveGOOGLE_DRIVE_FOLDER_ID (or a Google service-account key) setGoogle Drive folder listing
UnsplashUNSPLASH_ACCESS_KEY setUnsplash search API
ContentfulCONTENTFUL_SPACE_ID + CONTENTFUL_DELIVERY_TOKEN set, or site passes a matching cmsMedia configCMS asset library
Sanity / StrapiSite passes a matching cmsMedia config to the editorCMS asset library
UploadAlways availableLocal file → POST /image/upload
Fastest way to configure the env-var sources (Generate / Drive / Unsplash / Contentful) is the first-time setup script. Running pnpm dev:setup walks you through each prompt, opens the provider’s signup/console page in your browser, and validates the key before writing it to .env. Sanity and Strapi are configured per-site in the editor’s Site Config drawer — there are no env vars for them.
The feature flags the editor reads (imageGenerate, imageGenerateChat, googleDrive, unsplash, contentful) are published by the orchestrator at GET /status/planner and are set purely from env vars — there is no per-session toggle today.

CMS asset libraries

Three CMS providers can appear as asset-picker tabs: Contentful, Sanity, and Strapi. They have two different configuration mechanisms, and the choice depends on the provider:
ProviderEnv-var path (orchestrator default)Per-site path (editor Site Config)
ContentfulCONTENTFUL_SPACE_ID + CONTENTFUL_DELIVERY_TOKEN (+ optional CONTENTFUL_ENVIRONMENT)cmsMedia: { provider: "contentful", spaceId, deliveryToken, environment? }
Sanity❌ not supportedcmsMedia: { provider: "sanity", projectId, dataset? }
Strapi❌ not supportedcmsMedia: { provider: "strapi", url, token? }
When to pick which for Contentful. Use the env-var path for single-space, single-tenant setups where every site in the orchestrator should share the same Contentful space — it’s a quick .env change. Use the per-site cmsMedia path when different sites point at different spaces, when end users need to supply their own credentials without redeploying the orchestrator, or when you want the asset picker to switch spaces as the active site switches. How the env-var path works. When CONTENTFUL_SPACE_ID and CONTENTFUL_DELIVERY_TOKEN are set, the orchestrator exposes GET /contentful/assets (see apps/orchestrator/src/routes/media.ts) as a proxy to the Contentful CDA. The editor’s asset picker calls that endpoint directly — no site-side code needed. Without those vars, the endpoint returns 404 Contentful not configured. How the per-site cmsMedia path works. CmsMediaConfig (defined in apps/editor/src/lib/editor-types.ts) is a discriminated union stored on SiteConfig.cmsMedia. The editor calls the CMS API directly from the browser using whatever credentials the site’s cmsMedia entry carries — there is no orchestrator proxy involved. Fetch logic lives in apps/editor/src/lib/cms-media.ts (fetchContentfulMedia, fetchSanityMedia, fetchStrapiMedia). Click path to configure a site’s cmsMedia:
  1. Open the editor at http://localhost:4100.
  2. Click the gear icon next to the site name in the chat header (aria-label “Site settings”), or open the Sites page and click Configure on a site card.
  3. In the drawer that slides in, scroll to the CMS media section.
  4. Pick a Provider (Contentful / Sanity / Strapi / none) and fill in the provider-specific fields:
    • Contentful — Space ID and Delivery token (environment defaults to master).
    • Sanity — Project ID and optional dataset (defaults to production). No token needed for public datasets.
    • Strapi — Base URL (e.g. https://cms.example.com) and optional API token. Public upload endpoints work without a token.
  5. Close the drawer. The CMS tab appears in the asset picker on next open.
The CMS config is stored per-site in the orchestrator’s site registry, not in the browser — it persists across sessions and across machines, and the AI planner sees it too.

Unsplash: licensing & attribution

Why Unsplash is in the picker. The Unsplash tab ships primarily as a demo / placeholder convenience — it lets stakeholders, evaluators, and new sites populate a site with real-looking imagery during a live session without leaving the editor or wiring up a DAM. Production sites should usually graduate to branded photography, generated imagery, or a CMS-backed asset library. If you do ship Unsplash photos to end users, the rules below apply — attribution and API compliance are the adopter’s responsibility, not the orchestrator’s.
Photos returned by the Unsplash tab (and by the unsplash.search tool) are served under the Unsplash License. The license is permissive — photos are free for commercial and non-commercial use and no permission from the photographer is required — but the Unsplash API Terms impose a few concrete obligations that adopters are responsible for meeting:
  • Credit the photographer and Unsplash wherever a selected photo is rendered. The unsplash.search response includes author (photographer name) and sourceUrl (photo page on unsplash.com) specifically so you can surface attribution in your UI — for example: Photo by Photographer Name on Unsplash.
  • Do not resell, redistribute, or host unmodified Unsplash photos as a stock-photo service, wallpaper pack, or competing search product.
  • Do not imply endorsement by photographers or by Unsplash of your product, brand, or customers.
  • Track downloads when building your own Unsplash-powered integration. The API requires a GET /photos/:id/download trigger per selected photo; the unsplash.search tool in this repo does not do this automatically — adopters integrating Unsplash into a custom flow outside of the built-in tab should implement it to stay within Unsplash’s API guidelines.
Generated images (OpenAI, Gemini) and assets returned from CMS tabs (Contentful, Sanity, Strapi) have their own licensing and usage rules; consult the relevant provider terms before shipping content publicly.

AI providers

OpenAI (default)

  • Models: OPENAI_IMAGE_MODEL (default gpt-image-1) for quality: "final", OPENAI_IMAGE_MODEL_DRAFT (default gpt-image-1-mini) for quality: "draft".
  • Sizes map from aspect ratio: landscape → 1536x1024, square → 1024x1024, portrait → 1024x1536.
  • Native transparency via the background parameter (transparent, opaque, auto).
  • Output formats: png, webp, jpeg.

Google Gemini — “nano-banana”

Gemini 2.5 Flash Image is the model publicly nicknamed “nano-banana”. Avocado calls it via the @google/genai SDK.
  • Model: GOOGLE_GENAI_IMAGE_MODEL, defaulting to gemini-2.5-flash-image.
  • Aspect ratios are mapped to Gemini’s imageConfig.aspectRatio strings: landscape → 3:2, square → 1:1, portrait → 2:3. 16:9 and 9:16 can be triggered from prompt hints (e.g. “wide 16:9”).
  • Quality tiers map to Gemini’s imageSize: draft → 1K, final → 2K.
  • No native transparency. When background: "transparent" is requested, the orchestrator appends a prompt hint telling the model to render on a fully transparent background — results vary.
  • Supports multi-turn chat and reference images (see next section).

Provider selection

The image.generate tool (the one the AI planner calls) picks its backend from IMAGE_GEN_PROVIDERopenai (default) or gemini. The editor’s Generate tab picks OpenAI for single-shot generation and routes to POST /image/generate/chat when imageGenerateChat is enabled (i.e. GOOGLE_GENAI_API_KEY is set).

Multi-turn image chat (Gemini only)

When GOOGLE_GENAI_API_KEY is set, the Generate tab becomes a full assistant chat built on @assistant-ui/react, streaming against POST /image/generate/chat (apps/orchestrator/src/routes/media.ts).

Request body

{
  "prompt": "...",                 // required
  "chatId": "…",                   // optional — resume an existing session
  "aspectRatio": "3:2",            // optional — "landscape" | "square" | "portrait" | explicit ratio
  "stream": true,                  // optional — SSE vs JSON response
  "referenceImageUrl": "https://…",// optional — primary reference
  "referenceImageUrls": ["…"]      // optional — up to 14 additional references, ≤5 MB each
}

SSE event stream (stream: true)

event: chatId   → { chatId, aspectRatio }
event: status   → { stage: "Generating image…" }
event: text     → { text }        // streamed text tokens from Gemini
event: image    → { url, alt }    // generated image ready
event: error    → { error }
event: done     → {}
If stream is omitted, the endpoint returns a single JSON payload with { chatId, url, alt, text, aspectRatio }.

UI modes

ImageGenerateChat.tsx switches between three modes depending on whether an image already exists in the field:
  • choose — current image exists; user picks “Edit this image” or “Generate a new one”.
  • edit — re-generate using the current image as reference context, so Gemini can honor existing composition, palette, or subjects.
  • new — fresh generation from the prompt alone.
The modal also measures the current image’s natural dimensions and snaps to the closest Gemini-supported aspect ratio (1:1, 3:2, 2:3, 16:9, 9:16) so the new image matches the slot it will fill.

Session management

  • Sessions are kept in an in-memory map on the orchestrator, capped at 200 concurrent sessions.
  • Idle sessions are evicted after 30 minutes (LRU).
  • Changing aspect ratio mid-session creates a new session. Gemini’s imageConfig is immutable once a chat is created, so the orchestrator detects the change, deletes the old session, and starts a new one.
  • Sessions do not survive an orchestrator restart.

Reference images

Up to 14 reference images can be attached to the first message in a session. Each is fetched server-side, validated against a 5 MB size cap, and forwarded to Gemini as base64 inlineData parts alongside the prompt. Failed fetches are logged and skipped — the request does not fail if some references are unreachable.

image.generate tool (chat pipeline path)

When the user asks the chat to produce or replace an image, the planner can call the image.generate tool. The full manifest is in apps/orchestrator/src/tools/builtins/image-generate.ts.
{
  "name": "image.generate",
  "capability": "read",
  "timeoutMs": 90000,
  "retryPolicy": { "maxAttempts": 1 },
  "idempotent": false,
  "inputSchema": {
    "prompt":      "string (required)",
    "aspectRatio": "landscape | square | portrait",
    "quality":     "draft | final",
    "style":       "string (optional style guidance)",
    "background":  "transparent | opaque | auto",
    "outputFormat":"png | webp | jpeg",
    "blockType":   "Hero | Card | …",
    "blockId":     "string",
    "pageSlug":    "string"
  },
  "outputSchema": {
    "imageUrl": "string",
    "alt":      "string",
    "width":    "number",
    "height":   "number"
  }
}

Prompt enrichment

When blockType, blockId, or pageSlug are provided, the tool enriches the raw prompt with the block’s composition hint, the page title, and the block’s existing heading / subheading / title props. It also appends the default constraints: no text overlays, no logos, no watermarks. Known composition hints include Hero, Banner, CTA, Card, CardGrid, FeatureGrid, Gallery, Carousel, and TwoColumn.

Progress streaming

The handler emits five progress stages via context.onImageProgress, and those events surface on the chat SSE stream as image_progress events (see How It Works › Step 5):
 0%  Understanding prompt…
15%  Composing scene…
40%  Rendering image…
75%  Finalizing details…
95%  Almost there…
100% Done

Deferred image resolution

With CHAT_DEFER_IMAGE_RESOLUTION=1 (the default), the chat pipeline applies text and structural ops immediately and resolves image tool calls in the background. The editor receives the text updates at once and the image URLs patch in via follow-up SSE events once generation completes.

Other endpoints

All from apps/orchestrator/src/routes/media.ts.
EndpointPurpose
POST /image/generateSingle-shot generation proxy. Uses IMAGE_GEN_PROVIDER (OpenAI fallback). Returns { url, alt }.
POST /image/generate/chatMulti-turn Gemini chat (documented above). Requires GOOGLE_GENAI_API_KEY.
POST /image/uploadMultipart form upload. Writes to ORCHESTRATOR_GENERATED_IMAGE_DIR and returns { url, bytes, mimeType }.
GET /unsplash/search?q=&page=&limit=Unsplash proxy used by the modal’s Unsplash tab and by the unsplash.search tool.
GET /gdrive/images / GET /gdrive/images/:fileIdGoogle Drive browse/download with server-side resize, WebP conversion, and EXIF strip.

Environment variables

VariableDefaultPurpose
IMAGE_GEN_PROVIDERopenaiopenai or gemini — which backend image.generate uses.
OPENAI_API_KEYEnables OpenAI generation and the imageGenerate feature flag.
OPENAI_IMAGE_MODELgpt-image-1Final-quality OpenAI image model.
OPENAI_IMAGE_MODEL_DRAFTgpt-image-1-miniDraft-quality OpenAI image model.
GOOGLE_GENAI_API_KEYEnables Gemini generation and the imageGenerateChat feature flag.
GOOGLE_GENAI_IMAGE_MODELgemini-2.5-flash-imageGemini image model — the “nano-banana” model.
UNSPLASH_ACCESS_KEYEnables the Unsplash tab and the unsplash.search tool.
GOOGLE_DRIVE_FOLDER_IDEnables the Drive tab (alternatively GOOGLE_SERVICE_ACCOUNT_KEY_JSON or GOOGLE_API_KEY).
CONTENTFUL_SPACE_IDEnables the Contentful tab as an orchestrator default (must be paired with CONTENTFUL_DELIVERY_TOKEN). Per-site cmsMedia overrides this.
CONTENTFUL_DELIVERY_TOKENContentful Content Delivery API token. Required alongside CONTENTFUL_SPACE_ID.
CONTENTFUL_ENVIRONMENTmasterContentful environment (branch) to read assets from.
ORCHESTRATOR_GENERATED_IMAGE_DIR.data/generated-imagesFilesystem target for uploaded and generated images.
CHAT_DEFER_IMAGE_RESOLUTION1Apply text/structural ops first and resolve images in the background.

Limits & constraints

  • Reference images: ≤14 per message, ≤5 MB each. Must be reachable over HTTP(S) from the orchestrator, or uploaded first via POST /image/upload.
  • Gemini sessions: in-memory only. Capped at 200 concurrent with 30-minute LRU eviction. An orchestrator restart wipes all sessions.
  • Gemini transparency: no native support — background: "transparent" falls back to a prompt hint and the result is best-effort.
  • image.generate timeout: 90 seconds, no automatic retry.
  • Image storage: generated images are written to ORCHESTRATOR_GENERATED_IMAGE_DIR. For production, mount a persistent volume or upload to an external bucket as part of your publish target.