API reference · v1

Audio out.
One endpoint.

Maia is an HTTP API that returns narrated audio. Send text (basic) or context + goal (pro), get WAV bytes back. No SDK. No surprises. One call from your code to a voice your user hears.

Create an API key from the dashboard (keys look like mvk_…), then call POST /v1/generate:

curl -X POST https://api.maia.example/v1/generate \
  -H "Authorization: Bearer mvk_live_xxx" \
  -H "Content-Type: application/json" \
  --output out.wav \
  -d '{
    "mode": "basic",
    "text": "Welcome to Maia. Let'"'"'s get you set up.",
    "voice": "ember",
    "tone": "warm"
  }'
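
The same call can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the function names build_generate_request and generate_to_file are hypothetical, and the URL and headers are copied from the curl example above.

```python
import json
import urllib.request

API_URL = "https://api.maia.example/v1/generate"  # URL from the curl example


def build_generate_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/generate request."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_to_file(api_key: str, body: dict, path: str) -> None:
    """Send the request and write the returned WAV bytes to `path`."""
    request = build_generate_request(api_key, body)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
```

Calling generate_to_file("mvk_live_xxx", {"mode": "basic", "text": "Welcome to Maia."}, "out.wav") would mirror the curl command. urlopen raises urllib.error.HTTPError on non-2xx responses, so callers would normally wrap it in a try block.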

New accounts get a 27¢ trial credit on first sign-in — enough for about three minutes of basic narration.

All /v1/* endpoints require a bearer token:

Authorization: Bearer <token>

Two token types are accepted:

  • API keys (mvk_…): for server-to-server calls. Create and revoke them on the keys page. The secret is shown only once, at creation.
  • Firebase ID tokens: used by the dashboard. /v1/keys and /v1/billing/* accept only these; API keys are rejected there with 403.
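
A client can check up front whether an API key will be accepted on a given path. The sketch below is illustrative only: both helper names are hypothetical, and it assumes that "/v1/keys" also covers sub-paths such as "/v1/keys/<id>", which the reference does not state.

```python
def is_api_key(token: str) -> bool:
    """API keys carry the mvk_ prefix; anything else is treated as a dashboard token."""
    return token.startswith("mvk_")


def endpoint_accepts_api_key(path: str) -> bool:
    """Return False for the user-only endpoints named above."""
    # Assumption: sub-paths of /v1/keys are user-only as well.
    user_only = path == "/v1/keys" or path.startswith(("/v1/keys/", "/v1/billing/"))
    return path.startswith("/v1/") and not user_only
```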

Basic
$0.09 per minute · 1¢ minimum charge
You supply the script. Billed per second.

Pro
$0.20 per minute · 15-second minimum
Agents plan, write, and narrate the clip in one call.

The server rejects a request with 402 if your balance is below the mode's floor (the minimum listed above). Add credits on the billing page.
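
The pricing above can be turned into a rough client-side estimate. This is a sketch under stated assumptions: the reference does not say how fractional cents are rounded, so the code rounds up, and it treats Pro's 15-second minimum as a minimum billed duration. The function name is hypothetical, and the x-maia-charge-cents response header remains the authoritative figure.

```python
RATE_CENTS_PER_MINUTE = {"basic": 9, "pro": 20}  # prices listed above
PRO_MIN_SECONDS = 15                              # Pro's 15-second minimum
BASIC_MIN_CENTS = 1                               # Basic's 1-cent minimum charge


def estimate_charge_cents(mode: str, seconds: int) -> int:
    """Estimate the charge in whole cents for `seconds` of audio."""
    billed_seconds = max(seconds, PRO_MIN_SECONDS) if mode == "pro" else seconds
    # Assumption: fractional cents round up (integer ceiling of seconds * rate / 60).
    cents = (billed_seconds * RATE_CENTS_PER_MINUTE[mode] + 59) // 60
    return max(cents, BASIC_MIN_CENTS) if mode == "basic" else cents
```

Under these assumptions, estimate_charge_cents("basic", 180) gives 27, which matches the trial credit covering about three minutes of basic narration.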

POST /v1/generate synthesizes audio and returns an audio/wav response.

Basic — bring your own script

{
  "mode": "basic",
  "text": "Your script here.",     // required, ≤5000 chars
  "voice": "ember",                 // optional, see Voices
  "tone": "warm",                   // optional, see Tones
  "expressiveness": 0.6,            // optional, 0..1
  "languageCode": "en-US"           // optional, BCP-47
}
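
Checking a basic body before sending avoids a round trip that would end in 400. The helper below is a hypothetical sketch: it only covers the limits stated in the comments above, and it assumes the 5000-character limit counts Unicode code points the way Python's len() does, which the reference does not confirm.

```python
def validate_basic_body(body: dict) -> list:
    """Return a list of problems; an empty list means the body looks sendable."""
    problems = []
    text = body.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > 5000:  # assumption: limit is counted in code points
        problems.append("text exceeds 5000 characters")
    expressiveness = body.get("expressiveness")
    if expressiveness is not None:
        in_range = isinstance(expressiveness, (int, float)) and 0 <= expressiveness <= 1
        if not in_range:
            problems.append("expressiveness must be between 0 and 1")
    return problems
```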

Pro — agents write the script

{
  "mode": "pro",
  "context": "What this clip is for, audience, goal.",  // required, ≤4000 chars
  "target_seconds": 30,                                  // optional, 15..120, default 30
  "voice": "atlas",
  "tone": "authoritative"
}

Pro returns the generated script in the x-maia-script header (URL-encoded).
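
A pro call can be sketched as two small steps: build the body within the documented limits, then decode the script header from the response. Both function names are hypothetical. The decoder assumes standard percent-encoding of UTF-8 text and does not treat "+" as a space; the reference only says "URL-encoded".

```python
from typing import Mapping, Optional
from urllib.parse import unquote


def build_pro_body(context: str, target_seconds: int = 30) -> dict:
    """Build a pro request body, enforcing the limits listed above."""
    if not context or len(context) > 4000:
        raise ValueError("context is required and must be at most 4000 characters")
    if not 15 <= target_seconds <= 120:
        raise ValueError("target_seconds must be between 15 and 120")
    return {"mode": "pro", "context": context, "target_seconds": target_seconds}


def decode_script(headers: Mapping[str, str]) -> Optional[str]:
    """Return the decoded x-maia-script value, or None if the header is absent."""
    for name, value in headers.items():
        if name.lower() == "x-maia-script":  # header names are case-insensitive
            return unquote(value)
    return None
```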

Response headers

Header                Meaning
x-maia-call-id        Stable ID for this generation. Use it when reporting issues.
x-maia-seconds        Duration of the returned audio, in seconds.
x-maia-charge-cents   Amount debited from your balance.
x-maia-script         Pro only. URL-encoded script the agent wrote.
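
The metadata headers can be collected into one typed record for logging. This is a sketch: GenerationInfo and parse_generation_headers are hypothetical names, and it assumes x-maia-seconds parses as a decimal number and x-maia-charge-cents as an integer.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class GenerationInfo:
    call_id: str
    seconds: float
    charge_cents: int


def parse_generation_headers(headers: Mapping[str, str]) -> GenerationInfo:
    """Read the three metadata headers that every generation returns."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return GenerationInfo(
        call_id=lowered["x-maia-call-id"],
        seconds=float(lowered["x-maia-seconds"]),           # assumption: numeric string
        charge_cents=int(lowered["x-maia-charge-cents"]),   # assumption: whole cents
    )
```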

Idempotency

Pass an Idempotency-Key header to prevent duplicate charges on retried requests. Reuse the same key for every retry of one generation. A request whose key matches an already-settled call returns 409.
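
A retry loop built on this header might look like the sketch below. It is illustrative only: send_with_idempotency is a hypothetical helper, `send` stands for whatever function performs the HTTP call and returns the status code, and using a UUID as the key is an assumption since the reference does not specify a key format.

```python
import uuid
from typing import Callable, Dict


def send_with_idempotency(send: Callable[[Dict[str, str]], int], max_attempts: int = 3) -> int:
    """Call `send(extra_headers)` until it returns something other than 502."""
    # One key per logical generation, reused on every retry of that generation.
    extra_headers = {"Idempotency-Key": str(uuid.uuid4())}
    status = 0
    for _ in range(max_attempts):
        status = send(extra_headers)
        if status != 502:  # 502 is documented as safe to retry
            break
    return status
```

A 409 from a retry suggests an earlier attempt with the same key already settled, so the caller should not start a fresh generation with a new key.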

Returns the caller's balance and 30-day usage.

{
  "account": { "id": "acc_…", "email": "you@example.com" },
  "balance_cents": 1873,
  "usage_30d": [
    { "mode": "basic", "total_cents": 412, "count": 87 },
    { "mode": "pro",   "total_cents": 1060, "count": 12 }
  ]
}
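
The response is easy to reduce to a short summary. The sketch below uses the sample values above (minus the account object); format_cents and summarize_usage are hypothetical helper names, not part of the API.

```python
import json

SAMPLE = """
{
  "balance_cents": 1873,
  "usage_30d": [
    { "mode": "basic", "total_cents": 412, "count": 87 },
    { "mode": "pro",   "total_cents": 1060, "count": 12 }
  ]
}
"""


def format_cents(cents: int) -> str:
    """Render integer cents as a dollar string without float rounding."""
    return f"${cents // 100}.{cents % 100:02d}"


def summarize_usage(payload: dict) -> dict:
    """Total the 30-day usage rows across modes."""
    usage = payload["usage_30d"]
    return {
        "balance": format_cents(payload["balance_cents"]),
        "spent_30d": format_cents(sum(row["total_cents"] for row in usage)),
        "calls_30d": sum(row["count"] for row in usage),
    }


print(summarize_usage(json.loads(SAMPLE)))
```

For the sample, the summary reports a balance of $18.73, $14.72 spent, and 99 calls.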

Set voice to one of the IDs below. Default is ember.

ID       Gender   Best for
ember    female   Warm, grounded: coaching, onboarding.
nova     female   Bright, confident: announcements, launches.
wren     female   Crisp, precise: tutorials, product walkthroughs.
sage     female   Calm, measured: meditation, wellness.
marlow   female   Smoky, late-night: lifestyle, storytelling.
iris     female   Clear, professional: IVR, corporate narration.
juno     female   Authoritative: news, briefings.
lark     female   Upbeat, morning-show: ads, social spots.
dahlia   female   Rich, dramatic: audiobooks, fiction.
piper    female   Playful, sharp: UX prompts, quick reads.
atlas    male     Deep, steady: narration, documentaries.
archer   male     Energetic, confident: sales, pitches.
reese    male     Polished announcer: promos, trailers.
hugo     male     Avuncular, warm: explainers, coaching.
onyx     male     Low, serious: cinematic, high-stakes.
cyrus    male     Charismatic, expressive: hosts, interviews.
bram     male     Grounded, deliberate: meditation, guidance.
kai      male     Bright, modern: tech explainers, podcasts.
dash     male     Quick, punchy: ads, shorts.
orin     male     Wise storyteller: long-form, audiobooks.

tone sets the delivery register. Default is neutral.

neutral · calm · warm · energetic · authoritative · intimate · playful · narrative · urgent · inspirational
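
Since an invalid voice or tone is rejected with 400, a client may want to check both fields locally. This sketch hard-codes the IDs listed in this reference, so it would go stale if the server's lists change; check_voice_and_tone is a hypothetical name.

```python
VOICES = frozenset(
    "ember nova wren sage marlow iris juno lark dahlia piper "
    "atlas archer reese hugo onyx cyrus bram kai dash orin".split()
)
TONES = frozenset(
    "neutral calm warm energetic authoritative intimate "
    "playful narrative urgent inspirational".split()
)


def check_voice_and_tone(body: dict) -> list:
    """Return problems with the optional voice and tone fields."""
    problems = []
    voice = body.get("voice", "ember")    # documented default
    tone = body.get("tone", "neutral")    # documented default
    if voice not in VOICES:
        problems.append(f"unknown voice: {voice}")
    if tone not in TONES:
        problems.append(f"unknown tone: {tone}")
    return problems
```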

Sprinkle {{tag}} markers in your script to shape local delivery. Unknown tags are left as-is.

"Hey — {{pause}} I have good news. {{happy}} We shipped it."

Pacing
{{slow}} {{fast}} {{pause}} {{pause-long}}

Nonverbal
{{whisper}} {{laugh}} {{sigh}} {{gasp}} {{giggle}} {{cry}} {{shout}}

Upbeat
{{happy}} {{amused}} {{excited}} {{enthusiastic}} {{optimistic}} {{grateful}} {{determined}} {{confident}} {{eager}} {{hopeful}} {{encouraging}} {{loving}} {{adoring}}

Curiosity
{{curious}} {{interested}} {{awed}} {{amazed}} {{astonished}} {{surprised}} {{admiring}}

Stress
{{nervous}} {{anxious}} {{frustrated}} {{annoyed}} {{agitated}} {{confused}} {{sad}} {{fearful}} {{tired}} {{panicked}} {{angry}} {{disappointed}} {{embarrassed}}

Tone
{{serious}} {{sarcastic}} {{mischievous}}

Valence
{{positive}} {{neutral}} {{negative}}
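
Because unknown tags are left as-is, a typo in a tag will not raise an error, so it can help to lint scripts before sending them. The sketch below is a hypothetical client-side check; it assumes tags are matched exactly (lowercase, no inner whitespace), which the reference does not spell out.

```python
import re

KNOWN_TAGS = frozenset("""
slow fast pause pause-long
whisper laugh sigh gasp giggle cry shout
happy amused excited enthusiastic optimistic grateful determined
confident eager hopeful encouraging loving adoring
curious interested awed amazed astonished surprised admiring
nervous anxious frustrated annoyed agitated confused sad fearful
tired panicked angry disappointed embarrassed
serious sarcastic mischievous
positive neutral negative
""".split())

TAG_PATTERN = re.compile(r"\{\{([^{}]+)\}\}")


def unknown_tags(script: str) -> list:
    """Return tag names in `script` that are not in the documented list."""
    return [tag for tag in TAG_PATTERN.findall(script) if tag not in KNOWN_TAGS]
```

For example, unknown_tags("{{pause}} Almost there. {{yell}}") flags only yell.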

Errors are JSON with a message field.

Status   When
400      Bad body shape: missing text/context, invalid voice/tone, etc.
401      Missing, invalid, revoked, or expired token.
402      Balance below the mode's floor, or generated audio would exceed balance.
403      API key used on a user-only endpoint (keys, billing).
409      Idempotency key matches an already-settled call.
502      Upstream pipeline or TTS failure. Safe to retry.
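
One way to act on these statuses is a small lookup that pairs each code with a next step and pulls out the message field. This is a sketch: the action labels and the describe_error name are hypothetical, chosen only to mirror the table above.

```python
import json

# Hypothetical action labels, one per documented status.
ACTIONS = {
    400: "fix_request",
    401: "refresh_token",
    402: "add_credits",
    403: "use_dashboard_token",
    409: "already_settled",
    502: "retry",
}


def describe_error(status: int, body: bytes) -> tuple:
    """Return (action, message) for an error response."""
    try:
        message = json.loads(body)["message"]
    except (ValueError, KeyError, TypeError):
        message = ""  # body was not the documented JSON shape
    return ACTIONS.get(status, "unknown"), message
```

Only 502 is documented as safe to retry, so the other actions all require a change on the caller's side before sending again.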