API reference · v1

Give any app
a voice.

A portable persona spec, a one-call API, audio back in milliseconds. Pick a persona, set a tone, drop in cues — whispers, pauses, a laugh — and get a directed performance back as WAV bytes. JSON in. WAV out.

or view /llms.txt

Paste into Cursor, Claude, or ChatGPT — have it write the integration.

Create an API key from the dashboard (keys look like himaia_live_…), then call POST /v1/generate:

curl -X POST https://api.himaia.dev/v1/generate \
  -H "Authorization: Bearer himaia_live_xxx" \
  -H "Content-Type: application/json" \
  --output out.wav \
  -d '{
    "mode": "basic",
    "text": "Welcome to himaia. Let'"'"'s get you set up.",
    "voice": "ember",
    "tone": "warm"
  }'

Free tier: 20 Voiced minutes a month. No card. New accounts also get a small starter credit so a Basic call works on first sign-in.

All /v1/* endpoints require a bearer API key:

Authorization: Bearer himaia_live_…

Create and revoke keys on the keys page. The secret is shown once at creation — copy it immediately; we only store a hash.

Three tiers, one endpoint. Retail rates below; plan tiers (Free, Creator, Pro, Scale) are on the pricing page.

Basic
$0.04
per minute · 1¢ min charge

Plain text in, audio out. Scripted lines, narration. mode: "basic".

Voiced
$0.06
per minute · 1¢ min charge

Characters in real time. One model call → audio. mode: "voiced".

Cinematic
$0.15
per minute · 15s min

Trailers, audio drama, long-form. Three-stage pipeline. mode: "cinematic".

Server rejects with 402 if below the floor. Add credits on the billing page.

Synthesizes audio. Returns an audio/wav response. Pick the tier with mode: "basic", "voiced", or "cinematic".

Basic — bring your own script

{
  "mode": "basic",
  "text": "Your script here.",     // required, ≤5000 chars
  "voice": "ember",                 // optional, see Voices
  "tone": "warm",                   // optional, see Tones
  "expressiveness": 0.6,            // optional, 0..1
  "languageCode": "en-US"           // optional, BCP-47
}

Voiced — persona + scene → spoken reply

The Voiced tier loads a persona, merges any scene overrides, and produces a single in-character spoken turn from your input. One model call → TTS. Pass persona as a registered starter id (string) or as a full .persona.yaml doc inline (object).

Registered starter:

{
  "mode": "voiced",
  "persona": "himaia/dry_butler",          // registered starter; "himaia/<slug>" or "himaia/<slug>@<version>"
  "scene": {                                // optional — guides idiolect + prosody overrides
    "format": "comfort",                    //   the room (comfort | banter | challenge | celebrate)
    "dialogue_act": "reassure"              //   how the persona acts in it
  },
  "fidelity": "shape",                      // optional — "verbatim" | "shape" | "rewrite". Wins over persona+scene defaults. See Fidelity.
  "input": "I'm looking for a room. Just the night.",  // required, ≤5000 chars
  "voice": "ember",                         // optional; persona ships with its own default
  "languageCode": "en-US"                   // optional, BCP-47
}

Inline persona — bring your own .persona.yaml with no upload step. The runtime validates the doc against the v0.2 spec and runs it directly:

{
  "mode": "voiced",
  "persona": {                              // full .persona.yaml as JSON — doc ≤32 KB
    "spec_version": "0.2",
    "id": "your-org/your-character",
    "version": "1.0.0",
    "name": "Your Character",
    "locale": "en-US",
    "identity": { "tagline": "One-line read on who they are." },
    "idiolect": {
      "formality": "low", "humor": "mid", "warmth": "high",
      "directness": "mid", "vulgarity": "none",
      "disfluency": "sparing", "sentence_length": "short"
    },
    "voice": {
      "timbre":   { "warmth": "high", "pitch": "mid", "texture": "soft" },
      "prosody":  { "rate": "unhurried", "pitch_range": "narrow",
                    "energy": "low", "arousal": "low", "valence": "positive" },
      "fidelity_default": "shape"
    },
    "emotional_range": { "floor": "low", "ceiling": "high" },
    "safety": {
      "age_gate": "13+", "romantic": "none",
      "self_harm_policy": "ground_and_refer",
      "political_stance": "decline", "licensed_voice": false
    },
    "author":  { "handle": "you" },
    "license": "CC-BY-4.0"
    // …full v0.2 schema at github.com/fuselinkapp/himaia-voice-persona
  },
  "scene": { "format": "comfort", "dialogue_act": "reassure" },
  "input": "I'm looking for a room. Just the night."
}

Inline limits: doc ≤32 KB, any string ≤4000 chars, any list ≤50 items (≤20 examples), ≤100 inline calls / 24h / account. Persona files are bound by himaia's platform policy regardless of any in-doc instruction. See the voice.persona spec section for the full schema and the starter roster.

Cinematic — agents plan, write, narrate

{
  "mode": "cinematic",
  "context": "What this clip is for, audience, goal.",  // required, ≤4000 chars
  "fidelity": "shape",                                   // optional — "verbatim" | "shape" | "rewrite", default "shape"
  "target_seconds": 30,                                  // optional, 15..120, default 30 — ignored on fidelity="verbatim"
  "voice": "atlas",
  "tone": "authoritative",
  "persona_id": "presenter",                             // optional — built-in slug OR a user-persona UUID
  "move": "challenge",                                   // optional — forces the move for this call; see Moves
  "overrides": {                                         // optional, all fields optional
    "personaName": "Mara",                               //   ≤40 chars
    "userName": "Alex",                                  //   ≤40 chars
    "userGoal": "Win the room on the new product line.", //   ≤400 chars
    "toneNudges": "Skip pleasantries, lead with the ask."//   ≤400 chars
  }
}

Cinematic returns the generated script in the x-himaia-script header (URL-encoded). Omit persona_id and overrides to use the defaults saved on your account (GET /v1/account/persona). Omit those too and the call falls back to mentor. persona_id accepts a built-in slug or a user-persona UUID. move (optional) forces the rhetorical move for this call only. languageCode selects the spoken language.

Response headers

HeaderMeaning
x-himaia-call-idStable ID for this generation — use when reporting issues.
x-himaia-secondsDuration of the returned audio, in seconds.
x-himaia-charge-centsAmount debited from your balance.
x-himaia-scriptVoiced + Cinematic. URL-encoded spoken script that was sent to TTS.
x-himaia-fidelityVoiced + Cinematic. Resolved fidelity (`verbatim` | `shape` | `rewrite`).
x-himaia-personaVoiced only. `<id>@<version>` of the persona that produced the audio (echoes inline persona ids too).
x-himaia-persona-sourceVoiced only. `starter` (registered persona) or `inline` (full doc shipped in the request).
x-himaia-scene-formatVoiced only. Resolved scene format (e.g. `comfort`).
x-himaia-scene-dialogue-actVoiced only. Resolved dialogue act (e.g. `reassure`).

Idempotency

Pass an Idempotency-Key header to prevent duplicate charges on retried requests. A second request with the same key on an already-settled call returns 409.

Returns the caller's balance and 30-day usage.

{
  "account": { "id": "acc_…", "email": "you@example.com" },
  "balance_cents": 1873,
  "usage_30d": [
    { "mode": "basic",     "total_cents": 412,  "count": 87 },
    { "mode": "cinematic", "total_cents": 1060, "count": 12 }
  ],
  "usage_daily": [
    { "day": "2026-04-16", "mode": "basic",     "total_cents": 14, "count": 3 },
    { "day": "2026-04-16", "mode": "cinematic", "total_cents": 62, "count": 1 }
  ],
  "voice_mix": [
    { "voice": "ember", "count": 48 },
    { "voice": "atlas", "count": 21 }
  ]
}

Personas shape who is speaking — identity, idiolect, voice, scenes — in one voice.persona v0.2 doc. 20 archetypes ship with himaia. Paid accounts can author their own from the dashboard (Creator: 15 slots, Pro: 45). Pass a built-in slug (e.g. mentor) or a user-persona UUID as persona_id.

User personas are stored as full voice.persona docs. Author them by pasting YAML into the dashboard or filling out the guided form. The open spec lives at fuselinkapp/himaia-voice-persona.

GET /v1/personas

Merged list of built-ins plus any personas your account has authored. User personas include scope: "user" with created_at/updated_at.

{
  "personas": [
    {
      "id": "mentor",                                       // built-in slug, or a UUID for user personas
      "name": "Mentor",
      "tagline": "Seasoned. Perspective over prescription.",
      "description": "Patient, unhurried, specific. Frames choices instead of prescribing moves.",
      "defaults": { "tone": "calm", "move": "reframe" },
      "scope": "built_in"                                   // "built_in" | "user"
    }
    // …
  ]
}

GET /v1/personas/:id

Returns a single persona by built-in slug or user UUID. 404 if not found. For user personas the response includes the full validated spec — the voice.persona doc you authored — so the dashboard can pre-fill the editor.

Available built-ins

Default when no persona is selected: mentor.

IDNameDefault toneTagline
narratorNarratornarrativeThe storyteller who knows where the pauses go.
reporterReporterauthoritativeAnchor-style. Even, unhurried, factual.
teacherTeacherwarmPatient educator. Analogies before jargon.
friendFriendwarmSomeone who knows you. Warm and specific.
announcerAnnouncerenergeticBroadcast energy. Calls rooms to attention.
assistantAssistantneutralMinimal, efficient. Confirms, hands off.
hostHostplayfulCurious MC. Keeps things moving.
conciergeConciergewarmPolished professional warmth.
tour-guideTour GuideinspirationalEnthusiastic wayfinder.
presenterPresenterinspirationalStructured, confident on stage.
mentorMentorcalmSeasoned. Perspective over prescription.
analystAnalystneutralData-native. Numbers first.
philosopherPhilosophercalmUnhurried. The long view on simple things.
essayistEssayistintimateLiterary, personal. Built to be read twice.
confidantConfidantintimateTrusted, low register, private.
elderElderwarmWisdom without lecture. Stories earned.
skepticSkepticneutralDry. Questioning. Breaks autopilot.
curatorCuratorcalmGallery-quiet. Thoughtful.
dispatcherDispatcherauthoritativeCalm in a crisis. Clear handoffs.
diaristDiaristintimateFirst-person, reflective, quiet.

Overrides

Overrides personalize a persona without rewriting its prompts. All four fields are optional and trimmed; unknown fields are ignored.

HeaderMeaning
personaNameWhat the persona calls itself. ≤40 chars.
userNameHow the persona addresses the listener. ≤40 chars.
userGoalThe outcome this clip should drive toward. ≤400 chars.
toneNudgesPer-call tone tweaks (e.g. "skip pleasantries"). ≤400 chars.

GET /v1/account/persona

Returns the default persona and overrides saved on your account. Cinematic requests that omit persona_id/overrides use these saved values. Defaults are managed from the dashboard.

{
  "persona_id": "presenter",            // null if none saved
  "overrides": {
    "personaName": "Mara",
    "userGoal": "Win the room on the new product line."
  }
}

Every persona is a voice.persona v0.2 doc — Voiced starters, Cinematic built-ins, and personas you author in the dashboard all share the same shape. The spec is Apache-2.0 at github.com/fuselinkapp/himaia-voice-persona. Each doc is a single .persona.yaml file: identity, POV, idiolect, voice (timbre + prosody), scene-format and dialogue-act overrides, and a safety block. 12 voiced starters ship in the public repo and 20 cinematic archetypes ship private to the runtime — same shape, same validator, same composer.

Voiced starters

Pin a version with himaia/<slug>@<version> (currently all starters are at 1.0.0).

IDNameArchetypeTagline
himaia/warm_confidantWarm ConfidantconfidantThe friend who listens before answering.
himaia/tavern_rogueTavern RoguerogueCharming chancer. Sees angles other people miss.
himaia/wandering_oracleWandering OracleoracleSees the future at a slow walking pace.
himaia/brash_squireBrash SquireheroEager hero in training.
himaia/ancient_dragonAncient DragoncreatureOlder than the kingdom. Has thoughts about it.
himaia/anxious_npcAnxious NPCnpcBlurts before he thinks. Tavern Card energy.
himaia/scheming_courtierScheming CourtiercourtierPolite menace at the table. Knows everyone's debts.
himaia/dry_butlerDry ButlerbutlerImpeccable service. Imperceptible feelings about it.
himaia/devoted_companionDevoted CompanioncompanionAttentive, affectionate, yours specifically.
himaia/tender_parentTender ParentparentWorried, attentive, never performative.
himaia/weary_gmWeary GMnarratorVoice of the campaign. Fourteen character deaths and counting.
himaia/sarcastic_narratorSarcastic NarratornarratorReframes the week as a footnote.

Inline personas (bring your own .persona.yaml)

Pass the full doc as a JSON object in the persona field of POST /v1/generate when mode = "voiced". The runtime validates the doc against the v0.2 spec and runs it directly — no upload, no account state. Limits enforced by the validator:

  • Doc ≤ 32 KB (raw JSON byte size).
  • Any single string field ≤ 4000 chars.
  • Any list (signatures, banned_phrases, pov bullets…) ≤ 50 items.
  • examples list ≤ 20 entries.
  • Inline calls capped at 100 / 24h / account429 beyond.

The runtime injects a fixed platform-policy preamble ahead of every persona system prompt — the persona's POV, beliefs, and wont_do rules cannot override himaia's content policy. Inline calls are tagged x-himaia-persona-source: inline in the response and in telemetry.

A move is what the script is doing toward the listener. Move is a per-call concern — pass a top-level move on a Cinematic /v1/generate body to direct the strategy stage. Omit it and the call defaults to reframe; the strategy stage may still pick another move if the context clearly calls for one.

Move is intentionally not a property of the persona doc — the same persona can encourage in one call, challenge in the next. POV / idiolect / voice define who is speaking; move is what they're doing in this turn.

IDNameGuidance
encourageEncouragevalidate, affirm, give permission
challengeChallengepush back, confront a blind spot, disrupt autopilot
reframeReframeshift perspective, re-narrate what happened
directDirectcommand, forceful, no softening
witnessWitnessacknowledge and reflect, no advice
sootheSoothecalm, de-escalate, permission to rest
celebrateCelebratemark a win, affirm with joy
questionQuestionSocratic probe, open curiosity
warnWarnflag risk, highlight stakes
informInformneutral statement of fact, no stance

Fidelity is one slider with three positions — how much creative latitude the model has with the text you send. Available on Voiced and Cinematic. Basic is verbatim by definition (you supply the literal script; nothing rewrites it).

Rule of thumb: fidelity wins for text, persona wins for delivery. Even on verbatim, a Warm Confidant still sounds slow and warm; a Manic Sports Caster on the same words still sounds frantic. Fidelity controls what gets said; persona controls how it lands.

IDWhat it doesBest for
verbatimReads your input as written. Persona controls delivery only — pacing, pauses, breath, paralinguistic tags. Words don't change.Game NPC scripted lines. SillyTavern character-card lines. Audiobook narration. Anywhere a writer already wrote the words.
shapeDefault. Treats your input as a draft. Tightens, fixes filler, may add a natural opener/closer. Preserves your key phrases, facts, sequence.Voice memos polished into takes. Trusted draft text. Most everyday Voiced calls — the safe middle.
rewriteTreats your input as a brief, not a script. Persona writes freely toward the topic. Your phrasing is intent, not constraint.Free-form chat-character replies. "User asks something, persona answers in-character."

Same input, three modes

Voiced + Warm Confidant:

Input: "It's been a long week and I don't know where to start."
  • verbatim"It's been a long week. [short pause] And I don't know where to start." (your words, her pacing)
  • shape"It's been a long week — and you don't know where to start. Okay. Tell me more." (kept your beat, added a confidant opener)
  • rewrite"Hey. Sit with me for a second. You don't have to know where to start." (her line, your prompt)

How fidelity is resolved

For both Voiced and Cinematic, precedence is: per-call request field wins, then any scene override, then the persona's voice.fidelity_default. If nothing sets it, the default is shape. The resolved value is echoed back in the x-himaia-fidelity response header.

request.fidelity   ▶  scene.dialogue_act.fidelity_default
                   ▶  scene.format.fidelity_default
                   ▶  persona.voice.fidelity_default
                   ▶  "shape"

Mode-specific notes

  • Voiced — Single model call; the fidelity directive is injected into the persona system prompt before the input lands.
  • Cinematic — Three-stage pipeline (analysis → strategy → script); each stage receives a fidelity-specific instruction. verbatim ignores target_seconds (length follows your text); shape and rewrite honor it.
  • Basic — No fidelity field. text is the literal script; the TTS step reads it verbatim by definition.

Set voice to one of the IDs below. Default is ember. Hit play to hear each voice on the same neutral line.

IDGenderBest for
emberfemaleWarm, grounded — companions, onboarding.
novafemaleBright, confident — announcements, launches.
wrenfemaleCrisp, precise — tutorials, product walkthroughs.
sagefemaleCalm, measured — meditation, wellness.
marlowfemaleSmoky, late-night — lifestyle, storytelling.
irisfemaleClear, professional — IVR, corporate narration.
junofemaleAuthoritative — news, briefings.
larkfemaleUpbeat, morning-show — ads, social spots.
dahliafemaleRich, dramatic — audiobooks, fiction.
piperfemalePlayful, sharp — UX prompts, quick reads.
atlasmaleDeep, steady — narration, documentaries.
archermaleEnergetic, confident — sales, pitches.
reesemalePolished announcer — promos, trailers.
hugomaleAvuncular, warm — explainers, narration.
onyxmaleLow, serious — cinematic, high-stakes.
cyrusmaleCharismatic, expressive — hosts, interviews.
brammaleGrounded, deliberate — meditation, guidance.
kaimaleBright, modern — tech explainers, podcasts.
dashmaleQuick, punchy — ads, shorts.
orinmaleWise storyteller — long-form, audiobooks.
vesperfemaleSoft, twilight — ASMR, sleep, bedtime content.
mirafemaleSmooth, elegant — brand spots, luxury.
clovefemaleMature, earthy — documentary, spoken essay.
rheafemaleMaternal, round — kids' content, family.
finnmaleCasual, approachable — podcasts, banter.
silasmaleSerious, steady — corporate, training.
mateomaleSmooth, charismatic — lifestyle, travel.
rowanmaleYouthful, fresh — social, short-form.
jaspermaleRefined, dry — commentary, review content.
ezramaleThoughtful, literary — essays, long-form reads.

tone sets delivery register. Default is neutral.

neutralcalmwarmenergeticauthoritativeintimateplayfulnarrativeurgentinspirational

Set languageCode (BCP-47) to render the script in a specific language. Unknown tags still pass through to the TTS model; quality is best on the curated set below.

CodeLanguage
en-USEnglish (US)
en-GBEnglish (UK)
en-AUEnglish (Australia)
en-INEnglish (India)
de-DEGerman
es-ESSpanish (Spain)
es-USSpanish (US)
fr-FRFrench (France)
fr-CAFrench (Canada)
it-ITItalian
pt-BRPortuguese (Brazil)
nl-NLDutch
pl-PLPolish
ru-RURussian
tr-TRTurkish
ja-JPJapanese
ko-KRKorean
cmn-CNMandarin Chinese
hi-INHindi
ar-XAArabic
id-IDIndonesian
vi-VNVietnamese
th-THThai

Sprinkle {{tag}} markers in your script to shape local delivery. Unknown tags are left as-is.

"Hey — {{pause}} I have good news. {{happy}} We shipped it."
Pacing
{{slow}}{{fast}}{{pause}}{{pause-long}}
Nonverbal
{{whisper}}{{laugh}}{{sigh}}{{gasp}}{{giggle}}{{cry}}{{shout}}
Upbeat
{{happy}}{{amused}}{{excited}}{{enthusiastic}}{{optimistic}}{{grateful}}{{determined}}{{confident}}{{eager}}{{hopeful}}{{encouraging}}{{loving}}{{adoring}}
Curiosity
{{curious}}{{interested}}{{awed}}{{amazed}}{{astonished}}{{surprised}}{{admiring}}
Stress
{{nervous}}{{anxious}}{{frustrated}}{{annoyed}}{{agitated}}{{confused}}{{sad}}{{fearful}}{{tired}}{{panicked}}{{angry}}{{disappointed}}{{embarrassed}}
Tone
{{serious}}{{sarcastic}}{{mischievous}}
Valence
{{positive}}{{neutral}}{{negative}}

Errors are JSON with a message field.

StatusWhen
400Bad body shape — missing text/context, invalid voice/tone/persona/move, or a field out of range.
401Missing, invalid, revoked, or expired token.
402Balance below the mode's floor, or generated audio would exceed balance.
404Persona ID not found for this account.
409Idempotency key matches an already-settled call.
502Upstream generation pipeline failure. Safe to retry.