# himaia — HTTP API (v1)

Persona in. Audio out. One HTTP call. No SDK.

- Base URL: `https://api.himaia.dev`
- Auth: `Authorization: Bearer ` on all `/v1/*` endpoints. Create and manage keys in the dashboard.
- Idempotency: send `Idempotency-Key: ` to dedupe retries. A second settled call with the same key returns 409.
- Pricing (retail, per minute): Basic `$0.04` (`mode: "basic"`, 1¢ floor). Voiced `$0.06` (`mode: "voiced"`, 1¢ floor). Cinematic `$0.15` (`mode: "cinematic"`, 15s floor). 402 if balance below the floor.
- Free tier: 20 Voiced minutes a month, no card. New accounts also get a small starter credit.

## POST /v1/generate

Synthesizes audio. Returns `audio/wav`. Pick the tier with `mode`: `"basic"`, `"voiced"`, or `"cinematic"`.

### Basic — you supply the script

```json
{
  "mode": "basic",
  "text": "Your script here.",   // required, ≤5000 chars
  "voice": "ember",              // optional, see Voices
  "tone": "warm",                // optional, see Tones
  "expressiveness": 0.6,         // optional, 0..1
  "languageCode": "en-US"        // optional, BCP-47
}
```

### Voiced — persona + scene → spoken reply

The Voiced tier loads a persona (`himaia/` starter, or a full `.persona.yaml` shipped inline as JSON), merges any scene overrides, and produces a single in-character spoken turn from your `input`. One model call → TTS.

Registered starter:

```json
{
  "mode": "voiced",
  "persona": "himaia/dry_butler",   // registered starter; "himaia/" or "himaia/@"
  "scene": {                        // optional — guides idiolect + prosody overrides
    "format": "comfort",            // the room (comfort | banter | challenge | celebrate)
    "dialogue_act": "reassure"      // how the persona acts in it
  },
  "fidelity": "shape",              // optional — "verbatim" | "shape" | "rewrite". Wins over persona+scene defaults. See Fidelity.
  "input": "I'm looking for a room. Just the night.",   // required, ≤5000 chars
  "voice": "ember",                 // optional; persona ships with its own default
  "languageCode": "en-US"           // optional, BCP-47
}
```

Inline persona — bring your own `.persona.yaml` with no upload step. The runtime validates the doc against the v0.2 spec and runs it directly:

```json
{
  "mode": "voiced",
  "persona": {                      // full .persona.yaml as JSON — doc ≤32 KB
    "spec_version": "0.2",
    "id": "your-org/your-character",
    "version": "1.0.0",
    "name": "Your Character",
    "locale": "en-US",
    "identity": { "tagline": "One-line read on who they are." },
    "idiolect": {
      "formality": "low", "humor": "mid", "warmth": "high", "directness": "mid",
      "vulgarity": "none", "disfluency": "sparing", "sentence_length": "short"
    },
    "voice": {
      "timbre": { "warmth": "high", "pitch": "mid", "texture": "soft" },
      "prosody": { "rate": "unhurried", "pitch_range": "narrow", "energy": "low", "arousal": "low", "valence": "positive" },
      "fidelity_default": "shape"
    },
    "emotional_range": { "floor": "low", "ceiling": "high" },
    "safety": {
      "age_gate": "13+", "romantic": "none", "self_harm_policy": "ground_and_refer",
      "political_stance": "decline", "licensed_voice": false
    },
    "author": { "handle": "you" },
    "license": "CC-BY-4.0"
    // …full v0.2 schema at github.com/fuselinkapp/himaia-voice-persona
  },
  "scene": { "format": "comfort", "dialogue_act": "reassure" },
  "input": "I'm looking for a room. Just the night."
}
```

Inline limits: doc ≤32 KB, any string ≤4000 chars, any list ≤50 items (≤20 examples), ≤100 inline calls/24h/account. Persona files are bound by himaia's platform policy regardless of any in-doc instruction. See **voice.persona spec** below.
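
The inline limits above can be checked on the client before a call is spent. The following is a minimal sketch, not himaia's validator: the function name `check_inline_persona` is hypothetical, "32 KB" is assumed to mean 32 × 1024 bytes of compact UTF-8 JSON, and the 20-entry cap is assumed to apply to any list stored under a key named `examples`.

```python
import json

MAX_DOC_BYTES = 32 * 1024   # assumption: "32 KB" means 32 * 1024 bytes
MAX_STRING_CHARS = 4000
MAX_LIST_ITEMS = 50
MAX_EXAMPLES = 20


def check_inline_persona(doc: dict) -> list[str]:
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    # Assumption: size is measured on compact UTF-8 JSON; the server may count differently.
    raw = json.dumps(doc, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    if len(raw) > MAX_DOC_BYTES:
        problems.append(f"doc is {len(raw)} bytes, limit is {MAX_DOC_BYTES}")

    def walk(node, path):
        if isinstance(node, str):
            if len(node) > MAX_STRING_CHARS:
                problems.append(f"{path}: string has {len(node)} chars")
        elif isinstance(node, list):
            # Assumption: the tighter cap applies to lists under an "examples" key.
            cap = MAX_EXAMPLES if path.endswith(".examples") else MAX_LIST_ITEMS
            if len(node) > cap:
                problems.append(f"{path}: list has {len(node)} items, limit is {cap}")
            for i, item in enumerate(node):
                walk(item, f"{path}[{i}]")
        elif isinstance(node, dict):
            for key, value in node.items():
                walk(value, f"{path}.{key}")

    walk(doc, "persona")
    return problems
```

Running the check before `POST /v1/generate` surfaces the likely 400 causes locally; the server remains the authority on what passes.
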
### Cinematic — agents plan, write, narrate

```json
{
  "mode": "cinematic",
  "context": "What this clip is for, audience, goal.",   // required, ≤4000 chars
  "fidelity": "shape",          // optional — "verbatim" | "shape" | "rewrite", default "shape"
  "target_seconds": 30,         // optional, 15..120, default 30 — ignored on fidelity="verbatim"
  "voice": "atlas",
  "tone": "authoritative",
  "persona_id": "presenter",    // optional — built-in slug OR a user-persona UUID
  "move": "challenge",          // optional — forces the move for this call; see Moves
  "overrides": {                // optional, all fields optional
    "personaName": "Mara",      // ≤40 chars
    "userName": "Alex",         // ≤40 chars
    "userGoal": "Win the room on the new product line.",   // ≤400 chars
    "toneNudges": "Skip pleasantries, lead with the ask."  // ≤400 chars
  }
}
```

Cinematic returns the written script in the `x-himaia-script` response header (URL-encoded) and the resolved fidelity in `x-himaia-fidelity`. If `persona_id`/`overrides` are omitted, the call uses the account defaults (`GET /v1/account/persona`), falling back to `mentor`.

`fidelity` defaults to `shape` and controls how closely the script follows your `context` — see Fidelity.

`persona_id` accepts either a built-in slug (e.g. `mentor`) or the UUID of a user persona authored in the dashboard.

`move` (optional) forces the rhetorical move for that single call — see the Moves section. When omitted, the persona's default move is a soft hint the script-writing agent can override based on the context you passed.

`languageCode` tunes the spoken language — see Languages.

### Response headers

| Header | Meaning |
| --- | --- |
| `x-himaia-call-id` | Stable ID for this generation — use when reporting issues. |
| `x-himaia-seconds` | Duration of the returned audio, in seconds. |
| `x-himaia-charge-cents` | Amount debited from your balance. |
| `x-himaia-script` | Voiced + Cinematic. URL-encoded spoken script that was sent to TTS. |
| `x-himaia-fidelity` | Voiced + Cinematic. Resolved fidelity (`verbatim`, `shape`, or `rewrite`). |
| `x-himaia-persona` | Voiced only. `@` of the persona that produced the audio (echoes inline persona ids too). |
| `x-himaia-persona-source` | Voiced only. `starter` (registered persona) or `inline` (full doc shipped in the request). |
| `x-himaia-scene-format` | Voiced only. Resolved scene format (e.g. `comfort`). |
| `x-himaia-scene-dialogue-act` | Voiced only. Resolved dialogue act (e.g. `reassure`). |

### Quickstart (curl)

```bash
curl -X POST https://api.himaia.dev/v1/generate \
  -H "Authorization: Bearer himaia_live_xxx" \
  -H "Content-Type: application/json" \
  --output out.wav \
  -d '{
    "mode": "basic",
    "text": "Welcome to himaia. Let'"'"'s get you set up.",
    "voice": "ember",
    "tone": "warm"
  }'
```

## GET /v1/account

Returns the caller's balance and 30-day usage.

```json
{
  "account": { "id": "acc_…", "email": "you@example.com" },
  "balance_cents": 1873,
  "usage_30d": [
    { "mode": "basic", "total_cents": 412, "count": 87 },
    { "mode": "cinematic", "total_cents": 1060, "count": 12 }
  ],
  "usage_daily": [
    { "day": "2026-04-16", "mode": "basic", "total_cents": 14, "count": 3 },
    { "day": "2026-04-16", "mode": "cinematic", "total_cents": 62, "count": 1 }
  ],
  "voice_mix": [
    { "voice": "ember", "count": 48 },
    { "voice": "atlas", "count": 21 }
  ]
}
```

## Personas

Personas shape *who* is speaking — identity, idiolect, voice, scenes — in one `voice.persona` v0.2 doc. Two built-in rosters ship with himaia, both in the same shape:

- **12 voiced starters** in the open spec repo ([github.com/fuselinkapp/himaia-voice-persona](https://github.com/fuselinkapp/himaia-voice-persona)) — used by `mode: "voiced"`. Apache-2.0; fork freely.
- **20 cinematic archetypes** private to the runtime — used by `mode: "cinematic"`.

Paid accounts (Creator: 15 slots, Pro: 45 slots) can author their own from the [dashboard](https://himaia.dev/dashboard/personas) by pasting a YAML doc or filling out the guided form.
User personas are stored as full `voice.persona` docs and validated on every write — same composer as the built-ins. Pass a built-in slug (e.g. `mentor`) or a user-persona UUID as `persona_id`.

### GET /v1/personas

Merged list of built-ins plus the caller account's user personas. User-persona entries include `scope: "user"` and `created_at` / `updated_at` fields.

```json
{
  "personas": [
    {
      "id": "mentor",            // built-in slug, or a UUID for user personas
      "name": "Mentor",
      "tagline": "Seasoned. Perspective over prescription.",
      "description": "Patient, unhurried, specific. Frames choices instead of prescribing moves.",
      "defaults": { "tone": "calm", "move": "reframe" },
      "scope": "built_in"        // "built_in" | "user"
    }
    // …
  ]
}
```

Default when none is selected: `mentor`.

| ID | Name | Default tone | Tagline |
| --- | --- | --- | --- |
| narrator | Narrator | narrative | The storyteller who knows where the pauses go. |
| reporter | Reporter | authoritative | Anchor-style. Even, unhurried, factual. |
| teacher | Teacher | warm | Patient educator. Analogies before jargon. |
| friend | Friend | warm | Someone who knows you. Warm and specific. |
| announcer | Announcer | energetic | Broadcast energy. Calls rooms to attention. |
| assistant | Assistant | neutral | Minimal, efficient. Confirms, hands off. |
| host | Host | playful | Curious MC. Keeps things moving. |
| concierge | Concierge | warm | Polished professional warmth. |
| tour-guide | Tour Guide | inspirational | Enthusiastic wayfinder. |
| presenter | Presenter | inspirational | Structured, confident on stage. |
| mentor | Mentor | calm | Seasoned. Perspective over prescription. |
| analyst | Analyst | neutral | Data-native. Numbers first. |
| philosopher | Philosopher | calm | Unhurried. The long view on simple things. |
| essayist | Essayist | intimate | Literary, personal. Built to be read twice. |
| confidant | Confidant | intimate | Trusted, low register, private. |
| elder | Elder | warm | Wisdom without lecture. Stories earned. |
| skeptic | Skeptic | neutral | Dry. Questioning. Breaks autopilot. |
| curator | Curator | calm | Gallery-quiet. Thoughtful. |
| dispatcher | Dispatcher | authoritative | Calm in a crisis. Clear handoffs. |
| diarist | Diarist | intimate | First-person, reflective, quiet. |

### GET /v1/personas/:id

Returns a single persona by built-in slug or user UUID. `404` if not found. For user personas the response includes the full validated `spec` — the voice.persona doc you authored — so the dashboard can pre-fill the editor.

### Overrides

Overrides personalize a persona without rewriting its prompts. All four fields are optional and trimmed; unknown fields are ignored.

| Field | Meaning |
| --- | --- |
| `personaName` | What the persona calls itself. ≤40 chars. |
| `userName` | How the persona addresses the listener. ≤40 chars. |
| `userGoal` | The outcome this clip should drive toward. ≤400 chars. |
| `toneNudges` | Per-call tone tweaks (e.g. "skip pleasantries"). ≤400 chars. |

### GET /v1/account/persona

Returns the default persona and overrides saved on your account.

```json
{
  "persona_id": "presenter",   // null if none saved
  "overrides": {
    "personaName": "Mara",
    "userGoal": "Win the room on the new product line."
  }
}
```

## voice.persona spec

**Every persona is a voice.persona v0.2 doc** — Voiced starters, Cinematic built-ins, and personas you author in the dashboard all share the same shape. The spec is Apache-2.0 ([github.com/fuselinkapp/himaia-voice-persona](https://github.com/fuselinkapp/himaia-voice-persona)).

Each doc is a single `.persona.yaml` file: identity, POV, idiolect, voice (timbre + prosody), scene-format and dialogue-act overrides, and a safety block. 12 voiced starters ship in the public spec repo under the `himaia/` namespace; 20 cinematic archetypes ship private to the runtime. Same shape, same validator, same composer.
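
Because every persona shares one shape, a quick pre-flight check can catch a doc that is obviously incomplete before it reaches the validator. The sketch below is hypothetical: the key names come from the inline persona example in this document, and treating them as required is an assumption, not a statement of the v0.2 schema. The spec repo is the authority.

```python
# Hypothetical pre-flight check. Key names are copied from the inline example in this
# document; whether the v0.2 schema requires each of them is an assumption.
EXPECTED_BLOCKS = ("spec_version", "id", "name", "identity", "idiolect", "voice", "safety")
EXPECTED_VOICE_BLOCKS = ("timbre", "prosody")


def shape_problems(doc: dict) -> list[str]:
    """List obvious shape problems in a persona doc; empty means none were spotted."""
    problems = [f"missing {key}" for key in EXPECTED_BLOCKS if key not in doc]
    if "spec_version" in doc and doc["spec_version"] != "0.2":
        problems.append(f"spec_version is {doc['spec_version']!r}, expected '0.2'")
    voice = doc.get("voice")
    if isinstance(voice, dict):
        # The voice block is described as timbre + prosody.
        problems += [f"missing voice.{key}" for key in EXPECTED_VOICE_BLOCKS if key not in voice]
    return problems
```

An empty result only means nothing obvious is missing; a doc can still fail server-side validation.
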
### Voiced starters

| ID | Name | Archetype | Tagline |
| --- | --- | --- | --- |
| `himaia/warm_confidant` | Warm Confidant | confidant | The friend who listens before answering. |
| `himaia/tavern_rogue` | Tavern Rogue | rogue | Charming chancer. Sees angles other people miss. |
| `himaia/wandering_oracle` | Wandering Oracle | oracle | Sees the future at a slow walking pace. |
| `himaia/brash_squire` | Brash Squire | hero | Eager hero in training. |
| `himaia/ancient_dragon` | Ancient Dragon | creature | Older than the kingdom. Has thoughts about it. |
| `himaia/anxious_npc` | Anxious NPC | npc | Blurts before he thinks. Tavern Card energy. |
| `himaia/scheming_courtier` | Scheming Courtier | courtier | Polite menace at the table. Knows everyone's debts. |
| `himaia/dry_butler` | Dry Butler | butler | Impeccable service. Imperceptible feelings about it. |
| `himaia/devoted_companion` | Devoted Companion | companion | Attentive, affectionate, yours specifically. |
| `himaia/tender_parent` | Tender Parent | parent | Worried, attentive, never performative. |
| `himaia/weary_gm` | Weary GM | narrator | Voice of the campaign. Fourteen character deaths and counting. |
| `himaia/sarcastic_narrator` | Sarcastic Narrator | narrator | Reframes the week as a footnote. |

Pin a version with `himaia/@` (currently all starters are at `1.0.0`).

### Inline personas (bring your own .persona.yaml)

Pass the full doc as a JSON object in the `persona` field of `POST /v1/generate` when `mode = "voiced"`. Limits enforced by the validator:

- Doc ≤ **32 KB** (raw JSON byte size).
- Any single string field ≤ **4000 chars**.
- Any list (signatures, banned_phrases, pov bullets, …) ≤ **50 items**.
- `examples` list ≤ **20 entries**.
- `429` if your inline call count exceeds **100/24h** (per account).

The runtime injects a fixed platform-policy preamble ahead of every persona system prompt — the persona's POV, beliefs, and `wont_do` rules cannot override himaia's content policy.
Inline calls are tagged `x-himaia-persona-source: inline` in the response and in telemetry.

## Moves

A **move** is what the script is doing *toward* the listener. Move is a *per-call* concern — pass a top-level `move` field on Cinematic `/v1/generate` requests to direct the strategy stage. Omit it and the call defaults to `reframe`; the strategy stage may still pick another move if the context clearly calls for one.

| ID | Name | Guidance |
| --- | --- | --- |
| `encourage` | Encourage | validate, affirm, give permission |
| `challenge` | Challenge | push back, confront a blind spot, disrupt autopilot |
| `reframe` | Reframe | shift perspective, re-narrate what happened |
| `direct` | Direct | command, forceful, no softening |
| `witness` | Witness | acknowledge and reflect, no advice |
| `soothe` | Soothe | calm, de-escalate, permission to rest |
| `celebrate` | Celebrate | mark a win, affirm with joy |
| `question` | Question | Socratic probe, open curiosity |
| `warn` | Warn | flag risk, highlight stakes |
| `inform` | Inform | neutral statement of fact, no stance |

Move is intentionally not a property of the persona doc — the same persona can encourage in one call, challenge in the next. POV / idiolect / voice define *who* is speaking; move is *what* they're doing in this turn.

## Fidelity

`fidelity` is one slider with three positions — how much creative latitude the model has with the text you send. Available on **Voiced** and **Cinematic**. Basic is verbatim by definition (you supply the literal script; nothing rewrites it).

**Rule of thumb: fidelity wins for text, persona wins for delivery.** Even on `verbatim`, a Warm Confidant still sounds slow and warm, and a Manic Sports Caster on the same words still sounds frantic. Fidelity controls *what gets said*; persona controls *how it lands*.

| ID | What it does | Best for |
| --- | --- | --- |
| `verbatim` | Reads your input as written. Persona controls *delivery only* — pacing, pauses, breath, paralinguistic tags. Words don't change. | Game NPC scripted lines. SillyTavern character-card lines. Audiobook narration. Anywhere a writer already wrote the words. |
| `shape` | **Default.** Treats your input as a draft. Tightens, fixes filler, may add a natural opener/closer. **Preserves your key phrases, facts, sequence.** | Voice memos polished into takes. Trusted draft text. Most everyday Voiced calls — the safe middle. |
| `rewrite` | Treats your input as a brief, not a script. Persona writes freely toward the topic. Your phrasing is intent, not constraint. | Free-form chat-character replies. "User asks something, persona answers in-character." |

**Same input, three modes** (Voiced + Warm Confidant):

> Input: *"It's been a long week and I don't know where to start."*
>
> - **verbatim** → *"It's been a long week. [short pause] And I don't know where to start."* (your words, her pacing)
> - **shape** → *"It's been a long week — and you don't know where to start. Okay. Tell me more."* (kept your beat, added a confidant opener)
> - **rewrite** → *"Hey. Sit with me for a second. You don't have to know where to start."* (her line, your prompt)

**How fidelity is resolved**

For both Voiced and Cinematic, the precedence is: per-call request field wins, then any scene override, then the persona's `voice.fidelity_default`. If nothing sets it, the default is `shape`. The resolved value is echoed back in the `x-himaia-fidelity` response header.

```
request.fidelity
  ▶ scene.dialogue_act.fidelity_default
  ▶ scene.format.fidelity_default
  ▶ persona.voice.fidelity_default
  ▶ "shape"
```

**Mode-specific notes**

- **Voiced** — Single model call; the fidelity directive is injected into the persona system prompt before the input lands.
- **Cinematic** — Three-stage pipeline (analysis → strategy → script); each stage receives a fidelity-specific instruction.
  `verbatim` ignores `target_seconds` (length follows your text); `shape` and `rewrite` honor it.
- **Basic** — No fidelity field. `text` is the literal script; the TTS step reads it verbatim by definition.

## Voices

Set `voice` to one of these IDs. Default is `ember`.

| ID | Gender | Best for |
| --- | --- | --- |
| ember | female | Warm, grounded — companions, onboarding. |
| nova | female | Bright, confident — announcements, launches. |
| wren | female | Crisp, precise — tutorials, product walkthroughs. |
| sage | female | Calm, measured — meditation, wellness. |
| marlow | female | Smoky, late-night — lifestyle, storytelling. |
| iris | female | Clear, professional — IVR, corporate narration. |
| juno | female | Authoritative — news, briefings. |
| lark | female | Upbeat, morning-show — ads, social spots. |
| dahlia | female | Rich, dramatic — audiobooks, fiction. |
| piper | female | Playful, sharp — UX prompts, quick reads. |
| atlas | male | Deep, steady — narration, documentaries. |
| archer | male | Energetic, confident — sales, pitches. |
| reese | male | Polished announcer — promos, trailers. |
| hugo | male | Avuncular, warm — explainers, narration. |
| onyx | male | Low, serious — cinematic, high-stakes. |
| cyrus | male | Charismatic, expressive — hosts, interviews. |
| bram | male | Grounded, deliberate — meditation, guidance. |
| kai | male | Bright, modern — tech explainers, podcasts. |
| dash | male | Quick, punchy — ads, shorts. |
| orin | male | Wise storyteller — long-form, audiobooks. |
| vesper | female | Soft, twilight — ASMR, sleep, bedtime content. |
| mira | female | Smooth, elegant — brand spots, luxury. |
| clove | female | Mature, earthy — documentary, spoken essay. |
| rhea | female | Maternal, round — kids' content, family. |
| finn | male | Casual, approachable — podcasts, banter. |
| silas | male | Serious, steady — corporate, training. |
| mateo | male | Smooth, charismatic — lifestyle, travel. |
| rowan | male | Youthful, fresh — social, short-form. |
| jasper | male | Refined, dry — commentary, review content. |
| ezra | male | Thoughtful, literary — essays, long-form reads. |

## Tones

`tone` sets delivery register. Default is `neutral`.

- neutral
- calm
- warm
- energetic
- authoritative
- intimate
- playful
- narrative
- urgent
- inspirational

## Languages

Set `languageCode` (BCP-47) to render the script in a specific language. Unknown tags still pass through to the TTS model; quality is best on the curated set below.

| Code | Language |
| --- | --- |
| `en-US` | English (US) |
| `en-GB` | English (UK) |
| `en-AU` | English (Australia) |
| `en-IN` | English (India) |
| `de-DE` | German |
| `es-ES` | Spanish (Spain) |
| `es-US` | Spanish (US) |
| `fr-FR` | French (France) |
| `fr-CA` | French (Canada) |
| `it-IT` | Italian |
| `pt-BR` | Portuguese (Brazil) |
| `nl-NL` | Dutch |
| `pl-PL` | Polish |
| `ru-RU` | Russian |
| `tr-TR` | Turkish |
| `ja-JP` | Japanese |
| `ko-KR` | Korean |
| `cmn-CN` | Mandarin Chinese |
| `hi-IN` | Hindi |
| `ar-XA` | Arabic |
| `id-ID` | Indonesian |
| `vi-VN` | Vietnamese |
| `th-TH` | Thai |

## Inline cue tags

Sprinkle `{{tag}}` markers in your script to shape local delivery. Unknown tags are left as-is.

Example:

```
"Hey — {{pause}} I have good news. {{happy}} We shipped it."
```

- **Pacing:** {{slow}}, {{fast}}, {{pause}}, {{pause-long}}
- **Nonverbal:** {{whisper}}, {{laugh}}, {{sigh}}, {{gasp}}, {{giggle}}, {{cry}}, {{shout}}
- **Upbeat:** {{happy}}, {{amused}}, {{excited}}, {{enthusiastic}}, {{optimistic}}, {{grateful}}, {{determined}}, {{confident}}, {{eager}}, {{hopeful}}, {{encouraging}}, {{loving}}, {{adoring}}
- **Curiosity:** {{curious}}, {{interested}}, {{awed}}, {{amazed}}, {{astonished}}, {{surprised}}, {{admiring}}
- **Stress:** {{nervous}}, {{anxious}}, {{frustrated}}, {{annoyed}}, {{agitated}}, {{confused}}, {{sad}}, {{fearful}}, {{tired}}, {{panicked}}, {{angry}}, {{disappointed}}, {{embarrassed}}
- **Tone:** {{serious}}, {{sarcastic}}, {{mischievous}}
- **Valence:** {{positive}}, {{neutral}}, {{negative}}

## Errors

Errors are JSON with a `message` field.

| Status | When |
| --- | --- |
| 400 | Bad body shape — missing text/input/context, invalid voice/tone/persona/move, an inline persona that fails v0.2 validation, or any field out of range (incl. inline persona doc >32 KB or a string >4000 chars). |
| 401 | Missing, invalid, revoked, or expired token. |
| 402 | Balance below the mode's floor, or generated audio would exceed balance. |
| 404 | Persona ID not found (unregistered starter id, or user-persona UUID not in this account). |
| 409 | Idempotency key matches an already-settled call. |
| 429 | Rate limit hit — global QPS bucket, account daily/monthly cap, or the inline-persona daily cap (100 inline calls / 24h / account). |
| 502 | Upstream generation pipeline failure. Safe to retry. |
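
The status table above suggests a simple client policy: retry on 502, reuse one `Idempotency-Key` across attempts, and stop on everything else. The sketch below is one way to express that. `classify`, `generate_with_retries`, and the `send` callable are hypothetical names, and treating 429 as "wait" is an assumption, since the table does not say how long to back off.

```python
import uuid
from typing import Callable


def classify(status: int) -> str:
    """Map an HTTP status from the errors table to a coarse client action."""
    if 200 <= status < 300:
        return "ok"
    if status == 502:
        return "retry"        # documented as safe to retry
    if status == 429:
        return "wait"         # assumption: back off before trying again
    if status == 409:
        return "duplicate"    # this Idempotency-Key already settled
    return "fail"             # 400, 401, 402, 404: fix the request, key, or balance


def generate_with_retries(send: Callable[[str], int], max_attempts: int = 3) -> tuple[str, int]:
    """Call send(idempotency_key) until it stops returning a retryable status.

    send is a caller-supplied function (hypothetical) that performs the POST with
    the given Idempotency-Key header and returns the HTTP status code.
    """
    key = str(uuid.uuid4())   # one key reused across every attempt of this logical call
    for attempt in range(1, max_attempts + 1):
        outcome = classify(send(key))
        if outcome != "retry":
            return outcome, attempt
    return "gave_up", max_attempts
```

The loop has no delay between attempts to keep the sketch short; a real client would likely sleep with backoff before each retry.
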