Voice infrastructure

Give your characters a
voice that stays in character.

One file defines who they are. One API call makes them talk.
For the people shipping companions, NPCs, and chat characters.

Open persona spec. No SDK lock-in. Pin it like a dependency.
Same line · three personas

One inn. Three rooms.

Three .persona.yaml files. Same input. Three completely different replies.

Player
I'm looking for a room. Just the night.
scene · comfort · reassure

Same API call; only the persona slug changes.

  1. Dry Butler

    The proprietor. Polite to the point of weaponization.

    persona · himaia/dry_butler
  2. Tavern Rogue

    The chancer at the bar. Sees angles other people miss.

    persona · himaia/tavern_rogue
  3. Weary GM

    The narrator. Fourteen character deaths and counting.

    persona · himaia/weary_gm
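
A minimal sketch of the "same input, three personas" idea, in Python. The field names follow the request example shown later on this page; using the "voiced" mode and the "comfort" scene for all three rows is an assumption made for illustration, and `build_payload` is a hypothetical helper, not part of any SDK.

```python
import json

# Persona slugs taken from the three rows above.
PERSONAS = ["himaia/dry_butler", "himaia/tavern_rogue", "himaia/weary_gm"]


def build_payload(persona: str, text: str, scene: str = "comfort") -> dict:
    """Return the JSON body for one generate call.

    Field names mirror the request example on this page; the "voiced"
    mode is assumed here rather than confirmed for this demo.
    """
    return {"mode": "voiced", "persona": persona, "scene": scene, "input": text}


line = "I'm looking for a room. Just the night."
payloads = [build_payload(slug, line) for slug in PERSONAS]

# Only the "persona" field differs between the three bodies.
for body in payloads:
    print(json.dumps(body))
```

Everything except the slug is held constant, so any difference in the replies would come from the persona file alone.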
Starter personas: open spec, forkable
Voices: across the roster
Fidelity modes: verbatim · shape · rewrite
Languages: out of the box
The problem

Most voice APIs hand you a sound.
They don't hand you a character.

01

Same voice, every scene.

A character whispering at 2am sounds like the one shouting at noon. Timbre is not a performance.

02

The persona slips.

Around turn fifteen the character forgets who they are. A nice voice doesn't fix this.

03

You glue the pipeline.

LLM, prompt, emotion tag, TTS, streaming. Every project, from scratch, badly.

What we ship

Persona is the unit.
Voice is the last hop.

01
One persona. Every scene.

Same identity, different room — comfort, banter, challenge — without losing who they are.

Swap format and dialogue act without persona drift.
02
One file. One API call.

A .persona.yaml you commit to git. A POST /v1/generate that returns audio.

Three modes, one endpoint. JSON in, WAV out.
03
A spec you own.

voice.persona is Apache-2.0. Pin it like a dependency. Fork it like one.

Open spec, closed runtime. The Stripe-shaped arc.

A character
who stays themselves.

The whole bet, one line
Show

One call.
WAV back.

Persona slug. Scene. Input.
That's the whole API surface for a character that stays in character.

Request · voiced mode
POST /v1/generate
Authorization: Bearer himaia_live_***
Content-Type: application/json

{
  "mode": "voiced",
  "persona": "himaia/warm_confidant",
  "scene": "comfort",
  "input": "You don't have to fix it tonight."
}

→ 200 OK   audio/wav
  x-himaia-seconds: 3.8
  x-himaia-charge-cents: 1
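
The same request can be sketched with nothing but the Python standard library. This is an illustration under stated assumptions: the page shows only the path, so `BASE_URL` is a placeholder host, and `build_request` and `generate` are hypothetical helper names. The header names and JSON fields are copied from the example above; treating `x-himaia-charge-cents` as an integer is also an assumption.

```python
import json
import urllib.request

# Placeholder host: the page gives the path (/v1/generate) but not the domain.
BASE_URL = "https://api.himaia.example"


def build_request(api_key: str, persona: str, scene: str, text: str) -> urllib.request.Request:
    """Assemble the POST /v1/generate request without sending it."""
    body = {"mode": "voiced", "persona": persona, "scene": scene, "input": text}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(api_key: str, persona: str, scene: str, text: str, out_path: str = "line.wav"):
    """Send the request, save the WAV body, and return (seconds, charge_cents)."""
    req = build_request(api_key, persona, scene, text)
    with urllib.request.urlopen(req, timeout=30) as resp:
        audio = resp.read()
        seconds = float(resp.headers.get("x-himaia-seconds", "0"))
        cents = int(resp.headers.get("x-himaia-charge-cents", "0"))
    with open(out_path, "wb") as f:
        f.write(audio)
    return seconds, cents


# Build (but do not send) the request from the example above.
req = build_request(
    "himaia_live_***",
    "himaia/warm_confidant",
    "comfort",
    "You don't have to fix it tonight.",
)
print(req.get_method(), req.full_url)
```

Calling `generate(...)` with a real key and host would write the audio to `line.wav`; `build_request` alone is enough to inspect what goes on the wire.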
Voices

Thirty voices. Named, pre-tuned.
Same persona, any voice. Same voice, any persona.

Hear them all
Who it's for

Pick the path
that fits.

Modders
Tavern Card power users.

Make your character cards talk without sounding like every other TTS demo. Drop in the SillyTavern extension; pick a persona; press play.

Indie teams
Companion + chat-character apps.

Ship voice without standing up an LLM-prompt-emotion-tag-TTS pipeline that breaks every release. Persona slug, scene, input — that's the whole call.

Game devs
Foundry GMs · Unity solo · IF tools.

Same NPC across every scene. One persona file, one plugin per surface. Coming next: a Foundry VTT extension and a Unity package.

Pricing

Start free. Scale when it works.

Full pricing
01
Free
$0

Enough to ship a demo and tell a friend.

  • 20 Voiced min / mo
  • 3 personas
  • himaia attribution
02
Creator
$19 / mo

Indie builders and one-person shops.

  • 300 Voiced min · mix tiers
  • 10 private personas
  • No attribution
03
Pro
$79 / mo

Teams shipping character apps.

  • 1,000 Voiced + 100 Cinematic
  • Unlimited private personas
  • Export
04
Scale
Usage

Retail rates for high-volume consumers.

  • $0.04 Basic · $0.06 Voiced
  • $0.15 Cinematic
  • SLA · priority queue
05
Enterprise
Custom

BYO-backend, hand-tuned, SOC 2.

  • Dedicated deploy
  • SSO · audit log
  • Paid pilot
Basic · $0.04 / min · scripted lines
Voiced · $0.06 / min · in-character, one call
Cinematic · $0.15 / min · long-form, three-stage

1 credit = 1 Cinematic min = 6 Voiced min = 15 Basic min. Overage auto-bills at retail with an 80% threshold warning. Pricing pending first-usage calibration.
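
The credit conversion above can be worked through in a few lines of Python. This is a sketch only: the minutes-per-credit ratios are the ones stated in the note, while the credit allotment of 7 and the reading of the 80% warning as "usage has reached 80% of the allotment" are assumptions made for illustration. `credits_used` and `should_warn` are hypothetical names.

```python
from fractions import Fraction

# Minutes of each tier that one credit buys, as stated in the pricing note.
MINUTES_PER_CREDIT = {"cinematic": 1, "voiced": 6, "basic": 15}


def credits_used(minutes: dict) -> Fraction:
    """Convert per-tier minutes into credits, kept exact with Fraction."""
    return sum(
        (Fraction(mins) / MINUTES_PER_CREDIT[tier] for tier, mins in minutes.items()),
        Fraction(0),
    )


def should_warn(used: Fraction, allotment: Fraction, threshold: Fraction = Fraction(80, 100)) -> bool:
    """True once usage reaches the threshold share of the allotment (assumed semantics)."""
    return used >= allotment * threshold


# Example mix: 2 Cinematic min + 12 Voiced min + 30 Basic min.
usage = {"cinematic": 2, "voiced": 12, "basic": 30}
used = credits_used(usage)  # 2/1 + 12/6 + 30/15 = 6 credits
print(used, should_warn(used, Fraction(7)))  # prints: 6 True
```

With a hypothetical allotment of 7 credits, the warning point sits at 5.6 credits, so 6 credits of usage trips it.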

Questions

The honest
version.

Still have one? Email Oz. Real reply, usually same day.

  • 01 Is this just a TTS wrapper?

    No. Persona is the unit, voice is the timbre. The runtime composes identity, scene, and idiolect at call time, then hands the line to TTS as the last hop. Most voice APIs stop at the timbre step.

  • 02 What about lock-in?

    voice.persona is Apache-2.0 on GitHub (fuselinkapp/himaia-voice-persona). You commit personas to git, fork the spec freely, and migrate runtimes when you want. The runtime is closed; the abstraction is open.

  • 03 Which TTS engine is under the hood?

    We don't pin a vendor on the wire. The spec is backend-opaque so we can swap or add providers without breaking your personas, and any upstream failure surfaces on the response — your retry logic stays the same.

  • 04 Do I have to use the SDK?

    No. himaia-sdk is two methods of convenience. curl works. Anything that can POST JSON and read a WAV stream works.

  • 05 Can I bring my own voice clone?

    Not on the public roster. Thirty pre-tuned named voices ship today; custom voices land later on the higher tiers alongside SLA and a dedicated deploy.

  • 06 What's actually free?

    20 Voiced minutes a month, no card. New accounts get the credits on signup; the same allotment refreshes every month if you stay on the free tier.
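
On the "Do I have to use the SDK?" answer above: reading the WAV that comes back needs no SDK either. The sketch below uses Python's standard `wave` module to measure a payload's duration, which could be compared against the `x-himaia-seconds` response header. It assumes the response is plain PCM WAV, which the page does not confirm, and it uses a locally generated silent clip (`make_silence`, a hypothetical stand-in) instead of a real response body.

```python
import io
import wave


def wav_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV to stand in for a real response body."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


sample = make_silence(0.5)
duration = wav_seconds(sample)
print(duration)  # prints: 0.5
```

In real use, `wav_seconds` would be called on the bytes read from the response instead of on `sample`.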

Ship the character.
Not the pipeline.

20 Voiced minutes a month.
Free. No card. No setup.

Start shipping