Voice infrastructure

Give your characters a
voice that stays in character.

One file defines who they are. One API call makes them talk.
For the people shipping companions, NPCs, and chat characters.

Start free (20 Voiced min / mo) · Read the spec

Open persona spec. No SDK lock-in. Pin it like a dependency.

Apache-2.0 spec ↗
SillyTavern extension ↗
Two-method SDK ↗
Product Hunt · Apr 2026 ↗

Same line · three personas

One inn. Three rooms.

Three .persona.yaml files. Same input. Three completely different replies.

Player

“I'm looking for a room. Just the night.”

scene · comfort · reassure

Same API call; only the persona slug changes.

01
Dry Butler
The proprietor. Polite to the point of weaponization.
persona · himaia/dry_butler
02
Tavern Rogue
The chancer at the bar. Sees angles other people miss.
persona · himaia/tavern_rogue
03
Weary GM
The narrator. Fourteen character deaths and counting.
persona · himaia/weary_gm
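
As a rough sketch of what "only the persona slug changes" could look like in client code, the snippet below builds three request bodies for the player's line. The field names follow the request example shown further down this page; `build_payload` is a hypothetical helper, and running the demo in `"voiced"` mode is an assumption.

```python
import json

# Persona slugs from the three rows above.
PERSONAS = ["himaia/dry_butler", "himaia/tavern_rogue", "himaia/weary_gm"]


def build_payload(persona: str, line: str, scene: str = "comfort") -> dict:
    """Hypothetical helper: assemble one request body for POST /v1/generate."""
    return {
        "mode": "voiced",  # assumption: the demo runs in voiced mode
        "persona": persona,
        "scene": scene,
        "input": line,
    }


player_line = "I'm looking for a room. Just the night."
payloads = [build_payload(p, player_line) for p in PERSONAS]

# Each body is identical except for the "persona" field.
for body in payloads:
    print(json.dumps(body))
```

Keeping the slug as the only varying field means the calling code never branches per character.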

Starter personas: open spec, forkable

Voices: thirty across the roster

Fidelity modes: verbatim · shape · rewrite

Languages: out of the box

The problem

Most voice APIs hand you a sound.
They don't hand you a character.

Same voice, every scene.

A character whispering at 2am sounds like the one shouting at noon. Timbre is not a performance.

The persona slips.

Around turn fifteen the character forgets who they are. A nice voice doesn't fix this.

You glue the pipeline.

LLM, prompt, emotion tag, TTS, streaming. Every project, from scratch, badly.

What we ship

Persona is the unit.
Voice is the last hop.

One persona. Every scene.

Same identity, different room — comfort, banter, challenge — without losing who they are.

Swap format and dialogue act freely; the persona does not drift.

One file. One API call.

A .persona.yaml you commit to git. A POST /v1/generate that returns audio.

Three modes, one endpoint. JSON in, WAV out.

A spec you own.

voice.persona is Apache-2.0. Pin it like a dependency. Fork it like one.

Open spec, closed runtime. The Stripe-shaped arc.

A character
who stays themselves.

The whole bet, one line

Show

One call.
WAV back.

Persona slug. Scene. Input.
That's the whole API surface for a character that stays in character.

Read the reference · Open playground

Request · voiced mode

POST /v1/generate
Authorization: Bearer himaia_live_***
Content-Type: application/json

{
  "mode": "voiced",
  "persona": "himaia/warm_confidant",
  "scene": "comfort",
  "input": "You don't have to fix it tonight."
}

→ 200 OK   audio/wav
  x-himaia-seconds: 3.8
  x-himaia-charge-cents: 1
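
For readers who prefer code to raw HTTP, here is a minimal sketch of the same request using only the Python standard library. It prepares the request without sending it. The host in `BASE_URL` is a placeholder (this page does not state the API host), and `build_generate_request` is a hypothetical helper, not part of himaia-sdk.

```python
import json
import urllib.request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.invalid"


def build_generate_request(api_key: str, persona: str, scene: str, text: str) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) a voiced-mode request."""
    body = json.dumps({
        "mode": "voiced",
        "persona": persona,
        "scene": scene,
        "input": text,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/generate",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_generate_request(
    api_key="himaia_live_***",  # placeholder key, as in the example above
    persona="himaia/warm_confidant",
    scene="comfort",
    text="You don't have to fix it tonight.",
)
print(req.get_method(), req.full_url)

# To send it and save the audio (not executed here):
# with urllib.request.urlopen(req) as resp:
#     seconds = resp.headers.get("x-himaia-seconds")
#     with open("line.wav", "wb") as f:
#         f.write(resp.read())
```

Building the request separately from sending it keeps the payload easy to inspect and test offline.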

Voices

Thirty voices. Named, pre-tuned.
Same persona, any voice. Same voice, any persona.

Hear them all

Who it's for

Pick the path
that fits.

Modders

Tavern Card power users.

Make your character cards talk without sounding like every other TTS demo. Drop in the SillyTavern extension; pick a persona; press play.

ST extension ↗

Indie teams

Companion + chat-character apps.

Ship voice without standing up an LLM-prompt-emotion-tag-TTS pipeline that breaks every release. Persona slug, scene, input — that's the whole call.

Read the spec →

Game devs

Foundry GMs · Unity solo · IF tools.

Same NPC across every scene. One persona file, one plugin per surface. Coming next: a Foundry VTT extension and a Unity package.

Get on the list ↗

Pricing

Start free. Scale when it works.

Full pricing

Free

Enough to ship a demo and tell a friend.

- 20 Voiced min / mo
- 3 personas
- himaia attribution

Start free

Creator

$19 / mo

Indie builders and one-person shops.

- 300 Voiced min · mix tiers
- 10 private personas
- No attribution

Start Creator

Pro

$79 / mo

Teams shipping character apps.

- 1,000 Voiced + 100 Cinematic
- Unlimited private personas
- Export

Start Pro

Scale

Usage

Retail rates for high-volume consumers.

- $0.04 Basic · $0.06 Voiced
- $0.15 Cinematic
- SLA · priority queue

Contact

Enterprise

Custom

BYO-backend, hand-tuned, SOC 2.

- Dedicated deploy
- SSO · audit log
- Paid pilot

Contact

Basic · $0.04 / min · scripted lines

Voiced · $0.06 / min · in-character, one call

Cinematic · $0.15 / min · long-form, three-stage

1 credit = 1 Cinematic min = 6 Voiced min = 15 Basic min. Overage auto-bills at retail with an 80% threshold warning. Pricing pending first-usage calibration.
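
The credit ratios above can be sketched as a small calculation. This is an illustration only: the function names are hypothetical, the 20-credit allotment in the example is made up, and measuring the 80% warning in credits is an assumption. It uses only the stated ratios and does not model per-minute retail billing.

```python
from fractions import Fraction

# Credits consumed per minute, taken from
# "1 credit = 1 Cinematic min = 6 Voiced min = 15 Basic min".
CREDITS_PER_MIN = {
    "cinematic": Fraction(1, 1),
    "voiced": Fraction(1, 6),
    "basic": Fraction(1, 15),
}


def credits_used(minutes_by_tier: dict) -> Fraction:
    """Total credits for a mix of minutes across tiers."""
    return sum(
        (CREDITS_PER_MIN[tier] * minutes for tier, minutes in minutes_by_tier.items()),
        Fraction(0),
    )


def past_warning_threshold(used: Fraction, allotment: Fraction,
                           threshold: Fraction = Fraction(4, 5)) -> bool:
    """True once usage reaches the warning threshold (80% by default)."""
    return used >= allotment * threshold


# Example: 60 Voiced + 30 Basic + 5 Cinematic minutes = 10 + 2 + 5 credits.
used = credits_used({"voiced": 60, "basic": 30, "cinematic": 5})
print(used)  # 17
print(past_warning_threshold(used, Fraction(20)))  # True (17 >= 16)
```

Exact fractions avoid rounding drift when sixths and fifteenths of a credit accumulate over a month.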

Questions

The honest
version.

Still have one? Email Oz. Real reply, usually same day.

01 · Is this just a TTS wrapper?
No. Persona is the unit, voice is the timbre. The runtime composes identity, scene, and idiolect at call time, then hands the line to TTS as the last hop. Most voice APIs stop at the timbre step.
02 · What about lock-in?
voice.persona is Apache-2.0 on GitHub (fuselinkapp/himaia-voice-persona). You commit personas to git, fork the spec freely, and migrate runtimes when you want. The runtime is closed; the abstraction is open.
03 · Which TTS engine is under the hood?
We don't pin a vendor on the wire. The spec is backend-opaque so we can swap or add providers without breaking your personas, and any upstream failure surfaces on the response — your retry logic stays the same.
04 · Do I have to use the SDK?
No. himaia-sdk is two methods of convenience. curl works. Anything that can POST JSON and read a WAV stream works.
05 · Can I bring my own voice clone?
Not on the public roster. Thirty pre-tuned named voices ship today; custom voices land later on the higher tiers alongside SLA and a dedicated deploy.
06 · What's actually free?
20 Voiced minutes a month, no card. New accounts get the credits on signup; the same allotment refreshes every month if you stay on the free tier.
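
On question 04: "read a WAV" needs nothing beyond a standard library. The sketch below measures the duration of a WAV payload held in memory with Python's `wave` module. Since no real response is available here, `make_silent_wav` fabricates a stand-in body; both helper names are hypothetical, and it is an assumption that the API returns a standard WAV header with a valid frame count.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback length of a WAV payload held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Stand-in for a response body: mono 16-bit silence of the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


fake_response_body = make_silent_wav(0.5)
print(wav_duration_seconds(fake_response_body))  # 0.5
```

A check like this could be compared against the `x-himaia-seconds` response header as a sanity test.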

Ship the character.
Not the pipeline.

20 Voiced minutes a month.
Free. No card. No setup.

Start shipping
