API Docs - Voicemaker API Platform

Welcome to the Voicemaker API, your gateway to high-quality, customizable Text-to-Speech (TTS). Integrate real-time speech generation with support for voice tuning, SSML, multiple languages, and advanced Pro voice models that produce ultra-realistic, expressive audio. The platform is built for creators, developers, and enterprises.

Voicemaker offers multiple voice model families, each engineered to balance quality, performance, expressiveness, language coverage, and cost - allowing you to choose the ideal model for your specific use case.

Pro Voices

Our most advanced, ultra-realistic, and high-performance multilingual TTS models. Pro Voices deliver exceptional audio quality and natural expression, and are billed at higher character rates due to their enhanced capabilities and production-grade performance.

ProPlus - Expressive (Beta)
A state-of-the-art, prompt-driven voice model with rich emotional depth. Perfect for creative storytelling and performance-focused applications.

🎭 Deep emotional and expressive performance

🌍 70+ languages

💠 Best For: Storytelling, character voices, dubbing, roleplay

💲 Cost: 4× per character

See prompt guidelines: Expressive Model Prompt Guide.

ProPlus - High-Res
Studio-grade clarity and realism for polished professional production at scale.

🎧 Ultra-high fidelity audio output

🌍 30+ languages

💠 Best For: Media production, ads, video editing, broadcast

💲 Cost: 4× per character

ProPlus - Turbo
Optimized for real-time interactive applications such as AI voice agents, chatbots and low-latency systems.

⚡ Ultra-fast voice generation

🌍 30+ languages

⏱ Latency: ~250–300 ms

💠 Best For: Chatbots, assistants, live dialogue systems

💲 Cost: 2× per character

Pro2
Next-generation multilingual engine with enhanced support for Indian languages. Designed for cultural accuracy, phonetic clarity, and emotional expression.

💲 Cost: 2× per character

Pro1
Standard neural multilingual model with strong performance and cost efficiency.

💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)

Default Voices

AI1, AI2, AI3, AI4, AI5, AI6, HashCode
The most affordable neural voices for everyday production and high-volume TTS workloads.

🌐 Multi-language support

🌍 130+ languages

💠 Best For: Bulk TTS, internal tools, scalable applications

💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)

Among the default voices, AI2 and AI3 generally offer the best balance of quality and performance.

Voice Model Comparison

Model	Description	Languages	Cost
ProPlus Expressive	Emotionally rich, realistic performance	70+	4×
ProPlus High-Res	Ultra-high clarity and studio-grade fidelity	30+	4×
ProPlus Turbo	Low-latency, optimized for real-time applications	30+	2×
Pro2	High-quality next-gen multilingual neural speech	30+	2×
Pro1	Standard neural multilingual processing	90+	1× (CJK = 2×)
Default Voices	Baseline quality and cost-efficient for scale	130+	1× (CJK = 2×)
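
The cost multipliers in the table can be turned into a rough estimate of billable characters. The sketch below is illustrative only: the model keys, the `billable_characters` function, and the `is_cjk` flag are hypothetical names, not part of the Voicemaker API. It also assumes that characters are counted as Unicode code points and that the CJK rate applies only to Pro1 and Default Voices, as listed in the table. How the platform actually counts spaces, SSML tags, or mixed-language text is not stated here.

```python
# Multipliers copied from the comparison table.
# The dictionary keys are hypothetical labels, not official API model IDs.
COST_MULTIPLIER = {
    "proplus-expressive": 4,
    "proplus-highres": 4,
    "proplus-turbo": 2,
    "pro2": 2,
    "pro1": 1,
    "default": 1,
}

# Only these two rows of the table list a separate CJK rate (assumption:
# other models keep their normal multiplier for CJK text).
CJK_RATE_MODELS = {"pro1", "default"}
CJK_MULTIPLIER = 2


def billable_characters(text: str, model: str, is_cjk: bool = False) -> int:
    """Estimate billable characters for `text` on the given model.

    Assumes one billable unit per Unicode code point (len(text)),
    scaled by the model's multiplier.
    """
    if model not in COST_MULTIPLIER:
        raise ValueError(f"unknown model: {model!r}")

    multiplier = COST_MULTIPLIER[model]
    if is_cjk and model in CJK_RATE_MODELS:
        multiplier = CJK_MULTIPLIER

    return len(text) * multiplier


if __name__ == "__main__":
    script = "Welcome to our product tour."
    for name in COST_MULTIPLIER:
        print(f"{name}: {billable_characters(script, name)} billable characters")
```

Under these assumptions, a 1,000-character script would count as 4,000 billable characters on ProPlus Expressive and 1,000 on Pro1, which makes the trade-off between expressiveness and cost easy to compare before choosing a model.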

Supported Languages

Voicemaker offers an extensive and ever-growing library of international languages, such as:

Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Chinese (Cantonese), Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English (US, UK, AU, IN, ZA), Estonian, Filipino, Finnish, French (FR/CA), Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese (BR/PT), Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Spanish (EU, MX, AR), Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, and many more.

Supported Formats

Format	Sample Rate	Recommended Use
MP3	up to 48 kHz	General media, web, mobile
WAV	up to 48 kHz	Studio post-production
OGG	Variable	Web-optimized lightweight audio
PCM / μ-law / A-law	8 kHz	Telephony (IVR, CCaaS, SIP)
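
The sample-rate limits in the table can be checked on the client side before a request is sent. This is a minimal sketch, assuming hypothetical format keys (`mp3`, `wav`, `ogg`, `pcm`, `mulaw`, `alaw`) and a hypothetical helper named `sample_rate_allowed`. The actual parameter names and accepted values of the Voicemaker API are not given in this section and may differ.

```python
# Rules transcribed from the formats table.
# "max"      -> any positive rate up to the limit
# "fixed"    -> exactly the listed rate
# "variable" -> no limit stated in the table
# Keys are hypothetical labels, not confirmed API values.
FORMAT_RULES = {
    "mp3": ("max", 48_000),
    "wav": ("max", 48_000),
    "ogg": ("variable", None),
    "pcm": ("fixed", 8_000),
    "mulaw": ("fixed", 8_000),
    "alaw": ("fixed", 8_000),
}


def sample_rate_allowed(fmt: str, sample_rate_hz: int) -> bool:
    """Return True if the sample rate fits the table's rule for `fmt`."""
    key = fmt.lower()
    if key not in FORMAT_RULES:
        raise ValueError(f"unknown format: {fmt!r}")
    if sample_rate_hz <= 0:
        return False

    kind, limit = FORMAT_RULES[key]
    if kind == "max":
        return sample_rate_hz <= limit
    if kind == "fixed":
        return sample_rate_hz == limit
    return True  # "variable": the table states no limit


if __name__ == "__main__":
    for fmt, rate in [("mp3", 44_100), ("wav", 96_000), ("mulaw", 8_000)]:
        print(fmt, rate, sample_rate_allowed(fmt, rate))
```

A check like this only mirrors the table. The API itself remains the authority on which combinations are accepted.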

Enjoy premium, high-quality audio output on every plan, with seamless API integration included.
