Welcome to the Voicemaker API, your gateway to high-quality, customizable Text-to-Speech (TTS). Easily integrate real-time speech generation with support for voice tuning, SSML, multiple languages, and advanced Pro voice models for ultra-realistic, expressive audio. The API is built for creators, developers, and enterprises.
Voicemaker offers multiple voice model families, each engineered to balance quality, performance, expressiveness, language coverage, and cost, so you can choose the ideal model for your specific use case.
Pro Voices
Our most advanced, ultra-realistic, and high-performance multilingual TTS models. Pro Voices deliver exceptional audio quality and natural expression, and are billed at higher per-character rates due to their enhanced capabilities and production-grade performance.
ProPlus - Expressive (Beta)
A state-of-the-art, prompt-driven voice model with rich emotional depth. Perfect for creative storytelling and performance-focused applications.
🎭 Deep emotional and expressive performance
💠 Best For: Storytelling, character voices, dubbing, roleplay
ProPlus - High-Res
Studio-grade clarity and realism for polished, professional production at scale.
🎧 Ultra-high fidelity audio output
💠 Best For: Media production, ads, video editing, broadcast
ProPlus - Turbo
Optimized for real-time interactive applications such as AI voice agents, chatbots, and other low-latency systems.
⚡ Ultra-fast voice generation
💠 Best For: Chatbots, assistants, live dialogue systems
Pro2
Next-generation multilingual engine with enhanced support for Indian languages. Designed for cultural accuracy, phonetic clarity, and emotional expression.
Pro1
Standard neural multilingual model with strong performance and cost efficiency.
💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)
Default Voices
AI1, AI2, AI3, AI4, AI5, AI6, HashCode
Our most affordable neural voices for everyday production and high-volume TTS workloads.
💠 Best For: Bulk TTS, internal tools, scalable applications
💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)
Among the default voices, AI2 and AI3 generally offer the best balance of quality and performance.
Voice Model Comparison
| Model | Description | Languages | Cost (per character) |
|---|---|---|---|
| ProPlus Expressive | Emotionally rich, realistic performance | 70+ | 6× |
| ProPlus High-Res | Ultra-high clarity and studio-grade fidelity | 30+ | 6× |
| ProPlus Turbo | Low-latency, optimized for real-time applications | 30+ | 3× |
| Pro2 | High-quality next-gen multilingual neural speech | 30+ | 3× |
| Pro1 | Standard neural multilingual processing | 90+ | 1× (CJK = 2×) |
| Default Voices | Baseline quality, cost-efficient at scale | 130+ | 1× (CJK = 2×) |
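The multipliers in the table can be turned into a quick cost estimate. The sketch below is a minimal Python helper, not an official Voicemaker tool, and it rests on several assumptions: every character (including spaces and punctuation) is billed, CJK characters are detected with a rough Unicode-block check, and the 2× CJK surcharge is applied only to Pro1 and Default Voices because those are the only rows where the table lists it. The result is in base-rate character units, not currency, and the names `MULTIPLIERS`, `is_cjk`, and `billed_units` are hypothetical.

```python
"""Rough cost estimate from the per-character multipliers (sketch)."""

# Per-character multipliers copied from the comparison table.
MULTIPLIERS = {
    "ProPlus Expressive": 6,
    "ProPlus High-Res": 6,
    "ProPlus Turbo": 3,
    "Pro2": 3,
    "Pro1": 1,
    "Default Voices": 1,
}

# The table lists a 2x CJK surcharge only for these models
# (assumption: no surcharge applies to the other models).
CJK_SURCHARGE_MODELS = {"Pro1", "Default Voices"}


def is_cjk(ch: str) -> bool:
    """Rough CJK test by Unicode block; an assumption, not Voicemaker's rule."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def billed_units(text: str, model: str) -> int:
    """Return billable base-rate character units for `text` on `model`.

    Assumes every character, including spaces and punctuation, is billed.
    """
    rate = MULTIPLIERS[model]
    total = 0
    for ch in text:
        # Double the weight only for CJK characters on surcharged models.
        cjk_weight = 2 if (model in CJK_SURCHARGE_MODELS and is_cjk(ch)) else 1
        total += rate * cjk_weight
    return total


if __name__ == "__main__":
    print(billed_units("Hello", "Pro1"))
    print(billed_units("Hi 你好", "Default Voices"))
    print(billed_units("Hi 你好", "ProPlus Turbo"))
```

For example, `billed_units("Hi 你好", "Default Voices")` counts three Latin characters at 1× plus two CJK characters at 2×, giving 7 units, while the same text on ProPlus Turbo gives 15 (five characters at 3×, with no CJK surcharge assumed).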
Supported Languages
Voicemaker offers an extensive and ever-growing library of international languages, including: Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Chinese (Cantonese), Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English (US, UK, AU, IN, ZA), Estonian, Filipino, Finnish, French (FR/CA), Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese (BR/PT), Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Spanish (EU, MX, AR), Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, and many more.
Audio Output Formats
| Format | Sample Rate | Recommended Use |
|---|---|---|
| MP3 | Up to 48 kHz | General media, web, mobile |
| WAV | Up to 48 kHz | Studio post-production |
| OGG | Variable | Web-optimized lightweight audio |
| PCM / μ-law / A-law | 8 kHz | Telephony (IVR, CCaaS, SIP) |
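For the uncompressed telephony formats, the data rate follows directly from sample rate × bits per sample × channels. The sketch below assumes mono audio and 16-bit samples for linear PCM (the bit depth of the API's PCM output is not stated here); μ-law and A-law are 8-bit by definition in ITU-T G.711. MP3 and OGG are compressed, so their size cannot be derived this way. The helper names `bitrate_kbps` and `payload_bytes` are hypothetical.

```python
"""Data-rate estimate for uncompressed telephony formats (sketch)."""

# Bits per sample. G.711 mu-law and A-law are 8-bit by definition;
# 16-bit linear PCM is an assumption about the API's PCM output.
BITS_PER_SAMPLE = {"mulaw": 8, "alaw": 8, "pcm16": 16}


def bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Raw bitrate in kilobits per second (no headers, no compression)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def payload_bytes(duration_s: float, sample_rate_hz: int,
                  bits_per_sample: int, channels: int = 1) -> int:
    """Size of the raw sample data for `duration_s` seconds, headers excluded."""
    samples = round(duration_s * sample_rate_hz)
    return samples * channels * bits_per_sample // 8


if __name__ == "__main__":
    for fmt, bits in BITS_PER_SAMPLE.items():
        print(fmt, bitrate_kbps(8000, bits), "kbps,",
              payload_bytes(60, 8000, bits), "bytes per minute")
```

At 8 kHz, μ-law or A-law works out to 64 kbps, or 480,000 bytes of samples for one minute of speech, while 16-bit PCM doubles that to 960,000 bytes. By the same arithmetic, a 16-bit mono WAV at 48 kHz would carry 96,000 bytes of samples per second, before the file header.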
Enjoy premium, high-quality audio output on every plan, with seamless API integration included.