
Introduction

Welcome to the Voicemaker API, your gateway to high-quality, customizable Text-to-Speech (TTS). Integrate real-time speech generation with support for voice tuning, SSML, multiple languages, and advanced Pro voice models that produce ultra-realistic, expressive audio for creators, developers, and enterprises.
Voicemaker offers multiple voice model families, each engineered to balance quality, performance, expressiveness, language coverage, and cost, so you can choose the model that best fits your use case.

Pro Voices

Our most advanced, ultra-realistic, and high-performance multilingual TTS models. Pro Voices deliver exceptional audio quality and natural expression, and are billed at higher character rates due to their enhanced capabilities and production-grade performance.
ProPlus - Expressive (Beta)
A state-of-the-art, prompt-driven voice model with rich emotional depth. Perfect for creative storytelling and performance-focused applications.
🎭 Deep emotional and expressive performance
🌍 70+ languages
💠 Best For: Storytelling, character voices, dubbing, roleplay
💲 Cost: 6× per character
See prompt guidelines: Expressive Model Prompt Guide.
ProPlus - High-Res
Studio-grade clarity and realism for polished professional production at scale.
🎧 Ultra-high fidelity audio output
🌍 30+ languages
💠 Best For: Media production, ads, video editing, broadcast
💲 Cost: 6× per character
ProPlus - Turbo
Optimized for real-time interactive applications such as AI voice agents, chatbots and low-latency systems.
⚡ Ultra-fast voice generation
🌍 30+ languages
⏱ Latency: ~250–300 ms
💠 Best For: Chatbots, assistants, live dialogue systems
💲 Cost: 3× per character
Pro2
Next-generation multilingual engine with enhanced support for Indian languages. Designed for cultural accuracy, phonetic clarity, and emotional expression.
💲 Cost: 3× per character
Pro1
Standard neural multilingual model with strong performance and cost efficiency.
💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)

Default Voices

AI1, AI2, AI3, AI4, AI5, AI6, HashCode
The most affordable neural voices for everyday production and high-volume TTS workloads.
🌐 Multi-language support
🌍 130+ languages
💠 Best For: Bulk TTS, internal tools, scalable applications
💲 Cost: 1× per character (2× for CJK: Chinese, Japanese, Korean)
Among the default voices, AI2 and AI3 generally offer the best balance of quality and performance.

Voice Model Comparison

| Model | Description | Languages | Cost |
| --- | --- | --- | --- |
| ProPlus Expressive | Emotionally rich, realistic performance | 70+ | 6× |
| ProPlus High-Res | Ultra-high clarity and studio-grade fidelity | 30+ | 6× |
| ProPlus Turbo | Low-latency, optimized for real-time applications | 30+ | 3× |
| Pro2 | High-quality next-gen multilingual neural speech | 30+ | 3× |
| Pro1 | Standard neural multilingual processing | 90+ | 1× (CJK = 2×) |
| Default Voices | Baseline quality and cost-efficient for scale | 130+ | 1× (CJK = 2×) |
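
The cost multipliers above can be turned into a rough estimate of billed characters. The following is a minimal sketch rather than Voicemaker's actual billing logic: the function name `billable_characters`, the `is_cjk` flag, and the use of `len(text)` as the character count are all assumptions made for illustration. How whitespace and SSML tags are counted, and whether the CJK rate is decided per request or per character, are not stated here.

```python
# Multipliers copied from the Voice Model Comparison table.
MULTIPLIERS = {
    "ProPlus Expressive": 6,
    "ProPlus High-Res": 6,
    "ProPlus Turbo": 3,
    "Pro2": 3,
    "Pro1": 1,
    "Default Voices": 1,
}

# The table lists the CJK rate (2x) only for these two families.
CJK_RATE_MODELS = {"Pro1", "Default Voices"}
CJK_MULTIPLIER = 2


def billable_characters(text: str, model: str, is_cjk: bool = False) -> int:
    """Estimate billed characters for one request.

    Assumption: every Unicode code point in `text` counts as one character,
    and `is_cjk` is set by the caller for Chinese, Japanese, or Korean text.
    """
    if model not in MULTIPLIERS:
        raise ValueError(f"Unknown model family: {model!r}")
    multiplier = MULTIPLIERS[model]
    if is_cjk and model in CJK_RATE_MODELS:
        multiplier = CJK_MULTIPLIER
    return len(text) * multiplier


if __name__ == "__main__":
    print(billable_characters("Hello, world!", "ProPlus Turbo"))  # 13 chars at 3x
    print(billable_characters("こんにちは", "Pro1", is_cjk=True))   # 5 chars at 2x
```

Under these assumptions, a 13-character English sentence sent to ProPlus Turbo is estimated at 39 billed characters, while the same text on Pro1 or a Default Voice is estimated at 13.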

Supported Languages

Voicemaker offers an extensive and ever-growing library of international languages, such as:
Afrikaans, Arabic, Armenian, Assamese, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Cebuano, Chichewa, Chinese (Cantonese), Chinese (Mandarin), Croatian, Czech, Danish, Dutch, English (US, UK, AU, IN, ZA), Estonian, Filipino, Finnish, French (FR/CA), Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Latvian, Lingala, Lithuanian, Luxembourgish, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Pashto, Persian, Polish, Portuguese (BR/PT), Punjabi, Romanian, Russian, Serbian, Sindhi, Slovak, Slovenian, Somali, Spanish (EU, MX, AR), Swahili, Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, Urdu, Vietnamese, Welsh, and many more.

Supported Formats

| Format | Sample Rate | Recommended Use |
| --- | --- | --- |
| MP3 | up to 48 kHz | General media, web, mobile |
| WAV | up to 48 kHz | Studio post-production |
| OGG | Variable | Web-optimized lightweight audio |
| PCM / μ-law / A-law | 8 kHz | Telephony (IVR, CCaaS, SIP) |
Enjoy premium, high-quality audio output on every plan, with seamless API integration included.
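
For telephony planning, the 8 kHz formats make payload size easy to estimate. The sketch below is illustrative only: the helper name `telephony_audio_bytes` and the encoding keys are hypothetical, mono output is assumed, and 16-bit samples are assumed for PCM because the bit depth is not stated here. μ-law and A-law use 8 bits per sample as defined by G.711.

```python
# Bits per sample for each telephony encoding.
# "pcm" at 16 bits is an assumption; mu-law and A-law are 8 bits per G.711.
BITS_PER_SAMPLE = {"pcm": 16, "mulaw": 8, "alaw": 8}


def telephony_audio_bytes(
    duration_seconds: float,
    encoding: str,
    sample_rate_hz: int = 8000,
    channels: int = 1,
) -> int:
    """Estimate the size of headerless audio for a given duration."""
    if encoding not in BITS_PER_SAMPLE:
        raise ValueError(f"Unknown encoding: {encoding!r}")
    samples = round(duration_seconds * sample_rate_hz) * channels
    return samples * BITS_PER_SAMPLE[encoding] // 8


if __name__ == "__main__":
    for name in ("pcm", "mulaw", "alaw"):
        print(name, telephony_audio_bytes(10, name), "bytes for 10 s")
```

With these assumptions, ten seconds of μ-law or A-law audio is about 80,000 bytes, and the same duration of 16-bit PCM is about 160,000 bytes.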

Modified at 2025-11-23 04:52:27