
WebSocket

The Voicemaker API supports WebSocket connections for real-time audio streaming. You send text over the connection and receive the generated audio back as a stream of chunks.

WebSocket Endpoint#

wss://developer.voicemaker.in/api/v1/voice/convert

Authentication#

Authentication is required. Pass your API key as a Bearer token in the Authorization header when establishing the WebSocket connection:
Authorization: Bearer YOUR-API-KEY
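
As a minimal sketch, the header above could be built in Python as follows. The helper name build_auth_headers is hypothetical and not part of the Voicemaker API; only the header name, the Bearer scheme, and the endpoint URL come from this page. The resulting dictionary would be handed to whichever WebSocket client library you use when opening the connection. The argument that accepts custom handshake headers differs between libraries and versions, so check your client's documentation.

```python
# Endpoint as documented on this page.
WS_ENDPOINT = "wss://developer.voicemaker.in/api/v1/voice/convert"


def build_auth_headers(api_key: str) -> dict:
    """Return handshake headers carrying the API key as a Bearer token.

    Hypothetical helper for illustration only.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key must be a non-empty string")
    return {"Authorization": f"Bearer {key}"}


# Replace the placeholder with your real key before connecting.
headers = build_auth_headers("YOUR-API-KEY")
```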

How It Works#

Connect: Establish a WebSocket connection to the endpoint with your API key in the Authorization header.
Send Text: Send a JSON payload containing Text, VoiceId, LanguageCode, and any optional settings. Text is limited to 3,000 characters. Large text inputs are automatically split and processed sequentially.
Receive Audio: For every message you send, the server streams back sequential base64-encoded audio chunks formatted as { "success": true, "audio": "..." }.
Completion Signal: Responses are delivered in sequence, and the final chunk includes isFinal: true. Use this flag to detect that the audio is complete. If you need a single audio file, concatenate the chunks on the client in the order received.
Timeout: The WebSocket connection automatically closes after one minute of inactivity. Each message you send resets this timeout.
Rate Limit: Supports up to 20 concurrent requests. (Contact support for higher limits.)
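
The receive and completion steps above can be sketched in Python as a function that consumes the server's JSON messages in order and returns the assembled audio. This is an illustration under assumptions, not the official client: the name collect_audio is hypothetical, the messages are passed in as an iterable of strings so the sketch stays independent of any particular WebSocket library, and each chunk is assumed to be base64-encoded on its own (so each one is decoded before concatenation).

```python
import base64
import json
from typing import Iterable


def collect_audio(messages: Iterable[str]) -> bytes:
    """Decode and concatenate audio chunks until the isFinal chunk arrives."""
    audio = bytearray()
    for raw in messages:
        msg = json.loads(raw)
        if not msg.get("success"):
            # Error payloads carry a "message" field instead of "audio".
            raise RuntimeError(msg.get("message", "request failed"))
        # Assumption: every chunk is a standalone base64 string.
        audio.extend(base64.b64decode(msg["audio"]))
        if msg.get("isFinal"):
            return bytes(audio)
    raise RuntimeError("stream ended before isFinal was received")


# Simulated server messages standing in for real WebSocket frames.
frames = [
    json.dumps({"success": True, "audio": base64.b64encode(b"abc").decode()}),
    json.dumps({"success": True, "audio": base64.b64encode(b"def").decode(),
                "isFinal": True}),
]
result = collect_audio(frames)  # b"abcdef"
```

With a real connection, the iterable would be the sequence of text frames received after sending one request message.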

Send Text (JSON message)#

{
    "Engine": "neural",
    "VoiceId": "ai3-Jony",
    "LanguageCode": "en-US",
    "Text": "Welcome to Voicemaker API.",
    "OutputFormat": "mp3",
    "SampleRate": "48000",
    "Effect": "default",
    "MasterVolume": "0",
    "MasterSpeed": "0",
    "MasterPitch": "0"
}
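
A message like the one above could be assembled in Python roughly as follows. The function name build_message and the client-side length check are assumptions for illustration; the server performs its own validation, and this page does not say whether SSML tags count toward the 3,000-character limit.

```python
import json

MAX_TEXT_CHARS = 3000  # limit stated in "How It Works"


def build_message(text: str, voice_id: str, language_code: str, **optional) -> str:
    """Serialize a request message with the three required fields.

    Optional settings (Engine, OutputFormat, ...) are passed through
    unchanged as keyword arguments. Hypothetical helper.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("Text is required and must be a non-empty string")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"Text exceeds {MAX_TEXT_CHARS} characters")
    payload = {"Text": text, "VoiceId": voice_id, "LanguageCode": language_code}
    payload.update(optional)
    return json.dumps(payload)


message = build_message(
    "Welcome to Voicemaker API.",
    "ai3-Jony",
    "en-US",
    Engine="neural",
    OutputFormat="mp3",
)
```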

Request Parameters#

Required Fields#

Text (*): The text content to convert to speech. Supports SSML tags.
VoiceId (*): The ID of the voice to use for speech synthesis. (e.g., ai3-Jony, ai3-Aria)
LanguageCode (*): The language code for the voice. (e.g., en-US, en-GB, multi-lang for Pro voices)

Optional Fields#

Engine: standard, neural (Default: neural)
OutputFormat: mp3, wav (Default: mp3)
SampleRate: Audio sample rate. Common values: 22050, 24000, 44100, 48000
Effect: Voice effect to apply. (e.g., default, whispered, happy, sad, angry, excited, friendly)
MasterSettings: advanced_v1, advanced_v2 (Default: advanced_v1)
MasterVolume: Volume adjustment: -20 to 20 (Default: 0)
MasterSpeed: Speed adjustment: -100 to 100 (Default: 0)
MasterPitch: Pitch adjustment: -100 to 100 (Default: 0)
AccentCode: Accent code for multilingual voices. (e.g., en-US, en-GB, fr-FR)
CustomFileName: Custom filename for the output audio file.
Stability (ProPlus voices only): Stability setting: 0 to 100 (Default: 50)
Similarity (ProPlus voices only): Similarity setting: 0 to 100 (Default: 80)
ProEngine (ProPlus voices only): turbo, highres, expressive (Default: highres)
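
The ranges and allowed values listed above could be checked on the client before a message is sent. The sketch below is one possible approach, not the server's actual validation logic: check_options and both lookup tables are hypothetical names, and numeric settings are accepted as strings or numbers because the sample message sends them as strings.

```python
# Numeric ranges and enumerated values taken from the list above.
RANGES = {
    "MasterVolume": (-20, 20),
    "MasterSpeed": (-100, 100),
    "MasterPitch": (-100, 100),
    "Stability": (0, 100),
    "Similarity": (0, 100),
}
CHOICES = {
    "Engine": {"standard", "neural"},
    "OutputFormat": {"mp3", "wav"},
    "MasterSettings": {"advanced_v1", "advanced_v2"},
    "ProEngine": {"turbo", "highres", "expressive"},
}


def check_options(options: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            try:
                number = float(value)
            except (TypeError, ValueError):
                problems.append(f"{name} must be numeric")
                continue
            if not low <= number <= high:
                problems.append(f"{name} must be between {low} and {high}")
        elif name in CHOICES and str(value) not in CHOICES[name]:
            problems.append(f"{name} must be one of {sorted(CHOICES[name])}")
    return problems


issues = check_options({"MasterVolume": "0", "MasterSpeed": "150", "Engine": "neural"})
# issues == ["MasterSpeed must be between -100 and 100"]
```

Fields that are not listed in either table, such as SampleRate or Effect, are passed over without comment because this page gives only example values for them.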

Not Supported#

VoxFx: Audio is streamed in chunks, so VoxFx is not supported over WebSocket. To use VoxFx effects, please use the REST API instead.

Response Format#

Per-message response#
{
    "success": true,
    "audio": "base64-encoded-audio-chunk"
}
Final chunk response#
{
    "success": true,
    "audio": "base64-encoded-audio-chunk",
    "isFinal": true
}
Error response#
{
    "success": false,
    "message": "Validation error",
    "errors": ["Text is required and must be a non-empty string"]
}
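
The three response shapes above could be told apart on the client along these lines. This is a sketch under assumptions: VoicemakerError and parse_response are hypothetical names, and a message is treated as an error whenever success is not true.

```python
import json


class VoicemakerError(Exception):
    """Raised when a response has success set to false (hypothetical name)."""

    def __init__(self, message: str, errors: list):
        super().__init__(message)
        self.errors = errors


def parse_response(raw: str) -> tuple:
    """Return (base64_audio, is_final) or raise VoicemakerError."""
    msg = json.loads(raw)
    if msg.get("success") is not True:
        raise VoicemakerError(msg.get("message", "unknown error"),
                              msg.get("errors", []))
    # isFinal is only present on the last chunk, so default to False.
    return msg["audio"], bool(msg.get("isFinal", False))


chunk = parse_response('{"success": true, "audio": "QUJD"}')
final = parse_response('{"success": true, "audio": "REVG", "isFinal": true}')

try:
    parse_response('{"success": false, "message": "Validation error", '
                   '"errors": ["Text is required and must be a non-empty string"]}')
except VoicemakerError as exc:
    failure = (str(exc), exc.errors)
```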
Modified at 2025-11-22 10:45:41