Speech to Text API

Convert speech to text by uploading an audio file to our transcription API.
Flagship V1 is our most accurate transcription model. It supports 90+ languages and SRT subtitle generation, and delivers high accuracy on long, complex recordings across diverse speakers, accents, and delivery styles.

Pricing: 5 credits per second of audio
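
As a rough illustration of the rate above, the cost of a file can be estimated from its duration. The function name `estimate_credits` is hypothetical, and rounding a fractional credit total up is an assumption; the pricing line does not say how partial seconds are billed.

```python
import math

CREDITS_PER_SECOND = 5  # rate stated in the pricing line above


def estimate_credits(duration_seconds: float) -> int:
    """Estimate the credit cost of transcribing an audio file.

    Assumption: a fractional credit total is rounded up. The actual
    billing rule for partial seconds is not documented here.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return math.ceil(duration_seconds * CREDITS_PER_SECOND)


print(estimate_credits(45))   # 225
print(estimate_credits(180))  # 900
```

At this rate, a 45-second clip costs 225 credits and a 3-minute file costs 900.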

Workflow:

Audio files 3 minutes or shorter are processed synchronously.

Audio files longer than 3 minutes are automatically processed asynchronously.
For asynchronous jobs, transcription runs in the background and may take additional time depending on file length.

If your audio is processed asynchronously, you can retrieve the transcription status and results using the Get Single Transcription API, which allows you to check pending or completed jobs.
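
The 3-minute rule above can be mirrored on the client to anticipate which path a file will take. This is only a sketch: the service makes the actual decision, and `processing_mode` is a hypothetical helper, not part of the API.

```python
SYNC_LIMIT_SECONDS = 3 * 60  # files of 3 minutes or shorter are processed synchronously


def processing_mode(duration_seconds: float) -> str:
    """Predict how the service will process a file of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if duration_seconds <= SYNC_LIMIT_SECONDS:
        return "synchronous"
    # Longer files run in the background; results are retrieved later
    # through the Get Single Transcription API.
    return "asynchronous"


print(processing_mode(45))   # synchronous
print(processing_mode(180))  # synchronous
print(processing_mode(181))  # asynchronous
```

A client might use this to decide up front whether to wait for the text in the upload response or to store the task ID and check back later.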

Supported Languages (90+ languages)

Request:

curl --location --request POST 'https://developer.voicemaker.in/api/v1/speech-to-text' \
  --form 'file=@""' \
  --form 'model=""' \
  --form 'language="auto"' \
  --form 'responseFormat=""' \
  --form 'includeSubtitle="false"' \
  --form 'tagAudioEvents="false"'
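
The same multipart request could be sketched in Python as follows. The helper names `build_form_fields` and `transcribe_file` are hypothetical, sending `"true"` for the boolean fields is an assumption based on the `"false"` strings in the curl example, and the third-party `requests` library is assumed to be installed. Accepted values for `model` and `responseFormat` are not listed here, so both are left as required arguments. The curl example shows no authentication, so none is added.

```python
from typing import Dict

API_URL = "https://developer.voicemaker.in/api/v1/speech-to-text"  # from the curl example


def build_form_fields(
    model: str,
    response_format: str,
    language: str = "auto",
    include_subtitle: bool = False,
    tag_audio_events: bool = False,
) -> Dict[str, str]:
    """Build the non-file form fields used in the curl example."""
    return {
        "model": model,
        "language": language,
        "responseFormat": response_format,
        # Assumption: booleans are sent as the strings "true" / "false".
        "includeSubtitle": "true" if include_subtitle else "false",
        "tagAudioEvents": "true" if tag_audio_events else "false",
    }


def transcribe_file(path: str, fields: Dict[str, str]) -> dict:
    """Upload an audio file as multipart form data (hypothetical helper)."""
    import requests  # third-party; imported lazily so build_form_fields has no dependencies

    with open(path, "rb") as audio:
        response = requests.post(API_URL, data=fields, files={"file": audio})
    response.raise_for_status()
    return response.json()
```

Keeping field construction separate from the upload makes the request parameters easy to inspect before any network call is made.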

Response:

{
  "success": true,
  "data": {
    "taskId": "6963d6514017c12417a5d2fb",
    "name": "176798484225230676tx65i0xs-voicemaker.in-speech.mp3",
    "fileName": "transcribe-1768150609567472.mp3",
    "speechFile": "https://developer.voicemaker.in/uploads-transcribe/transcribe-1768150609567472.mp3",
    "model": "stt-flagship-v1",
    "generatedText": "Erbongweni prison transfer. Plot to kill Kat Muthala in prison unearthed. Very corrupt senior government officials want Muthala dead before he exposes them.",
    "status": "completed",
    "charge": 225
  },
  "isProcessing": false,
  "usedChars": 225,
  "remainChars": 2417996,
  "remainKeyChars": 2417996
}
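
A client could read the fields of the sample response along these lines. This is a sketch under assumptions: `summarize_response` is a hypothetical helper, and the shapes of a failed response (`"success": false`) and of an asynchronous one (`"isProcessing": true` or a `status` other than `"completed"`) are guesses, since only a completed response is shown.

```python
from typing import Any, Dict


def summarize_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the task ID, completion flag, text, and charge from a response."""
    if not payload.get("success"):
        # Assumption: failures are signalled with "success": false.
        raise RuntimeError("transcription request was not successful")
    data = payload.get("data", {})
    done = data.get("status") == "completed" and not payload.get("isProcessing", False)
    return {
        "task_id": data.get("taskId"),  # kept for a later status lookup
        "done": done,
        "text": data.get("generatedText") if done else None,
        "charge": data.get("charge"),
    }


# Trimmed copy of the sample response above.
sample = {
    "success": True,
    "data": {
        "taskId": "6963d6514017c12417a5d2fb",
        "generatedText": "Erbongweni prison transfer.",
        "status": "completed",
        "charge": 225,
    },
    "isProcessing": False,
}

summary = summarize_response(sample)
print(summary["done"], summary["charge"])  # True 225
```

When `done` is false, the returned `task_id` is the value to pass to the Get Single Transcription API.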
