Speech to Text | Streaming API


Speech-to-Text Streaming API (STT Streaming API)

The Speech-to-Text API converts speech into text using Reverie's AI technology. It transcribes speech in real time across various Indian languages and audio formats.
It is a fully managed, continually trained solution that uses machine learning to combine knowledge of grammar, language structure, and the composition of audio and voice signals to transcribe speech accurately.


The prerequisites to set up and use the STT API are:
WWS Port
Speech Recognition Type: Continuous Speech Recognition
No. of Channel
Time Limit: Requests are limited to audio data of 180 seconds (3 minutes) or less in duration.
Note: The default audio duration limit is set to 15 seconds.
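The time limit can be checked on the client before any audio is sent. The following is a minimal sketch, not part of the Reverie API: it assumes the input is a WAV file and that the 15-second default acts as a configurable limit that can be raised up to the 180-second cap. The function names, the constants, and the 16 kHz sample rate used for the demo input are all illustrative assumptions.

```python
import io
import wave

MAX_REQUEST_SECONDS = 180   # upper bound stated in the Time Limit entry
DEFAULT_LIMIT_SECONDS = 15  # default stated in the note (assumed configurable)


def wav_duration_seconds(source) -> float:
    """Return the duration in seconds of a WAV file (path or file-like object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_time_limit(duration: float, limit: float = DEFAULT_LIMIT_SECONDS) -> bool:
    """Return True if the audio fits the limit; the limit may not exceed the cap."""
    if limit > MAX_REQUEST_SECONDS:
        raise ValueError("limit cannot exceed 180 seconds")
    return duration <= limit


def make_silence(seconds: float, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    duration = wav_duration_seconds(make_silence(20))
    print(fits_time_limit(duration))             # checked against the 15 s default
    print(fits_time_limit(duration, limit=180))  # checked against the 180 s cap
```

A check like this lets a client reject or split over-long audio locally instead of discovering the limit only after a request fails.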

Supported Languages

The solution understands regional accents and the bilingual nature of Indian speakers, and is dialect-agnostic. It transcribes audio in widely spoken Indian languages:
  1.
  2.
  3.
  4.
  5.
  6.
  7.
  8.
  9.
  10. Indian English
  11.
  12.
Note: Our Research and Development team is continuously working to enable all the leading Indian languages on the Speech-to-Text platform and to improve the accuracy of the existing models.

Key Features

The Speech-to-Text solution offers robust features that deliver a better user experience when interacting with products through voice commands:
Real-time Transcription
Pre-recorded audio files are transcribed into text in real time. Speech is decoded with high accuracy and confidence, even from lower-quality audio input.
Personalize Speech Model
Tailor speech recognition to transcribe domain-specific terms and boost transcription accuracy for specific words or phrases.
Noise Resistance
The solution decodes moderately noisy audio recorded in various environments without requiring additional noise cancellation.
Content Filtering
The obscenity filter detects inappropriate or unprofessional content in your audio data and filters profane words out of the text output.
Flexible Deployment
The API is platform-agnostic and supports both deployment models:
  • Cloud-based deployment
  • On-premise deployment

Supported Domains

Domain Name
A model trained to transcribe continuous speech irrespective of industry type.
Specially trained to transcribe audio containing banking and financial terminology. [Available for Punjabi, Odia, and Assamese on demand.]
Specially trained to transcribe audio containing E-commerce/FMCG terminology. [Available only for English and Hindi.]