Speech to Text | File API

Overview

Speech-to-Text File API (STT File API)

The Speech-to-Text File API accurately converts speech into text using Reverie's AI technology. It transcribes speech in real time across various Indian languages and audio formats.

The solution is fully managed and continually trained. It uses machine learning to combine knowledge of grammar, language structure, and the composition of audio and voice signals to transcribe speech accurately.

Prerequisites

Speech Recognition Type: File Upload

No. of Channels: 1

Time Limit: The audio file must be 300 seconds (5 minutes) or shorter.
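The channel and time limits above can be checked locally before a file is uploaded. The following is a minimal sketch, assuming the audio is an uncompressed WAV file readable by Python's standard `wave` module; the function name `check_wav` and the constants are illustrative and are not part of the STT File API.

```python
import wave

MAX_SECONDS = 300      # time limit stated in the prerequisites
REQUIRED_CHANNELS = 1  # single-channel (mono) audio


def check_wav(path):
    """Return a list of problems; an empty list means both limits are met."""
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        # Duration in seconds = number of frames / frames per second.
        duration = wav.getnframes() / wav.getframerate()
    if channels != REQUIRED_CHANNELS:
        problems.append(
            f"expected {REQUIRED_CHANNELS} channel, found {channels}"
        )
    if duration > MAX_SECONDS:
        problems.append(
            f"duration {duration:.1f}s exceeds the {MAX_SECONDS}s limit"
        )
    return problems
```

Calling `check_wav("recording.wav")` on a hypothetical file returns an empty list when the file is mono and no longer than 300 seconds. Other audio formats would need a different reader.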

Supported Languages

The STT engine understands regional accents and the bilingual nature of Indian speakers, and it is dialect-agnostic. It transcribes audio in these widely spoken Indian languages:

  1. Hindi

  2. Bengali

  3. Gujarati

  4. Tamil

  5. Telugu

  6. Marathi

  7. Malayalam

  8. Kannada

  9. Punjabi

  10. Indian English

  11. Assamese

  12. Odia

Note: Our Research and Development team is continuously working to enable all leading Indian languages on the Speech-to-Text platform and strives to improve the accuracy of the existing models.

Key Features

The Speech-to-Text solution offers robust features that help you deliver a better user experience in your products through voice commands:

Real-time Transcription

Pre-recorded audio files are transcribed accurately into text in real time. The solution decodes speech with high accuracy and confidence, even from lower-quality audio input.

Personalize Speech Model

Tailor speech recognition to transcribe domain-specific terms and boost transcription accuracy for specific words or phrases.

Noise Resistance

The solution decodes moderately noisy audio recorded in various environments without requiring additional noise cancellation.

Content Filtering

An obscenity filter detects inappropriate or unprofessional content in your audio data and filters profane words out of the text output.

Flexible Deployment

The API is platform-agnostic and supports both deployment models:

  • Cloud-based deployment

  • On-premise deployment

Supported Domains

Domain Name: Description

generic: The model is trained to transcribe audio files irrespective of industry type.

bfsi: The BFSI model is specially trained to accurately transcribe audio files that use banking and financial terminology.

ecomm

voice-search

alphanumeric

names

address

language

yes-no
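A client can catch misspelled domain names before a request is sent by checking them against the list above. The sketch below is illustrative only: the helper `normalize_domain` is hypothetical, and lowercasing the input is an assumption, since this page does not say whether the API treats domain names as case-sensitive.

```python
# Domain names exactly as listed on this page.
SUPPORTED_DOMAINS = frozenset({
    "generic",
    "bfsi",
    "ecomm",
    "voice-search",
    "alphanumeric",
    "names",
    "address",
    "language",
    "yes-no",
})


def normalize_domain(name):
    """Return the domain name in listed form, or raise ValueError."""
    # Assumption: trimming and lowercasing is acceptable for the API.
    key = name.strip().lower()
    if key not in SUPPORTED_DOMAINS:
        allowed = ", ".join(sorted(SUPPORTED_DOMAINS))
        raise ValueError(f"unknown domain {name!r}; expected one of: {allowed}")
    return key
```

For example, `normalize_domain(" BFSI ")` returns `"bfsi"`, while a name that is not in the list raises `ValueError`.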
