Speech-to-Text (Streaming) API
The Speech-to-Text (Streaming) API accurately converts speech into text using Reverie’s AI technology. It transcribes speech in real time and supports various Indian languages and audio formats.
How Does the API Work?
- The process to transcribe continuous audio input:
  - Open the connection with the STT service by defining `apikey`, `appid`, `appname`, `domain`, and `src_lang`.
  - In the API response, if `cause = Ready`, the connection is successfully established.
  - Write the speech data into the upstream and continuously receive the transcribed data.
    - Note: In the response, if `final = false`, the audio is only partially transcribed and the service is still processing the input data.
  - Write `--EOF--` into the upstream to stop the recognition process.
    - Note: If you fail to write `--EOF--` into the upstream, the STT service will automatically terminate the recognition process.
  - Below are the scenarios when the service will auto-terminate the recognition process:
    - On connection timeout.
    - After recording starts, if the user remains silent for longer than the defined duration.
  - In the API response, if `final = true`, the text received is considered the final transcript.
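The steps above can be sketched as a small client loop. This is a minimal sketch, not Reverie's client: the `conn` object with `send`/`recv` methods is a hypothetical stand-in for whatever WebSocket library is used, the `text` key is an assumed name for the transcript field, and sending `--EOF--` as raw bytes is also an assumption. For brevity it sends all audio before reading results, whereas a real client would read partial results while still streaming.

```python
import json
from typing import Iterable

EOF_MARKER = b"--EOF--"  # assumption: the marker is sent as raw bytes


def transcribe(conn, chunks: Iterable[bytes]) -> str:
    """Stream audio chunks over `conn` and return the final transcript.

    `conn` is a hypothetical object exposing send(bytes) and recv() -> str,
    where recv() returns one JSON-encoded response per call.
    """
    # Step 1: the first message should report cause = Ready.
    first = json.loads(conn.recv())
    if str(first.get("cause", "")).lower() != "ready":
        raise RuntimeError(f"connection not ready: {first.get('cause')}")

    # Step 2: write the speech data into the upstream.
    for chunk in chunks:
        conn.send(chunk)

    # Step 3: tell the service to stop the recognition process.
    conn.send(EOF_MARKER)

    # Step 4: read responses until one arrives with final = true.
    while True:
        msg = json.loads(conn.recv())
        if msg.get("final"):
            return msg.get("text", "")  # "text" is an assumed field name
```

Comparing `cause` case-insensitively is a defensive choice, since the exact casing of the ready message may differ between responses.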
Code Snippets for Integration
Initiating Speech Service
URL Elements
CDN
Add the following script to the head section of your index.html
Implementation
Response
Successful Response - Establishing the connection
Partial Utterance - In-between an utterance
The Final Successful Response
Error Response
API References
Query Parameters
A unique key/token provided by Reverie to identify the user of the STT API.
A unique account ID that identifies the user and the default account settings.
The parameter that identifies the API.
Note: The only allowed value is `stt_stream`.
- Indicates the language in which the audio is spoken.
- Specify the ISO language code.
- Example: `hi`
- Refer to the Language Codes section for valid language codes.
- The domain (universe) in which the Streaming STT API is used to transcribe speech.
- Specify the domain ID.
- Example: `generic`
- Refer to the Domains section for valid domain codes.
- The duration to keep a connection open between the application and the STT server.
- Note: The default timeout is `15` seconds, and the maximum allowed is `180` seconds.
- The length of silence, following received speech data, after which the connection is ended automatically.
- Example:
  - Consider `silence = 15` seconds: if you speak for 60 seconds and then remain silent, the connection stays open for the next 15 seconds and is then disconnected automatically.
- Note: The default silence is `1` second, and the maximum silence is `30` seconds.
- The audio sampling rate and the data format of the speech.
- Refer to the Supported Audio Formats section for valid audio format codes.
- Note: By default, the format is `16k_int16` (WAV, signed 16-bit, 16,000 Hz).
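To make the default format concrete: `16k_int16` means 16,000 samples per second at 16 bits (2 bytes) per sample. The sketch below computes how many bytes a given duration of audio occupies. It assumes single-channel (mono) audio, which this page does not state, and `chunk_size_bytes` is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 16_000   # from the default format 16k_int16
BYTES_PER_SAMPLE = 2      # signed 16-bit samples
CHANNELS = 1              # assumption: mono audio


def chunk_size_bytes(duration_ms: int) -> int:
    """Return the number of bytes in `duration_ms` of 16k_int16 audio."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS
```

Under these assumptions, a 100 ms chunk is 3,200 bytes and one second of audio is 32,000 bytes, which can help when choosing how much data to write to the upstream at a time.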
- Possible values are:
  - `true`: stores audio and keeps transcripts in logs.
  - `no_audio`: does not store audio but keeps transcripts in logs.
  - `no_transcript`: does not keep transcripts in logs but stores audio.
  - `false`: keeps neither audio nor transcripts in logs.
- Default value is `true`.
- Enables punctuation and capitalisation in the transcript. Accepted values are `true` and `false`.
- Supported languages: `en`, `hi`
- Default value is `true`.
- Enables continuous decoding even after silence is detected.
- Accepted values are `true`/`1` or `false`/`0`.
- Default value is `false`/`0`.
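The query parameters can be assembled into a connection URL along these lines. This is a sketch under assumptions: the base URL `wss://stt.example.invalid/stream` is a placeholder (not Reverie's endpoint), `build_stream_url` is a hypothetical helper, and the parameter names `timeout`, `silence`, and `format` are inferred from the notes above. Only the documented maximums are checked.

```python
from urllib.parse import urlencode

BASE_URL = "wss://stt.example.invalid/stream"  # placeholder, not the real endpoint
MAX_TIMEOUT_S = 180  # documented maximum timeout
MAX_SILENCE_S = 30   # documented maximum silence


def build_stream_url(apikey: str, appid: str, src_lang: str, domain: str,
                     timeout: int = 15, silence: int = 1,
                     audio_format: str = "16k_int16") -> str:
    """Build a streaming URL from the documented query parameters."""
    if timeout > MAX_TIMEOUT_S:
        raise ValueError(f"timeout must not exceed {MAX_TIMEOUT_S} seconds")
    if silence > MAX_SILENCE_S:
        raise ValueError(f"silence must not exceed {MAX_SILENCE_S} seconds")
    params = {
        "apikey": apikey,
        "appid": appid,
        "appname": "stt_stream",  # the only allowed value
        "src_lang": src_lang,
        "domain": domain,
        "timeout": timeout,       # assumed parameter name
        "silence": silence,       # assumed parameter name
        "format": audio_format,   # assumed parameter name
    }
    return BASE_URL + "?" + urlencode(params)
```

For example, `build_stream_url("MY_KEY", "MY_APP", "hi", "generic")` yields a URL that carries the defaults of 15 seconds for the timeout and 1 second for silence.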
Request
- The audio streamed from the input device.
- Note: The default timeout = 15 seconds, and the maximum time allowed = 180 seconds.
Response
- The API auto-assigns a unique identification number to each request.
- Indicates the functional status of the API:
  - If `success = true`, the API is functioning and ready to generate output.
  - If `success = false`, the API is not functional and has encountered errors.
- Reports whether the received output is partial or final:
  - If `final = true`, the received text is the final output.
  - If `final = false`, the received text is partial and the service is still processing the input.
- The reason for obtaining the final output.
- The cause appears for both successful and failed requests.
- Refer to the API Codes section to view the list of messages/causes and their descriptions.
- The streaming audio input is converted into text format in the requested language.
- Note: The field will remain empty in case of any error or on connect.
- The level of confidence that the Streaming STT API has in the accuracy of the transcription.
- The confidence score ranges from `0` to `1`. Higher scores indicate greater confidence in the transcription.
- The beautified text of the final transcript.
- If the final transcript contains digits, URLs, or app names, they are converted to a readable format for the user.
- Note: The field remains empty in case of an error or on connect.
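A client might model the response fields described above as a small typed record. This is a sketch: `success`, `final`, and `cause` are named on this page, while the keys `id`, `text`, `confidence`, and `display_text` are assumed names for the remaining fields and should be checked against an actual response.

```python
import json
from dataclasses import dataclass


@dataclass
class SttResponse:
    id: str                    # assumed key for the request identifier
    success: bool
    final: bool
    cause: str
    text: str = ""             # assumed key for the transcript
    confidence: float = 0.0    # assumed key, expected range 0 to 1
    display_text: str = ""     # assumed key for the beautified transcript


def parse_response(raw: str) -> SttResponse:
    """Parse one JSON response, tolerating missing or empty fields."""
    data = json.loads(raw)
    return SttResponse(
        id=str(data.get("id", "")),
        success=bool(data.get("success", False)),
        final=bool(data.get("final", False)),
        cause=str(data.get("cause", "")),
        text=data.get("text") or "",
        confidence=float(data.get("confidence") or 0.0),
        display_text=data.get("display_text") or "",
    )
```

Defaulting the text fields to empty strings mirrors the note that they remain empty on connect or on error.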
Handling Error
The Streaming STT API can report errors for many reasons, such as a failed connection, invalid parameters, authentication errors, and network unavailability. Each error response includes a specific, human-readable message so that users can react to errors more effectively.
In the API response, if `success = false`, then `cause` gives the reason for the error. Refer to API Codes to view the list of error messages and their descriptions.
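One way to surface these errors in client code is to raise an exception whenever `success` is false, carrying the `cause` along. `SttError` and `raise_for_error` are hypothetical names used only for illustration, and the response is assumed to have already been decoded into a dictionary.

```python
class SttError(Exception):
    """Raised when a response reports success = false."""

    def __init__(self, cause: str):
        super().__init__(f"STT request failed: {cause}")
        self.cause = cause


def raise_for_error(response: dict) -> dict:
    """Return the response unchanged, or raise SttError if it failed."""
    if not response.get("success", False):
        raise SttError(str(response.get("cause", "unknown")))
    return response
```

Keeping the raw `cause` on the exception lets calling code look it up in the message table instead of parsing the error string.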
API Messages
Message Code | Message | Description |
---|---|---|
403 | Forbidden | Invalid credentials were entered. |
403 | Forbidden | The provided credit limit is exhausted. |
403 | Forbidden | The API key provided to the user has expired. |
403 | Forbidden | The user is not authorized to use the STT API. |
403 | Forbidden | An invalid language code was passed, or the user is not authorized to use the specified language code. |
400 | no decoder for given args | The STT service is not available for the language and domain given in the request. |
200 | unsupported audio format | The API does not support the audio format. |
200 | ready | STT is ready to receive audio data |
200 | partial | This is a partial response. Keep waiting for the final response. |
200 | EOF received | This is the final response. The connection will be closed because EOF was received in the request. |
200 | silence detected | This is the final response. The connection will be closed because silence was detected in the audio. |
200 | timeout | This is the final response. The connection will be closed because no data was received. |
200 | audio too short | The audio is too short to transcribe. |
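The messages in the table can be grouped by what the client should do next. The mapping below is a sketch built only from the table rows: `classify_cause` and the category names are hypothetical, and matching case-insensitively is an assumption about how the messages are cased in real responses.

```python
# Messages the table describes as "the final response".
FINAL_CAUSES = {"eof received", "silence detected", "timeout"}


def classify_cause(cause: str) -> str:
    """Map a message from the table to 'ready', 'partial', 'final', or 'other'."""
    key = cause.strip().lower()
    if key == "ready":
        return "ready"      # safe to start writing audio
    if key == "partial":
        return "partial"    # keep waiting for the final response
    if key in FINAL_CAUSES:
        return "final"      # the connection will be closed
    return "other"          # e.g. Forbidden, unsupported audio format
```

Anything falling into `other` is best looked up in the table by its exact message, since the table does not say how every message should be handled.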