How Does API Work?

The process to transcribe the continuous audio input:
- Open the connection with the STT service by defining apikey, appid, appname, domain, src_lang.
- In the API response, if the cause = Ready, then the connection is successfully established.
- Write the speech data into the upstream and continuously receive the transcribed data.
  - Note: In the response, if the final = false, then the audio is partially transcribed, and the service is still processing the input data.
- Write --EOF-- into the upstream to stop the recognition process.
  - Note: If you fail to write --EOF-- into the upstream, then the STT service will automatically terminate the recognition process.
- Below are the scenarios when the service will auto-terminate the recognition process:
  - On connection timeout.
  - After starting recording, if the user maintains silence for more than the defined duration.
- In the API response, if the final = true, then the text received is considered the final transcript.

Code Snippets for Integration

Initiating Speech Service

URL Elements

wss://revapi.reverieinc.com/stream?apikey=<your apikey>&appid=<your app id>&appname=stt_stream&src_lang=hi&domain=generic

CDN
Add the following script to the head section of your index.html

<script src="https://cdn.jsdelivr.net/npm/reverie-stt-sdk/dist/bundle.js"></script>

Implementation

const ReverieClient = require("reverie-client");

const reverieClient = new ReverieClient({
apiKey: "YOUR-API-KEY",
appId: "YOUR-APP-ID",
});

await reverieClient.init_stt({
src_lang: "hi",
callback: (event) => {
console.log("STT event:", event);
},
element: document.getElementById("transcript"),
domain: "generic"

});

// Start/Stop streaming
await reverieClient.start_stt();
await reverieClient.stop_stt();

Response

Successful Response - Establishing the connection

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "",
  "display_text": "",
  "final": false,
  "confidence": 1,
  "cause": "ready"
}

Partial Utterance - In-between an utterance

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य एक सौ तीस करोड़ भारतवासियो",
  "display_text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य 130 करोड़ भारतवासियों",
  "final": false,
  "confidence": 0.797274,
  "cause": "partial"
}

The Final Successful Response

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य एक सौ तीस करोड़ भारतवासियों को आत्मनिर्भर बनाना है ताकि देश का हर नागरिक संकट की इस घड़ी में कदम से कदम मिलाकर चल सके",
  "display_text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य 130 करोड़ भारतवासियों को आत्मनिर्भर बनाना है  ताकि देश का हर नागरिक संकट की इस घड़ी में कदम से कदम मिलाकर चल सके",
  "final": true,
  "confidence": 0.743304,
  "cause": "EOF received"
}

Error Response

{
  "id": "2ab3a0b76c854953a022df742f6b3857a76494acd72e4489",
  "success": false,
  "text": "",
  "final": true,
  "confidence": 1,
  "cause": "no `domain` given"
}

API References

Query Parameters

apikey

string

required

A unique key/token is provided by Reverie to identify the user using the STT API.

appid

string

required

A unique account ID to identify the user and the default account settings

appname

string

required

The parameter to identify the API.
Note: The value allowed is stt_stream

src_lang

string

required

Indicates the language in which the audio is spoken.
Specify the ISO language code.
Example: hi
Refer to section Language Codes for valid language code.

domain

string

required

The universe in which the Streaming STT API is used for transcribing the speech.
Specify the domain ID.
Example: generic
Refer to section Domains for valid domain code.

timeout

float

The duration to keep a connection open between the application and the STT server.
Note: The default timeout = 15 seconds, and the maximum time allowed = 180 seconds

silence

float

The time to determine when to end the connection automatically after detecting the silence after receiving the speech data.
Example:
- Consider silence = 15 seconds i.e., On passing the speech for 60 seconds, and if you remain silent, the connection will be open for the next 15 seconds and then will automatically get disconnected.
Note: The default silence= 1 second, and the maximum silence = 30 seconds.

format

string

The audio sampling rate and the data format of the speech.
Refer to section Supported Audio Formats for valid audio format code.
Note: By default, the format = 16k_int16. (WAV, Signed 16 bit, 16,000 or 16K Hz).

logging

string

Possible values are :
1. true - stores audio and keep transcripts in logs.
2. no_audio - does not store audios but keep transcripts in logs.
3. no_transcript - does not keep transcripts in logs but stores audios.
4. false - does not keep neither audios nor transcripts in log.
Default value is true

punctuate

string

It will enable punctuation and capitalisation in the transcript. The values it can take are true and false.
Supported languages: en, hi
Default value is true

continuous

string

It will enable continuous decoding even after silence is detected.
Can take value true/1 or false/0.
Default value is false/0

Request

streaming audio

binary

required

The audio streamed from the input device.
Note: The default timeout = 15 seconds, and the maximum time allowed = 180 seconds.

Response

string

API will auto-assign assign a unique identification number for each request.

success

boolean

Will indicate the functional status of the API:
- If the success = true, then the API is functioning and ready to generate output.
- If the success = false, then the API is not functional and has some errors

final

boolean

Will report whether the received output is partial or final:
- If the final = true, then the received text is the final output.
- If the final = false, then the text received is partial and is still processing the file

cause

string

Reason for obtaining the final output.
The cause will appear for both successful and failed requests;
Refer to section API Codes to view the list of messages/ cause and its description.

text

string

The streaming audio input is converted into text format in the requested language.
Note: The field will remain empty in case of any error or on connect.

confidence

float

The level of confidence that Streaming STT API has in the accuracy of the transcription.
The Confidence score ranges from 0 to 1. Higher scores indicate greater relevance to the transcription

display_text

string

The beautified text of the final transcript.
If the final transcript consists of digits, URL, app names, it is quickly converted to a readable format for the user.
Note: The field will remain empty in case of any error or on connect

Handling Error

The Streaming STT API raises exceptions for many reasons, such as a failed connection, invalid parameters, authentication errors, and network unavailability. We provide more specific human-readable messages with an error response so that users can react to errors more. In the API response, if the success = false, then the cause will display the reason for the error. Refer to API Codes to view the list of error messages and its description.

{
  "id": "2ab3a0b76c854953a022df742f6b3857a76494acd72e4489",
  "success": false,
  "text": "",
  "final": true,
  "confidence": 1,
  "cause": "no `domain` given"
}

API Messages

Message Code	Message	Description
403	Forbidden	Entered Invalid credentials
403	Forbidden	The provided credit limit is exhausted.
403	Forbidden	The API key provided to a user is expired.
403	Forbidden	The user is not authorized to use the STT API.
403	Forbidden	The invalid language code is passed, or the user has entered the code, which he is not authorized to use.
400	no decoder for given args	For the language and domain given in the request, STT service is not available
200	unsupported audio format	The API does not support the audio format.
200	ready	STT is ready to receive audio data
200	partial	This is a partial response. Keep waiting for the final response.
200	EOF received	This is the final response. The connection will be closed as EOF received in the request.
200	silence detected	This is the final response. The connection will be closed as silence is received in the request.
200	timeout	This is the final response. The connection will be closed as no data is received.
200	audio too short	Audio too short to detect transcription

Getting Started

Usage Guides

API References

Use Cases

SDKs

Endpoints

Speech-to-Text (Streaming) API

How Does API Work?

Code Snippets for Integration

Initiating Speech Service

Implementation

Response

Successful Response - Establishing the connection

Partial Utterance - In-between an utterance

The Final Successful Response

Error Response

API References

Query Parameters

Request

Response

Handling Error

API Messages

Getting Started

Usage Guides

API References

Use Cases

SDKs

Endpoints

​How Does API Work?

​Code Snippets for Integration

​Initiating Speech Service

​Implementation

​Response

​Successful Response - Establishing the connection

​Partial Utterance - In-between an utterance

​The Final Successful Response

​Error Response

​API References

​Query Parameters

​Request

​Response

​Handling Error

​API Messages

How Does API Work?

Code Snippets for Integration

Initiating Speech Service

Implementation

Response

Successful Response - Establishing the connection

Partial Utterance - In-between an utterance

The Final Successful Response

Error Response

API References

Query Parameters

Request

Response

Handling Error

API Messages