How Does the API Work?

  • The process to transcribe continuous audio input:
    • Open a connection to the STT service by specifying apikey, appid, appname, domain, and src_lang.
    • In the API response, if cause = ready, the connection has been established successfully.
    • Write the speech data to the upstream and continuously receive the transcribed data.
      • Note: If final = false in the response, the audio has been only partially transcribed and the service is still processing the input.
    • Write --EOF-- to the upstream to stop the recognition process.
      • Note: If you do not write --EOF-- to the upstream, the STT service terminates the recognition process automatically.
    • The service auto-terminates the recognition process in the following scenarios:
      • On connection timeout.
      • When, after recording has started, the user stays silent for longer than the defined silence duration.
    • In the API response, if final = true, the text received is the final transcript.
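The steps above can be sketched against any WebSocket-like object, that is, one exposing send() and an onmessage handler. This is a minimal illustration rather than the SDK's implementation: the function name runRecognition and the callback shape are hypothetical, sending --EOF-- as a text message is an assumption, and the audio is sent in one pass for brevity where a live application would send chunks as they are captured.

```javascript
// Hypothetical helper: drives one recognition session over a WebSocket-like
// object. `audioChunks` is an iterable of already-captured audio buffers.
function runRecognition(socket, audioChunks, onTranscript) {
  socket.onmessage = (event) => {
    const response = JSON.parse(event.data);

    if (response.success === false) {
      // On failure, `cause` carries the reason for the error.
      onTranscript({ error: response.cause });
      return;
    }

    if (response.cause === "ready") {
      // Connection established: write the speech data to the upstream,
      // then write --EOF-- to stop the recognition process.
      for (const chunk of audioChunks) {
        socket.send(chunk);
      }
      socket.send("--EOF--"); // assumption: sent as a plain text message
      return;
    }

    // final = false is a partial transcript; final = true is the final one.
    onTranscript({ text: response.text, final: response.final });
  };
}
```

In a browser, `socket` could be a standard WebSocket opened on the stream URL; in tests it can be a plain object with a send() method.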

Code Snippets for Integration

Initiating Speech Service

URL Elements

wss://revapi.reverieinc.com/stream?apikey=<your apikey>&appid=<your app id>&appname=stt_stream&src_lang=hi&domain=generic
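As an illustration, the URL can be assembled from its parameters with the standard URLSearchParams class, which also takes care of encoding. The helper name buildStreamUrl is hypothetical; the endpoint and parameter names are the ones shown above.

```javascript
// Hypothetical helper: builds the stream URL and checks that the five
// required query parameters are present.
function buildStreamUrl(params) {
  const required = ["apikey", "appid", "appname", "src_lang", "domain"];
  for (const name of required) {
    if (!params[name]) {
      throw new Error(`missing required parameter: ${name}`);
    }
  }
  const query = new URLSearchParams(params);
  return `wss://revapi.reverieinc.com/stream?${query.toString()}`;
}

const streamUrl = buildStreamUrl({
  apikey: "YOUR-API-KEY",
  appid: "YOUR-APP-ID",
  appname: "stt_stream",
  src_lang: "hi",
  domain: "generic",
});
console.log(streamUrl);
```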

CDN
Add the following script to the head section of your index.html:

<script src="https://cdn.jsdelivr.net/npm/reverie-stt-sdk/dist/bundle.js"></script>

Implementation

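const ReverieClient = require("reverie-client");

// Create the client with the credentials provided by Reverie.
const reverieClient = new ReverieClient({
  apiKey: "YOUR-API-KEY",
  appId: "YOUR-APP-ID",
});

// `await` is not valid at the top level of a CommonJS script,
// so the calls are wrapped in an async function.
async function main() {
  await reverieClient.init_stt({
    src_lang: "hi",
    domain: "generic",
    // Invoked for each STT event.
    callback: (event) => {
      console.log("STT event:", event);
    },
    // Page element with id "transcript", passed to the SDK.
    element: document.getElementById("transcript"),
  });

  // Start streaming, then stop once the user has finished speaking.
  await reverieClient.start_stt();
  await reverieClient.stop_stt();
}

main().catch(console.error);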

Response

Successful Response - Establishing the connection

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "",
  "display_text": "",
  "final": false,
  "confidence": 1,
  "cause": "ready"
}

Partial Utterance - In-between an utterance

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य एक सौ तीस करोड़ भारतवासियो",
  "display_text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य 130 करोड़ भारतवासियों",
  "final": false,
  "confidence": 0.797274,
  "cause": "partial"
}

The Final Successful Response

{
  "id": "bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073",
  "success": true,
  "text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य एक सौ तीस करोड़ भारतवासियों को आत्मनिर्भर बनाना है ताकि देश का हर नागरिक संकट की इस घड़ी में कदम से कदम मिलाकर चल सके",
  "display_text": "आत्म निर्भर योजना इस योजना अथवा अभियान का उद्देश्य 130 करोड़ भारतवासियों को आत्मनिर्भर बनाना है ताकि देश का हर नागरिक संकट की इस घड़ी में कदम से कदम मिलाकर चल सके",
  "final": true,
  "confidence": 0.743304,
  "cause": "EOF received"
}

Error Response

{
  "id": "2ab3a0b76c854953a022df742f6b3857a76494acd72e4489",
  "success": false,
  "text": "",
  "final": true,
  "confidence": 1,
  "cause": "no `domain` given"
}

API References

Query Parameters

apikey
string
required

A unique key/token provided by Reverie to identify the user of the STT API.

appid
string
required

A unique account ID that identifies the user and the default account settings.

appname
string
required

The parameter that identifies the API.
Note: The only allowed value is stt_stream.

src_lang
string
required
  • Indicates the language in which the audio is spoken.
  • Specify the ISO language code.
  • Example: hi
  • Refer to the Language Codes section for valid language codes.
domain
string
required
  • The domain (subject area) in which the Streaming STT API is used to transcribe speech.
  • Specify the domain ID.
  • Example: generic
  • Refer to the Domains section for valid domain codes.
timeout
float
  • The duration to keep a connection open between the application and the STT server.
  • Note: The default timeout = 15 seconds, and the maximum time allowed = 180 seconds
silence
float
  • The duration of silence, after speech data has been received, after which the connection is closed automatically.
  • Example:
    • With silence = 15 seconds: if you speak for 60 seconds and then remain silent, the connection stays open for another 15 seconds and is then closed automatically.
  • Note: The default silence = 1 second, and the maximum silence = 30 seconds.
format
string
  • The audio sampling rate and the data format of the speech.
  • Refer to section Supported Audio Formats for valid audio format code.
  • Note: By default, the format = 16k_int16. (WAV, Signed 16 bit, 16,000 or 16K Hz).
logging
string
  • Possible values are:
    1. true - stores audio and keeps transcripts in logs.
    2. no_audio - does not store audio but keeps transcripts in logs.
    3. no_transcript - does not keep transcripts in logs but stores audio.
    4. false - keeps neither audio nor transcripts in logs.
  • Default value is true.
punctuate
string
  • Enables punctuation and capitalisation in the transcript. Allowed values are true and false.
  • Supported languages: en, hi
  • Default value is true.
continuous
string
  • Enables continuous decoding even after silence is detected.
  • Allowed values are true/1 or false/0.
  • Default value is false/0.
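The defaults and maximums listed for the optional parameters can be checked on the client before the connection is opened. The sketch below is one way to do that, not part of the API: resolveOptions and the constant names are hypothetical, and only the limits stated above (timeout up to 180, silence up to 30, and the four logging values) are enforced.

```javascript
// Defaults and maximums as listed in the parameter descriptions above.
const DEFAULTS = {
  timeout: 15,
  silence: 1,
  format: "16k_int16",
  logging: "true",
  punctuate: "true",
  continuous: "false",
};
const MAXIMUMS = { timeout: 180, silence: 30 };
const LOGGING_VALUES = ["true", "no_audio", "no_transcript", "false"];

// Hypothetical helper: merges caller overrides with the defaults and rejects
// values outside the documented limits before they reach the query string.
function resolveOptions(overrides = {}) {
  const options = { ...DEFAULTS, ...overrides };

  for (const [name, max] of Object.entries(MAXIMUMS)) {
    const value = Number(options[name]);
    if (!Number.isFinite(value) || value > max) {
      throw new RangeError(`${name} must be a number no greater than ${max}`);
    }
  }

  if (!LOGGING_VALUES.includes(String(options.logging))) {
    throw new RangeError(`logging must be one of: ${LOGGING_VALUES.join(", ")}`);
  }

  return options;
}

console.log(resolveOptions({ silence: 2, logging: "no_audio" }));
```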

Request

streaming audio
binary
required
  • The audio streamed from the input device.
  • Note: The default timeout = 15 seconds, and the maximum time allowed = 180 seconds.

Response

id
string
  • The API auto-assigns a unique identification number to each request.
success
boolean
  • Indicates the functional status of the API:
    • If success = true, the API is functioning and ready to generate output.
    • If success = false, an error has occurred; the cause field gives the reason.
final
boolean
  • Reports whether the received output is partial or final:
    • If final = true, the received text is the final output.
    • If final = false, the received text is partial and the service is still processing the audio.
cause
string
  • The reason for the current response, for example ready, partial, or EOF received.
  • The cause appears for both successful and failed requests.
  • Refer to the API Codes section for the list of causes and their descriptions.
text
string
  • The transcript of the streaming audio input, in the requested language.
  • Note: The field remains empty on error and in the initial connection response.
confidence
float
  • The level of confidence that the Streaming STT API has in the accuracy of the transcription.
  • The confidence score ranges from 0 to 1. Higher scores indicate greater confidence in the transcription.
display_text
string
  • The beautified text of the transcript.
  • Digits, URLs, and app names in the transcript are converted to a readable format for the user.
  • Note: The field remains empty on error and in the initial connection response.
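To show how the response fields fit together, the sketch below reduces a raw response message to the values an application typically needs. The function name summarizeResponse and the returned property names are hypothetical, and preferring display_text over text for presentation is a design choice of this example, not a rule of the API.

```javascript
// Hypothetical helper: parses one response message and picks the text to show.
function summarizeResponse(rawMessage) {
  const response = JSON.parse(rawMessage);
  return {
    id: response.id,
    ok: response.success === true,
    isFinal: response.final === true,
    // display_text is the beautified form; fall back to text when it is
    // empty or absent (as in the error response sample).
    transcript: response.display_text || response.text || "",
    confidence: response.confidence,
    cause: response.cause,
  };
}

const summary = summarizeResponse(
  '{"id":"bb261bd789af4ba487a2667f8d942d4d7e0195fd1c8e4073","success":true,' +
    '"text":"","display_text":"","final":false,"confidence":1,"cause":"ready"}'
);
console.log(summary.cause, summary.isFinal);
```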

Handling Error

The Streaming STT API raises exceptions for many reasons, such as a failed connection, invalid parameters, authentication errors, and network unavailability. Each error response includes a specific, human-readable message so that users can react to errors more effectively.

In the API response, if success = false, the cause field gives the reason for the error. Refer to API Codes to view the list of error messages and their descriptions.

{
  "id": "2ab3a0b76c854953a022df742f6b3857a76494acd72e4489",
  "success": false,
  "text": "",
  "final": true,
  "confidence": 1,
  "cause": "no `domain` given"
}
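One way to surface such a response in application code is to convert it into an exception that keeps the request id and the reported cause. The names SttError and ensureSuccess below are hypothetical; the sample object is the error response shown above.

```javascript
// Hypothetical error type carrying the request id and the reported cause.
class SttError extends Error {
  constructor(response) {
    super(`STT request failed: ${response.cause}`);
    this.name = "SttError";
    this.requestId = response.id;
    this.reason = response.cause;
  }
}

// Returns the response unchanged on success, throws SttError otherwise.
function ensureSuccess(response) {
  if (response.success === false) {
    throw new SttError(response);
  }
  return response;
}

try {
  ensureSuccess({
    id: "2ab3a0b76c854953a022df742f6b3857a76494acd72e4489",
    success: false,
    text: "",
    final: true,
    confidence: 1,
    cause: "no `domain` given",
  });
} catch (err) {
  console.error(err.name, err.requestId, err.reason);
}
```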

API Messages

Message Code | Message | Description
403 | Forbidden | Invalid credentials were entered.
403 | Forbidden | The provided credit limit is exhausted.
403 | Forbidden | The API key provided to the user has expired.
403 | Forbidden | The user is not authorized to use the STT API.
403 | Forbidden | An invalid language code was passed, or the user is not authorized to use the given language code.
400 | no decoder for given args | The STT service is not available for the language and domain given in the request.
200 | unsupported audio format | The API does not support the audio format.
200 | ready | STT is ready to receive audio data.
200 | partial | This is a partial response. Keep waiting for the final response.
200 | EOF received | This is the final response. The connection will be closed because EOF was received in the request.
200 | silence detected | This is the final response. The connection will be closed because silence was detected in the request.
200 | timeout | This is the final response. The connection will be closed because no data was received.
200 | audio too short | The audio is too short to produce a transcription.
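A client can use these messages to decide what to do after each response. The sketch below maps only the causes whose behaviour the table states explicitly; the step labels and the function name nextStep are hypothetical, and any other cause is left for the caller to inspect.

```javascript
// Causes whose meaning is stated in the table above, mapped to illustrative
// next steps for the client.
const NEXT_STEP_BY_CAUSE = new Map([
  ["ready", "send-audio"],
  ["partial", "wait-for-final"],
  ["EOF received", "connection-closing"],
  ["silence detected", "connection-closing"],
  ["timeout", "connection-closing"],
]);

// Hypothetical helper: chooses the next step for a parsed response object.
function nextStep(response) {
  if (response.success === false) {
    return "report-error";
  }
  return NEXT_STEP_BY_CAUSE.get(response.cause) ?? "inspect-cause";
}

console.log(nextStep({ success: true, cause: "partial" }));
```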