Use Cases
Speech To Speech
This Python script demonstrates a real-time Speech-to-Speech (STS) translation system using the Reverie SDK. The script captures speech from the microphone, transcribes it into text, translates it into a target language, and finally converts the translated text back into speech, which is then played back to the user.
Install Dependencies
How Does It Work?
The main steps involved in the script are as follows:
- Environment Setup:
The script loads essential API credentials (REVERIE_APP_ID and REVERIE_API_KEY) from environment variables using the dotenv package. These credentials are required to authenticate with the Reverie API.
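As a minimal sketch of this step, the snippet below reads the two variables named above and fails early if either is missing. The helper name `load_credentials` and its optional `env` parameter are illustrative assumptions and not part of the Reverie SDK; `load_dotenv` comes from the python-dotenv package.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
except ImportError:  # keeps the sketch importable when python-dotenv is absent
    load_dotenv = None

REQUIRED_VARS = ("REVERIE_APP_ID", "REVERIE_API_KEY")


def load_credentials(env=None):
    """Return (app_id, api_key); raise if either variable is missing.

    `env` is a hypothetical hook that lets callers pass a plain dict
    instead of reading the real process environment.
    """
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copy values from a local .env file into os.environ
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["REVERIE_APP_ID"], env["REVERIE_API_KEY"]
```

Failing at startup gives a clearer message than an authentication error raised later, in the middle of a streaming session.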
- Real-Time Audio Capture:
The script uses pyaudio to capture real-time audio input from the microphone. This audio is then streamed asynchronously to the Reverie ASR (Automatic Speech Recognition) service for transcription into text.
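The capture step might look roughly like the sketch below. It assumes 16 kHz, 16-bit mono audio and a chunk size of 1024 frames; the format the Reverie ASR service actually expects is not stated here, so treat those values as placeholders. The names `open_microphone` and `stream_chunks` are hypothetical. Blocking reads run in a worker thread so the event loop stays free to send audio.

```python
import asyncio

RATE = 16000   # assumed sample rate in Hz
CHUNK = 1024   # assumed frames per read


def open_microphone():
    """Open the default input device; return (read_chunk, close) callables."""
    import pyaudio  # imported lazily so the rest of the sketch runs without it

    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                     input=True, frames_per_buffer=CHUNK)

    def read_chunk():
        # exception_on_overflow=False drops late frames instead of raising
        return stream.read(CHUNK, exception_on_overflow=False)

    def close():
        stream.stop_stream()
        stream.close()
        pa.terminate()

    return read_chunk, close


async def stream_chunks(read_chunk, max_chunks=None):
    """Yield audio chunks from a blocking reader without blocking the loop."""
    loop = asyncio.get_running_loop()
    sent = 0
    while max_chunks is None or sent < max_chunks:
        data = await loop.run_in_executor(None, read_chunk)
        if not data:  # an empty chunk ends the stream
            break
        yield data
        sent += 1
```

A caller would iterate with `async for chunk in stream_chunks(read_chunk)` and forward each chunk to the ASR connection, calling `close()` in a `finally` block.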
- Speech-to-Text (STT) Conversion:
The captured audio is transcribed by Reverie’s ASR service in real time, so partial text is available as feedback while the user is still speaking.
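One way to handle streaming results is to show interim text immediately and keep only finalized segments for translation. The sketch below assumes each result arrives as a `(text, is_final)` pair; that shape is an assumption for illustration and not the documented Reverie ASR response format, and `collect_transcript` is a hypothetical helper.

```python
def collect_transcript(events, on_partial=print):
    """Join finalized segments; pass interim text to `on_partial`.

    `events` is any iterable of (text, is_final) pairs, an assumed shape.
    """
    final_parts = []
    for text, is_final in events:
        if is_final:
            final_parts.append(text.strip())
        else:
            on_partial(text)  # live feedback while the user is speaking
    return " ".join(part for part in final_parts if part)
```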
- Translation:
Once the transcription is complete, the text is sent to Reverie’s NMT (Neural Machine Translation) service to be translated from the source language (e.g., Hindi) to a target language (e.g., English). The translated text is returned for further processing.
- Text-to-Speech (TTS) Conversion:
The translated text is then sent to Reverie’s TTS (Text-to-Speech) service to convert it into speech. The generated audio is played back to the user, providing a seamless speech-to-speech translation experience.
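The translation and synthesis steps can be chained as in the sketch below. Because the exact Reverie NMT and TTS call signatures are not shown here, the two services are passed in as plain callables: `translate(text, src, tgt)` and `synthesize(text, lang)` are hypothetical signatures, and the language codes `"hi"` and `"en"` are assumed values for the Hindi-to-English example.

```python
def translate_and_speak(text, translate, synthesize,
                        src_lang="hi", tgt_lang="en"):
    """Translate `text`, then synthesize speech for the translation.

    Returns whatever `synthesize` returns (for example audio bytes),
    or None when there is nothing to translate.
    """
    text = text.strip()
    if not text:
        return None  # skip empty transcripts instead of calling the services
    translated = translate(text, src_lang, tgt_lang)
    return synthesize(translated, tgt_lang)
```

Injecting the service calls also makes the flow easy to exercise with stand-in functions before real credentials are available.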
- Optional Audio Saving:
The generated speech audio can optionally be saved to a file (e.g., .wav) for later use or playback.
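Saving can be done with the standard library alone. The sketch below assumes the synthesized audio is raw 16-bit mono PCM at 22,050 Hz; those parameters are assumptions, and if the service already returns WAV-encoded bytes they can be written to disk unchanged. The helper names are hypothetical.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=22050, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def save_wav(path, pcm, **wav_params):
    """Write raw PCM to `path` as a .wav file."""
    with open(path, "wb") as f:
        f.write(pcm_to_wav_bytes(pcm, **wav_params))
```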
- Audio Playback:
Using the pydub library, the script plays the resulting audio back to the user in real time, completing the cycle of speech-to-text, translation, and text-to-speech.
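A playback helper along these lines could be used, assuming the synthesized audio is available as WAV-encoded bytes. `play_wav_bytes` is a hypothetical name; `AudioSegment.from_file` and `pydub.playback.play` are pydub functions, and `play` needs a playback backend such as simpleaudio, PyAudio, or ffplay installed.

```python
import io


def play_wav_bytes(wav_bytes):
    """Play WAV-encoded audio; return False when there is nothing to play."""
    if not wav_bytes:
        return False
    # Imported lazily so the empty-input path works without pydub installed.
    from pydub import AudioSegment
    from pydub.playback import play

    segment = AudioSegment.from_file(io.BytesIO(wav_bytes), format="wav")
    play(segment)  # blocks until playback has finished
    return True
```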
- Error Handling:
The script includes error handling to manage potential issues during the microphone capture, transcription, translation, and speech synthesis processes.
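One possible pattern, shown here as a sketch and not as the script's actual implementation, is to wrap each stage so that a failure is logged and re-raised with the stage name attached. `StageError` and `run_stage` are hypothetical names.

```python
import logging

logger = logging.getLogger("sts")


class StageError(RuntimeError):
    """Raised when one pipeline stage fails; records which stage it was."""

    def __init__(self, stage, cause):
        super().__init__(f"{stage} failed: {cause}")
        self.stage = stage


def run_stage(stage, func, *args, **kwargs):
    """Call `func`; log any failure and re-raise it as a StageError."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        logger.error("%s failed: %s", stage, exc)
        raise StageError(stage, exc) from exc
```

A caller can then catch `StageError` once around the whole pipeline and report whether capture, transcription, translation, or synthesis was the step that failed.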