> ## Documentation Index
> Fetch the complete documentation index at: https://docs.reverieinc.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Speech To Speech

> This Python script demonstrates a real-time Speech-to-Speech (STS) translation system using the Reverie SDK. The script captures speech from the microphone, transcribes it into text, translates it into a target language, and finally converts the translated text back into speech, which is then played back to the user.

### Install Dependencies

<CodeGroup>
  ```bash Python theme={null}
  pip install pyaudio 
  pip install reverie-sdk
  pip install python-dotenv
  pip install pydub
  ```
</CodeGroup>

### How does it Works?

The main steps involved in the script are as follows:<br />

1. **Environment Setup**:<br />
   The script loads essential API credentials (REVERIE\_APP\_ID and REVERIE\_API\_KEY) from environment variables using the dotenv package. These credentials are required to authenticate with the Reverie API.

```python Python theme={null}
import os
import dotenv
from reverie_sdk import ReverieClient

dotenv.load_dotenv("../.env")
REVERIE_APP_ID = os.environ.get("REVERIE_APP_ID")
REVERIE_API_KEY = os.environ.get("REVERIE_API_KEY")
client = ReverieClient(
    api_key=REVERIE_API_KEY,
    app_id=REVERIE_APP_ID,
)
```

2. **Real-Time Audio Capture**:<br />
   The script uses pyaudio to capture real-time audio input from the microphone. This audio is then streamed asynchronously to the Reverie ASR (Automatic Speech Recognition) service for transcription into text.

```python Python theme={null}
import pyaudio
import asyncio
from reverie_sdk.services.asr import AudioStream

stream = AudioStream()
pa = pyaudio.PyAudio()

def mic_callback(in_data, frame_count, time_info, status):
    try:
        asyncio.run(stream.add_chunk_async(in_data))
    except Exception as e:
        print(f"Error in mic callback: {e}")
        return (None, pyaudio.paAbort)
    return (None, pyaudio.paContinue)

pa_stream = pa.open(
    rate=16000,
    channels=1,
    format=pyaudio.paInt16,
    frames_per_buffer=1024,
    input=True,
    stream_callback=mic_callback,
)
pa_stream.start_stream()
```

3. **Speech-to-Text (STT) Conversion**:<br />
   The captured audio is processed and transcribed using Reverie’s ASR service. The transcription occurs in real-time, providing immediate feedback as the speech is converted into text.

```python Python theme={null}
await client.asr.stt_stream_async(
    src_lang=source_lang,
    bytes_or_stream=stream,
    callback=asr_callback,
    format="16k_int16",
    punctuate="true",
)
```

4. **Translation**:<br />
   Once the transcription is complete, the text is sent to Reverie’s NMT (Neural Machine Translation) service to be translated from the source language (e.g., Hindi) to a target language (e.g., English). The translated text is returned for further processing.

```python Python theme={null}
src_lang = "hi"
tgt_lang = "en"

resp: str = asyncio.run(speech_to_text(src_lang))

if len(resp.strip()) > 0:
    tgt_resp = (
        client.nmt.localization(
            [resp],
            src_lang=src_lang,
            tgt_lang=tgt_lang,
            enableTransliteration=False,
            enableNmt=True,
        )
        .responseList[0]
        .outString
    )
    print(tgt_resp)
```

5. **Text-to-Speech (TTS) Conversion**:<br />
   The translated text is then sent to Reverie’s TTS (Text-to-Speech) service to convert it into speech. The generated audio is played back to the user, providing a seamless speech-to-speech translation experience.

```python Python theme={null}
tts_resp = client.tts.tts(
    f"{tgt_lang}_female",
    tgt_resp,
)
print(tts_resp)
```

6. **Optional Audio Saving**:<br />
   The generated speech audio can optionally be saved to a file (e.g., .wav) for later use or playback.

```python Python theme={null}
tts_resp.save_audio("./sample.wav", overwrite_existing=True)
```

7. **Audio Playback**:<br />
   Using the pydub library, the resulting audio is played back in real-time to the user, completing the cycle of speech-to-text, translation, and speech-to-speech output.

```python Python theme={null}
from pydub import AudioSegment, playback
import io

playback.play(AudioSegment.from_file(io.BytesIO(tts_resp.audio_bytes), format="wav"))
```

8. **Error Handling**:<br />
   The script includes error handling to manage potential issues during the microphone capture, transcription, translation, and speech synthesis processes.

```python Python theme={null}
import traceback

try:
    # [Audio streaming, transcription, translation, and TTS code]
except Exception as err:
    print(f"Error in Speech-to-Text: {err}")
    traceback.print_exc()
finally:
    pa_stream.stop_stream()
    pa_stream.close()
```

### Sample Code

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  import io
  import os
  import traceback
  import pyaudio
  from pydub import AudioSegment, playback
  from reverie_sdk import ReverieClient
  from reverie_sdk.services.asr import ReverieAsrResult, AudioStream
  import dotenv

  dotenv.load_dotenv("../.env")

  REVERIE_APP_ID = os.environ.get("REVERIE_APP_ID")
  REVERIE_API_KEY = os.environ.get("REVERIE_API_KEY")

  client = ReverieClient(
      api_key=REVERIE_API_KEY,
      app_id=REVERIE_APP_ID,
  )


  async def speech_to_text(source_lang):
      stream = AudioStream()
      pa = pyaudio.PyAudio()

      def mic_callback(in_data, frame_count, time_info, status):
          try:
              asyncio.run(stream.add_chunk_async(in_data))
          except Exception as e:
              print(f"Error in mic callback: {e}")
              return (None, pyaudio.paAbort)
          return (None, pyaudio.paContinue)

      pa_stream = pa.open(
          rate=16000,
          channels=1,
          format=pyaudio.paInt16,
          frames_per_buffer=1024,
          input=True,
          stream_callback=mic_callback,
      )
      pa_stream.start_stream()

      print("Listening at STT..")

      asr_responses: list[ReverieAsrResult] = []

      def asr_callback(resp: ReverieAsrResult):
          print(resp)
          asr_responses.append(resp)

      try:
          await client.asr.stt_stream_async(
              src_lang=source_lang,
              bytes_or_stream=stream,
              callback=asr_callback,
              format="16k_int16",
              punctuate="true",
          )

          while True:
              if len(asr_responses) == 0:
                  continue

              if asr_responses[-1].final:
                  print(asr_responses[-1].display_text)
                  return asr_responses[-1].display_text

      except Exception as err:
          print(f"Error in Speech-to-Text: {err}")
          traceback.print_exc()

      finally:
          pa_stream.stop_stream()
          pa_stream.close()


  src_lang = "hi"
  tgt_lang = "en"


  resp: str = asyncio.run(speech_to_text(src_lang))

  if len(resp.strip()) > 0:
      tgt_resp = (
          client.nmt.localization(
              [resp],
              src_lang=src_lang,
              tgt_lang=tgt_lang,
              enableTransliteration=False,
              enableNmt=True,
          )
          .responseList[0]
          .outString
      )
      print(tgt_resp)

      tts_resp = client.tts.tts(
          f"{tgt_lang}_female",
          tgt_resp,
      )
      print(tts_resp)
      
      # optional: save the audio
      # tts_resp.save_audio("./sample.wav", overwrite_existing=True)

      playback.play(AudioSegment.from_file(io.BytesIO(tts_resp.audio_bytes), format="wav"))    
  ```
</CodeGroup>
