Batch translate audio languages
This document shows how to use the Speech-to-Speech Translation Batch solution, designed to translate speech audio from one language to another.
Summary
The Speech-to-Speech Translation solution leverages Google Cloud services, including Speech-to-Text, Translation API, Text-to-Speech, as well as Cloud Storage to perform the following chained steps:
- Speech-to-Text: The source audio file is transcribed using the Speech-to-Text API, converting it to text.
- Translation: The transcribed text is translated to the target language using the Translation API.
- Text-to-Speech: The translated text is synthesized into speech using the Text-to-Speech API, generating an audio file in the target language.
Architecture
Costs
This solution uses billable components of Google Cloud, including the following:
Use the pricing calculator to generate a cost estimate based on your projected usage.
Installation
-
Prerequisites:
- Python 3.7 or higher
- Google Cloud SDK installed and configured
- Google Cloud project with billing enabled
-
Following Google Cloud services enabled:
Use link Enable API or the command below:
-
Clone the repository and change to its directory
Note: it is recommended to use venv or pyenv virtualenv to create an isolated environment for this application and its imported modules. Please refer to the links in this step for more information on virtual environment
-
Install dependencies:
Usage example
To use the application, follow these steps:
-
Create a configuration file:
The application requires a configuration file
config.ini
in the application's main directory. You can use the below example as a starting point and replace the values with yours.[parameters] project_id = your-project-ID location = us-central1 gcs_path = gs://your-bucket/optional-path stt_model = long stt_timeout = 300 stt_alternative = 0 input_audio_file_name = input.wav input_audio_file_path = your-server-local-path/ output_audio_file_name = output.wav source_language_code = en-us target_language = fi target_language_code = fi-FI target_voice = fi-FI-Wavenet-A target_voice_gender = female tts_timeout = 300 log = info filename_prefix = timestamp
Optionally, you can override the configuration file parameter values with command line switches of the same name. If both the configuration file and the command line arguments contain the same parameter, the command line argument's value will be used. You can execute the following command to see the command line arguments:
-
Authenticate the shell:
Execute the following command to authenticate your shell with the Google Cloud project in use:
-
Run the application:
After preparing the config.ini file, run the application with e.g:
The application will output its progress including the output files. Example:
... 2024-06-05 01:16:51,380 - root - INFO - Uploading input/en.wav to gs://bucket/path/en.wav 2024-06-05 01:16:53,690 - root - INFO - Waiting for STT operation to complete... timeout: 300s 2024-06-05 01:17:10,088 - root - INFO - Translating text to: fi 2024-06-05 01:17:11,051 - root - INFO - Waiting for TTS operation to complete... timeout: 300s 2024-06-05 01:17:28,616 - root - INFO - TTS output written to: gs://bucket/path/2024-06-05T01:16:49.593763_out.wav 2024-06-05 01:17:30,956 - root - INFO - Transcript written to: gs://bucket/path/2024-06-05T01:16:49.593763_transcript.txt 2024-06-05 01:17:32,982 - root - INFO - Translation written to: gs://bucket/path/2024-06-05T01:16:49.593763_translation.txt
Listing available Translation languages
To see the available languages supported by the Translation API, execute the following command:
The command will output the supported languages as follows (snippet):
Listing available Text to Speech voices
To see the available voices and genders supported by the Text to Speech API, execute the following command:
The command will output the supported voice and gender options as follows (snippet):
Name: af-ZA-Standard-A
Supported language: af-ZA
SSML Voice Gender: FEMALE
Natural Sample Rate Hertz: 24000
Name: am-ET-Standard-A
Supported language: am-ET
SSML Voice Gender: FEMALE
Natural Sample Rate Hertz: 24000
Name: am-ET-Standard-B
Supported language: am-ET
SSML Voice Gender: MALE
Natural Sample Rate Hertz: 24000
...
Additional notes and limitations
- The output quality of each interim stage is dependent on the AI API in use. E.g speech transcribing, language translation, and text to speech generation.
- The data processing format (such as audio codec), duration, file size etc limitations are those of the respective API services used.
- The
stt_model
parameter can be set to different Speech-to-Text models based on your needs. Please refer to the Speech-to-Text API documentation for details. - The
target_voice
parameter andtarget_voice_gender
can be set to different Text-to-Speech voices based on the target language. Please refer to the Text-to-Speech API documentation for details. - The
output_interim_files
parameter allows you to save the transcribed text and translated text, in addition to the final audio output.
This user guide provides a comprehensive overview of the Speech-to-Speech Translation application. For more detailed information on Google Cloud services, refer to the official documentation.
License
This is not an Official Google Product.
Licensed under the Apache License, Version 2.0 (the "License")