Transcribe an audio voice file to text with Google Cloud Speech API



This video shows you how to use Google Cloud’s Speech API to convert an audio recording of someone speaking into text. We’ll use Kotlin and Google’s Speech client library.

What you will learn:
- How to choose the correct encoding for your audio files
- How to handle the response from the Speech API
- What alternative transcriptions look like
- How to convert a speech audio file to text
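The encoding and sample rate you send to the API must match the audio file, so it can help to check a WAV file's header before making a request. The sketch below is a minimal, dependency-free reader for the `fmt ` chunk of a RIFF/WAVE file. `WavFormat` and `readWavFormat` are hypothetical helper names, and the code assumes a standard uncompressed WAV layout rather than handling every variant of the format.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper type: the header fields that need to agree with the request config.
data class WavFormat(
    val audioFormat: Int,      // 1 = uncompressed PCM
    val channels: Int,
    val sampleRateHertz: Int,
    val bitsPerSample: Int,
) {
    // The Speech API's LINEAR16 encoding is uncompressed 16-bit signed little-endian PCM,
    // which in a WAV file shows up as audioFormat 1 with 16 bits per sample.
    val isLinear16: Boolean get() = audioFormat == 1 && bitsPerSample == 16
}

// Walks the RIFF chunks until it finds "fmt " and decodes its first 16 bytes.
fun readWavFormat(bytes: ByteArray): WavFormat {
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    fun tag(offset: Int) = String(bytes, offset, 4, Charsets.US_ASCII)

    require(bytes.size >= 12 && tag(0) == "RIFF" && tag(8) == "WAVE") { "Not a RIFF/WAVE file" }

    var pos = 12 // first sub-chunk starts after "RIFF", size, "WAVE"
    while (pos + 8 <= bytes.size) {
        val id = tag(pos)
        val size = buf.getInt(pos + 4)
        if (id == "fmt ") {
            require(pos + 8 + 16 <= bytes.size) { "Truncated fmt chunk" }
            return WavFormat(
                audioFormat = buf.getShort(pos + 8).toInt() and 0xFFFF,
                channels = buf.getShort(pos + 10).toInt() and 0xFFFF,
                sampleRateHertz = buf.getInt(pos + 12),
                bitsPerSample = buf.getShort(pos + 22).toInt() and 0xFFFF,
            )
        }
        // Chunk payloads are padded to an even number of bytes.
        pos += 8 + size + (size and 1)
    }
    throw IllegalArgumentException("No fmt chunk found")
}
```

If `isLinear16` is true, the values map onto `RecognitionConfig.AudioEncoding.LINEAR16` and `setSampleRateHertz(format.sampleRateHertz)` in the client's request config. Other formats would need a different encoding setting or a conversion step first.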

Google offers a speech recognition API that converts spoken language into text: Cloud Speech-to-Text.

Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google’s machine learning technology.
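Here is a minimal Kotlin sketch of transcribing a short prerecorded file with Google's Cloud Speech client library for Java (`com.google.cloud:google-cloud-speech`, v1 API). It makes several assumptions: the audio is a 16 kHz LINEAR16 WAV file, credentials come from Application Default Credentials, and the default file name `audio.wav` is hypothetical. `buildConfig` and `transcribeFile` are illustrative names, not part of the library.

```kotlin
import com.google.cloud.speech.v1.RecognitionAudio
import com.google.cloud.speech.v1.RecognitionConfig
import com.google.cloud.speech.v1.SpeechClient
import com.google.protobuf.ByteString
import java.nio.file.Files
import java.nio.file.Paths

// Describes the audio so the API decodes it correctly; the values must match the file.
fun buildConfig(sampleRateHertz: Int, languageCode: String = "en-US"): RecognitionConfig =
    RecognitionConfig.newBuilder()
        .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16) // 16-bit signed little-endian PCM
        .setSampleRateHertz(sampleRateHertz)
        .setLanguageCode(languageCode)
        .setMaxAlternatives(3) // ask for up to three candidate transcripts per result
        .build()

// Sends the whole file in one synchronous request, which is intended for short audio
// (roughly a minute or less). Returns the top transcript of each result.
fun transcribeFile(path: String, config: RecognitionConfig): List<String> {
    val content = ByteString.copyFrom(Files.readAllBytes(Paths.get(path)))
    val audio = RecognitionAudio.newBuilder().setContent(content).build()

    // SpeechClient picks up Application Default Credentials and must be closed after use.
    return SpeechClient.create().use { client ->
        val response = client.recognize(config, audio)
        response.resultsList.mapNotNull { result ->
            // Alternatives are ordered most likely first; confidence is generally
            // populated only for the top alternative.
            result.alternativesList.forEachIndexed { i, alt ->
                println("alternative $i: \"${alt.transcript}\" (confidence ${alt.confidence})")
            }
            result.alternativesList.firstOrNull()?.transcript
        }
    }
}

fun main(args: Array<String>) {
    val path = args.firstOrNull() ?: "audio.wav" // hypothetical default file name
    println(transcribeFile(path, buildConfig(16000)).joinToString(" "))
}
```

The sketch prints every alternative so you can see what the recognizer considered, then keeps only the first one per result. For recordings longer than the synchronous limit, the Java client's asynchronous `longRunningRecognizeAsync` method is the usual route.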

Other reasons you might want to convert speech to text include:

Using Cloud Speech-to-Text, you can identify which language is spoken in an utterance (up to four candidate languages: a primary language plus up to three alternatives). This can be used for voice search (such as “What is the temperature in Paris?”) and voice commands (such as “Turn the volume up.”).
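Language identification is configured by giving a primary `languageCode` plus a list of alternative language codes. The sketch below assumes the `v1p1beta1` variant of the Java client, where `RecognitionConfig` exposes `addAllAlternativeLanguageCodes`. `languageIdConfig` and `MAX_ALTERNATIVE_LANGUAGES` are hypothetical names, and the local check simply mirrors the four-language limit described above.

```kotlin
import com.google.cloud.speech.v1p1beta1.RecognitionConfig

// Assumed limit: one primary language plus at most three alternatives (four in total).
const val MAX_ALTERNATIVE_LANGUAGES = 3

// Builds a request config that lets the recognizer choose among several candidate languages.
fun languageIdConfig(primary: String, alternatives: List<String>, sampleRateHertz: Int): RecognitionConfig {
    require(alternatives.size <= MAX_ALTERNATIVE_LANGUAGES) {
        "At most $MAX_ALTERNATIVE_LANGUAGES alternative languages allowed, got ${alternatives.size}"
    }
    return RecognitionConfig.newBuilder()
        .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
        .setSampleRateHertz(sampleRateHertz)
        .setLanguageCode(primary)                     // e.g. "en-US"
        .addAllAlternativeLanguageCodes(alternatives) // e.g. "fr-FR", "de-DE"
        .build()
}
```

Each `SpeechRecognitionResult` in the response then reports the detected language in its `languageCode` field. You could use that field to route the transcript, for example to a search handler or a command handler for that language.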


