Speech-to-Text: Automatic Speech Recognition


Speech-to-Text

Accurately convert speech into text using an API powered by Google’s AI technologies.

Transcribe your content in real time or from stored files.

Deliver a better user experience in products through voice commands.

Gain insights from customer interactions to improve your service.

Gartner names Google Cloud a Leader in the 2020 Magic Quadrant for Cloud AI Developer Services.

Benefits

State-of-the-art accuracy

Apply Google’s most advanced deep learning neural network algorithms for automatic speech recognition (ASR).

Global reach

Meet your users where they are, globally, with voice recognition that supports more than 125 languages and variants.

Flexible deployment

Deploy speech recognition wherever you need it: in the cloud with the API, or on-premises with Speech-to-Text On-Prem.

Key features

Speech adaptation

Customize speech recognition to transcribe domain-specific terms and rare words by providing hints, and boost transcription accuracy for specific words or phrases. Use classes to automatically convert spoken numbers into addresses, years, currencies, and more.
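
As a rough illustration, the sketch below builds the JSON config object for a recognition request that supplies phrase hints, a boost value, and a class token. It assumes the REST field names speechContexts, phrases, and boost, and the class token $ADDRESSNUM; verify them against the current API reference, since boost and classes have been beta features in some API versions. The helper build_adaptation_config and the example phrases are hypothetical.

```python
import json


def build_adaptation_config(phrases, boost=None, language_code="en-US"):
    """Return a hypothetical RecognitionConfig dict with phrase hints.

    Field names are assumed to follow the Speech-to-Text REST API:
    speechContexts is a list of {phrases, boost} objects.
    """
    context = {"phrases": list(phrases)}
    if boost is not None:
        # A positive boost makes the listed phrases more likely to be
        # recognized; very large values can also cause false positives.
        context["boost"] = float(boost)
    return {
        "languageCode": language_code,
        "speechContexts": [context],
    }


# Domain terms plus a class token (assumed name) that biases recognition
# toward spoken street-address numbers.
config = build_adaptation_config(
    ["Speech-to-Text", "diarization", "$ADDRESSNUM"], boost=10
)
print(json.dumps(config, indent=2))
```

Keeping hints in a plain dict makes it easy to serialize the config into a REST request body or to inspect it before sending.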

Domain-specific models

Choose from a selection of trained models for voice control, phone call transcription, and video transcription, each optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.
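
A minimal sketch of model selection follows. It assumes the v1 REST field names model, useEnhanced, and sampleRateHertz, and the model names command_and_search, phone_call, and video; newer API versions may offer different models. The use-case keys and the helper build_model_config are hypothetical.

```python
# Assumed v1 model names; check the current list of supported models.
MODELS_BY_USE_CASE = {
    "voice_control": "command_and_search",
    "phone_call": "phone_call",
    "video": "video",
}


def build_model_config(use_case, sample_rate_hertz, enhanced=False,
                       language_code="en-US"):
    """Return a hypothetical RecognitionConfig dict that selects a model."""
    if use_case not in MODELS_BY_USE_CASE:
        raise ValueError(f"unknown use case: {use_case!r}")
    return {
        "languageCode": language_code,
        "sampleRateHertz": sample_rate_hertz,
        "model": MODELS_BY_USE_CASE[use_case],
        # Enhanced variants exist only for some models, such as phone_call.
        "useEnhanced": enhanced,
    }


# Telephony audio is typically recorded at 8 kHz.
phone_config = build_model_config("phone_call", 8000, enhanced=True)
```

Matching the declared sample rate to the recording, rather than resampling telephony audio upward, lets the phone call model see audio in the form it was tuned for.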

Streaming speech recognition

Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
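
The sketch below shows the shape of a streaming session without calling the API. In the StreamingRecognize protocol, the first message carries the configuration and each later message carries a chunk of raw audio. The dict keys streamingConfig, audioContent, and interimResults mirror the API's JSON field names, but the helpers are hypothetical, and the 100 ms chunk size is an assumption rather than a documented requirement.

```python
import io


def chunk_bytes_for(duration_ms, sample_rate_hertz, bytes_per_sample=2,
                    channels=1):
    """Bytes of uncompressed LINEAR16 PCM audio covering duration_ms."""
    return sample_rate_hertz * bytes_per_sample * channels * duration_ms // 1000


def streaming_requests(stream, streaming_config, chunk_size):
    """Yield hypothetical request messages for one streaming session.

    The first message holds only the configuration; every later message
    holds the next chunk of audio read from the stream.
    """
    yield {"streamingConfig": streaming_config}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield {"audioContent": chunk}


# 100 ms of 16 kHz, 16-bit mono audio: 16000 * 2 * 1 * 0.1 = 3200 bytes.
size = chunk_bytes_for(100, 16000)
fake_audio = io.BytesIO(b"\x00" * 8000)  # stands in for a microphone or file
messages = list(streaming_requests(
    fake_audio,
    {"config": {"languageCode": "en-US", "sampleRateHertz": 16000},
     "interimResults": True},
    size,
))
```

Here 8,000 bytes of audio produce one configuration message followed by chunks of 3,200, 3,200, and 1,600 bytes.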

Speech-to-Text On-Prem

Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.

Blog: Enhanced models and features now available in new languages

Customers

Castbox uses Speech-to-Text to deliver its in-audio search service for podcasts.

Story highlights

Enabling users to search audio content for words or phrases
Audio-to-text conversion accuracy rates of greater than 96%
Typical search queries with latency of just 50 milliseconds

Industry

Technology

Voximplant uses Speech-to-Text to help companies build voice solutions and boost the number of calls they can handle.

InteractiveTel uses Speech-to-Text to provide accurate analysis of voice communications and increased customer satisfaction to its clients.

With Speech-to-Text and Vision API, Ananda Development created a mobile application to automate and streamline condominium inspections.

What's new

Video: Next '20 OnAir: Measuring and improving Speech-to-Text accuracy

Video: Automated Subtitles with AI

Video: Solving for accessible phone calls with Speech-to-Text and Text-to-Speech

Video: Getting Started with Converting speech to text with Node.js

Documentation

Speech-to-Text basics

Learn the fundamental concepts in Speech-to-Text.

Quickstart: Using the gcloud tool

Send an audio transcription request to Speech-to-Text using the gcloud tool from the command line.

Best practices

Review the best practices for transcribing audio with Speech-to-Text.

Supported languages

Learn which languages are available for Speech-to-Text, plus the features and recognition models available for each.

Speech-to-Text On-Prem

Learn more about Speech-to-Text On-Prem, which enables easy integration of Google speech recognition technology into your on-premises solutions.

Explore more docs

Quickstarts: Get a quick intro to using this product.
How-to guides: Learn to complete specific tasks with this product.
Tutorials: Browse walkthroughs of common uses and scenarios for this product.
APIs & references: View APIs, references, and other resources for this product.
Release notes: Read about the latest releases for Speech-to-Text.

Use cases

Use case: Improve customer service

Empower your customer service system by adding speech recognition to IVR (interactive voice response) and agent conversations in your call centers. Perform analytics on your conversation data to gain more insight into the calls and your customers. Speech-to-Text and its enhanced phone call models already power Contact Center AI, Google Cloud’s contact center solution.
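
As one example of conversation analytics, the sketch below sums talk time per speaker from word-level results. It assumes word time offsets and speaker diarization were enabled, and that each word carries startTime, endTime, and speakerTag fields, with durations encoded as protobuf JSON strings such as "1.200s". The helpers and sample words are hypothetical.

```python
from collections import defaultdict


def parse_duration(value):
    """Convert a protobuf JSON duration such as '1.200s' to seconds."""
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return float(value[:-1])


def talk_time_by_speaker(words):
    """Sum the spoken seconds attributed to each speakerTag."""
    totals = defaultdict(float)
    for w in words:
        start = parse_duration(w["startTime"])
        end = parse_duration(w["endTime"])
        totals[w["speakerTag"]] += end - start
    return dict(totals)


# Hypothetical diarized words: speaker 1 is the agent, speaker 2 the caller.
words = [
    {"word": "hello", "startTime": "0s", "endTime": "0.500s", "speakerTag": 1},
    {"word": "how", "startTime": "0.600s", "endTime": "0.800s", "speakerTag": 1},
    {"word": "hi", "startTime": "1.500s", "endTime": "1.750s", "speakerTag": 2},
]
totals = talk_time_by_speaker(words)
```

Summing only word durations ignores pauses between words, so the totals measure speaking time rather than turn length.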

Use case: Enable voice control

Implement voice commands such as “turn the volume up” and voice search such as “what is the temperature in Paris?” Combine this with the Text-to-Speech API to deliver voice-enabled experiences in IoT (Internet of Things) applications.
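
Speech-to-Text returns a transcript, and the application still has to map that transcript to an action. The sketch below is a hypothetical keyword matcher for the two example phrases. It is not part of the API, and the command table and action names are assumptions.

```python
import re

# Hypothetical command table: each pattern maps a transcript to an action.
COMMANDS = [
    (re.compile(r"\bturn (the )?volume up\b"), "volume_up"),
    (re.compile(r"\bturn (the )?volume down\b"), "volume_down"),
    (re.compile(r"\bwhat is the temperature in (?P<city>[a-z ]+)"),
     "weather_query"),
]


def dispatch(transcript):
    """Return (action, named arguments) for the first matching command."""
    text = transcript.lower().strip().rstrip("?.!")
    for pattern, action in COMMANDS:
        match = pattern.search(text)
        if match:
            return action, match.groupdict()
    return None


print(dispatch("What is the temperature in Paris?"))
```

Lowercasing and stripping trailing punctuation keeps the patterns simple even when automatic punctuation is enabled.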

Use case: Transcribe multimedia content

Transcribe your audio and video to add captions and improve your audience reach and experience. Add real-time subtitles to your streaming content. Our video transcription model is ideal for indexing or subtitling video and multispeaker content, and it uses machine learning technology similar to the technology behind video captioning on YouTube.
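
A minimal sketch of turning word timings into subtitles follows. It assumes the (word, start, end) tuples, in seconds, have already been extracted from results requested with word time offsets. It writes SubRip (SRT) cues. Grouping a fixed number of words per cue is a simplification, and real captioning usually also limits line length and cue duration.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group (word, start_s, end_s) tuples into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][1], group[-1][2]
        text = " ".join(w for w, _, _ in group)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)


# Hypothetical word timings from a transcription result.
words = [("Welcome", 0.0, 0.4), ("to", 0.4, 0.5),
         ("the", 0.5, 0.6), ("show", 0.6, 1.1)]
print(words_to_srt(words, max_words=3))
```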

All features

Global vocabulary	Support your global user base with Speech-to-Text’s extensive language support in over 125 languages and variants.
Streaming speech recognition	Receive real-time speech recognition results as the API processes the audio input streamed from your application’s microphone or sent from a prerecorded audio file (inline or through Cloud Storage).
Speech adaptation	Customize speech recognition to transcribe domain-specific terms and rare words by providing hints, and boost transcription accuracy for specific words or phrases. Use classes to automatically convert spoken numbers into addresses, years, currencies, and more.
Speech-to-Text On-Prem	Have full control over your infrastructure and protected speech data while leveraging Google’s speech recognition technology on-premises, right in your own private data centers. Contact sales to get started.
Multichannel recognition	Speech-to-Text can recognize distinct channels in multichannel situations (e.g., video conference) and annotate the transcripts to preserve the order.
Noise robustness	Speech-to-Text can handle noisy audio from many environments without requiring additional noise cancellation.
Domain-specific models	Choose from a selection of trained models for voice control, phone call transcription, and video transcription, each optimized for domain-specific quality requirements. For example, our enhanced phone call model is tuned for audio originating from telephony, such as phone calls recorded at an 8 kHz sampling rate.
Content filtering	The profanity filter helps you detect inappropriate or unprofessional content in your audio data and filter out profane words in text results.
Auto-detect language (beta)	Specify up to four language codes and Speech-to-Text will identify the correct language spoken in multilingual scenarios.
Automatic punctuation (beta)	Speech-to-Text accurately punctuates transcriptions (e.g., commas, question marks, and periods).
Speaker diarization (beta)	Know who said what by receiving automatic predictions about which of the speakers in a conversation spoke each utterance.
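
The sketch below combines several optional features from this list into one request config. It assumes the REST field names profanityFilter, enableAutomaticPunctuation, alternativeLanguageCodes, audioChannelCount, enableSeparateRecognitionPerChannel, and diarizationConfig. The beta features may be available only in a beta API version. It also assumes the four-language limit means one primary code plus at most three alternatives. The helper build_feature_config is hypothetical.

```python
def build_feature_config(language_code, alternative_language_codes=(),
                         channels=1, profanity_filter=False,
                         punctuation=False, max_speakers=None):
    """Return a hypothetical RecognitionConfig dict enabling optional features."""
    alternatives = list(alternative_language_codes)
    # Assumed limit: one primary code plus up to three alternatives (four total).
    if len(alternatives) > 3:
        raise ValueError("specify at most three alternative language codes")
    config = {
        "languageCode": language_code,
        "profanityFilter": profanity_filter,
        "enableAutomaticPunctuation": punctuation,
    }
    if alternatives:
        config["alternativeLanguageCodes"] = alternatives
    if channels > 1:
        # Recognize each channel separately; results then carry a channelTag.
        config["audioChannelCount"] = channels
        config["enableSeparateRecognitionPerChannel"] = True
    if max_speakers is not None:
        config["diarizationConfig"] = {
            "enableSpeakerDiarization": True,
            "maxSpeakerCount": max_speakers,
        }
    return config


cfg = build_feature_config("en-US", ["es-ES", "fr-FR"], channels=2,
                           profanity_filter=True, punctuation=True,
                           max_speakers=2)
```

Features are added only when requested, so a default call produces a minimal config.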

Pricing

The first 60 minutes of audio successfully processed by Speech-to-Text each month are free; after that, usage is priced per 15 seconds of audio. Specific rates vary depending on the model used, whether data logging is enabled, and the number of audio channels.
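
To make the billing unit concrete, the sketch below counts billable 15-second increments for a month of requests. It assumes each request is rounded up to a whole increment and that the 60 free minutes (240 increments) are subtracted from the rounded total; confirm both on the pricing page. The price per increment is a placeholder argument, not a published rate.

```python
import math

FREE_SECONDS_PER_MONTH = 60 * 60   # first 60 minutes each month are free
INCREMENT_SECONDS = 15             # usage is priced per 15 seconds of audio


def billable_increments(request_durations_s):
    """Count billable 15-second increments for one month of requests.

    Assumes each request rounds up to a whole increment and that the free
    tier is applied to the rounded monthly total.
    """
    used = sum(math.ceil(d / INCREMENT_SECONDS) for d in request_durations_s)
    free = FREE_SECONDS_PER_MONTH // INCREMENT_SECONDS  # 240 increments
    return max(0, used - free)


def monthly_cost(request_durations_s, price_per_increment):
    # price_per_increment is a placeholder; real rates depend on the model,
    # data logging, and the number of audio channels.
    return billable_increments(request_durations_s) * price_per_increment
```

For example, under these assumptions, 300 requests of 20 seconds each round up to 600 increments, and after the 240 free increments, 360 are billable.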

Take the next step

Start building on Google Cloud with $300 in free credits and 20+ always-free products. Need help getting started? Contact sales, or work with a trusted partner.