aiy.cloudspeech

APIs that simplify interaction with the Google Cloud Speech-to-Text service so you can convert voice commands into actions. To use this service, you must have a Google Cloud account and a corresponding credentials file. For more information, see these setup instructions.

For an example, see src/examples/voice/cloudspeech_demo.py.

Note

These APIs are designed for the Voice Kit, but have no dependency on the Voice HAT/Bonnet specifically. However, they do require some type of sound card attached to the Raspberry Pi that can be detected by the ALSA subsystem.

class aiy.cloudspeech.CloudSpeechClient(service_accout_file=None)

Bases: object

A simplified version of the Google Cloud SpeechClient class.

Parameters: service_accout_file – Absolute path to your JSON account credentials file. If None, it looks for the file at ~/cloud_speech.json. To get a credentials file, see these setup instructions.
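
As a minimal sketch of the constructor call, the helper below checks that a credentials file exists before the client is created, falling back to the default location described above. The names resolve_credentials and make_client are hypothetical helpers for illustration and are not part of aiy.cloudspeech.

```python
import os

# Default location that the documentation above says is used when no path is given.
DEFAULT_CREDENTIALS = '~/cloud_speech.json'


def resolve_credentials(path=None):
    """Return an absolute credentials path, or raise if the file is missing.

    Hypothetical helper, not part of aiy.cloudspeech.
    """
    candidate = os.path.abspath(os.path.expanduser(path or DEFAULT_CREDENTIALS))
    if not os.path.isfile(candidate):
        raise FileNotFoundError('No credentials file at ' + candidate)
    return candidate


def make_client(path=None):
    """Create a CloudSpeechClient once the credentials file has been found."""
    # Imported here so resolve_credentials() can be used without the AIY library.
    from aiy.cloudspeech import CloudSpeechClient
    # The keyword is spelled exactly as in the class signature above.
    return CloudSpeechClient(service_accout_file=resolve_credentials(path))
```

Checking the path first gives a clear error message on a fresh install, rather than relying on whatever error the client raises when the file is absent.
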
recognize(language_code='en-US', hint_phrases=None)

Performs speech-to-text for a single utterance using the default ALSA soundcard driver. Once it detects the user is done speaking, it stops listening and delivers the top result as text.

By default, this method calls start_listening() and stop_listening() as the recording begins and ends, respectively.

Parameters:
  • language_code – Language expected from the user, in IETF BCP 47 syntax (default is “en-US”). See the list of Cloud’s supported languages.
  • hint_phrases – A list of strings containing words and phrases that may be expected from the user. These hints help the speech recognizer identify them in the dialog and improve the accuracy of your results.
Returns:

The text transcription of the user’s dialog.
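
The following sketch shows one way to turn the text returned by recognize() into an action. The helpers match_command and listen_once are hypothetical and not part of the library; the sketch also assumes that a None or empty result means nothing was recognized, which the description above does not state.

```python
def match_command(text, commands):
    """Return the action for the first command phrase found in text, else None."""
    if not text:
        # Assumption: None or an empty string means nothing was recognized.
        return None
    lowered = text.lower()
    for phrase, action in commands.items():
        if phrase in lowered:
            return action
    return None


def listen_once(client, commands, language_code='en-US'):
    """Recognize one utterance and run the matching action, if any.

    `client` is any object with a recognize() method as documented above;
    `commands` maps lowercase phrases to zero-argument callables.
    """
    text = client.recognize(language_code=language_code,
                            hint_phrases=list(commands))
    action = match_command(text, commands)
    if action is not None:
        action()
    return text
```

On a Voice Kit, this could be called as listen_once(CloudSpeechClient(), {'turn on the light': turn_on}), where turn_on is your own function. Passing the command phrases as hint_phrases reuses the same list for both recognition hints and matching.
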

recognize_bytes(data, language_code='en-US', hint_phrases=None)

Performs speech-to-text for a single utterance using the given data source. Once it detects the user is done speaking, it stops listening and delivers the top result as text.

Parameters:
  • data – The audio data source. Must be encoded with a sample rate of 16000 Hz.
  • language_code – Language expected from the user, in IETF BCP 47 syntax (default is “en-US”). See the list of Cloud’s supported languages.
  • hint_phrases – A list of strings containing words and phrases that may be expected from the user. These hints help the speech recognizer identify them in the dialog and improve the accuracy of your results.
Returns:

The text transcription of the user’s dialog.
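
As a sketch of preparing input for recognize_bytes(), the helper below reads a WAV file with the standard wave module and rejects anything that is not sampled at 16000 Hz. It assumes that recognize_bytes() accepts the raw PCM frames as a bytes object, which the description above does not spell out; load_pcm_16k and transcribe_file are hypothetical names.

```python
import wave

REQUIRED_RATE_HZ = 16000  # sample rate required by recognize_bytes()


def load_pcm_16k(wav_path):
    """Return the raw frames of a WAV file after checking its sample rate."""
    with wave.open(wav_path, 'rb') as wav:
        rate = wav.getframerate()
        if rate != REQUIRED_RATE_HZ:
            raise ValueError(
                'Expected %d Hz audio, got %d Hz' % (REQUIRED_RATE_HZ, rate))
        return wav.readframes(wav.getnframes())


def transcribe_file(client, wav_path, language_code='en-US', hint_phrases=None):
    """Send the audio in wav_path to client.recognize_bytes().

    Assumption: the client accepts raw PCM bytes as `data`.
    """
    return client.recognize_bytes(load_pcm_16k(wav_path),
                                  language_code=language_code,
                                  hint_phrases=hint_phrases)
```

Validating the sample rate locally catches a mismatched recording before any request is made to the service.
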

start_listening()

By default, this simply prints “Start listening” to the log.

This is a convenience method that you can override in a derived class to indicate the status to the user in some other way, such as by changing the LED state.

Called by recognize() when recording begins.

stop_listening()

By default, this simply prints “Stop listening” to the log.

This is a convenience method that you can override in a derived class to indicate the status to the user in some other way, such as by changing the LED state.

Called by recognize() when recording ends.
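
The override pattern described for start_listening() and stop_listening() can be sketched as a mixin. ListeningStatusMixin and show_status are hypothetical names, and the print call is only a placeholder for whatever indicator you use, such as an LED.

```python
class ListeningStatusMixin:
    """Adds a visible status indicator around the listening callbacks.

    Hypothetical mixin, intended to be combined with CloudSpeechClient.
    """

    def show_status(self, listening):
        # Placeholder indicator; replace with LED control on real hardware.
        print('listening' if listening else 'idle')

    def start_listening(self):
        super().start_listening()  # keep the default log message
        self.show_status(True)

    def stop_listening(self):
        super().stop_listening()  # keep the default log message
        self.show_status(False)
```

To use it, declare a class such as class StatusClient(ListeningStatusMixin, CloudSpeechClient): pass, with the mixin listed first so that its methods run before the base implementations. Because recognize() calls both methods by default, the indicator then follows each recording without further changes.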