STT longRunningRecognize in a Cloud Function

https://gigazine.net/gsc_news/en/20180824-speech-to-text-gcp-cloud-mojiokoshi/

Asynchronous speech recognition starts a long running audio processing operation. Use asynchronous speech recognition to recognize audio that is longer than a minute. For shorter audio, Synchronous Speech Recognition is faster and simpler.

You can retrieve the results of the operation via the google.longrunning.Operations interface. Results remain available for retrieval for 5 days (120 hours). Audio content can be sent directly to Cloud Speech-to-Text or it can process audio content that already resides in Google Cloud Storage. See also the audio limits for asynchronous speech recognition requests.

from https://cloud.google.com/speech-to-text/docs/async-recognize

This resource represents a long-running operation that is the result of a network API call.

{
  "name": string,
  "metadata": {
    "@type": string,
    field1: ...,
    ...
  },
  "done": boolean,

  // Union field result can be only one of the following:
  "error": {
    object (Status)
  },
  "response": {
    "@type": string,
    field1: ...,
    ...
  }
  // End of list of possible types for union field result.
}

done - boolean

If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.

Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response is set.

from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations#resource-operation
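The done flag and the error / response union described above can be read with a small helper. This is only a sketch: describeOperation and the state labels "pending", "failed" and "succeeded" are hypothetical names chosen for illustration and are not part of the API.

```javascript
// Classify a google.longrunning.Operation JSON object using its `done` flag
// and the `error` / `response` union field.
// The state labels are illustrative, not API values.
function describeOperation(operation) {
  if (!operation.done) {
    // done is false (or absent): neither error nor response is set yet.
    return { state: "pending" };
  }
  if (operation.error) {
    // done is true and the union field holds a Status object.
    return { state: "failed", error: operation.error };
  }
  // done is true and the union field holds the response.
  return { state: "succeeded", response: operation.response };
}
```

A caller would typically keep waiting on "pending" and only read the transcription from response once the state is "succeeded".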

From this example, which uses await, it seems there is no way to get a response before the result is ready, which would not work in a Cloud Function.

But see the code example in the SDK client reference for SpeechClient: https://googleapis.dev/nodejs/speech/latest/v1p1beta1.SpeechClient.html#longRunningRecognize-examples

SDK

Setup

Then either

responses example, from the second then. I've shortened the data in the Buffer attributes for brevity.

  • the first element of the response array is the result from GCP STT

  • the second element of the response array is the metadata

  • the third element of the response array is the final API response
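As a rough sketch of how the three-element array above might be consumed, the helper below pulls a plain transcript out of the first element. transcriptFromResponses is a hypothetical name, and the code assumes the result has the usual results[].alternatives[].transcript shape.

```javascript
// Extract a plain-text transcript from the array the operation resolves with:
// [result, metadata, finalApiResponse].
// Assumes result.results[].alternatives[].transcript, taking the first
// alternative of each result.
function transcriptFromResponses(responses) {
  const [result] = responses;
  return (result.results || [])
    .map((r) => (r.alternatives && r.alternatives[0] ? r.alternatives[0].transcript : ""))
    .join("\n");
}
```

Only the first alternative of each result is used here; whether that is the right choice depends on how the recognition request was configured.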

Or

initialApiResponse example
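One way to avoid waiting for the transcription inside the function is to start the operation and keep only its name. The sketch below assumes client is a SpeechClient from @google-cloud/speech that is passed in by the caller; startLongRunningRecognize, the gs:// URI and the config values are hypothetical and would need to match the real audio.

```javascript
// Start a long-running recognition and return the operation name without
// awaiting the final result.
// `client` is assumed to be a SpeechClient instance; the config values below
// are placeholders and must match the actual audio file.
async function startLongRunningRecognize(client, gcsUri) {
  const request = {
    config: { encoding: "LINEAR16", sampleRateHertz: 16000, languageCode: "en-US" },
    audio: { uri: gcsUri },
  };
  const [operation, initialApiResponse] = await client.longRunningRecognize(request);
  // Deliberately not calling operation.promise(), which would wait for the result.
  return (initialApiResponse && initialApiResponse.name) || operation.name;
}
```

The returned name can be stored (for example in a database) and used later to look up the operation.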

Or

operations.get

Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

In the example below, operationName is the name that you get in initialApiResponse, and firebaseApiKey is the API key, which you can find in the Google Cloud console. You do not need this API key if you call this endpoint from within a Firebase function.

from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations/get

Where name is a string: the name of the operation resource.
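A minimal sketch of calling operations.get over REST could look like the following. operationsGetUrl and getOperation are hypothetical helpers; the code assumes a runtime with a global fetch (Node 18 or later) and that the API key, when supplied, is accepted as a key query parameter. Whether a request without a key is authorized depends on the calling environment and is not verified here.

```javascript
// Build the operations.get URL for a given operation name.
// The key query parameter is only added when an API key is passed.
function operationsGetUrl(operationName, apiKey) {
  const path = operationName.split("/").map(encodeURIComponent).join("/");
  const base = "https://speech.googleapis.com/v1/operations/" + path;
  return apiKey ? base + "?key=" + encodeURIComponent(apiKey) : base;
}

// Fetch the latest state of the operation as a JSON object.
// fetchImpl is injectable so the function can be exercised without network access.
async function getOperation(operationName, apiKey, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(operationsGetUrl(operationName, apiKey));
  if (!res.ok) {
    throw new Error("operations.get failed with HTTP " + res.status);
  }
  return res.json();
}
```

Calling getOperation(operationName, firebaseApiKey) would then resolve to an object with the name, metadata, done and error or response fields of the operation resource.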

Running STT longRunningRecognize in a Cloud Function

If the audio takes too long to transcribe and the Cloud Function times out, you can use the name attribute of the initialApiResponse to poll for the latest state of the long-running operation, including the result once it is done, via operations.get.

The Firebase API key is in the project settings in the console, under Web API Key.
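The polling idea can be sketched as a small loop. pollOperation, intervalMs and maxAttempts are hypothetical names, and getOperationFn stands for whatever function calls operations.get and returns the operation JSON; the default interval and attempt count are arbitrary and would need tuning against the function's own timeout.

```javascript
// Poll an operation by name until `done` is true, then return its response.
// getOperationFn(name) is assumed to resolve to the operation JSON object.
async function pollOperation(getOperationFn, operationName, options = {}) {
  const { intervalMs = 5000, maxAttempts = 60 } = options;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const operation = await getOperationFn(operationName);
    if (operation.done) {
      if (operation.error) {
        throw new Error("Operation failed: " + JSON.stringify(operation.error));
      }
      return operation.response;
    }
    if (attempt < maxAttempts) {
      // Wait before asking again.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("Operation not done after " + maxAttempts + " attempts");
}
```

Because the operation name is all that is needed, the polling can also happen in a different invocation from the one that started the transcription.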
