STT longRunningRecognize in Cloud function

https://gigazine.net/gsc_news/en/20180824-speech-to-text-gcp-cloud-mojiokoshi/

Asynchronous speech recognition starts a long running audio processing operation. Use asynchronous speech recognition to recognize audio that is longer than a minute. For shorter audio, Synchronous Speech Recognition is faster and simpler.

You can retrieve the results of the operation via the google.longrunning.Operations interface. Results remain available for retrieval for 5 days (120 hours). Audio content can be sent directly to Cloud Speech-to-Text or it can process audio content that already resides in Google Cloud Storage. See also the audio limits for asynchronous speech recognition requests.

from https://cloud.google.com/speech-to-text/docs/async-recognize
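As a rough sketch of what kicking off such a request looks like over REST (not from the article; assumes Node 18+ for the built-in fetch, and the bucket path, config values and API key are placeholders):

// Start asynchronous recognition over REST (sketch).
async function startLongRunningRecognize(apiKey) {
  const body = {
    config: { encoding: 'LINEAR16', sampleRateHertz: 16000, languageCode: 'en-US' },
    audio: { uri: 'gs://my-bucket/audio.raw' }, // audio already in Cloud Storage
  };
  const res = await fetch(
    `https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${apiKey}`,
    { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify(body) },
  );
  return res.json(); // resolves to an Operation resource, described below
}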

This resource represents a long-running operation that is the result of a network API call.

{
  "name": string,
  "metadata": {
    "@type": string,
    field1: ...,
    ...
  },
  "done": boolean,

  // Union field result can be only one of the following:
  "error": {
    object (Status)
  },
  "response": {
    "@type": string,
    field1: ...,
    ...
  }
  // End of list of possible types for union field result.
}

done - boolean

If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.

Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response is set. result can be only one of the following:

from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations#resource-operation
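In code, handling an Operation with these fields might look like this (a sketch based on the resource shape above):

// Branch on the documented Operation fields (sketch).
function handleOperation(operation) {
  if (!operation.done) {
    // still in progress; keep the name and poll again later
    return null;
  }
  if (operation.error) {
    // error is a google.rpc.Status object
    throw new Error(`Transcription failed: ${operation.error.message}`);
  }
  // exactly one of error/response is set once done is true
  return operation.response; // the LongRunningRecognizeResponse
}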

From this example, which uses await, it seems there isn't a way to get a response before the result is ready, which would not work in a cloud function.
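That example boils down to roughly this (sketch; client and request as in the Setup sketch below). Execution waits at operation.promise() until the transcription is finished, which is exactly what a time-limited cloud function can't do:

const [operation] = await client.longRunningRecognize(request);
// resolves only once the whole file has been transcribed
const [response] = await operation.promise();
const transcript = response.results
  .map(result => result.alternatives[0].transcript)
  .join('\n');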

But see the SpeechClient code example in the SDK Client Reference: https://googleapis.dev/nodejs/speech/latest/v1p1beta1.SpeechClient.html#longRunningRecognize-examples

SDK

Setup
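A minimal setup sketch, assuming the audio already sits in Cloud Storage (bucket path and config values are placeholders):

const speech = require('@google-cloud/speech');

const client = new speech.SpeechClient();

const request = {
  config: {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  },
  audio: {
    uri: 'gs://my-bucket/audio.raw', // placeholder Cloud Storage URI
  },
};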

Either

the responses example, from the second .then() (a sketch follows the bullet list below). I've shortened the data in the Buffer attributes for brevity.

  • first element of the response array is the results of GCP STT

  • second element of the response array is the metadata

  • third element of the response array is the final api response
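A sketch of that second .then(), following the shape of the SDK reference example linked above:

client
  .longRunningRecognize(request)
  .then(([operation]) => operation.promise())
  .then(responses => {
    const result = responses[0];           // LongRunningRecognizeResponse (the transcription)
    const metadata = responses[1];         // progress metadata
    const finalApiResponse = responses[2]; // the completed Operation resource
    console.log(result.results.map(r => r.alternatives[0].transcript).join('\n'));
  });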

Or

the initialApiResponse example
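initialApiResponse is the second element returned by the first .then(); its name field is what you keep if you want to poll the operation later (sketch):

client.longRunningRecognize(request).then(responses => {
  const [operation, initialApiResponse] = responses;
  // initialApiResponse is the not-yet-done Operation resource;
  // its name identifies the job for operations.get later on.
  console.log(initialApiResponse.name);
  // carry on with operation.promise() as above if you also want the final result
});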

Or

operations.get

operations.get

Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.

In the example below, operationName is the name that you get in initialApiResponse, while firebaseApiKey can be found in the Google Cloud console. You do not need this API key if calling this endpoint from within a Firebase function.

from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations/get

Where name is a string and is

The name of the operation resource.
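A hedged sketch of that call (Node 18+ fetch assumed; operationName and firebaseApiKey are the values described above):

// Poll a long-running operation by name over REST (sketch).
async function getOperation(operationName, firebaseApiKey) {
  const res = await fetch(
    `https://speech.googleapis.com/v1/operations/${operationName}?key=${firebaseApiKey}`,
  );
  const operation = await res.json();
  if (operation.done) {
    // response holds the LongRunningRecognizeResponse with the transcript
    return operation.response;
  }
  return null; // still running, try again later
}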

Running STT longRunningRecognize in a cloud function

If the audio takes too long to transcribe and the cloud function times out, the name attribute of the initialApiResponse makes it possible to poll for the latest state of the long-running operation via operations.get, including the result once it's done.
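A sketch of that flow as two HTTPS Firebase functions: one starts the job and returns the operation name straight away, the other polls it later. The function names, the gcsUri request field and the checkLongRunningRecognizeProgress call (available in recent @google-cloud/speech versions) are my assumptions, not from the page; polling could just as well use the operations.get REST call shown above.

const functions = require('firebase-functions');
const speech = require('@google-cloud/speech');
const client = new speech.SpeechClient();

// 1. Kick off transcription and return the operation name immediately,
//    so the function finishes well within its timeout.
exports.startTranscription = functions.https.onRequest(async (req, res) => {
  const [, initialApiResponse] = await client.longRunningRecognize({
    config: { languageCode: 'en-US' },  // placeholder config
    audio: { uri: req.body.gcsUri },    // hypothetical request field
  });
  res.json({ operationName: initialApiResponse.name });
});

// 2. Poll later with the saved name; results stay retrievable for 5 days.
exports.checkTranscription = functions.https.onRequest(async (req, res) => {
  const op = await client.checkLongRunningRecognizeProgress(req.query.operationName);
  if (!op.done) {
    res.json({ done: false });
    return;
  }
  res.json({
    done: true,
    transcript: op.result.results.map(r => r.alternatives[0].transcript).join('\n'),
  });
});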

For the Firebase API key: project settings in the console, under Web API key.
