STT longRunningRecognize in Cloud function
https://gigazine.net/gsc_news/en/20180824-speech-to-text-gcp-cloud-mojiokoshi/
Based on what I found in the docs (see below), it seems possible to start the STT operation from a cloud function and then check on the results later.
Asynchronous speech recognition starts a long running audio processing operation. Use asynchronous speech recognition to recognize audio that is longer than a minute. For shorter audio, Synchronous Speech Recognition is faster and simpler.
You can retrieve the results of the operation via the google.longrunning.Operations interface. Results remain available for retrieval for 5 days (120 hours). Audio content can be sent directly to Cloud Speech-to-Text or it can process audio content that already resides in Google Cloud Storage. See also the audio limits for asynchronous speech recognition requests.
from https://cloud.google.com/speech-to-text/docs/async-recognize
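As a rough illustration of the two input options described above (audio sent directly, or audio already in Google Cloud Storage), here is a minimal sketch of a request builder. The helper name `buildRecognizeRequest` is hypothetical, and the config values (LINEAR16, 16000 Hz, en-US) are placeholder assumptions that would need to match the real audio.

```javascript
// Hypothetical helper: builds a request object for longRunningRecognize.
// Accepts either a gs:// URI (audio already in Cloud Storage) or a Buffer
// of raw audio bytes (audio sent directly, base64-encoded).
function buildRecognizeRequest(source) {
  // Placeholder config: adjust to the actual encoding, sample rate and language.
  const config = {
    encoding: 'LINEAR16',
    sampleRateHertz: 16000,
    languageCode: 'en-US',
  };
  if (typeof source === 'string' && source.startsWith('gs://')) {
    return { config, audio: { uri: source } };
  }
  if (Buffer.isBuffer(source)) {
    return { config, audio: { content: source.toString('base64') } };
  }
  throw new TypeError('source must be a gs:// URI or a Buffer of audio bytes');
}
```

For audio longer than a minute, the Cloud Storage form is likely the more practical of the two, since the request then carries only a URI.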
This resource represents a long-running operation that is the result of a network API call.
{
"name": string,
"metadata": {
"@type": string,
field1: ...,
...
},
"done": boolean,
// Union field result can be only one of the following:
"error": {
object (Status)
},
"response": {
"@type": string,
field1: ...,
...
}
// End of list of possible types for union field result.
}
done - boolean
If the value is false, it means the operation is still in progress. If true, the operation is completed, and either error or response is available.
Union field result. The operation result, which can be either an error or a valid response. If done == false, neither error nor response is set. If done == true, exactly one of error or response is set. result can be only one of the following: error or response.
from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations#resource-operation
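The done / error / response rules quoted above can be sketched as a small helper that classifies an Operation object. The function name `describeOperation` and the state labels are my own, not part of the API.

```javascript
// Hypothetical helper: classifies a long-running Operation resource
// according to the rules of the `done` flag and the `result` union field.
function describeOperation(operation) {
  // `done` may be false or missing entirely while the operation is in progress.
  if (!operation.done) {
    return { state: 'running' };
  }
  // Once done, exactly one of `error` or `response` is set.
  if (operation.error) {
    return { state: 'failed', error: operation.error };
  }
  return { state: 'succeeded', response: operation.response };
}
```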
From this example, which uses await, it seems there isn't a way to get a response before the result is ready, which would not work in a cloud function.
But see the longRunningRecognize code example in the SDK client reference for SpeechClient: https://googleapis.dev/nodejs/speech/latest/v1p1beta1.SpeechClient.html#longRunningRecognize-examples
SDK
Setup
Then either
responses example, from the second then. I've shortened the data in the Buffer attributes for brevity.
The first element of the response array is the result from GCP STT.
The second element of the response array is the metadata.
The third element of the response array is the final API response.
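To make the shape of the responses array concrete, here is a minimal sketch that waits for the operation and joins the transcripts from the first element. The function names are hypothetical, and the client is passed in as a parameter so the sketch does not depend on credentials; in real use it would presumably be a `SpeechClient` from `@google-cloud/speech`.

```javascript
// Hypothetical helper: pulls the transcript text out of the array that
// operation.promise() resolves to: [result, metadata, finalApiResponse].
function transcriptFromResponses(responses) {
  const [result] = responses;
  return (result.results || [])
    // Take the first alternative of each result, if there is one.
    .map((r) => (r.alternatives && r.alternatives[0] ? r.alternatives[0].transcript : ''))
    .filter((t) => t.length > 0)
    .join('\n');
}

// Starts the operation and blocks until it finishes.
// This is the variant that can exceed a cloud function's timeout on long audio.
async function recognizeAndWait(client, request) {
  const [operation] = await client.longRunningRecognize(request);
  const responses = await operation.promise();
  return transcriptFromResponses(responses);
}
```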
Or
initialApiResponse example
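A minimal sketch of this second option: start the operation and return its name straight away instead of waiting for the transcript. `startRecognition` is a hypothetical name, and the client is injected for the same reason as before; the sketch assumes `longRunningRecognize` resolves to an array whose second element is the initialApiResponse.

```javascript
// Hypothetical helper: starts the long-running operation and returns only
// the operation name, so the caller can store it and check on it later.
async function startRecognition(client, request) {
  // First element is the operation object, second is the initial API response.
  const [, initialApiResponse] = await client.longRunningRecognize(request);
  return initialApiResponse.name;
}
```

The returned name is what a later call to operations.get needs, so it would have to be persisted somewhere (for example a database record) before the function returns.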
Or
operations.get
Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.
In the example below, operationName is the name that you get in initialApiResponse, and firebaseApiKey is the API key, which you can find in the Google Cloud console. You do not need this API key if calling this endpoint from within a Firebase function.
from https://cloud.google.com/speech-to-text/docs/reference/rest/v1/operations/get
Where name is a string: the name of the operation resource.
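A minimal sketch of calling operations.get over REST, assuming the v1 endpoint `https://speech.googleapis.com/v1/operations/{name}` and the API key passed as the `key` query parameter. The function names are hypothetical, and the fetch implementation is injectable so the sketch can be exercised without network access (Node 18+ provides a global `fetch`).

```javascript
// Hypothetical helper: builds the operations.get URL for an operation name.
// The apiKey is optional; when omitted, no `key` parameter is added.
function buildOperationsGetUrl(operationName, apiKey) {
  // Encode each path segment separately so any slashes in the name are preserved.
  const path = String(operationName).split('/').map(encodeURIComponent).join('/');
  const url = new URL(`https://speech.googleapis.com/v1/operations/${path}`);
  if (apiKey) url.searchParams.set('key', apiKey);
  return url.toString();
}

// Fetches the latest state of the operation and returns the parsed JSON body.
async function getOperation(operationName, apiKey, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(buildOperationsGetUrl(operationName, apiKey));
  if (!res.ok) {
    throw new Error(`operations.get failed with HTTP ${res.status}`);
  }
  return res.json();
}
```

Usage would look like `getOperation(operationName, firebaseApiKey)`, where both values come from the steps described above.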
Running STT long recognise in cloud function
If the audio takes too long to transcribe and the cloud function times out, the name attribute of the initialApiResponse makes it possible to poll operations.get for the latest state of the long-running operation, including the result once it is done.
The Firebase API key is in the project settings in the console, under Web API key.
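The polling idea above can be sketched as a loop around any function that fetches the operation by name. Everything here is an illustrative assumption: the name `pollUntilDone`, the default interval of 5 seconds, and the limit of 60 attempts. A cloud function might instead perform a single check per invocation and be re-triggered later.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: repeatedly calls fetchOperation(name) until the
// operation reports done, then returns its response or throws on error.
async function pollUntilDone(
  fetchOperation,
  operationName,
  { intervalMs = 5000, maxAttempts = 60 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const operation = await fetchOperation(operationName);
    if (operation.done) {
      if (operation.error) {
        throw new Error(`Operation ${operationName} failed: ${operation.error.message}`);
      }
      return operation.response;
    }
    // Wait before the next check, except after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Operation ${operationName} still running after ${maxAttempts} attempts`);
}
```

The `fetchOperation` argument can be any function that returns the Operation resource for a name, such as a thin wrapper around the operations.get REST call.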