@datafire/google_speech
Client library for Cloud Speech-to-Text API
Installation and Usage
npm install --save @datafire/google_speech
let google_speech = require('@datafire/google_speech').create({
access_token: "",
refresh_token: "",
client_id: "",
client_secret: "",
redirect_uri: ""
});
.then(data => {
console.log(data);
});
Description
Converts audio to text by applying powerful neural network models.
Actions
oauthCallback
Exchange the code passed to your redirect URI for an access_token
google_speech.oauthCallback({
"code": ""
}, context)
Input
- input
object
- code required
string
- code required
Output
- output
object
- access_token
string
- refresh_token
string
- token_type
string
- scope
string
- expiration
string
- access_token
oauthRefresh
Exchange a refresh_token for an access_token
google_speech.oauthRefresh(null, context)
Input
This action has no parameters
Output
- output
object
- access_token
string
- refresh_token
string
- token_type
string
- scope
string
- expiration
string
- access_token
speech.projects.locations.operations.get
Gets the latest state of a long-running operation. Clients can use this method to poll the operation result at intervals as recommended by the API service.
google_speech.speech.projects.locations.operations.get({
"name": ""
}, context)
Input
- input
object
- name required
string
: The name of the operation resource. - $.xgafv
string
(values: 1, 2): V1 error format. - access_token
string
: OAuth access token. - alt
string
(values: json, media, proto): Data format for response. - callback
string
: JSONP - fields
string
: Selector specifying which fields to include in a partial response. - key
string
: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token. - oauth_token
string
: OAuth 2.0 token for the current user. - prettyPrint
boolean
: Returns response with indentations and line breaks. - quotaUser
string
: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters. - upload_protocol
string
: Upload protocol for media (e.g. "raw", "multipart"). - uploadType
string
: Legacy upload protocol for media (e.g. "media", "multipart").
- name required
Output
- output Operation
speech.projects.locations.operations.list
Lists operations that match the specified filter in the request. If the server doesn't support this method, it returns UNIMPLEMENTED
. NOTE: the name
binding allows API services to override the binding to use different resource name schemes, such as users/*/operations
. To override the binding, API services can add a binding such as "/v1/{name=users/*}/operations"
to their service configuration. For backwards compatibility, the default name includes the operations collection id, however overriding users must ensure the name binding is the parent resource, without the operations collection id.
google_speech.speech.projects.locations.operations.list({
"name": ""
}, context)
Input
- input
object
- name required
string
: The name of the operation's parent resource. - filter
string
: The standard list filter. - pageSize
integer
: The standard list page size. - pageToken
string
: The standard list page token. - $.xgafv
string
(values: 1, 2): V1 error format. - access_token
string
: OAuth access token. - alt
string
(values: json, media, proto): Data format for response. - callback
string
: JSONP - fields
string
: Selector specifying which fields to include in a partial response. - key
string
: API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token. - oauth_token
string
: OAuth 2.0 token for the current user. - prettyPrint
boolean
: Returns response with indentations and line breaks. - quotaUser
string
: Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters. - upload_protocol
string
: Upload protocol for media (e.g. "raw", "multipart"). - uploadType
string
: Legacy upload protocol for media (e.g. "media", "multipart").
- name required
Output
- output ListOperationsResponse
Definitions
ListOperationsResponse
- ListOperationsResponse
object
: The response message for Operations.ListOperations.- nextPageToken
string
: The standard List next-page token. - operations
array
: A list of operations that matches the specified filter in the request.- items Operation
- nextPageToken
LongRunningRecognizeMetadata
- LongRunningRecognizeMetadata
object
: Describes the progress of a long-runningLongRunningRecognize
call. It is included in themetadata
field of theOperation
returned by theGetOperation
call of thegoogle::longrunning::Operations
service.- lastUpdateTime
string
: Output only. Time of the most recent processing update. - progressPercent
integer
: Output only. Approximate percentage of audio processed thus far. Guaranteed to be 100 when the audio is fully processed and the results are available. - startTime
string
: Output only. Time when the request was received. - uri
string
: The URI of the audio file being transcribed. Empty if the audio was sent as byte content.
- lastUpdateTime
LongRunningRecognizeResponse
- LongRunningRecognizeResponse
object
: The only message returned to the client by theLongRunningRecognize
method. It contains the result as zero or more sequential SpeechRecognitionResult messages. It is included in theresult.response
field of theOperation
returned by theGetOperation
call of thegoogle::longrunning::Operations
service.- results
array
: Output only. Sequential list of transcription results corresponding to sequential portions of audio.- items SpeechRecognitionResult
- results
Operation
- Operation
object
: This resource represents a long-running operation that is the result of a network API call.- done
boolean
: If the value isfalse
, it means the operation is still in progress. Iftrue
, the operation is completed, and eithererror
orresponse
is available. - error Status
- metadata
object
: Service-specific metadata associated with the operation. It typically contains progress information and common metadata such as create time. Some services might not provide such metadata. Any method that returns a long-running operation should document the metadata type, if any. - name
string
: The server-assigned name, which is only unique within the same service that originally returns it. If you use the default HTTP mapping, thename
should be a resource name ending withoperations/{unique_id}
. - response
object
: The normal response of the operation in case of success. If the original method returns no data on success, such asDelete
, the response isgoogle.protobuf.Empty
. If the original method is standardGet
/Create
/Update
, the response should be the resource. For other methods, the response should have the typeXxxResponse
, whereXxx
is the original method name. For example, if the original method name isTakeSnapshot()
, the inferred response type isTakeSnapshotResponse
.
- done
SpeechRecognitionAlternative
- SpeechRecognitionAlternative
object
: Alternative hypotheses (a.k.a. n-best list).- confidence
number
: Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result or, of a streaming result whereis_final=true
. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicatingconfidence
was not set. - transcript
string
: Output only. Transcript text representing the words that the user spoke. - words
array
: Output only. A list of word-specific information for each recognized word. Note: Whenenable_speaker_diarization
is true, you will see all the words from the beginning of the audio.- items WordInfo
- confidence
SpeechRecognitionResult
- SpeechRecognitionResult
object
: A speech recognition result corresponding to a portion of the audio.- alternatives
array
: Output only. May contain one or more recognition hypotheses (up to the maximum specified inmax_alternatives
). These alternatives are ordered in terms of accuracy, with the top (first) alternative being the most probable, as ranked by the recognizer. - channelTag
integer
: Output only. For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. Foraudio_channel_count
= N, its output values can range from1
toN
. - languageCode
string
: Output only. The BCP-47 language tag of the language in this result. This language code was detected to have the most likelihood of being spoken in the audio.
- alternatives
Status
- Status
object
: TheStatus
type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by gRPC. EachStatus
message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the API Design Guide.- code
integer
: The status code, which should be an enum value of google.rpc.Code. - details
array
: A list of messages that carry the error details. There is a common set of message types for APIs to use.- items
object
- items
- message
string
: A developer-facing error message, which should be in English. Any user-facing error message should be localized and sent in the google.rpc.Status.details field, or localized by the client.
- code
WordInfo
- WordInfo
object
: Word-specific information for recognized words.- confidence
number
: Output only. The confidence estimate between 0.0 and 1.0. A higher number indicates an estimated greater likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result or, of a streaming result whereis_final=true
. This field is not guaranteed to be accurate and users should not rely on it to be always provided. The default of 0.0 is a sentinel value indicatingconfidence
was not set. - endOffset
string
: Output only. Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set ifenable_word_time_offsets=true
and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary. - speakerTag
integer
: Output only. A distinct integer value is assigned for every speaker within the audio. This field specifies which one of those speakers was detected to have spoken this word. Value ranges from1
todiarization_config.max_speaker_count
.speaker_tag
is set ifdiarization_config.enable_speaker_diarization
=true
and only in the top alternative. - startOffset
string
: Output only. Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set ifenable_word_time_offsets=true
and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary. - word
string
: Output only. The word corresponding to this set of information.
- confidence