Michael's Cloud Notebook: A Simple Bing Speech STT WebSocket API Wrapper

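Besides its REST API, the Bing Speech API also offers a WebSocket interface. However, client libraries for the WebSocket interface are only provided for Android, iOS, and Windows. On any other platform that supports WebSocket, you can implement the communication with the Bing Speech API yourself by following the protocol document (https://docs.microsoft.com/zh-tw/azure/cognitive-services/speech/api-reference-rest/websocketprotocol).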

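The Bing Speech to Text API supports three recognition modes: Interactive, Conversation, and Dictation. Each mode has its own endpoint, so the client must connect to the endpoint that matches the mode it wants to use.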

Mode	Path
Interactive	/speech/recognition/interactive/cognitiveservices/v1
Conversation	/speech/recognition/conversation/cognitiveservices/v1
Dictation	/speech/recognition/dictation/cognitiveservices/v1
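Choosing the endpoint for a mode is a simple table lookup. The following is a minimal sketch, not code from the linked wrapper: `SPEECH_HOST`, `MODE_PATHS`, and `endpointFor` are hypothetical names, and the paths assume the `/speech/recognition/...` form that the connection URL example in this post uses.

```javascript
// Host taken from the connection URL example in this post.
const SPEECH_HOST = 'wss://speech.platform.bing.com';

// Mode -> endpoint path (assumes the /speech/recognition/... form).
// A Map avoids accidental hits on Object.prototype keys such as "constructor".
const MODE_PATHS = new Map([
  ['interactive', '/speech/recognition/interactive/cognitiveservices/v1'],
  ['conversation', '/speech/recognition/conversation/cognitiveservices/v1'],
  ['dictation', '/speech/recognition/dictation/cognitiveservices/v1'],
]);

// Returns the full WebSocket endpoint for a mode name (case-insensitive);
// throws for anything other than the three supported modes.
function endpointFor(mode) {
  const path = MODE_PATHS.get(String(mode).toLowerCase());
  if (!path) {
    throw new Error(`Unknown recognition mode: ${mode}`);
  }
  return SPEECH_HOST + path;
}

console.log(endpointFor('Interactive'));
// wss://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1
```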

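When opening the connection, the authentication information and the other connection parameters are passed in the URL as query-string parameters, for example: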

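// defaultOptions, subscriptionKey and guid are assumed to be defined earlier in the wrapper
var url = 'wss://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1?' +
    'format=' + encodeURIComponent(defaultOptions.format) +
    '&language=' + encodeURIComponent(defaultOptions.language) +
    '&Ocp-Apim-Subscription-Key=' + encodeURIComponent(subscriptionKey) +
    '&X-ConnectionId=' + encodeURIComponent(guid);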

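Here, format is either simple or detailed; language is the language (locale) of the input speech, such as en-US; Ocp-Apim-Subscription-Key is the Bing Speech API subscription key; and X-ConnectionId is a GUID that identifies the connection. The ConnectionId stays the same for the entire lifetime of the connection.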
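String concatenation works, but values are safer when percent-encoded. Below is a hedged sketch using the standard `URLSearchParams` class; `buildConnectionUrl` and `newConnectionId` are hypothetical helpers. It assumes the ConnectionId should be a version-4 UUID written as 32 hex digits without dashes, which appears to be what the protocol document asks for; verify this against the document before relying on it.

```javascript
// Hypothetical helper: appends the query parameters described above to an
// endpoint URL. URLSearchParams percent-encodes every value.
function buildConnectionUrl(endpoint, { format = 'simple', language, subscriptionKey, connectionId }) {
  if (format !== 'simple' && format !== 'detailed') {
    throw new Error(`format must be "simple" or "detailed", got "${format}"`);
  }
  if (!language || !subscriptionKey || !connectionId) {
    throw new Error('language, subscriptionKey and connectionId are required');
  }
  const params = new URLSearchParams({
    format,
    language,
    'Ocp-Apim-Subscription-Key': subscriptionKey,
    'X-ConnectionId': connectionId,
  });
  return `${endpoint}?${params.toString()}`;
}

// Generates a random version-4 UUID as 32 lowercase hex digits (no dashes).
// Assumption: the protocol expects this "no-dash" form. Math.random is enough
// for an identifier like this but is not cryptographically strong.
function newConnectionId() {
  const digits = [];
  for (let i = 0; i < 32; i++) {
    digits.push(Math.floor(Math.random() * 16));
  }
  digits[12] = 4;                        // version nibble: 4
  digits[16] = (digits[16] & 0x3) | 0x8; // variant bits: 10xx
  return digits.map((d) => d.toString(16)).join('');
}

const url = buildConnectionUrl(
  'wss://speech.platform.bing.com/speech/recognition/interactive/cognitiveservices/v1',
  { format: 'simple', language: 'zh-TW', subscriptionKey: 'YOUR_KEY', connectionId: newConnectionId() }
);
console.log(url);
```

Generate the ConnectionId once per connection and reuse it, since it has to stay the same for the whole connection.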

Once the connection is established, the client must first send a speech.config message to inform the Bing Speech API of the client device's capabilities.
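Text messages in the protocol consist of header lines (`Name: value`, each terminated by CRLF), an empty line, and a body. The sketch below assembles a speech.config message in that layout. `buildTextMessage` and `buildSpeechConfig` are hypothetical helpers; the `context.system` / `context.os` / `context.device` structure reflects the protocol document as best understood here, and every value inside it is a placeholder.

```javascript
// Serializes a text WebSocket message: one CRLF-terminated "Name: value"
// line per header, an empty line, then the body.
function buildTextMessage(headers, body) {
  const headerText = Object.entries(headers)
    .map(([name, value]) => `${name}: ${value}\r\n`)
    .join('');
  return headerText + '\r\n' + body;
}

// Builds the speech.config message sent once right after the connection opens.
// All values inside "context" describe a hypothetical client.
function buildSpeechConfig(timestamp = new Date().toISOString()) {
  const payload = {
    context: {
      system: { version: '1.0.0' },
      os: { platform: 'Linux', name: 'Ubuntu', version: '16.04' },
      device: { manufacturer: 'Contoso', model: 'DemoDevice', version: '1.0' },
    },
  };
  return buildTextMessage(
    {
      Path: 'speech.config',
      'X-Timestamp': timestamp,
      'Content-Type': 'application/json; charset=utf-8',
    },
    JSON.stringify(payload)
  );
}

// Usage with an open WebSocket `socket`:
// socket.send(buildSpeechConfig());
```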

The client can then send audio messages carrying the speech to be recognized. During recognition, the WebSocket client receives the following messages from the server: turn.start, speech.startDetected, speech.hypothesis, speech.endDetected, speech.phrase, and turn.end.
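Audio goes over binary WebSocket messages. Assuming the framing described in the protocol document, each message starts with a 2-byte big-endian header length, followed by ASCII headers (Path: audio, X-RequestId, X-Timestamp, Content-Type), followed by the raw audio bytes, and an audio message with an empty body marks the end of the audio. `buildAudioMessage` and `audioMessages` are hypothetical helpers, the 4096-byte chunk size is arbitrary, and `requestId` is assumed to be a no-dash UUID identifying the turn.

```javascript
// Builds one binary "audio" message:
//   [2-byte big-endian header length][ASCII header lines][audio bytes]
function buildAudioMessage(requestId, audioChunk, timestamp = new Date().toISOString()) {
  const headerText = [
    'Path: audio',
    `X-RequestId: ${requestId}`,
    `X-Timestamp: ${timestamp}`,
    'Content-Type: audio/x-wav',
  ].map((line) => line + '\r\n').join('');
  const headerBytes = new TextEncoder().encode(headerText);
  if (headerBytes.length > 0xffff) {
    throw new Error('Header block does not fit in 16 bits');
  }
  const message = new Uint8Array(2 + headerBytes.length + audioChunk.length);
  message[0] = (headerBytes.length >> 8) & 0xff; // high byte of header length
  message[1] = headerBytes.length & 0xff;        // low byte of header length
  message.set(headerBytes, 2);
  message.set(audioChunk, 2 + headerBytes.length);
  return message;
}

// Splits a complete WAV file into audio messages, then yields an
// empty-body audio message to mark the end of the audio stream.
function* audioMessages(requestId, wavBytes, chunkSize = 4096) {
  for (let offset = 0; offset < wavBytes.length; offset += chunkSize) {
    yield buildAudioMessage(requestId, wavBytes.subarray(offset, offset + chunkSize));
  }
  yield buildAudioMessage(requestId, new Uint8Array(0));
}

// Usage with an open WebSocket `socket` and a Uint8Array `wav`:
// for (const msg of audioMessages(requestId, wav)) socket.send(msg);
```

Because the whole WAV file is chunked, the first message carries the RIFF header as long as `chunkSize` is at least 44 bytes (the size of a canonical PCM WAV header). A client capturing from a microphone would instead send chunks as they are recorded.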

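The speech.phrase message carries the recognition result. When the client receives turn.end, recognition is complete, and the client must reply with a telemetry message reporting various metrics.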
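Server messages arrive as text in the same header, empty line, body layout. The hedged sketch below parses them, collects speech.phrase bodies, records when each message type arrived, and on turn.end builds the telemetry reply. All names are hypothetical. The `ReceivedMessages` shape (message path mapped to one timestamp, or to an array when it arrived several times) reflects a reading of the protocol document, and `Metrics` is deliberately left empty even though the protocol document describes metric entries a complete client should report.

```javascript
// Splits a server text message into lower-cased headers and a body.
// The body is parsed as JSON when possible, otherwise kept as a string.
function parseTextMessage(raw) {
  const sep = raw.indexOf('\r\n\r\n');
  const headerPart = sep === -1 ? raw : raw.slice(0, sep);
  const bodyText = sep === -1 ? '' : raw.slice(sep + 4).trim();
  const headers = {};
  for (const line of headerPart.split('\r\n')) {
    const colon = line.indexOf(':');
    if (colon > 0) {
      headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
    }
  }
  let body = null;
  if (bodyText) {
    try {
      body = JSON.parse(bodyText);
    } catch (err) {
      body = bodyText;
    }
  }
  return { path: headers['path'], headers, body };
}

// Builds a telemetry message listing when each server message arrived.
// Assumption: Metrics is left empty in this sketch.
function buildTelemetry(requestId, received, timestamp = new Date().toISOString()) {
  const receivedMessages = [];
  for (const [path, times] of received) {
    receivedMessages.push({ [path]: times.length === 1 ? times[0] : times });
  }
  const body = JSON.stringify({ ReceivedMessages: receivedMessages, Metrics: [] });
  return [
    'Path: telemetry',
    `X-RequestId: ${requestId}`,
    `X-Timestamp: ${timestamp}`,
    'Content-Type: application/json; charset=utf-8',
  ].map((line) => line + '\r\n').join('') + '\r\n' + body;
}

// Handles one server text message. Collects speech.phrase bodies and, on
// turn.end, returns the telemetry message to send back (null otherwise).
function handleServerMessage(raw, state) {
  const msg = parseTextMessage(raw);
  if (!state.received.has(msg.path)) {
    state.received.set(msg.path, []);
  }
  state.received.get(msg.path).push(new Date().toISOString());
  if (msg.path === 'speech.phrase') {
    // Simple-format bodies include fields such as RecognitionStatus and DisplayText.
    state.results.push(msg.body);
  }
  if (msg.path === 'turn.end') {
    const telemetry = buildTelemetry(msg.headers['x-requestid'] || state.requestId, state.received);
    state.received = new Map(); // start fresh for the next turn
    return telemetry;
  }
  return null;
}
```

To use it, keep one `state = { requestId, received: new Map(), results: [] }` per connection, call `handleServerMessage(event.data, state)` for each text message, and send the returned string whenever it is not null.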

Here is a simple WebSocket API wrapper: https://github.com/michael-chi/BingStt-Websocket

June 29, 2017
