IBM Watson Speech Services for Web Browsers

Allows you to easily add voice recognition and synthesis to any web app with minimal code.

Warning This library is still early-stage and may see significant breaking changes.

For Web Browsers Only This library is primarily intended for use in browsers. Check out watson-developer-cloud to use Watson services (speech and others) from Node.js.

However, a server-side component is required to generate auth tokens. The examples/ folder includes example Node.js and Python servers, and SDKs are available for Node.js, Java, Python, and there is also a REST API.

See several examples at https://github.com/watson-developer-cloud/speech-javascript-sdk/tree/master/examples

This library is built with browserify and easy to use in browserify-based projects (npm install --save watson-speech), but you can also grab the compiled bundle from the dist/ folder and use it as a standalone library.

Basic API

Complete API docs should be published at http://watson-developer-cloud.github.io/speech-javascript-sdk/

All API methods require an auth token that must be generated server-side. (Snp teee examples/token-server.js for a basic example.)

`WatsonSpeech.TextToSpeech`

`.synthesize({text, token})` -> `<audio>`

Speaks the supplied text through an automatically-created <audio> element. Currently limited to text that can fit within a GET URL (this is particularly an issue on Internet Explorer before Windows 10 where the max length is around 1000 characters after the token is accounted for.)

Options:

text - the text to transcribe // todo: list supported languages
voice - the desired playback voice's name - see .getVoices(). Note that the voices are language-specific.
autoPlay - set to false to prevent the audio from automatically playing

`.getVoices()` -> Promise

Returns a promise that resolves to an array of objects containing the name, language, gender, and other details for each voice.

Requireswindow.fetch, a pollyfill for IE/Edge and older Chrome/Firefox.

`WatsonSpeech.SpeechToText`

`.recognizeMicrophone({token})` -> `RecognizeStream`

Options:

keepMic: if true, preserves the MicrophoneStream for subsequent calls, preventing additional permissions requests in Firefox
Other options passed to MediaElementAudioStream and RecognizeStream
Other options passed to WritableElementStream if options.outputElement is set

Requires the getUserMedia API, so limited browser compatibility (see http://caniuse.com/#search=getusermedia) Also note that Chrome requires https (with a few exceptions for localhost and such) - see https://www.chromium.org/Home/chromium-security/prefer-secure-origins-for-powerful-new-features

Pipes results through a {FormatStream} by default, set options.format=false to disable.

Known issue: Firefox continues to display a microphone icon in the address bar even after recording has ceased. This is a browser bug.

`.recognizeElement({element, token})` -> `RecognizeStream`

Extract audio from an <audio> or <video> element and transcribe speech.

This method has some limitations:

the audio is run through two lossy conversions: first from the source format to WebAudio, and second to l16 (raw wav) for Watson
the WebAudio API does not guarantee the same exact output for the same file played twice, so it's possible to receive slight different transcriptions for the same file played repeatedly
it transcribes the audio as it is heard, so pausing or skipping will affect the transcription
audio that is paused for too long will cause the socket to time out and disconnect, preventing further transcription (without setting things up again)

Because of these limitations, there are two alternative methods that may be preferable in some situations:

fetch the audio via ajax and then pass it to recognizeFile() - this resolves the issues around lossy conversion and inexact transcription, but the audio playback, if enabled, cannot be paused, rewound, etc.
Pre-process the audio and generate a WebVTT subtitles file to insert in <track>, completely bypassing this SDK. This resolves all of the above issues, and gives you an opportunity to review and/or edit the subtitles if desired.

Options:

element: an <audio> or <video> element (could be generated pragmatically, e.g. new Audio())
Other options passed to MediaElementAudioStream and RecognizeStream
Other options passed to WritableElementStream if options.outputElement is set

Requires that the browser support MediaElement and whatever audio codec is used in your media file.

Will automatically call .play() the element, set options.autoPlay=false to disable. Calling .stop() on the returned stream will automatically call .stop() on the element.

Pipes results through a {FormatStream} by default, set options.format=false to disable.

`.recognizeFile({data, token})` -> `RecognizeStream`

Can recognize and optionally attempt to play a File or Blob (such as from an <input type="file"/> or from an ajax request.)

Options:

data: a Blob or File instance.
play: (optional, default=false) Attempt to also play the file locally while uploading it for transcription
Other options passed to RecognizeStream
Other options passed to WritableElementStream if options.outputElement is set

playrequires that the browser support the format; most browsers support wav and ogg/opus, but not flac.) Will emit a playback-error on the RecognizeStream if playback fails. Playback will automatically stop when .stop() is called on the RecognizeStream.

Pipes results through a {TimingStream} by if options.play=true, set options.realtime=false to disable.

Pipes results through a {FormatStream} by default, set options.format=false to disable.

Class `RecognizeStream()`

A Node.js-style stream of the final text, with some helpers and extra events built in.

RecognizeStream is generally not instantiated directly but rather returned as the result of calling one of the recognize* methods.

The RecognizeStream waits until after receiving data to open a connection. If no content-type option is set, it will attempt to parse the first chunk of data to determine type.

See speech-to-text/recognize-stream.js for other options.

Methods

.promise(): returns a promise that will resolve to the final text. Note that you must either set continuous: false or call .stop() on the stream to make the promise resolve in a timely manner.
.stop(): stops the stream. No more data will be sent, but the stream may still receive additional results with the transcription of already-sent audio. Standard close event will fire once the underlying websocket is closed and end once all of the data is consumed.

Events

Follows standard Node.js stream events, in particular:

data: emits either final Strings or final/interim result objects depending on if the stream is in objectMode
end: emitted once all data has been consumed.

(Note: there are several custom events, but they are deprecated or intended for internal usage)

Class `FormatStream()`

Pipe a RecognizeStream to a format stream, and the resulting text and results events will have basic formatting applied:

Capitalize the first word of each sentence
Add a period to the end
Fix any "cruft" in the transcription
A few other tweaks for asian languages and such.

Inherits .promise() from the RecognizeStream.

Class `TimingStream()`

For use with .recognizeFile({play: true}) - slows the results down to match the audio. Pipe in the RecognizeStream (or FormatStream) and listen for results as usual.

Inherits .promise() from the RecognizeStream.

Class `WritableElementStream()`

Accepts input from RecognizeStream() and friends, writes text to supplied outputElement.

Changelog

v0.14

Moved getUserMedia shim to a standalone library
added a python token server example

v0.13

Fixed bug where continuous: false didn't close the microphone at end of recognition
Added keepMic option to recognizeMicrophone() to prevent multiple permission popups in firefox

v0.12

Added autoPlay option to synthesize()
Added proper parameter filtering to synthesize()

v0.11

renamed recognizeBlob to recognizeFile to make the primary usage more apparent
Added support for <input> and <textarea> elements when using the targetElement option (or a WritableElementStream)
For objectMode, changed defaults for word_confidence to false, alternatives to 1, and timing to off unless required for realtime option.
Fixed bug with calling .promise() on objectMode streams
Fixed bug with calling .promise() on recognizeFile({play: true})

v0.10

Added ability to write text directly to targetElement, updated examples to use this
converted examples from jQuery to vanilla JS (w/ fetch pollyfill when necessary)
significantly improved JSDoc

v0.9

Added basic text to speech support

v0.8

deprecated result events in favor of objectMode.
renamed the autoplay option to autoPlay on recognizeElement() (capital P)

v0.7

Changed playFile option of recognizeBlob() to just play, corrected default
Added options.format=true to recognize*() to pipe text through a FormatStream
Added options.realtime=options.play to recognizeBlob() to automatically pipe results through a TimingStream when playing locally
Added close and end events to TimingStream
Added delay option to TimingStream
Moved compiled binary to GitHub Releases (in addition to uncompiled source on npm).
Misc. doc and internal improvements

todo

Solidify API
break components into standalone npm modules where it makes sense
record which shim/pollyfills would be useful to extend partial support to older browsers (Promise, fetch, etc.)
run integration tests on travis (fall back to offline server for pull requests)
more tests in general
better cross-browser testing (saucelabs?)
update node-sdk to use current version of this lib's RecognizeStream (and also provide the FormatStream + anything else that might be handy)
move result and results events to node wrapper (along with the deprecation notice)
improve docs
consider a wrapper to match https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
support a "hard" stop that prevents any further data events, even for already uploaded audio, ensure timing stream also implements this.
handle pause/resume in media element streams - perhaps just stop and then create a new stream on resume, using the same token
consider moving STT core to standalone module
look for bug where single-word final results may omit word confidence (possibly due to FormatStream?)
fix bug where TimingStream shows words slightly before they're spoken
support jquery objects for element and targetElement

Name		Name	Last commit message	Last commit date
Latest commit History 153 Commits
dist		dist
examples		examples
jsdoc		jsdoc
speech-to-text		speech-to-text
test		test
text-to-speech		text-to-speech
util		util
.editorconfig		.editorconfig
.eslintignore		.eslintignore
.eslintrc.js		.eslintrc.js
.gitignore		.gitignore
.npmignore		.npmignore
.travis.yml		.travis.yml
README.md		README.md
index.js		index.js
karma.conf.js		karma.conf.js
package.json		package.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

IBM Watson Speech Services for Web Browsers

Basic API

`WatsonSpeech.TextToSpeech`

`.synthesize({text, token})` -> `<audio>`

`.getVoices()` -> Promise

`WatsonSpeech.SpeechToText`

`.recognizeMicrophone({token})` -> `RecognizeStream`

`.recognizeElement({element, token})` -> `RecognizeStream`

`.recognizeFile({data, token})` -> `RecognizeStream`

Class `RecognizeStream()`

Methods

Events

Class `FormatStream()`

Class `TimingStream()`

Class `WritableElementStream()`

Changelog

v0.14

v0.13

v0.12

v0.11

v0.10

v0.9

v0.8

v0.7

todo

About

Uh oh!

Releases

Packages

Uh oh!

Languages

chrisemoulton/speech-javascript-sdk

Folders and files

Latest commit

History

Repository files navigation

IBM Watson Speech Services for Web Browsers

Basic API

WatsonSpeech.TextToSpeech

.synthesize({text, token}) -> <audio>

.getVoices() -> Promise

WatsonSpeech.SpeechToText

.recognizeMicrophone({token}) -> RecognizeStream

.recognizeElement({element, token}) -> RecognizeStream

.recognizeFile({data, token}) -> RecognizeStream

Class RecognizeStream()

Methods

Events

Class FormatStream()

Class TimingStream()

Class WritableElementStream()

Changelog

v0.14

v0.13

v0.12

v0.11

v0.10

v0.9

v0.8

v0.7

todo

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

`WatsonSpeech.TextToSpeech`

`.synthesize({text, token})` -> `<audio>`

`.getVoices()` -> Promise

`WatsonSpeech.SpeechToText`

`.recognizeMicrophone({token})` -> `RecognizeStream`

`.recognizeElement({element, token})` -> `RecognizeStream`

`.recognizeFile({data, token})` -> `RecognizeStream`

Class `RecognizeStream()`

Class `FormatStream()`

Class `TimingStream()`

Class `WritableElementStream()`

Packages