There is an old issue in Bugzilla, but it doesn't go into much detail.
We should revive this, to give the application control over the source of audio.
Not letting the application be in control has several issues:
- The spec needs to define what audio sources there are (mediacapture-main already does this). Today I don't see this mentioned; it seems implied that a microphone is used.
- The error with code "audio-capture" lumps together all kinds of errors that capturing audio from a microphone could produce.
- There'd have to be an additional permission setting for speech, on top of that for audio capture through getUserMedia. The spec currently doesn't clarify how this relates to getUserMedia's permissions, and doing so could become complicated (if capture is already ongoing, do we ask again? If not, how does the user choose a device? etc.)
- Depending on implementation, if we rely on start() requesting audio via getUserMedia() (which seems reasonable), making multiple requests in a row could lead to a new permission prompt for each one, unless the user consents to a permanent permission. This would be an issue in Firefox, since through the SpeechRecognition API an application cannot control the lifetime of the audio capture.
- Probably more.
Letting the application be in control has several advantages:
- It can rely on mediacapture-main and its extension specs to define the sources of audio and all the security and privacy aspects around them. Some language might still be needed around cross-origin tracks. There's already a concept of isolated tracks in webrtc-identity, which will move into the main spec in the future and which one could rely on for the rest.
- If no backwards-compatible path is kept, the spec can be simplified by removing all text, attributes, errors, etc. related to audio-capture.
- The application is in full control of the track's lifetime, and thus can avoid any permission prompts the user agent might otherwise throw at the user when doing multiple speech recognitions.
- The application can recognize speech from sources other than microphones.
- Probably more.
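To illustrate the lifetime-control advantage, here is a rough sketch of the application-controlled flow. A track-accepting start() does not exist yet, so both it and getUserMedia are stubbed with minimal mocks; the names and behavior are purely hypothetical:

```javascript
// Hypothetical application flow: the page owns the track and its lifetime.
// Mocks stand in for the real browser APIs so the control flow is visible.

// Minimal stand-in for navigator.mediaDevices.getUserMedia({ audio: true }).
async function getUserMedia() {
  return {
    getAudioTracks: () => [
      { kind: "audio", readyState: "live", stop() { this.readyState = "ended"; } },
    ],
  };
}

// Minimal stand-in for a SpeechRecognition whose start() accepts a track.
class SpeechRecognition {
  start(track) {
    if (track.kind !== "audio") throw new TypeError("not an audio track");
    this.track = track;
  }
  stop() { /* recognition ends; the track keeps running */ }
}

async function main() {
  const stream = await getUserMedia();  // one permission prompt, at most
  const [track] = stream.getAudioTracks();
  const rec = new SpeechRecognition();
  rec.start(track);                     // first recognition
  rec.stop();
  rec.start(track);                     // second recognition, no new prompt
  rec.stop();
  track.stop();                         // the app decides when capture ends
  return track.readyState;
}

main().then((state) => console.log(state)); // → "ended"
```

The point of the sketch is that capture outlives any single recognition, so repeated recognitions reuse one consented track instead of triggering a prompt each time.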
To support a MediaStreamTrack argument to start(), we need to:
- Throw in start() if the track is not of kind "audio".
- Throw in start() if the track's readyState is not "live".
- Throw in start() if the track is isolated.
- If the track becomes isolated while recognizing, discard any pending results and fire an error.
- If the track ends while recognizing, treat it as the end of speech and handle it gracefully.
- If the track is muted or disabled, do nothing special, as this means the track contains silence. It could become unmuted or enabled at any time.
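The throwing conditions above could be sketched as a single validation step. The function name and the error choices here are placeholders (the issue deliberately leaves open what to throw), and the track is duck-typed so the logic runs outside a browser:

```javascript
// Placeholder validation for a hypothetical start(track).
// The spec text would decide the exact exception types; the ones
// used here are purely for illustration.
function validateTrackForStart(track) {
  if (track.kind !== "audio") {
    throw new TypeError('start() requires a track of kind "audio"');
  }
  if (track.readyState !== "live") {
    throw new Error("InvalidStateError: track must be live");
  }
  if (track.isolated) {
    throw new Error("SecurityError: track is isolated");
  }
  // Muted or disabled tracks pass: they carry silence and may become
  // audible at any time, so recognition proceeds normally.
  return true;
}

// A muted but live audio track is accepted.
console.log(validateTrackForStart({
  kind: "audio", readyState: "live", isolated: false, muted: true,
})); // → true
```

The runtime conditions (a track becoming isolated or ending mid-recognition) would instead be handled by event listeners on the track, discarding pending results or ending recognition gracefully as described above.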
Exactly what to throw and which events to fire, I leave unsaid for now.