There is an old issue in Bugzilla, but it doesn't go into much detail.
We should revive this, to give the application control over the source of audio.
Not letting the application be in control has several issues:
- The spec needs to define what audio sources there are (mediacapture-main already does this). Today I don't see this mentioned; it seems implied that a microphone is used.
- The error with code "audio-capture" lumps together all kinds of errors that capturing audio from a microphone could produce.
- There'd have to be an additional permission setting for speech, on top of that for audio capture through getUserMedia. The spec currently doesn't clarify how this relates to getUserMedia's permissions, and doing so could become complicated (if capture is already ongoing, do we ask again? If not, how does the user choose a device? etc.)
- Depending on implementation, if we rely on start() requesting audio via getUserMedia() (which seems reasonable), making multiple requests in a row could lead to a new permission prompt for each one, unless the user consents to a permanent permission. This would be an issue in Firefox, since through the SpeechRecognition API an application cannot control the lifetime of the audio capture.
- Probably more.
Letting the application be in control has several advantages:
- It can rely on mediacapture-main and its extension specs to define the sources of audio and all the security and privacy aspects around them. Some language might still be needed around cross-origin tracks. There's already a concept of isolated tracks in webrtc-identity, which will move into the main spec in the future and which one could rely on for the rest.
- If no backwards-compatible path is kept, the spec can be simplified by removing all text, attributes, errors, etc. related to audio-capture.
- The application is in full control of the track's lifetime, and thus can avoid any permission prompts the user agent might otherwise throw at the user when doing multiple speech recognitions.
- The application can recognize speech from sources other than microphones.
- Probably more.
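To illustrate the lifetime-control advantage, here is a rough sketch of the application-controlled flow. A track-accepting start() does not exist yet, so both it and getUserMedia are stubbed with minimal mocks; the names and behavior are purely hypothetical:

```javascript
// Hypothetical application flow: the page owns the track and its lifetime.
// Mocks stand in for the real browser APIs so the control flow is visible.

// Minimal stand-in for navigator.mediaDevices.getUserMedia({ audio: true }).
async function getUserMedia() {
  return {
    getAudioTracks: () => [
      { kind: "audio", readyState: "live", stop() { this.readyState = "ended"; } },
    ],
  };
}

// Minimal stand-in for a SpeechRecognition whose start() accepts a track.
class SpeechRecognition {
  start(track) {
    if (track.kind !== "audio") throw new TypeError("not an audio track");
    this.track = track;
  }
  stop() { /* recognition ends; the track keeps running */ }
}

async function main() {
  const stream = await getUserMedia();  // one permission prompt, at most
  const [track] = stream.getAudioTracks();
  const rec = new SpeechRecognition();
  rec.start(track);                     // first recognition
  rec.stop();
  rec.start(track);                     // second recognition, no new prompt
  rec.stop();
  track.stop();                         // the app decides when capture ends
  return track.readyState;
}

main().then((state) => console.log(state)); // → "ended"
```

The point of the sketch is that capture outlives any single recognition, so repeated recognitions reuse one consented track instead of triggering a prompt each time.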
To support a MediaStreamTrack argument to start(), we need to:
- Throw in start() if the track is not of kind "audio".
- Throw in start() if the track's readyState is not "live".
- Throw in start() if the track is isolated.
- If the track becomes isolated while recognizing, discard any pending results and fire an error.
- If the track ends while recognizing, treat it as the end of speech and handle it gracefully.
- If the track is muted or disabled, do nothing special, as this means the track contains silence. It could become unmuted or enabled at any time.
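The throwing conditions above could be sketched as a single validation step. The function name and the error choices here are placeholders (the issue deliberately leaves open what to throw), and the track is duck-typed so the logic runs outside a browser:

```javascript
// Placeholder validation for a hypothetical start(track).
// The spec text would decide the exact exception types; the ones
// used here are purely for illustration.
function validateTrackForStart(track) {
  if (track.kind !== "audio") {
    throw new TypeError('start() requires a track of kind "audio"');
  }
  if (track.readyState !== "live") {
    throw new Error("InvalidStateError: track must be live");
  }
  if (track.isolated) {
    throw new Error("SecurityError: track is isolated");
  }
  // Muted or disabled tracks pass: they carry silence and may become
  // audible at any time, so recognition proceeds normally.
  return true;
}

// A muted but live audio track is accepted.
console.log(validateTrackForStart({
  kind: "audio", readyState: "live", isolated: false, muted: true,
})); // → true
```

The runtime conditions (a track becoming isolated or ending mid-recognition) would instead be handled by event listeners on the track, discarding pending results or ending recognition gracefully as described above.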
Exactly what to throw and which events to fire, I leave unsaid for now.