
[Question] Why does the spec require a browser to select the presentation device #526


Open
Marvin-Brouwer opened this issue Dec 28, 2024 · 10 comments

Comments

@Marvin-Brouwer

I've been reading the closed issues and I've skimmed through the spec before posting this.
I don't really understand why I'm not allowed to get a list of objects containing each presentable device's name, ID, capabilities, etc., and then call presentationRequest.start('deviceId') or something similar, like the Media Capture and Streams API allows.

I've seen someone speculate here #315 (comment) that it might be because you want to give users control over which device they select, preventing a website from just picking one at random and causing strange situations.
However, I haven't seen any official statement as to why this is specced the way it is.
If you make a mobile app, you get the option to list devices and style them inside the app, so to me it makes sense that a website could do something similar. If I'm wrong that's fine, but I'd like to be able to read why the decision was made.

@tidoust
Member

tidoust commented Jan 6, 2025

I'd say a major reason is fingerprinting. Giving access to the list of presentation displays would reveal too many bits of information about the user's context. I believe enumeration of devices in the case of the Media Capture and Streams API is only possible once capture of audio/video has already started (meaning once the user has already granted permission to use the microphone and camera).

The Presentation API still allows web pages to assess whether there is a presentation display (and notes the underlying fingerprinting consequence under Personally Identifiable Information).
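To make the availability check concrete, here is a minimal TypeScript sketch. It assumes the page holds a PresentationRequest and calls its getAvailability() method; the helper name `watchDisplayAvailability` and the `...Like` interfaces are hypothetical stand-ins so the sketch does not depend on DOM typings. The page only ever learns a single boolean, never a device list.

```typescript
// Structural stand-ins for the members of PresentationAvailability and
// PresentationRequest used below (hypothetical names, not spec types).
interface AvailabilityLike {
  readonly value: boolean;
  onchange: (() => void) | null;
}

interface AvailabilitySourceLike {
  getAvailability(): Promise<AvailabilityLike>;
}

// Hypothetical helper: reports whether at least one compatible display is
// available, now and whenever that changes.
async function watchDisplayAvailability(
  request: AvailabilitySourceLike,
  onUpdate: (available: boolean) => void,
): Promise<void> {
  try {
    const availability = await request.getAvailability();
    onUpdate(availability.value);
    availability.onchange = () => onUpdate(availability.value);
  } catch {
    // Assumption: the promise rejects when the browser cannot monitor displays
    // in the background. Discovery then only happens inside start(), so this
    // sketch keeps the "present" control enabled.
    onUpdate(true);
  }
}
```

In a browser, `request` would be something like `new PresentationRequest(["receiver.html"])` (the URL is a placeholder) and `onUpdate` would show or hide a "present" button.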

The spec notes another reason under Device Access: "The Presentation API requires user permission for a page to access any display to mitigate issues that could arise, such as showing unwanted content on a display viewable by others".

The Window Management API could perhaps be a better fit for multi-screen scenarios; it allows a page to get detailed information about the screens used by the user's device.

@Marvin-Brouwer
Author

Marvin-Brouwer commented Jan 15, 2025

Hmm, I figured it had something to do with security.
Didn't think of the fingerprinting.
However, from a user standpoint, I'm not particularly happy with the browser scanning my network for cast devices without my consent anyway. So I'd be more than fine with a popup saying something like "Chrome would like to scan your network for cast devices". Starting a cast could then work like playing audio, in that it has to be triggered by a user interaction.

I'm not a security expert by any means, so maybe I'm missing something, but I am experienced with JavaScript, and from a developer's perspective this is yet another API inconsistency.

I'm not sure if this is the right place, but I hope sharing this helps.

@markafoltz
Contributor

macOS is moving in the direction you suggest, requiring users to give Chrome permission to access their local network before Cast or other devices can be discovered. This does give users more control, but it has also caused a lot of problems because features like Cast and printing stop working if the permission is not granted. Users don't always associate (technical) prompts like these with features and capabilities they rely on.

@Marvin-Brouwer
Author

Interesting, I guess asking for specific permission like scanning for cast devices or printing would be more user friendly than just "local network access".
However, yet again, I'd rather have consistency in functionality and usage.

@tidoust
Member

tidoust commented Jan 17, 2025

I see you redacted this part of your comment, but you initially drew a parallel with Web Bluetooth and Web USB which I found interesting. Both specs have a mechanism to enumerate devices: Bluetooth.getDevices() and USB.getDevices(). These enumerations are restricted to "allowed devices", meaning those devices that the user previously granted the origin permission to use. Something similar could perhaps be discussed for the Presentation API?
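For reference, a rough TypeScript sketch of what that kind of enumeration looks like. It assumes `navigator.usb.getDevices()` and `navigator.bluetooth.getDevices()` are available (support varies by browser, and the Bluetooth one may be experimental); the helper name `listGrantedDeviceNames` and the `...Like` interfaces are hypothetical stand-ins for the real DOM types.

```typescript
// Structural stand-ins for the parts of navigator.usb and navigator.bluetooth
// used below (hypothetical names).
interface UsbLike {
  getDevices(): Promise<Array<{ productName?: string | null }>>;
}

interface BluetoothLike {
  getDevices(): Promise<Array<{ name?: string | null }>>;
}

interface NavigatorLike {
  usb?: UsbLike;
  bluetooth?: BluetoothLike;
}

// Hypothetical helper: lists only devices the user has previously granted to
// this origin. Devices that were never granted do not appear at all.
async function listGrantedDeviceNames(nav: NavigatorLike): Promise<string[]> {
  const usbDevices = nav.usb ? await nav.usb.getDevices() : [];
  const bluetoothDevices = nav.bluetooth ? await nav.bluetooth.getDevices() : [];
  const usbNames = usbDevices.map((d) => `USB: ${d.productName ?? "(unnamed)"}`);
  const bluetoothNames = bluetoothDevices.map((d) => `Bluetooth: ${d.name ?? "(unnamed)"}`);
  return usbNames.concat(bluetoothNames);
}
```

In a browser the argument would be `navigator`, cast as needed if the DOM typings in use lack these properties. Neither call shows a prompt; the chooser dialog only appears for `requestDevice()`.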

@Marvin-Brouwer
Author

> I see you redacted this part of your comment, but you initially drew a parallel with Web Bluetooth and Web USB which I found interesting. Both specs have a mechanism to enumerate devices: Bluetooth.getDevices() and USB.getDevices(). These enumerations are restricted to "allowed devices", meaning those devices that the user previously granted the origin permission to use. Something similar could perhaps be discussed for the Presentation API?

I redacted this because I realized the Bluetooth API opens a "native" browser window to list the devices, similar to the Chromecast one.
So my point was invalid and distracting.
But I do like your suggestion of "allowed devices", or perhaps even "allowed protocols". For the latter, though, a human-friendly description should be provided, I guess.

@markafoltz
Contributor

The situation is slightly different for Bluetooth and USB. In those cases, the set of eligible targets is relatively fixed from the host device's point of view, either through previous Bluetooth pairing, or physical connection through USB. Therefore, if the user has previously granted a site access to a device, it's likely that the device will be available in the future.

Presentation API targets are typically discovered through network multicast, which results in an ephemeral list of devices depending on which network sender and receiver devices happen to be connected to (if at all).

If the browser finds a device that the site has not connected to before, it won't be in the enumeration, so every site will need to implement a fallback on having a browser-mediated dialog to allow the user to select non-enumerated devices.

Also, the user should provide consent before presentation starts, because the site will be placing information on a shared display. So even if the site has a device enumeration and a way to request presentation on a device, there should be confirmation by a trusted browser dialog.
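As a rough TypeScript sketch of how this looks from the page's side today: the site calls start() from a user gesture, the browser shows its own picker, and the page only learns the outcome. The helper name `startFromUserGesture`, the `StartOutcome` type, and the `...Like` interfaces are hypothetical; the mapping of error names to outcomes is my reading of the spec (NotAllowedError when the user dismisses the dialog, NotFoundError when no display is found), so treat it as an assumption.

```typescript
// Structural stand-ins for the members of PresentationConnection and
// PresentationRequest used below (hypothetical names).
interface ConnectionLike {
  readonly id: string;
}

interface StartableLike {
  start(): Promise<ConnectionLike>;
}

type StartOutcome =
  | { kind: "connected"; connectionId: string }
  | { kind: "cancelled" }
  | { kind: "no-display" }
  | { kind: "failed"; reason: string };

// Hypothetical helper: must be called from a click or similar user gesture.
// The browser-mediated dialog handles both device selection and consent.
async function startFromUserGesture(request: StartableLike): Promise<StartOutcome> {
  try {
    const connection = await request.start();
    return { kind: "connected", connectionId: connection.id };
  } catch (err) {
    const name =
      typeof err === "object" && err !== null && "name" in err
        ? String((err as { name: unknown }).name)
        : "";
    if (name === "NotAllowedError") return { kind: "cancelled" };
    if (name === "NotFoundError") return { kind: "no-display" };
    return { kind: "failed", reason: name || "unknown" };
  }
}
```

A page would typically wire this to a button, for example `button.addEventListener("click", () => startFromUserGesture(request))`. Note that start() takes no device argument, so the site cannot choose the target itself.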

In summary, for the general presentation scenario, site enumeration and direct selection of presentation receivers creates more problems than it solves from privacy and user interface complexity points of view.

There might be some use cases where site driven selection makes sense, such as a kiosk or other configuration with a fixed set of displays. In those scenarios, the browser could be customized with policies to select displays on behalf of the user. Also, the Window Management API is set up to better support these scenarios since it allows display enumeration and direct placement of content on attached displays (once permission is granted).
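A minimal TypeScript sketch of the site-driven placement that the Window Management API permits, assuming `window.getScreenDetails()` is available and the user grants the permission. The helpers `pickSecondaryScreen` and `windowFeaturesFor` and the `ScreenLike` interface are hypothetical; `ScreenLike` mirrors only the ScreenDetailed fields used here.

```typescript
// Structural stand-in for the ScreenDetailed fields used below (hypothetical name).
interface ScreenLike {
  label: string;
  isPrimary: boolean;
  availLeft: number;
  availTop: number;
  availWidth: number;
  availHeight: number;
}

// Hypothetical helper: returns the first non-primary screen, if any.
function pickSecondaryScreen(screens: readonly ScreenLike[]): ScreenLike | undefined {
  return screens.filter((s) => !s.isPrimary)[0];
}

// Hypothetical helper: builds a window.open() features string that covers the
// available area of the given screen.
function windowFeaturesFor(target: ScreenLike): string {
  return (
    `left=${target.availLeft},top=${target.availTop},` +
    `width=${target.availWidth},height=${target.availHeight}`
  );
}

// Browser usage (not executed here; "slides.html" is a placeholder URL):
//   const details = await window.getScreenDetails(); // may prompt for permission
//   const target = pickSecondaryScreen(details.screens);
//   if (target) window.open("slides.html", "_blank", windowFeaturesFor(target));
```

Unlike the Presentation API, the page here sees every attached screen and decides where content goes, which is why the enumeration sits behind an explicit permission.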

Happy to discuss further, but maybe a concrete use case would help focus the discussion more.

@Marvin-Brouwer
Author

> The situation is slightly different for Bluetooth and USB. In those cases, the set of eligible targets is relatively fixed from the host device's point of view, either through previous Bluetooth pairing, or physical connection through USB. Therefore, if the user has previously granted a site access to a device, it's likely that the device will be available in the future.

I don't agree: all the "explorable devices" in my network are devices like Chromecasts or smart TVs, which are stationary. USB devices, like flash drives, will most likely be disconnected after use.
To me this is exactly the opposite of your statement.

> Also, the user should provide consent before presentation starts, because the site will be placing information on a shared display. So even if the site has a device enumeration and a way to request presentation on a device, there should be confirmation by a trusted browser dialog.

I completely agree, however, I think this consent should be given prior to the enumerating.
I would like to decide whether a browser may scan my network at all before it does so.
Whether the site gets to see this or not.

> Also, the Window Management API is set up to better support these scenarios since it allows display enumeration and direct placement of content on attached displays (once permission is granted).

But shouldn't these APIs act in a similar way?
It seems like odd UX to me that the window API asks me whether I'm okay with the site having access, while the screen cast API just asks me "where do you want this site to stream to" and suddenly shows me devices on my network.
Additionally, I think it's kind of odd that the screen presentation API also adds the "additional screens" as an option, but not the primary screen. And there is no way to either opt out of having windows as share targets or add the main window as one.
In summary, all of this behavior seems very inconsistent from both the user's and the programmer's point of view.

> Happy to discuss further, but maybe a concrete use case would help focus the discussion more.

I don't really have a use case; I was just playing around with possibly replacing my current Chromecast button on a hobby project (which I realize also has these issues).

@markafoltz
Contributor

> I completely agree, however, I think this consent should be given prior to the enumerating.
> I would like to decide whether a browser may scan my network at all before it does so.

That's a reasonable request, and as I mentioned previously, this is now required by macOS. The Presentation API allows browsers to request additional permission before discovering devices, or to not implement background discovery at all so that it happens only when starting a presentation. So I don't think there's any incompatibility with your wishes here.

> In summary, all of this behavior seems very inconsistent from both the user's and the programmer's point of view.

I generally agree with your point that there are some overlaps in functionality between Presentation API and Window Management API. Rather than try to coerce the APIs to function identically, I think they serve different use cases. Window Management is closer to what you want, as far as I can tell - automating the placement of content on a fixed set of displays. The Window Management API permits use of wireless displays as well.

What types of displays and display connections are you using for your hobby project? HDMI, Chromecast, something else?

@Marvin-Brouwer
Author

> The Window Management API permits use of wireless displays as well.

I did not know that. I'm mostly working with Chromecast, and I was looking at replacing it with a more generic solution, hoping that one day it'll also work with AirPlay etc.
I'll play around with it to see if it fits my needs.


3 participants