Privacy-Safe Storage API #28

Closed
Minigugus opened this issue Aug 2, 2021 · 9 comments

@Minigugus

Minigugus commented Aug 2, 2021

Privacy-Safe Storage API

Privacy issues on the web occur when an entity accesses data about a user who did not want to share that data with it. However, a user who wants to visit a website has no choice but to trust that website. Therefore, the website owner alone is responsible for what happens to data about its users.

This proposal aims to solve this problem by letting a website deliberately give up its cross-origin and network access in exchange for access to long-term storage and a different browser UI.

Main goals

  • Protect users from data leaks even when a frequently visited website is compromised (even if the attacker manages to keep network access, they won't be able to access user data)
  • Offer a strong alternative to authentication for PWAs and decentralized applications (simpler to implement, transparent for the user)
  • Allow users to safely enter sensitive information on an untrusted website
  • Allow websites and web applications to engage with more users thanks to the privacy-by-design architecture enforced by this API
  • Be incrementally adoptable

This proposal takes some ideas from other proposals but is more general:

  • the Fenced Frames proposal suggests a new HTML node type, and therefore applies only to embedded frames. Instead, the Privacy-Safe Storage API suggests a JavaScript API that a main frame may call at any time. Also, this proposal might not need Web Bundles to be implemented.
  • the Shared Storage proposal is really close to this one, as it proposes a storage only accessible in a secure environment. The Privacy-Safe Storage API adds a new secure environment context: an offline main frame. However, Shared Storage is currently designed around worklets, whereas the current proposal covers any kind of context (frames, workers, service workers, and so on).

Design

Proposed surface API

const storage = await navigator.storage.requestPrivacySafeStorage();

Calling the above API has the following effects:

  • Embedded frames created before access to the storage continue to work without any change, but the main frame can no longer communicate with them.
  • Dynamically created frames won't load if created while the Privacy-Safe Storage is accessible. Similarly, workers created after the Privacy-Safe Storage is accessed have access to this storage, but the same restrictions as for the main frame apply to them.
  • The Service Worker also loses cross-origin and network access, but can still work with its cache, as if the user were really offline. The main frame can still issue fetch calls, for instance, since the network restriction applies at the Service Worker level. Other frames and workers that don't have access to the Privacy-Safe Storage will either use another instance of the Service Worker with network access, or not use a Service Worker at all (to be discussed).
  • In order to avoid leaks, only same-origin links that open in the current frame are opened. (to be discussed too, see the Fenced Frame equivalent)
  • All other storage APIs and cookies are either inaccessible or read-only. Another solution is to have 2 storage instances per website, the first accessible from a non-secure context and the second from a secure context.
  • The main frame can't send messages to cross-origin and non-secure contexts, but the other direction is allowed. This way, it is possible to have a worker that forwards messages from a websocket connection to the main frame, but is unable to get any form of response from the main thread. Therefore, analytics collection is still possible (online session duration, web page loading time, browser information) but without access to sensitive information.
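The one-way messaging rule in the last bullet can be illustrated with a small model. This is only a sketch: `Context` and `postMessageBetween` are hypothetical names invented for the illustration, not part of the proposal or of any browser API.

```javascript
// Hypothetical model of the one-way messaging rule; not a real browser API.
class Context {
  constructor(name, privacySafe) {
    this.name = name;
    this.privacySafe = privacySafe; // true once the storage has been requested
    this.inbox = [];
  }
}

// Returns true if the message was delivered, false if the rule blocks it.
function postMessageBetween(sender, receiver, message) {
  // A privacy-safe context must not send anything to a non-secure context.
  if (sender.privacySafe && !receiver.privacySafe) {
    return false;
  }
  receiver.inbox.push({ from: sender.name, message });
  return true;
}

const mainFrame = new Context("main-frame", true);
const socketWorker = new Context("websocket-worker", false);

// Worker -> main frame: allowed (e.g. forwarding websocket messages).
const inbound = postMessageBetween(socketWorker, mainFrame, { text: "hello" });
// Main frame -> worker: blocked, so nothing sensitive can flow back out.
const outbound = postMessageBetween(mainFrame, socketWorker, { text: "secret" });

console.log(inbound, outbound, socketWorker.inbox.length);
```

In this model the worker can keep pushing data into the main frame, but its own inbox stays empty, which is the property the analytics example relies on.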

The storage API is yet to be discussed, but something like the Local Storage API may be a good starting point.
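As a rough sketch of what a Local Storage-like shape could look like, the snippet below feature-detects the proposed method and uses an in-memory stand-in for the returned handle. Only `navigator.storage.requestPrivacySafeStorage()` comes from this proposal; `FakePrivacySafeStorage`, `openPrivacySafeStorage`, and the `getItem`/`setItem`/`removeItem` methods on the handle are assumptions made for the example.

```javascript
// In-memory stand-in for the handle the proposal would return.
// The method names mirror Local Storage and are assumptions.
class FakePrivacySafeStorage {
  #items = new Map();
  getItem(key) {
    return this.#items.has(key) ? this.#items.get(key) : null;
  }
  setItem(key, value) {
    this.#items.set(key, String(value)); // Local Storage stores strings
  }
  removeItem(key) {
    this.#items.delete(key);
  }
}

// Feature-detects the proposed method on a navigator-like object and
// resolves to null where it is missing, so callers can fall back.
async function openPrivacySafeStorage(nav) {
  const request = nav && nav.storage && nav.storage.requestPrivacySafeStorage;
  if (typeof request !== "function") {
    return null;
  }
  return request.call(nav.storage);
}

// A fake navigator so the sketch runs outside a browser.
const fakeNavigator = {
  storage: {
    requestPrivacySafeStorage: async () => new FakePrivacySafeStorage(),
  },
};

openPrivacySafeStorage(fakeNavigator).then((storage) => {
  storage.setItem("draft", "my private note");
  console.log(storage.getItem("draft"));
});
```

Passing the navigator in as a parameter keeps the feature detection testable and makes the fallback path explicit for browsers that never ship the feature.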

Key scenarios

General use case
  1. A user visits a page on website A.
  2. The web page starts service worker installation.
  3. While the page loads, the service worker caches resources needed by the website to work offline.
  4. Once the web page is displayed and all resources are loaded, the web page requests Privacy-Safe Storage access.
  5. The browser drops all cross-origin and network access for the main frame, and updates its UI to tell the user that anything they do from now on is fully private.
  6. The browser returns a handle that gives the main frame access to the Privacy-Safe Storage.
  7. The website can do anything with what's inside the storage or with other APIs; nothing will leave the browser.
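The steps above can be sketched as a small state model of a single tab. `TabModel` and all of its members are hypothetical and exist only to make the ordering of the steps concrete; a real browser would enforce these rules internally.

```javascript
// Hypothetical model of one tab going through the general use case.
class TabModel {
  constructor() {
    this.cache = new Map();        // stands in for the service worker cache
    this.networkAllowed = true;
    this.privateIndicator = false; // stands in for the updated browser UI
    this.storage = null;
  }

  // Steps 2-3: the service worker caches what the site needs offline.
  precache(resources) {
    if (!this.networkAllowed) {
      throw new Error("network access was already dropped");
    }
    for (const [url, body] of Object.entries(resources)) {
      this.cache.set(url, body);
    }
  }

  // Steps 4-6: drop network access, update the UI, hand back the storage.
  requestPrivacySafeStorage() {
    this.networkAllowed = false;
    this.privateIndicator = true;
    if (this.storage === null) {
      this.storage = new Map();
    }
    return this.storage;
  }

  // Step 7: after the switch, only cached resources can be loaded.
  load(url) {
    if (this.cache.has(url)) {
      return this.cache.get(url);
    }
    if (!this.networkAllowed) {
      throw new Error("offline and not cached: " + url);
    }
    return "<network response for " + url + ">";
  }
}

const tab = new TabModel();
tab.precache({ "/app.js": "console.log('app')" });
const storage = tab.requestPrivacySafeStorage();
storage.set("secret", "stays in the browser");
console.log(tab.load("/app.js"), tab.networkAllowed, tab.privateIndicator);
```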

Adoption strategy

This proposal tries to mimic what happens when websites work offline, so websites that already work offline won't have to change much code to use the feature proposed here. For instance, web applications like those below shouldn't require a lot of work to adopt this feature:

  • Single-player or local multiplayer games, text/image/video/music editors, notepads, public newspapers: no need for "write access" to the network, and local analytics are still possible. Using the feature proposed here can also attract users thanks to the updated UI and privacy guarantees. This proposal is also compatible with proposals like TURTLEDOVE, so ads may still be displayed without breaking those guarantees.
  • Password managers, authenticators (with OTP): security is critical, and attacks on these web applications are highly valuable to attackers. When using the Privacy-Safe Storage API, users are guaranteed that the remote server doesn't hold any sensitive information that may leak.

Furthermore, this feature can easily be turned off if it doesn't meet expectations, for instance after an origin trial.

FAQ

What would happen if a third-party script called the proposed API?

This proposal assumes it is the website developer's responsibility to be careful when choosing external libraries. Also, from the user's point of view, the only risk is seeing a broken website; the third-party script won't be able to leak storage information anyway.

Maybe an attacker can't retrieve storage content, but it can still alter/delete its content

Third-party libraries can already alter or clear localStorage, but most don't. As above, this is outside the scope of this proposal.

Will the browser offer a new permissions access level, such as "privacy-safe context only" when accessing the camera for instance?

That would be great, but this proposal primarily focuses on offering the storage mechanism first. (to be discussed too)

What is the purpose of such a feature since there's already the Local Storage, IndexedDB, and so on for long-term storage?

The main differences are that (1) the API design and behavior prevent the Privacy-Safe Storage content from leaving the browser, and (2) the browser UI helps the user feel more comfortable when a secure website asks for information.

If this proposal gets enough traction, maybe another proposal could request enabling the "clear cookies at the end of the session" setting by default.

What about data redundancy (backups, multi-devices)?

Since user data would be stored client-side, this can indeed be problematic in some cases (damaged devices, accessing the same site with multiple devices at the same time, and so on). This isn't a goal of this proposal, especially since most browsers offer profile synchronization.

Another possibility is to update the API to let the non-secure context of a website request encrypted snapshots of the Privacy-Safe Storage for its origin. The website would then become responsible for storage synchronization across devices and for backups, without having access to the storage content. (to be discussed)

@annevk

annevk commented Aug 3, 2021

It's an interesting idea. I suspect you'll find it challenging to prevent communication with nested browsing contexts as width/height or history gives quite a few channels. And given the service worker backing it's not clear you can really prevent dynamic generation of nested browsing contexts. Or at least you'd have to provide more detail as to how that would work.

Another thing this would need is a way to update the website. I think that poses another challenge as you want to allow frequent updates for security reasons, but you do not want to allow it to be used as a communication channel of sorts. (And involving the users for updates is a non-starter I think from a usability perspective. That is, it seems attractive this allows the user to be involved for those that want that level of control, but that cannot be the default.)

@Minigugus
Author

Minigugus commented Aug 3, 2021

@annevk Thanks for the feedback 👍

I suspect you'll find it challenging to prevent communication with nested browsing contexts as width/height or history gives quite a few channels.

As a workaround, this proposal shouldn't work within embedded frames, only on main frames and workers, so that when a user navigates to a different page or origin, the website state is reset and it becomes possible to disable storage access and re-enable network access without risk of leaks. Concerning the service worker, as explained in my previous message, the routing of requests and messages to service workers depends on whether the client context is privacy-safe (has access to the privacy-safe storage), so that the target service worker inherits storage access permissions as well as network restrictions. When a user navigates using the history buttons, the main frame loads the page from the service worker without storage access, so there is no leak.

To sum up, storage access affects the current tab only, not the whole origin.

EDIT: Ok, I just realized what you meant: indeed, since history can be manipulated dynamically, it becomes a sensitive feature. As a workaround, maybe browsers could propagate the storage access status to history entries, so that links opened while the storage is accessible are forced to load with the storage enabled too. Anyway, there are already discussions about this, for instance with Opaque Urls, I think.
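To make the history idea from the EDIT more concrete, here is a minimal sketch in which each history entry remembers the mode it was created in. The functions `createHistoryEntry` and `loadHistoryEntry` and the entry shape are hypothetical; browsers expose nothing like this today.

```javascript
// Hypothetical model: each history entry records whether it was created
// while the Privacy-Safe Storage was accessible.
function createHistoryEntry(url, context) {
  return { url, privacySafe: context.privacySafe };
}

// Loading an entry restores the mode it was created in, so a URL written
// by a privacy-safe context is never loaded with network access.
function loadHistoryEntry(entry) {
  return {
    url: entry.url,
    privacySafe: entry.privacySafe,
    networkAllowed: !entry.privacySafe,
  };
}

const privateEntry = createHistoryEntry("/notes?id=42", { privacySafe: true });
const publicEntry = createHistoryEntry("/home", { privacySafe: false });

console.log(loadHistoryEntry(privateEntry).networkAllowed);
console.log(loadHistoryEntry(publicEntry).networkAllowed);
```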

Another thing this would need is a way to update the website. I think that poses another challenge as you want to allow frequent updates for security reasons, but you do not want to allow it to be used as a communication channel of sorts.

Again, this is why this proposal suggests 2 service worker instances (the first with network access but no storage access, the second without network access but with storage access): the first one can update the cache safely as it doesn't have access to user data, and the second one can emulate a real API using cache and storage content, which would therefore be fully transparent to the main frame code.
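A rough model of that routing, assuming the browser picks the instance based on the client context, could look like the following. The worker descriptors, `pickServiceWorker`, and `handleFetch` are hypothetical names for the illustration only.

```javascript
// Hypothetical descriptors for the two service worker instances.
const networkedWorker = {
  name: "networked",
  canUseNetwork: true,
  canReadPrivateStorage: false,
};
const networklessWorker = {
  name: "networkless",
  canUseNetwork: false,
  canReadPrivateStorage: true,
};

// The instance is chosen from the client context, never by the page itself.
function pickServiceWorker(client) {
  return client.privacySafe ? networklessWorker : networkedWorker;
}

// Serves from the shared cache first; only the networked instance may
// fall back to the network. The other behaves as if the user were offline.
function handleFetch(client, url, cache) {
  const worker = pickServiceWorker(client);
  if (cache.has(url)) {
    return { servedBy: worker.name, source: "cache", body: cache.get(url) };
  }
  if (worker.canUseNetwork) {
    return { servedBy: worker.name, source: "network", body: null };
  }
  return { servedBy: worker.name, source: "none", body: null };
}

const sharedCache = new Map([["/app.js", "cached script"]]);
console.log(handleFetch({ privacySafe: true }, "/app.js", sharedCache));
console.log(handleFetch({ privacySafe: true }, "/api/user", sharedCache));
console.log(handleFetch({ privacySafe: false }, "/api/user", sharedCache));
```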

@annevk

annevk commented Aug 3, 2021

So how do you trigger an update from the networkless context without leaking state?

@Minigugus
Author

Minigugus commented Aug 3, 2021

So how do you trigger an update from the networkless context without leaking state?

@annevk You don't... 😅 Currently, service workers are already updated automatically by the browser, so automatic updates of cached resources will still be possible (while the service worker is installing, it runs in the context with network access, then switches to the networkless context when activated).
However, as you mention, non-automatic updates (i.e. with manual user agreement) would require a different approach, but that should still be doable with multiple caches (the first service worker downloads updated resources into another cache and the second one only uses the new cache when the user wants to update), or with a private cache storage (to be able to delete older versions, but this solution requires another change in browsers).
I agree that in some cases, this proposal introduces a new privacy-oriented architecture (similar to the ownership and borrowing principles from the Rust world). I think reactions from users and developers will determine the fate of this proposal.
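The multiple-caches approach for user-approved updates could be modelled roughly as below. `VersionedCaches` and its methods are hypothetical; the sketch deliberately never deletes old versions, since that part would need the separate private cache storage mentioned above.

```javascript
// Hypothetical model of user-approved updates with multiple caches.
class VersionedCaches {
  constructor(initialVersion, resources) {
    this.caches = new Map();
    this.caches.set(initialVersion, new Map(Object.entries(resources)));
    this.activeVersion = initialVersion;
    this.stagedVersion = null;
  }

  // Networked service worker: downloads the update into a separate cache.
  stageUpdate(version, resources) {
    this.caches.set(version, new Map(Object.entries(resources)));
    this.stagedVersion = version;
  }

  // Networkless service worker: switches only once the user has agreed.
  activateStagedUpdate(userApproved) {
    if (!userApproved || this.stagedVersion === null) {
      return false;
    }
    this.activeVersion = this.stagedVersion;
    this.stagedVersion = null;
    return true;
  }

  // Reads always come from the active version.
  read(url) {
    const active = this.caches.get(this.activeVersion);
    return active.has(url) ? active.get(url) : null;
  }
}

const caches = new VersionedCaches("v1", { "/app.js": "app v1" });
caches.stageUpdate("v2", { "/app.js": "app v2" });
console.log(caches.read("/app.js")); // still the old version
caches.activateStagedUpdate(true);
console.log(caches.read("/app.js")); // new version after user approval
```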

@jkarlin

jkarlin commented Aug 10, 2021

Interesting. This is similar to the notion of a fenced frame that can get read-only storage access w/out a prompt and no network. And then if the user later wants to save state or something then they can use the standard rSA permission prompt.

I think it's preferable to perform this in a subframe (that starts off fenced) as opposed to the main frame of a page, since the main frame may not want to be network-less forever.

@Minigugus
Author

This is similar to the notion of a fenced frame that can get read-only storage access w/out a prompt and no network.

Interesting, I missed the document you mentioned, thanks for the feedback 👍 Indeed, it's quite similar to what you mentioned, except that the proposal presented here does not currently propose integration with unpartitioned storage, only traditional origin-restricted storage (I might be wrong about what unpartitioned storage is, though).

I think it's preferable to perform this in a subframe (that starts off fenced) as opposed to the main frame of a page

I don't think so, since this proposal requires the browser to inform the user that anything happening in the current frame while the browser UI is updated cannot be observed or leave the browser. Moreover, this proposal does not require web bundles, thanks to service workers.
As opposed to the fenced frame proposal, the goal here is not directly to limit user tracking across origins, but to offer a way to build complete websites without storing anything about the user server-side. I think the risk with fenced frames alone is that only ads use them, not real-world websites, especially if only web bundles are supported. Moreover, without an updated browser UI, a user can easily be fooled by a phishing iframe that claims to be a fenced frame. Finally, fenced frames do not play well with CDNs, again because of web bundles, even though CDNs are not a privacy problem when user data is stored client-side only.

the main frame may not want to be network-less forever.

By "main frame" I mean a top-level document like a tab or a popup. A main frame that doesn't want to lose its network access would still be able to open another main frame (popup or another tab) that then requests network-less storage access. This is also simpler for the user to understand.

@jkarlin

jkarlin commented Aug 11, 2021

Interesting, I missed the document you mentioned, thanks for the feedback 👍 Indeed, it's quite similar to what you mentioned, except that the proposal presented here does not currently propose integration with unpartitioned storage, only traditional origin-restricted storage (I might be wrong about what unpartitioned storage is, though).

By unpartitioned storage I'm referring to origin-partitioned, but not top-site partitioned storage. E.g., third-party cookies and javascript storage.

I don't think so, since this proposal requires the browser to inform the user that anything happening in the current frame while the browser UI is updated cannot be observed or leave the browser. Moreover, this proposal does not require web bundles, thanks to service workers.
As opposed to the fenced frame proposal, the goal here is not directly to limit user tracking across origins, but to offer a way to build complete websites without storing anything about the user server-side. I think the risk with fenced frames alone is that only ads use them, not real-world websites, especially if only web bundles are supported. Moreover, without an updated browser UI, a user can easily be fooled by a phishing iframe that claims to be a fenced frame. Finally, fenced frames do not play well with CDNs, again because of web bundles, even though CDNs are not a privacy problem when user data is stored client-side only.

Ah, sorry, I didn't catch that this was meant to support an offline app use case. In regards to fenced frames that don't have network access, we are considering allowing network access to caching-only servers from fenced frames that don't log user requests. That would help with the cdn case.

For the particular mode of fenced frame I am describing however, the fenced frame does have network until the frame accesses unpartitioned storage. So it's quite similar.

By "main frame" I mean a top-level document like a tab or a popup. A main frame that doesn't want to lose its network access would still be able to open another main frame (popup or another tab) that then requests network-less storage access. This is also simpler for the user to understand.

We're in agreement on main frame here. Multiple tabs/popups are pretty confusing for mobile users however.

@erik-anderson
Member

@Minigugus it doesn't look like there's been recent discussion on this. Is this something you're still interested in pursuing or should we close this?

@Minigugus
Author

@erik-anderson I've thought more about this and preferred to create a new issue, as the two ideas diverged too much. I think we can indeed close this one then, and redirect to #31 instead 👍

4 participants