Full storage partitioning / double-keying #4

Closed
othermaciej opened this issue Jan 31, 2020 · 19 comments
Labels
interest: gecko - Implementer interest from Gecko (e.g. Mozilla/Firefox, Cliqz)
interest: webkit - Implementer interest from WebKit (e.g. Apple/Safari, Igalia/GNOME Web)
work item? - Formal request to adopt this proposal as a Work Item of the CG

Comments

@othermaciej

Many privacy measures have focused on cookies and replacements for them. However, many other storage types can, without appropriate restrictions, be used for stateful cross-site tracking.

This includes explicit storage types like LocalStorage or IndexedDB, implicit storage, like the HTTP cache, communication channels like ServiceWorker and BroadcastChannel, and subtle state like HSTS flags.

When such state is accessible from a third-party context, it may enable cross-site tracking. Some state like this, e.g. the HTTP cache, can effectively be read by a passive resource, and thus any third-party resource may be affected. Other such state requires a scripting context in a third-party origin and thus would require an iframe or similar mechanism.

For any given storage mechanism, multiple approaches are possible. One approach is to totally deny access to the storage mechanism in a third party context. Another is to partition or double-key it; that is, have a completely separate instance of the storage based on the origin of the top-level browsing context. Yet another is to expose a unique ephemeral storage area.
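
As a rough illustration of the three approaches, here is a minimal Python sketch of a single storage mechanism under each policy. The names `ThirdPartyPolicy`, `StorageShelf`, and `open` are hypothetical and exist only for this example; they do not describe any engine's actual implementation.

```python
from enum import Enum


class ThirdPartyPolicy(Enum):
    DENY = "deny"            # no storage at all in a third-party context
    PARTITION = "partition"  # separate instance per top-level origin
    EPHEMERAL = "ephemeral"  # fresh, throwaway area on every access


class StorageShelf:
    """Toy model of one storage mechanism (e.g. LocalStorage). Hypothetical."""

    def __init__(self, policy):
        self.policy = policy
        self._areas = {}  # (top-level origin, origin) -> stored key/value pairs

    def open(self, top_level_origin, origin):
        first_party = top_level_origin == origin
        if not first_party:
            if self.policy is ThirdPartyPolicy.DENY:
                raise PermissionError("storage is blocked in third-party contexts")
            if self.policy is ThirdPartyPolicy.EPHEMERAL:
                return {}  # never retained, so nothing persists or is shared
        # First-party contexts and partitioned third parties share one code path:
        # the area is looked up by the pair (top-level origin, origin).
        return self._areas.setdefault((top_level_origin, origin), {})
```

Under `PARTITION`, an identifier that a third party writes while embedded on one top-level origin is simply absent when the same third party is embedded on another, which is the property that prevents cross-site tracking.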

WebKit has long partitioned many storage types, including (at one point controversially) the HTTP cache. Unfortunately, exactly what we did is not documented. Blink and Gecko also have work in progress to add pervasive double keying.

It would be useful to agree on a common behavior, and to push these changes into standards as requirements.

Changes along these lines would ultimately go into HTML, Fetch, and perhaps various IETF deliverables. Perhaps also other standalone Web APIs that create a storage mechanism. However, it could be useful to have a central location and issue tracker to develop a plan and proposed behavior before filing issues/PRs against the relevant specifications.

@hober added labels Feb 3, 2020: agenda+ (Request to add this issue to the agenda of our next telcon or F2F); work item? (Formal request to adopt this proposal as a Work Item of the CG); needs implementer interest (Proposals cannot become work items without multi-implementer interest)
@kdzwinel

> This includes explicit storage types like LocalStorage or IndexedDB, implicit storage, like the HTTP cache, communication channels like ServiceWorker and BroadcastChannel, and subtle state like HSTS flags.

FWIW Chrome's Storage Isolation Project does a great job listing many (all?) state mechanisms in browsers: https://docs.google.com/document/d/1V8sFDCEYTXZmwKa_qWUfTVNAuBcPsu6FC0PhqMD6KKQ/edit#heading=h.5wyylz23hbkc

> It would be useful to agree on a common behavior, and to push these changes into standards as requirements.

+1 Double keying of caches (and socket pools) being a requirement in e.g. Resource Timing API would help us avoid many privacy concerns (w3c/resource-timing#222).

> Changes along these lines would ultimately go into HTML, Fetch, and perhaps various IETF deliverables.

Double (and triple) keying of HTTP cache is already being discussed here: whatwg/fetch#904 . As far as I understand though, this does not include any other state mechanisms.

@ehsan

ehsan commented Feb 12, 2020

Mozilla has two implementations of storage partitioning, one shipped and one in the works:

We're also actively working on partitioning our HTTP cache and some related caches (per whatwg/fetch#904).

Given our ongoing work in this area, Mozilla is supportive of this work and of agreeing on a common behaviour.

@othermaciej
Author

For avoidance of doubt, Apple also supports this work.

@hober removed labels Feb 12, 2020: needs implementer interest (Proposals cannot become work items without multi-implementer interest); agenda+ (Request to add this issue to the agenda of our next telcon or F2F)
@TanviHacks
Member

Would anyone like to volunteer to be an editor for this?

@othermaciej
Author

If no one else can step up, I'm willing to make an Explainer that states the problem and surveys what different browser engines do for this currently. But I definitely will not be able to turn this into formal spec language once we agree on solutions.

@hober added the needs editor label (Proposals cannot become work items without an editor) Feb 26, 2020
@annevk

annevk commented Mar 4, 2020

Part of this seems like it should be part of storage access, right? That is, both are talking about changes to keying of site storage (loosely defined at https://storage.spec.whatwg.org/#infrastructure and probably in need of some changes, in particular for cookies). Now storage access might also allow for changes to the key, depending, but it seems best if that's sorted out together.

Thus far at Mozilla we've been thinking about separate "cache" and "storage" keys so the HTTP cache can always use multiple keys for instance and be static, whereas the "storage" key might be able to change depending on the storage access API. Unfortunately it doesn't seem to be quite that clean as while service workers and the cache API are logically "storage", putting them there is not necessarily great.

Also, some infrastructure work here is being done by @shivanigithub. See whatwg/html#4966 and whatwg/fetch#943 for an early adopter (which I should get to reviewing and I'd also welcome review on those from others here). Having access to the top-level origin will make defining these keys easier.

@annevk

annevk commented Mar 4, 2020

(I could see them being isolated if "storage access" only applies to a UA-determined set of sites and is about blocked storage -> storage and this tackles partitioning the set of sites that remain and "storage access" does nothing there, but it seems too early to make that kind of decision, right?)

@othermaciej
Author

They are related, but I think neither is a part of the other.

Storage partitioning and cache partitioning can clearly be implemented and/or specified without Storage Access API. Safari shipped a form of partitioning many years before we shipped Storage Access API.

Storage Access API's model is easier to explain if the specs require partitioning, and Storage Access API merely explains how it can be selectively undone. This can be done as two separate layers. But it is also reasonable to specify it as an API that undoes UA-specified partitioning or blocking, at least as an interim step.

Also worth noting, specifying partitioning is not sufficient as a mechanism to underpin current implementations of Storage Access API. The most essential thing SAA undoes in many browsers is third-party cookie blocking. While Safari briefly had selective third-party cookie partitioning, currently the measure ITP takes against cookies is fully blocking them.

@othermaciej
Author

Also, I think it's reasonable to have logically multiple partitioning keys, of which Storage Access API undoes only a subset. Instead of storage/cache, the separation should probably be explicit/implicit. Cookies, LocalStorage and IndexedDB are all explicit storage APIs. HTTP Cache is not, but there are other things in that category that are worth partitioning. For example, if network-level state like TLS session state or Alt-Svc are partitioned, that should probably not be undone by Storage Access API.

(And thanks for the pointers to Google infrastructure/proposals in this area.)
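
One way to picture the explicit/implicit split suggested in this comment is as two separately computed keys per context, where a storage access grant changes only the explicit one. The sketch below is illustrative only: `Context`, `has_storage_access`, `explicit_key`, and `implicit_key` are hypothetical names, and it assumes explicit storage is partitioned in the first place (which, for cookies, is not what browsers did at the time).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    top_level_site: str
    site: str
    has_storage_access: bool = False  # hypothetical flag, set after a grant


def explicit_key(ctx):
    """Key for explicit storage APIs (cookies, LocalStorage, IndexedDB)."""
    if ctx.has_storage_access:
        # Partitioning is undone: use the same key the site has as a first party.
        return (ctx.site, ctx.site)
    return (ctx.top_level_site, ctx.site)


def implicit_key(ctx):
    """Key for implicit state (HTTP cache, TLS session state, Alt-Svc)."""
    # Assumption from the comment above: a grant does not undo this partitioning.
    return (ctx.top_level_site, ctx.site)
```

With this shape, an embedded document that has been granted storage access sees its first-party explicit storage, while its cache and network-level state stay keyed to the embedding site.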

@jkarlin

jkarlin commented Mar 4, 2020

As noted above, I think we're dealing with two states. The default state is that the 3p storage is somehow sharded by key (either no key, an ephemeral key, a 1p key, or a double key), where no key means the browser throws. The other state is one where the document is promoted via Storage Access API, and the key changes to the 1p key if it isn't already.

Perhaps this issue should focus on the first state, but understanding how it transitions to the second is useful.
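
The two states described in this comment can be written down as a small key-selection function. This is only a sketch of the comment's framing; `DefaultMode`, `third_party_storage_key`, and the `promoted` parameter are hypothetical names, and the random token merely stands in for whatever an engine would use as an ephemeral key.

```python
import secrets
from enum import Enum


class DefaultMode(Enum):
    NO_KEY = "no key"              # the browser throws
    EPHEMERAL_KEY = "ephemeral"    # unique key that is never reused
    FIRST_PARTY_KEY = "1p key"     # keyed only by the embedded site
    DOUBLE_KEY = "double key"      # keyed by (top-level site, embedded site)


def third_party_storage_key(mode, top_level_site, site, promoted=False):
    """Return the storage key for a third-party document.

    promoted=True models the second state: the document was promoted via
    Storage Access API, so the key becomes the 1p key if it is not already.
    """
    if promoted or mode is DefaultMode.FIRST_PARTY_KEY:
        return (site,)
    if mode is DefaultMode.NO_KEY:
        raise PermissionError("third-party storage access is blocked")
    if mode is DefaultMode.EPHEMERAL_KEY:
        return (site, secrets.token_hex(16))  # fresh key on every call
    return (top_level_site, site)
```

Writing it this way makes the open question visible: the function says nothing about what happens to data stored under the default key once the document is promoted.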

@othermaciej
Author

Cookies are also a special case. I don't believe they are partitioned/sharded/double-keyed or whatever in any current browser, and I don't think any browser plans to do it in the future. The present is that many browsers block third-party cookies for some sites. I think a likely future is that all third-party cookies are totally blocked by default (both getting and setting).

@jkarlin

jkarlin commented Mar 4, 2020

Would it make sense to partition third-party cookies? If it makes sense to do so for localStorage I don't see why it wouldn't for cookies.

@annevk

annevk commented Mar 5, 2020

> and Storage Access API merely explains how it can be selectively undone

So that would mean there's a transition from "partitioned" to "first-party" storage, right? And unless the "partitioned" storage was ephemeral, the "first-party" would potentially get a lot of additional information?

> Instead of storage/cache, the separation should probably be explicit/implicit.

One way I was thinking about it is that caches can only be appended to, whereas storage can be manipulated. (I.e., you can delete a specific cookie or Indexed DB store, but you cannot delete a specific connection pool or session identifier.)

> Cookies are also a special case.

I agree with regard to the status quo, but I wonder what principle backs that, as data can flow between cookies and other storage APIs. HttpOnly is a difference, but I cannot think of a suitable angle for that mattering here.

@hober
Member

hober commented Mar 10, 2020

Would anyone like to talk about this on this week's telcon? If so, please add the agenda+ label. (If you can't, let me know and I can add it.)

@hober changed the title Proposed deliverable: full storage partitioning / double-keying Mar 10, 2020
@annevk added the agenda+ label (Request to add this issue to the agenda of our next telcon or F2F) Mar 11, 2020
@npdoty

npdoty commented Mar 12, 2020

For the security threat of private information disclosure from timing attacks by a first party based on the loading of specific resources from another site, there seems to be an analogy to Access-Control-Allow-Origin and related headers, where a server can indicate properties about the data that it's returning and whether it should be accessible by other origins. Should servers also be able to indicate whether caching could be sensitive (like search results pages)? Or is the assumption that all such timing attacks will be sensitive to some degree and we want to mitigate all of them rather than have servers indicate scope?

@annevk

annevk commented Mar 12, 2020

w3ctag/design-reviews#424 has some additional context, but in general, we cannot trust servers to make decisions in the best interests of end users.

@npdoty

npdoty commented Mar 12, 2020

Thanks for that context @annevk and that makes sense; I mostly just wanted to make sure the concept had been considered. And it might be especially tricky for some of these cases for the server to make those decisions.

If browsers are concerned about implementing completely separate caches for all requests, I do wonder if some of these server-provided headers would be useful places to start, especially to address known attacks on discovering authenticated content. I don't know if browser developers are being held back by interop, efficiency (bandwidth or speed) or just implementation, but it might be that Cache-Control: public and Access-Control-Allow-Origin: * resources don't need the cache separation as urgently as other resources.
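
To make the suggestion concrete, a cache could key a response by URL alone only when the server has marked it both `Cache-Control: public` and `Access-Control-Allow-Origin: *`, and partition everything else. The sketch below is a hypothetical illustration of that idea, not a description of any browser; `is_declared_public` and `http_cache_key` are made-up names, and the earlier caveat about trusting servers still applies.

```python
def is_declared_public(headers):
    """True if the response carries both opt-in signals named in the comment."""
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = {
        d.strip().lower()
        for d in normalized.get("cache-control", "").split(",")
    }
    allow_origin = normalized.get("access-control-allow-origin", "").strip()
    return "public" in directives and allow_origin == "*"


def http_cache_key(top_level_site, url, headers):
    """Choose the key under which a response would be stored.

    Responses the server declared public are shared across top-level sites;
    all other responses are partitioned by the top-level site.
    """
    if is_declared_public(headers):
        return (url,)
    return (top_level_site, url)
```

Note that the key here is chosen when the response is stored, after its headers are known; a real cache would also need a lookup strategy for requests made before any headers are available.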

@hober removed the agenda+ label (Request to add this issue to the agenda of our next telcon or F2F) Mar 24, 2020
@hober added the interest: webkit label (Implementer interest from WebKit, e.g. Apple/Safari, Igalia/GNOME Web) Apr 2, 2020
@hober added the interest: gecko label (Implementer interest from Gecko, e.g. Mozilla/Firefox, Cliqz) Apr 2, 2020
@hober removed the needs editor label (Proposals cannot become work items without an editor) Apr 21, 2020
@hober
Member

hober commented Apr 21, 2020

This has been adopted as a Work Item, with @annevk as editor, as of privacycg/privacycg.github.io@fe69d9a.

@annevk

annevk commented Apr 22, 2020

I put up an introduction proposal at privacycg/storage-partitioning#1.

8 participants