Design and implement a fancier heuristicblock data structure #266
Comments
@jcb82 expressed interest in designing this. |
My understanding is that we want a client-side data structure that stores counts of (first party, third party) pairs, which we'll call (x, y), such that once the same third party y has been detected on three or more different first parties, y is marked as a tracker domain, but without storing a fine-grained history. Is there any more information on the threat model and exactly how much leakage we can tolerate?
I would propose a two-level Bloom filter. Keep N Bloom filters around. When (x, y) is observed, compute H(x||i) for 0 < i < k. For each index i, update the Bloom filter selected by H(x||i) with the value y. Once j of the N Bloom filters (for j = 3k - e) have been updated with y, y is considered a tracking domain. We can tune N, k, and j so that tracking domains are marked with high probability.
There might be a simpler approach; I'm not totally sure what the design requirements are here. |
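A minimal Python sketch of the two-level Bloom filter proposed above, for illustration only. The class names, the filter sizes, the use of SHA-256 as H, and the default values of N, k, and e are all assumptions, not part of the proposal; the sketch also uses k indexes (0 through k - 1) per first party so that three first parties can reach up to 3k filters.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class TwoLevelTrackerFilter:
    """Hypothetical two-level structure: N Bloom filters selected by H(x||i)."""

    def __init__(self, n_filters=64, k=4, e=2):
        self.filters = [BloomFilter() for _ in range(n_filters)]
        self.k = k
        # j = 3k - e: filters that must contain y before it counts as a tracker.
        self.threshold = 3 * k - e

    def _filter_indexes(self, first_party):
        # First level: H(x||i) picks which of the N filters to update.
        for i in range(self.k):
            digest = hashlib.sha256(f"{first_party}|{i}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.filters)

    def observe(self, first_party, third_party):
        # Second level: record y in each selected filter; x itself is never stored.
        for idx in self._filter_indexes(first_party):
            self.filters[idx].add(third_party)

    def is_tracker(self, third_party):
        hits = sum(third_party in f for f in self.filters)
        return hits >= self.threshold
```

With these assumed defaults, two first parties can touch at most 2k = 8 filters, which is below the threshold of 10, so a third party is never flagged after only two sites. Three first parties reach the threshold only if few of their 3k filter indexes collide, which is the trade-off that tuning N, k, and e would have to settle.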
This was discussed in the bi-weekly developers meeting on 9/27. Threat model: a local attacker would be able to view a subset of cleared history (only visited domains, not URLs or times). Other option: respect history-clearing events, and remove domains from the snitch map when they are removed from history. |
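The "other option" above could look roughly like the following Python sketch. It assumes the snitch map is a plain mapping from each third-party domain to the set of first-party domains it was seen on; the function name and that layout are hypothetical, and how the list of removed domains would be obtained from the browser is not covered here.

```python
def forget_first_parties(snitch_map, removed_domains):
    """Drop cleared first-party domains from every snitch map entry.

    snitch_map: dict mapping third-party domain -> set of first-party domains
    (an assumed layout). Entries left with no first parties are deleted.
    """
    removed = set(removed_domains)
    for third_party in list(snitch_map):
        remaining = snitch_map[third_party] - removed
        if remaining:
            snitch_map[third_party] = remaining
        else:
            del snitch_map[third_party]
    return snitch_map
```

Whether a third party that was already marked as a tracker should stay marked after its supporting first parties are pruned is a separate design question that this sketch does not answer.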
We definitely should do this. |
In particular, my instinct is that we should do the following two things:
|
I don't think this part is optional; I think this is a hard requirement if you are going to maintain the full domain name. |
This is a great idea: it gives deniability, you can use a very fast HMAC digest algorithm for it, and the only drawback I see is that once in x times you'll undercount unique domains. This is a winner, unless there's some technical drawback I'm not thinking of. |
I agree that leaving cleartext history samples around is unacceptable if the user clears their history. But I've been wondering whether 1-hex-digit records are acceptable in that case. There are some corner cases where a small amount of information about browsing history may be genuinely inferable from those records. For instance, if a third party domain is on a grand total of 5 first party domains across the Web, then |
The corner cases still leave you with a deniability factor that the plaintext domains don't. I think the single- (or double-) character hash here is a clear winner. |
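To make the truncated-hash idea concrete, here is a small Python sketch. It assumes each third party keeps only a set of short hex fragments of HMAC(key, first party), with a random key generated locally; the class name, the choice of HMAC-SHA-256, and the threshold of three are illustrative assumptions rather than anything settled in this thread.

```python
import hashlib
import hmac
import secrets


class FragmentCounter:
    """Counts distinct first parties per third party via truncated HMACs."""

    def __init__(self, key=None, hex_digits=1, threshold=3):
        # Assumed: a per-install secret key that never leaves the browser.
        self.key = key if key is not None else secrets.token_bytes(32)
        self.hex_digits = hex_digits
        self.threshold = threshold
        self.seen = {}  # third party -> set of hex fragments

    def _fragment(self, first_party):
        digest = hmac.new(self.key, first_party.encode(), hashlib.sha256)
        # Keep only the first few hex digits; many domains share each fragment.
        return digest.hexdigest()[: self.hex_digits]

    def observe(self, first_party, third_party):
        self.seen.setdefault(third_party, set()).add(self._fragment(first_party))

    def is_tracker(self, third_party):
        return len(self.seen.get(third_party, ())) >= self.threshold
```

With one hex digit there are only 16 possible fragments, so two different first parties land on the same fragment about 1 time in 16 and are counted once. That is the undercounting mentioned above, and it is also what provides the deniability: a stored fragment is consistent with a large number of possible first-party domains.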
I don't see a way to be notified of history-clearing events at this time without asking for the "history" permission, which will trigger a new permission warning, which seems unacceptable for Privacy Badger at this point (will lose a significant percentage of users in Chrome, and have a similar percentage stop upgrading in Firefox). |
Currently the httpRequestOriginFrequency data structure needs to store fragments of browsing history locally within the browser in order to detect which third-party domains are tracking. We should migrate to a fancier, higher-privacy data structure that prunes at the right time, uses HMAC fragments instead of origin names, and takes up less space.