Design and implement a fancier heuristicblock data structure #266

Open
pde opened this issue Aug 18, 2014 · 10 comments
Labels
enhancement heuristic Badger's core learning-what-to-block functionality low priority privacy General privacy issues; stuff that isn't about Privacy Badger's heuristic

Comments

@pde
Contributor

pde commented Aug 18, 2014

Currently the httpRequestOriginFrequency data structure needs to store fragments of browsing history locally within the browser, in order to detect which 3rd party domains are tracking. We should migrate to a fancier, higher-privacy data structure that prunes at the right time, uses HMAC fragments instead of origin names, and takes up less space.

@pde
Contributor Author

pde commented Aug 18, 2014

@jcb82 expressed interest in designing this.

@jcb82

jcb82 commented Aug 23, 2014

My understanding of what we want is a client-side data structure that stores counts of pairs (first party, third party), which we'll call (x, y), such that when the same third party y has been detected for three or more different first parties, y is marked as a tracker domain, except that we don't want a fine-grained history to be stored. Is there any more information on the threat model, and exactly how much leakage can we tolerate?

I would propose a two-level Bloom filter: keep N Bloom filters around. When (x, y) is observed, compute H(x || i) for 0 < i < k. For each index i, update Bloom filter i with value y. Once j of the N Bloom filters (for j = 3k - e) have been updated with y, y is considered a tracking domain. We can tune N, k, and j such that we have a high probability of marking tracking domains.

There might be a simpler approach, I'm not totally sure what the design requirements are here.

@cooperq
Contributor

cooperq commented Sep 28, 2016

This was discussed in the bi-weekly developers meeting on 9/27.

Threat model: a local attacker would be able to view a subset of cleared history (only visited domains, not URLs or times).
This solution does not actually help against that, since the attacker could still tell whether the user has visited site n by computing H(n) and searching for it in Privacy Badger's database.
What it does stop: an attacker who can view Privacy Badger's data but doesn't know how to hash strings.

Other option: respect history-clearing events, and remove domains from the snitch map when they are removed from history.
Problem: this will break Privacy Badger if the user consistently clears their entire history.

@pde
Contributor Author

pde commented Dec 15, 2016

We definitely should do this.

@pde
Contributor Author

pde commented Dec 15, 2016

In particular, my instinct is that we should do the following two things:

  • When tracker domains get redlisted, remove the three first party entries that caused them to be redlisted.
  • Maybe replace the first party domain names with a 1-2 hex digit HMAC slice. That's a simple special case of a bloom filter; it will make the algorithm a bit slower to block some trackers, but not terribly so.
@pjlsergeant

pjlsergeant commented Dec 15, 2016

respect history clearing events, and remove domains from snitch map when they are removed from history

I don't think this part is optional; I think this is a hard requirement if you are going to maintain the full domain name.

@pjlsergeant

pjlsergeant commented Dec 15, 2016

Maybe replace the first party domain names with a 1-2 hex digit HMAC slice. That's a simple special case of a bloom filter; it will make the algorithm a bit slower to block some trackers, but not terribly so.

This is a great idea; it gives deniability, you can use a very fast HMAC digest algorithm for it, and the only drawback I see is that once in x times you'll undercount unique domains. This is a winner, unless there's some technical drawback I'm not thinking of.
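For the 1-hex-digit case the undercount rate is easy to estimate, assuming the HMAC slices are uniform over 16 values: the chance that three distinct first parties yield fewer than three distinct slices is 1 - (15/16)(14/16), about 18%, so roughly one tracker in six would need a fourth first party before being flagged:

```javascript
// Chance that 3 distinct first parties map to fewer than 3 distinct
// 1-hex-digit slices (16 buckets), assuming uniform HMAC output.
const buckets = 16;
const pAllDistinct = ((buckets - 1) / buckets) * ((buckets - 2) / buckets);
const pUndercount = 1 - pAllDistinct;
console.log(pUndercount.toFixed(4)); // → 0.1797
```

With 2 hex digits (256 buckets) the same calculation gives about 1.2%, which is why the slice width is a tunable privacy/latency trade-off.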

@pde
Contributor Author

pde commented Dec 15, 2016

I don't think this part is optional; I think this is a hard requirement if you are going to maintain the full domain name.

I agree that leaving cleartext history samples around is unacceptable if the user clears their history. But I've been wondering if 1-hex-digit records are acceptable in that case.

There are some corner cases where a small amount of information about browsing history may be genuinely inferable from those records. For instance, if a third party domain is on a grand total of 5 first party domains across the Web, then thirdparty-example.com : ["5", "A", "C"] probably allows an attacker to infer three visited first parties from the five possibilities. But such cases would be quite rare, and it isn't clear that these records are much worse than having thirdparty-example.com in our blocklist, which is more fundamentally unavoidable.

@pjlsergeant

The corner cases still leave you with a deniability factor that the plaintext domains don't provide. I think the single- (or double-)character hash here is a clear winner.

@ghostwords
Member

I don't see a way to be notified of history-clearing events at this time without asking for the "history" permission, which will trigger a new permission warning, which seems unacceptable for Privacy Badger at this point (will lose a significant percentage of users in Chrome, and have a similar percentage stop upgrading in Firefox).
