Pre-train Badger on popular websites #1891

Closed
bcyphers opened this issue Feb 22, 2018 · 2 comments · Fixed by #1947
Labels
enhancement heuristic Badger's core learning-what-to-block functionality important

Comments

@bcyphers
Contributor

There have been several suggestions that PB ship with a "blacklist" of sites that are known trackers (#1333). The team has avoided this in order to stay out of the business of deciding who is and isn't a tracker -- it's fairer to let the Badger learn on its own. Still, it would be great to have PB start blocking common trackers right out of the box.

One way to achieve that without breaking our policy would be to have a "training set" of websites that PB visits before it ships. We could pick a list of, say, 1000 web pages from the Alexa or Majestic top million, visit each one with Privacy Badger, and let users opt in to using the resulting pre-trained list of trackers. The training set would be public, auditable, and editable by pull requests. Hopefully, this would cut down on the "why isn't PB working?" questions from new users.
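
To make the idea concrete, here is a rough sketch of how a pre-trained list could be derived from the crawl results. It is not Privacy Badger's actual code: the function name, the input format (pairs of visited site and third-party domain observed tracking on it), and the threshold of three distinct sites are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumption for this sketch: a third-party domain counts as a tracker once it
# has been observed tracking on this many distinct first-party sites.
TRACKING_THRESHOLD = 3


def build_pretrained_list(observations, threshold=TRACKING_THRESHOLD):
    """Return a sorted list of domains seen tracking on >= `threshold` sites.

    `observations` is an iterable of (first_party_site, third_party_domain)
    pairs, one per tracking event recorded during the crawl (hypothetical
    format, not an existing export).
    """
    sites_by_tracker = defaultdict(set)
    for site, tracker in observations:
        # Ignore first-party requests; only third parties are of interest.
        if site != tracker:
            sites_by_tracker[tracker].add(site)
    return sorted(
        tracker
        for tracker, sites in sites_by_tracker.items()
        if len(sites) >= threshold
    )
```

Because the output depends only on the public training set and the threshold, anyone could re-run the crawl and audit the resulting list.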

@ghostwords ghostwords added heuristic Badger's core learning-what-to-block functionality important labels Feb 22, 2018
@ghostwords
Member

ghostwords commented Feb 22, 2018

Related to #1019, #1299, #1374.

@jawz101
Contributor

jawz101 commented Mar 1, 2018

It should also be completely wiped and re-trained every year or so, to pick up new techniques and domains and to drop ones that no longer exist.

https://www.quantcast.com/top-sites/

3 participants