Prevent keystroke and mouse movement tracking (detect and learn to block session replay scripts like Hotjar) #715

Open
kesara opened this issue Dec 25, 2015 · 11 comments
Labels: enhancement; heuristic (Badger's core learning-what-to-block functionality); privacy (General privacy issues; stuff that isn't about Privacy Badger's heuristic)

Comments

@kesara

kesara commented Dec 25, 2015

Services like Hotjar should be blocked by default. At the moment, Privacy Badger does not do this.

With Hotjar it is possible to record what users do on a website: not just mouse movements but keystrokes as well.
So what users type on the site is recorded even if they never submit that information. These per-user recordings can be played back on Hotjar.

Hotjar doesn't expose information like IP addresses for those recordings, but if a user enters personal details on the site, they will be captured in those sessions. Hotjar claims that it masks credit card and password fields. (Hotjar might still collect that information and only mask it on the presentation side.)

I think making these per-user recordings is a violation of privacy, and Privacy Badger should be able to detect and block them.

References:

@cooperq
Contributor

cooperq commented Feb 5, 2016

One of the core principles of Privacy Badger is that we do not maintain a blacklist and we do not block anything by default. That said, Hotjar certainly seems to be a particularly egregious example of tracking. I would like Privacy Badger to be able to detect and block tracking of this sort. If someone were able to come up with a heuristic for detecting this, I would happily add it to Privacy Badger.
Do you have any examples of sites using this technology?

@kesara
Author

kesara commented Feb 6, 2016

I've noticed that http://www.avg.com/ uses HotJar.

@cooperq cooperq changed the title Services like Hotjar should be blocked by default. Feb 9, 2016
@cooperq
Contributor

cooperq commented Feb 9, 2016

Great! Now we just need to figure out how they are tracking and come up with a heuristic for it.

@ghostwords ghostwords added the heuristic Badger's core learning-what-to-block functionality label Mar 31, 2017
@ghostwords
Member

ghostwords commented Nov 21, 2017

Here is a recently-released privacy study that looks into session-replay scripts: No boundaries: Exfiltration of personal data by session-replay scripts

Edit: 2020 update/paper

Edit: 2022 paper

@ghostwords ghostwords added the privacy General privacy issues; stuff that isn't about Privacy Badger's heuristic label Nov 21, 2017
@chipironcin

chipironcin commented Nov 22, 2017

Hotjar's scripts have to bind their logging to the hooks/events the browser provides, right?
If so, the only way to prevent recording would be to remove any external listeners from mouse and keyboard events.
Am I wrong? Can someone shed more light?

This is a big deal. Some ad blockers already have these kinds of sites on their blacklists. User privacy should never have been violated in the first place.
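
To make the listener idea concrete, here is a minimal sketch of one way a script could observe which input events get listeners attached, by wrapping `addEventListener` before other scripts run. It is an illustration under stated assumptions, not Privacy Badger code: `instrumentListeners`, `ListenerTarget`, `INPUT_EVENTS` and `FakeTarget` are hypothetical names, and the fake target stands in for `document` so the sketch runs outside a browser. Working out which script or domain made each registration (for example from a stack trace) is not shown.

```typescript
// Hypothetical names throughout; this is a sketch, not Privacy Badger code.
type Listener = (...args: unknown[]) => void;

interface ListenerTarget {
  addEventListener(type: string, listener: Listener): void;
}

// Assumed set of event types a session replay script would care about.
const INPUT_EVENTS = new Set(["keydown", "keyup", "keypress", "mousemove", "input"]);

// Wraps target.addEventListener so that every registration for an input
// event is reported, then passed through unchanged to the original method.
function instrumentListeners(
  target: ListenerTarget,
  report: (eventType: string) => void
): void {
  const original = target.addEventListener.bind(target);
  target.addEventListener = (type: string, listener: Listener): void => {
    if (INPUT_EVENTS.has(type)) {
      report(type);
    }
    original(type, listener);
  };
}

// Stand-in for `document`, so the sketch can run without a DOM.
class FakeTarget implements ListenerTarget {
  registered: string[] = [];
  addEventListener(type: string, _listener: Listener): void {
    this.registered.push(type);
  }
}

const fakeDocument = new FakeTarget();
const seen: string[] = [];
instrumentListeners(fakeDocument, (eventType) => {
  seen.push(eventType);
});

fakeDocument.addEventListener("keydown", () => {});
fakeDocument.addEventListener("mousemove", () => {});
fakeDocument.addEventListener("load", () => {}); // not an input event, so not reported
```

Note that this only observes registrations; it does not remove listeners, and plenty of legitimate scripts listen to the same events.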

@bcyphers
Contributor

It looks like HotJar in particular uses third-party cookies, so it will be blocked if the user visits enough sites that use it. That doesn't mean we shouldn't try to implement heuristics for these techniques too, but it might be lower-priority unless we find another culprit that won't get blocked anyway.
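
The "visits enough sites" behavior can be sketched as a simple tally of the distinct first-party sites on which a third-party domain has been seen tracking. This is a simplified illustration, not the actual Privacy Badger implementation: `TrackerTally` and its methods are hypothetical names, the threshold of three sites is an assumption not stated in this thread, and the `.example` site names are placeholders.

```typescript
// Assumption: a domain gets blocked once it has been seen tracking on this
// many distinct first-party sites. The value is not taken from this thread.
const BLOCK_THRESHOLD = 3;

// Hypothetical helper; not part of Privacy Badger's code base.
class TrackerTally {
  private sitesByTracker = new Map<string, Set<string>>();

  // Record that `trackerDomain` used a third-party cookie on `firstPartySite`.
  recordTracking(trackerDomain: string, firstPartySite: string): void {
    let sites = this.sitesByTracker.get(trackerDomain);
    if (sites === undefined) {
      sites = new Set<string>();
      this.sitesByTracker.set(trackerDomain, sites);
    }
    sites.add(firstPartySite);
  }

  // True once the tracker has been seen on enough distinct sites.
  shouldBlock(trackerDomain: string): boolean {
    const sites = this.sitesByTracker.get(trackerDomain);
    return sites !== undefined && sites.size >= BLOCK_THRESHOLD;
  }
}

const tally = new TrackerTally();
tally.recordTracking("hotjar.com", "site-a.example");
tally.recordTracking("hotjar.com", "site-a.example"); // repeat visits do not count twice
tally.recordTracking("hotjar.com", "site-b.example");
const blockedAfterTwo = tally.shouldBlock("hotjar.com");
tally.recordTracking("hotjar.com", "site-c.example");
const blockedAfterThree = tally.shouldBlock("hotjar.com");
```

Under these assumptions, a tracker seen on only one or two sites stays unblocked, which is why a service with few deployments can slip through.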

@pabloab

pabloab commented Jan 3, 2019

This issue should probably be renamed to "Prevent fingerprinting using keystroke timing", as described in this article (from the FSF). There is already a Chrome extension that does this: Keyboard Privacy.

@ghostwords ghostwords changed the title Detect tracking like what hotjar does Jan 3, 2019
@bcyphers
Contributor

bcyphers commented Jun 24, 2020

Another culprit is Lucky Orange (https://luckyorange.com). Clickstream data is sent via websocket to wss://visitors.live. Examples: https://www.foxbrim.com/, https://www.percona.com/
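
As a rough illustration of spotting this kind of connection, the sketch below flags a websocket URL whose host does not belong to the page's site. It assumes the extension can see the handshake URL at all, and `isThirdPartyWebSocket` is a hypothetical helper. The suffix comparison is a deliberate simplification: a real implementation would compare registrable base domains using a public suffix list.

```typescript
// Hypothetical helper: true when `requestUrl` is a ws:// or wss:// URL whose
// host is neither `pageBaseDomain` nor one of its subdomains.
// Simplification: a plain suffix match stands in for a proper
// public-suffix-list comparison of base domains.
function isThirdPartyWebSocket(requestUrl: string, pageBaseDomain: string): boolean {
  const parsed = new URL(requestUrl);
  const isWebSocket = parsed.protocol === "ws:" || parsed.protocol === "wss:";
  const host = parsed.hostname;
  const sameSite = host === pageBaseDomain || host.endsWith("." + pageBaseDomain);
  return isWebSocket && !sameSite;
}

// Third-party websocket from a page on foxbrim.com.
const luckyOrangeSocket = isThirdPartyWebSocket("wss://visitors.live", "foxbrim.com");

// A websocket to the site's own subdomain (made-up URL) is first-party.
const ownSocket = isThirdPartyWebSocket("wss://ws.foxbrim.com/socket", "foxbrim.com");

// An ordinary HTTPS request (made-up path) is not a websocket at all.
const plainRequest = isThirdPartyWebSocket("https://visitors.live/script.js", "foxbrim.com");
```

A third-party websocket is not evidence of tracking on its own (chat widgets use them too), so a check like this could only be one input to a heuristic.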

@bcyphers
Contributor

bcyphers commented Jun 25, 2020

I have been looking at other similar services, and it looks like Lucky Orange is the exception in terms of not being blocked. For example, here are some other services:

Hotjar WTM DDG

Crazy Egg WTM DDG

Wingify (VWO) WTM DDG

Fullstory WTM DDG

  • 140 sites
  • uses HTTP requests with json payload to rs.fullstory.com
  • Blocked (3rd party cookies)
  • sites: lowes.com, officedepot.com, staples.com

Mouseflow WTM DDG

Quantum Metric WTM DDG

AB Tasty WTM DDG

Smartlook WTM

  • uses POST requests to smartlook.cloud subdomains
  • Blocked (not sure how, no cookies present)
  • example: https://paxful.com/

Lucky Orange WTM DDG

  • 38 sites
  • uses websockets to send data to wss://visitors.live
  • Not blocked (!)
  • Scripts hosted on 3rd party CDNs, so may be undercounted

Inspectlet WTM DDG

Contentsquare / Clicktale WTM DDG / WTM DDG

SessionCam WTM DDG

Most of them are already blocked because they use third-party cookies as well. None of the ones I could find use websockets besides Lucky Orange.

This list isn't exhaustive. There are other potential trackers out there, but these were the ones I could find and confirm are actually doing session replay.

Blocking Lucky Orange will be particularly difficult because webRequest doesn't give us insight into the data sent over a websocket (https://developer.chrome.com/extensions/webRequest). It looks like there are first-party cookies being shared over the websocket, but in order to see them, we'd need to instrument the actual WebSocket interface using a content script (like https://github.com/gorhill/chromium-websocket-wrapper/blob/master/chromium-websocket-wrapper.js). The other thing we could do, as #715 (comment) suggests, is instrument the onKeyDown and onMouseMove listeners and try to figure out which domains are listening to them. I have not thought this through very much, but I imagine it would lead to a lot of false positives and be fairly hard to maintain.
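
A minimal sketch of what the websocket instrumentation could check: wrap a socket's `send` and report when an outgoing message contains the value of a first-party cookie. Everything here is hypothetical (`parseCookies`, `leakedCookieNames`, `wrapSend`, `Sender`, the `visitor_id` cookie, the length cutoff), and `FakeSocket` stands in for a real `WebSocket` so the code runs outside a browser. It only handles string payloads and plain substring matches, so encoded or hashed values would be missed.

```typescript
// Parse a "name=value; name2=value2" cookie string into a map.
function parseCookies(cookieString: string): Map<string, string> {
  const cookies = new Map<string, string>();
  for (const part of cookieString.split(";")) {
    const separator = part.indexOf("=");
    if (separator === -1) continue;
    const cookieName = part.slice(0, separator).trim();
    const cookieValue = part.slice(separator + 1).trim();
    if (cookieName !== "") cookies.set(cookieName, cookieValue);
  }
  return cookies;
}

// Hypothetical cutoff so short values such as "1" or "en" do not match everywhere.
const MIN_VALUE_LENGTH = 8;

// Names of cookies whose values appear verbatim in the outgoing payload.
function leakedCookieNames(payload: string, cookieString: string): string[] {
  const leaked: string[] = [];
  parseCookies(cookieString).forEach((cookieValue, cookieName) => {
    if (cookieValue.length >= MIN_VALUE_LENGTH && payload.includes(cookieValue)) {
      leaked.push(cookieName);
    }
  });
  return leaked;
}

interface Sender {
  send(data: string): void;
}

// Wraps socket.send: reports leaked cookie names, then forwards the message.
function wrapSend(
  socket: Sender,
  cookieString: string,
  onLeak: (cookieNames: string[]) => void
): void {
  const originalSend = socket.send.bind(socket);
  socket.send = (data: string): void => {
    const leakedNames = leakedCookieNames(data, cookieString);
    if (leakedNames.length > 0) onLeak(leakedNames);
    originalSend(data);
  };
}

// Stand-in for a real WebSocket.
class FakeSocket implements Sender {
  sent: string[] = [];
  send(data: string): void {
    this.sent.push(data);
  }
}

const fakeSocket = new FakeSocket();
const leaks: string[][] = [];
// The cookie string and values are made up for illustration.
wrapSend(fakeSocket, "visitor_id=abcdef123456; lang=en", (cookieNames) => {
  leaks.push(cookieNames);
});
fakeSocket.send('{"uid":"abcdef123456","x":10,"y":20}'); // contains visitor_id's value
fakeSocket.send('{"lang":"en"}'); // "en" is below the length cutoff
```

In a browser, the wrapper would have to be installed in the page's own context before the tracker's script creates its socket, which is part of what makes this approach awkward to maintain.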

All this is to say: Privacy Badger already blocks most session replay scripts, and I don't see an easy way to catch Lucky Orange with heuristics right now.


More session replay candidates for review:

Yandex DDG DDG WTM

Flashtalking DDG WTM

Bombora DDG WTM

Ezoic DDG

Permutive DDG WTM

Qualtrics DDG WTM

Perfect Market DDG WTM

Tealium DDG WTM

Heap DDG WTM

Mixpanel DDG WTM

ForeSee DDG WTM

@genebean

FWIW, I found that on at least one site Fullstory was not blocked or even detected by Privacy Badger with Firefox 83 or 85.0b2 on macOS.

@ghostwords ghostwords changed the title Prevent keystroke and mouse movement tracking (Hotjar) Jan 21, 2023
8 participants