
Privacy Badger blocks CDN, although no cookies etc #2003

Open
kajmagnus opened this issue May 5, 2018 · 8 comments
Labels
broken site DNT policy EFF's Do Not Track policy: www.eff.org/dnt-policy question Further information is requested

Comments

@kajmagnus

kajmagnus commented May 5, 2018

I'm developing an embedded commenting system, like Disqus but open source, no ads, no tracking. There's SaaS hosting and a CDN.

Privacy Badger blocks the CDN, not immediately, but later, after I've visited a few (three?) different blogs with the comments embedded. The CDN sets no cookies. The scripts from the CDN read from, but haven't written to, localStorage (they could, though). No canvas fingerprinting. Still, the CDN gets blocked.

Does the CDN need to be added to the yellowlist? (i.e.: https://github.com/EFForg/privacybadger/blob/master/src/data/yellowlist.txt )
Some similar software is on the yellowlist, e.g. cdn.discourse.org (Discourse) and disqus.com (Disqus).

(Here's one place where the embedded comments are used, in case you'd like to better understand what it is: https://www.kajmagnus.blog/new-embedded-comments. Scroll down to see the embedded comments. It uses a different, older and deprecated CDN CNAME, though, so it won't get blocked.)

Tech details:

The blog post HTML page loads an iframe for comments and another iframe with a text editor. Both iframes have the origin comments-for-(blog-addr).talkyard.net, and these two iframes load scripts from cdn.talkyard.net. Here's how it looks for this Jekyll demo blog post:

// The blog post HTML page loads:
installation-instructions.html  from jekyll-demo.talkyard.io  (normally, .net not .io)

// It loads a tiny script that creates the iframes:
talkyard-comments.min.js  from cdn.talkyard.net, initiator: installation-instructions.html

// The iframes load:
embedded-comments?discussionId=demo-and-inst-instr…yard.io/2018/01/09/installation-instructions.html  from: comments-for-jekyll-comments-demo.talkyard.io,  initiator: talkyard-comments.min.js
embedded-editor?discussionId=demo-and-inst-instrs&…yard.io/2018/01/09/installation-instructions.html  from: comments-for-jekyll-comments-demo.talkyard.io,  initiator: talkyard-comments.min.js

// Inside the iframes, more scripts are loaded:
polyfill.min.js          from: cdn.polyfill.io,   initiator: embedded-editor?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/2018…
i18n.min.js              from: cdn.talkyard.net,  initiator: embedded-editor?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/2018…
slim-bundle.min.js       from: cdn.talkyard.net,  initiator: embedded-editor?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/2018…
polyfill.min.js          from: cdn.polyfill.io,   initiator: embedded-comments?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/20…
i18n.min.js              from: cdn.talkyard.net,  initiator: embedded-comments?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/20…
slim-bundle.min.js       from: cdn.talkyard.net,  initiator: embedded-comments?discussionId=demo-and-inst-instrs&embeddingUrl=https://jekyll-demo.talkyard.io/20…

// (cdn.talkyard.net is a DNS CNAME for Keycdn's CDN, b.t.w.)
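To make the loader step above concrete, here is a minimal sketch of the two iframe URLs a loader script could build. It is hypothetical: `buildIframeUrls` is an illustrative name, and only the last path segments (`embedded-comments`, `embedded-editor`) and the `discussionId` / `embeddingUrl` query parameters come from the request log; the path prefix and the use of https are assumptions, not Talkyard's actual loader code.

```javascript
// Hypothetical sketch of the URLs a loader such as talkyard-comments.min.js
// could build. Only the last path segment and the query parameter names are
// taken from the request log; everything else is assumed.
function buildIframeUrls(commentsOrigin, discussionId, embeddingUrl) {
  const make = (path) => {
    const url = new URL(path, commentsOrigin);
    // URLSearchParams percent-encodes the embedding URL for us.
    url.searchParams.set("discussionId", discussionId);
    url.searchParams.set("embeddingUrl", embeddingUrl);
    return url.toString();
  };
  return { comments: make("/embedded-comments"), editor: make("/embedded-editor") };
}

// Demo values taken from the log above.
const urls = buildIframeUrls(
  "https://comments-for-jekyll-comments-demo.talkyard.io",
  "demo-and-inst-instrs",
  "https://jekyll-demo.talkyard.io/2018/01/09/installation-instructions.html"
);
console.log(urls.comments);
console.log(urls.editor);
```

In the browser, the loader would then use each URL as the `src` of an `<iframe>`; the scripts from cdn.talkyard.net are loaded from inside those iframes.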

Problem

So, PB thinks the CDN is a tracker because 1) the CDN is present on many different websites (blogs), plus 2) some other reason(s) that I don't understand? (Or is 1) enough?)

The embedded comments HTML pages, on the comments-for-(blog-addr).talkyard.net domains, currently always set cookies: an XSRF security cookie, a browser ID cookie that assists with rate limiting and blocking bad people, and a session cookie if one logs in. But shouldn't this be fine? Each embedded comments HTML page (comments-for-...) is reachable only via one website (i.e. the blog where the comments get embedded). So the [seen at 3 different places, so it's a tracker] rule cannot be triggered? ...

... Nevertheless, the comments-for... requests also get blocked by Privacy Badger. So both the CDN (although it sets no cookies, writes nothing to localStorage, and does no fingerprinting) and the embedded HTML pages (which do set cookies, but are each embedded on only a single blog) get blocked. I'm surprised that either of them gets blocked.

Going to chrome://settings/siteData in Chrome, there are lots of cookies when I search for talkyard.net, but they are all on subdomains like comments-for-...talkyard.net. (And there are no cookies on cdn.talkyard.net.)

What do you think? What are the next steps to fix this problem, or to continue troubleshooting? I ran an "action map" script; see my comment below.

(Thanks for reading all this b.t.w. :- ))

@kajmagnus
Author

Action map, after having installed Privacy Badger in Firefox and visited a few blogs with the embedded comments:

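// Paste into the console of Privacy Badger's background page.
// Prints the action_map and snitch_map entries for every domain containing STR.
(function () {
  const STR = "talkyard.net";

  // action_map: per-domain userAction / dnt / heuristicAction entries.
  console.log("**** ACTION_MAP for", STR);
  const actionMap = badger.storage.getBadgerStorageObject('action_map').getItemClones();
  Object.entries(actionMap).forEach(([domain, obj]) => {
    if (domain.includes(STR)) console.log(domain, JSON.stringify(obj, null, 2));
  });

  // snitch_map: base domain -> first-party sites where it was seen tracking.
  console.log("**** SNITCH_MAP for", STR);
  const snitchMap = badger.storage.getBadgerStorageObject('snitch_map').getItemClones();
  Object.entries(snitchMap).forEach(([domain, sites]) => {
    if (domain.includes(STR)) console.log(domain, JSON.stringify(sites, null, 2));
  });
}());

Output: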
**** ACTION_MAP for talkyard.net 
comments-for-blog-alexk-io.talkyard.net {
  "userAction": "user_cookieblock",
  "dnt": false,
  "heuristicAction": "allow",
  "nextUpdateTime": 1525529331461
} 
talkyard.net {
  "userAction": "",
  "dnt": false,
  "heuristicAction": "block",
  "nextUpdateTime": 0
} 
comments-for-horstmann-com.talkyard.net {
  "userAction": "",
  "dnt": false,
  "heuristicAction": "allow",
  "nextUpdateTime": 1525754335138
} 
comments-for-www-yellicode-com.talkyard.net {
  "userAction": "user_cookieblock",
  "dnt": false,
  "heuristicAction": "block",
  "nextUpdateTime": 1525608294557
} 
cdn.talkyard.net {
  "userAction": "",
  "dnt": false,
  "heuristicAction": "",
  "nextUpdateTime": 1525774069687
} 
**** SNITCH_MAP for talkyard.net 
talkyard.net [
  "alexk.io",
  "horstmann.com",
  "yellicode.com"
]
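To read the action map above, it helps to know how the entries combine. The sketch below is a simplified, hypothetical model, not Privacy Badger's code: it assumes a non-empty `userAction` on the exact domain wins, a posted DNT policy (`dnt: true`) allows, and otherwise a `"block"` heuristic on the domain or any parent domain blocks. That assumption is consistent with the output above and with the maintainer's note below that all talkyard.net requests get blocked; `effectiveAction` is an illustrative name.

```javascript
// Simplified, hypothetical model of how the action_map entries above combine.
// Assumption: user override on the exact domain > DNT policy > "block" on the
// domain or any parent domain > the domain's own heuristic (default "allow").
function effectiveAction(actionMap, domain) {
  const own = actionMap[domain];
  if (own && own.userAction) return own.userAction;
  if (own && own.dnt) return "dnt";
  // Walk up the labels, e.g. cdn.talkyard.net -> talkyard.net (stops before "net").
  const labels = domain.split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    const entry = actionMap[labels.slice(i).join(".")];
    if (entry && entry.heuristicAction === "block") return "block";
  }
  return (own && own.heuristicAction) || "allow";
}

// A subset of the action_map output above.
const actionMap = {
  "talkyard.net": { userAction: "", dnt: false, heuristicAction: "block" },
  "cdn.talkyard.net": { userAction: "", dnt: false, heuristicAction: "" },
  "comments-for-horstmann-com.talkyard.net": { userAction: "", dnt: false, heuristicAction: "allow" },
  "comments-for-blog-alexk-io.talkyard.net": { userAction: "user_cookieblock", dnt: false, heuristicAction: "allow" },
};
for (const domain of Object.keys(actionMap)) {
  console.log(domain, "->", effectiveAction(actionMap, domain));
}
```

Under this model, cdn.talkyard.net has no heuristic action of its own but still ends up blocked because its parent, talkyard.net, is marked `block`.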
@ghostwords ghostwords added DNT policy EFF's Do Not Track policy: www.eff.org/dnt-policy broken extension labels May 7, 2018
@ghostwords
Member

This is related to #963.

@ghostwords
Member

Privacy Badger sees the XSRF-TOKEN session cookie (there is also a far-future dwCoBrId cookie) on each Talkyard-powered site, coming from a "comments-for-" talkyard.net subdomain, and marks that as a strike against talkyard.net. Once Privacy Badger sees this happen on three distinct sites, all talkyard.net requests get blocked.
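The heuristic described above can be sketched as a snitch map keyed by base domain. This is a minimal sketch under stated assumptions: the threshold of three sites comes from the comment above; the base domain is approximated as the last two DNS labels (Privacy Badger uses the Public Suffix List, so this simplification breaks for suffixes like co.uk); `recordTracking` is a hypothetical name; and the first-party hostnames are inferred from the subdomain names in the action map.

```javascript
// Hypothetical sketch of the "seen tracking on 3 distinct sites => block" heuristic.
const TRACKING_THRESHOLD = 3;

// Simplification: treat the last two DNS labels as the base domain.
// (A real implementation would consult the Public Suffix List.)
function baseDomain(host) {
  return host.split(".").slice(-2).join(".");
}

// snitchMap: base domain -> Set of first-party sites where it set a tracking cookie.
function recordTracking(snitchMap, thirdPartyHost, firstPartySite) {
  const base = baseDomain(thirdPartyHost);
  if (!snitchMap.has(base)) snitchMap.set(base, new Set());
  snitchMap.get(base).add(baseDomain(firstPartySite));
  return snitchMap.get(base).size >= TRACKING_THRESHOLD ? "block" : "allow";
}

// Each blog embeds a *different* comments-for-... subdomain,
// yet every strike lands on talkyard.net.
const snitchMap = new Map();
const results = [
  ["comments-for-blog-alexk-io.talkyard.net", "blog.alexk.io"],
  ["comments-for-horstmann-com.talkyard.net", "horstmann.com"],
  ["comments-for-www-yellicode-com.talkyard.net", "www.yellicode.com"],
].map(([thirdParty, firstParty]) => recordTracking(snitchMap, thirdParty, firstParty));
console.log(results); // [ 'allow', 'allow', 'block' ]
```

The resulting set for talkyard.net matches the snitch_map output in the earlier comment.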

@ghostwords
Member

ghostwords commented May 7, 2018

If Talkyard is able and willing to abide by the EFF Do Not Track policy's requirements on all talkyard.net domains used by external sites, posting the policy on each talkyard.net domain will tell Privacy Badger to always allow loading of resources from that domain.
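Posting the policy means serving EFF's policy file at the well-known path `/.well-known/dnt-policy.txt` on every talkyard.net host that external sites load. Below is a minimal Node.js sketch, assuming a plain `http` server. The `route` and `createServer` helpers and the placeholder policy text are hypothetical. The real file should be EFF's text served unmodified, since, as far as I understand, Privacy Badger only recognizes known versions of the policy.

```javascript
const http = require("http");

// Placeholder: the exact, unmodified text of EFF's DNT policy would go here.
const DNT_POLICY_TEXT = "<EFF DNT policy text, served verbatim>";

// Routes one request path; kept separate from the server so it is easy to test.
function route(pathname) {
  if (pathname === "/.well-known/dnt-policy.txt") {
    return { status: 200, headers: { "Content-Type": "text/plain; charset=utf-8" }, body: DNT_POLICY_TEXT };
  }
  return { status: 404, headers: { "Content-Type": "text/plain" }, body: "Not found" };
}

// Wiring into a server (created but not started here).
function createServer() {
  return http.createServer((req, res) => {
    const { status, headers, body } = route(new URL(req.url, "http://localhost").pathname);
    res.writeHead(status, headers);
    res.end(body);
  });
}
```

Every comments-for-... subdomain and cdn.talkyard.net would need to answer this path, since the policy is checked per domain.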

@ghostwords
Member

Does this make sense? Let me know if you have any questions.

@kajmagnus
Author

kajmagnus commented May 10, 2018

Hi @ghostwords! Thanks for the reply and for looking into this :- )
(I wrote this reply on Monday but didn't post it, because I was going to proofread it first :- ))

1.

I'm curious about the rationale for blocking talkyard.net because the comments-for...talkyard.net subdomains set cookies. Since the cookies are per subdomain (no cookies on talkyard.net), and each subdomain is embedded on only one blog / website, I'm thinking the cookies in this case cannot be used to track people across the internet?
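The "per subdomain" point comes from cookie scoping: a cookie set without a `Domain` attribute is host-only and is sent back only to the exact host that set it. The sketch below is a simplified version of RFC 6265 domain matching. It assumes Talkyard sets its cookies without a `Domain` attribute, which fits "no cookies on talkyard.net". `cookieSentTo` is a hypothetical helper, and the `Domain=talkyard.net` cookie is only a contrast case.

```javascript
// Simplified RFC 6265 domain matching. Hypothetical helper for illustration.
function cookieSentTo(cookie, requestHost) {
  // Host-only cookie (no Domain attribute): exact host match only.
  if (cookie.hostOnly) return requestHost === cookie.domain;
  // Domain cookie: the domain itself and all of its subdomains.
  return requestHost === cookie.domain || requestHost.endsWith("." + cookie.domain);
}

// Set by comments-for-horstmann-com.talkyard.net without a Domain attribute.
const hostOnly = { name: "XSRF-TOKEN", domain: "comments-for-horstmann-com.talkyard.net", hostOnly: true };
// Contrast case: the same cookie set with Domain=talkyard.net.
const domainWide = { name: "XSRF-TOKEN", domain: "talkyard.net", hostOnly: false };

console.log(cookieSentTo(hostOnly, "comments-for-www-yellicode-com.talkyard.net"));   // false
console.log(cookieSentTo(domainWide, "comments-for-www-yellicode-com.talkyard.net")); // true
```

So a host-only cookie from one blog's comments subdomain is never sent to another blog's subdomain. The heuristic described in the maintainer's earlier comment counts strikes per base domain regardless.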

There's an X-Content-Security-Policy: frame-ancestors ... HTTP header that prevents the blog comments subdomain from being embedded anywhere other than on the relevant blog.
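For reference, the standard header name is `Content-Security-Policy`; `X-Content-Security-Policy` is an older prefixed variant. Below is a small sketch of building the header for one comments subdomain. `framingHeaders` is a hypothetical helper, and the blog URL is only an example.

```javascript
// Hypothetical helper: headers for a comments-for-... page that may only be
// framed by the blog it belongs to.
function framingHeaders(blogUrl) {
  const origin = new URL(blogUrl).origin; // normalize: keep scheme + host (+ port)
  return {
    // Standard header; X-Content-Security-Policy is the legacy prefixed name.
    "Content-Security-Policy": `frame-ancestors ${origin}`,
  };
}

const headers = framingHeaders("https://horstmann.com/some/blog/post");
console.log(headers["Content-Security-Policy"]); // frame-ancestors https://horstmann.com
```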

I'm wondering whether this particular embedded-comments case perhaps wasn't taken into account when the rules for what Privacy Badger blocks were designed?

2.

About the DNT policy: to me, it seems to conflict a bit with security. The DNT policy seems to require Talkyard to forget people quite quickly, and I'm thinking this could make certain DoS attacks and astroturfing easier. Some governments' internet troll armies create astroturfing accounts years in advance. To be able to investigate, a few years later, whether a bunch of accounts were created from roughly the same place, I think it'd be helpful to remember at least the first parts of the IP address. I'm not sure what the DNT policy says about that, e.g. remembering 111.222.333.0, i.e. zeroing the last octet. (What are your thoughts?)
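The truncation proposed above (keep the first three octets, zero the last) is simple to do before storing an address. Below is a minimal sketch for IPv4 only. `truncateIpv4` is a hypothetical name, and whether this is acceptable under the DNT policy is the open question here.

```javascript
// Keep the first three octets of an IPv4 address and zero the last one.
// Returns null for anything that is not a valid dotted-quad IPv4 address.
function truncateIpv4(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  if (parts.some((p) => !/^\d{1,3}$/.test(p))) return null;
  const nums = parts.map(Number);
  if (nums.some((n) => n > 255)) return null;
  return `${nums[0]}.${nums[1]}.${nums[2]}.0`;
}

console.log(truncateIpv4("203.0.113.57")); // 203.0.113.0
console.log(truncateIpv4("10.0.0.256"));   // null
```

IPv6 addresses would need a different rule (typically keeping a prefix such as the first 48 or 64 bits); the sketch simply rejects them.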

3.

Anyway, I should avoid setting the XSRF and browser ID cookies until they're actually needed (or maybe I can remove them, and include the XSRF token in the <html> tag instead).
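Below is a minimal sketch of the "XSRF token in the HTML instead of a cookie" idea. It is hypothetical throughout. The `data-xsrf-token` attribute name and the request header name are assumptions, not Talkyard's markup or API. The server still has to verify the token against something it remembers or can check, for example a value tied to the login session or an HMAC signature, and that part is left out here.

```javascript
const crypto = require("crypto");

// Hypothetical sketch: a per-page XSRF token embedded in the HTML, not a cookie.
function newXsrfToken() {
  return crypto.randomBytes(32).toString("hex"); // 32 random bytes -> 64 hex chars
}

function renderPage(token) {
  // Hex tokens contain no quotes or angle brackets, so no escaping is needed here.
  return `<html data-xsrf-token="${token}"><body><!-- comments UI --></body></html>`;
}

const token = newXsrfToken();
const html = renderPage(token);
// In the browser, the page script would read
//   document.documentElement.dataset.xsrfToken
// and send it back in a request header (name hypothetical, e.g. X-XSRF-TOKEN).
```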

But what about the session cookie — if someone signs up and posts comments at 3 different blogs, will P.B. then start believing that the comments are a tracker, and block them? Because of the session cookie?

(Actually, Safari on iOS also reportedly starts blocking the embedded comments. I suppose these cookies are the reason.)

4.

The CDN: maybe it'd be safer if I moved it to a completely different domain (what do you think?). Then, even if talkyard.net and all comments-for-...talkyard.net domains get blocked, the CDN will still work for those who have their comments-for-... sites on their own custom domain.

@kajmagnus
Author

kajmagnus commented May 24, 2018

@ghostwords What about transient cookies, a.k.a. session cookies? I.e. cookies with no max age, which disappear when the browser closes? Does P.B. treat such cookies as possible tracking, or is P.B. OK with them?
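On the wire, a transient cookie differs from a persistent one only in the `Set-Cookie` header: it carries neither `Max-Age` nor `Expires`. Below is a small sketch. `setCookieHeader` is a hypothetical helper and the cookie values are made up. The names XSRF-TOKEN and dwCoBrId are the ones mentioned in this thread, and the ten-year lifetime only illustrates a "far-future" cookie.

```javascript
// Hypothetical helper: build a Set-Cookie header value.
// Without maxAgeSeconds the cookie is transient and is dropped when the browser closes.
function setCookieHeader(name, value, maxAgeSeconds) {
  const attrs = [`${name}=${value}`, "Path=/", "Secure"];
  if (maxAgeSeconds !== undefined) attrs.push(`Max-Age=${maxAgeSeconds}`);
  return attrs.join("; ");
}

const transient = setCookieHeader("XSRF-TOKEN", "abc123");
const persistent = setCookieHeader("dwCoBrId", "xyz", 10 * 365 * 24 * 3600); // ~10 years
console.log(transient);  // XSRF-TOKEN=abc123; Path=/; Secure
console.log(persistent); // dwCoBrId=xyz; Path=/; Secure; Max-Age=315360000
```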

(I searched a bit for P.B. + transient cookies / session cookies and found this comment about them: #1539 (comment) (by you, b.t.w. :- )). It doesn't clarify how things work, though.)

@ghostwords
Member

ghostwords commented May 24, 2018

1: It is very possible we overlooked this use case. Privacy Badger treats three different cookies (one per site) set by three different subdomains of the same third-party domain the same way Privacy Badger treats a single cookie set by one domain.

2: @alanton Could you chime in regarding balancing our DNT policy's privacy demands with service provider security considerations?

3: We are going to look into taking cookie duration into consideration (#1545), but we don't at this time.

4: Yes! Using a distinct, cookie-less domain for content delivery is also good from security and performance perspectives.
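The two properties from answer 4 can be expressed as a quick check. Is the CDN on a different base domain from the comments iframes, and does it set no cookies? This is a hypothetical sketch. `cdnLooksSeparate` and `cdn.example-cdn.net` are illustrative, and the base domain is approximated as the last two DNS labels instead of using the Public Suffix List.

```javascript
// Simplification: treat the last two DNS labels as the base domain.
function baseDomain(host) {
  return host.split(".").slice(-2).join(".");
}

// responseHeaders: plain object with lower-case header names,
// as in Node's IncomingMessage.headers.
function cdnLooksSeparate(cdnHost, commentsHost, responseHeaders) {
  const differentBase = baseDomain(cdnHost) !== baseDomain(commentsHost);
  const cookieless = !("set-cookie" in responseHeaders);
  return differentBase && cookieless;
}

const comments = "comments-for-horstmann-com.talkyard.net";
console.log(cdnLooksSeparate("cdn.talkyard.net", comments, {}));    // false: same base domain
console.log(cdnLooksSeparate("cdn.example-cdn.net", comments, {})); // true
```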

Sorry for not replying earlier!

@ghostwords ghostwords added the question Further information is requested label Nov 6, 2020
2 participants