-
COOKIEGRAPH: Understanding and Detecting First-Party Tracking Cookies
Authors:
Shaoor Munir,
Sandra Siby,
Umar Iqbal,
Steven Englehardt,
Zubair Shafiq,
Carmela Troncoso
Abstract:
As third-party cookie blocking is becoming the norm in browsers, advertisers and trackers have started to use first-party cookies for tracking. We conduct a differential measurement study on 10K websites with third-party cookies allowed and blocked. This study reveals that first-party cookies are used to store and exfiltrate identifiers to known trackers even when third-party cookies are blocked.
Unlike third-party cookie blocking, outright first-party cookie blocking is not practical because it would cause major functionality breakage. We propose CookieGraph, a machine learning-based approach that can accurately and robustly detect first-party tracking cookies. CookieGraph detects first-party tracking cookies with 90.20% accuracy, outperforming the state-of-the-art CookieBlock approach by 17.75%. We show that CookieGraph is fully robust against cookie name manipulation, while CookieBlock's accuracy drops by 15.68%. Blocking all first-party cookies causes major breakage on 32% of the sites with SSO logins, and CookieBlock reduces this to 10%; we show that CookieGraph does not cause any major breakage on these sites.
Our deployment of CookieGraph shows that first-party tracking cookies are used on 93.43% of the 10K websites. We also find that first-party tracking cookies are set by fingerprinting scripts. The most prevalent first-party tracking cookies are set by major advertising entities such as Google, Facebook, and TikTok.
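The exfiltration finding above rests on checking whether a value stored in a first-party cookie later appears in a request to a known tracker. The sketch below illustrates only that check, not CookieGraph's machine learning classifier. The input shapes, the function name `exfiltrated_cookies`, the `min_len` cutoff, and the tracker domain list are all hypothetical; a real measurement would also need to handle hashed or re-encoded identifiers.

```python
from urllib.parse import urlsplit, parse_qsl, unquote


def exfiltrated_cookies(cookies, request_urls, tracker_domains, min_len=8):
    """Return {cookie_name: set_of_tracker_hosts} for first-party cookie
    values that appear verbatim in requests sent to tracker domains.

    cookies: dict of first-party cookie name -> value (hypothetical input)
    request_urls: URLs captured while third-party cookies are blocked
    tracker_domains: hypothetical stand-in for a tracker filter list
    """
    hits = {}
    for url in request_urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        # Keep only requests to a tracker domain or one of its subdomains.
        if not any(host == d or host.endswith("." + d) for d in tracker_domains):
            continue
        # parse_qsl already percent-decodes query values; decode the path too.
        fields = [v for _, v in parse_qsl(parts.query)] + [unquote(parts.path)]
        for name, value in cookies.items():
            # Short values such as "1" or "true" would match by coincidence.
            if len(value) < min_len:
                continue
            if any(value in f for f in fields):
                hits.setdefault(name, set()).add(host)
    return hits


if __name__ == "__main__":
    cookies = {"uid": "a1b2c3d4e5f6", "consent": "1"}
    urls = [
        "https://tracker.example/collect?id=a1b2c3d4e5f6",
        "https://cdn.site.example/app.js",
    ]
    print(exfiltrated_cookies(cookies, urls, {"tracker.example"}))
```

Running the differential setup would mean calling this on the crawl with third-party cookies blocked, so that any hit shows a first-party cookie doing the identifier-sharing work a third-party cookie would otherwise do.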
Submitted 27 November, 2023; v1 submitted 25 August, 2022;
originally announced August 2022.
-
WebGraph: Capturing Advertising and Tracking Information Flows for Robust Blocking
Authors:
Sandra Siby,
Umar Iqbal,
Steven Englehardt,
Zubair Shafiq,
Carmela Troncoso
Abstract:
Millions of web users directly depend on ad and tracker blocking tools to protect their privacy. However, existing ad and tracker blockers fall short because they rely on advertising and tracking content that adversaries can trivially modify. In this paper, we first demonstrate that state-of-the-art machine learning-based ad and tracker blockers, such as AdGraph, are susceptible to adversarial evasions deployed in the real world. Second, we introduce WebGraph, the first graph-based machine learning blocker that detects ads and trackers based on their actions rather than their content. By building features around the actions that are fundamental to advertising and tracking (storing an identifier in the browser, or sharing an identifier with another tracker), WebGraph performs nearly as well as prior approaches but is significantly more robust to adversarial evasions. In particular, we show that WebGraph achieves accuracy comparable to AdGraph while reducing an adversary's success rate from near-perfect under AdGraph to around 8% under WebGraph. Finally, we show that WebGraph remains robust against a more sophisticated adversary that uses evasion techniques beyond those currently deployed on the web.
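To make "features built around actions" concrete, here is a minimal sketch that counts, per script, how often it writes to storage, reads from storage, and sends a previously stored value to a different host. It is a simplified, non-graph stand-in for WebGraph's features, not the paper's implementation. The `Event` schema, the function name `action_features`, and the `min_len` cutoff are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Event:
    script: str      # URL of the script performing the action
    kind: str        # "storage_set", "storage_get", or "request" (hypothetical schema)
    key: str = ""    # storage key, for storage events
    value: str = ""  # stored value, or the full destination URL for "request"


def action_features(events, min_len=8):
    """Per-script counts of storage writes, storage reads, and identifier
    shares. None of these depend on a script's URL or source text."""
    store = {}  # storage key -> last value written by any script
    feats = {}
    for e in events:
        f = feats.setdefault(e.script, {"sets": 0, "gets": 0, "id_shares": 0})
        if e.kind == "storage_set":
            store[e.key] = e.value
            f["sets"] += 1
        elif e.kind == "storage_get":
            f["gets"] += 1
        elif e.kind == "request":
            script_host = urlsplit(e.script).hostname
            dest_host = urlsplit(e.value).hostname
            # A share: a stored, identifier-length value leaves for another host.
            carries_id = any(len(v) >= min_len and v in e.value for v in store.values())
            if dest_host != script_host and carries_id:
                f["id_shares"] += 1
    return feats


if __name__ == "__main__":
    events = [
        Event("https://t.example/s.js", "storage_set", "_id", "abc123xyz789"),
        Event("https://t.example/s.js", "request", value="https://collector.example/p?u=abc123xyz789"),
        Event("https://site.example/app.js", "storage_get", "_id"),
    ]
    for script, f in action_features(events).items():
        print(script, f)
```

The design point this illustrates is the robustness argument: renaming or re-hosting `s.js`, or obfuscating its code, leaves these counts unchanged, whereas URL- or content-based features would shift.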
Submitted 17 August, 2021; v1 submitted 23 July, 2021;
originally announced July 2021.
-
Fingerprinting the Fingerprinters: Learning to Detect Browser Fingerprinting Behaviors
Authors:
Umar Iqbal,
Steven Englehardt,
Zubair Shafiq
Abstract:
Browser fingerprinting is an invasive and opaque stateless tracking technique. Browser vendors, academics, and standards bodies have long struggled to provide meaningful protections against browser fingerprinting that are both accurate and do not degrade user experience. We propose FP-Inspector, a machine learning based syntactic-semantic approach to accurately detect browser fingerprinting. We show that FP-Inspector performs well, allowing us to detect 26% more fingerprinting scripts than the state-of-the-art. We show that an API-level fingerprinting countermeasure, built upon FP-Inspector, helps reduce website breakage by a factor of 2. We use FP-Inspector to perform a measurement study of browser fingerprinting on top-100K websites. We find that browser fingerprinting is now present on more than 10% of the top-100K websites and over a quarter of the top-10K websites. We also discover previously unreported uses of JavaScript APIs by fingerprinting scripts suggesting that they are looking to exploit APIs in new and unexpected ways.
Submitted 10 August, 2020;
originally announced August 2020.
-
Actions speak louder than words: Semi-supervised learning for browser fingerprinting detection
Authors:
Sarah Bird,
Vikas Mishra,
Steven Englehardt,
Rob Willoughby,
David Zeber,
Walter Rudametkin,
Martin Lopatka
Abstract:
As online tracking continues to grow, existing anti-tracking and fingerprinting detection techniques that require significant manual input must be augmented. Heuristic approaches to fingerprinting detection are precise but must be carefully curated. Supervised machine learning techniques proposed for detecting tracking require manually generated label-sets. Seeking to overcome these challenges, we present a semi-supervised machine learning approach for detecting fingerprinting scripts. Our approach is based on the core insight that fingerprinting scripts have similar patterns of API access when generating their fingerprints, even though their access patterns may not match exactly. Using this insight, we group scripts by their JavaScript (JS) execution traces and apply a semi-supervised approach to detect new fingerprinting scripts. We detail our methodology and demonstrate its ability to identify the majority of scripts (≥94.9%) identified by existing heuristic techniques. We also show that the approach expands beyond detecting known scripts by surfacing candidate scripts that are likely to include fingerprinting. Through an analysis of these candidate scripts we discovered fingerprinting scripts that were missed by heuristics and for which there are no heuristics. In particular, we identified over one hundred device-class fingerprinting scripts present on hundreds of domains. To the best of our knowledge, this is the first time device-class fingerprinting has been measured in the wild. These successes illustrate the power of a sparse vector representation and semi-supervised learning to complement and extend existing tracking detection techniques.
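The core idea above is that scripts whose API access patterns are similar, though not identical, are likely doing the same job. A minimal sketch of that idea: turn each execution trace into a sparse vector of API call counts, then flag unlabeled scripts whose cosine similarity to a known fingerprinting script exceeds a threshold. This nearest-neighbor rule is a swapped-in simplification of the authors' semi-supervised clustering, not their pipeline; the function names, the threshold, and the example traces are hypothetical.

```python
import math
from collections import Counter


def to_vector(trace):
    """Sparse vector: JS API name -> number of calls in the execution trace."""
    return Counter(trace)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def propagate_labels(labeled_fp, unlabeled, threshold=0.8):
    """Flag unlabeled scripts whose trace resembles any known fingerprinting
    trace; returns {script_name: best_similarity} for the candidates."""
    fp_vecs = [to_vector(t) for t in labeled_fp.values()]
    candidates = {}
    for name, trace in unlabeled.items():
        v = to_vector(trace)
        best = max((cosine(v, f) for f in fp_vecs), default=0.0)
        if best >= threshold:
            candidates[name] = round(best, 3)
    return candidates


if __name__ == "__main__":
    labeled = {"fp1.js": ["HTMLCanvasElement.toDataURL",
                          "CanvasRenderingContext2D.fillText",
                          "Navigator.userAgent"]}
    unlabeled = {"new.js": ["HTMLCanvasElement.toDataURL",
                            "CanvasRenderingContext2D.fillText",
                            "Navigator.platform"],
                 "ui.js": ["Document.querySelector", "Element.addEventListener"]}
    print(propagate_labels(labeled, unlabeled, threshold=0.6))
```

Here `new.js` shares two of three APIs with the labeled script (similarity 2/3), so it surfaces as a candidate for manual review, while `ui.js` shares none. The sparse representation matters because real traces touch only a handful of the thousands of available APIs.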
Submitted 9 March, 2020;
originally announced March 2020.
-
Networks of Innovation in 3D Printing
Authors:
Harris Kyriakou,
Steven Englehardt,
Jeffrey V. Nickerson
Abstract:
Innovation inside companies is difficult to see. But an emerging online community of inventors who publicly post 3D CAD drawings of their work provides a way to observe, and perhaps amplify, innovation. In this paper we analyze the network structure of Thingiverse, a website oriented toward 3D printing. This form of printing blurs the line between creating information and manufacturing objects: drawings can be sent to devices that build 3D objects out of many materials, including resin, ceramics, and metal. As an exploratory study, we analyzed the structure of Thingiverse links. Our results suggest that analysis of remix network structure may provide ways of tracing innovation processes and detecting both the emergence of new ideas and the combination of disparate ideas.
Submitted 3 November, 2013;
originally announced November 2013.