subscribe to arXiv mailings

P4: Towards private, personalized, and Peer-to-Peer learning

Authors: Mohammad Mahdi Maheri, Sandra Siby, Sina Abdollahi, Anastasia Borovykh, Hamed Haddadi

Abstract: Personalized learning is a proposed approach to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer) a method that ensures that each client receives a personalized model… ▽ More Personalized learning is a proposed approach to address the problem of data heterogeneity in collaborative machine learning. In a decentralized setting, the two main challenges of personalization are client clustering and data privacy. In this paper, we address these challenges by developing P4 (Personalized Private Peer-to-Peer) a method that ensures that each client receives a personalized model while maintaining differential privacy guarantee of each client's local dataset during and after the training. Our approach includes the design of a lightweight algorithm to identify similar clients and group them in a private, peer-to-peer (P2P) manner. Once grouped, we develop differentially-private knowledge distillation for clients to co-train with minimal impact on accuracy. We evaluate our proposed method on three benchmark datasets (FEMNIST or Federated EMNIST, CIFAR-10 and CIFAR-100) and two different neural network architectures (Linear and CNN-based networks) across a range of privacy parameters. The results demonstrate the potential of P4, as it outperforms the state-of-the-art of differential private P2P by up to 40 percent in terms of accuracy. We also show the practicality of P4 by implementing it on resource constrained devices, and validating that it has minimal overhead, e.g., about 7 seconds to run collaborative training between two clients. △ Less

Submitted 31 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.05196 [pdf, other]

SINBAD: Saliency-informed detection of breakage caused by ad blocking

Authors: Saiid El Hajj Chehade, Sandra Siby, Carmela Troncoso

Abstract: Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the f… ▽ More Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 19 pages, 13 figures, Appearing in IEEE S&P 2024

arXiv:2404.00190 [pdf, other]

GuaranTEE: Towards Attestable and Private ML with CCA

Authors: Sandra Siby, Sina Abdollahi, Mohammad Maheri, Marios Kogias, Hamed Haddadi

Abstract: Machine-learning (ML) models are increasingly being deployed on edge devices to provide a variety of services. However, their deployment is accompanied by challenges in model privacy and auditability. Model providers want to ensure that (i) their proprietary models are not exposed to third parties; and (ii) be able to get attestations that their genuine models are operating on edge devices in acco… ▽ More Machine-learning (ML) models are increasingly being deployed on edge devices to provide a variety of services. However, their deployment is accompanied by challenges in model privacy and auditability. Model providers want to ensure that (i) their proprietary models are not exposed to third parties; and (ii) be able to get attestations that their genuine models are operating on edge devices in accordance with the service agreement with the user. Existing measures to address these challenges have been hindered by issues such as high overheads and limited capability (processing/secure memory) on edge devices. In this work, we propose GuaranTEE, a framework to provide attestable private machine learning on the edge. GuaranTEE uses Confidential Computing Architecture (CCA), Arm's latest architectural extension that allows for the creation and deployment of dynamic Trusted Execution Environments (TEEs) within which models can be executed. We evaluate CCA's feasibility to deploy ML models by developing, evaluating, and openly releasing a prototype. We also suggest improvements to CCA to facilitate its use in protecting the entire ML deployment pipeline on edge devices. △ Less

Submitted 29 March, 2024; originally announced April 2024.

Comments: Accepted at the 4th Workshop on Machine Learning and Systems (EuroMLSys '24)

arXiv:2308.03417 [pdf, other]

PURL: Safe and Effective Sanitization of Link Decoration

Authors: Shaoor Munir, Patrick Lee, Umar Iqbal, Zubair Shafiq, Sandra Siby

Abstract: While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client-side to the server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect… ▽ More While privacy-focused browsers have taken steps to block third-party cookies and mitigate browser fingerprinting, novel tracking techniques that can bypass existing countermeasures continue to emerge. Since trackers need to share information from the client-side to the server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. To this end, we present PURL (pronounced purel-l), a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. PURL's deployment on a sample of top-million websites shows that link decoration is abused for tracking on nearly three-quarters of the websites, often to share cookies, email addresses, and fingerprinting information. △ Less

Submitted 6 March, 2024; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2208.12370 [pdf, other]

COOKIEGRAPH: Understanding and Detecting First-Party Tracking Cookies

Authors: Shaoor Munir, Sandra Siby, Umar Iqbal, Steven Englehardt, Zubair Shafiq, Carmela Troncoso

Abstract: As third-party cookie blocking is becoming the norm in browsers, advertisers and trackers have started to use first-party cookies for tracking. We conduct a differential measurement study on 10K websites with third-party cookies allowed and blocked. This study reveals that first-party cookies are used to store and exfiltrate identifiers to known trackers even when third-party cookies are blocked.… ▽ More As third-party cookie blocking is becoming the norm in browsers, advertisers and trackers have started to use first-party cookies for tracking. We conduct a differential measurement study on 10K websites with third-party cookies allowed and blocked. This study reveals that first-party cookies are used to store and exfiltrate identifiers to known trackers even when third-party cookies are blocked. As opposed to third-party cookie blocking, outright first-party cookie blocking is not practical because it would result in major functionality breakage. We propose CookieGraph, a machine learning-based approach that can accurately and robustly detect first-party tracking cookies. CookieGraph detects first-party tracking cookies with 90.20% accuracy, outperforming the state-of-the-art CookieBlock approach by 17.75%. We show that CookieGraph is fully robust against cookie name manipulation while CookieBlock's acuracy drops by 15.68%. While blocking all first-party cookies results in major breakage on 32% of the sites with SSO logins, and CookieBlock reduces it to 10%, we show that CookieGraph does not cause any major breakage on these sites. Our deployment of CookieGraph shows that first-party tracking cookies are used on 93.43% of the 10K websites. We also find that first-party tracking cookies are set by fingerprinting scripts. The most prevalent first-party tracking cookies are set by major advertising entities such as Google, Facebook, and TikTok. △ Less

Submitted 27 November, 2023; v1 submitted 25 August, 2022; originally announced August 2022.

arXiv:2203.07806 [pdf, other]

You get PADDING, everybody gets PADDING! You get privacy? Evaluating practical QUIC website fingerprinting protections for the masses

Authors: Sandra Siby, Ludovic Barman, Christopher Wood, Marwan Fayed, Nick Sullivan, Carmela Troncoso

Abstract: Website fingerprinting (WF) is a well-know threat to users' web privacy. New internet standards, such as QUIC, include padding to support defenses against WF. Previous work only analyzes the effectiveness of defenses when users are behind a VPN. Yet, this is not how most users browse the Internet. In this paper, we provide a comprehensive evaluation of QUIC-padding-based defenses against WF when u… ▽ More Website fingerprinting (WF) is a well-know threat to users' web privacy. New internet standards, such as QUIC, include padding to support defenses against WF. Previous work only analyzes the effectiveness of defenses when users are behind a VPN. Yet, this is not how most users browse the Internet. In this paper, we provide a comprehensive evaluation of QUIC-padding-based defenses against WF when users directly browse the web. We confirm previous claims that network-layer padding cannot provide good protection against powerful adversaries capable of observing all traffic traces. We further demonstrate that such padding is ineffective even against adversaries with constraints on traffic visibility and processing power. At the application layer, we show that defenses need to be deployed by both first and third parties, and that they can only thwart traffic analysis in limited situations. We identify challenges to deploy effective WF defenses and provide recommendations to address them. △ Less

Submitted 15 December, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

arXiv:2107.11309 [pdf, other]

WebGraph: Capturing Advertising and Tracking Information Flows for Robust Blocking

Authors: Sandra Siby, Umar Iqbal, Steven Englehardt, Zubair Shafiq, Carmela Troncoso

Abstract: Millions of web users directly depend on ad and tracker blocking tools to protect their privacy. However, existing ad and tracker blockers fall short because of their reliance on trivially susceptible advertising and tracking content. In this paper, we first demonstrate that the state-of-the-art machine learning based ad and tracker blockers, such as AdGraph, are susceptible to adversarial evasion… ▽ More Millions of web users directly depend on ad and tracker blocking tools to protect their privacy. However, existing ad and tracker blockers fall short because of their reliance on trivially susceptible advertising and tracking content. In this paper, we first demonstrate that the state-of-the-art machine learning based ad and tracker blockers, such as AdGraph, are susceptible to adversarial evasions deployed in real-world. Second, we introduce WebGraph, the first graph-based machine learning blocker that detects ads and trackers based on their action rather than their content. By building features around the actions that are fundamental to advertising and tracking - storing an identifier in the browser, or sharing an identifier with another tracker - WebGraph performs nearly as well as prior approaches, but is significantly more robust to adversarial evasions. In particular, we show that WebGraph achieves comparable accuracy to AdGraph, while significantly decreasing the success rate of an adversary from near-perfect under AdGraph to around 8% under WebGraph. Finally, we show that WebGraph remains robust to a more sophisticated adversary that uses evasion techniques beyond those currently deployed on the web. △ Less

Submitted 17 August, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

arXiv:1906.09682 [pdf, other]

Encrypted DNS --> Privacy? A Traffic Analysis Perspective

Authors: Sandra Siby, Marc Juarez, Claudia Diaz, Narseo Vallina-Rodriguez, Carmela Troncoso

Abstract: Virtually every connection to an Internet service is preceded by a DNS lookup which is performed without any traffic-level protection, thus enabling manipulation, redirection, surveillance, and censorship. To address these issues, large organizations such as Google and Cloudflare are deploying recently standardized protocols that encrypt DNS traffic between end users and recursive resolvers such a… ▽ More Virtually every connection to an Internet service is preceded by a DNS lookup which is performed without any traffic-level protection, thus enabling manipulation, redirection, surveillance, and censorship. To address these issues, large organizations such as Google and Cloudflare are deploying recently standardized protocols that encrypt DNS traffic between end users and recursive resolvers such as DNS-over-TLS (DoT) and DNS-over-HTTPS (DoH). In this paper, we examine whether encrypting DNS traffic can protect users from traffic analysis-based monitoring and censoring. We propose a novel feature set to perform the attacks, as those used to attack HTTPS or Tor traffic are not suitable for DNS' characteristics. We show that traffic analysis enables the identification of domains with high accuracy in closed and open world settings, using 124 times less data than attacks on HTTPS flows. We find that factors such as location, resolver, platform, or client do mitigate the attacks performance but they are far from completely stopping them. Our results indicate that DNS-based censorship is still possible on encrypted DNS traffic. In fact, we demonstrate that the standardized padding schemes are not effective. Yet, Tor -- which does not effectively mitigate traffic analysis attacks on web traffic -- is a good defense against DoH traffic analysis. △ Less

Submitted 6 October, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

arXiv:1701.05007 [pdf, other]

IoTScanner: Detecting and Classifying Privacy Threats in IoT Neighborhoods

Authors: Sandra Siby, Rajib Ranjan Maiti, Nils Tippenhauer

Abstract: In the context of the emerging Internet of Things (IoT), a proliferation of wireless connectivity can be expected. That ubiquitous wireless communication will be hard to centrally manage and control, and can be expected to be opaque to end users. As a result, owners and users of physical space are threatened to lose control over their digital environments. In this work, we propose the idea of an… ▽ More In the context of the emerging Internet of Things (IoT), a proliferation of wireless connectivity can be expected. That ubiquitous wireless communication will be hard to centrally manage and control, and can be expected to be opaque to end users. As a result, owners and users of physical space are threatened to lose control over their digital environments. In this work, we propose the idea of an IoTScanner. The IoTScanner integrates a range of radios to allow local reconnaissance of existing wireless infrastructure and participating nodes. It enumerates such devices, identifies connection patterns, and provides valuable insights for technical support and home users alike. Using our IoTScanner, we attempt to classify actively streaming IP cameras from other non-camera devices using simple heuristics. We show that our classification approach achieves a high accuracy in an IoT setting consisting of a large number of IoT devices. While related work usually focuses on detecting either the infrastructure, or eavesdropping on traffic from a specific node, we focus on providing a general overview of operations in all observed networks. We do not assume prior knowledge of used SSIDs, preshared passwords, or similar. △ Less

Submitted 18 January, 2017; originally announced January 2017.

Comments: 12 pages

Showing 1–9 of 9 results for author: Siby, S