KnowPrivacy: a web privacy investigation

In June 2009, I published a landmark paper with two colleagues at the UC Berkeley School of Information examining the common practices among website operators of collecting, sharing and analyzing users’ data.  We compared industry practices with users’ expectations of privacy, identified points of divergence, and made recommendations for changes in industry practice and government regulation.

The goal of KnowPrivacy was to influence policy governing data collection and sharing practices employed by popular Internet sites. We identified deceptive practices that may be harmful to users’ privacy. The team not only published a paper, but also built a website that illustrated the prevalence of web tracking software among the most visited websites.

User expectations

We assessed users’ perceptions, expectations and knowledge based on three sources. First, we gathered data from public opinion surveys found in previous research by various public policy and polling organizations. Next, we analyzed which practices upset users enough to file complaints with privacy watchdog organizations such as the FTC, the Privacy Rights Clearinghouse, the California Office of Privacy Protection, and TRUSTe. Finally, we looked at popular media to get a sense of what was discussed in articles about internet privacy. We used this content as a proxy for which issues users are aware of and which they may not know about.

Website practices

We conducted a survey of website privacy policies to get a corresponding understanding of website practices.  We identified the types of data that sites collect about users, the purposes for which that data is used, and with whom that data is shared. We looked specifically at the use of third-party tracking beacons, which are usually excluded from the provisions laid out in a website’s privacy policy. We also investigated the practice of sharing data with “affiliates.”

From these various sources of data we identified points of conflict between the privacy expectations of Internet users and the actual practices of website operators.

Key findings

User Expectations and Knowledge
  • Users are concerned about data collection online and want greater control over their personal information.
  • Users lack awareness of some data collection practices.
  • Users don’t know where to file their complaints.
Website Practices
  • Websites collect and analyze data about users, but only offer partial access and control to the users.
  • Website policies are unclear about important issues, such as retention and data enhancement.
  • Websites claim they do not share user data with third parties; however, they do share it with affiliates with whom users often have no relationship.
  • Web bug trackers are ubiquitous. Analytics and ad serving companies can track user behavior across large portions of the web.

Web bugs: prevalent

The paper and corresponding website at knowprivacy.org include data and charts that highlight the nearly ubiquitous presence of web bugs – pieces of code that track, monitor and report web usage back to website owners. Web bugs enable third parties to place cookies on a user’s browser and track the user’s navigation across the web.

A web bug is typically a small graphic embedded in the page, usually an invisible 1-by-1 pixel image. Web bugs are also called web beacons, clear GIFs, or pixel tags. Ad networks can use web bugs to aggregate information and build a profile of the sites a person has visited. The profile is keyed to the ad network’s browser cookie, which allows the network to track the user’s behavior across sites and over time.
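To make the description concrete, here is a minimal sketch of how a page could be scanned for this kind of pixel. It is not the method used in the KnowPrivacy study. The class name `PixelFinder`, the function `find_web_bugs`, and the `.example` hosts are all hypothetical, and the check only catches images that declare a 1-by-1 size in their HTML attributes and come from a host that differs from the page’s host.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PixelFinder(HTMLParser):
    """Collect hosts of 1x1 <img> tags served from a host other than the page's."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        # Resolve relative URLs against the page before comparing hosts.
        host = urlparse(urljoin(self.page_url, src)).hostname
        is_tiny = attr.get("width") == "1" and attr.get("height") == "1"
        if is_tiny and host != self.page_host:
            self.suspects.append(host)


def find_web_bugs(page_url, html):
    """Return the hosts of suspected third-party tracking pixels in html."""
    finder = PixelFinder(page_url)
    finder.feed(html)
    return finder.suspects


# Made-up page: one first-party logo and one third-party 1x1 image.
SAMPLE_HTML = """
<html><body>
<img src="/logo.png" width="120" height="40">
<img src="http://tracker.example/pixel.gif?page=home" width="1" height="1">
</body></html>
"""

suspects = find_web_bugs("http://news.example/index.html", SAMPLE_HTML)
print(suspects)
```

A check like this misses pixels that are sized with CSS or injected by JavaScript, so it should be read as an illustration of the idea rather than a working detector.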

Because these web bugs are invisible, users are unlikely to notice them and cannot be relied upon to regulate this practice. In fact, there are few effective controls for this tracking technology. When this research was conducted, every one of the top 50 websites contained at least one web bug during a one-month period. Some had as many as 100.

Of greater note was the depth of coverage that some tracking companies have. Several of the tracking companies had a web bug on the majority of the top 100 sites. Google, in particular, had extensive coverage – it had a web bug on 92 of the top 100 sites, and on 88% of the total domains reported in the data set of almost 400,000 unique domains.
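A coverage figure of this kind is simply the share of sites on which a tracker appears at least once. The sketch below shows that arithmetic on made-up data; the function name `tracker_coverage` and every site and tracker name are hypothetical, and none of the numbers come from the KnowPrivacy data set.

```python
def tracker_coverage(site_trackers):
    """Return {tracker: share of sites on which it appears at least once}."""
    total = len(site_trackers)
    counts = {}
    for trackers in site_trackers.values():
        for tracker in set(trackers):  # count each tracker once per site
            counts[tracker] = counts.get(tracker, 0) + 1
    return {tracker: n / total for tracker, n in counts.items()}


# Made-up observations, for illustration only.
observations = {
    "site-a.example": ["tracker-x", "tracker-y"],
    "site-b.example": ["tracker-x"],
    "site-c.example": ["tracker-x", "tracker-x", "tracker-z"],
    "site-d.example": [],
}

coverage = tracker_coverage(observations)
for tracker, share in sorted(coverage.items()):
    print(f"{tracker}: {share:.0%} of sites")
```

Counting each tracker once per site matters here: a site with many bugs from the same company should not inflate that company’s coverage.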

What data is used for

Website operators use information about user behavior for various purposes. They can use the data to develop and improve the website, making it easier to use. They can customize a site to fit individual users’ tastes. An e-commerce site can make product recommendations based on previous purchases, or it can use the information to deliver targeted ads. Many of these uses benefit the visitors to the site and are actively sought by consumers.

Data sharing

Sometimes site operators will rent or sell personal and behavioral data about users to third parties. More often, the operators will share the data with marketing partners or corporate affiliates and subsidiaries. This means that user behavior may be profiled not only by sites visited by a user, but also by any other entities with whom those sites share this information.

Although these practices are sometimes noted in a website’s privacy policy, it is often unclear what a website means by the terms “affiliate”, “third party”, and “partner”. Our analysis of privacy policies found that many stated they do not share data with third parties but do share data with affiliates, suggesting that they only share data with companies under the same corporate ownership. Additionally, many of these websites allow third parties to track user behavior directly through the use of web bugs. In a conversation, the Chief Privacy Officer of one of these websites claimed that the company considers the ad-serving company DoubleClick to be a “marketing partner” and not a third party.

Site profiles

The KnowPrivacy project built site profiles for the top fifty most visited sites. Each site’s profile provides:

  • types of data collected from users
  • general data collection practices
  • data sharing practices
  • the number of web bugs found on the site in March 2009

[Figure: Google’s site profile, based on our March 2009 findings]

The problems with web bugs: pervasive yet invisible

Our analysis of user expectations found that users are concerned about data collection and want greater control over the process, but that they only voice their concerns when they perceive an invasion of privacy. Because web bugs are essentially invisible to users, they are not perceived as a threat, despite the fact that users have little control over the data collection by web bugs.

Some argue that if users do not like a website’s practices, they can simply avoid the website. However, users have effectively no ability to detect, understand, or avoid third-party tracking because third-party trackers are not governed by a website’s privacy policy. There is no opt-out, let alone opt-in. Website operators have no incentive to allow users to view or delete information collected about them, which means there is no transparency in what data is collected.

Users cannot avoid trackers by avoiding websites that use web bugs; KnowPrivacy’s data shows that trackers are ubiquitous on the web. Many browsers give the user the option to block third-party cookies, but this does not block JavaScript web bugs. Browsers could let users block all content coming from a server other than the one serving the web page. However, that would also block a lot of desired content, such as embedded videos or the framed websites that result from a Google image search, and it would disrupt web advertising norms. This is a case of market failure, as users have no practical options to protect their privacy.
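The trade-off in blanket blocking can be shown with a small sketch. It assumes a deliberately crude rule, "block anything whose host differs from the page’s host", which is not how any particular browser works; the function names and `.example` URLs are hypothetical.

```python
from urllib.parse import urlparse


def is_third_party(page_url, resource_url):
    """Crude rule: a resource is third-party if its host differs from the page's."""
    return urlparse(page_url).hostname != urlparse(resource_url).hostname


def filter_requests(page_url, resource_urls):
    """Split resource URLs into (allowed, blocked) under the crude rule."""
    allowed, blocked = [], []
    for url in resource_urls:
        (blocked if is_third_party(page_url, url) else allowed).append(url)
    return allowed, blocked


# Made-up page that loads a stylesheet, a tracker script and a video embed.
page = "http://blog.example/post"
resources = [
    "http://blog.example/style.css",
    "http://tracker.example/bug.js",
    "http://video.example/embed/123",
]

allowed, blocked = filter_requests(page, resources)
print("allowed:", allowed)
print("blocked:", blocked)
```

Under this rule the tracker script is blocked, but so is the video embed the reader presumably wanted to see, which is the problem described above.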

The Network Advertising Initiative (NAI), a “cooperative of online marketing and analytics companies” [NAI, 2009], currently has an opt-out mechanism that requires users to download a cookie, which tells advertisers not to install any third-party tracking cookies on the user’s computer. This method of opt-out is unacceptable. First, it only governs members of the NAI; tracking companies that are not members will still be able to use cookies and web bugs to collect data about users. Second, users who decide to delete the cookies on their machine may inadvertently delete the NAI cookie and open up their machine to third-party tracking again. This is obviously not an effective solution to the problem.
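Both weaknesses can be seen in a toy simulation of a cookie-based opt-out. This is a sketch under simplifying assumptions, not the NAI’s actual mechanism: the cookie names `optout` and `uid`, the network names, and the functions below are all hypothetical.

```python
OPT_OUT = "optout"   # hypothetical opt-out cookie name
TRACKING_ID = "uid"  # hypothetical tracking cookie name


def visit(cookie_jar, network, is_member=True):
    """Simulate one ad network seeing a page view; return True if it tracks."""
    cookies = cookie_jar.setdefault(network, {})
    # Only networks that are members of the scheme honor the opt-out cookie.
    if is_member and cookies.get(OPT_OUT) == "1":
        return False
    cookies.setdefault(TRACKING_ID, "abc123")
    return True


def opt_out(cookie_jar, network):
    """Store the opt-out cookie for a network and drop its tracking cookie."""
    cookies = cookie_jar.setdefault(network, {})
    cookies[OPT_OUT] = "1"
    cookies.pop(TRACKING_ID, None)


def clear_cookies(cookie_jar):
    """Model a user clearing all cookies, including any opt-out cookies."""
    cookie_jar.clear()


jar = {}
opt_out(jar, "member-network.example")
tracked_after_opt_out = visit(jar, "member-network.example")
tracked_by_non_member = visit(jar, "other-network.example", is_member=False)
clear_cookies(jar)
tracked_after_clearing = visit(jar, "member-network.example")

print(tracked_after_opt_out, tracked_by_non_member, tracked_after_clearing)
```

In this model the member network stops tracking only while the opt-out cookie survives, and the non-member network is never affected by it.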

Recommendations

Transparency

The KnowPrivacy team recommended that the practice of third-party tracking be made more transparent. It currently operates in a policy loophole in which neither the website nor the tracker is clearly accountable for the data collected. We recommended that each website define the policies of the third-party trackers it allows on its site or, at a minimum, link to the appropriate policies on the tracking companies’ websites and specify which practices fall under each policy.

Access

We recommended regulation by which third-party trackers must allow users to see all the data that has been collected about them.

Salience

The presence and purpose of third-party tracking should also be made more salient in the minds of users. The team recommended that all browser developers provide a Ghostery-like function in their browsers that alerts users to the presence of third-party trackers.

What happened after publication

Following this paper’s release, The Wall Street Journal asked me to conduct additional data research for articles they were writing related to consumers and web privacy.  What started as a one-time request for data from the paper turned into a series of articles called What They Know, published from 2010 to 2012.

Federal regulators also took notice. The Federal Trade Commission (FTC) referenced KnowPrivacy’s findings in an agency report, Protecting Consumer Privacy in an Era of Rapid Change: A Proposed Framework for Businesses and Policymakers.
