Using faulty data to demand settlements from innocent surfers

Content industries are pushing "three-strikes" Internet disconnection laws around the world, but how accurate are the detection methods used to bust online infringers? Princeton computer scientist Mike Freedman says that there's still a big need for improvement after one of his projects attracted 100 warning and settlement letters in September 2009 alone, despite not actually sharing the files in question.

The root of the problem, he noted in a blog entry last week, is that some investigators do the absolute minimum of work before dashing off a warning letter. In this case, the "Video Protection Alliance" sent letters to Freedman's CoralCDN project, apparently without verifying that CoralCDN was swapping the files in question; instead, it looks as though the Alliance grabbed IP addresses from a BitTorrent tracker and trusted the tracker to be totally accurate.

A BitTorrent tracker coordinates the various peers in a "swarm," keeping track of which machines are taking part in sharing a particular file. When a new machine joins a swarm looking for that file, it is given a list of peers that are sharing parts of it. A tracker isn't strictly necessary (newer "trackerless" BitTorrent setups can operate without one), but trackers remain common, and they give investigators a simple way to quickly grab long lists of "pirate" IP addresses.
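To see why harvesting a tracker is so little work, here is a minimal Python sketch of decoding the "compact" peer list that many trackers return, in which each peer is six bytes: a four-byte IPv4 address followed by a two-byte big-endian port. The function name and the sample addresses (taken from the reserved documentation ranges) are illustrative assumptions, not anything VPA or any other firm is known to use.

```python
import ipaddress
import struct


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact peer list: 6 bytes per peer (IPv4 address + port)."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # "!4sH" = network byte order, 4 raw address bytes, unsigned 16-bit port
        raw_ip, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(raw_ip)), port))
    return peers


# Hypothetical tracker reply holding two peers (made-up example data).
sample = (
    bytes([192, 0, 2, 10]) + (6881).to_bytes(2, "big")
    + bytes([198, 51, 100, 7]) + (51413).to_bytes(2, "big")
)
peers = parse_compact_peers(sample)
```

Nothing in this step touches the listed machines themselves. The result is only what the tracker claims, which is exactly the weakness Freedman describes.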

This only works if a tracker is passing out reliable information—a faulty assumption. Tracker operators aren't interested in making their trackers simple enforcement tools for the content police, so some popular sites purposely add legitimate, non-infringing addresses to their trackers. The goal is to poison the well just enough that content owners can't use tracker data with impunity, but not so much that it degrades performance of the swarm.
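The trade-off can be pictured with a toy Python sketch of a tracker that mixes a small share of unrelated addresses into each peer list. Everything here is hypothetical: the function name, the 10 percent ratio, and the sample addresses are assumptions for illustration, and no real tracker's code or settings are being described.

```python
import random


def poisoned_peer_list(real_peers, decoys, decoy_ratio=0.1, rng=None):
    """Return the real peers plus a small number of decoy addresses, shuffled.

    decoy_ratio is an assumed knob: decoys added per real peer. Keeping it
    low means clients waste few connection attempts on decoys.
    """
    rng = rng or random.Random()
    n_decoys = min(len(decoys), int(len(real_peers) * decoy_ratio))
    mixed = list(real_peers) + rng.sample(list(decoys), n_decoys)
    rng.shuffle(mixed)  # decoys are not distinguishable by position
    return mixed


# Made-up example data: 20 real peers and 5 uninvolved addresses.
real = [f"198.51.100.{i}" for i in range(1, 21)]
decoys = [f"203.0.113.{i}" for i in range(1, 6)]
listing = poisoned_peer_list(real, decoys, decoy_ratio=0.1, rng=random.Random(0))
```

With these numbers the list contains 22 entries, two of which belong to machines that never shared anything. Anyone who treats the list as proof of infringement has no way of telling which two.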

It's hardly a secret, either, as groups like The Pirate Bay have publicly announced their support of the scheme. Academic researchers have known about it for years, too; a well-publicised study from mid-2008 found that "indirect" detection of infringement was a common practice, and that with little trouble researchers could even "frame" a device like a networked printer.

Some links to Freedman's CoralCDN project were added to trackers for just this reason, and it didn't take long before the letters began pouring in. According to Freedman, they aren't simple warning letters, either; the Video Protection Alliance also demands that letter recipients contact it in order to pay a settlement, or risk a federal copyright infringement trial.

Indeed, settlements appear to be VPA's entire business; the company exists to be the "fast, secure and convenient way to settle your copyright violations online" for a "nominal" fee. The fee is nominal enough to ensure that it's cheaper to settle than to hire a lawyer. As the company's FAQ says, "It is likely that the cost incurred to retain a lawyer will exceed the settlement amount offered. The decision to hire a lawyer is entirely up to you."

VPA is a newcomer to the "settle or we'll sue" world, having launched earlier this year with a focus on "the adult industry." Other firms such as MediaSentry (which used to do RIAA investigations) make it a policy to connect directly to infringers' machines and download at least parts of files, but Freedman believes that VPA is taking a shortcut and just grabbing addresses from BitTorrent trackers, then blasting out its notices.
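The difference between the two approaches can be sketched in a few lines of Python. BitTorrent metadata carries a SHA-1 hash for each piece of a file, so an investigator who actually downloads a piece from a listed address can check it against the expected hash. The names below, including the `fetch_piece` callback and the simulated swarm, are assumptions made for illustration; the article does not describe any firm's actual tooling.

```python
import hashlib


def confirmed_sharers(listed_peers, fetch_piece, expected_sha1):
    """Keep only the peers that actually served a piece matching the hash.

    fetch_piece(peer) is a hypothetical callback that returns the piece
    bytes, or None if the peer did not serve anything.
    """
    confirmed = []
    for peer in listed_peers:
        data = fetch_piece(peer)
        if data is not None and hashlib.sha1(data).digest() == expected_sha1:
            confirmed.append(peer)
    return confirmed


# Simulated swarm (made-up data): one real sharer, one address that serves
# nothing, and one that serves unrelated bytes.
piece = b"example piece contents"
responses = {
    "198.51.100.7": piece,
    "203.0.113.9": None,
    "192.0.2.44": b"something else entirely",
}
verified = confirmed_sharers(
    list(responses), responses.get, hashlib.sha1(piece).digest()
)
```

A tracker-only investigation would send letters to all three addresses; the direct check leaves just the one that really served the file.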

The takeaway is plain: we can argue about whether Internet sanctions and disconnections are good policy, and we can argue about how best to implement that policy and its safeguards, but any policy that acts on rightsholders' accusations has to be grounded in good data. If not, printers, open content delivery networks, and plenty of individuals could find themselves in the Internet penalty box, trying frantically to prove a negative.

Channel Ars Technica
