On the internet, there are certain institutions we have come to rely on daily to keep truth from becoming nebulous or elastic. Not necessarily in the way that something stupid like Verrit aspired to, but at least in confirming that you aren't losing your mind, that an old post or article you remember reading did, in fact, actually exist. It can be as fleeting as using Google Cache to grab a quickly deleted tweet, but it can also be as involved as doing a deep dive of a now-dead site's archive via the Wayback Machine. But what happens when an archive becomes less reliable, and arguably has legitimate reasons to bow to pressure and remove controversial archived material?
A few weeks ago, while recording my podcast, the topic turned to the old blog written by The Ultimate Warrior, the late bodybuilder turned chiropractic student turned pro wrestler turned ranting conservative political speaker under his legal name of, yes, "Warrior." As described by Deadspin's Barry Petchesky in the aftermath of Warrior's 2014 passing, he was "an insane dick," spouting off in blogs and campus speeches about people with disabilities, gay people, New Orleans residents, and many others. But when I went looking for a specific blog post, I saw that the blogs were not just removed, the site itself was no longer in the Internet Archive, replaced by the error message: "This URL has been excluded from the Wayback Machine."
Apparently, Warrior's site had been de-archived for months, not long after Rob Rousseau pored over it for a Vice Sports article on the hypocrisy of WWE using Warrior's image for their Breast Cancer Awareness Month campaign. The campaign was all about getting women to "Unleash Your Warrior," complete with an Ultimate Warrior motif, but since Warrior's blogs included wishing death on a cancer survivor, this wasn't a good look. Rousseau was struck by how the archive was removed "almost immediately after my piece went up, like within that week," he told Gizmodo.
Rousseau suspected that WWE was somehow behind it, but a WWE spokesman told Gizmodo that they were not involved. Steve Wilton, the business manager for Ultimate Creations, also denied involvement. A spokesman for the Internet Archive, though, told Gizmodo that the archive was removed because of a DMCA takedown request from the company's business manager (Wilton's job for years) on October 29, 2017, two days after the Vice article was published. (He has not replied to a follow-up email about the takedown request.)
Over the last few years, there has been a change in how the Wayback Machine is viewed, one inspired by the general political mood. What had long been a useful tool when you came across broken links online is now, more than ever before, seen as an arbiter of the truth and a bulwark against erasing history.
Archive sites that are trusted to show the digital trail and origin of content are not just a must-use tool for journalists, but useful for just about anyone trying to track down vanishing web pages. With that in mind, the fact that the Internet Archive doesn't really fight takedown requests becomes a problem. And a takedown request isn't the only recourse: When a site admin elects to block the Wayback crawler using a robots.txt file, the crawling doesn't just stop. Instead, the Wayback Machine's entire history of a given site is removed from public view.
In other words, if you deal in a certain bottom-dwelling brand of controversial content and want to avoid accountability, there are at least two different, standardized ways of erasing it from the most reliable third-party web archive on the public internet.
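The robots.txt route described above can be illustrated with a minimal Python sketch. It only shows how a crawler that honors robots.txt would read a blocking rule; the retroactive hiding of already-archived pages is the Archive's own policy and is not modeled here. The user-agent name `ia_archiver`, the sample file contents, and the `example.com` URL are assumptions made for illustration, not details taken from the reporting.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site admin might publish. The "ia_archiver"
# user-agent name is an assumption used for illustration only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/blog/old-post"  # hypothetical URL
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, "blocked:", is_blocked(ROBOTS_TXT, agent, page))
```

Under these assumptions, a single two-line rule is enough to shut out one named crawler while leaving search engines untouched, which is part of why the mechanism is so easy to use for this purpose.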
For the Internet Archive, the robots.txt strategy, much like quickly complying with takedown notices that challenge its seemingly fair use archive copies of old websites, does little more in practice than mitigate its risk while going against the spirit of the protocol. And if someone were to sue over non-compliance with a DMCA takedown request, even with a ready-made, valid defense in the Archive's pocket, copyright litigation is still incredibly expensive. It doesn't matter that the use is not really a violation by any metric. If a rightsholder makes the effort, you still have to defend the lawsuit.
"The fair use defense in this context has never been litigated," noted Annemarie Bridy, a law professor at the University of Idaho and an Affiliate Scholar at the Center for Internet and Society at Stanford Law School. "Internet Archive is a non-profit, so the exposure to statutory damages that they face is huge, and the risk that they run is pretty great ... given the scope of what they do; that they're basically archiving everything that is on the public web, their exposure is phenomenal. So you can understand why their impulse might be to act cautiously even if that creates serious tension with their core mission, which is to create an accurate historical archive of everything that has been there and to prevent people from wiping out evidence of their history."
While the Internet Archive did not respond to specific questions about its robots.txt policy, its proactive response to takedown requests, or if any potential fair use defenses have been tested by them in court, a spokesperson did send this statement along:
Several months after the Wayback Machine was launched in late 2001, we participated with a group of outside archivists, librarians, and attorneys in the drafting of a set of recommendations for managing removal requests (the Oakland Archive Policy) that the Internet Archive more or less adopted as guidelines over the first decade or so of the Wayback Machine.
Earlier this year, we convened with a similar group to review those guidelines and explore the potential value of an updated version. We are still pondering many issues and hope that before too long we might be able to present some updated information on our site to better help the public understand how we approach take down requests. You can find some of our thoughts about robots.txt at http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/.
At the end of the day, we strive to strike a balance between the concerns that site owners and rights holders sometimes bring to us with the broader public interest in free access for everyone to a history of the Internet that is as comprehensive as possible.
All of that said, the Internet Archive has always held itself out to be a library; in theory, shouldn't that matter?
"Under current copyright law, although there are special provisions that give certain rights to libraries, there is no definition of a library," explained Brandon Butler, the Director of Information Policy for the University of Virginia Library. "And that's a thing that rights holders have always fretted over, and they've always fretted over entities like the Internet Archive, which aren't 200-year-old public libraries, or university-affiliated libraries. They often raise up a stand that there will be faux libraries, that they'd call themselves libraries but it's really just a haven for piracy. That specter of the sort of sham library really hasn't arisen." The lone exception that Butler could think of was when American Buddha, a non-profit, online library of Buddhist texts, found itself sued by Penguin over a few items that they asserted copyright over. "The court didn't really care that this place called itself a library; it didn't really shield them from any infringement allegations." Still, while being a library wouldn't necessarily protect the Internet Archive as much as it could, "the right to make copies for preservation," as Butler puts it, is definitely a point in its favor.
That said, "libraries typically don't get sued; it's bad PR," Butler says. So it's not like there's a ton of modern legal precedent about libraries in the digital age, barring some outliers like the various Google Books cases.
As Bridy notes, in the United States, copyright is "a commercial right." It's not about reputational harm; it's about protecting the value of a work and, more specifically, the ability to continuously make money off of it. "The reason we give it is we want artists and creative people to have an incentive to publish and market their work," she said. "Using copyright as a way of trying to control privacy or reputation ... it can be used that way, but you might argue that's copyright misuse, you might argue it falls outside of the ambit of why we have copyright."
We take a lot of things for granted, especially as we rely on technology more and more. "The internet is forever" may be a common refrain in the media, and the underlying wisdom about being careful may be sound, but it is also not something that should be taken literally. People delete posts. Websites and entire platforms disappear for business and other reasons. Rich, famous, and powerful bad actors have no qualms about intimidating small non-profit organizations. It's nice to have safeguards, but there are limits to permanence on the internet, and where there are limits, there are loopholes.