Prosecraft.io, a site that used novels to power a data-driven project displaying word count, passive voice, and other, much more subjective writing-style markers such as vividness, shut down today after authors protested the project. Prosecraft used the full text of over 25,000 books, all of it copyrighted material, to build a library of data. Authors, once they caught wind of what was happening, immediately hated this.
How DARE you, @benji_smith
I demand you take my book off your site immediately. I do not consent to this, and never did. And I know my publisher never would pic.twitter.com/QvPkRme5pr
Zattack The Block (@ZachRoseWriter) August 7, 2023
Zach Rosenberg was the author who first brought the site to the wider attention of authors on X, the site formerly known as Twitter. Soon, more and more authors spoke out, including high-profile writers such as Jeff VanderMeer (The Southern Reach trilogy), Indra Das (The Devourers), and Gretchen Felker-Martin (Manhunt).
Remove all books and analysis for Jeff VanderMeer. You absolutely do not need a title by title run down. Just run a search on your own damn site.
Jeff VanderMeer (@jeffvandermeer) August 7, 2023
I think you can safely assume that the default for any artist or writer is 'doesn't want them to be there' (there being any AI training project) unless you have their written and confirmed consent. Also, please remove my book (The Devourers) from this as well, thanks.
Indrapramit Das (@IndrapramitDas) August 7, 2023
I've just discovered that MANHUNT has been uploaded to a content mining site so that it can be indifferently plagiarized by anyone who wants to feed it into their so-called "AI". @benji_smith, I demand you remove my work from your site immediately.
Gretchen Felker-Martin (@scumbelievable) August 7, 2023
Part of the outrage stems from Prosecraft's admission that it used "AI algorithms." In a blog post dated October 5, 2018, Benji Smith, the developer of both Prosecraft and the writing program Shaxpir, which was built on the data mined from Prosecraft's library, stated that "we taught our machine-learning [AI] algorithms to recognize which kinds of words can be used in which kinds of contexts, by looking at the types of words and phrases that tend to occur within similar sentences and paragraphs." He also wrote that Shaxpir "[analyzed] more than 560 million words of fiction, from more than 5,800 books, written by more than 3,300 popular authors." He does not disclose where he obtained those works of fiction, or whether he had permission to use them.
While the technology used is not necessarily a generative large language model like ChatGPT, it is not a stretch to say that incorporating generative LLM algorithms could have been on the horizon for Prosecraft. And since the site held a massive library of books, authors' fears are entirely valid. In the wake of the backlash, Smith wrote a lengthy post on Medium explaining why he voluntarily took Prosecraft down.
Although Prosecraft published only portions of the text, it did not have permission from any authors or publishers to build a database from an author's entire body of work or from the full text of a book. Smith wrote in the post, "since I was only publishing summary statistics, and small snippets from the text of those books, I believed I was honoring the spirit of the Fair Use doctrine, which doesn't require the consent of the original author."
While this argument holds some water, Fair Use does not, by any stretch of the imagination, allow you to use an author's entire copyrighted work without permission as part of a data training program that feeds into your own "AI algorithm." This situation is certainly going to be a lesson for many people, and it's clear that authors are not going to allow their work to be used to train LLMs and vector networks.
Update August 8, 11:35 a.m.: Fixed the mistaken legal definition where copyrighted works were referred to as "copywritten." io9 sincerely regrets the error.