Digital Preservation at Risk: Major Media Outlets Block Wayback Machine Over AI Fears

Published on 15 April 2026

In a significant shift for digital archiving, major news organizations are increasingly blocking the Wayback Machine from preserving their web pages. Concerned that artificial intelligence companies might use archived content for training, outlets such as The New York Times and the USA Today network have barred the Internet Archive's crawler bots from their sites.


A Wave of Restrictions


According to an analysis by Originality AI, at least 23 major news sites have implemented blocks against the ia_archiver bot. The scale of the publishers involved makes the blockade especially consequential: USA Today's parent company, Gannett, operates more than 200 media outlets. Some publishers, such as The Guardian, still allow crawling but restrict public access to the archived copies, while others have cut off the archive entirely. Reddit has also joined the growing list of platforms that restrict the preservation tool.


Publishers defend the decision as a necessary measure against unauthorized scraping and copyright infringement. The New York Times has argued that archived copies of its content create direct competition, though it has not disclosed specific evidence that AI models have used the Wayback Machine as a source of training data.


The Cost of Erasing History


The restrictions have drawn criticism over their long-term implications for accountability and journalism. Mark Graham, director of the Wayback Machine, pointed to a stark irony: news organizations frequently rely on the archive for their own investigative reporting while preventing the preservation of their own work. USA Today, for instance, previously used the tool to investigate ICE detention policies.


Historical precedent underscores the importance of independent archiving. In 2016, the Internet Archive's records revealed that The New York Times had quietly edited an article about Bernie Sanders. That kind of editorial accountability becomes impossible if outlets control the only record of their own history.


Future Implications


Graham warns that locking down the public web creates an information void in which only large corporations hold the keys to the historical record. As the ability to verify claims or track editorial changes diminishes, society risks losing the tools it needs to understand how public discourse evolves. Although more than 100 journalists have signed a coalition letter supporting the archive's mission, it remains unclear how the conflict between copyright enforcement and digital preservation will be resolved.
