New York Times fights off AI

The New York Times has recently updated its terms of service to prohibit the use of its website and services to train artificial intelligence. The NYT also updated its website code to reflect the change. While not all bots and AI systems will be affected by the change, NYT links fed to certain generative AI tools will now return errors.
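The article does not say exactly what code was changed, but the standard mechanism for telling crawlers what they may fetch is the site's robots.txt file. A minimal sketch of such a rule, blocking OpenAI's GPTBot crawler, might look like the following (the actual rules the NYT publishes may differ):

```
# robots.txt — tell OpenAI's GPTBot crawler not to fetch any page
User-agent: GPTBot
Disallow: /
```

Compliance with robots.txt is voluntary, which is why the terms-of-service update matters as a separate legal backstop.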

Google might be exempt from this, as Google signed a deal with the NYT allowing its articles to be featured on Google News, an arrangement that, to work efficiently, requires data-scraping bots.

For any new generative AI, such as Bytespider, the rumored AI crawler from TikTok parent company ByteDance, the New York Times expects its Terms of Service to prevent the model from training on its data.

Since AIs (in their current form) cannot sign contracts, they also cannot accept a website's Terms of Service; a human is required for a signature or agreement to be valid. An AI can click through a Terms of Service prompt, or bypass it entirely by reading the site's code directly, but it cannot be expected to honor the terms, as it is merely a program.

Instead, the legal consequences fall on the human engineers who oversee the AI's development, much as a Tesla in Autopilot mode cannot receive a ticket but its owner can: it is the responsibility of humans to oversee the actions of their property.

Yet when training AI, engineers do not enter data manually. Large datasets, holding more information than a single library ever could, train models automatically. Going forward, these datasets are expected to be examined for New York Times content so that it can be removed. However, checking for that content requires an archive of New York Times data (which is not freely available, since the NYT blocks the Internet Archive's bot scraper) to be made accessible to a program and distributed to the companies and organizations that train AI, so such content can be stripped out.
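The filtering step described above can be sketched in a few lines. This is a hypothetical illustration, not any company's actual pipeline: it assumes each dataset record carries a source `url` field, and simply drops records hosted on nytimes.com.

```python
from urllib.parse import urlparse

def filter_nyt_records(records):
    """Keep only records whose 'url' field is not hosted on nytimes.com.

    A crude stand-in for the kind of dataset scrubbing described above;
    real pipelines would also need to catch mirrored or quoted text.
    """
    kept = []
    for record in records:
        host = urlparse(record.get("url", "")).netloc.lower()
        if host == "nytimes.com" or host.endswith(".nytimes.com"):
            continue  # skip New York Times content
        kept.append(record)
    return kept

# Hypothetical sample records
sample = [
    {"url": "https://www.nytimes.com/2023/some-article.html", "text": "..."},
    {"url": "https://example.com/post", "text": "..."},
]
print(filter_nyt_records(sample))
```

Matching by source URL only works when provenance is recorded, which is why the article notes that an archive of NYT content would be needed to detect copies that no longer carry a nytimes.com link.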
