Google wants online content to train AIs

08/04/2024 | 25/08/2023 by Alexandre Marotel

Artificial intelligence (AI) astonishes us with its performance in many fields. But how do AIs learn to do what they do?

The answer is simple: by feeding on massive amounts of data from the web. It’s thanks to this data that AIs like ChatGPT and Bard have been able to acquire the knowledge and skills they display.

But this method of learning raises legal and ethical questions, notably concerning respect for copyright. Google, which makes extensive use of online data to train its AIs, has proposed amending current legislation on this subject.

In this article, we take a look at this proposal and its implications.

Google’s proposal on the accessibility of online content to AIs

Google has proposed that all online content be accessible to AIs for training unless publishers opt out. In a proposal to the Australian government, the web giant has argued for changing existing copyright laws to allow this.

Should these laws change, it will be up to brands and publishers to prevent AI from exploiting their content.

If they fail to do so, they could find themselves competing with very similar content and unable to assert their rights, which could significantly damage a brand’s image and identity.

In its letter to the Australian government, Google calls for:

“Copyright systems that enable appropriate and fair use of copyrighted content for training AI models in Australia on a wide and varied range of data, while supporting viable optional opt-outs for entities that prefer their data not to be used in AI systems.”

The search engine has previously presented similar cases to the Australian government, arguing that AI should be able to make fair use of online content for training purposes.

But this is the first time Google has suggested an opt-out clause to address previous concerns.

What is Google’s plan with content and AI?

Google doesn’t yet have a specific plan, but says it wants to open discussions on a community-developed web standard.

This could be something similar to the robots.txt system, which allows publishers to prevent search engines from crawling their content.
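
To make the comparison concrete, here is a minimal sketch of how the existing robots.txt mechanism works, using Python’s standard `urllib.robotparser` module. It is only an illustration of today’s opt-out for crawlers, not the standard Google is proposing. The crawler name `ExampleAIBot`, the helper `is_allowed`, and the example.com URL are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "ExampleAIBot" is an invented crawler name used
# only for illustration; it is not a real Google or AI-vendor token.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/my-post"
    for agent in ("ExampleAIBot", "SomeSearchBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

With these rules, the hypothetical AI crawler is refused while every other crawler is allowed. Note that robots.txt only works if crawlers choose to honor it; nothing in the file technically blocks access.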

On this point, Danielle Romain, vice president of trust at Google Search, said in a press release last month:

“A dynamic content ecosystem benefits everyone. The key is that web publishers have choice and control over their content and opportunities to leverage their participation in the web ecosystem.

However, we recognize that existing web publisher controls were developed before the emergence of new uses for AI and search.

We believe it’s time for the web and AI community to explore additional, machine-readable means of web publisher choice and control for emerging AI uses and search cases.”

In other words, Google says it wants a dynamic content ecosystem that benefits all parties involved: web publishers should be free to decide how their content is used and keep control over it, while still benefiting from their participation in the online ecosystem.

In a nutshell

Google wants legislation to favor the use of online content to train artificial intelligence. Under its proposal, content would be available to AIs by default, and content creators who do not want their work used would have to opt out.
