According to TheRegister.com, a committee formed by India’s Department of Promotion of Industry and Internal Trade has proposed a radical new system for AI companies to pay for training data. The plan, detailed in a working paper, would require AI developers to pay royalties to content creators, but with a crucial twist: payments would be due only once the AI models start generating revenue. The proposal suggests creating a central nonprofit body, the Copyright Royalties Collective for AI Training (CRCAT), to manage the collection and distribution of funds. This body would oversee a “Works Database” where creators must register their content to be eligible for payouts. The committee argues the current “zero-price license model” for training data undermines human creativity, but also worries that complex licensing could stifle innovation, especially for startups.
The Devil in the Data Details
So, India wants to thread a very fine needle here. On one hand, they’re acknowledging the global uproar from publishers, artists, and writers who feel their life’s work is being vacuumed up for free to build trillion-dollar AI empires. The committee’s point about undermining incentives for human creativity is valid. Why would anyone produce high-quality content if it just becomes free fuel for a machine?
But here’s the thing: the “pay only after revenue” model is a fascinating gamble. It basically gives AI startups a free pass during the R&D phase, which could lower the barrier to entry. That’s good for innovation in theory. But it also kicks the monetization can way down the road. What counts as “revenue”? Is it when an API call is sold? When an enterprise contract is signed? The potential for loopholes and accounting gymnastics is enormous. And who defines when that threshold is crossed?
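To see how much rides on those definitions, here’s a minimal sketch of a revenue-triggered royalty ledger. Everything in it (the event types, the 2% rate, the threshold) is my own hypothetical, not anything from the working paper:

```python
from dataclasses import dataclass

# Hypothetical event types; the working paper doesn't define what
# "revenue" means, and which of these count is exactly the open question.
COUNTABLE_EVENTS = {"api_call", "enterprise_contract", "subscription"}

@dataclass
class RoyaltyLedger:
    """Accrues royalty liability only once revenue starts flowing."""
    royalty_rate: float        # fraction of countable revenue owed
    revenue_threshold: float   # revenue level at which payments kick in
    total_revenue: float = 0.0
    accrued_royalties: float = 0.0

    def record(self, event_type: str, amount: float) -> None:
        # Loophole 1: anything not tagged as a countable event
        # never moves the needle.
        if event_type not in COUNTABLE_EVENTS:
            return
        self.total_revenue += amount
        # Loophole 2: the threshold test runs on self-reported totals.
        # Who audits whether it was actually crossed, and when?
        if self.total_revenue > self.revenue_threshold:
            self.accrued_royalties += amount * self.royalty_rate

ledger = RoyaltyLedger(royalty_rate=0.02, revenue_threshold=100_000.0)
ledger.record("api_call", 250_000.0)        # counts
ledger.record("internal_pilot", 500_000.0)  # conveniently doesn't
print(f"Owed to the collective: {ledger.accrued_royalties:,.2f}")
```

Two arbitrary choices, which events count and where the threshold sits, fully determine whether a creator ever sees a rupee.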
A Bureaucratic Behemoth in the Making?
Now, let’s talk about the proposed CRCAT. The paper points to performing rights organizations (PROs) as a precedent, like ASCAP or BMI for music. And look, the author even mentions being part of an Australian scheme for news reprints that nets them “a couple of hundred dollars a year.” That’s my first red flag. These systems are notorious for being opaque and inefficient, distributing pennies to most creators while the administrative body takes its cut.
Imagine the complexity for India. Twenty-two scheduled languages? A massively fragmented media landscape? Building a “Works Database” that can accurately track and attribute billions of pieces of content across text, images, video, and audio sounds like a recipe for a bureaucratic nightmare. The transaction costs they’re trying to avoid might just get internalized into a giant, slow-moving government-mandated apparatus. Will the payout for a Tamil-language poet or a Hindi blogger even be worth the hassle of registration?
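To get a feel for why attribution at that scale is hard, here’s a minimal sketch of what a single registration record might have to capture. Every field and name below is my assumption, not the committee’s schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Modality(Enum):
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"

@dataclass(frozen=True)
class WorksRecord:
    """One entry in a hypothetical Works Database.

    Every field is a place where attribution can go wrong: duplicate
    registrations, disputed ownership, the same work published in
    several languages, content scraped before it was ever registered.
    """
    work_id: str             # globally unique; issued by whom?
    creator_id: str          # identity verified how, in 22 languages?
    modality: Modality
    language: Optional[str]  # None for most images and audio
    content_hash: str        # exact hash, defeated by any re-encoding
    registered_at: str       # ISO 8601 date; what about prior scrapes?

def is_attributable(record: WorksRecord, sample_hash: str) -> bool:
    # Exact matching is the trivial case. Real attribution needs fuzzy
    # or perceptual matching across billions of items in text, pixels,
    # and waveforms: a research problem, not a form to fill in.
    return record.content_hash == sample_hash

poem = WorksRecord("w-0001", "c-0042", Modality.TEXT, "ta",
                   "3f9ab2", "2025-06-01")
print(is_attributable(poem, "3f9ab2"))  # True only for identical bytes
```

And that’s before anyone decides how a payout formula weights that Tamil poem against a feature film.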
Will Big Tech Play Ball?
This is the billion-dollar question. India’s government is famously ambitious about AI leadership and has often been friendly to tech giants. But this proposal directly challenges the core argument of OpenAI, Google, and others that training on publicly available data is “fair use.”
I think the key is in that last line from the source: this “may go down well with Big Tech, if New Delhi makes royalty payments worth their while.” Basically, if the per-data-unit fee is low enough and the legal certainty is high enough, the tech giants might accept it as a cost of doing business in a massive market. They’re already cutting deals with publishers left and right. A single, predictable system could be preferable to a thousand lawsuits. But if the rates are punitive? Then the lobbying and legal challenges will begin immediately.
And one more skeptical thought: this feels like a solution designed for a world where AI models are trained once and then deployed. But we’re moving towards continuous learning, where models are constantly updated with new data. How does a royalty collective handle a real-time, flowing data stream? The sketch below shows what the collective would be signing up to track.
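For what it’s worth, the metering itself is the easy part. Here’s a minimal sketch, with every name and number invented for illustration, of per-window usage accounting for a continuously trained model:

```python
from collections import Counter
from typing import Iterable, Tuple

def meter_window(stream: Iterable[Tuple[str, int]]) -> Counter:
    """Tally how many tokens of each registered creator's work were
    consumed during one update cycle of a continuously trained model."""
    usage: Counter = Counter()
    for creator_id, token_count in stream:
        usage[creator_id] += token_count
    return usage

# One hypothetical update cycle's worth of ingested content:
window = [("creator_A", 1_200), ("creator_B", 300), ("creator_A", 950)]
usage = meter_window(window)

# Pro-rata split of a hypothetical per-window royalty pool:
pool = 10_000.0
total = sum(usage.values())
payouts = {creator: pool * n / total for creator, n in usage.items()}
print(payouts)  # creator_A gets ~8775.51, creator_B ~1224.49
```

Metering is the easy half; extracting honest per-creator usage records from a closed training pipeline, every update cycle, forever, is the hard half. It’s a 20th-century collective-rights model trying to solve a 21st-century data consumption problem. The intent is interesting, but the execution seems fraught with peril.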
