Once the dust has settled after a difficult lawmaking process, commentators may succumb to the temptation of simply accepting and rubberstamping whatever result has been achieved. After all, much time and effort has been spent on developing the newly adopted rules. The legislator has spoken. It makes perfect sense to explore the full potential of the regulatory approach that has become the new law.
However, sometimes the compromise formula that emerged from the legislative machinery seems so dysfunctional, and so unlikely to yield beneficial results, that – in parallel with an analysis of how best to apply the new legislation – it is particularly important to keep an open mind and explore alternative avenues that might be better. While it is important to give the new legislation a chance, it can be even more important to leave room for critical evaluation and to be ready to switch to a different regulatory approach that has more potential to provide satisfactory solutions.
AI training under the 2019 Copyright Directive
Article 4 of the 2019 Directive on Copyright in the Digital Single Market (CDSMD) can serve as an example. It is a well-known fact that this provision – which has become the cornerstone of commercial AI training in the EU – was not part of the initial legislative proposal tabled by the European Commission. Instead, Article 4 entered the picture at the final stages of the legislative process – without a solid assessment of its impact on the development of generative AI systems (GenAI) and its potential to ensure proper remuneration for authors and rightsholders. As today’s manifestations of GenAI were not even known back in 2019, it was simply impossible to take an informed decision on whether the Article 4 approach would lead to appropriate solutions in practice.
Call for caution
This fact alone indicates an urgent need both to analyse the full potential of Article 4 and to explore alternative, potentially better solutions. In the case of GenAI, this is of particular societal importance because the technology can be expected to have an increasing and fundamental impact on human creativity. In more and more segments of the creative industries, it disrupts the market for human literary and artistic works. Future generations of GenAI “natives” (born now and not knowing a world without GenAI) are likely to base their own literary and artistic expression on templates produced by the machine. As a result, literary and artistic productions will increasingly reflect GenAI components. Our whole literary and artistic discourse – and its societal functions – will increasingly depend on GenAI products and services.
In a nutshell: it would seem irresponsible to align the regulation of this groundbreaking technology with a regulatory solution that has not been specifically developed for this scenario simply because the regulatory tool in question – Article 4 CDSMD – happens to be readily available and does not require a new, probably controversial legislative initiative.
Opt-out as a problem scenario
In the specific case of Article 4 CDSMD, there is even more reason to be particularly careful before jumping to the conclusion that the chosen regulatory path will be fine in the end. The reason for this lies in the so-called opt-out mechanism enshrined in paragraph 3 of the provision:
The exception or limitation [for commercial and other non-scientific TDM] provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.
In assessing whether this Article 4 feature is fit for purpose, two competing policy objectives must be taken into account. On the one hand, it can be assumed that the EU wants to offer its citizens high-quality and non-biased GenAI systems. On the other hand, it is imperative to ensure that authors and rightsholders are properly remunerated for the use of human training resources in AI development. When the opt-out mechanism is viewed through the prism of these competing objectives, several points of deep concern come to light:
Opt-out leads to a lose-lose scenario
The rights reservation mechanism in Article 4(3) CDSMD does not automatically set in motion a new revenue stream for authors and rightsholders. It is a very simple mechanism: by declaring an opt-out, the copyright owner neutralises the TDM permission in the first paragraph of Article 4 and restores the exclusive right to prohibit the use of human works for AI training. From the perspective of the two central societal objectives (the best AI and fair remuneration), this is a lose-lose scenario: no training resources for AI trainers (the copyright owner has said “no”) and no extra income for authors and rightsholders (Article 4 does not provide for the payment of remuneration).
Opt-out disadvantages small repertoire owners and niche content
Of course, the equation is different when the opt-out leads to a licensing deal. In that case, AI developers obtain access to training resources and copyright owners get paid. However, very large numbers of human works are necessary for building GenAI. The moment individual licensing agreements must be concluded, the issue of transaction costs enters the picture. It would come as a surprise if AI trainers managed to enter into negotiations with each and every rightsholder. Most probably, they will focus on large repertoire owners. This strategy offers the advantage of getting access to a large volume of literary and artistic works with only one licensing agreement. Owners of small and niche repertoires – small producers and small collecting societies – may find it much more difficult to attract the attention of AI trainers and conclude profitable deals. The opt-out approach thus favours big players in the creative industry. It disadvantages owners of small and niche work portfolios.
Opt-out enhances the risk of biased GenAI
The spectrum of training resources that becomes available under the opt-out approach is thus limited to work repertoires covered by a licensing agreement. Inevitably, this limited access to human training resources restricts the ability of AI trainers to develop models capable of producing fair, unbiased results – in the sense of AI output that reflects all cultures, traditions and values expressed in human artworks. AI training based on mainstream works, for instance, will lead to mainstream AI output that marginalises niche repertoires and opinions. AI training based on a specific segment of literary and artistic production will lead to AI output focusing on this specific segment and neglecting other expressions. In an EU context, this result is particularly worrisome. Considering the wide variety of languages and cultural traditions in the Union, it is surprising that the lawmaker adopts a regulatory approach that enhances the risk of small and niche repertoires remaining invisible in GenAI output – and inaccessible to future generations of GenAI natives who take a GenAI production as a starting point for their own creativity.
Opt-out further strengthens data hegemony of big tech
For the opt-out to be effective, it will often be necessary to provide copyright metadata. If, for instance, a collecting society declares an opt-out for its members, the collecting society website containing the opt-out is unlikely to also contain the works that must not be mined for AI training purposes. Hence, it will be necessary to provide additional information: which authors and rightsholders does the collecting society represent? Which works are covered? For which territory? Where can the relevant works be found on the internet? If a licensing agreement is concluded for AI training, a similar metadata stream will have to flow to the AI developer to make sure that the training is carried out in accordance with the use permission. The metadata may be enriched with descriptive components. Which genre? Which contents?
Inevitably, these metadata flows further strengthen the data hegemony which big tech companies, such as online platforms and AI developers, already have. For big tech, the opt-out is not only a burden. It has positive side effects. They receive even more data about content that can be offered to consumers. Arguably, the metadata stream makes them the perfect intermediary for bringing literary and artistic content to the attention of the audience. This dynamic, however, is likely to enhance the dependency of the creative industry on AI products and services. The opt-out, thus, is not only an empowerment and emancipation tool but also an additional risk factor.
Opt-out may fuel global warming
A final concern comes to the fore when the AI Act provisions that supplement Article 4(3) CDSMD are factored into the equation. The AI Act seeks to bypass the principle of territoriality and universalize the obligation to ensure compliance with opt-outs in the EU – regardless of where on the planet the AI system has been trained:
Providers that place general-purpose AI models on the Union market should ensure compliance with the relevant obligations in this Regulation. To that end, providers of general-purpose AI models should put in place a policy to comply with Union law on copyright and related rights, in particular to identify and comply with the reservation of rights expressed by rightsholders pursuant to Article 4(3) of Directive (EU) 2019/790. (AI Act, Recital 106 and Article 53(1)(c))
With regard to this feature of the new legislation, the AI Act itself makes no secret of the fact that a “Brussels effect” is intended:
Any provider placing a general-purpose AI model on the Union market should comply with this obligation, regardless of the jurisdiction in which the copyright-relevant acts underpinning the training of those general-purpose AI models take place. This is necessary to ensure a level playing field among providers of general-purpose AI models where no provider should be able to gain a competitive advantage in the Union market by applying lower copyright standards than those provided in the Union. (AI Act, Recital 106)
This additional facet of the EU approach raises concerns about an unattractive, deterrent legal framework that is not conducive to AI innovation in the EU. Perhaps even more importantly, however, it may have environmental consequences. What if lawmakers worldwide copy the EU model and implement the same obligation to observe opt-outs declared with regard to the relevant territory? The rights reservation approach means that no training may take place with human works that fall under an opt-out. If this principle is applied strictly, AI trainers may have to develop not just one but several GenAI models: one specific model for each territory in which the opt-out mosaic differs from the mosaics resulting from opt-outs elsewhere. As GenAI training consumes large amounts of energy and natural resources, it is self-evident that the opt-out approach – when exported to other countries and applied strictly (no training with opt-out resources, output filters not sufficient) – increases global warming instead of reducing it.
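The territorial argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that each territory's opt-outs can be represented as a set of work identifiers and that territories with identical sets could share one trained model. The territory codes, work identifiers and the function name are hypothetical and do not reflect any actual opt-out register.

```python
from collections import defaultdict


def group_territories_by_opt_out(opt_outs):
    """Group territories whose sets of opted-out works are identical.

    opt_outs maps a territory code to the set of work identifiers
    reserved there. All data passed in here is hypothetical.
    """
    groups = defaultdict(list)
    for territory, works in sorted(opt_outs.items()):
        # frozenset makes the opt-out mosaic usable as a dictionary key
        groups[frozenset(works)].append(territory)
    return dict(groups)


# Hypothetical opt-out mosaics for four territories
opt_outs = {
    "EU": {"work-a", "work-b"},
    "UK": {"work-a", "work-b"},
    "JP": {"work-c"},
    "BR": set(),
}

groups = group_territories_by_opt_out(opt_outs)

# Under strict application, each distinct mosaic requires its own model
models_needed = len(groups)
```

In this toy example, the EU and UK mosaics coincide, so three separately trained models would be needed for four territories. The more the mosaics diverge, the closer the number of models comes to the number of territories.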
Need for alternative regulatory models
In light of these concerns, there can be little doubt that – alongside an analysis of how best to implement the Article 4 opt-out approach – it is imperative to also explore alternative solutions. In the literature, proposals have already been made for statutory licenses and remuneration regimes at the AI training stage (input perspective) and the AI marketing stage (output perspective arising when fully trained AI systems are finally brought to the market). Instead of individual licensing deals, these solutions would introduce an AI levy system that requires the payment of remuneration to collecting societies which, in turn, distribute the money among authors and rightsholders. In contrast to the Article 4 approach, this alternative model leads to a win-win situation from the outset: AI trainers obtain access to diverse human resources for AI training; authors and rightsholders receive remuneration – not in the form of a one-time buy-out licensing fee, but as a continuous revenue stream administered by collecting societies. Needless to say, this approach would also resolve all other problems that have been outlined above.