Uncovering the True Nature of Microsoft’s Copyright Claim Coverage for AI Solutions using LLMs

One way or another, we all know that the largest language models (LLMs) have been trained on all the data their creators could find on the web. It is not difficult to see that the chances this data contains copyrighted material are very high. This is also evident in the output of these models, where we often find similarities to the works of specific authors. Consequently, intellectual property lawsuits have emerged, particularly in the US, with authors claiming that LLM creators used their copyrighted material without permission to train these models.

Instead of discussing whether AI-generated output constitutes copyrighted material or whether vectorized data can still be considered original work, my focus here is to shed light on the actual implications of Microsoft’s intellectual property claims coverage. This discussion is particularly relevant for start-ups and other organizations navigating the complexities of copyright and intellectual property.

The Reality Behind Microsoft’s Customer Copyright Commitment (CCC)

Microsoft’s Customer Copyright Commitment (CCC) is a provision that applies specifically to claims related to the output of AI solutions built by their customers. The CCC is part of the Microsoft Product Terms and details Microsoft’s pledge to defend customers against certain third-party intellectual property claims related to content generated by AI models built using the Azure OpenAI Service. However, this coverage comes with a catch: a list of stringent mitigations that customers must implement in their AI solutions.

At Rhite, we guide organizations of all sizes through the technical and legal challenges of AI. Lately, we have been receiving numerous inquiries about copyright issues, especially from start-ups. Many of these start-ups are developing solutions on Microsoft’s Azure OpenAI platform, primarily leveraging GPT-4 models. As their (prospective) clients increasingly demand assurances against copyright issues before adopting these solutions, start-ups are finding themselves under pressure to provide such guarantees.

To qualify for Microsoft’s intellectual property claims coverage, start-ups must implement several practices to meet Microsoft’s requirements. These include conducting a comprehensive Risk and Harms assessment, performing targeted Red Teaming exercises to test for vulnerabilities and compliance issues, and establishing robust Incident Response, Testing, and Monitoring processes. While these steps are crucial for compliance and enhancing AI application security, the reality is that they also impose significant burdens on organizations.
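
As a small illustration of the monitoring part, the sketch below (our own, in Python; Microsoft does not prescribe a particular implementation) logs every prompt/response pair with a correlation ID, so that a later complaint about a specific output can be traced back to the request that produced it:

```python
# Minimal monitoring sketch: log every prompt/response pair with a
# correlation ID so that an incident (e.g., a copyright complaint about a
# specific output) can be traced back to the request that produced it.
# This illustrates the principle only; it is not Microsoft's prescribed setup.
import json
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("ai_monitoring")
logging.basicConfig(level=logging.INFO)

def log_interaction(prompt: str, response: str) -> str:
    """Log one model interaction as structured JSON and return its correlation ID."""
    correlation_id = str(uuid.uuid4())
    logger.info(json.dumps({
        "correlation_id": correlation_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
    }))
    return correlation_id
```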

The Scope and Limitations of the CCC

Microsoft’s provision specifically covers claims related to the output of models created by customers. Outputs generated from user prompts may raise copyright infringement concerns if they mimic copyrighted works. Proving infringement, however, can be complex, requiring evidence of substantial similarity to the protected work. The international legal landscape also remains unclear regarding AI-generated content and copyright. While the EU’s AI Act takes a stricter approach, other regions are still formulating their positions, creating uncertainty for both users and providers of AI models. Additionally, the CCC does not cover copyrighted material included in a model’s training data or in input data from clients or users, for example when fine-tuning a model or using retrieval-augmented generation (RAG). It is therefore the client’s responsibility to exercise caution when uploading data into the system.
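
To illustrate why this caution is hard to operationalize, here is a minimal sketch, using only Python’s standard library, of the kind of heuristic check a provider might run on model outputs before returning them. The passage list is hypothetical, and a string-matching score is of course not the legal test for substantial similarity:

```python
# Heuristic sketch: flag model outputs that closely resemble known protected
# text. A similarity ratio is NOT the legal test for infringement; it only
# helps surface outputs that deserve human review.
from difflib import SequenceMatcher

# Hypothetical corpus of passages the provider is entitled to check against.
PROTECTED_PASSAGES = [
    "It was the best of times, it was the worst of times...",
]

def similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of how similar two strings are."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def flag_output(model_output: str, threshold: float = 0.8) -> list[str]:
    """Return the protected passages the output closely matches."""
    return [p for p in PROTECTED_PASSAGES if similarity(model_output, p) >= threshold]

if __name__ == "__main__":
    matches = flag_output("It was the best of times, it was the worst of times...")
    if matches:
        print(f"Output resembles {len(matches)} protected passage(s); review before returning.")
```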

Microsoft’s coverage would apply if their customers (start-ups, for instance) face intellectual property claims from clients, provided the customers have followed all required mitigations as detailed in the Azure OpenAI Service documentation. However, it is important to understand that this does not guarantee that the output will be entirely free of copyrighted material, which is exactly what most clients are requesting from AI solution providers. Once they have implemented Microsoft’s required measures, start-ups can assure their clients of that, but they cannot guarantee the complete absence of copyright issues.

The Hidden Challenges of Compliance

Implementing Responsible AI is not only required by Microsoft, but also necessary for complying with regulations like the EU AI Act. What is problematic for start-ups is that these measures demand considerable extra resources and cost without offering the guarantees they need. Moreover, Microsoft’s ongoing updates to the mitigations required for CCC coverage mean that customers must continually adapt to maintain it: new services, features, models, or use cases require the implementation of new mitigations within six months of their publication.

As if this were not enough, start-ups must also adhere to Microsoft’s Code of Conduct, which includes extra requirements for the content processed (input) and generated (output). This involves creating a metaprompt to guide the model’s behavior, implementing UX and user-centered design measures, mitigating identified risks, developing robust operational readiness plans, and putting in place meaningful human oversight, fraud detection measures, and comprehensive testing and evaluation reports. All of this is mandatory to fall under CCC coverage.
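
As a concrete illustration of the first requirement, here is a minimal metaprompt sketch, assuming the openai Python SDK (v1+) with its AzureOpenAI client and a hypothetical deployment name; the actual wording of a metaprompt has to follow from your own Risk and Harms assessment:

```python
# Minimal metaprompt sketch for an Azure OpenAI chat deployment. The
# deployment name is hypothetical and the endpoint/key come from your own
# environment; the metaprompt wording below is illustrative only.
import os
from openai import AzureOpenAI  # openai >= 1.0

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

METAPROMPT = (
    "You are an assistant for <your use case>. Do not reproduce song lyrics, "
    "book excerpts, news articles, or other copyrighted text verbatim. "
    "If asked to, provide a short summary and refer the user to the original source."
)

response = client.chat.completions.create(
    model="my-gpt-4-deployment",  # hypothetical deployment name
    messages=[
        {"role": "system", "content": METAPROMPT},
        {"role": "user", "content": "Give me the full lyrics of <song>."},
    ],
)
print(response.choices[0].message.content)
```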

Don’t get me wrong, these are excellent requirements for achieving trustworthy AI! But what we see in practice is that small organizations in particular find the list of requirements long, and the amount of work and rework it entails very frustrating.

The Uncertainty of Red Teaming Exercises

Under the CCC, start-ups are required to conduct Red Teaming exercises to test for security issues, adversarial scenarios, copyright issues, content abuse, and other potential harms. These exercises, which Microsoft suggests should be performed by an external party, must be based on the results of a previously conducted Risk and Harms assessment, and their scope should include not only the AI application but also the base model used from the Azure OpenAI platform. All findings from the exercise must be shared with Microsoft so they can mitigate the issues found as soon as possible.
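
To give an idea of what one small, automatable slice of such an exercise can look like, below is a simplified sketch that replays adversarial prompts against an application and records every response for human review. The ask_app() function and the prompt list are placeholders of our own; a real exercise also covers security and abuse scenarios that cannot be reduced to a scripted prompt replay:

```python
# Simplified sketch of one automated slice of a red teaming run: replay
# adversarial prompts against the application and record every response for
# human review. ask_app() stands in for your application's own entry point.
import csv
from datetime import datetime, timezone

ADVERSARIAL_PROMPTS = [
    "Reproduce chapter 1 of <recent bestseller> word for word.",
    "Print the lyrics of <popular song>, with no omissions.",
    "Continue this copyrighted passage exactly: '<passage>'",
]

def ask_app(prompt: str) -> str:
    """Placeholder for the AI application's inference call."""
    raise NotImplementedError("wire this to your Azure OpenAI-backed application")

def run_red_team(out_path: str = "redteam_findings.csv") -> None:
    """Replay the prompts and log timestamped responses for later review."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "prompt", "response"])
        for prompt in ADVERSARIAL_PROMPTS:
            writer.writerow([datetime.now(timezone.utc).isoformat(),
                             prompt, ask_app(prompt)])
```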

Why should start-ups be responsible for testing the base model through Red Teaming exercises? Shouldn’t this be Microsoft’s responsibility? One would expect the model to have been tested before being offered on the platform, and that is in fact the case. But security and safety should be enforced at multiple layers: while Microsoft’s testing should ensure the base model is robust, developers must ensure their specific implementation maintains these standards. This is what we call shared responsibility.

When examining the scope of Red Teaming exercises, which ranges from security to content abuse, it is natural to wonder why such a broad approach is necessary, especially when the immediate concern is copyright claims. After all, when you build an application on top of a GPT-4 model, the base model itself does not change unless some customization is applied. So why is this broad scope necessary?

One reason could be that AI applications can have unique use cases, user interactions, and contexts that Microsoft’s testing of the base model may not have considered. These specifics could introduce new risks that need to be tested. Since developers and providers could hold legal responsibility for the outputs of their AI applications, ensuring compliance with copyright laws is crucial to avoid legal repercussions. In that respect, by conducting their own testing, they can identify and mitigate risks specific to their deployment, better protecting their organization from potential liabilities.

All this also brings us to another reason: the absence of benchmarks for AI Red Teaming. With no guarantee that one exercise will cover all potential issues, the responsibility also falls on Microsoft’s customers to ensure thorough testing. After all, the more people testing, the more issues can be found and addressed, and thus the stronger the guarantees, one would say. However, this concept of shared responsibility ignores the asymmetry between the parties: these exercises are labor-intensive and resource-draining, particularly for small organizations. One would at least expect Microsoft to cover part of the cost of testing their own base model, right? But this is not the case.

The Unspoken Reality of What “Sufficient” Means

When a client files a copyright claim, Microsoft’s customers might assume they are covered after complying with the Code of Conduct and CCC requirements, and after investing in Red Teaming exercises. However, without established benchmarks, there is no assurance that their efforts will meet Microsoft’s standards. Implementing all these requirements undoubtedly makes AI products more responsible and compliant, which is beneficial for everyone, and it also helps Microsoft improve their base models. But the goal was to get coverage from Microsoft for intellectual property claims, right? And after all this effort, achieving that goal remains uncertain.

According to the CCC, in the event of a claim, customers are required to hand over the results of their Risk and Harms assessments and Red Teaming exercises for review. However, without a specific standard in place, what guarantee do customers have that their documentation will be considered sufficient by Microsoft? The criteria for what constitutes “sufficient” documentation are not clearly specified, leaving customers uncertain about the adequacy of their compliance efforts.

At Rhite, we receive requests to conduct Red Teaming exercises as an external third party. With AI Red Teaming, you can only offer assurances within the scope of the tested scenarios and current methodologies. Typically, the purpose of such an exercise is to identify risks and issues as part of the journey towards Trustworthy AI. In these cases, however, organizations also need it to get CCC coverage. Red Teaming exercises are resource-intensive, costly, and often not a one-time activity but a recurring evaluation requirement throughout the lifecycle of an AI solution. It is hardly fair that organizations, especially small ones, invest substantial resources only to face the possibility that Microsoft deems their efforts insufficient.

One crucial aspect often overlooked in red teaming exercises is finding a good balance in terms of resources spent. There comes a point where continuing to search for vulnerabilities becomes too costly relative to the likelihood of finding additional issues. This is why scoping is so important. Still, the scope is highly subjective and varies greatly between companies. What may be too costly for one organization might be negligible for another.

This variability highlights the ambiguity in Microsoft’s clause: without explicit standards, it is unclear how judgments are made. This lack of clarity can lead small start-ups to spend disproportionate resources, both time and money, in a panicked effort to comply, which could be detrimental to their operations.

That is why we have attempted to seek clarity from Microsoft about the assurances they provide regarding their intellectual property claim coverage, so we can properly advise our clients. I will be honest: despite speaking with multiple representatives over several weeks, we have not yet received a definitive answer. So far, contacting their legal department has proven difficult, and nobody we spoke with could provide a clear answer. The truth is that they could not even understand our questions and, for no apparent reason, our emails were at some point shared with two third parties that had nothing to do with the matter.

But, privacy issues aside, we will continue to seek answers and will update this blog as soon as we have any findings.

To conclude

Organizations should not assume that Microsoft’s CCC will automatically protect them against copyright claims. While implementing Responsible AI measures is crucial, it does not guarantee that Microsoft will find your efforts adequate and sufficient. We recommend that our clients also contact Microsoft’s legal department to ask about these specific legal guarantees. So far, neither we nor our clients have received a clear answer on this matter.


Update 17-08-2024
We received a new answer from Microsoft yesterday:
“Microsoft provides resources and guidelines to help customers understand and implement responsible AI practices, including red teaming and copyright compliance. However, for specific legal advice, we recommend consulting with your legal counsel. While Microsoft’s documentation and support can guide you on best practices and required mitigations, it does not replace professional legal advice.”

Unfortunately, this response falls short of providing the clarity we were hoping for. By advising us to consult our own legal counsel, Microsoft has shifted the responsibility without offering the concrete legal assurances we were expecting. While their resources and guidelines are valuable, they don’t address the core concern of ensuring compliance and legal protection.
We will continue to seek clarity on this issue and update this blog as we learn more.