Copilot Policy Changes - The Future of Open Source Licenses in the AI Era

Recently, GitHub announced an update to Copilot’s privacy policy: starting April 24, 2026, Copilot will by default use interaction data from Copilot Free, Pro, and Pro+ users (including input prompts, output suggestions, code snippets, and related context) to train and improve AI models. Unless users actively opt out in settings (by turning off the “Allow GitHub to use my data for AI model training” option), their data will be used for model training. Copilot Business and Enterprise users are not affected by this change.

This policy adjustment quickly sparked discussion in the developer community, with many concerned that their code interaction data might unknowingly become “free fuel” for Microsoft’s AI.

My View: How Should AI Training Respect Open Source Licenses?

Let me start with the conclusion: under current legal and license frameworks, AI models (like Copilot) can relatively freely crawl and learn from projects under permissive licenses like MIT and BSD, as these licenses place almost no additional restrictions on use, modification, and derivatives, requiring only that the basic copyright notice be preserved.

For Apache 2.0 licensed projects, AI should at least preserve original attribution and mention the source in generated code comments, out of respect for the license’s requirements to preserve NOTICE files and state modifications.

For the GPL family (strong copyleft licenses), the situation is more complex: if training leads to the generation of substantially derivative code, open source obligations may come into play, but enforcing this is difficult in practice and relies largely on voluntary compliance by AI providers.

In reality, most developers already skip complete attribution when reusing open source code, let alone AI models. But precisely because AI is a large-scale “learning” tool, we have all the more reason to build copyright-respecting mechanisms in at the training source. For example, through prompt engineering or output post-processing, AI could automatically attach source annotations to what it generates. This would not only reduce potential disputes but also cultivate copyright awareness throughout the ecosystem.
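As a minimal sketch of the post-processing idea above: if a provider could detect that a suggestion derives from a known project (the detection step is the genuinely hard, unsolved part), attaching attribution is trivial. The function and repository names below are hypothetical, not any real Copilot API.

```python
# Hypothetical post-processing step: prepend a provenance comment to an
# AI-generated snippet once the upstream project and license are known.
# Detection (matching output back to training sources) is assumed to have
# already happened; its result is passed in as plain metadata.

def annotate_suggestion(code: str, source_repo: str, license_id: str) -> str:
    """Return the snippet with an attribution header prepended."""
    header = (
        f"# Derived in part from {source_repo}\n"
        f"# Original license: {license_id}\n"
    )
    return header + code

snippet = annotate_suggestion(
    "def add(a, b):\n    return a + b\n",
    source_repo="github.com/example/some-project",  # hypothetical repo
    license_id="Apache-2.0",
)
```

Using an SPDX identifier (here `Apache-2.0`) keeps the annotation machine-readable, so downstream compliance tooling could scan for it.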

Most of my own projects use the BSD-2-Clause license (not BSD-3), and its high permissiveness means I welcome AI crawling and learning from them; the BSD license’s open spirit inherently encourages widespread use and innovation. For projects where I use Apache 2.0, my attitude is more cautious: AI should at least briefly mention the original source and license in comments, which is both basic respect and a way to avoid compliance risks later.

Currently, existing open source licenses (and their legal texts) are poorly prepared for the new scenario of AI training. Permissive licenses like MIT and BSD look “too friendly” in the AI era, while Apache and GPL lack any explicit terms covering machine learning.

I believe that in the near future, major open source foundations and communities will likely update their licenses systematically for AI training, for example by adding “AI training authorization” sub-clauses, requiring attribution to be preserved in model outputs, or introducing “training-time desensitization plus traceability” mechanisms.

Additional Dissatisfaction and Reflections on Copilot Policy

What makes me even more dissatisfied is this: on one hand, Microsoft uses interaction data from a large number of developers (including free users) to train Copilot and improve its models; on the other, Copilot’s free quota and feature limits remain quite stingy. This effectively turns ordinary developers into “free labor,” unknowingly or by default consent, without equivalent returns.

Whether such a business model is fair is worth developers’ deep consideration.

Microsoft has indeed “destroyed” excellent products before, from Nokia and Windows Phone to, some fear, today’s GitHub ecosystem. But history also tells us that technological change often comes with growing pains. The future isn’t set in stone, so we need not be blindly pessimistic.

What Developers Can Do

Developers can take positive action:

  1. Immediately check the Copilot data settings and opt out of data training
  2. Explicitly add “prohibited for AI training” statements in open source projects (though legal effectiveness remains to be verified)
  3. Support alternative AI coding tools that focus more on privacy and copyright
  4. Participate in open source license update discussions, pushing the community to develop AI-friendly rules
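For item 2 above, a statement might look like the following. This is a hypothetical example of wording, and as noted, its legal effectiveness is unverified; it mainly serves as a clear declaration of intent.

```text
## AI Training Notice

The author does not authorize the use of this repository's contents
as training data for machine learning models. Any such use requires
prior written permission, regardless of the terms of the main license.
```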

The essence of open source spirit is sharing and reciprocity, not one-way extraction. I hope GitHub and Microsoft, while pursuing AI progress, can listen more to developer voices and let the ecosystem truly achieve win-win outcomes.

This post is licensed under CC BY 4.0 by the author.