Key Takeaways:
- Microsoft’s Copilot AI has exposed over 20,000 private GitHub repositories.
- These repos belong to more than 16,000 organizations, including Google, Intel, and Microsoft itself.
- Many of these repositories were made private after developers discovered they contained sensitive data like passwords.
- An AI security firm called Lasso found that Copilot can still surface the contents of these repositories, even after they were made private.
- The issue was discovered in late 2024 and could put companies at risk of cyberattacks.
If you’ve ever used GitHub, you know it’s a place where developers store and share code for their projects. Sometimes, they make their repositories public so others can see and learn from their work. But when they realize they’ve accidentally shared sensitive information, like passwords or secret keys, they quickly make the repositories private.
But here’s the problem: Microsoft’s Copilot AI, a tool designed to help developers by suggesting code, has been keeping copies of these private repositories. Even months after they were made private, Copilot can still show their contents. This means that anyone using Copilot could potentially access sensitive data from companies like Google, Intel, Huawei, PayPal, IBM, Tencent, and even Microsoft itself.
How Did This Happen?
When developers make a GitHub repository public, its contents can be indexed and cached, and Copilot can draw on that data. But when they later realize they’ve shared something they shouldn’t have, they make the repository private. The problem is that Copilot doesn’t forget what it has already seen: Lasso’s researchers traced the issue to cached copies of once-public pages, which Copilot can still retrieve even after the repository is no longer public.
This means that if a developer accidentally shared a password or a secret key in their code before making it private, Copilot might still suggest that code to other users. This could give hackers access to sensitive information, putting companies at risk of cyberattacks.
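To picture how a secret ends up exposed in the first place, here’s a minimal, hypothetical sketch in Python. The key value and environment variable name are made up for illustration; the point is simply that a credential typed directly into source code travels with the repository, while one read from the environment never appears in the code at all.

```python
import os

# Risky pattern: a credential hardcoded in source. If this file is ever pushed
# to a public repository, the key should be treated as compromised, even if
# the repository is made private again later.
API_KEY = "sk_live_EXAMPLE_DO_NOT_USE"  # illustrative placeholder, not a real key

# Safer pattern: read the credential from the environment at runtime, so the
# value never appears in the repository.
api_key = os.environ.get("PAYMENT_API_KEY")  # hypothetical variable name
if api_key is None:
    raise RuntimeError("Set PAYMENT_API_KEY before running this script.")
```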
What Did Researchers Find?
An AI security firm called Lasso discovered this problem in the second half of 2024. They wanted to know how big the issue really was, so they started investigating. What they found was shocking: over 20,000 private GitHub repositories, belonging to more than 16,000 organizations, were still accessible through Copilot.
These repositories were originally public, but many were made private after developers realized they had shared sensitive data. However, Copilot had already stored the information, and it was still available for anyone using the AI to see.
Why Is This a Big Deal?
The companies affected by this include some of the biggest names in tech, like Google, Intel, and Microsoft. Even smaller organizations are at risk because their private code is being exposed. If hackers get access to this information, they could use it to break into systems, steal data, or take control of accounts.
This is especially embarrassing for Microsoft, whose own private repositories were exposed by its own AI tool. It’s like the company’s left hand didn’t know what its right hand was doing.
What Is Microsoft Doing About It?
Microsoft had not publicly commented on the issue at the time of writing, but experts say the company needs to act quickly. They should find a way to make sure Copilot doesn’t store or share private data. One solution could be to regularly update Copilot’s training data and remove any private repositories that were previously public.
In the meantime, developers are being advised to double-check their code before making repositories public. If they’ve already shared sensitive information, they should change their passwords and secret keys immediately.
What Does This Mean for the Future of AI?
This incident raises important questions about how AI systems like Copilot handle private data. While AI can be incredibly useful for developers, it also has the potential to accidentally expose sensitive information.
As AI becomes more common in tech, companies need to find ways to balance its usefulness with privacy and security. This might mean developing better systems for removing private data or giving users more control over what AI tools can access.
How Can Developers Protect Themselves?
If you’re a developer who uses GitHub, here are a few things you can do to protect yourself:
- Be careful what you share publicly. Always check your code for sensitive information before making a repository public.
- Act quickly if you’ve shared private data. If you realize you’ve accidentally shared something sensitive, make the repository private and change any affected passwords or keys immediately.
- Use tools to scan for sensitive data. There are tools available that can automatically check your code for things like passwords or secret keys before you share it (a simple example follows this list).
- Stay updated on privacy issues. Keep an eye on any news about AI tools like Copilot and how they handle private data.
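To make the "scan for sensitive data" tip above concrete, here is a rough sketch of what such a check can look like, assuming a small Python script run over a project folder. The patterns are illustrative and far from complete; purpose-built scanners such as gitleaks or trufflehog are the better choice for real projects.

```python
"""Minimal sketch of a pre-publication secret scan (illustrative only)."""
import re
import sys
from pathlib import Path

# A few common-looking patterns; real scanners ship far larger rule sets.
PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic credential assignment": re.compile(
        r"(?i)(api[_-]?key|secret|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
    "private key block": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan(root: str) -> int:
    """Walk a directory tree and report lines that look like secrets."""
    hits = 0
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in PATTERNS.items():
                if pattern.search(line):
                    print(f"{path}:{lineno}: possible {name}")
                    hits += 1
    return hits

if __name__ == "__main__":
    found = scan(sys.argv[1] if len(sys.argv) > 1 else ".")
    sys.exit(1 if found else 0)
```

Run over a project directory before pushing (for example, `python scan_secrets.py .`), it prints any suspicious lines and exits with a non-zero status, which makes it easy to wire into a pre-commit or CI check.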
A Growing Problem in Cybersecurity
This incident is just one example of how AI can accidentally put companies at risk. As AI becomes more powerful and widely used, these kinds of issues might become more common.
For now, the best thing developers and companies can do is to be cautious and take steps to protect themselves. If you’re using AI tools like Copilot, make sure you understand how they work and what data they have access to. Always double-check your code before sharing it, and be prepared to act quickly if something goes wrong.
In the end, this situation is a reminder that while AI can be a powerful tool, it’s not perfect. It’s up to all of us to use it responsibly and keep our data safe.