Security researchers have found that data exposed on GitHub, even briefly, can remain retrievable through Microsoft Copilot. As a result, users can access sensitive corporate information from repositories that have since been made private or deleted.
The Hidden Risk of Cached Data
New findings from the Israeli cybersecurity firm Lasso reveal that thousands of GitHub repositories that were once public and have since been set to private, including repositories belonging to major global corporations, remain accessible via generative AI chatbots. The issue stems from the way Microsoft’s Bing search engine indexes and caches public web content.
Ophir Dror, co-founder of Lasso, discovered the exposure when Copilot returned content from his own company’s repository in response to a prompt. Although the repository had been set to private and returned a “404 Not Found” error on GitHub, Copilot continued to serve its contents to anyone who asked the right question.
Scale of the Exposure
Lasso’s analysis of 2024 data shows how widespread the exposure is. By examining repositories that were public at any point in 2024 and were later switched to private, the researchers identified:
- Over 20,000 repositories still accessible through Copilot.
- More than 16,000 organizations impacted by the exposure.
- Potential leaks of intellectual property, access keys, and authentication tokens.
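The selection step described above (repositories that were once public and now return “404 Not Found” on GitHub) can be sketched roughly as follows. This is an illustration only, not Lasso’s actual tooling: the `RepoRecord` type, the `no_longer_public` helper, and the idea of starting from a list of previously public repository names are all assumptions. Note that a 404 cannot distinguish a private repository from a deleted one.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class RepoRecord:
    # Hypothetical record: a repository that was seen as public at some point.
    full_name: str  # for example "example-org/example-repo"


def github_status(full_name: str) -> int:
    """Return the HTTP status code GitHub gives for the repository page."""
    request = urllib.request.Request(
        f"https://github.com/{full_name}", method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def no_longer_public(
    records: Iterable[RepoRecord],
    status_of: Callable[[str], int] = github_status,
) -> List[RepoRecord]:
    """Keep only repositories that now answer 404 (private or deleted)."""
    return [r for r in records if status_of(r.full_name) == 404]
```

Passing `status_of` as a parameter keeps the filter testable without network access. Each repository it returns would still need to be checked by hand to see whether an assistant reproduces its contents.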
Lasso initially named companies such as Google, IBM, PayPal, and Tencent among those affected, and the firm stands by its findings despite pushback from some organizations. Amazon stated that it is not affected, and Lasso removed references to the company after legal consultation.
Microsoft’s Response and the “Low Severity” Classification
Lasso alerted Microsoft to these findings in November 2024. In response, Microsoft classified the issue as “low severity” and described the caching behavior as “acceptable.” Microsoft removed links to Bing’s cache from its search results in December 2024, but according to Lasso the underlying data can still be retrieved through Copilot.
Is the Fix Only Temporary?
Lasso argues that disabling the cache link is only a superficial fix: the data remains accessible through Copilot even though it can no longer be found through traditional web searches. For organizations, the risk remains high. Any key or token that was exposed, even briefly, may already be retrievable through the AI and should be treated as compromised, then revoked and rotated immediately.
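To decide which credentials need rotating, an organization first has to know which ones appeared in a repository while it was public. The sketch below is a minimal, assumption-laden starting point: it scans text for three well-known credential shapes (AWS access key IDs, classic GitHub personal access tokens, and PEM private key headers). The pattern set and the `find_candidate_secrets` name are illustrative, not exhaustive, and are not taken from Lasso’s research. A dedicated secret scanner would cover far more formats.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real scanners cover many more credential formats.
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_candidate_secrets(text: str) -> Dict[str, List[str]]:
    """Map each pattern name to the strings in `text` that match it."""
    hits: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```

Running this over every revision of a file, not just the latest one, matters here: a secret that was committed and later removed was still public for as long as the repository was.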
As of this report, Microsoft has not provided a formal response regarding the long-term remediation of this data exposure.
