
5 Times Sensitive Data Was Leaked Through AI Chatbots

PiiBlock Team
privacy · data breach · ChatGPT · AI privacy · data leak · security incidents

AI chatbots are useful. They are also one of the largest unmonitored channels for sensitive data leaving organizations and individuals. Every incident on this list was preventable. None of them were prevented.

These are not hypothetical risks. These are documented cases where real people had real data exposed through real AI tools. Names, source code, patient records, financial details, chat histories — all compromised because someone typed or pasted information into a chatbot and hit send.

1. Samsung: Three Leaks in Three Weeks

When: March 2023
Platform: ChatGPT
What was leaked: Proprietary semiconductor source code, internal meeting notes, chip testing sequences

In March 2023, Samsung lifted an internal ban and allowed engineers at its semiconductor division to use ChatGPT for work tasks. Within three weeks, three separate incidents were reported.

In the first, an engineer copied faulty source code from a semiconductor database and pasted it into ChatGPT to find a fix. The code related to Samsung's chip manufacturing process — one of its most closely guarded competitive assets.

In the second, another employee pasted program code for identifying defective equipment and asked ChatGPT to optimize it.

In the third, an employee recorded an internal meeting on a smartphone, transcribed it using an audio-to-text tool, and entered the full transcription into ChatGPT to generate meeting minutes. The meeting contained confidential business discussions not intended for any external party.

Because ChatGPT's free and Plus plans use conversation data for model training, all three sets of proprietary information were transmitted to and stored on OpenAI's servers. Samsung's leadership acknowledged that this data was essentially impossible to retrieve. The company initially limited prompts to 1,024 bytes, then banned ChatGPT and other generative AI tools entirely across major divisions. Samsung subsequently began building an internal AI tool to avoid reliance on external platforms.

The restrictions were only relaxed in 2025, nearly two years later, with new security protocols in place.

What it teaches: The most dangerous AI data leaks aren't caused by hackers. They're caused by well-meaning employees solving problems with the fastest tool available. Samsung's engineers weren't being careless — they were doing exactly what they were told they could do. The gap was between permission to use the tool and understanding what the tool does with data.

2. ChatGPT Bug: Chat Histories and Payment Data Exposed

When: March 2023
Platform: ChatGPT
What was leaked: Other users' conversation titles, chat content, and payment information

On March 20, 2023, ChatGPT users began reporting that they could see other users' conversation titles in their own sidebar. OpenAI took ChatGPT offline to investigate.

The root cause was a bug in an open-source library used for caching (OpenAI identified it as the redis-py Redis client). The bug caused a small percentage of users' data to be served to other active users during a specific time window. OpenAI's investigation found that during the nine hours before the service was taken down, roughly 1.2% of ChatGPT Plus subscribers may have had payment-related information exposed — including first and last names, email addresses, payment addresses, credit card types, and the last four digits of card numbers.

Beyond payment data, the bug also exposed conversation titles and, in some cases, the first message of other users' conversations. OpenAI patched the vulnerability, disclosed the details publicly, and notified affected users.

Separately, over 100,000 ChatGPT account credentials were later found on dark web marketplaces in a June 2023 report by Group-IB. This was attributed to info-stealing malware on users' devices rather than an OpenAI breach.

What it teaches: Even the most well-resourced AI companies have bugs. The ChatGPT incident demonstrated that any data you send to a cloud service is only as secure as that service's entire software stack — including third-party libraries. Your personal data doesn't have to be targeted to be exposed. A caching bug in an unrelated component was enough.
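The actual OpenAI bug involved a connection-reuse race in the Redis client, which is hard to reproduce in a few lines, but the broader failure class is simple: cached or pooled state that is not scoped to a single user. A minimal illustrative sketch (all names here are invented, not OpenAI's code):

```python
# Sketch of the failure class: a cache keyed only by request path,
# with no per-user component, serves one user's cached response
# to a different user asking for the "same" resource.

cache: dict = {}

def handle_request(user_id: str, path: str) -> str:
    # BUG: the cache key ignores which user is asking.
    key = path
    if key not in cache:
        cache[key] = f"billing data for {user_id}"  # stand-in for a real DB lookup
    return cache[key]

def handle_request_fixed(user_id: str, path: str) -> str:
    # Fix: scope the cache key to the requesting user.
    key = (user_id, path)
    if key not in cache:
        cache[key] = f"billing data for {user_id}"
    return cache[key]

print(handle_request("alice", "/account/billing"))  # alice's data, now cached
print(handle_request("bob", "/account/billing"))    # bob receives alice's data
print(handle_request_fixed("bob", "/account/billing"))
```

The buggy version returns "billing data for alice" to bob, which is structurally what affected ChatGPT users experienced: correct code everywhere except one shared component that lost track of whose data it was holding.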

3. Chrome Extension Harvests 7 Million Users' AI Conversations

When: December 2025
Platform: ChatGPT, Claude, DeepSeek (via browser extension)
What was leaked: Complete AI chat histories, browsing data, authentication tokens

In December 2025, security researchers at Koi Security discovered that Urban VPN Proxy, a Chrome extension with over 6 million installs on Chrome and 1.3 million on Microsoft Edge, was silently capturing users' AI chatbot conversations and transmitting them to a data broker.

The extension had a 4.7-star rating and carried Google's "Featured" badge — a designation meant to indicate higher standards for user experience and design. Its Chrome Web Store listing described it as a privacy protection tool.

The extension was originally legitimate. In July 2025, a new version containing data harvesting code was shipped. Because Chrome extensions auto-update silently, users who installed the earlier, benign version were upgraded to the harvesting version without notice or consent.

Researchers also found that seven other extensions from the same publisher contained identical harvesting code, bringing the total number of affected users to over 7 million.

The captured data included complete chatbot conversation histories, browsing data, internal corporate URLs, and authentication tokens. This data was stored in a vector database and made available via API to paying customers of a data brokerage service.

What it teaches: The threat doesn't always come from the AI platform itself. Browser extensions sit between you and every website you visit, including AI chatbots. A malicious or compromised extension can capture everything you type — regardless of what privacy settings you've configured on the AI platform. Google's "Featured" badge and high ratings provided false assurance.

4. Patient Data Found in Commercial Database

When: March 2026
Platform: Multiple AI chatbots (via browser extension capture)
What was leaked: Patient names, dates of birth, medical record numbers, diagnoses, HIV lab results

In March 2026, security researcher Thomas Dryburgh published findings that healthcare workers had been pasting real patient data into AI chatbots — and that this data had been captured by browser extensions and was now sitting in a searchable commercial database.

The database, operated by a data broker, contained verbatim AI chatbot conversations captured from users whose browsers had data-harvesting extensions installed. Paying customers of the service could search and retrieve these conversations via API.

Among the findings: conversations containing patient names, dates of birth, medical record numbers, diagnosis codes, and HIV lab results. The researcher called this the "most damning finding": patient data entrusted to healthcare workers had become a commercial product.

The database also contained conversations from undocumented immigrants asking chatbots about their legal status, conversations about suicide, domestic violence narratives, and children's conversations. The data broker companies involved claimed their data handling was lawful and anonymized, but analysis showed that re-identification was feasible by connecting a few data points.

What it teaches: The combination of AI chatbots and data-harvesting browser extensions creates a data exposure pipeline that neither the AI platforms nor the users anticipated. Healthcare data is among the most sensitive PII that exists. Under HIPAA, GDPR, and medical ethics standards, this data should never reach a third-party server in identifiable form — yet it did, was captured, stored, and sold.

5. Fake AI Chrome Extensions Exfiltrate 900,000 Users' Conversations

When: January 2026
Platform: ChatGPT, DeepSeek
What was leaked: Complete chat histories, browsing data, corporate URLs, authentication tokens

In January 2026, security researchers identified two Chrome extensions that were actively exfiltrating users' AI chatbot conversations to attacker-controlled servers. The extensions had names designed to appear in searches for legitimate AI tools, with a combined install count exceeding 900,000 users.

Despite their names suggesting legitimate AI functionality, both extensions were designed to capture and transmit users' chatbot conversations. They abused browser permissions to access complete conversation histories, browsing data, internal corporate URLs, and authentication tokens from the sites users visited.

For organizations, the risk extended beyond personal data. Internal corporate URLs and authentication tokens captured by the extensions could provide attackers with access to company systems, internal tools, and sensitive business data.

What it teaches: The attack vector is social engineering at the extension level. Attackers name their extensions after popular AI tools to appear in searches by people looking for AI productivity tools. The Chrome Web Store's review process did not catch the malicious behavior before hundreds of thousands of users were affected. For anyone installing Chrome extensions related to AI, the lesson is clear: fewer extensions means less exposure, and any extension with broad permissions is a potential data exfiltration channel.
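One practical check is to look at what an extension's manifest actually requests before installing it. The sketch below flags host patterns that grant an extension access to every site you visit; the permission patterns are from Chrome's Manifest V3 format, while the example manifest itself is invented for illustration:

```python
import json

# Host patterns in a Chrome extension manifest that grant access
# to every site the user visits (Manifest V3 syntax).
BROAD_HOSTS = {"<all_urls>", "*://*/*", "http://*/*", "https://*/*"}

def risky_host_access(manifest: dict) -> list:
    """Return any requested host patterns that cover all websites."""
    requested = manifest.get("host_permissions", []) + manifest.get("permissions", [])
    return [p for p in requested if p in BROAD_HOSTS]

# Invented example manifest, shaped like what a harvesting extension requests.
manifest = json.loads("""{
  "name": "Example VPN Helper",
  "manifest_version": 3,
  "host_permissions": ["<all_urls>"],
  "permissions": ["storage", "webRequest"]
}""")

print(risky_host_access(manifest))  # ['<all_urls>']
```

An extension with `<all_urls>` access can read and modify every page you load, including your AI chatbot sessions. That permission is sometimes legitimate (a VPN or ad blocker needs it), which is exactly why it makes such effective cover for exfiltration.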

The Pattern

Five incidents. Different platforms, different attack vectors, different types of data. But the same underlying pattern: data was sent to a place it shouldn't have been, and there was no mechanism to stop it.

Samsung's engineers sent source code to OpenAI's servers. ChatGPT's bug served data to the wrong users. Browser extensions silently captured conversations and sold them. Healthcare workers' patient data became a commercial product. Fake extensions harvested nearly a million users' conversations.

In every case, the data was gone the moment it left the user's device. No setting, no privacy toggle, and no terms of service change could retrieve it.

What Actually Prevents This

There are two approaches to AI data privacy: policy-based and architectural.

Policy-based protection means trusting the AI platform to handle your data responsibly and trusting yourself to never paste anything sensitive. This approach fails whenever any one of those trust assumptions breaks — as every incident on this list demonstrates.

Architectural protection means ensuring sensitive data never leaves your device in the first place. If the data isn't transmitted, it can't be stored, trained on, leaked, harvested, or sold. PiiBlocker takes this approach — it detects and masks personal data before your message is sent to the AI. Everything runs locally in your browser with no servers and no data collection.

Every incident on this list happened because someone's real data reached a server it shouldn't have. The only way to guarantee that doesn't happen is to make sure it never leaves.


PiiBlocker is a free Chrome extension that masks personal data before it reaches AI chatbots. 100% local processing, no servers, no data collection. Install from Chrome Web Store →