How PiiBlocker Works: A Technical Deep Dive
PiiBlocker masks personal data before it reaches AI chatbots. This post explains exactly how it does that: the detection pipeline, the masking logic, the encryption layer, and why everything runs locally in the browser with zero server involvement.
This is written for developers, security engineers, and anyone who wants to understand what happens between typing a prompt and clicking send.
The problem PiiBlocker solves
When you type a prompt into ChatGPT, Claude, or Gemini, the text travels to the AI provider's servers. If that text contains personal data — names, email addresses, credit card numbers, medical conditions — that data now exists on someone else's infrastructure, subject to their retention policies, training pipelines, and breach risk.
The standard advice is "don't paste personal data into chatbots." The reality is that people do it constantly, often without realizing it. A message like "Draft a professional bio for Sarah Johnson who lives at 42 Elm Street, London" contains a person name and a street address. Most users don't think of that as sensitive data. Under GDPR, it is.
PiiBlocker sits between the user and the chatbot. It intercepts the message before it leaves the browser, identifies personal data, replaces it with safe placeholders, and sends the sanitized version to the AI. When the AI responds using those placeholders, PiiBlocker swaps the real values back in so the user sees their original data seamlessly.
Architecture overview
PiiBlocker is a Chrome extension built on Manifest V3. The entire application runs in the browser: there is no backend server, no API, no telemetry endpoint, and no analytics. PiiBlocker operates no servers at all.
The extension consists of four main components:
The orchestrator coordinates the detection, masking, and unmasking pipeline. It manages the lifecycle of a message from the moment the user starts typing to when the AI's response is fully unmasked. The orchestrator is site-aware: it uses adapter modules for each supported platform, handling the specific DOM structure and event model of that site.
The detection engine analyzes text to find personal data. It combines two complementary approaches: Named Entity Recognition (NER) for contextual detection and regex pattern matching for structured data formats.
The encryption layer stores the mapping between real values and placeholders using AES-256-GCM encryption with ephemeral keys. Mappings are never written to disk in plaintext and auto-expire after 4 hours.
The site adapters handle platform-specific integration. Each adapter implements a common interface for intercepting user input, modifying the outbound message, and processing the AI's response. This abstraction allows new platforms to be added without modifying the core detection or encryption logic.
Detection pipeline
Detection runs in two phases, each targeting a different class of personal data.
Phase 1: Regex pattern matching for structured PII
Structured PII has predictable formats. Credit card numbers follow specific digit patterns. Social Security Numbers follow the fixed XXX-XX-XXXX format. API keys match vendor-specific prefixes and character sets.
PiiBlocker uses pattern matching with validation logic for each structured PII type. Each pattern includes validators that go beyond simple format matching to reduce false positives. For example, credit card detection includes Luhn checksum validation to eliminate random digit sequences, and phone number detection includes guards to prevent date strings from being misidentified.
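To illustrate the pattern-plus-validator idea, here is a minimal sketch of credit card detection. The regex, function names, and thresholds are illustrative, not PiiBlocker's actual implementation; the Luhn checksum is the standard validation for card numbers.

```typescript
// Sketch: a candidate that matches the card-number regex is only flagged
// if it also passes the Luhn checksum, which eliminates most random
// digit sequences.
const CARD_PATTERN = /\b(?:\d[ -]?){13,16}\b/;

function luhnValid(candidate: string): boolean {
  const digits = candidate.replace(/[^\d]/g, "");
  let sum = 0;
  let double = false;
  // Walk right-to-left, doubling every second digit (Luhn algorithm).
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return digits.length >= 13 && sum % 10 === 0;
}

function isLikelyCardNumber(text: string): boolean {
  const match = text.match(CARD_PATTERN);
  return match !== null && luhnValid(match[0]);
}
```

A string like "1234-5678-9012-3456" matches the regex but fails the checksum, so it is not flagged.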
These patterns cover 15+ PII types including credit cards, SSNs, API keys, passwords, bank account numbers, and UK National Insurance Numbers.
Structured PII is classified as critical and is auto-masked by default. The user sees it in the confirmation dialog with masking already applied. The user can override this, but the default is protection.
Phase 2: NER-based contextual detection for soft PII
Names, addresses, and medical conditions don't follow predictable formats. "Sarah Johnson" is a person name, but "Johnson Controls" is a company. "Baker Street" is an address, but "Baker" alone could be a surname. Context matters.
PiiBlocker uses Named Entity Recognition to classify text spans by their semantic role. The NER model identifies entities such as person names, locations, organizations, and other contextual markers. When a span is classified as a person name with sufficient confidence, it's flagged as soft PII.
Soft PII is treated differently from critical PII. Instead of auto-masking, the confirmation dialog presents each detected item with two choices: "Mask as [PERSON_A]" or "Send as is." The user decides.
This two-tier approach — auto-mask critical, prompt for soft — balances security with usability. Auto-masking everything would break legitimate use cases (asking about public figures, discussing company names). Prompting for everything would create alert fatigue.
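The two-tier policy can be sketched as a simple triage step. The type and field names here are hypothetical, chosen only to make the critical-versus-soft split concrete:

```typescript
// Sketch of the two-tier policy: critical/structured PII is masked
// automatically, soft PII is queued for the confirmation dialog.
type Severity = "critical" | "soft";

interface Detection {
  text: string;
  type: string; // e.g. "CARD", "PERSON"
  severity: Severity;
}

interface TriageResult {
  autoMasked: Detection[];  // masked before the dialog is shown
  needsPrompt: Detection[]; // user chooses "mask" or "send as is"
}

function triage(detections: Detection[]): TriageResult {
  return {
    autoMasked: detections.filter((d) => d.severity === "critical"),
    needsPrompt: detections.filter((d) => d.severity === "soft"),
  };
}
```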
False positive mitigation
Detection systems face a constant tension between catching real PII and flagging innocent text. PiiBlocker uses multiple strategies to reduce false positives, including post-detection filtering, contextual validation, format-specific guards, and domain-aware heuristics. These filters are continuously refined based on real-world usage feedback.
Getting false positive rates low enough for daily use without sacrificing detection coverage is one of the hardest engineering challenges in PII detection. It requires ongoing tuning and testing across diverse input patterns.
Masking and placeholder system
When PII is confirmed for masking (either auto-masked or user-approved), PiiBlocker replaces each item with a typed, indexed placeholder:
| Real data | Placeholder |
|---|---|
| Sarah Johnson | [PERSON_A] |
| 42 Elm Street, London | [ADDRESS_A] |
| [email protected] | [EMAIL_A] |
| 4111-1111-1111-1111 | [CARD_A] |
The placeholder format is deliberately simple and AI-friendly. Language models understand bracketed tokens and can use them naturally in responses: "Here is a professional bio for [PERSON_A], who is based at [ADDRESS_A]."
Indexes increment alphabetically (PERSON_A, PERSON_B, PERSON_C) to handle messages with multiple entities of the same type. The type prefix ensures the AI retains semantic context — it knows [PERSON_A] is a name and [ADDRESS_A] is a location, even though it never sees the actual values.
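A placeholder allocator along these lines might look as follows. This is a sketch of the behavior described above, not PiiBlocker's code; it assumes fewer than 26 entities per type and that repeated values should reuse the same placeholder:

```typescript
// Sketch: each PII type gets its own alphabetic counter, and a repeated
// value reuses its placeholder so the AI sees a consistent token.
class PlaceholderAllocator {
  private counters = new Map<string, number>();
  private assigned = new Map<string, string>(); // "type:value" -> placeholder

  allocate(type: string, value: string): string {
    const key = `${type}:${value}`;
    const existing = this.assigned.get(key);
    if (existing) return existing;
    const index = this.counters.get(type) ?? 0;
    this.counters.set(type, index + 1);
    // 0 -> A, 1 -> B, ... (assumes fewer than 26 entities per type)
    const placeholder = `[${type}_${String.fromCharCode(65 + index)}]`;
    this.assigned.set(key, placeholder);
    return placeholder;
  }
}
```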
Encryption layer
The mapping between real values and placeholders needs to persist long enough for the AI's response to be unmasked, but should not survive longer than necessary. PiiBlocker uses AES-256-GCM encryption with the following design:
Ephemeral keys are generated per session and stored in memory-only browser storage, which is cleared when the browser closes. Keys never touch persistent storage.
Encrypted mappings are stored locally. Each mapping entry contains the ciphertext, a unique initialization vector, and a creation timestamp. The plaintext value exists in memory only during the encryption/decryption operation.
Auto-expiry removes mappings after 4 hours. Even if the browser stays open, stale mappings are purged automatically.
One-click purge lets the user manually clear all mappings at any time from the extension popup.
This design ensures that even if someone gains access to local storage, they cannot read the mapped values without the session key — which exists only in volatile memory and is destroyed when the browser closes.
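The mapping store's behavior can be sketched as follows. The extension would use the browser's Web Crypto API; this sketch uses Node's `node:crypto` so it is self-contained, and the names and structure are illustrative rather than PiiBlocker's actual code:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Sketch of the encrypted mapping store: an ephemeral session key held
// only in memory, a unique IV per entry, and a creation timestamp used
// by the 4-hour expiry sweep.
const TTL_MS = 4 * 60 * 60 * 1000;

interface MappingEntry {
  iv: Buffer;
  ciphertext: Buffer;
  authTag: Buffer;
  createdAt: number;
}

const sessionKey = randomBytes(32); // ephemeral: never written to disk

function encryptMapping(realValue: string, now = Date.now()): MappingEntry {
  const iv = randomBytes(12); // unique IV per entry, as GCM requires
  const cipher = createCipheriv("aes-256-gcm", sessionKey, iv);
  const ciphertext = Buffer.concat([cipher.update(realValue, "utf8"), cipher.final()]);
  return { iv, ciphertext, authTag: cipher.getAuthTag(), createdAt: now };
}

function decryptMapping(entry: MappingEntry, now = Date.now()): string | null {
  if (now - entry.createdAt > TTL_MS) return null; // expired: treat as purged
  const decipher = createDecipheriv("aes-256-gcm", sessionKey, entry.iv);
  decipher.setAuthTag(entry.authTag);
  return Buffer.concat([decipher.update(entry.ciphertext), decipher.final()]).toString("utf8");
}
```

Because `sessionKey` lives only in process memory, an attacker with access to the stored entries has only ciphertext, IVs, and timestamps.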
Response unmasking
When the AI responds using placeholders, PiiBlocker's unmasking pipeline activates. The orchestrator monitors the AI's response for placeholder tokens. When [PERSON_A] appears in the response, PiiBlocker decrypts the corresponding mapping and replaces the placeholder with the real value in the rendered output.
This happens in real time as the AI streams its response. Each chunk of streamed text is scanned for placeholders and unmasked in place. The user sees their real data in the AI's response without any manual step.
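The core substitution step can be sketched in a few lines. The placeholder regex and lookup signature here are assumptions for illustration; a production pipeline would also handle placeholders split across streamed chunks, which this sketch omits:

```typescript
// Sketch: scan rendered text for [TYPE_X] tokens and substitute the
// decrypted real values, leaving unknown tokens untouched.
const PLACEHOLDER = /\[([A-Z]+_[A-Z])\]/g;

function unmask(
  text: string,
  lookup: (token: string) => string | undefined
): string {
  return text.replace(PLACEHOLDER, (full) => lookup(full) ?? full);
}
```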
Unmasking also handles single-page application navigation. ChatGPT, Claude, and Gemini all use client-side routing. When a user navigates between conversations, the unmasking pipeline re-processes the newly rendered content to ensure placeholders from the current session are replaced.
Right-click manual masking
Automated detection cannot catch everything. A project codename, an internal system identifier, or a client nickname might be meaningful PII in context but undetectable by pattern matching or NER.
PiiBlocker provides a right-click context menu option: select any text, right-click, and choose "Mask with PiiBlocker." The selected text is added to the masking pipeline with a generic placeholder. This gives users full control over what gets protected beyond what automation detects.
Personal PII dictionary
Users can teach PiiBlocker their own terms. The personal dictionary accepts custom entries — your name, your company's project codenames, client identifiers, anything you want consistently masked. Dictionary entries are matched before the main detection pipeline runs, ensuring they are always caught. All dictionary data is stored locally with the same encryption as other sensitive data.
Site adapter architecture
Each supported platform has a dedicated adapter that implements a common interface. The adapter handles input interception, submit interception, response monitoring, and dynamic content updates. The adapter pattern isolates platform-specific code from the core pipeline.
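A plausible shape for that common interface, with a stub implementation, is sketched below. The method and property names are hypothetical; PiiBlocker's actual interface may differ:

```typescript
// Sketch of the adapter contract the orchestrator would program against.
interface SiteAdapter {
  /** Hostnames this adapter handles. */
  readonly hosts: string[];
  /** Read the user's draft message from the platform's input element. */
  readInput(): string;
  /** Replace the outbound message with its masked version before submit. */
  writeInput(masked: string): void;
  /** Register a callback fired as response text streams in. */
  onResponseChunk(handler: (chunk: string) => void): void;
}

// A minimal in-memory stub, standing in for a real DOM-backed adapter.
class StubAdapter implements SiteAdapter {
  readonly hosts = ["chat.example.com"]; // hypothetical host
  private draft = "";
  private handlers: Array<(chunk: string) => void> = [];

  readInput(): string { return this.draft; }
  writeInput(masked: string): void { this.draft = masked; }
  onResponseChunk(handler: (chunk: string) => void): void {
    this.handlers.push(handler);
  }
  // Test helper: simulate a streamed response chunk arriving.
  emit(chunk: string): void { this.handlers.forEach((h) => h(chunk)); }
}
```

Because the orchestrator only sees `SiteAdapter`, a new platform means a new implementation of this interface and nothing else.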
This means adding support for a new chatbot platform requires implementing the adapter interface without modifying the detection engine, encryption layer, or masking logic. PiiBlocker shipped Gemini support without changing a single line of the ChatGPT or Claude adapters.
What PiiBlocker does not do
Transparency about limitations matters as much as explaining capabilities:
PiiBlocker does not guarantee 100% detection. NER models have accuracy limits. Unusual names, ambiguous text, or novel PII formats may not be caught. The right-click manual mask and personal dictionary exist specifically to cover these gaps.
PiiBlocker does not modify data in transit. It modifies the content before the platform's own code reads the input. The AI provider's application sends the already-masked text. PiiBlocker does not intercept network requests.
PiiBlocker does not work on mobile. Chrome extensions are desktop-only. Mobile users of ChatGPT, Claude, or Gemini are not protected.
PiiBlocker does not protect against screenshots or shoulder surfing. The real data is visible in the browser before masking and after unmasking. Physical security is outside PiiBlocker's scope.
PiiBlocker does not anonymize data. It pseudonymizes it. The placeholder [PERSON_A] is linked back to the real name via the encrypted mapping. True anonymization would require one-way transformation, which would make unmasking impossible.
Why local-only matters
Running entirely in the browser is a deliberate architectural decision, not a limitation.
If PiiBlocker operated a server, every piece of detected PII would need to transit through that server. That creates a new attack surface, a new data processor under GDPR, and a trust dependency on PiiBlocker's infrastructure. It would undermine the entire value proposition.
By running locally, PiiBlocker eliminates these concerns by design. There is no server to breach, no data to subpoena, no processing agreement required, and no trust dependency beyond the extension code itself. The code is installed from the Chrome Web Store and runs in the user's browser sandbox.
For organizations evaluating PiiBlocker for employee use, this architecture simplifies the compliance assessment considerably. There is no third-party data processing to evaluate because no data leaves the browser.
PiiBlocker is a free Chrome extension that masks personal data before it reaches AI chatbots. 100% local processing, no servers, no data collection. Install from Chrome Web Store →