Self-hosted AI is a great start. It's not enough.

Running AI in your own cloud solves the hosting question. It does not solve the data question. Sensitive information still flows into the model, metadata still leaks, and GDPR still applies. Sanitica is the missing layer.

7 misconceptions about AI and data privacy

These beliefs are widespread. Every one of them puts your organization at risk.

THE MYTH

If the AI runs on our servers, personal data never leaves our control, so there's no privacy risk.

THE REALITY

GDPR requires data minimization, meaning you may not process more personal data than strictly necessary, regardless of where it's hosted. If your RAG pipeline ingests social security numbers, salary figures, or health data that the model doesn't need, you're violating GDPR Article 5(1)(c) even on your own hardware.

Sanitica's pseudonymize mode solves this by replacing names with consistent aliases (e.g. “Individual-A7”) so your AI keeps full context without processing real identities. Data minimization, satisfied.
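A minimal sketch of consistent pseudonymization follows. It assumes the names have already been detected, since entity detection is out of scope here. The `Pseudonymizer` class is a hypothetical illustration and not Sanitica's implementation. It also uses simple sequential aliases instead of the “Individual-A7” style. The core idea is that each distinct name gets exactly one alias, and every later occurrence reuses it. That reuse is what keeps context intact.

```python
import re


class Pseudonymizer:
    """Hypothetical sketch: replaces each real name with a stable alias."""

    def __init__(self, prefix="Individual"):
        self.prefix = prefix
        # Real name -> alias. This table can reverse the aliases,
        # so it must be stored separately from the pseudonymized output.
        self.aliases = {}

    def alias_for(self, name):
        # Reuse the existing alias so one person always gets the same label.
        if name not in self.aliases:
            self.aliases[name] = f"{self.prefix}-{len(self.aliases) + 1}"
        return self.aliases[name]

    def pseudonymize(self, text, names):
        # Handle longer names first so "Jón Jónsson" is replaced before "Jón".
        for name in sorted(names, key=len, reverse=True):
            alias = self.alias_for(name)
            pattern = r"\b" + re.escape(name) + r"\b"
            # A function replacement avoids any backslash handling in the alias.
            text = re.sub(pattern, lambda _match: alias, text)
        return text


p = Pseudonymizer()
print(p.pseudonymize(
    "Jón Jónsson approved the offer. Anna Sigurðardóttir asked Jón Jónsson to sign it.",
    ["Jón Jónsson", "Anna Sigurðardóttir"],
))
```

The mapping table is the sensitive part of this design. GDPR Article 4(5) only treats data as pseudonymized when the additional information needed to re-identify people is kept separately and protected.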

☁️🔒
YOUR PRIVATE CLOUD
SSN: 451090-2389 • Salary: €85,000 • Home address • Medical records
⚠ Locked doors, but sensitive data that shouldn't be there
THE MYTH

Our company has strict IT policies. Our employees would never use unauthorized AI tools.

THE REALITY

Research shows that over 70% of employees use external AI tools without approval. It's human nature. People choose the fastest, most convenient tool available. Even organizations with their own AI infrastructure have employees who open ChatGPT in their browser because it's quicker. Policies alone cannot solve this.

Sanitica's full clean mode on shared folders ensures that even when employees copy-paste documents into external AI, the sensitive data is already gone.

🏢
SANCTIONED PATH
Internal AI system
⚠ PII still enters the model
THE SHORTCUT
ChatGPT in the browser
⚠ Data leaves the organization
Both paths leak data without Sanitica
THE MYTH

We redact sensitive text with black boxes before sharing documents. That's enough.

THE REALITY

Most AI tools don't read the pixels on your screen. They read the text stored in the document's underlying data. A black rectangle in a PDF hides text visually, but the text remains fully intact in the layer beneath. Any AI tool, text extractor, or even a simple copy-paste can retrieve the “redacted” information.

All three Sanitica modes operate at the binary level. Whether removing, pseudonymizing, or stripping metadata, Sanitica destroys the original data in the file itself, not just on screen. No hidden text layer, no residual data, no way back.

WHAT YOU SEE
Name: ████████
SSN: ████████
Salary: ████████
Department: Engineering
WHAT AI READS
Name: Jón Jónsson
SSN: 451090-2389
Salary: €85,000
Department: Engineering
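Here is a deliberately simplified sketch of why a black box removes nothing. It uses a hand-written, uncompressed PDF content stream. Real files are usually compressed and use more text operators. In this stream, a line of text is drawn first and a filled black rectangle is then painted over it. The `extract_text` helper is a hypothetical, naive extractor that only understands the `Tj` operator. Real extractors are far more thorough, which only makes the problem worse.

```python
import re

# Simplified, uncompressed PDF content stream (illustrative only).
# BT/ET wrap a text object, Tj shows a string,
# "0 0 0 rg" sets a black fill, and "x y w h re f" fills a rectangle.
CONTENT_STREAM = """
BT /F1 12 Tf 72 700 Td (SSN: 451090-2389) Tj ET
0 0 0 rg
70 696 140 16 re f
"""


def extract_text(stream):
    """Naive extractor: collect every literal string passed to Tj."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)


def draws_filled_rectangle(stream):
    """True if the stream paints a filled rectangle (the 'black box')."""
    return re.search(r"\bre\s+f\b", stream) is not None


print(draws_filled_rectangle(CONTENT_STREAM))  # True: the box is on the page
print(extract_text(CONTENT_STREAM))            # ['SSN: 451090-2389']: the text is still there
```

The rectangle is simply another drawing instruction placed after the text. Deleting or rewriting the string itself is the only way to remove it.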
THE MYTH

We're a small company. GDPR and AI regulations don't really apply to us.

THE REALITY

GDPR applies to every organization that processes personal data, regardless of size. The NIS2 directive also reaches well beyond large enterprises. It covers medium-sized organizations in energy, transport, water supply, manufacturing, and other sectors, and it covers some entities regardless of size. A 20-person company faces the same core GDPR obligations as a multinational.

Sanitica's three modes scale from a 10-person company to a multinational. Full Clean for external sharing, Pseudonymize for internal AI, Metadata Only for client documents.

🏢 Large enterprises, “obviously regulated”
🏠 SMEs in energy, transport, water, manufacturing, healthcare, finance...
All subject to GDPR • NIS2 • AI Act
THE MYTH

Our team reviews documents before they go to AI systems. Manual oversight is sufficient.

THE REALITY

Manual review does not scale. Organizations process thousands of documents per month. Each document can contain hidden metadata, tracked changes, embedded comments, and dozens of PII data points. One overlooked meeting note with a client name in the metadata is enough for a breach. Human errors are inevitable at volume.

Sanitica automates all three modes (Full Clean, Pseudonymize, and Metadata Only), processing thousands of documents per month with zero manual effort and zero human error.
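Hidden metadata is easy to miss because it lives outside the visible text. In a .docx file, for example, author and editing details sit in a separate `docProps/core.xml` part of the ZIP package. The sketch below uses only Python's standard library. It builds a tiny in-memory stand-in for such a package and reads those fields. `read_core_properties` is a hypothetical helper, and a real document contains many more parts where data can hide.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def read_core_properties(docx_bytes):
    """Return the core metadata fields stored in a .docx-style ZIP package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        if "docProps/core.xml" not in package.namelist():
            return {}
        root = ET.fromstring(package.read("docProps/core.xml"))
    # Tags are namespaced, e.g. "{http://purl.org/dc/elements/1.1/}creator";
    # keep only the local name after the closing brace.
    return {child.tag.split("}")[-1]: (child.text or "") for child in root}


# Minimal in-memory stand-in for a .docx (not a complete, openable file).
CORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Jón Jónsson</dc:creator>
  <cp:lastModifiedBy>HR Department</cp:lastModifiedBy>
</cp:coreProperties>"""

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<document>Visible contract text</document>")
    package.writestr("docProps/core.xml", CORE_XML)

print(read_core_properties(buffer.getvalue()))
```

A reviewer reading the visible contract text never sees the author's name here. Any tool that ingests the whole file does.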

2,400 docs / month × 15 min manual review each = 600 hrs per month
⚠ And one missed metadata field is all it takes
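The workload figure is straightforward arithmetic using the illustrative volumes from the graphic:

```latex
2{,}400\ \tfrac{\text{docs}}{\text{month}} \times 15\ \tfrac{\text{min}}{\text{doc}}
  = 36{,}000\ \tfrac{\text{min}}{\text{month}}
  = \tfrac{36{,}000}{60}\ \tfrac{\text{hrs}}{\text{month}}
  = 600\ \tfrac{\text{hrs}}{\text{month}}
```

Assuming roughly 160 working hours per person per month, that is close to four full-time reviewers doing nothing else.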
THE MYTH

We send emails with personal data all the time. Uploading a document to AI is essentially the same thing.

THE REALITY

Email has a clear GDPR legal basis. AI does not. Here is why they are fundamentally different:

Aspect | Email | 🤖 AI Tool
Recipient | One specific person | Unknown: other users, training pipelines
Content control | You choose what to include | AI ingests everything, including metadata
Deletion | Both parties can delete | Erasure practically impossible (Art. 17)
GDPR legal basis | Contractual necessity or legitimate interest (Art. 6) | No clear basis; “easier to summarize” is not a legitimate interest
Responsibility | Shared: sender and recipient both have GDPR obligations | One-sided: you bear all liability, and the AI provider disclaims it
Like telling your doctor your symptoms vs. reading your medical file aloud in a café.
Sanitica enforces data minimization automatically. The right mode ensures only necessary data reaches each destination.
THE MYTH

We set up sensitivity labels in Microsoft Purview, banned external AI tools, and told everyone to use Copilot only. GDPR compliance sorted.

THE REALITY

Purview controls who can access data. It does not control what is in the data. These are fundamentally different problems.

When an employee asks Copilot to summarize a SharePoint folder, Copilot can draw on every document that employee has access to: names, national IDs, salary figures, tracked changes, hidden comments. Sensitivity labels don't remove a single character. And if you restrict access so aggressively that Copilot can't reach anything useful, you've paid for an expensive tool that can't do its job.

GDPR Article 5(1)(c) requires data minimization: you may not process more personal data than the task requires. A summary request does not need social security numbers. Article 25 goes further and explicitly names pseudonymization as an example of an appropriate technical measure. Purview offers neither.

Sanitica's pseudonymize mode is what makes Copilot truly safe. Real names become consistent aliases, context is preserved, and Copilot works exactly as intended, without violating data minimization.

PURVIEW (ACCESS CONTROL)
Controls who can open the document
🔒 Sensitivity: Confidential
Name: Jón Jónsson
SSN: 451090-2389
Salary: €85,000
⚠ Copilot reads all of this
SANITICA (CONTENT CONTROL)
Controls what is in the document
🛡 Pseudonymized copy
Name: Individual-A7
SSN: XXXXXX-XXXX
Salary: [REMOVED]
✅ Copilot works, PII protected
Purview locks the door. Sanitica makes sure there's nothing dangerous inside the room.
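The numeric fields in the two copies above can be sketched as simple pattern-based redaction. This is an assumption-laden illustration, not Sanitica's detection logic. The hypothetical regular expressions below only catch the two formats shown, a 6+4-digit national ID and a euro amount. Real PII detection needs far broader rules and context, and names would be handled by alias replacement rather than masking.

```python
import re

# Hypothetical patterns that match only the formats in the example above.
NATIONAL_ID = re.compile(r"\b\d{6}-\d{4}\b")                 # e.g. 451090-2389
EURO_AMOUNT = re.compile(r"€\d{1,3}(?:,\d{3})*(?:\.\d{2})?")  # e.g. €85,000


def redact(text):
    """Mask national IDs and remove euro amounts, mirroring the sanitized copy."""
    text = NATIONAL_ID.sub("XXXXXX-XXXX", text)
    return EURO_AMOUNT.sub("[REMOVED]", text)


print(redact("SSN: 451090-2389\nSalary: €85,000"))
```

Unlike an access label, this changes the content itself. Whoever opens the copy, human or Copilot, never sees the original values.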

What changes when you add Sanitica

Sanitica is not a replacement for your AI infrastructure. It's the missing layer that makes it safe.

Aspect | Self-Hosted AI Only | Self-Hosted AI + Sanitica
PII in documents | Enters the model unmodified | Removed (Full Clean) or pseudonymized (Pseudonymize)
Metadata leaks | Tracked changes, author info, comments remain | Stripped in all three modes (Metadata Only for text-intact sharing)
Shadow AI protection | No coverage for external tools | Full Clean on shared folders prevents data leaks
Internal AI (Copilot) | Copilot reads all PII the user can access | Pseudonymize gives Copilot context without real identities
GDPR data minimization | Relies on employee discipline | Automatically enforced, with the right mode for each use case
Audit trail | No proof of what was sanitized | Detailed log of every action in every mode

You lock the front door. Sanitica makes sure nothing dangerous is inside the house.

Multiple paths for the same document

Follow a document as it moves through your organization.

PATH A: WITHOUT SANITICA
📄
Document created
Contains PII, metadata, tracked changes
A hiring contract with the candidate's name, national ID, salary, home address, and 14 hidden metadata fields including the author's name and company network path.
📁
Stored in SharePoint
Accessible to team members
The document sits in a shared folder. Anyone with access can download it and upload it to an external AI tool.
🤖
Sent to AI system
All data included, nothing filtered
Whether it's your internal Copilot or an employee using ChatGPT, the full document enters the model: PII, metadata, and all.
PII processed by the model
Data minimization principle violated
The AI now has access to personal data it doesn't need. GDPR Article 5(1)(c) violation, regardless of whether the AI is self-hosted or external.
GDPR violation
No audit trail, no proof of compliance
If a regulator asks how you handle personal data in AI workflows, you have no evidence of sanitization. Potential fine: up to €20M or 4% of worldwide annual turnover, whichever is higher.
💰
Fine or breach notification
72-hour notification window under GDPR
PATH B: WITH SANITICA
📄
Document created
Contains PII, metadata, tracked changes
Same document, same sensitive data. The difference is what happens next.
📁
Stored in SharePoint
Accessible to team members
The original document stays in place. Sanitica processes it before anything else happens.
🛡
Sanitica removes all PII
Names, IDs, salaries, metadata: permanently destroyed at binary level
Full Clean mode works at the binary level. Names, IDs, salaries, tracked changes, and metadata are all permanently destroyed. A clean copy is produced.
🤖
Clean document enters AI
Zero personal data, full utility preserved
📋
Audit trail recorded
Every action logged automatically
Sanitica logs exactly what was removed, when, and from which document. This is the evidence regulators and auditors need.
Full GDPR compliance
Data minimization enforced, audit ready
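Sanitica's log format isn't described here, so the record below is a hypothetical shape. It shows what an entry needs to answer for an auditor: what was removed, when, from which document, and in which mode. It stores a SHA-256 fingerprint of the source file instead of its contents, so the log does not become yet another copy of the personal data. The file path, mode names, and counts are illustrative assumptions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    """Hypothetical audit record for one sanitization run."""
    document: str       # path or ID of the source document
    source_sha256: str  # fingerprint of the original, not its content
    mode: str           # e.g. "full_clean", "pseudonymize", "metadata_only"
    removed: dict       # category -> number of items removed
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def make_entry(document, original_bytes, mode, removed):
    digest = hashlib.sha256(original_bytes).hexdigest()
    return AuditEntry(document, digest, mode, removed)


entry = make_entry(
    "hr/contracts/hiring-contract.docx",  # hypothetical path
    b"...original file bytes...",         # placeholder for the real file
    "full_clean",
    {"name": 1, "national_id": 1, "salary": 1, "metadata_field": 14},
)
print(json.dumps(asdict(entry), ensure_ascii=False, indent=2))
```

Hashing the original lets an auditor later confirm which exact file a log entry refers to without the log revealing what that file contained.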

The scale of the problem

73%
of companies have employees using unauthorized AI tools
Industry research
4%
of global annual turnover (or €20M, whichever is higher): the maximum GDPR fine for violations
GDPR Art. 83
204
days: the average time to identify a data breach
IBM Cost of a Data Breach Report
50+
hidden metadata fields can exist in a single PDF document
Document forensics
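The maximum GDPR fine combines two figures. Under Article 83(5), the cap for the most serious infringements is €20 million or 4% of total worldwide annual turnover for the preceding financial year, whichever is higher. For any company with turnover below €500 million, the fixed €20 million figure is the higher of the two. A minimal sketch of that ceiling follows. It computes an upper bound only, not a prediction of any actual fine.

```python
def max_gdpr_fine_eur(annual_turnover_eur):
    """Upper limit under GDPR Art. 83(5): the higher of EUR 20M or 4% of turnover."""
    return max(20_000_000, annual_turnover_eur * 4 // 100)


print(max_gdpr_fine_eur(5_000_000))      # 20000000: the fixed cap applies
print(max_gdpr_fine_eur(1_000_000_000))  # 40000000: 4% of turnover applies
```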

Ready to close the gap?

See how exposed your organization is, or get in touch to learn how Sanitica fits into your AI infrastructure.
