Why Sanitica

Self-hosted AI is a great start. It's not enough.

Running AI in your own cloud solves the hosting question. It does not solve the data question. Sensitive information still flows into the model, metadata still leaks, and GDPR still applies. Sanitica is the missing layer.

Myth vs. Reality

7 misconceptions about AI and data privacy

These beliefs are widespread. Every one of them puts your organization at risk.

THE MYTH

If the AI runs on our servers, personal data never leaves our control, so there's no privacy risk.

THE REALITY

GDPR requires data minimization, meaning you may not process more personal data than strictly necessary, regardless of where it's hosted. If your RAG pipeline ingests social security numbers, salary figures, or health data that the model doesn't need, you're violating GDPR Article 5(1)(c) even on your own hardware.

Sanitica's pseudonymize mode solves this by replacing names with consistent aliases (e.g. “Individual-A7”) so your AI keeps full context without processing real identities. Data minimization, satisfied.
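To make "consistent aliases" concrete, here is a minimal sketch of one way such a mapping could work. It is not Sanitica's actual implementation: the keyed-hash scheme, the function names `make_alias` and `pseudonymize`, and the alias format are all assumptions chosen for illustration. The point is only that the same name always yields the same alias, so references line up across documents.

```python
import hashlib
import hmac


def make_alias(name: str, secret_key: bytes, prefix: str = "Individual") -> str:
    """Derive a stable alias from a name (hypothetical scheme, not Sanitica's)."""
    # Normalize whitespace and case so small variations map to one alias.
    normalized = " ".join(name.split()).casefold()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}-{digest[:6].upper()}"


def pseudonymize(text: str, known_names: list[str], secret_key: bytes) -> str:
    """Replace each known name with its alias. Longest names go first so a
    full name is replaced before any shorter name contained in it."""
    for name in sorted(known_names, key=len, reverse=True):
        text = text.replace(name, make_alias(name, secret_key))
    return text


key = b"demo-key-not-for-production"  # assumption: a key held only by the organization
doc_a = pseudonymize("Jón Jónsson approved the budget.", ["Jón Jónsson"], key)
doc_b = pseudonymize("Budget owner: Jón Jónsson", ["Jón Jónsson"], key)
```

Both documents now contain the same alias in place of the name, which is what lets an AI system follow one person across files without seeing who that person is. Detecting the names in the first place is a separate problem that this sketch assumes is already solved.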

☁️🔒

YOUR PRIVATE CLOUD

SSN: 451090-2389

Salary: €85,000

Home address

Medical records

⚠ Locked doors, but sensitive data that shouldn't be there

THE MYTH

Our company has strict IT policies. Our employees would never use unauthorized AI tools.

THE REALITY

Industry research indicates that in 73% of companies, employees use AI tools without approval. It's human nature. People choose the fastest, most convenient tool available. Even organizations with their own AI infrastructure have employees who open ChatGPT in their browser because it's quicker. Policies alone cannot solve this.

Sanitica's full clean mode on shared folders ensures that even when employees copy-paste documents into external AI, the sensitive data is already gone.

🏢

SANCTIONED PATH

Internal AI system

⚠ PII still enters the model

⚡

THE SHORTCUT

ChatGPT in the browser

⚠ Data leaves the organization

Both paths leak data without Sanitica

THE MYTH

We redact sensitive text with black boxes before sharing documents. That's enough.

THE REALITY

AI doesn't read pixels on a screen. It reads the underlying binary data of the document. A black rectangle in a PDF hides text visually, but the data remains fully intact in the text layer beneath. Any AI tool, text extractor, or even a simple copy-paste can retrieve the “redacted” information.

All three Sanitica modes operate at the binary level. Whether a mode removes content, pseudonymizes it, or strips metadata, the data is destroyed in the document's underlying file structure, not just hidden on screen. No text layer, no residual data, no way back.
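The black-box problem can be illustrated with a deliberately simplified sketch. The byte string below is a hand-written, uncompressed PDF content stream (an assumption for illustration: real PDFs usually compress their streams and use more complex font encodings). It first draws a line of text, then fills a black rectangle over it. The hypothetical `extract_text` helper ignores the rectangle entirely, just as a text extractor would.

```python
import re

# A simplified PDF content stream: show text, then paint a black box on top.
CONTENT_STREAM = (
    b"BT /F1 12 Tf 72 700 Td (SSN: 451090-2389) Tj ET\n"  # text-showing operator
    b"0 0 0 rg 70 695 130 16 re f\n"                      # black filled rectangle
)

# Matches literal strings passed to the Tj (show text) operator.
TEXT_SHOW = re.compile(rb"\((.*?)\)\s*Tj")


def extract_text(stream: bytes) -> list[str]:
    """Return every string drawn with Tj, regardless of what is painted over it."""
    return [match.decode("latin-1") for match in TEXT_SHOW.findall(stream)]


recovered = extract_text(CONTENT_STREAM)
print(recovered)
```

The rectangle only changes what is rendered on screen. The text operand is still present in the stream, so `recovered` still contains the "redacted" value.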

WHAT YOU SEE

Name: ███████████

SSN: ███████████

Salary: ███████

Department: Engineering

WHAT AI READS

Name: Jón Jónsson

SSN: 451090-2389

Salary: €85,000

Department: Engineering

THE MYTH

We're a small company. GDPR and AI regulations don't really apply to us.

THE REALITY

The NIS2 directive covers organizations in energy, transport, water supply, manufacturing, and more, many of which are medium-sized businesses, and in some cases smaller ones. GDPR applies to every organization that processes personal data, regardless of size. A 20-person company is bound by the same core data protection principles as a multinational.

Sanitica's three modes scale from a 10-person company to a multinational: Full Clean for external sharing, Pseudonymize for internal AI, and Metadata Only for client documents.

🏢 Large enterprises, “obviously regulated”

🏠 SMEs in energy, transport, water, manufacturing, healthcare, finance...
All subject to GDPR • NIS2 • AI Act

THE MYTH

Our team reviews documents before they go to AI systems. Manual oversight is sufficient.

THE REALITY

Manual review does not scale. Organizations process thousands of documents per month. Each document can contain hidden metadata, tracked changes, embedded comments, and dozens of PII data points. One overlooked meeting note with a client name in the metadata is enough for a breach. Human error is inevitable at volume.

Sanitica automates all three modes (Full Clean, Pseudonymize, and Metadata Only), processing thousands of documents per month with zero manual effort and zero human error.

2,400 docs / month

× 15 min manual review each

= 600 hrs per month

⚠ And one missed metadata field is all it takes
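The arithmetic behind those figures is easy to check and to rerun with your own volumes. This is a minimal sketch: the function names are illustrative, and the 160 working hours per month used for the headcount estimate is an assumption, not a figure from this page.

```python
def monthly_review_hours(docs_per_month: int, minutes_per_doc: float) -> float:
    """Total hours spent if every document is reviewed by hand."""
    return docs_per_month * minutes_per_doc / 60


def full_time_equivalents(hours: float, hours_per_person: float = 160.0) -> float:
    """Rough headcount needed, assuming 160 working hours per person per month."""
    return hours / hours_per_person


hours = monthly_review_hours(2400, 15)
reviewers = full_time_equivalents(hours)
print(f"{hours:.0f} hours per month, about {reviewers:.2f} full-time reviewers")
```

With 2,400 documents at 15 minutes each, that comes to 600 hours per month, or 3.75 full-time reviewers under the stated assumption, before accounting for anything a reviewer misses.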

THE MYTH

We send emails with personal data all the time. Uploading a document to AI is essentially the same thing.

THE REALITY

Email has a clear GDPR legal basis. AI does not. Here is why they are fundamentally different:

	✉ Email	🤖 AI Tool
Recipient	One specific person	Unknown: other users, training pipelines
Content control	You choose what to include	AI ingests everything including metadata
Deletion	Both parties can delete	Erasure practically impossible (Art. 17)
GDPR legal basis	Contractual necessity or legitimate interest (Art. 6)	No clear basis. “Easier to summarize” is not legitimate interest
Responsibility	Shared: sender and recipient both have GDPR obligations	One-sided: you bear all liability, AI provider disclaims

Like telling your doctor your symptoms vs. reading your medical file aloud in a café.

Sanitica enforces data minimization automatically. The right mode ensures only necessary data reaches each destination.

THE MYTH

We set up sensitivity labels in Microsoft Purview, banned external AI tools, and told everyone to use Copilot only. GDPR compliance sorted.

THE REALITY

Purview controls who can access data. It does not control what is in the data. These are fundamentally different problems.

When an employee asks Copilot to summarize a SharePoint folder, Copilot reads every document that employee has access to: names, national IDs, salary figures, tracked changes, hidden comments. Sensitivity labels don't remove a single character. And if you restrict access so aggressively that Copilot can't reach anything useful, you've paid for an expensive tool that can't do its job.

GDPR Article 5(1)(c) requires data minimization: you may not process more personal data than the task requires. A summary request does not need social security numbers. Article 25 goes further and explicitly names pseudonymization as a recommended technical measure. Purview offers neither.

Sanitica's pseudonymize mode is what makes Copilot truly safe. Real names become consistent aliases, context is preserved, and Copilot works exactly as intended, without violating data minimization.

PURVIEW (ACCESS CONTROL)

Controls who can open the document

🔒 Sensitivity: Confidential

Name: Jón Jónsson

SSN: 451090-2389

Salary: €85,000

⚠ Copilot reads all of this

SANITICA (CONTENT CONTROL)

Controls what is in the document

🛡 Pseudonymized copy

Name: Individual-A7

SSN: XXXXXX-XXXX

Salary: [REMOVED]

✅ Copilot works, PII protected

Purview locks the door. Sanitica makes sure there's nothing dangerous inside the room.

Side by Side

What changes when you add Sanitica

Sanitica is not a replacement for your AI infrastructure. It's the missing layer that makes it safe.

Aspect	Self-Hosted AI Only	Self-Hosted AI + Sanitica
PII in documents	Enters the model unmodified	Removed (Full Clean) or pseudonymized (Pseudonymize)
Metadata leaks	Tracked changes, author info, comments remain	Stripped in all three modes (Metadata Only for text-intact sharing)
Shadow AI protection	No coverage for external tools	Full Clean on shared folders prevents data leaks
Internal AI (Copilot)	Copilot reads all PII the user can access	Pseudonymize gives Copilot context without real identities
GDPR data minimization	Relies on employee discipline	Automatically enforced, with the right mode for each use case
Audit trail	No proof of what was sanitized	Detailed log of every action in every mode

You lock the front door. Sanitica makes sure nothing dangerous is inside the house.
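The metadata row in the comparison above is easy to underestimate, because author information never appears on the page. As a rough sketch, the snippet below builds a stand-in for a .docx file (a ZIP archive containing only a `docProps/core.xml` part and a placeholder body, not a complete Word document) and reads the properties stored in it. The helper names and sample values are hypothetical. The part name and XML namespaces follow the Office Open XML packaging conventions.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<cp:coreProperties '
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:creator>Jón Jónsson</dc:creator>"
    "<cp:lastModifiedBy>Reviewer B</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)


def build_demo_docx() -> bytes:
    """Create a minimal stand-in archive with a core-properties part."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", CORE_XML.encode("utf-8"))
        archive.writestr("word/document.xml", "<document/>")  # placeholder body
    return buffer.getvalue()


def read_core_properties(docx_bytes: bytes) -> dict[str, str]:
    """Return the core properties as {local tag name: text}."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # Strip the "{namespace}" prefix ElementTree adds to each tag.
    return {el.tag.split("}", 1)[-1]: el.text for el in root if el.text}


properties = read_core_properties(build_demo_docx())
print(properties)
```

Anything that can open the archive can read these fields, whether or not the visible text mentions the author at all.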

Document Journey

Multiple paths for the same document

Follow a document as it moves through your organization.

PATH A: WITHOUT SANITICA

📄

Document created

Contains PII, metadata, tracked changes

📁

Stored in SharePoint

Accessible to team members

🤖

Sent to AI system

All data included, nothing filtered

⚠

PII processed by the model

Data minimization principle violated

⚖

GDPR violation

No audit trail, no proof of compliance

💰

Fine or breach notification

72-hour notification window under GDPR

PATH B: WITH SANITICA

📄

Document created

Contains PII, metadata, tracked changes

📁

Stored in SharePoint

Accessible to team members

🛡

Sanitica removes all PII

Names, IDs, salaries, metadata: permanently destroyed at binary level

🤖

Clean document enters AI

Zero personal data, full utility preserved

📋

Audit trail recorded

Every action logged automatically

✅

Full GDPR compliance

Data minimization enforced, audit ready

The Numbers

The scale of the problem

73%

of companies have employees using unauthorized AI tools

Industry research

4%

of annual revenue: the maximum GDPR fine for violations

GDPR Art. 83

204

days: the average time to discover a data breach

IBM Cost of Data Breach

50+

hidden metadata fields can exist in a single PDF document

Document forensics

Ready to close the gap?

See how exposed your organization is, or get in touch to learn how Sanitica fits into your AI infrastructure.

Take the Risk Quiz → Get Early Access