Security

How Protostar AI keeps regulated data safe

Protostar AI is a privacy and security layer around the best available AI models. The goal is simple: frontier-quality AI without exposing the data you are not allowed to expose. This page describes the security model. For measured results see the validation report; for the full design see the white paper.

The core guarantee

The most sensitive class of data never leaves your secure boundary. This is structural, not procedural: regulated (C1) data is served only by a self-hosted model with no outbound network route, so it cannot reach any external service even if detection misses something. It is the strongest guarantee we make, and it does not depend on perfect anonymization.

How it works

Every request runs through the gateway inside your boundary:

Classify. Each request is sorted into a data class. Classification is conservative: when unsure, data is treated as more sensitive, not less.
Anonymize before egress. A self-hosted model finds names, dates, account numbers, and other identifiers, including the HIPAA Safe Harbor identifiers, and replaces them with placeholders. The real values stay in an encrypted vault that never leaves the boundary.
Route by sensitivity. Regulated data stays on the self-hosted, no-egress tier. Only de-identified data is eligible for managed or frontier models, and only when policy permits.
Recombine inside. Real values are restored into the answer inside your boundary, so an external model never sees them.
Audit. Every request is recorded in a tamper-evident, hash-chained audit trail that stores counts, never de-anonymized values.
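
The steps above can be illustrated with a minimal Python sketch. It is not Protostar AI's implementation, and every name in it is an assumption made for illustration: the regex stands in for the self-hosted detection model, a plain dict stands in for the encrypted vault, "C2" is a hypothetical label for a non-regulated class (only C1 is named on this page), and the audit record fields are invented. It shows placeholder substitution, fail-closed routing, recombination, and a hash-chained log that stores counts only.

```python
import hashlib
import json
import re

# Hypothetical detector: a stand-in for the self-hosted model.
# It only matches strings shaped like US Social Security numbers.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
GENESIS_HASH = "0" * 64  # assumed starting value for the hash chain


def anonymize(text, vault):
    """Replace detected values with placeholders; real values stay in the vault."""
    def substitute(match):
        placeholder = f"[ID_{len(vault) + 1}]"
        vault[placeholder] = match.group(0)
        return placeholder
    return SSN_PATTERN.sub(substitute, text)


def choose_tier(data_class, deidentified, policy_allows_external):
    """Fail closed: anything not explicitly cleared stays on the self-hosted tier."""
    if data_class == "C2" and deidentified and policy_allows_external:
        return "external"
    return "self_hosted"  # C1, unknown classes, and unclear cases land here


def recombine(answer, vault):
    """Restore real values into the answer, inside the boundary."""
    for placeholder, real_value in vault.items():
        answer = answer.replace(placeholder, real_value)
    return answer


def entry_hash(prev_hash, record):
    body = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_audit(chain, record):
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    entry = {"prev": prev_hash, "record": record,
             "hash": entry_hash(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry_hash(entry["prev"], entry["record"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


# Usage: one request through the sketch.
vault = {}
chain = []
prompt = "Patient SSN 123-45-6789 needs a follow-up summary."
safe = anonymize(prompt, vault)  # "Patient SSN [ID_1] needs a follow-up summary."
tier = choose_tier("C2", deidentified=True, policy_allows_external=True)
# The audit record holds a count and the tier, never the real value.
append_audit(chain, {"identifiers_replaced": len(vault), "tier": tier})
restored = recombine("Summary for [ID_1]: follow-up due.", vault)
```

In this sketch the routing function returns the self-hosted tier by default, which mirrors the conservative classification described above. The routing code is only the procedural layer: the core guarantee rests on the self-hosted tier having no outbound network route, which code like this cannot demonstrate.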

Guarantee, then defense in depth

The zero-egress guarantee for regulated data is the foundation. On top of it:

Anonymization, so external models receive placeholders, not real identities.
Contractual controls: zero-data-retention agreements, and Business Associate Agreements for regulated data, so prompts are not retained or used for training.
Compartmentalization: de-identified work can be split across providers so no single one holds enough to reconstruct it.
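
As a rough illustration of compartmentalization, the sketch below assigns de-identified segments to providers in round-robin order. The function name, the provider names, and the round-robin strategy are all assumptions, not the product's documented behavior. Round-robin only ensures that no provider receives every segment; deciding whether a subset is enough to reconstruct the original would need a content-aware policy that this sketch does not attempt.

```python
from itertools import cycle


def split_across_providers(segments, providers):
    """Assign de-identified segments round-robin so each provider gets a subset."""
    if len(providers) < 2:
        raise ValueError("compartmentalization needs at least two providers")
    assignment = {name: [] for name in providers}
    for segment, name in zip(segments, cycle(providers)):
        assignment[name].append(segment)
    return assignment


# Usage with hypothetical provider names and already de-identified text.
segments = ["[ID_1] reported symptoms.", "Dosage was adjusted.", "Follow-up is due."]
assignment = split_across_providers(segments, ["provider_a", "provider_b"])
```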

De-identification recall on real text is never perfect for any system, so we treat it as a layer on top of the guarantee, never as the guarantee itself. We publish our benchmark results in full and state which benchmarks we have not yet run.

Data and model provenance

We do not train, fine-tune, or distill on frontier-model outputs. Frontier models are inference-only.
Self-hosted models are chosen for clear, permitted licensing and auditable origin.
Secrets and credentials are kept in a managed secrets store, never in code or logs, and de-anonymized values are never logged.

Standards

Protostar AI is designed to HIPAA, SOC 2, ISO/IEC 42001 and 27001, IEC 62304, ISO 14971, and MiFID II RTS 6 controls, and operates under Business Associate Agreements for regulated data.

Responsible disclosure

Found a security issue? Email hello@protostarai.com. We welcome coordinated disclosure and will work with you on a fix and a timeline.