AI Trust Layer
Every model call, governed in-line.
Not a DLP bolt-on at the connector. A set of gates that run on every single LLM call. Filters redact PII and secrets. An injection detector blocks jailbreaks. A model allowlist enforces your choices. A classification ceiling keeps restricted data away from any model that isn't cleared for it. Every match is written as an immutable audit row.
The gate order
Six gates, in order, stricter wins.
Each call passes through the same pipeline. Anything that trips a gate is blocked, redacted, or flagged with a warning, and every outcome is recorded.
Closed list of permitted provider:model pairs. Anything else is denied before a single token leaves.
A System's data classification is checked against the tenant ceiling. Restricted data never reaches an LLM that isn't cleared for it.
Heuristics catch ignore-previous, reveal-system-prompt, role-tag, and jailbreak-persona attempts.
PII / PHI / secrets / regulated terms scanned on the way in. Block, redact, or warn.
Only now does the request reach the provider. Already cleaned and bounded.
The same scan runs on the way out, catching anything the model echoed or hallucinated.
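As a rough illustration of the ordering above, the six gates can be pictured as a short-circuiting pipeline. This is a sketch under assumptions, not the product's implementation: the function names, the allowlist entry, the injection phrases, and the single email pattern standing in for a full filter set are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Everything below is an illustrative assumption, not the product's API.
ALLOWED_MODELS = {("example-provider", "example-model")}
LEVELS = ["public", "internal", "confidential", "restricted"]
INJECTION_HINTS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Result:
    allowed: bool
    text: str = ""
    events: list = field(default_factory=list)


def scan(text, stage, events):
    """Redact emails and record one event per match (stand-in for a filter set)."""
    def repl(match):
        events.append((stage, "redact", "email"))
        return "[EMAIL]"
    return EMAIL.sub(repl, text)


def governed_call(provider, model, classification, ceiling, prompt, send):
    events = []
    # Gate 1: model allowlist. Unknown provider:model pairs stop here.
    if (provider, model) not in ALLOWED_MODELS:
        return Result(False, events=[("allowlist", "block", f"{provider}:{model}")])
    # Gate 2: classification ceiling.
    if LEVELS.index(classification) > LEVELS.index(ceiling):
        return Result(False, events=[("classification", "block", classification)])
    # Gate 3: injection heuristics.
    if any(p.search(prompt) for p in INJECTION_HINTS):
        return Result(False, events=[("injection", "block", "heuristic")])
    # Gate 4: prompt-side content filters.
    clean_prompt = scan(prompt, "prompt", events)
    # Gate 5: the provider only ever sees the cleaned prompt.
    reply = send(clean_prompt)
    # Gate 6: the same scan on the response.
    return Result(True, scan(reply, "response", events), events)
```

Here `send` is any callable that takes the cleaned prompt and returns the model's reply, so the sketch can be exercised with a stub instead of a real provider.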
Content filters
Block, redact, or warn, and prove it.
Built-in baseline
SSNs, credit card numbers, API keys, PEM blocks, JWTs, IP addresses, US phone numbers, and email addresses are covered out of the box.
Custom filters
Add your own: regex pattern, kind (PII / PHI / Secrets / RegulatedTerms / Custom), stage (prompt / response / both), action, and redaction token.
Three actions per filter
Block aborts the call, redact rewrites the text, and warn records the match but lets the call proceed. Your call, per pattern.
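To make the filter fields and the three actions concrete, here is a minimal sketch. The `Filter` class, `apply_filters`, and the `Blocked` exception are hypothetical names chosen for illustration; only the fields (pattern, kind, stage, action, redaction token) and the action semantics come from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class Filter:
    pattern: str              # regex pattern
    kind: str                 # "PII", "PHI", "Secrets", "RegulatedTerms", "Custom"
    stage: str                # "prompt", "response", or "both"
    action: str               # "block", "redact", or "warn"
    token: str = "[REDACTED]" # replacement used by the redact action


class Blocked(Exception):
    """Raised when a block-action filter matches."""


def apply_filters(text, stage, filters):
    """Return (text, matches); raise Blocked on a block-action match."""
    matches = []
    for f in filters:
        if f.stage not in (stage, "both"):
            continue  # this filter does not apply at this stage
        found = re.findall(f.pattern, text)
        if not found:
            continue
        matches.append((f.kind, f.action, len(found)))
        if f.action == "block":
            raise Blocked(f.kind)
        if f.action == "redact":
            text = re.sub(f.pattern, f.token, text)
        # "warn" records the match and leaves the text untouched.
    return text, matches
```

For example, a redact filter for SSN-shaped strings (`\b\d{3}-\d{2}-\d{4}\b`, stage `both`, token `[SSN]`) turns "SSN 123-45-6789" into "SSN [SSN]", while a warn filter on a project codename would record the match and leave the text as it was.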
Per-System classification
Tag each System once as public, internal, confidential, or restricted. The whole platform respects it.
Test before you ship
Dry-run a filter set against sample text with no LLM call and no audit write.
Trust events
Every match, denial, and bypass is on the record.
Immutable rows
Every filter match, injection detection, model denial, or classification block writes a tamper-evident Trust Event.
Full context
Actor, atom, event kind, action taken, and an 80-character matched excerpt. Enough to investigate, not enough to leak.
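One common way to make rows tamper-evident is a hash chain, where each row's hash covers its own fields plus the previous row's hash. The sketch below assumes that technique purely for illustration; the page does not say how Trust Events are protected, and `TrustEvent`, `make_event`, and `verify` are hypothetical names. Only the field list and the 80-character excerpt limit come from the text above.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

EXCERPT_LIMIT = 80  # from the text above; everything else here is assumed


@dataclass(frozen=True)
class TrustEvent:
    actor: str
    atom: str
    kind: str
    action: str
    excerpt: str
    prev_hash: str
    hash: str


def _digest(body):
    # Canonical JSON so the same fields always hash to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_event(actor, atom, kind, action, matched, prev_hash):
    body = {
        "actor": actor, "atom": atom, "kind": kind, "action": action,
        "excerpt": matched[:EXCERPT_LIMIT],  # truncate the matched text
        "prev_hash": prev_hash,
    }
    return TrustEvent(hash=_digest(body), **body)


def verify(chain, genesis="0" * 64):
    """True only if every row links to its predecessor and matches its own hash."""
    prev = genesis
    for event in chain:
        body = asdict(event)
        claimed = body.pop("hash")
        if event.prev_hash != prev or claimed != _digest(body):
            return False
        prev = claimed
    return True
```

Under this scheme, editing any field of a stored row changes its recomputed hash, so `verify` fails from that row onward.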
Filterable log
Browse by event kind, actor, atom, or matched pattern. Drill into any single event.
Feeds Risk
Trust events are a weighted factor in each Reaction's composite risk score. Blocks count most.
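As a simple picture of a weighted factor, the sketch below sums per-action weights over a Reaction's trust events. The weights and the `trust_factor` name are hypothetical; the text only states that blocks count most, and how this factor combines with others into the composite score is not described.

```python
from collections import Counter

# Hypothetical weights; the only stated constraint is that blocks weigh the most.
ACTION_WEIGHTS = {"block": 5.0, "redact": 2.0, "warn": 1.0}


def trust_factor(actions):
    """Weighted sum over the actions taken by a Reaction's trust events."""
    counts = Counter(actions)
    return sum(ACTION_WEIGHTS.get(action, 0.0) * n for action, n in counts.items())
```

With these assumed weights, one block and two warnings contribute 7.0 to the factor.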