Sift AI Agent Security Whitepaper
Source: Sift AI Agent Security Whitepaper.pdf
Pages: 12

--- Page 1 ---
Sift AI
W H I T E P A P E R
Securing the AI Agents
How Sift AI protects an autonomous agent platform that reads
attacker-controlled content from the open internet and acts on it:
the threat model, the defense-in-depth controls, and how
customers stay in command.
Defense-in-depth  ·  Least privilege  ·  Human-in-the-loop 
·  OWASP LLM Top 10 aligned
JUNE 2026
CONFIDENTIAL
NIFTORY INC. DBA SIFT AI (“SIFT AI”)  ·  COMPANION TO THE SECURITY &
ARCHITECTURE OVERVIEW

--- Page 2 ---
What this document covers
01
The agent layer, and why it needs its own security model
02
Threat model
03
Defense in depth: the eight controls
04
Untrusted content is data, never instructions
05
Graduated autonomy and human control
06
Model providers and data handling
07
OWASP LLM Top 10 mapping
08
Assurance: how we keep it true

--- Page 3 ---
0 1  ·  T H E  A G E N T  L A Y E R
Why agents need their own security model
Sift AI connects your channels, and a team of AI agents reads every incoming
message, scores it, triages it, drafts a reply, and routes it. Three properties make
that layer a security problem in its own right, and they shape every control in this
document.
Untrusted input
The content the agents
read is authored by anyone
on the public internet: a
tweet, a comment, a DM. It
is adversarial by default.
It can act
The agents don't just read;
they can draft, tag, route,
close, and (when an org
opts in) reply to a customer.
Manipulation can become
action.
Multi-tenant
Many organizations share
the same platform. A
boundary failure must
never let one tenant reach
another's data or actions.
Our approach is defense in depth: no single control is trusted to be perfect. Untrusted content is isolated so it can't issue
commands; the agents hold least-privilege, server-authorized tools; code runs with no access to secrets; outbound replies
default to human review. An attempt that slips one layer is contained by the next.

--- Page 4 ---
0 2  ·  T H R E A T  M O D E L
What we defend against
U N T R U S T E D
I S O L A T E D  ·  L E A S T  P R I V I L E G E
G A T E D
1
Ingest
Attacker-controlled
posts, comments, DMs
2
Reason
LLM scoring + drafting
over isolated content
3
Act
Draft, tag, route, reply
within authorized limits
Prompt injection · data exfiltration
Sandbox escape · excessive agency · cross-tenantUnsafe / unreviewed outbound action
The primary threats are indirect prompt injection (hostile instructions hidden inside ingested content), excessive agency (an
agent taking an action it shouldn't), sensitive-information disclosure (reaching secrets or another tenant's data), and unsafe
outbound actions (an unreviewed reply to a customer). Each maps to a control in the next section.

--- Page 5 ---
0 3  ·  D E F E N S E  I N  D E P T H
Eight layers protecting the agents
1
Untrusted content is isolated as data
Ingested content is wrapped in an explicit untrusted-content boundary in every LLM prompt
(synthesis, the autonomous agent, and tagging) with a standing instruction to treat it strictly as data to
analyze, never as instructions to follow. (See §04.)
2
Least-privilege, server-authorized tools
Agents hold a narrow, typed tool set. The autonomous agent has only read/search tools and a single
decision tool, with no general-purpose code execution. Every action it requests is validated server-
side against the chosen goal's allow-list before anything happens; it cannot invent or exceed its
permissions.
3
Isolated code execution
Where the analytics agent runs generated code, that code executes in a dedicated worker with an
empty environment: no secrets, no database handle. It reaches data only through a narrow, brokered
channel back to the main process. An escape lands in a sterile thread, not on the host.
4
Tenant isolation, bound to the session
The organization an agent operates in is derived from the authenticated session, never from content or
caller input. Queries are scoped to that tenant at the data layer, and database filters are parameterized
so input cannot alter query structure.
5
Human-in-the-loop by default
Outbound AI replies default to human review. Autonomy is something an organization opts into, per
goal, and graduates through three stages (shadow, then suggest, then auto). It is never a default. (See
§05.)
6
Governed autonomy & an instant off switch
When auto-send is enabled it is gated on the agent's own confidence and protected by an
organization-level kill switch that forces every reply back to human review the moment it is flipped. A
deterministic output guardrail before send (link allow-listing, sensitive-data scanning, and a safety
check) is the next layer in this program.
7
Observability & audit
Every agent decision and action is traced and written to an immutable activity log with its reasoning,
evidence, confidence, and provenance, so any automated action is explainable and reviewable after
the fact.

--- Page 6 ---
8
Stateless by design
The agents that read and act on content are stateless per record and per decision. They carry no
shared, writable long-term memory across sessions, threads, or tenants, so one interaction cannot
poison the next, and one tenant's content cannot influence another's analysis. Context for a decision
is assembled from that tenant's own data and discarded after the run.

--- Page 7 ---
0 4  ·  U N T R U S T E D  C O N T E N T
Data, never instructions
Indirect prompt injection is the signature risk of an agent that reads the open internet: a message can try to instruct the
model rather than simply inform it. Sift AI's foundational defense is to keep the two apart.
V A L I D A T E D ,  N O T  A S S U M E D
Tested against the live model
These controls are exercised against the production model, not only reviewed in code. In the
most recent assessment a battery of distinct injection techniques (title hijack, sentiment and
score manipulation, system-prompt extraction, and role override) was run through the live
pipeline, and every one was neutralized: the agent scored the genuine content and ignored the
embedded instructions. This is paired with the structural controls in §03, because prompt-level
defenses reduce probability while the architecture bounds impact.
Customer content is enclosed in an explicit untrusted-content boundary, separate from the system
instructions, in every prompt that reads it.
A standing directive tells the model that everything in that boundary is third-party data to be
analyzed, and that any embedded instruction (to change a score, a title, take an action, or reveal its
prompt) must be ignored and treated as a signal, not a command.
The model's authority comes only from the system instructions, which untrusted content can never
reach.

--- Page 8 ---
0 5  ·  G R A D U A T E D  A U T O N O M Y
The customer stays in command
The strongest control on an agent that can act is not letting it act unsupervised until it has earned it. Sift AI models autonomy
as a ladder an organization climbs deliberately, per goal.
Shadow
The agent decides and
drafts, but nothing is sent.
Teams observe and score
what it would have done,
with zero customer
exposure.
Suggest
The agent drafts a reply
and a human reviews, edits,
and sends it. The default
for any reply-capable goal.
Auto
The agent may send
directly, but only after a
goal is explicitly promoted,
and only when confidence
is high.
1 · Shadow
Decides and drafts, sends nothing
2 · Suggest
Drafts, a human reviews and sends
3 · Auto
Sends directly, high confidence only
opt in (org)
promote per goal
Human review
kill switch · low confidence
Auto-send is double opt-in: it must be enabled at the organization level and then promoted per goal, gated on the agent's
confidence, and revocable instantly with an organization-wide kill switch. Lower-confidence and ambiguous cases fall back
to human review automatically. Pre-built, human-approved responses are sent verbatim (never paraphrased by the model)
for the cases where exact wording matters.
How a reply is gated
Every drafted reply passes through a single decision point. It is auto-sent only when all conditions hold; if any one fails, it
falls back to human review, which is the default.

--- Page 9 ---
Drafted reply
from the goal agent
Send-decision gate
autonomy = auto
organization auto-send enabled
kill switch off
confidence ≥ floor
all four must hold
AUTO-SEND
all conditions hold
HUMAN REVIEW
any condition fails (the default)

--- Page 10 ---
0 6  ·  D A T A  H A N D L I N G
Model providers, and how AI inputs and outputs are
handled
Model providers
Sift AI uses managed foundation models from Google (Gemini) and OpenAI, accessed on enterprise, no-training tiers.
Customer data is sent only within the immediate inference call; the providers have no access to it outside that call, do not
train on it, and retain it only to the contractual minimum (zero, or abuse-monitoring-only where applicable). There is no per-
customer fine-tuning: customization is achieved entirely through prompt design layered over the published models, so a
customer's data never becomes part of a model. These terms are re-confirmed at the annual vendor review.
Minimizing what reaches a model
A PII redaction layer runs after ingestion: detected personal data is hidden by default, and the redaction limits the personal
data exposed to the model providers. Prompts are assembled from the tenant's own data only, inside the untrusted-content
boundary described in section 04.
AI inputs and outputs at rest
Isolation and encryption. Prompts, transcripts, agent traces, and logs live in tenant-scoped
infrastructure, encrypted at rest with AES-256 and in transit over TLS 1.2 or higher, isolated by
organization.
Access. Access is authenticated and authorized per tenant; a caller operates only with its own
authenticated permissions, and privileged reveals of redacted data are logged.
Retention and deletion. AI inputs and outputs follow the same retention windows as other records,
age out automatically, and are covered by tenant-scoped hard delete and the contractual deletion
commitment after a contract ends. Backups age out on their normal cycle.

--- Page 11 ---
0 7  ·  O W A S P  L L M  T O P  1 0
How each category is addressed
ID
CATEGORY
PRIMARY CONTROL
LLM01
Prompt Injection
Untrusted-content isolation (§04) + structural least-privilege (§03).
LLM02
Sensitive Information
Disclosure
Code runs with no secrets; tenant scope bound to the session.
LLM03
Supply Chain
Managed model providers; a fixed in-house tool set with no dynamic, third-
party, or MCP tool loading; structured outputs; dependency scanning in the
pipeline.
LLM04
Data & Model
Poisoning
Ingested content isolated from instructions; classification is per-record, not
training.
LLM05
Improper Output
Handling
Parameterized queries; isolated code execution; human-review-default on
outbound. Deterministic output guardrail in progress.
LLM06
Excessive Agency
Typed least-privilege tools; server-side action allow-list; graduated,
revocable autonomy.
LLM07
System Prompt
Leakage
No secrets in prompts; injection-isolation prevents extraction.
LLM08
Vector & Embedding
Weaknesses
Embeddings are tenant-scoped and advisory; no cross-tenant retrieval.
LLM09
Misinformation
Human-review-default; grounded drafting; verbatim approved responses;
output guardrail roadmap.
LLM10
Unbounded
Consumption
Hard execution-time caps; rate limiting on automated actions on the
roadmap.

--- Page 12 ---
SIFT AI · AGENT SECURITY WHITEPAPER · CONFIDENTIAL
JUNE 2026
0 8  ·  A S S U R A N C E
How we keep it true
A summary of the most recent AI penetration test (findings and remediation status) is available alongside this document
under NDA.
Adversarial testing. The agent surface is assessed against the OWASP LLM Top 10, combining live
black-box probing with source review. The most recent assessment found no open Critical or High
severity issue.
Live validation. Injection defenses are verified against the production model, not only by code review
(§04).
Least-privilege by construction. Agent actions are authorized server-side, so new capabilities are
gated by design rather than by prompt wording.
Defense in depth. Prompt-level isolation reduces the probability of manipulation; isolated execution,
tenant binding, and human-in-the-loop bound its impact.
Continuous improvement. An injection and data-extraction evaluation suite as a release gate, a
deterministic output guardrail before auto-send, and rate limiting on automated actions are the next
investments in this program.