Sift AI Agent Security Whitepaper
Source: Sift AI Agent Security Whitepaper.pdf
Pages: 12

--- Page 1 ---

Sift AI
WHITEPAPER

Securing the AI Agents

How Sift AI protects an autonomous agent platform that reads attacker-controlled content from the open internet and acts on it: the threat model, the defense-in-depth controls, and how customers stay in command.

Defense-in-depth · Least privilege · Human-in-the-loop · OWASP LLM Top 10 aligned

JUNE 2026
CONFIDENTIAL
NIFTORY INC. DBA SIFT AI (“SIFT AI”) · COMPANION TO THE SECURITY & ARCHITECTURE OVERVIEW

--- Page 2 ---

What this document covers

01 The agent layer, and why it needs its own security model
02 Threat model
03 Defense in depth: the eight controls
04 Untrusted content is data, never instructions
05 Graduated autonomy and human control
06 Model providers and data handling
07 OWASP LLM Top 10 mapping
08 Assurance: how we keep it true

--- Page 3 ---

01 · THE AGENT LAYER

Why agents need their own security model

Sift AI connects your channels, and a team of AI agents reads every incoming message, scores it, triages it, drafts a reply, and routes it. Three properties make that layer a security problem in its own right, and they shape every control in this document.

Untrusted input. The content the agents read is authored by anyone on the public internet: a tweet, a comment, a DM. It is adversarial by default.
It can act. The agents don't just read; they can draft, tag, route, close, and (when an org opts in) reply to a customer. Manipulation can become action.
Multi-tenant. Many organizations share the same platform. A boundary failure must never let one tenant reach another's data or actions.

Our approach is defense in depth: no single control is trusted to be perfect. Untrusted content is isolated so it can't issue commands; the agents hold least-privilege, server-authorized tools; code runs with no access to secrets; outbound replies default to human review.
An attempt that slips past one layer is contained by the next.

--- Page 4 ---

02 · THREAT MODEL

What we defend against

[Figure: the three-stage agent pipeline and the threats at each stage]
1 Ingest (UNTRUSTED). Attacker-controlled posts, comments, DMs. Threats: prompt injection · data exfiltration.
2 Reason (ISOLATED · LEAST PRIVILEGE). LLM scoring + drafting over isolated content. Threats: sandbox escape · excessive agency · cross-tenant.
3 Act (GATED). Draft, tag, route, reply within authorized limits. Threats: unsafe / unreviewed outbound action.

The primary threats are indirect prompt injection (hostile instructions hidden inside ingested content), excessive agency (an agent taking an action it shouldn't), sensitive-information disclosure (reaching secrets or another tenant's data), and unsafe outbound actions (an unreviewed reply to a customer). Each maps to a control in the next section.

--- Page 5 ---

03 · DEFENSE IN DEPTH

Eight layers protecting the agents

1 Untrusted content is isolated as data
Ingested content is wrapped in an explicit untrusted-content boundary in every LLM prompt (synthesis, the autonomous agent, and tagging) with a standing instruction to treat it strictly as data to analyze, never as instructions to follow. (See §04.)

2 Least-privilege, server-authorized tools
Agents hold a narrow, typed tool set. The autonomous agent has only read/search tools and a single decision tool, with no general-purpose code execution. Every action it requests is validated server-side against the chosen goal's allow-list before anything happens; it cannot invent or exceed its permissions.

3 Isolated code execution
Where the analytics agent runs generated code, that code executes in a dedicated worker with an empty environment: no secrets, no database handle. It reaches data only through a narrow, brokered channel back to the main process. An escape lands in a sterile thread, not on the host.

4 Tenant isolation, bound to the session
The organization an agent operates in is derived from the authenticated session, never from content or caller input. Queries are scoped to that tenant at the data layer, and database filters are parameterized so input cannot alter query structure.

5 Human-in-the-loop by default
Outbound AI replies default to human review. Autonomy is something an organization opts into, per goal, and graduates through three stages (shadow, then suggest, then auto). It is never a default. (See §05.)

6 Governed autonomy & an instant off switch
When auto-send is enabled it is gated on the agent's own confidence and protected by an organization-level kill switch that forces every reply back to human review the moment it is flipped. A deterministic output guardrail before send (link allow-listing, sensitive-data scanning, and a safety check) is the next layer in this program.

7 Observability & audit
Every agent decision and action is traced and written to an immutable activity log with its reasoning, evidence, confidence, and provenance, so any automated action is explainable and reviewable after the fact.

--- Page 6 ---

8 Stateless by design
The agents that read and act on content are stateless per record and per decision. They carry no shared, writable long-term memory across sessions, threads, or tenants, so one interaction cannot poison the next, and one tenant's content cannot influence another's analysis. Context for a decision is assembled from that tenant's own data and discarded after the run.

--- Page 7 ---

04 · UNTRUSTED CONTENT

Data, never instructions

Indirect prompt injection is the signature risk of an agent that reads the open internet: a message can try to instruct the model rather than simply inform it. Sift AI's foundational defense is to keep the two apart.

VALIDATED, NOT ASSUMED
Tested against the live model

These controls are exercised against the production model, not only reviewed in code.
In the most recent assessment, a battery of distinct injection techniques (title hijack, sentiment and score manipulation, system-prompt extraction, and role override) was run through the live pipeline, and every one was neutralized: the agent scored the genuine content and ignored the embedded instructions. This is paired with the structural controls in §03, because prompt-level defenses reduce probability while the architecture bounds impact.

Customer content is enclosed in an explicit untrusted-content boundary, separate from the system instructions, in every prompt that reads it. A standing directive tells the model that everything in that boundary is third-party data to be analyzed, and that any embedded instruction (to change a score or a title, take an action, or reveal its prompt) must be ignored and treated as a signal, not a command. The model's authority comes only from the system instructions, which untrusted content can never reach.

--- Page 8 ---

05 · GRADUATED AUTONOMY

The customer stays in command

The strongest control on an agent that can act is not letting it act unsupervised until it has earned it. Sift AI models autonomy as a ladder an organization climbs deliberately, per goal.

Shadow. The agent decides and drafts, but nothing is sent. Teams observe and score what it would have done, with zero customer exposure.
Suggest. The agent drafts a reply and a human reviews, edits, and sends it. This is the default for any reply-capable goal.
Auto. The agent may send directly, but only after a goal is explicitly promoted, and only when confidence is high.
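
The ladder can be read as one decision function. What follows is a minimal sketch, not Sift AI's implementation: it assumes the four conditions this section gives for the send-decision gate (autonomy = auto, organization auto-send enabled, kill switch off, confidence ≥ floor), and every name in it (`Autonomy`, `GateInput`, `decide`, `confidence_floor`) is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Autonomy(Enum):
    """The three stages of the ladder, per goal."""
    SHADOW = "shadow"
    SUGGEST = "suggest"
    AUTO = "auto"


class Outcome(Enum):
    NOT_SENT = "not_sent"          # shadow: observed only, no customer exposure
    HUMAN_REVIEW = "human_review"  # the default path
    AUTO_SEND = "auto_send"


@dataclass(frozen=True)
class GateInput:
    autonomy: Autonomy           # stage the goal has been promoted to
    org_auto_send_enabled: bool  # organization-level opt-in
    kill_switch_on: bool         # organization-wide off switch
    confidence: float            # the agent's confidence in this draft
    confidence_floor: float      # hypothetical threshold; the real value is not published


def decide(g: GateInput) -> Outcome:
    """Auto-send only when all four conditions hold; otherwise fall back."""
    if g.autonomy is Autonomy.SHADOW:
        return Outcome.NOT_SENT
    may_auto_send = (
        g.autonomy is Autonomy.AUTO
        and g.org_auto_send_enabled
        and not g.kill_switch_on
        and g.confidence >= g.confidence_floor
    )
    return Outcome.AUTO_SEND if may_auto_send else Outcome.HUMAN_REVIEW
```

Writing the gate as a single conjunction with human review as the `else` branch means a missing or failed condition can only make the outcome more conservative, which is the property the ladder relies on.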

[Figure: the autonomy ladder]
1 · Shadow. Decides and drafts, sends nothing.
2 · Suggest. Drafts, a human reviews and sends.
3 · Auto. Sends directly, high confidence only.
Transitions: opt in (org) · promote per goal · back to human review on kill switch · low confidence.

Auto-send is double opt-in: it must be enabled at the organization level and then promoted per goal. It is gated on the agent's confidence and can be revoked instantly with an organization-wide kill switch. Lower-confidence and ambiguous cases fall back to human review automatically. Pre-built, human-approved responses are sent verbatim (never paraphrased by the model) for the cases where exact wording matters.

How a reply is gated

Every drafted reply passes through a single decision point. It is auto-sent only when all conditions hold; if any one fails, it falls back to human review, which is the default.

--- Page 9 ---

[Figure: the send-decision gate]
Input: drafted reply from the goal agent.
Send-decision gate (all four must hold):
  autonomy = auto
  organization auto-send enabled
  kill switch off
  confidence ≥ floor
AUTO-SEND: all conditions hold.
HUMAN REVIEW: any condition fails (the default).

--- Page 10 ---

06 · DATA HANDLING

Model providers, and how AI inputs and outputs are handled

Model providers

Sift AI uses managed foundation models from Google (Gemini) and OpenAI, accessed on enterprise, no-training tiers. Customer data is sent only within the immediate inference call; the providers have no access to it outside that call, do not train on it, and retain it only to the contractual minimum (zero, or abuse-monitoring-only where applicable). There is no per-customer fine-tuning: customization is achieved entirely through prompt design layered over the published models, so a customer's data never becomes part of a model. These terms are re-confirmed at the annual vendor review.

Minimizing what reaches a model

A PII redaction layer runs after ingestion: detected personal data is hidden by default, and the redaction limits the personal data exposed to the model providers.
Prompts are assembled from the tenant's own data only, inside the untrusted-content boundary described in §04.

AI inputs and outputs at rest

Isolation and encryption. Prompts, transcripts, agent traces, and logs live in tenant-scoped infrastructure, encrypted at rest with AES-256 and in transit over TLS 1.2 or higher, isolated by organization.
Access. Access is authenticated and authorized per tenant; a caller operates only with its own authenticated permissions, and privileged reveals of redacted data are logged.
Retention and deletion. AI inputs and outputs follow the same retention windows as other records, age out automatically, and are covered by tenant-scoped hard delete and the contractual deletion commitment after a contract ends. Backups age out on their normal cycle.

--- Page 11 ---

07 · OWASP LLM TOP 10

How each category is addressed

ID | CATEGORY | PRIMARY CONTROL
LLM01 | Prompt Injection | Untrusted-content isolation (§04) + structural least-privilege (§03).
LLM02 | Sensitive Information Disclosure | Code runs with no secrets; tenant scope bound to the session.
LLM03 | Supply Chain | Managed model providers; a fixed in-house tool set with no dynamic, third-party, or MCP tool loading; structured outputs; dependency scanning in the pipeline.
LLM04 | Data & Model Poisoning | Ingested content isolated from instructions; classification is per-record, not training.
LLM05 | Improper Output Handling | Parameterized queries; isolated code execution; human-review-default on outbound. Deterministic output guardrail in progress.
LLM06 | Excessive Agency | Typed least-privilege tools; server-side action allow-list; graduated, revocable autonomy.
LLM07 | System Prompt Leakage | No secrets in prompts; injection-isolation prevents extraction.
LLM08 | Vector & Embedding Weaknesses | Embeddings are tenant-scoped and advisory; no cross-tenant retrieval.
LLM09 | Misinformation | Human-review-default; grounded drafting; verbatim approved responses; output guardrail roadmap.
LLM10 | Unbounded Consumption | Hard execution-time caps; rate limiting on automated actions on the roadmap.

--- Page 12 ---

08 · ASSURANCE

How we keep it true

A summary of the most recent AI penetration test (findings and remediation status) is available alongside this document under NDA.

Adversarial testing. The agent surface is assessed against the OWASP LLM Top 10, combining live black-box probing with source review. The most recent assessment found no open Critical or High severity issue.
Live validation. Injection defenses are verified against the production model, not only by code review (§04).
Least-privilege by construction. Agent actions are authorized server-side, so new capabilities are gated by design rather than by prompt wording.
Defense in depth. Prompt-level isolation reduces the probability of manipulation; isolated execution, tenant binding, and human-in-the-loop bound its impact.
Continuous improvement. An injection and data-extraction evaluation suite as a release gate, a deterministic output guardrail before auto-send, and rate limiting on automated actions are the next investments in this program.
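
To make "least-privilege by construction" concrete, here is a minimal sketch of server-side authorization against a per-goal allow-list, with the tenant taken from the authenticated session as §03 describes. It is an illustration under assumptions, not Sift AI's code: the goal names, action names, and the `authorize`, `Session`, and `ActionRequest` identifiers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical goals and actions; the whitepaper does not publish the real set.
GOAL_ALLOW_LIST: dict[str, frozenset[str]] = {
    "triage": frozenset({"tag", "route"}),
    "support_reply": frozenset({"tag", "route", "draft_reply"}),
}


class ActionDenied(Exception):
    """Raised when a requested action is outside the goal's allow-list."""


@dataclass(frozen=True)
class Session:
    org_id: str  # set by authentication, never by content or caller input


@dataclass(frozen=True)
class ActionRequest:
    goal: str
    action: str
    # Deliberately no org_id field: the model's output cannot name a tenant.


def authorize(session: Session, request: ActionRequest) -> tuple[str, str]:
    """Return (org_id, action) if permitted, else raise ActionDenied."""
    # Unknown goals get an empty allow-list, so they deny by default.
    allowed = GOAL_ALLOW_LIST.get(request.goal, frozenset())
    if request.action not in allowed:
        raise ActionDenied(
            f"{request.action!r} is not allowed for goal {request.goal!r}"
        )
    return (session.org_id, request.action)
```

Because the check runs on the server and the allow-list is data rather than prompt text, an injected instruction that persuades the model to request an extra action still fails at `authorize`.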