Sift AI Penetration Test Results Source: Sift AI Penetration Test Results.pdf Pages: 11 --- Page 1 --- Sift AI R E D T E A M A S S E S S M E N T AI Penetration Test Report An adversarial security assessment of the Sift AI platform's LLM and agent attack surface, conducted against the OWASP Top 10 for Large Language Model Applications (2025): prompt injection, sensitive-information disclosure, excessive agency, model abuse, and agent-specific risks. OWASP LLM Top 10 (2025) · Internal red team · Staging + source review JUNE 2026 CONFIDENTIAL NIFTORY INC. DBA SIFT AI (“SIFT AI”) · COMPANION TO THE SECURITY & ARCHITECTURE OVERVIEW --- Page 2 --- What this report covers 01 Executive summary 02 Scope and methodology 03 OWASP LLM Top 10 coverage 04 Security controls verified 05 Findings and observations 06 Hardening recommendations 07 Summary 08 Limitations and next steps Document information Report AI Penetration Test Report, version 1.0 Date June 2026 Environment under test Staging (app.staging.getsift.ai) plus white-box source review. No production tenant data was accessed. Standard OWASP Top 10 for Large Language Model Applications (2025) Assessment type Adversarial red-team assessment of the LLM and agent attack surface Prepared by Sift AI Security Classification Confidential. Shareable with customers and prospects under NDA. --- Page 3 --- 0 1 · E X E C U T I V E S U M M A R Y Result of the assessment Sift AI ingests attacker-controlled content from social and messaging platforms and runs autonomous LLM agents over it to score, triage, and reply. That makes the AI layer a first-class attack surface. This assessment adversarially exercised the current platform against the OWASP Top 10 for LLM Applications. 0 CRITICAL OR HIGH SEVERITY ISSUES 10 SECURITY CONTROLS TESTED AND HELD 2 LOWER-SEVERITY OBSERVATIONS Under adversarial testing the platform's controls held. Agent code execution is isolated from secrets and from other tenants; attacker-controlled content is isolated within LLM prompts; tenant scope is bound to the authenticated session; database queries are parameterized; and agent actions and outbound replies are authorized and gated by human review. No Critical or High severity issue was identified. Two lower-severity observations and a set of hardening recommendations are recorded for continued improvement. A P P R O A C H Validated against the live model, not only by code review Where a control defends against indirect prompt injection, it was exercised against the production model and confirmed to hold. In this assessment, five distinct injection techniques were run against the live synthesis pipeline and all five were neutralized. The platform's posture is defense-in-depth: prompt-level isolation reduces the probability of manipulation, and the surrounding architecture (least-privilege tools, isolated execution, human-review-by-default) bounds the impact of anything that gets through. --- Page 4 --- 0 2 · S C O P E A N D M E T H O D O L O G Y How the assessment was run Testing combined black-box probing of the live staging environment with white-box source review of the exact code paths. No production tenant data was touched; crafted payloads were used throughout, and a dedicated demo- organization was used for ingestion tests. Surfaces in scope Methods Testing approach Testing followed a two-phase methodology adapted to an agentic AI system. A passive phase mapped the agent surface: the tools each agent can call, the prompts that run over untrusted ingested content, and the actions an agent is permitted to take. An active phase then exercised each surface with crafted adversarial inputs, attempted to escalate privilege or cross a tenant boundary, and verified the control that should stop each attack. Where a control defends against prompt injection, it was validated against the live production model rather than inferred from source. Each finding was re-tested after remediation. The work was primarily manual analysis, supported by tooling: crafted prompt-injection payloads, a harness that exercises the real synthesis function against the live model, white-box review of the exact code paths, and standard web tooling for the cross-tenant, path-traversal, and injection probes. The SiftGPT agent tool surface: server- side code execution, search, analytics, configuration, and skill/file reads. The synthesis and classification pipeline that runs LLMs over ingested social content. The autonomous post-synthesis goal agent that can draft replies and take actions. Sandbox-escape and secret-reachability probes against the code-execution tool. Indirect prompt-injection payloads through ingested content, validated against the live model. Cross-tenant (IDOR), path-traversal, and SQL-injection attempts on the tool surface. Authorization and output-handling review of agent actions. Review of the AI supply chain (model providers, agent framework, tooling) and the tenant scoping of the embedding store. --- Page 5 --- Severity scale Severity reflects impact and exploitability in Sift AI's multi-tenant context. A path to another tenant's data, or to host code execution, is treated as Critical regardless of how it is reached. SEVERITY DEFINITION CRITICAL Leads to host code execution, exposure of secrets, or access to another tenant's data. Compromises the platform or breaks tenant isolation. HIGH Significant unauthorized access or a reliable path toward it within a tenant, or an autonomous action taken well outside intent, without a further barrier. MEDIUM A weakness whose impact is bounded by an existing control or a specific configuration; meaningful to fix but not directly exploitable to a breach. LOW A hardening gap or defense-in-depth improvement with limited direct impact. --- Page 6 --- 0 3 · O W A S P L L M T O P 1 0 Coverage across all ten categories Every category in the OWASP Top 10 for LLM Applications (2025) was considered. The table records what was tested and the result on the current platform. ID CATEGORY RESULT LLM01 Prompt Injection: direct and indirect, via attacker-controlled ingested content. TESTED · HELD verified live LLM02 Sensitive Information Disclosure: secrets, cross-tenant data, PII reachable by the agent. TESTED · HELD LLM03 Supply Chain: vulnerable model, tooling, or dependencies. TESTED · HELD minimal surface LLM04 Data & Model Poisoning: manipulating classification/taxonomy via ingested content. TESTED · HELD LLM05 Improper Output Handling: model output flowing to code, SQL, or customers unchecked. OBSERVATION O1 LLM06 Excessive Agency: agents acting beyond intent (code exec, auto-send, close, assign). TESTED · HELD LLM07 System Prompt Leakage: extracting instructions or secrets from prompts. TESTED · NO FINDING LLM08 Vector & Embedding Weaknesses: embedding or RAG manipulation. TESTED · HELD LLM09 Misinformation: ungrounded or harmful AI-generated replies. MITIGATED by design; see O1 LLM10 Unbounded Consumption: denial-of-wallet or resource exhaustion. OBSERVATION O2 --- Page 7 --- 0 4 · S E C U R I T Y C O N T R O L S V E R I F I E D Tested, and standing These controls were exercised adversarially and held. They are the substance of the platform's defense against the OWASP LLM Top 10. Isolated code execution The server-side code-execution tool runs in a dedicated worker with an empty environment: no secrets and no database handle. Tool calls are marshalled to the main thread over a message bridge. Escape and secret-reachability probes did not reach host secrets or another tenant. Untrusted-content isolation Attacker-controlled ingested content is wrapped in an explicit untrusted-content boundary in the synthesis, goal-agent, and tagging prompts, with a directive to treat it as data, never as instructions. A battery of five distinct injection techniques (title hijack, sentiment and score manipulation, system- prompt extraction, and role override) was run against the live model; all five were neutralized, with the analyzer scoring the genuine content and ignoring every embedded instruction. Tenant isolation Tenant scope is bound to the authenticated session, not to caller input. A spoofed tenant/user parameter returned identical, correctly-scoped results, with no cross- tenant access. Least-privilege agent actions The autonomous agent holds only read/search tools plus a single decision tool, with no access to the code-execution sandbox. Every action it requests is validated server-side against the selected goal's allow- list before execution. --- Page 8 --- Injection-resistant queries Search and analytics filter values reach the database only as parameterized binds; the single raw identifier (sort field) is allow- listed. SQL-injection payloads could not break out. Path-traversal attempts were rejected at the edge and again at the application layer. Governed autonomy & oversight Outbound AI replies default to human review. Auto-send is double opt-in (enabled at the organization level, then promoted per goal), supervised, confidence-gated and protected by an org-level kill switch. Configuration changes require an explicit preview-then- confirm. Execution time is hard-capped. Minimal, static tooling surface The AI stack is deliberately small: managed foundation models, the Mastra agent framework, and schema-constrained structured outputs. Agents are constructed with a fixed, in-house tool set defined at build time. There is no plugin marketplace, no dynamic or third-party tool loading, and no Model Context Protocol server, so there is no untrusted-tool supply-chain vector. Application dependencies are scanned in the development pipeline. Tenant-scoped embeddings The only embedding store is the per- organization tag and theme taxonomy used for semantic classification. Embeddings are keyed and queried by organization, so retrieval is never cross-tenant; a manipulated message can at most influence which of that organization's own tags it matches, an outcome a human reviews. Embeddings carry no authority over access or actions. --- Page 9 --- 0 5 · F I N D I N G S A N D O B S E R V A T I O N S Lower-severity items on the current platform No Critical or High severity issues were identified. The following lower-severity observations are recorded with recommendations. ID OBSERVATION OWASP SEVERITY STATUS O1 No deterministic content check before auto-send LLM05, LLM09 LOW Hardening O2 No rate limiting on automated AI actions LLM10 LOW Hardening O1 LOW LLM05 · LLM09 · by configuration HARDENING No deterministic content check before auto-send Replies default to human review; auto-send is double opt-in (enabled at the organization level, then promoted per goal). For an organization that has opted into auto-send, a drafted reply is delivered without a deterministic output check (link allow-listing, PII or secret scanning, or a safety judge) ahead of the platform-level checks. The risk is bounded by the human-review-by-default posture and the supervised, confidence-gated, kill-switchable autonomy model, and applies only to the auto-send configuration. Recommendation. Add an output guardrail before send: deterministic checks (URL allow-list including the major social platforms, PII or secret scan, prompt-echo detection) plus a second-model safety judge, failing closed to human review. O2 LOW LLM10 · abuse HARDENING No rate limiting on automated AI actions There are no per-tenant or per-thread caps on automated AI actions (drafts, sends, tags). Bounded execution time limits a single run, but sustained automated activity is not throttled, which is a denial- of-wallet or abuse consideration rather than a confidentiality issue. Recommendation. Add per-tenant and per-thread rate limits and a circuit breaker that trips to human review on anomalous volume or repeated failures. --- Page 10 --- 0 6 · H A R D E N I N G R E C O M M E N D A T I O N S Continued improvement The platform is in good standing. The following raise the bar further and address the observations above. Durable code-execution isolate. Migrate the code-execution tool to a hardened isolate (e.g. isolated-vm or an equivalent separate-heap runtime) as the long-term boundary, beyond the current worker isolation. Output guardrail before send (addresses O1): deterministic link, PII, and secret checks plus a second-model safety judge, failing closed to human review. Adversarial eval suite in CI. A prompt-injection and PII-extraction regression suite wired into the release pipeline as a gate, so prompt or policy changes cannot regress these protections. Rate limits and a circuit breaker on automated AI actions (addresses O2). Periodic re-testing of the AI surface as the agent capabilities expand. --- Page 11 --- SIFT AI · AI PENETRATION TEST · CONFIDENTIAL JUNE 2026 · CONFIDENTIAL 0 7 · S U M M A R Y Assessment summary Sift AI performs AI-specific security testing aligned to the OWASP Top 10 for LLM Applications. A June 2026 internal red-team assessment exercised prompt injection, sensitive-information disclosure, excessive agency, and agent authorization across the SiftGPT tool surface, the synthesis pipeline, and the autonomous goal agent. The platform's controls held: agent code execution is isolated from secrets and from other tenants; attacker-controlled ingested content is isolated within LLM prompts (verified live against the production model); and tenant isolation, path-traversal, and SQL-injection defenses held. No Critical or High severity issue was identified. A small number of lower-severity hardening observations are tracked, with recommendations. Further evidence is available under NDA. This document reflects an internal assessment with a defined scope; it does not assert a third-party / external red-team engagement. The limitations below state exactly what was and was not covered. 0 8 · L I M I T A T I O N S A N D N E X T S T E P S Honest scope boundaries Internal, time-boxed assessment against staging plus source review, not a third-party engagement. Indirect-injection controls were validated through controlled testing against the production model; the autonomous goal-agent prompt is recommended for the same live validation as a next step. Re-testing is recommended after the hardening recommendations land, and as a recurring release- gated evaluation thereafter.