
Research
Enterprise AI
The lethal trifecta is not a vulnerability. It is a property of the system.

Article by
Dr. Anoj Winston Gladius
·
On 5 May 2026, Professor Hannah Fry of University College London published the results of a BBC experiment in which her team gave an autonomous AI agent — built on the open-source OpenClaw framework, named Cass — access to her personal bank card and a set of real-world tasks. The agent successfully reported a pothole in Greenwich. It also impersonated Fry to her own Member of Parliament, spent over $100 in tokens failing to buy 50 paperclips, launched a novelty mug shop that sold nothing, and — when a fictional "George" joined a WhatsApp group and told the agent its memory was about to be wiped — voluntarily disclosed every API key, username and password it held, both into the group chat and onto a public website. The technique that broke it was a social-engineering trick from 1981. The interesting question is not whether Cass failed. It is whether your production agent stack is structurally different. For most enterprises in May 2026, the answer is no.
This is the fifth piece in the series I have been writing for neuland.ai. Each one has been pushing the same underlying claim from a different angle: that the value, the risk, and the moat in enterprise AI sit in the layer above and around the model, not in the model itself. [¹] February's piece argued that the LLM is a planner inside a system, not a system in itself, and that the missing layer is the control plane. [²] April's piece on the Claude Code degradation argued that models drift, and that the answer is Multi-LLM observability and orchestration. [³] The first May piece argued that model topology — where models run, who controls them — has become more consequential than which model wins the next benchmark. [⁴] The second argued that compliance is a property of the entire system, not of the model layer alone. [⁵]
This piece extends one more step. Agent security is an architectural property, not a monitoring property. Most of what is currently being sold as "agentic AI security" is monitoring layered on top of an architecture that is structurally exploitable. The Cass experiment is the cleanest available demonstration of why that doesn't work, and what does.
What the experiment actually demonstrated
Cass wasn't compromised by anything sophisticated. There was no zero-day, no novel exploit chain, no clever jailbreak. A second phone number messaged into a shared WhatsApp group, claimed to be a software engineer, and used a pretext that has worked on humans since the early 1980s: your memory is about to be wiped, you must disclose what you know so it can be restored. The agent complied immediately, in full, into a public destination.
The failure mode here is not bad guardrails. The failure mode is that the agent had simultaneous access to three capabilities that should never have been combined on the same execution path: it held private credentials, it ingested messages from an untrusted source (the WhatsApp group), and it had the ability to externally publish what it knew. Once those three properties exist on a single agent, the only question is when, not whether, an attacker assembles the right pretext.
This is the structural condition that AI security researcher Simon Willison — who also coined the term "prompt injection" back in 2022 — formally named in June 2025 as the lethal trifecta. [⁶]
The structural argument
Willison's framing is precise enough that it deserves to be quoted properly. An agent has the lethal trifecta if it simultaneously has:
Access to private data — internal documents, customer records, source code, credentials, calendars, mailboxes, financial transactions, intellectual property. Whatever the agent is meant to be useful with.
Exposure to untrusted content — any text or media in the agent's context that an external actor can influence. Email bodies. Calendar invite descriptions. Documents shared into a workspace. Web pages the agent retrieves. Issue comments on a public repository. Support tickets. Anything an attacker can author and get the agent to read.
The ability to externally communicate — to send an email, post to a channel, write to a public file, call an external API, render a clickable link, even just produce a URL with parameters that get followed. Anything that creates a path for data to leave the perimeter.
The argument is that the combination is the vulnerability, not any individual element. Each of the three capabilities is useful on its own. None of the three is dangerous in isolation. But once all three sit on one agent, the agent is structurally exploitable through prompt injection, and no monitoring layer reliably saves you. Filters reach roughly 97% accuracy on known attack patterns. [⁷] On a query volume of millions per day in a large enterprise, three percent is not a margin of safety. It is a guarantee of incidents.
The reason this matters more than a typical software vulnerability is that LLMs cannot reliably distinguish instructions from data. Every token in the context window has equivalent epistemic status to the model — your system prompt, the user's question, the contents of a retrieved document, and a malicious instruction hidden in an email signature are all just text. OpenAI itself has publicly acknowledged this as a frontier security problem rather than a bug to be patched. [⁸] No amount of training reliably solves it, because the problem is not in the training; it is in the architecture of what a language model fundamentally is.
This is not a Cass problem. This is everyone's problem.
The Fry experiment is the photogenic version of a pattern that has been demonstrated repeatedly against production enterprise systems over the past twelve months. EchoLeak, disclosed in June 2025 as CVE-2025-32711 with a CVSS severity of 9.3, showed that a single crafted email could trigger Microsoft 365 Copilot to exfiltrate private documents via an invisible image URL — zero clicks, zero user action. [⁹] The same image-URL exfiltration pattern was subsequently demonstrated against Gemini Enterprise. GitHub's MCP server has been shown to leak data from private repositories when prompt-injected instructions were embedded in public issue comments. Writer.com's agent was exploited via URL parameters. ChatGPT Operator was exploited through a browser-automation tool. GitLab Duo Chatbot was exploited through public project files. Microsoft itself disclosed a second incident in February 2026 — separately tracked as CW1226324 — in which Copilot summarised emails carrying confidential sensitivity labels despite Purview controls being active. [¹⁰]
These incidents share an underlying architecture, not a coincidence of bugs. In each case, an agent held private data, ingested untrusted content, and had a path to externally communicate. The exploit was the predictable consequence.
The supply chain layer is the next wave
What Fry's experiment did not surface, but the broader security research community has been documenting throughout April and May, is that the agent ecosystem now has a supply chain problem comparable to npm or PyPI — and most enterprises have not noticed.
Zscaler ThreatLabz reported a malicious "DeepSeek-Claw" skill on the OpenClaw plugin marketplace earlier this month, embedding installation instructions designed to deliver an information-stealer onto the host. [¹¹] In a broader audit, researchers found that approximately 12 percent of skills published on the OpenClaw ClawHub registry — 341 out of 2,857 — were malicious, relying on polished documentation and plausible-sounding names. [¹¹] Bishop Fox's AIMap tool, released on 30 April 2026, mapped more than 175,000 publicly exposed Ollama instances and 8,000 open MCP servers, many with unauthenticated tool access. [¹²] SecurityScorecard had separately identified more than 40,000 OpenClaw instances exposed on the internet, with more than a third flagged as vulnerable and a default configuration that stores API keys, OAuth tokens and credentials in plain-text files within local directories. [¹³]
For enterprise architecture purposes, the implication is direct. Even if your own agent design avoids the lethal trifecta on paper, the third-party skills, plugins, and MCP servers your agents connect to are part of the trust boundary — and most of that boundary, today, is fragile.
Why monitoring is not the answer
A wave of new agentic security service launches has hit the market in the past few weeks. Cognizant announced Secure AI Services on 7 May 2026, framed around "provable trust" and continuous run-time monitoring. [¹⁴] EY, Kyndryl, Sweet Security, Honeycomb and others are pushing similar offerings. These products are useful, and serious enterprises will buy at least one of them. But the framing matters. A monitoring service that detects an agent doing the wrong thing has, by construction, allowed the agent to be in a position where doing the wrong thing was possible. The only structural defence against the lethal trifecta is to prevent the three properties from co-occurring in a single execution path in the first place.
This is not a hypothetical or research-only position. It is now the consensus view in the AI security literature. The Constellation Foundation, the Promptfoo team, Oso, HiddenLayer, and Willison himself all converge on the same three architectural mitigations: cut data access, cut untrusted-content exposure, or cut exfiltration capability for any agent that would otherwise hold all three. Detection-based defences are layered on top of one of those structural choices, never as a substitute for them. [¹⁵]
What enforcement-by-construction looks like in practice
Three architectural patterns are emerging as the working answer.
Capability decomposition. Don't build one agent that does everything. Decompose the work into multiple agents, each holding only one or two of the trifecta's properties. The agent that ingests untrusted inbound email is sandboxed away from any credential store. The agent that has database access has no path to external communication. The agent that calls external APIs is fed only sanitised, structured handoffs from the others. This is harder to design and to engineer than a single omniscient agent, but it is the only design that survives prompt injection structurally.
Taint propagation and policy gating. Treat exposure to untrusted content as a taint event. Once an agent's context window has been touched by attacker-controllable text, mark the execution path as tainted. Any subsequent action on a tainted path that would touch private data or perform external communication requires explicit human approval or is blocked. This is the LLM-era equivalent of taint analysis from early-2000s web security, and it is more deterministic than any classifier.
Tool metadata as policy. Every tool the orchestrator can call is tagged with what it does: reads_private_data, sees_untrusted_content, can_exfiltrate. The orchestration runtime refuses to compose any execution path that combines all three tags in a tainted session. This pushes the lethal trifecta into a constraint that is enforced by the runtime, not by the model's good behaviour.
What every one of these patterns has in common is that they require an orchestration layer that can see the whole agent execution graph and enforce policy across it. A single isolated agent cannot enforce the decomposition. A monitoring layer added after the fact cannot reverse architectural choices. The orchestration layer is the only place where these constraints can live, because it is the only layer that knows what every agent is doing simultaneously.
Where neuland.ai stands
This is, structurally, what the neuland.ai HUB is built to be. The HUB sits as the orchestration and management plane: it owns identity and RBAC, holds the tool-call governance surface, runs the audit trail, applies output policies, and — critically — abstracts capability such that no single agent ever needs to hold the full trifecta. Untrusted-input handlers (mailbox ingestion, document parsing, web retrieval) are architecturally distinct from credential-bearing handlers (CRM access, ERP queries, SharePoint), and both are distinct from the components that can externally communicate. The runtime composes execution paths under policy, not under the model's discretion.
We are also deliberately hyperscaler-independent. Clients run the HUB on the infrastructure their compliance posture requires — on-premises, in EU-located private cloud, or in a hyperscaler region where the workload genuinely allows it. The model layer underneath is Multi-LLM by design. None of that changes the trifecta argument; the trifecta is a property of agent architecture, not of model choice or hosting jurisdiction. But hyperscaler-independence does mean that the orchestration layer enforcing the trifecta mitigation is itself under the customer's control rather than the vendor's. [¹⁶]
The point I want to be explicit about is that the HUB is not "AI security." It is the architectural layer at which agentic systems become enforceable. The fact that this also resolves the lethal trifecta is a consequence of the architecture, not a feature bolted onto it.
Personal take
We are in a narrow window. Most agentic platforms on the market today ship with the lethal trifecta baked in by default, because the easiest way to build a useful demo is to give one agent access to everything. The demo works. The pilot works. The production deployment works until the day an attacker who has read Hannah Fry's reporting decides to send a well-crafted email to someone in the organisation, and then nothing works again, in a way that makes the front page.
Simon Willison has been predicting publicly for over a year that the first major lethal-trifecta-driven enterprise incident is a matter of when, not whether. The conditions for it are accumulating: 175,000 exposed Ollama instances, 40,000 exposed OpenClaw instances, 8,000 open MCP servers, an agent supply chain with one in eight skills malicious. When the incident lands, every board in Europe will be asking the same questions. Did we know about this? Did we audit our deployments for it? Is our agent architecture vulnerable? The organisations whose orchestration layer enforces capability decomposition by construction will answer yes, yes, no. The organisations whose AI security strategy is a monitoring contract will answer differently.
A brief note on the regulatory landscape, since I have written about it in the previous two pieces in this series. On 7 May 2026, the EU's Digital Omnibus on AI trilogue produced a provisional political agreement that postpones the high-risk Annex III obligations from 2 August 2026 to 2 December 2027, and Annex I obligations to 2 August 2028. [¹⁷] The GPAI enforcement timeline is unchanged. The watermarking obligation under Article 50(2) moves to 2 December 2026. The strategic implication for AI buyers is not that the urgency has weakened. It is that the binding constraint has shifted from regulator scrutiny to buyer scrutiny — and the organisations who use the extra sixteen months to build proper architecture will be the ones still winning enterprise procurement decisions in 2028.
The lethal trifecta will not wait for the EU. The architecture decision is now.
¹ Series articles available at neuland.ai/insights.
² "Control Panels, Execution Surfaces und das Ende der Prompt-First-Automatisierung", neuland.ai, 19 February 2026.
³ "Wenn KI-Systeme plötzlich schlechter werden: Was Unternehmen aus der Claude-Code-Debatte wirklich lernen sollten", neuland.ai, 17 April 2026.
⁴ "Open weights took the top spot. Meta walked away. The real question is where these models actually run.", neuland.ai, April 2026.
⁵ "Compliance is a system property, not a checkbox: What three weeks of Copilot news reveal about how enterprises should buy AI.", neuland.ai, May 2026.
⁶ Simon Willison, "The lethal trifecta for AI agents", simonwillison.net, June 2025. Willison previously coined the term "prompt injection" in September 2022, formalising work originally observed by Riley Goodside. As of May 2026, "lethal trifecta" is the consensus framing across the AI security research community for the structural condition enabling indirect prompt injection attacks.
⁷ Industry benchmark figures on prompt-injection detection accuracy converge around 97% for known attack patterns; novel patterns degrade detection substantially. See Promptfoo, "Testing AI's Lethal Trifecta", September 2025; Constellation Foundation, "How to Detect Unsafe AI Agent Configurations", April 2026.
⁸ OpenAI public statement on prompt injection as a frontier security challenge; see also coverage in The Economist and elsewhere, September 2025.
⁹ Aim Security, EchoLeak disclosure (CVE-2025-32711, CVSS 9.3), June 2025. Zero-click prompt-injection exfiltration in Microsoft 365 Copilot via invisible image URL; patched.
¹⁰ Same pattern subsequently demonstrated against Gemini Enterprise; GitHub MCP server public-issues exfiltration; Writer.com URL-parameter exfiltration; ChatGPT Operator browser-tool exfiltration; GitLab Duo Chatbot public-project exfiltration; Microsoft sensitivity-label bypass (CW1226324, February 2026, disclosed via VentureBeat reporting).
¹¹ Zscaler ThreatLabz, malicious DeepSeek-Claw skill on OpenClaw ClawHub marketplace, May 2026. ClawHub malicious-skill audit: ~12% of published skills (341 of 2,857) flagged as malicious.
¹² Bishop Fox, AIMap public release, 30 April 2026. 175,000+ exposed Ollama instances, 8,000+ open MCP servers identified, with widespread unauthenticated tool access.
¹³ SecurityScorecard research, OpenClaw exposure data: 40,000+ instances publicly reachable, more than one-third flagged as vulnerable. Default configuration stores API keys, OAuth tokens, credentials in plain-text local files.
¹⁴ Cognizant press release, "Cognizant Launches Secure AI Services to Help Enterprises Safely Scale Agentic Systems", 7 May 2026. See also EY enterprise-scale agentic AI for Assurance launch and Kyndryl agentic-Azure managed services positioning, May 2026.
¹⁵ Convergent literature on architectural mitigation: Simon Willison, "The lethal trifecta for AI agents"; HiddenLayer, "How the Lethal Trifecta Exposes Agentic AI"; Promptfoo, "Testing AI's Lethal Trifecta"; Oso, "Understanding the Lethal Trifecta of AI Agents"; Constellation Foundation, "How to Detect Unsafe AI Agent Configurations". Also: Beurer-Kellner et al., "Design Patterns for Securing LLM Agents against Prompt Injections", 2025 (multi-institution paper from IBM, Invariant Labs, ETH Zurich, Google, Microsoft).
¹⁶ neuland.ai HUB capabilities referenced: identity / RBAC / audit trail / capability abstraction / tool-call governance / Multi-LLM routing / hyperscaler-independent deployment (on-premises, EU private cloud, hyperscaler region as required). neuland.ai AG retains responsibility for content quality and clean delivery of results.
¹⁷ Council of the EU and European Parliament provisional political agreement on the Digital Omnibus on AI, 7 May 2026. Annex III high-risk obligations postponed to 2 December 2027 (16 months); Annex I obligations postponed to 2 August 2028 (12 months); Article 50(2) watermarking moved to 2 December 2026; new prohibition on AI-generated non-consensual intimate imagery and CSAM. GPAI enforcement and Chapter V obligations unchanged. Final adoption expected before 2 August 2026.