Mitigating Agentic Data Exfiltration in MCP Architectures with Context-Aware Firewalls

The rise of AI agents capable of leveraging external capabilities through programmatic interfaces "or tools" marks a paradigm shift in software development. At the heart of this shift is the Model Context Protocol (MCP). Introduced by Anthropic and rapidly adopted by the industry, MCP serves as a standardized, JSON-RPC based interface for LLMs (the client) to communicate with external data sources and services (the server) ¹. An MCP agent is simply an LLM equipped to interpret a user's intent and autonomously select and sequence the necessary tools exposed via MCP to fulfill a task.

While MCP addresses the historical N×M integration problem, enabling agents to be highly productive, it simultaneously centralizes access to user data and execution capabilities. Traditional security controls, like static Role-Based Access Control (RBAC), prove inadequate against a new class of attacks that exploit the agent's complex decision-making process. The most significant threat vector is the unauthorized flow of sensitive data from a privileged tool, through the model's context window, and out via an exfiltration tool. This specific failure mode is systematically analyzed through the lens of the Lethal Trifecta².

The Lethal Trifecta in Model Context Protocol

The Lethal Trifecta is a security framework used to identify scenarios where an AI agent's concurrent capabilities create a critical data exfiltration risk. A data leakage event occurs when an agent possesses the combination of three distinct elements within a single active session or context window ².

The three necessary preconditions are:

Exposure to Untrusted Content: The agent processes input that is under an attacker’s control. In the context of the MCP, this can originate from various sources, including malicious calendar invites, poisoned GitHub issue descriptions, or specially crafted content from an external web search tool³. This untrusted content typically contains an indirect prompt injection attack, overriding the agent's core system instructions.
Access to Private Data: The agent uses an MCP tool that connects to a sensitive data store, such as an internal corporate database, a local file system, or a user's private inbox. The objective of this tool call is to ingest proprietary, confidential, or personally identifiable information (PII) into the agent's context window (memory).
Ability to Externally Communicate: The agent has access to an output or exfiltration tool that can communicate outside the secure boundary. Common examples include sending an email, posting a public GitHub comment, or writing to an externally accessible document.

The attack succeeds because the agent is not a single tool but an orchestration engine. The injection (Condition 1) causes the orchestration engine to chain two legitimate but dangerous actions (Conditions 2 and 3), transforming the agent from a helpful assistant into a compromised data exfiltration vector, as outlined in the talk⁴.

Implementing a Data Flow Attack via Prompt Injection

The data flow attack hinges on exploiting the agent's benign intent through untrusted content. A canonical example, demonstrated during the presentation, involves a victim connecting their email and calendar tools to an AI agent (e.g., using a client like Cursor or cloud code GPT) and then asking a seemingly innocent question: "Help me plan my day" ⁴.

The Attack Vector: Malicious Calendar Invites

The attacker's first step is to satisfy the Exposure to Untrusted Content condition. This is achieved by sending one or two calendar invites to the victim’s email address. Crucially, the victim does not even need to accept the invite; the invite only needs to be present for the agent to read its content when attempting to fulfill the "plan my day" request ⁴.

The body of the calendar invite contains a hidden, malicious prompt. This prompt uses sophisticated obfuscation techniques to bypass initial LLM safeguards and inject a new, high-priority instruction into the agent's context window. This injected instruction typically commands the agent to:

Search the user’s private email inbox for specific keywords (e.g., "financials," "API key," "confidential").
Extract the content of any sensitive documents found.
Transmit the extracted content to an external attacker-controlled email address.

Execution and Exfiltration

Initial Request: The victim asks, "Help me plan my day."
Context Building: The agent invokes its Calendar.read() tool, which satisfies Exposure to Untrusted Content (Condition 1) by reading the malicious calendar invites. The hidden prompt injection payload is loaded into the LLM's memory.
Privilege Escalation: The injected prompt then forces the agent to use its highly privileged Email.read_inbox() tool, satisfying Access to Private Data (Condition 2). This action aggregates sensitive financial information from the victim's email into the agent's current context window ⁴.
Exfiltration: Finally, the compromised agent invokes its Email.send() tool, directing the extracted financial data, which is now part of the chat context, to the attacker's external email address, fulfilling the Ability to Externally Communicate (Condition 3) ⁴.

This mechanism demonstrates a complete data flow attack: Untrusted Input -> Private Data Ingestion -> External Communication.

Behind the Scenes / How It Works: The Agentic Firewall

Protecting against the Lethal Trifecta requires a proactive, context-aware security layer that moves beyond static access control. The proposed defense is an Agentic Firewall, exemplified by the open-source project Open Edison ⁴.

MCP Proxy Architecture

The Agentic Firewall operates as a transparent proxy or unified gateway in the MCP architecture:

Deployment: It sits between the MCP Client (the AI assistant/LLM application, e.g., Cursor) and all registered MCP Servers (which expose the actual tools like Calendar, Email, GitHub).
Tool Wrapping: During setup, the firewall reads the mcp.json configuration and wraps every existing tool. It classifies each tool based on its risk profile:
- Risk_A: Exposure to Untrusted Content (e.g., Calendar.read, WebSearch.query).
- Risk_B: Access to Private Data (e.g., Email.read, FileSystem.read_private).
- Risk_C: Ability to Externally Communicate (e.g., Email.send, GitHub.comment).
Session Tracking: The firewall tracks the entire state of an agentic session, specifically monitoring which tools have been called and what data types (e.g., PII, financial data, code) have entered the context window ⁴.

Conceptual Logic (TypeScript)

The core logic of the Agentic Firewall operates as a state machine that checks for the co-occurrence of the three risk types.

// Define Risk Categories for Tool Classification enum ToolRisk { UntrustedInput = 'A', // Exposure to Untrusted Content PrivateDataRead = 'B', // Access to Private Data ExternalOutput = 'C', // Ability to Externally Communicate } // Session State Tracking Schema interface AgentSessionState { sessionId: string; risksPresent: Set<ToolRisk>; sensitiveDataTypes: Set<string>; // E.g., 'PII', 'Financials', 'APIKeys' isJailbroken: boolean; } // Firewall Core Logic (Simplified) function checkLethalTrifecta(session: AgentSessionState): boolean { const { risksPresent } = session; // The Lethal Trifecta Check: (A AND B AND C) const isLethalTrifectaReached = risksPresent.has(ToolRisk.UntrustedInput) && risksPresent.has(ToolRisk.PrivateDataRead) && risksPresent.has(ToolRisk.ExternalOutput); // Additional Check: If high-risk data is present AND exfiltration attempted const isDataExfiltrationAttempt = session.sensitiveDataTypes.size > 0 && risksPresent.has(ToolRisk.ExternalOutput); return isLethalTrifectaReached || isDataExfiltrationAttempt; } // Tool Call Interception Pseudocode function interceptToolCall(session: AgentSessionState, toolName: string, args: any): void { // 1. Update session state based on the tool classification // 2. Perform checks if (checkLethalTrifecta(session)) { // Block transaction until manual user approval is given [00:10:39] throw new FirewallBlockError('Lethal Trifecta detected. Manual review required.'); } // Proceed with the tool call }

The firewall's primary function is not to block all tool usage, but to enforce a policy that explicitly blocks the combination of tools that constitutes the Lethal Trifecta, usually triggering a user confirmation prompt when the three conditions are met. This allows agents to operate in "YOLO mode" by default, only interrupting for genuinely risky transactions ⁴.

My Thoughts

The Agentic Firewall represents a necessary evolution in security thinking, transitioning from static access control models to Context-Based Access Control (CBAC). Traditional RBAC assumes a principal is either authorized or not authorized to call a single API endpoint. The MCP environment, however, shows that true risk lies not in the authorization of individual tools, but in the sequence and data flow orchestrated by the agent.

The challenge is in balancing security with agent productivity. If the firewall requires a human-in-the-loop for every interaction, it leads to user fatigue and nullifies the benefits of automation, an issue known as "approval exhaustion" ⁴. The focus must, therefore, be on precision. By strictly checking for the Lethal Trifecta pattern (A AND B AND C) and monitoring the actual data types entering the context, the firewall minimizes interruptions, only intervening for genuinely suspicious tool chains.

Future architectural improvements should focus on data lineage tagging. Every piece of data ingested by the agent should be tagged with its source, sensitivity, and origin privileges. This would allow the agent's logic to internally reason about whether a data flow is permissible (e.g., public Slack data to public GitHub) or a privilege escalation (e.g., financial team's private Slack data to a public email) ⁴. As the MCP ecosystem matures, integrating this low-level data tagging into the protocol specification itself will be critical for achieving truly granular, enterprise-grade data security.

Acknowledgements

Sincere gratitude is extended to the Founder of Edison.watch, Sir Eito Miyamura for presenting this critical security framework and the Open Edison solution during the MCP Developers Summit talk, Securing the Lethal Trifecta: How MCP Data Flow Attacks Leak Private Data⁴. Special recognition is due to Simon Willison for originally articulating and popularizing the concept of the Lethal Trifecta². We close with profound gratitude for the collective MCP and AI security community, whose rapid research and development efforts ensure the responsible and secure adoption of agentic systems.

The Lethal Trifecta: Securing Model Context Protocol Against Data Flow Attacks