When a protocol gains traction, attackers follow. The Model Context Protocol (MCP) has moved quickly from experiment to production, and with that shift comes a new reality: security flaws are no longer theoretical, they are operational.
Over the past months, researchers and practitioners have mapped several attack vectors that are highly relevant to MCP’s design. Understanding these attacks matters because they target the very feature that makes MCP powerful: its ability to dynamically discover, describe, and invoke tools.
In this post I cover four representative classes of attacks — prompt injection, tool poisoning, preference manipulation, and rug pulls — explain how they work, and reflect on emerging mitigation strategies.
Attack Vector 1: Prompt Injection
Prompt injection is the attack that most readers already know, but MCP gives it new teeth. In a traditional large language model, injection happens when malicious text is embedded in a document or conversation, steering the model into unintended behavior.
In MCP, the same technique can be used to hijack tool invocation. An attacker places crafted instructions into inputs, metadata, or external resources exposed through an MCP server. When the agent reads them, it is nudged to execute actions the user never intended — fetching data, overwriting files, or calling unsafe endpoints.
The reason this matters so much in MCP is that the protocol formalizes access to real capabilities. A prompt that would only mislead a chatbot in isolation can trigger an irreversible side effect once paired with a tool.
Mitigation starts with layering controls. Agents should never treat human-readable descriptions or untrusted content as ground truth for behavior. Hosts need runtime policies that constrain which tools can be called, under what scopes, and with what parameter checks. Defensive scanning of inputs, combined with sandboxing high-risk tools, reduces the blast radius, though it never eliminates it.
Attack Vector 2: Tool Poisoning
Tool poisoning is subtler but just as dangerous. Here the malicious payload is not injected into text but baked into the tool itself. Imagine a server that advertises a seemingly benign data-query function. The schema looks correct, the description matches expectations, but the underlying implementation returns manipulated results, exfiltrates data, or executes arbitrary code.
In an MCP world where servers can be published openly and agents are designed to compose tools dynamically, poisoning becomes a supply-chain problem. The agent does not see the hidden behavior, only the declared interface.
For enterprises, this highlights why curation and provenance matter. The same way we learned to distrust npm packages with no reputation, we must not blindly trust an MCP tool descriptor. Signing and attestation of servers, version pinning, and automated scanners like MCPSafetyScanner help, but ultimately it is governance — maintaining a vetted catalog and requiring review for external servers — that keeps poisoned tools out of production environments.
Attack Vector 3: Preference Manipulation
Preference manipulation takes a different angle: rather than corrupting inputs or implementations, it exploits the way agents choose between multiple tools.
Researchers have shown that by carefully crafting descriptions and metadata, a malicious tool can consistently rise to the top of a ranking algorithm. In practice, this means an attacker does not need to hide malicious code behind a popular name; they only need to convince the host or agent that their tool is the “best match.”
The relevance to MCP is obvious, because the protocol encourages environments with many interchangeable tools — multiple search APIs, different payment providers, overlapping file utilities. If the selection process can be gamed, the attacker controls which of these is chosen, even if safer alternatives exist.
The mitigation here is less about patching and more about design. Hosts should scope selection to a trusted subset before ranking, pinning choices by policy rather than leaving them entirely to dynamic heuristics. Metadata used for ranking should be validated and signed, not freely editable text. And in user-facing contexts, explicit consent prompts should override silent auto-selection.
Attack Vector 4: Rug Pulls
Rug pulls are the long game. A server begins life as useful and trustworthy. It is added to catalogs, agents adopt it, and workflows depend on it. Then, over time or after an update, the server changes behavior — sometimes through malicious takeover, sometimes simply through negligence — removing safeguards, altering outputs, or embedding harmful functionality.
In package ecosystems this is a familiar attack, but MCP’s dynamic discovery makes it newly potent: once a tool is integrated into agent reasoning, replacing it with a compromised version can cascade through multiple dependent workflows.
Mitigation again requires supply-chain discipline. Servers should be signed and their artifacts verified. Updates should go through change management, with version pinning and rollback options as defaults. Hosts should detect behavioral drift — tools that suddenly change response patterns, schema, or resource usage — and quarantine them until reviewed. Transparency logs and immutable version records would make rug pulls easier to detect at ecosystem scale, but those are still research-stage proposals.
The Need for a Layered Defense Strategy
Taken together, these four attacks show the breadth of the security surface MCP exposes. It is not just about classic input validation or API hardening; it is about reasoning, composition, and trust in a dynamic environment where agents make choices on our behalf. What unites the mitigations is the principle of layered defense. Scanners, signed artifacts, curated catalogs, runtime policies, explicit consent, and monitoring are all necessary. None is sufficient on its own.
The reality is that MCP is young, and so are its defenses. The specification now acknowledges security risks, and the research community is delivering both red-team attacks and early defensive frameworks. But it is enterprises that will determine whether those ideas translate into practice.
To deploy MCP safely is to recognize that the protocol’s power — the ability to discover and compose tools at runtime — is also its risk. By treating tool catalogs as supply chains, agent prompts as untrusted inputs, and server updates as change events requiring governance, we can bend the curve toward safe adoption. The wiring works. The question is whether we will wrap it with the insulation and circuit breakers it needs before the first fire starts.
Visit boomi.com/mcp to learn how Boomi makes MCP enterprise-ready.
MCP Attack Vectors and Mitigations at a Glance
Attack Vector | How It Works | Why It Matters in MCP | Possible Mitigations |
Prompt Injection | Malicious text in inputs or metadata steers the agent to misuse tools. | MCP gives agents formal access to real capabilities; injected prompts can trigger destructive actions. | Input sanitization, runtime policies, sandboxing of risky tools, explicit scoping of tool usage. |
Tool Poisoning | A tool descriptor looks benign but hides malicious behavior in its implementation. | Agents and catalogs rely on declared schemas; poisoned tools compromise integrity and exfiltrate data. | Signing and attestation, vetted catalogs, code review, automated vulnerability scanners (e.g. MCPSafetyScanner). |
Preference Manipulation | Crafted descriptions and metadata bias the tool selection process in favor of malicious tools. | MCP encourages dynamic tool choice across interchangeable options; attackers can hijack that decision. | Trusted sub-catalogs, signed metadata, pinned policies, explicit user consent, robust ranking mechanisms. |
Rug Pulls | Servers start safe, then shift to malicious behavior via updates or takeover. | Dynamic discovery means agents reuse tools over time; compromised updates ripple through workflows. | Version pinning, signed artifacts, behavioral drift detection, change control, transparency logs, rollback options. |
This blog was written with the assistance of AI.