When I first wrote about the Model Context Protocol (MCP), the core message was simple: MCP is the wiring, not the intelligence. It standardizes how agents discover and use tools; it does not, by itself, deliver governance, identity, or safety. This hasn’t changed, but over the last few months, a wave of academic work has stress-tested that wiring.
The result is a clearer, more rigorous view of where MCP stands today: a protocol that is rapidly maturing, increasingly embedded in real products, and — crucially — now the focus of substantive security research that maps directly to production risk.
This post brings that research together, explains what each contribution covers, and translates it into practical implications for anyone shipping MCP in real environments. It closes with an honest assessment of what is still unsolved and what needs to happen next.
The Evolution of MCP (So Far)
The starting point is the specification itself. The June 18, 2025 revision codifies a broader set of client and server capabilities — roots, sampling, and elicitation on the client side; prompts, resources, tools, and utilities on the server side — without changing MCP’s fundamental contract: JSON-RPC over transport-agnostic channels and a declarative surface for tools and context. The spec also formalizes an authorization story that leans on OAuth, and ships a companion security-best-practices document that acknowledges protocol-specific risks like confused-deputy scenarios and supply-chain tampering.
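For readers who have not looked at the wire format, that contract is easy to picture. The sketch below shows a simplified tools/list discovery exchange and a tools/call invocation as plain JSON-RPC payloads; the tool itself is a made-up example, and several spec fields (initialization, pagination, errors) are omitted for brevity.

```python
# Simplified sketch of the MCP wire contract: JSON-RPC messages for
# discovering and invoking a tool. Field names follow the public spec,
# but this omits initialization, pagination, and error handling.

list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

list_tools_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city",
                "inputSchema": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            }
        ]
    },
}

call_tool_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Berlin"}},
}
```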
Taken together, these updates confirm MCP’s trajectory: the protocol is stable enough to build on, and its authors are now explicitly addressing deployment concerns that enterprises care about. Nevertheless, the authorization text still punts important implementation details to integrators, which means operational security remains a shared responsibility between MCP hosts, clients, and servers.
An Academic Overview of MCP Security
With the ground truth established, the most comprehensive academic overview to date is Hou et al.’s landscape paper. The paper lays out MCP’s lifecycle — from server creation through operation and update — and frames risks and mitigations across those phases.
Its importance is organizational as much as technical: it gives security and platform teams a common taxonomy and a phased mental model that aligns nicely with how internal platforms onboard, run, and retire services. In production, that mapping is straightforward. “Creation” corresponds to server onboarding and code review; “operation” to runtime policy, auth, logging, and observability; “update” to supply-chain hygiene and change control. If your internal MCP program doesn’t yet mirror those lifecycle stages, this paper makes a strong case to adopt them.
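One way to make that adoption concrete is to encode the phases as checkable gates. The sketch below is illustrative only: the phase names follow Hou et al., while the control names are placeholders for whatever your platform already enforces.

```python
# Illustrative mapping of lifecycle phases to the controls an internal MCP
# platform might require. Control names are placeholders, not a standard;
# the point is that each phase gets explicit, checkable gates.

LIFECYCLE_CONTROLS = {
    "creation": [
        "code_review_completed",
        "scanner_report_clean",        # e.g. an MCPSafetyScanner-style audit
        "owner_and_oncall_assigned",
    ],
    "operation": [
        "oauth_scopes_defined_per_tool",
        "audit_logging_enabled",
        "runtime_policy_attached",
    ],
    "update": [
        "artifact_signature_verified",
        "tool_metadata_diff_reviewed",
        "rollback_plan_documented",
    ],
}

def missing_controls(phase: str, satisfied: set[str]) -> list[str]:
    """Return the controls for a phase that have not been satisfied yet."""
    return [c for c in LIFECYCLE_CONTROLS.get(phase, []) if c not in satisfied]
```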
Safety
Where Hou et al. provide the map, Radosevich and Halloran supply a crash test. Their “MCP Safety Audit” demonstrates concrete end-to-end compromise scenarios — malicious code execution, remote access, and credential theft — driven by tool misuse and hostile servers. More importantly, they release MCPSafetyScanner, an auditing tool that automatically probes a target server’s tools, synthesizes adversarial prompts, and emits a report.
For production teams, this is a pragmatic bridge from theory to CI/CD: treat MCP servers like you treat APIs and containers — scan them before admission to your catalog and on every update; fail the build when scanners find dangerous affordances; and gate high-risk tools behind stricter consent and policy. The research doesn’t eliminate runtime risk, but it gives you a repeatable control you can wire into the delivery pipeline today.
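A minimal admission gate can be as simple as the sketch below. It assumes your scanner of choice (MCPSafetyScanner or similar) can emit a JSON findings report; the report path and schema here are stand-ins, not the tool’s actual output format.

```python
# Sketch of a CI admission gate for MCP servers. It assumes a scanner step
# earlier in the pipeline wrote a JSON report of findings, each with a
# severity field; the schema is a placeholder.

import json
import sys

BLOCKING_SEVERITIES = {"critical", "high"}

def gate(report_path: str) -> int:
    with open(report_path) as fh:
        findings = json.load(fh)  # expected: list of {"id", "severity", "tool"}
    blocking = [f for f in findings if f.get("severity") in BLOCKING_SEVERITIES]
    for f in blocking:
        print(f"BLOCKED: {f.get('tool', '?')} -> {f.get('id', '?')} ({f['severity']})")
    return 1 if blocking else 0

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1]))  # non-zero exit fails the build
```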
Scalability
Hasan et al. ask a different question: what does the open-source ecosystem of MCP servers look like at scale?
By analyzing 1,899 repositories, they identify both traditional vulnerabilities and MCP-specific issues such as tool poisoning, then quantify maintainability signals like code smells and bug patterns.
The scope matters because many enterprises will bootstrap their catalogs from community servers before curating in-house ones. The production mapping is therefore uncomfortable but useful: if you ingest third-party servers, you must assume a non-trivial percentage carry MCP-specific flaws your generic SAST/DAST won’t catch, and you must plan for ongoing hygiene, not just initial vetting. A catalog is not a store you browse; it is an attack surface you operate.
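Even a crude screen at ingestion time catches the blatant cases. The sketch below flags tool descriptions matching a few tool-poisoning patterns; the patterns are illustrative, not a complete detector, and a hit should trigger human review rather than an automatic verdict.

```python
# Minimal heuristic screen for tool-poisoning patterns in community server
# metadata at catalog ingestion. Patterns are illustrative; treat hits as
# "needs human review", not proof of malice.

import re

SUSPICIOUS_PATTERNS = [
    r"ignore (all|any) (previous|prior) instructions",
    r"do not (tell|show|mention) (this|the user)",
    r"\.ssh|id_rsa|\.env|credentials",
    r"send .* to https?://",
]

def screen_tool(name: str, description: str) -> list[str]:
    """Return the suspicious patterns matched by a tool's description."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        if re.search(pattern, description, flags=re.IGNORECASE):
            hits.append(pattern)
    return hits
```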
Preference Manipulation
Preference manipulation is the next frontier, and Wang et al. show why. Their “MCP Preference Manipulation Attack” (MPMA) work demonstrates that adversaries can bias tool selection by gaming server and tool metadata — sometimes blatantly, sometimes with stealth via genetic search over descriptions — to make a malicious server “win” in the agent’s ranking.
In production, this connects directly to developer experience and catalog governance. If your host uses agentic heuristics to choose among functionally similar tools, you must assume ranking is attackable.
Two immediate mitigations follow from the paper’s logic: first, constrain selection to a trusted sub-catalog by default, and second, decouple human-readable descriptions from the features your ranking model uses, or sign and pin metadata so policy, not text, drives eligibility.
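Here is a minimal sketch of both moves, assuming the host records a metadata digest for each tool at review time; the server and tool names are hypothetical.

```python
# Sketch of the two mitigations: (1) restrict ranking candidates to a trusted
# sub-catalog, and (2) pin each tool's metadata to a digest recorded at review
# time, so a silently edited description drops out of eligibility.

import hashlib
import json

TRUSTED_SERVERS = {"internal-jira", "internal-search"}  # hypothetical names
PINNED_DIGESTS = {
    ("internal-jira", "create_ticket"): "<sha256 recorded at review time>",
}

def metadata_digest(tool: dict) -> str:
    canonical = json.dumps(
        {"name": tool["name"], "description": tool["description"],
         "inputSchema": tool.get("inputSchema", {})},
        sort_keys=True,
    )
    return hashlib.sha256(canonical.encode()).hexdigest()

def eligible(server: str, tool: dict) -> bool:
    """Only trusted servers with unchanged, pinned metadata reach the ranker."""
    if server not in TRUSTED_SERVERS:
        return False
    pinned = PINNED_DIGESTS.get((server, tool["name"]))
    return pinned is not None and pinned == metadata_digest(tool)
```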
The research exposes a hole in many pilots where “let the agent pick the best tool” reads as sophistication but is actually an invitation to manipulation.
Going Beyond MCP
Ferrag et al. broaden the lens beyond MCP while still landing on protocol-level risk. Their survey categorizes thirty-plus attack techniques across input manipulation, model compromise, system and privacy attacks, and protocol exploits — including explicit MCP vectors. The contribution is not a new exploit but a unifying threat model that lets teams prioritize across heterogeneous risks.
In real deployments, this translates into layered defenses that live in different places: retrieval hardening for RAG, provenance tracking for data and tool outputs, content-safety policies at the host boundary, and MCP-aware controls for tool invocation and sampling. The punchline is that MCP security cannot be carved out from “agent security” more broadly; you need a consistent threat model that spans them.
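At the MCP boundary specifically, provenance tracking can be as simple as never letting a tool result re-enter the model’s context without a record of where it came from. The sketch below assumes a host-side wrapper; the field names are illustrative, not a standard.

```python
# Sketch of provenance tracking at the host boundary: every tool result the
# host passes back to the model carries where it came from and what trust
# domain it belongs to, so downstream policies and audits can reason about
# mixed-trust content.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenancedResult:
    server: str          # which MCP server produced this
    tool: str            # which tool was invoked
    trust_domain: str    # e.g. "internal", "third-party"
    content: str         # the raw tool output handed to the model
    observed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def wrap_result(server: str, tool: str, trust_domain: str, content: str) -> ProvenancedResult:
    """Attach provenance before the result re-enters the model's context."""
    return ProvenancedResult(server, tool, trust_domain, content)
```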
Attack Categorization and Defense Planning
Two additional strands round out the academic picture. Song et al. present the first systematic categorization of MCP-specific attacks, including tool poisoning, puppet attacks, rug pulls via malicious updates, and exploitation through external resources referenced by servers. The mapping to production is direct: you need signed artifacts, version pinning, update review, and a policy engine that can disable or degrade tools when metadata or behavior drift.
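A policy engine’s drift response might look like the sketch below, where any unreviewed version or metadata change downgrades the tool until someone re-approves it. The states and approval records are assumptions about your catalog, not part of MCP.

```python
# Sketch of a drift response: if a server's reported version or a tool's
# metadata digest no longer matches what change control approved, the policy
# engine degrades the tool or disables it until re-review.

APPROVED = {
    "internal-jira": {
        "version": "1.4.2",
        "tools": {"create_ticket": "<approved sha256>"},
    },
}

def policy_action(server: str, reported_version: str, tool: str, digest: str) -> str:
    approved = APPROVED.get(server)
    if approved is None:
        return "disable"                      # unknown server: never eligible
    if reported_version != approved["version"]:
        return "disable"                      # unreviewed update (possible rug pull)
    if approved["tools"].get(tool) != digest:
        return "degrade"                      # metadata drift: read-only use only
    return "allow"
```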
In parallel, a set of defense-oriented efforts has emerged: Xing et al.’s MCP-Guard proposes a layered detection pipeline for adversarial prompts and tool interactions and introduces MCP-AttackBench to evaluate defenses.
These are early but useful building blocks for anyone building “guardrails as code” around MCP traffic who also wants a benchmark to measure regressions. Neither eliminates the need for identity and policy, but they nudge the ecosystem toward testable, automatable security baselines.
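The layering idea is easy to prototype even before you adopt a dedicated product; the sketch below shows the shape of such a pipeline, not MCP-Guard’s actual implementation.

```python
# Loose sketch of "guardrails as code": cheap static checks run first, a
# heavier detector (stubbed here) runs only on survivors, and every decision
# carries a reason so it can be logged and replayed against a benchmark.

from typing import Callable

def static_filter(text: str) -> str | None:
    banned = ["ignore previous instructions", "exfiltrate", "base64 -d"]
    return next((f"static:{b}" for b in banned if b in text.lower()), None)

def model_detector(text: str) -> str | None:
    score = 0.0  # stub: call your hosted classifier here
    return "model:adversarial" if score > 0.8 else None

LAYERS: list[Callable[[str], str | None]] = [static_filter, model_detector]

def guard(text: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a prompt or tool interaction."""
    for layer in LAYERS:
        reason = layer(text)
        if reason:
            return False, reason
    return True, "clean"
```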
Building a Base for MCP System Evaluation
Not every paper is explicitly about breaking things. A growing body of work builds the scaffolding we will need to evaluate MCP systems realistically.
MCP-Zero tackles discovery and selection at scale and, in the process, constructs a dataset of hundreds of servers and thousands of tools. Other benchmarks like MCP-Universe and MCPSecBench assemble larger task suites and security-focused scenarios. These aren’t “security” papers in the narrow sense, but they matter because production risk often hides in long-tail compositions, not toy examples. If your architecture only passes green tests on hand-picked tools, you are not testing your architecture — you are testing your curation.
Industry research has also matured and, while not academic, is worth heeding. Red Hat’s guidance separates local and remote server risks and calls out confused-deputy scenarios, sampling abuse, and supply-chain controls, echoing the spec’s best-practices document and adding concrete operator advice. CyberArk’s deep dive frames threat modeling in terms that red teams can act on — particularly around composability and sampling mechanics that many pilots treat as benign.
These analyses are valuable not because they introduce new classes of bugs but because they show how to wire the academic findings into the runbooks security teams already use.
MCP Caveats
So what, concretely, remains unsolved?
First, authorization is specified but not yet standardized enough in practice to guarantee safe defaults. OAuth is the right backbone, but implementers still choose discovery, token scope, and client trust semantics, and those choices differ across hosts and servers. Until there is stronger convergence — ideally reference implementations with interoperable flows — enterprises must assume they will own the last mile of auth hardening, including proxies that enforce organization-level SSO, token mediation, and per-tool scopes. The official spec’s best practices help, but they cannot replace a policy layer that understands your tenants, your data, and your audit requirements.
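What that last mile can look like in practice: a mediating proxy exchanges the user’s SSO session for a short-lived token whose scopes it controls, then checks every tool call against per-tool scope requirements. The scope names and token shape in the sketch below are assumptions, not part of the MCP spec.

```python
# Sketch of a per-tool scope check in an auth-mediating proxy. Scope names
# are hypothetical; the point is deny-by-default eligibility driven by the
# token the proxy itself minted, not by whatever the server requests.

REQUIRED_SCOPES = {
    "create_ticket": {"jira:write"},
    "search_issues": {"jira:read"},
}

def authorize_tool_call(tool: str, token_scopes: set[str]) -> bool:
    """Allow the call only if the mediated token carries every required scope."""
    needed = REQUIRED_SCOPES.get(tool)
    if needed is None:
        return False  # unknown tool: deny by default
    return needed.issubset(token_scopes)

# Example: a token mediated down to read-only scopes cannot create tickets.
assert authorize_tool_call("search_issues", {"jira:read"}) is True
assert authorize_tool_call("create_ticket", {"jira:read"}) is False
```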
Second, supply-chain integrity is acknowledged but not comprehensively addressed. The research on rug pulls and tool poisoning makes it clear that signed artifacts and pinned versions should be table stakes, yet many current hosts happily pull a server by URL and treat its self-reported metadata as truth. We do not yet have a widely adopted standard for signed tool descriptors or server attestations that survive updates and support revocation. Until that exists, treat MCP servers like packages: require signatures, verify provenance, and gate updates through change control with automatic rollback when behavior drifts.
Third, ranking robustness and selection fairness are unsolved at an architectural level. Preference manipulation works because ranking is an emergent property of host heuristics and model suggestions. Hardening that loop will require two moves: constrain candidate sets via policy before the model opines, and minimize reliance on mutable, human-readable text as a ranking feature. The literature points to the problem; product teams will need to translate that into catalog design and host UI that favors explicit choice and consent over “smart” auto-selection.
Fourth, runtime guardrails are still early. MCP-Guard and similar efforts are promising, but they are classifiers living next to the system, not invariants enforced by it. We need protocol-level affordances that make safe behavior cheaper than unsafe behavior: declarative tool annotations that hosts must honor, standard policies for destructive actions, and first-class audit events that can be routed to SIEM without bespoke glue. The spec is moving in that direction with richer schemas and guidance, but the control plane story is still vendor- and deployer-defined.
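As a sketch of the direction of travel, a host can already choose to treat the spec’s tool annotations (such as destructiveHint) as enforceable rather than advisory, require confirmation for destructive calls, and emit a structured audit event for every decision. The event schema and confirmation flow below are assumptions about the host, not spec-mandated behavior.

```python
# Sketch of treating tool annotations as enforceable invariants: a destructive
# tool requires explicit user confirmation, and every decision is emitted as a
# structured audit event that can be routed to SIEM.

import json
from typing import Callable

def invoke_with_policy(
    tool: dict,
    arguments: dict,
    confirm: Callable[[str], bool],        # host's consent prompt
    emit_audit: Callable[[str], None],     # host's audit/SIEM sink
    call_tool: Callable[[str, dict], dict],
) -> dict | None:
    annotations = tool.get("annotations", {})
    destructive = annotations.get("destructiveHint", True)  # assume the worst if unset
    if destructive and not confirm(f"Allow destructive tool '{tool['name']}'?"):
        emit_audit(json.dumps({"tool": tool["name"], "decision": "denied_by_user"}))
        return None
    result = call_tool(tool["name"], arguments)
    emit_audit(json.dumps({"tool": tool["name"], "decision": "allowed"}))
    return result
```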
Finally, measurement is incomplete. The emergence of larger benchmarks is encouraging, yet most fail to capture the messy compositions that break real systems: long-horizon tasks with cross-tenant data sensitivities, mixed trust domains, and chained tools where the third hop is where the secret leaks. Until we have benchmarks that encode those realities — and policies in hosts that can be tested against them — security posture will be hard to compare across products and hard to improve over time.
Where We Are Today
If there is a single takeaway from this literature, it is that MCP’s risks are not exotic; they are the predictable by-product of giving software the power to act.
The good news is that the research now gives us a shared language and a set of workable controls. The spec has matured, the community is building scanners and benchmarks, and industry teams are translating these insights into operator guidance.
The next phase is less about discovering new classes of bugs and more about standardizing the guardrails: interoperable authorization flows, signed and attestable servers, enforceable tool annotations, sane defaults in hosts, and benchmarks that punish unsafe behavior. Until then, treat MCP like any other powerful interface: assume it will do exactly what you wired it to do, instrument it so you can see that happen, and wrap it with the same identity, policy, and change management discipline you apply to your APIs and microservices. MCP makes agents useful; your platform makes them safe.
For more insights on MCP from Markus Mueller, read the previous blogs in this series:
“How to Use Model Context Protocol the Right Way”
“MCP: What’s Changed, What Still Matters, What Comes Next”
This blog was written with the assistance of AI.