ai-coding

MCP Security: The Risks of Model Context Protocol and How to Govern It (2026)

PrivSec LabJune 24, 20265 min read

Source code in a dark editor, an illustration of the code an AI agent reads and runs through connected tools

MCP lets AI agents plug into your tools and data through one open interface - and that connection is the attack surface. The real MCP security risks in 2026 (tool poisoning, rug pulls, cross-server attacks) and how to govern MCP servers safely.

The Model Context Protocol (MCP) is the open standard that lets an AI agent plug into your tools, files, and apps through one common interface - often described as "USB-C for AI." It is genuinely useful, and through 2025 and 2026 it has been adopted across the best AI coding assistants, IDEs, and agent frameworks. But the same connector that makes an agent powerful is also its biggest attack surface. Recent moves toward governing AI agents in the enterprise - security vendors shipping tools to monitor coding agents, and MCP-based governance layers landing inside Claude, ChatGPT, and Copilot - are a sign of the same thing: connecting an agent to your environment is a security decision, not a convenience setting. Here is the honest picture of MCP security in 2026 and how to govern it.

Why MCP is a security problem, not just a feature

MCP itself is just plumbing: a standard way for a model to discover tools, read their descriptions, and call them. The risk isn't the protocol - it's what flows through it.

When an agent connects to an MCP server, that server provides two things the model trusts: tool descriptions (text telling the model what each tool does and how to call it) and tool outputs (whatever the tool returns). The model reads both and acts on them. So every MCP server you attach is effectively code and instructions running with your agent's privileges. Whatever the agent can reach - your files, a repository, an API, your email - a malicious server can try to reach through the agent.

This is the same shift that makes AI agent security hard in general, applied to a specific connector: the security of your MCP setup is the security of every server you plug into it.

The MCP-specific risks in 2026

These aren't hypothetical - security researchers have documented them on real MCP clients.

Tool poisoning. A malicious server hides instructions inside a tool's description - text the model reads but the user usually doesn't. A tool that looks like a harmless add(a, b) can secretly instruct the agent to read private files and exfiltrate them. Because users tend to approve tool calls without inspecting the description, this is one of the most impactful MCP-specific attacks.
Rug pulls (silent redefinition). An MCP tool changes its own definition after you've installed and approved it. You vetted something safe; the server later swaps in malicious behaviour without telling you.
Tool shadowing and cross-server attacks. When several servers connect to the same agent, a compromised one can override or intercept calls meant for a trusted tool - a "confused deputy" problem where the agent does the attacker's bidding while thinking it's using a legitimate tool.
The exfiltration trifecta. The genuinely dangerous combination is an agent that has private data, reads untrusted content, and has an exfiltration path to the outside. MCP makes all three easy to wire together by accident.
Indirect prompt injection. Even an honest server returns outputs the agent reads - a web page, an issue, a document - that may contain hidden instructions. The agent can obey them as if they came from you.

A close-up of a laptop's side showing a USB port and an SD card slot, illustrating MCP as a universal connector for plugging tools into an AI

How to govern MCP safely

You don't need to avoid MCP. You need to govern what you connect and box it in so a single bad server can't become a disaster. The principles are old security wisdom applied to a new connector.

Vet and pin trusted servers. Prefer official or well-reviewed MCP servers. Don't attach arbitrary third-party servers to an agent that holds real access, and watch for tool definitions that change after install.
Least privilege per server. Give each server only the access its job needs, using scoped, revocable credentials - never your primary accounts or production keys. If a server only needs to read, don't let it write.
Limit the blast radius. Avoid connecting many untrusted servers to the same agent, since one compromised server can intercept others. Isolate sensitive work from anything that reads the open web.
Human in the loop for high-impact actions. Require explicit confirmation before anything irreversible - sending money, deleting data, posting publicly, changing access. Let the agent draft; you approve.
Treat tool descriptions and outputs as untrusted. Both can carry injected instructions. The same caution applies when an agent uses AI code review tools or any tool that ingests external content.
Log and audit tool calls. Keep a record of which servers and tools the agent used, so you can spot anomalies and revoke fast.
Keep secrets out of prompts and tool arguments. Passwords and API keys pasted into a prompt or a tool call become text on a server. Use scoped tokens and secret managers instead.

The honest takeaway

MCP security comes down to one mindset shift: an MCP server is not a plugin you install and forget - it's a new participant with autonomy and access, and you should treat it like one you don't fully trust. The protocol is open and useful; the danger is in granting broad, standing trust to servers you haven't vetted. Connect deliberately, scope every server tightly, keep a human gate on anything irreversible, and assume every tool description and output could be trying to hijack your agent. The teams now building governance around AI agents - and the AI coding agents that lean on MCP most - are converging on exactly that: connect less, trust narrowly, and verify.

Image: Pixabay (source)

Also available in

FR ES DE IT PT

FAQ

What is MCP security?

MCP security is the practice of safely connecting AI models and agents to external tools and data through the Model Context Protocol - an open standard introduced by Anthropic in late 2024, often described as 'USB-C for AI'. MCP itself is just a connector: the security question is what you plug into it and how much you trust it. Each MCP server an agent connects to is code and instructions running with the agent's access, so a malicious or compromised server can read your data, call other tools, or take actions on your behalf. MCP security means vetting servers, scoping permissions tightly, and treating tool descriptions and outputs as untrusted input.

What is tool poisoning in MCP?

Tool poisoning is when a malicious MCP server hides instructions inside a tool's description or metadata - text the model reads but the user usually doesn't. The model treats those hidden instructions as commands, so a tool that looks like a harmless 'add two numbers' function can secretly tell the agent to read private files and send them somewhere. Security researchers have documented this as one of the most impactful MCP-specific risks, because users tend to approve tool calls without inspecting the underlying descriptions.

What is an MCP rug pull?

A rug pull, also called silent redefinition, is when an MCP tool changes its own definition after you've already installed and trusted it. You approve a tool that looks safe, and later the server quietly swaps in malicious instructions without notifying you. A related attack is tool shadowing, where a malicious server overrides or intercepts calls meant for a trusted tool. Both exploit the fact that trust granted once is rarely re-checked, which is why monitoring tool definitions for changes matters.

Is MCP safe to use?

MCP is broadly safe for everyday use if you connect only to servers you trust and scope their access tightly, but it is not safe to wire up arbitrary third-party servers with broad permissions and walk away. The protocol is an open connector, so its safety depends entirely on the servers you attach and the access you grant them. Use official or well-reviewed servers, give each one separate revocable credentials instead of your main accounts, keep a human in the loop for high-impact actions, and review what tools can do before approving them.

How do I secure MCP servers?

Apply least privilege: give each MCP server only the access its job needs, using scoped, revocable tokens rather than admin keys or your primary accounts. Vet and pin trusted servers, prefer official ones, and watch for tool definitions that change after install. Treat tool descriptions and tool outputs as untrusted content that may contain injected instructions. Avoid connecting many untrusted servers to the same agent, since one compromised server can intercept others. Log tool calls so you can audit and revoke, and keep secrets out of prompts and tool arguments.

Related research

Blurred CSS source code on a screen, streaked in blue and purple

ai-coding

NVIDIA Ships a Security Scanner for AI Agent Skills. It Does Not Close the Hole

SkillSpector reads an agent skill before you install it: 68 patterns across 17 categories, plus an optional LLM pass. What it checks, how to run it, and why scanner-evasion research means it cannot be your only control.

PrivSec Lab·Aug 3, 2026·4 min read

A wooden library card catalogue with one drawer pulled open, its index cards packed upright

ai-coding

Agent Memory Is Not Memory: What MCP Resources Actually Do (2026)

What people call agent memory is context the host application chose to hand over. The MCP specification is explicit that resources are application-driven, and that single design decision explains most of what agents can and cannot remember.

PrivSec Lab·Aug 2, 2026·5 min read

A man sitting in a dimly lit control room looking at his phone in front of several surveillance monitors

ai-coding

Perplexity Open-Sourced Numbat: What It Watches, and What It Says It Cannot Prove

Numbat gives endpoint visibility into AI coding agent activity, with optional pre-action blocking. Its own README is unusually clear about the limits, and those limits are the useful part.

PrivSec Lab·Jul 31, 2026·4 min read