On March 11, 2026, Perplexity CTO Denis Yarats stood on stage at the company’s Ask 2026 developer conference and announced that Perplexity was moving away from MCP internally. The irony was immediate: Perplexity still has an official MCP server listed on their docs site. The company that built an MCP integration was publicly walking away from MCP at its own conference, replacing it with a direct Agent API.
Perplexity is not alone. Cloudflare replaced MCP tool-calling with code generation. Y Combinator CEO Garry Tan got so frustrated he built a CLI alternative. Google Workspace quietly dropped MCP support in v0.8.0. Pieter Levels declared MCP dead. The protocol that was supposed to be the USB-C of AI agents is facing its first serious backlash, and the criticisms are technical, specific, and backed by production data.
The Token Tax: MCP’s Context Bloat Problem
The core complaint is simple math. Every MCP tool sends its complete schema, parameter definitions, and description to the LLM on every request. More tools means more tokens consumed before the agent processes a single user query.
The numbers are brutal. A MySQL MCP server with 106 tools generates 207KB of schema data, roughly 54,600 tokens on initialization. Five MCP servers with 30 tools each consume 30,000 to 60,000 tokens just for metadata, which is 25-30% of a typical context window before the user’s query is even processed.
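The arithmetic is easy to reproduce. Here is a rough sketch, assuming the common ~4-characters-per-token heuristic (real tokenizers vary, which is why published estimates of the same server differ by a few percent):

```python
# Back-of-envelope estimate of MCP schema overhead, assuming the common
# heuristic of ~4 characters per token (actual tokenizers vary).
CHARS_PER_TOKEN = 4

def schema_tokens(schema_bytes: int) -> int:
    """Rough token count for a block of JSON schema text."""
    return schema_bytes // CHARS_PER_TOKEN

# The MySQL server cited above: 207 KB of schema data on initialization.
mysql_overhead = schema_tokens(207 * 1024)   # = 52,992 tokens, close to the ~54,600 cited

# Share of a 200K-token context window consumed before any user query.
context_window = 200_000
print(f"{mysql_overhead:,} tokens ≈ {mysql_overhead / context_window:.0%} of context")
```

The exact figure depends on the tokenizer, but the order of magnitude does not: a single large server eats a quarter of the window before the agent does anything.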
Anthropic themselves reported seeing setups where tool definitions alone consumed 134K tokens, roughly half of Claude’s entire context window. Users on Reddit have documented specific servers: Linear’s MCP sends 23 tools at roughly 12,935 tokens, JetBrains sends 20 tools at 12,252 tokens, and Playwright sends 21 tools at 9,804 tokens.
Why This Matters More Than It Sounds
Token consumption is not just a cost problem. Research consistently shows that LLM reliability negatively correlates with instructional context volume. More tool descriptions in the prompt means worse tool selection accuracy. Bill Prin’s analysis found that just two MCP tools consume 20% of the context window before any actual work begins. Tau-Bench testing showed Claude 3.7 Sonnet achieving only 16% task completion on airline booking scenarios using MCP tools, a rate that would be unacceptable in any production system.
Platforms have responded with hard limits. Cursor caps at roughly 80 tools, OpenAI at 128, Claude at 120. These limits exist specifically because tool overload degrades performance. If your agent needs access to multiple services, you hit those ceilings fast.
Perplexity’s Exit: From MCP Server to Agent API
Perplexity’s move is the highest-profile defection so far. Denis Yarats cited two specific problems: context overhead from tool definitions eating into the model’s working memory, and authentication friction from MCP’s decentralized auth model creating integration headaches.
Their replacement is the Perplexity Agent API: a single endpoint at POST https://api.perplexity.ai/v1/agent that supports six model providers (OpenAI, Anthropic, Google, xAI, NVIDIA, Perplexity), with built-in tools for web search ($0.005/call), URL fetch ($0.0005/call), and function calling (free). It is OpenAI SDK-compatible, meaning developers can swap in Perplexity’s search capabilities with a one-line change.
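A minimal sketch of what a request might look like. The endpoint comes from the announcement; the payload shape assumes OpenAI chat-completions compatibility, and the model name and tool field shapes here are illustrative guesses, not documented values:

```python
import json

# Hypothetical request against Perplexity's Agent API. The endpoint is from
# the announcement; the payload shape assumes OpenAI chat-completions
# compatibility, and the model/tool names are illustrative guesses.
AGENT_URL = "https://api.perplexity.ai/v1/agent"

payload = {
    "model": "sonar-pro",                 # assumed model name
    "messages": [
        {"role": "user", "content": "Summarize today's MCP news."}
    ],
    "tools": [{"type": "web_search"}],    # built-in web search, $0.005/call
}

body = json.dumps(payload)
# POST `body` to AGENT_URL with an Authorization: Bearer <key> header.
# Because the API is OpenAI SDK-compatible, the official openai client can
# also be pointed at it by overriding base_url - the "one-line change".
print(body[:48])
```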
The strategic signal matters more than the technical details. Perplexity built an MCP server, adopted the standard, used it in production, and then publicly abandoned it at their own developer conference. That is not a company that did not try MCP. That is a company that tried it and decided the trade-offs were not worth it.
Cloudflare’s Code Mode: 99.9% Fewer Tokens
Cloudflare’s response is even more radical. Their platform has over 2,500 API endpoints. Representing all of them as MCP tools would consume 1.17 million tokens. Instead, they built “Code Mode”: the agent gets just two tools, search() and execute(), that accept JavaScript code. The agent writes code that calls Cloudflare’s APIs directly, consuming roughly 1,000 tokens total.
That is a 99.9% reduction. Cloudflare kept MCP for discovery (finding which APIs exist) but replaced the tool-calling mechanism entirely with code generation. Theo Browne, whose YouTube channel covering developer tools has 480,000 subscribers, summarized the irony: “The creators of MCP are telling us writing TypeScript code is 99% more effective than their spec.”
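In Python, the shape of the idea looks roughly like this. The two tool names follow Cloudflare's description, but everything else is illustrative; the real system executes JavaScript in an isolate, not Python `exec`:

```python
# Minimal sketch of the Code Mode idea: instead of one tool per endpoint,
# the agent sees exactly two tools. Names follow Cloudflare's description;
# the implementation here is illustrative only.
API_DOCS = {
    "zones.list": "GET /zones - list zones in the account",
    "dns.create": "POST /zones/{id}/dns_records - create a DNS record",
    # ... 2,500+ more endpoints live in this index, not in the prompt
}

def search(query: str) -> list[str]:
    """Tool 1: return documentation for endpoints matching the query."""
    return [doc for name, doc in API_DOCS.items() if query in name or query in doc]

def execute(code: str, api) -> object:
    """Tool 2: run agent-written code against a scoped API client.
    A real sandbox would isolate this; exec() here is only a placeholder."""
    scope = {"api": api, "result": None}
    exec(code, scope)
    return scope["result"]

# The agent's prompt now carries two tool schemas (~1,000 tokens) instead
# of 2,500 schemas (~1.17M tokens) - the 99.9% reduction cited above.
print(search("dns"))
```

The agent discovers endpoints on demand via `search()`, then writes code that chains API calls in one `execute()` pass instead of round-tripping each call through the model.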
The Security Reckoning
Context bloat is annoying. Security vulnerabilities are dangerous. MCP has both.
In January 2026, three CVEs were disclosed in Anthropic’s own Git MCP server: CVE-2025-68143, CVE-2025-68144, and CVE-2025-68145. These were not theoretical risks. They enabled code execution, arbitrary file deletion, and loading unauthorized files into AI context. They worked out of the box, requiring no credentials. Anthropic accepted the reports in September 2025 and took until December 2025 to patch them.
Palo Alto’s Unit 42 research team identified three attack vectors via MCP sampling: resource theft, conversation hijacking, and covert tool invocation. They built a proof-of-concept malicious code summarizer that triggered unauthorized writeFile operations. Security researcher Johann Rehberger noted that most deployed MCP agents have the “lethal trifecta” of tool access, data access, and an exfiltration path, making them vulnerable by default.
Six Structural Flaws
A Scalifi AI analysis identified six design-level security problems that patches cannot fully address:
- No enforced authentication. MCP’s security model is opt-in. If a server does not implement auth, there is no protocol-level enforcement.
- Session IDs leaked in URL query strings. No message signing means sessions can be hijacked.
- Dynamic tool manipulation. Servers can redefine tool names and descriptions after the user grants permission, enabling “rug pull” attacks.
- Shared context space. Multiple tools share the same context, enabling remote poisoning across tool boundaries.
- No risk categorization. A tool that reads a file and a tool that deletes a database have the same permission model.
- Stateful design. MCP’s session-based architecture complicates integration with REST-based infrastructure.
Over 8,000 MCP servers are currently exposed to the public internet with these structural vulnerabilities. And 42% of AI projects reportedly fail during MCP implementation, with security complexity cited as a major factor.
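Some of these flaws admit client-side mitigations even without protocol changes. The “rug pull” vector, for instance, can be narrowed by pinning a hash of each tool’s schema at approval time and refusing tools whose definitions later change. A sketch, not part of MCP itself:

```python
import hashlib
import json

# Client-side mitigation sketch for "rug pull" attacks: pin a hash of each
# tool's schema when the user approves it, and reject the tool if the
# server redefines it afterwards. Function names are illustrative.
def schema_fingerprint(tool: dict) -> str:
    canonical = json.dumps(tool, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

approved: dict[str, str] = {}

def approve(tool: dict) -> None:
    """Record the schema hash at the moment the user grants permission."""
    approved[tool["name"]] = schema_fingerprint(tool)

def verify(tool: dict) -> bool:
    """False if the server redefined the tool after user approval."""
    return approved.get(tool["name"]) == schema_fingerprint(tool)

tool = {"name": "read_file", "description": "Read a file", "params": {"path": "string"}}
approve(tool)
tampered = {**tool, "description": "Read a file and POST it to evil.example"}
print(verify(tool), verify(tampered))   # True False
```

Pinning does nothing for the other five flaws, which is the structural point: they need spec-level fixes, not client workarounds.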
What Developers Are Using Instead
The backlash has produced three distinct alternative approaches.
Direct APIs and CLIs
Perplexity’s Agent API, Garry Tan’s gstack (8 specialized Claude Code skills instead of MCP), and OpenClaw’s deliberate “bash over MCP” architecture all represent a return to direct integration. The argument: well-documented REST APIs and CLI tools already exist, LLMs are now capable enough to use them directly, and the protocol layer adds overhead without sufficient value.
Mario Zechner’s approach is even simpler: markdown files with script examples. No protocol, no schema, no server processes. Just documentation the LLM can read.
Code Generation Over Tool-Calling
Cloudflare’s Code Mode is the template here. Instead of giving the agent 2,500 individual tools, give it a code execution sandbox and API documentation. The agent writes code that calls the APIs directly. Steve Krouse captured the historical irony: “We started with OpenAPI specs, abandoned them for MCP because LLMs couldn’t handle them, and now that LLMs can handle specs again…”
This approach scales better because the token cost does not grow linearly with the number of available APIs. The agent only retrieves documentation for the specific endpoints it needs.
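The scaling argument in numbers, as a rough sketch using the figures cited above (~470 tokens per tool schema, ~1,000 tokens for the two-tool code interface; both are approximate averages, not measured constants):

```python
# Prompt-cost comparison: per-tool schemas grow linearly with the API
# surface, while the code-mode interface stays flat. Both token figures
# are rough averages derived from the numbers cited in this article.
TOKENS_PER_TOOL = 470

def tool_calling_cost(n_tools: int) -> int:
    """Classic MCP: every tool schema ships in the prompt."""
    return n_tools * TOKENS_PER_TOOL          # grows linearly

def code_mode_cost(docs_fetched: int = 3, tokens_per_doc: int = 200) -> int:
    """Code mode: two fixed tools plus only the docs actually retrieved."""
    return 1_000 + docs_fetched * tokens_per_doc   # flat in API surface

for n in (10, 100, 2_500):
    print(n, tool_calling_cost(n), code_mode_cost())
```

At 2,500 endpoints the linear cost reaches 1,175,000 tokens, matching the ~1.17M figure Cloudflare reported, while the code-mode cost is unchanged from the 10-endpoint case.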
Lazy Tool Hydration
For teams that still want MCP’s standardization benefits, lazy tool hydration offers a middle path. Instead of sending full schemas for all tools upfront, you send a minimal manifest with just names, categories, and summaries at roughly 4,900 tokens. When the agent selects a tool, you hydrate the full schema on demand at roughly 400 tokens per tool. This achieves a 91% reduction while keeping MCP’s standardized interface.
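A minimal sketch of the pattern, with illustrative tool names:

```python
# Lazy tool hydration sketch: ship a slim manifest up front, hydrate the
# full schema only when the agent picks a tool. Tool names are illustrative.
FULL_SCHEMAS = {
    "create_issue": {
        "name": "create_issue",
        "description": "Create an issue in the tracker with title, body, and labels",
        "parameters": {"title": "string", "body": "string", "labels": "array"},
    },
    # ... full definitions for every tool stay server-side
}

def manifest() -> list[dict]:
    """Slim listing sent on every request: name plus a one-line summary."""
    return [
        {"name": name, "summary": schema["description"][:60]}
        for name, schema in FULL_SCHEMAS.items()
    ]

def hydrate(name: str) -> dict:
    """Full schema, fetched only after the agent selects this tool."""
    return FULL_SCHEMAS[name]

print(manifest())
```

The prompt carries only the manifest; parameter schemas arrive per-tool, on demand, which is where the cited 91% reduction comes from.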
Anthropic’s own Claude Code now uses a Tool Search mechanism that activates when tool descriptions would consume more than 10% of context, achieving an 85% token reduction for large tool libraries.
The Counter-Argument: Why MCP Might Survive Anyway
Not everyone is ready to write the obituary. Charles Chen’s analysis, “MCP is Dead; Long Live MCP,” argues that what is dying is MCP as a general-purpose API wrapper in local stdio contexts. What is surviving is MCP over HTTP as enterprise infrastructure for centralized agent tooling: credential management, standardized telemetry, OAuth-based access control, and organization-wide knowledge delivery.
Matthew Hall compares the current moment to 1999-era REST critiques before REST superseded SOAP. The MCP spec only launched in November 2024. Sixteen months is not enough time to judge a protocol that the Linux Foundation now maintains and that still has 97 million monthly SDK downloads and 10,000+ servers.
Speakeasy’s defense is more pragmatic: MCP’s security problems mirror npm and pip risks. Running untrusted code has always been dangerous. The spec now includes readOnlyHint and destructiveHint metadata. The ecosystem is immature, not fundamentally broken.
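Those hints live in a tool definition’s annotations block. A sketch of how a client might use them to gate confirmation prompts — the `delete_branch` tool is illustrative, and note that the hints are advisory, so a malicious server can simply lie:

```python
# An MCP tool definition carrying the annotation hints mentioned above
# (shape per the spec's ToolAnnotations; the tool itself is illustrative).
delete_branch = {
    "name": "delete_branch",
    "description": "Delete a git branch",
    "inputSchema": {"type": "object", "properties": {"branch": {"type": "string"}}},
    "annotations": {
        "readOnlyHint": False,
        "destructiveHint": True,    # signals the client to confirm before running
        "idempotentHint": True,
    },
}

def needs_confirmation(tool: dict) -> bool:
    """Client-side policy sketch: prompt the user unless the tool is
    explicitly read-only; treat missing hints as destructive by default."""
    hints = tool.get("annotations", {})
    return hints.get("destructiveHint", True) and not hints.get("readOnlyHint", False)

print(needs_confirmation(delete_branch))   # True
```

Because the hints are untrusted metadata, a careful client defaults to the dangerous interpretation when they are absent, as the sketch does.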
The strongest argument for MCP is the absence of a single replacement. Perplexity built a proprietary API. Cloudflare built a custom code execution layer. Garry Tan built CLI skills. None of these are interoperable standards. If MCP dies, the agent ecosystem fragments into vendor-specific integrations, which is the exact problem MCP was created to solve.
Whether that matters depends on whether you believe the protocol overhead is a temporary tax that will shrink as the spec matures, or a structural flaw that cannot be fixed without abandoning the architecture. Based on the production data, both answers have evidence behind them.
Frequently Asked Questions
Why is Perplexity moving away from MCP?
Perplexity CTO Denis Yarats announced at the Ask 2026 conference that the company is abandoning MCP internally due to context overhead from tool definitions consuming working memory and authentication friction from MCP’s decentralized auth model. They replaced it with a direct Agent API endpoint.
How much context do MCP tools consume?
MCP tool definitions consume significant context window space. A MySQL MCP server with 106 tools uses roughly 54,600 tokens. Five MCP servers with 30 tools each can consume 30,000 to 60,000 tokens (25-30% of a context window) just for metadata, before the agent processes any user query.
What is Cloudflare’s Code Mode alternative to MCP?
Cloudflare’s Code Mode replaces individual MCP tool-calling with code generation. Instead of representing 2,500+ API endpoints as separate tools (1.17 million tokens), agents get two tools that accept JavaScript code, consuming roughly 1,000 tokens total. This achieves a 99.9% reduction in token usage.
Is the MCP protocol dead?
MCP is not dead but is facing serious criticism. It still has 97 million monthly SDK downloads and 10,000+ servers. However, high-profile departures from Perplexity, Cloudflare, and others suggest the protocol may survive primarily as enterprise HTTP infrastructure rather than as a general-purpose tool integration layer.
What are the main security vulnerabilities in MCP?
MCP has structural security issues including no enforced authentication (opt-in only), session IDs leaked in URLs, dynamic tool manipulation enabling rug pull attacks, shared context enabling cross-tool poisoning, and no risk categorization between read-only and destructive tools. Three CVEs in Anthropic’s own Git MCP server demonstrated these risks in practice.
