There is a recurring debate in agent-design circles that goes roughly like this: why build all these MCP servers when you can just write a skill, a markdown instruction file that tells the agent everything it needs to know, saving 90% of your context tokens? It's a seductive argument. It sounds like engineering pragmatism. It is, in fact, a category error dressed up as optimization advice.

Let's be precise about what each mechanism actually does, where each breaks down, and why treating them as competitors reveals a fundamental misunderstanding of the context-engineering problem.

First, let's establish what we're actually talking about. An LLM skill, in the operational sense used by systems like Claude Code or other agent frameworks, is a markdown file that injects structured instructions into the model's context at runtime. It tells the model how to behave, which tools to prefer, and what patterns to follow, and it sometimes pre-loads domain-specific knowledge (API conventions, library quirks, output schemas) that would otherwise require several turns of exploration or failure to discover.
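As a concrete illustration, a skill for the ROOT-file-analysis task discussed later might look like the sketch below. The file name, structure, and conventions are invented for this example; only the uproot calls (`uproot.open`, `keys()`, `num_entries`) are real library API.

```markdown
# Skill: ROOT file analysis

When asked to summarize a ROOT file:

1. Open the file with `uproot.open(path)` rather than PyROOT.
2. List trees with `file.keys()`; for each tree, report branch names and types.
3. Present the summary as a table: tree, branch, dtype, entry count.

Conventions:
- Never load full branch arrays just to inspect structure.
- Prefer `tree.num_entries` over reading data to count entries.
```

Everything in this file is known at authoring time; nothing in it depends on runtime state. That property is exactly what the rest of this piece interrogates.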

Skills work. When you know exactly what a task looks like, when the domain is well-understood and bounded, and when the instructions are stable across invocations, a well-written skill is extraordinarily effective. It collapses setup time, eliminates certain classes of model confusion, and produces more consistent outputs.

But notice the implicit preconditions embedded in that last paragraph: you know exactly what the task looks like. The domain is well-understood. The instructions are stable. These are not universal properties of agentic workloads. They are special cases.

The 90% Token Savings Claim Is Misleading

The argument that skills save 90% of context tokens usually rests on a comparison like this: "instead of having the agent make three tool calls to discover the schema, just put the schema in the skill file." This is true and useful in exactly one scenario: when you already have the schema, it doesn't change, and every invocation needs it.

In practice, this framing quietly assumes away the hardest part of the problem: context that needs to be built, not encoded.
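The trade-off can be made concrete with a back-of-the-envelope calculation. All token counts below are invented for illustration; the point is the shape of the comparison, not the numbers.

```python
# Back-of-the-envelope cost comparison (all token counts are illustrative).

SCHEMA_TOKENS = 1200      # size of the schema if embedded in the skill file
DISCOVERY_TOKENS = 3500   # cost of three exploratory tool calls at runtime

def skill_cost(invocations: int) -> int:
    # The embedded schema is paid on every invocation, needed or not.
    return invocations * SCHEMA_TOKENS

def mcp_cost(invocations: int, fraction_needing_schema: float) -> float:
    # Discovery is paid only on invocations that actually need the schema.
    return invocations * fraction_needing_schema * DISCOVERY_TOKENS

# If every call needs the schema, the skill wins by a wide margin:
assert skill_cost(100) < mcp_cost(100, 1.0)

# If only a minority of calls touch the schema, the picture inverts:
assert skill_cost(100) > mcp_cost(100, 0.25)
```

The "90% savings" claim implicitly assumes the first regime. Whether your workload lives there is an empirical question about your task distribution, not a general truth.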

Consider the difference between these two tasks:

  1. "Analyze this ROOT file and produce a summary of the branch structure."
  2. "Find the most recent LHCb simulation request on GitLab for the BnoC working group and tell me its status."

The first task has a known shape. You could write a skill for it: instruct the model on ROOT file conventions, uproot idioms, what a good branch summary looks like. That skill would genuinely compress context and reduce noise.

The second task cannot be encoded in a skill file because its answer does not exist until runtime. The relevant context (which MR, what its current status is, what comments have been left, what pipeline stage it's in) is discovered, not pre-known. No amount of markdown instructions substitutes for a tool that actually queries the GitLab API and returns live data. The skill tells the model how to reason. The MCP tool gives the model something to reason about. This is the core asymmetry that the "just use a skill" argument elides.
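A minimal sketch of that second task's runtime step follows. The field names (`updated_at`, `state`, `title`) follow GitLab's merge-requests API, but the sample data is invented, and in a real system the list would come from a live API call rather than a literal.

```python
from datetime import datetime

def most_recent_mr(merge_requests: list[dict]) -> dict:
    """Pick the most recently updated MR from a GitLab API response.

    In a real MCP tool, `merge_requests` would be the JSON payload of a
    live GET request against GitLab's merge-requests endpoint; here it is
    a hard-coded stand-in.
    """
    return max(
        merge_requests,
        key=lambda mr: datetime.fromisoformat(mr["updated_at"]),
    )

# Invented example payload -- none of this exists at skill-authoring time.
live_response = [
    {"title": "MC request A", "state": "opened",
     "updated_at": "2024-05-02T09:15:00+00:00"},
    {"title": "MC request B", "state": "merged",
     "updated_at": "2024-04-28T14:03:00+00:00"},
]
latest = most_recent_mr(live_response)
assert latest["state"] == "opened"
```

The selection logic is trivial; the value is entirely in the data, which only a runtime query can supply.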

MCP Tools Are a Context-Building Mechanism, Not a Behavior-Encoding Mechanism

The Model Context Protocol is architecturally oriented around a different problem than skills. MCP servers expose tools that the agent can invoke to retrieve, filter, and assemble context dynamically. The emphasis is on discovery, finding information whose existence, structure, or current value could not have been anticipated at system-design time.

A well-designed MCP server is essentially a context faucet. The agent doesn't know in advance what it will need; it queries the server, inspects the response, decides what's relevant, and proceeds. This is fundamentally an active, runtime-dependent process. The agent is not a passive recipient of pre-loaded instructions; it is an active participant in constructing the context it needs.
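The faucet pattern can be sketched without any particular framework. The tool registry and responses below are invented for illustration; the point is that each step's query depends on the previous step's answer, which no static instruction file can anticipate.

```python
# Minimal sketch of the "context faucet" pattern: the agent starts with no
# domain knowledge and assembles its context through successive tool calls.
# Tool names and responses are invented for illustration.

TOOLS = {
    "list_projects": lambda _: ["analysis-2024", "legacy-fits"],
    "list_open_mrs": lambda project: (
        [{"iid": 42, "title": "Update selection"}]
        if project == "analysis-2024" else []
    ),
}

def build_context() -> dict:
    context = {}
    # Step 1: discover what exists -- unknowable at authoring time.
    context["projects"] = TOOLS["list_projects"](None)
    # Step 2: decide what is relevant based on what step 1 returned.
    for project in context["projects"]:
        mrs = TOOLS["list_open_mrs"](project)
        if mrs:  # keep only projects with live activity
            context[project] = mrs
    return context

ctx = build_context()
assert "analysis-2024" in ctx and "legacy-fits" not in ctx
```

Notice that the final context contains only one project: the filtering decision was made from live responses, not from pre-encoded rules.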

This is why comparing MCP tools to skills as substitutes is like comparing a database to a config file. Both store information. The use cases are almost entirely non-overlapping.

Beyond the architectural mismatch, skills have practical failure modes that their advocates underemphasize.

Staleness. A skill that encodes API conventions is correct until the API changes. Skills require active maintenance. In rapidly evolving codebases or external services, the skill becomes a liability the moment its content diverges from ground truth. MCP tools query live systems and are structurally immune to this class of failure.

Authorship bottleneck. To write a skill, you must already understand the domain well enough to encode it. For novel tasks, exploratory analyses, or unfamiliar systems, you don't have this knowledge. You need the agent to discover it. Skills require a human SME investment upfront that is often precisely what you're trying to offload to the agent in the first place.

Context inflation under generalization. The temptation, once you've bought into the skills-as-optimization frame, is to write increasingly comprehensive skills that cover more edge cases. This is the opposite of the promised token savings. Comprehensive skills balloon. They introduce ambiguity as instructions conflict. They create a maintenance surface that grows superlinearly with coverage.

Overfitting to anticipated tasks. Skills optimize for the tasks you predicted. Agentic systems are often deployed precisely because the task space is too large or dynamic to predict exhaustively. A skill-heavy architecture implicitly re-centralizes the knowledge that distribution was supposed to eliminate.

The Real Case for MCP Is Not Token Efficiency

Proponents of MCP tools sometimes make a tactical mistake by competing on the token-efficiency axis. That's a losing argument because skills, in their narrow domain of applicability, genuinely do use fewer tokens. The right argument is structural.

MCP tools solve problems that token efficiency doesn't touch: runtime discovery of information that could not be anticipated, access to live state that cannot be authored in advance, and freedom from the maintenance burden of keeping encoded knowledge in sync with ground truth.

The token argument is also somewhat moot as context windows expand. What doesn't become moot is the fundamental question of whether the information the agent needs exists anywhere at authoring time. If it doesn't, no skill will supply it. None of this is a case against skills. It is a case for accurate categorization.

Skills are the right tool when the task has a known shape, the domain is well-understood and bounded, and the instructions are stable across invocations.

In these cases, a skill is not just efficient; it is the correct abstraction. Asking an MCP server to answer "what's the idiomatic way to write a RooFit PDF in this codebase" is misusing the tool. That's a skills job.

The productive framing is not "skills vs MCP" but "skills and MCP, applied to their respective domains." A mature agent architecture typically combines both: skills encode stable conventions and behavioral guidance, while MCP servers supply the live data those behaviors operate on.

These two mechanisms compose naturally. A skill might instruct the agent on how to interpret ROOT file structures; an MCP tool provides the actual file to interpret. The skill is the interpreter; the tool is the input.
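The interpreter/input split can be sketched in a few lines. The skill text and the stand-in fetch function are invented for this example; in a real system the fetch would be an MCP tool call against an actual file or service.

```python
# Sketch of skill/tool composition: the skill contributes static behavioral
# instructions, the tool contributes live data, and the agent's prompt
# combines both. All names and contents are invented for illustration.

SKILL = (
    "When summarizing a ROOT file, list each tree and its branches, "
    "and report entry counts without loading full arrays."
)

def fetch_file_structure(path: str) -> dict:
    # Stand-in for an MCP tool call that inspects a real file at runtime.
    return {"DecayTree": ["B_M", "B_PT", "nTracks"]}

def assemble_prompt(path: str) -> str:
    live = fetch_file_structure(path)        # tool: what to reason about
    return f"{SKILL}\n\nFile structure: {live}"  # skill: how to reason

prompt = assemble_prompt("example.root")
assert "DecayTree" in prompt and "entry counts" in prompt
```

Deleting either half breaks the system in a different way: without the skill, the agent has data but no conventions; without the tool, it has conventions but nothing to apply them to.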

Treating them as substitutes is not just technically wrong; it leads to bad architectural decisions. Teams that go all-in on skills end up with brittle, high-maintenance instruction sets that can't adapt to live data. Teams that go all-in on MCP without any behavioral guidance end up with agents that know what data to fetch but don't know what to do with it.

The "just use a skill" argument fails not because skills are bad but because it misunderstands what skills are for. Skills encode what you already know. MCP tools discover what you don't. Most non-trivial agentic tasks require both.

The token savings framing is particularly worth resisting. It frames the problem as one of compression when the real challenge is one of knowledge availability. You cannot compress information that doesn't exist yet at system design time. And in production agentic systems, a significant fraction of the most important context (live state, external data, dynamic artifacts) is exactly that kind of information.

Build your skills carefully, for the domains where they belong. Build your MCP servers for the rest. Stop asking which one you need. Start asking which problem each one solves.