How Navil Cuts 94% of Your MCP Token Costs Without Breaking Anything
If you're running AI agents with MCP, you have a billing problem you probably haven't noticed yet.
Every time your agent starts a session, it calls tools/list and gets back the full schema for every tool on every connected MCP server. GitHub MCP alone returns 90+ tool definitions. That's over 50,000 tokens — loaded into context before your agent does a single useful thing.
Most agents use 3–5 tools per task. You're paying for the other 85.
This isn't a theoretical problem. At current API pricing, an agent that runs 100 sessions a day is burning through millions of tokens on tool schemas alone. Multiply that across a team, and you're looking at real money spent on JSON definitions your agent never reads.
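The arithmetic behind that claim is easy to check. A quick back-of-envelope sketch (the per-token price below is an illustrative assumption, not a quote for any specific model):

```python
# Back-of-envelope estimate of schema overhead.
# SCHEMA_TOKENS_PER_SESSION and SESSIONS_PER_DAY come from the figures above;
# the price is an assumed USD rate and varies by model and provider.
SCHEMA_TOKENS_PER_SESSION = 50_000   # full GitHub MCP tools/list response
SESSIONS_PER_DAY = 100
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # assumption

tokens_per_day = SCHEMA_TOKENS_PER_SESSION * SESSIONS_PER_DAY
cost_per_day = tokens_per_day / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS

print(tokens_per_day)           # 5000000 tokens/day on schemas alone
print(round(cost_per_day, 2))   # 15.0
```

Five million tokens a day on tool definitions, per agent, before any real work happens.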
Navil's context-aware tool scoping fixes this. But every time we mention "94% token reduction," the first question is always the same:
"Wait — if you hide 94% of the tools, doesn't the agent break?"
No. Here's why.
What actually consumes tokens
There's a critical distinction that's easy to miss: tool schemas and tool execution are completely separate.
When your agent calls tools/list, the MCP server returns a JSON schema for every available tool — the name, description, parameter definitions, and type annotations. These schemas are what get loaded into the agent's context window. They're what consume tokens. And they're what the agent uses to decide which tools to call.
Tool execution is different. When the agent actually calls get_pull_request, that's a direct function call to the MCP server. The server processes it and returns the result. The schema isn't involved anymore.
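On the wire, the two steps look roughly like this (a simplified sketch of the MCP JSON-RPC message shapes; the example tool and arguments are illustrative):

```python
# tools/list response: carries full schemas -- this is what costs tokens.
list_response = {
    "tools": [
        {
            "name": "get_pull_request",
            "description": "Get details of a pull request",
            "inputSchema": {
                "type": "object",
                "properties": {
                    "owner": {"type": "string"},
                    "repo": {"type": "string"},
                    "pullNumber": {"type": "number"},
                },
            },
        },
        # ...dozens more entries like this for GitHub MCP
    ]
}

# tools/call request: just a name and arguments -- no schema involved.
call_request = {
    "method": "tools/call",
    "params": {
        "name": "get_pull_request",
        "arguments": {"owner": "octocat", "repo": "hello-world", "pullNumber": 1},
    },
}
```

Only the listing carries schemas. The call itself is small no matter how many tools the server exposes.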
Navil sits between these two steps:
WITHOUT SCOPING
━━━━━━━━━━━━━━━
Agent calls tools/list
  ↓
MCP returns 90 tool schemas (50,000 tokens)
  ↓
Agent processes all 90; most are irrelevant
  ↓
Agent calls get_pull_request (same tool, same result)

WITH SCOPING
━━━━━━━━━━━━
Agent calls tools/list
  ↓
Navil proxy intercepts, filters to 3 tool schemas (2,800 tokens)
  ↓
Agent sees only what it needs
  ↓
Agent calls get_pull_request (same tool, same result)
The tool calls work identically. The agent calls the same tools, gets the same results, does the same work. It just doesn't spend 50,000 tokens reading about tools it was never going to use.
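The filtering step above can be sketched in a few lines (the function name and policy shape here are illustrative, not Navil's actual internals):

```python
def filter_tools_list(response: dict, allowed: set[str]) -> dict:
    """Return a copy of a tools/list response containing only in-scope tools.

    `allowed` holds tool names from the agent's scope; "*" means no filtering.
    """
    if "*" in allowed:
        return response
    return {
        **response,
        "tools": [t for t in response["tools"] if t["name"] in allowed],
    }

full = {"tools": [{"name": f"tool_{i}"} for i in range(90)]}
scoped = filter_tools_list(full, {"tool_1", "tool_2", "tool_3"})
print(len(scoped["tools"]))  # 3 -- the other 87 schemas never reach the context window
```

Nothing about the server or the tools themselves changes; only the listing the agent reads is smaller.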
This is filtering, not blocking
We're not disabling tools. We're not removing capabilities. We're filtering the tools/list response so the agent only sees schemas relevant to its task.
Think of it like a restaurant menu. A 200-page menu with every dish the kitchen can make doesn't help you order lunch faster. A single page with today's specials does. The kitchen can still make everything; you're just not reading the entire menu every time you sit down.
When Navil scopes tools:
- Tool schemas (the JSON definitions) get filtered — this is where the token savings come from
- Tool execution is completely unaffected — if the agent calls a tool it's scoped to, the call goes through normally to the MCP server
- Out-of-scope calls get a clear error:
"Tool not in scope for this agent. Update your policy.yaml to add it."
The agent can still do everything it needs to do. It just doesn't see tools it doesn't use.
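The pass-through-or-error behavior on the execution side can be sketched the same way (a hypothetical shape, not Navil's implementation; the error text mirrors the message quoted above):

```python
def gate_tool_call(name: str, allowed: set[str], forward):
    """Forward an in-scope tools/call to the MCP server; reject the rest loudly."""
    if "*" in allowed or name in allowed:
        return forward(name)  # execution is untouched for in-scope tools
    return {
        "isError": True,
        "content": "Tool not in scope for this agent. Update your policy.yaml to add it.",
    }

allowed = {"get_pull_request", "create_review", "list_files"}
ok = gate_tool_call("get_pull_request", allowed, lambda n: {"result": f"called {n}"})
blocked = gate_tool_call("delete_repo", allowed, lambda n: {"result": f"called {n}"})
print(ok)                  # {'result': 'called get_pull_request'}
print(blocked["isError"])  # True
```

In-scope calls reach the server exactly as before; out-of-scope calls fail with a message the agent can act on.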
"What if I scope too aggressively?"
Fair question. If you remove a tool the agent actually needs, the call fails. But this is a managed risk, not an uncontrolled one:
1. The default is wide open.
Out of the box, Navil's scope is allow: "*" — every tool is visible. You opt into restriction. Nothing changes unless you choose to change it.
2. AI suggests the right scope.
navil policy suggest watches how your agent actually uses tools over time and recommends scopes based on observed behavior. It only suggests removing tools the agent has never called.
$ navil policy suggest
Based on 7 days of observed behavior:
✓ Agent "code-review" uses 3 of 90 GitHub tools → scope to 3
✓ Agent "deploy" calls shell_exec 12x/day → rate limit to 20/day
⚠ Agent "research" accessed .env 2x → recommend deny rule
Apply all? [y/N]

3. Failures are loud, not silent.
If an agent calls an out-of-scope tool, it gets an explicit error message: not a silent failure, not a hallucination, not a retry loop. The error tells the agent (and you) exactly what happened and how to fix it.
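For reference, a scoped policy might look something like this. This is a hypothetical policy.yaml sketch; the only detail confirmed above is the allow: "*" default, so treat the rest of the schema (rate-limit and deny-rule syntax) as illustrative:

```yaml
agents:
  code-review:
    allow:                   # only these three tools appear in tools/list
      - get_pull_request
      - create_review
      - list_files
  deploy:
    allow: "*"
    rate_limits:
      shell_exec: 20/day     # hypothetical rate-limit syntax
  research:
    allow: "*"
    deny:
      - read_file: ".env"    # hypothetical deny-rule syntax
```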
This is the same tradeoff as a firewall. You could block everything and nothing works. You could allow everything and you're exposed. The value is in blocking what's unnecessary while keeping what's needed. Navil's policy engine makes this automatic.
How to set it up
Start with observation, then scope:
# Step 1: Install and wrap your MCP config (no scoping yet)
pip install navil && navil wrap mcp_config.json
# Step 2: Let Navil observe for a few days
# It watches which tools each agent actually calls
# Step 3: Get AI-generated scope recommendations
navil policy suggest
# Step 4: Apply the suggestions you agree with
navil policy apply

Or if you already know which tools an agent needs:
# Scope directly
navil scope set code-review --tools "get_pull_request,create_review,list_files"

Either way, you can adjust scopes at any time. Add a tool, remove a tool, reset to allow: "*". It's a YAML file, not a commitment.
The security bonus
Token savings are the immediate benefit you feel in your wallet. But there's a security benefit that matters more in the long run.
Every tool schema in context is attack surface. A tool description can contain malicious instructions (tool poisoning). The more schemas your agent processes, the more opportunities for an attacker to inject instructions that redirect agent behavior.
When you scope from 90 tools to 3, you're not just saving tokens — you're reducing the attack surface by 97%. An agent that can't see a tool can't be poisoned by its description.
This is why scoping and security are the same feature in Navil. Fewer schemas means lower cost AND lower risk.
The numbers
We benchmarked against GitHub MCP (one of the most popular MCP servers):
| Metric | Without scoping | With scoping | Reduction |
|--------|-----------------|--------------|-----------|
| Tool schemas loaded | 90 | 3 | 97% |
| Schema tokens | ~50,000 | ~2,800 | 94% |
| Context window used | ~25% | ~1.4% | 94% |
| Time to first tool call | Slower (processes 90 schemas) | Faster (processes 3) | Significant |
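The two headline percentages fall straight out of the ratios in the table, which is worth a quick sanity check:

```python
# Reductions implied by the benchmark figures above.
schemas_before, schemas_after = 90, 3
tokens_before, tokens_after = 50_000, 2_800

surface_reduction = 1 - schemas_after / schemas_before   # schemas the agent sees
token_reduction = 1 - tokens_after / tokens_before       # tokens spent on schemas

print(f"{surface_reduction:.0%}")  # 97%
print(f"{token_reduction:.0%}")    # 94%
```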
And that's just one MCP server. If you're connected to GitHub, Slack, a database, and a file system — you're loading hundreds of tool schemas per session. The savings compound.
Try it
pip install navil && navil wrap mcp_config.json

Start with allow: "*", let navil policy suggest tell you what to scope, and watch your token usage drop.
Your agents do the same work. Your bill doesn't.
Have questions about scoping? Open an issue at github.com/navilai/navil or check the scoping docs.