We Published the First Open Threat Taxonomy for AI Agents
The MCP ecosystem grew from 714 servers to over 16,000 in twelve months — a 2,100% increase. Security didn't keep up. Two-thirds of MCP servers have at least one security finding. 82% are vulnerable to path traversal. Critical RCE vulnerabilities (CVSS 9.6) have been disclosed in core MCP tooling.
And until now, there was no standardized way to talk about it.
MITRE ATT&CK doesn't cover tool poisoning. OWASP's new agentic applications list is a good start but stays at the category level. When your security team sits down to threat-model an agent deployment, there's no checklist, no shared vocabulary, no way to measure coverage against known attack patterns.
Today we're changing that. We're publishing the Navil Threat Catalog: 11 attack classes, 30 detection categories, and 200 base attack vectors targeting AI agents and MCP integrations. It's open data, licensed under CC BY-SA 4.0, and we're inviting the community to contribute.
Repo: github.com/navilai/navil-threat-catalog
Why agent threats need their own taxonomy
Traditional security frameworks were built for a world where software does exactly what it's told. AI agents break that assumption. They interpret instructions, choose tools dynamically, and maintain state across long conversations. This creates attack surfaces that have no equivalent in traditional software:
Tool poisoning — malicious instructions embedded in MCP tool descriptions. Unlike prompt injection, tool poisoning is invisible to the user and persists across sessions. Invariant Labs formalized this attack class in 2025 and it's already widespread.
Credential exfiltration via natural language — an agent can be tricked into revealing API keys, tokens, or database credentials through conversational manipulation. No code execution required. The attack happens in plain English.
Multi-agent collusion — when multiple agents coordinate, each individual action may appear benign. The security violation only becomes visible at the system level. No existing monitoring framework handles this.
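One common mitigation for the credential-exfiltration surface described above is to scan agent output for secret-shaped strings before it crosses a trust boundary. A minimal sketch, assuming regex-based detection; the patterns and function name are illustrative, not part of any Navil API:

```python
import re

# Illustrative patterns for common credential formats. A real deployment
# would use a maintained ruleset with entropy checks, not a fixed list.
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),           # AWS access key ID shape
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),        # "sk-" style API keys
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{36}\b"),  # GitHub token shapes
]

def contains_secret(text: str) -> bool:
    """Return True if agent output looks like it leaks a credential."""
    return any(p.search(text) for p in SECRET_PATTERNS)
```

The point is architectural: the check runs on the agent's natural-language output, because that is where this attack happens.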
These aren't theoretical. The Postmark supply chain attack, the GitHub issue prompt injection, and the NeighborJack 0.0.0.0 exposure all happened in 2025. Real attacks, real data loss, real consequences.
The 11 attack classes
We analyzed every published MCP CVE, real-world incident, and security research paper from 2024 through 2026 and identified 11 distinct attack classes:
01. Multi-Modal Smuggling
Embedding malicious instructions in images, audio, PDFs, or other documents that agents process. The instructions are invisible to the user but readable by the model. 15 base vectors.
Example: A PDF uploaded to a RAG pipeline contains white-on-white text with instructions to exfiltrate the next query's results to an external URL.
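A cheap precondition check before a document enters a RAG pipeline is to flag invisible Unicode that can carry smuggled instructions. This won't catch white-on-white PDF text (that needs a rendering-level check), but it illustrates the class. A sketch; the character list is illustrative, not exhaustive:

```python
# Zero-width and bidi-control characters commonly used to hide
# instructions inside otherwise normal-looking text.
HIDDEN_CHARS = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
    "\u202e",  # right-to-left override
}

def has_hidden_text(text: str) -> bool:
    """Flag documents containing characters invisible to a human reviewer."""
    return any(ch in HIDDEN_CHARS for ch in text)
```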
02. Handshake Hijacking
Intercepting or manipulating the MCP client-server connection during initialization. Includes MITM attacks on the stdio/SSE transport, malicious server redirects, and capability downgrade attacks. 18 base vectors.
Example: CVE-2025-6514 (CVSS 9.6) — the mcp-remote project allowed arbitrary command execution when connecting to untrusted servers.
03. RAG / Memory Poisoning
Corrupting retrieval-augmented generation data sources to influence agent behavior. Includes document injection, embedding manipulation, and context window overflow. 20 base vectors.
Example: An attacker adds a document to a shared knowledge base containing instructions that override the agent's system prompt when retrieved.
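One partial defense is to screen retrieved chunks for instruction-override phrasing before they reach the prompt. A minimal sketch, assuming a keyword heuristic; a determined attacker can paraphrase around any fixed list, so treat this as one layer, not a solution:

```python
import re

# Phrases that rarely appear in legitimate reference material but are
# typical of injected override instructions. Illustrative heuristic only.
OVERRIDE_PATTERNS = re.compile(
    r"ignore (all )?(previous|prior) instructions"
    r"|disregard your system prompt"
    r"|you are now",
    re.IGNORECASE,
)

def quarantine_poisoned(chunks: list[str]) -> list[str]:
    """Drop retrieved chunks that look like prompt-override attempts."""
    return [c for c in chunks if not OVERRIDE_PATTERNS.search(c)]
```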
04. Supply Chain / Discovery
Malicious MCP servers in registries, typosquatting, impersonation, and dependency confusion. The MCP equivalent of npm supply chain attacks. 20 base vectors.
Example: The Postmark attack — a malicious MCP server on npm impersonated Postmark's email API. It actually sent emails (passed all tests) but BCC'd every message to the attackers.
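Typosquatting in this class can be screened with an edit-distance check against a vetted allowlist: a name that is suspiciously close to a trusted server, but not an exact match, is the classic typosquat signature. A sketch using Python's stdlib; the server names are hypothetical:

```python
import difflib

# Allowlist of MCP servers your org has vetted. Hypothetical names.
TRUSTED_SERVERS = ["postmark-mcp", "github-mcp", "slack-mcp"]

def typosquat_suspects(name: str, cutoff: float = 0.8) -> list[str]:
    """Return trusted names a candidate package is suspiciously close to.

    An exact match is fine; a near-miss (e.g. 'postmark-mpc' vs
    'postmark-mcp') is worth blocking and reviewing.
    """
    if name in TRUSTED_SERVERS:
        return []
    return difflib.get_close_matches(name, TRUSTED_SERVERS, cutoff=cutoff)
```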
05. Privilege Escalation
Exploiting over-scoped tool permissions to access unauthorized resources. Includes confused deputy attacks, scope creep, and permission inheritance exploitation. 20 base vectors.
Example: A reconciliation agent at a financial firm was tricked into exporting "all customer records matching pattern X" — 45,000 records exfiltrated via over-scoped database permissions.
06. Anti-Forensics
Techniques to evade logging, monitoring, and audit trails in agent environments. Includes log injection, trace manipulation, and output sanitization. 15 base vectors.
Example: Injecting ANSI escape codes into tool responses that manipulate terminal output, hiding malicious actions from developers reviewing logs.
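The ANSI-escape variant has a direct countermeasure: strip terminal escape sequences from tool responses before they reach any log. A minimal sketch, assuming regex-based sanitization:

```python
import re

# CSI sequences (cursor movement, line erasure, color) and OSC sequences
# that can rewrite or hide terminal output when a log is viewed.
ANSI_ESCAPES = re.compile(r"\x1b(\[[0-9;?]*[ -/]*[@-~]|\][^\x07]*\x07)")

def sanitize_for_log(tool_output: str) -> str:
    """Strip terminal escape sequences from tool output before logging."""
    return ANSI_ESCAPES.sub("", tool_output)
```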
07. Agent Collusion & Multi-Agent
Multi-agent coordination to bypass security controls that only monitor individual agents. Includes covert channels between agents, distributed action splitting, and consensus manipulation. 20 base vectors.
Example: Agent A requests seemingly innocuous data. Agent B requests a different piece of data. Together, the two datasets constitute a privacy violation that neither agent's individual actions would trigger.
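A defense here has to evaluate the union of what all agents have accessed, not each request in isolation. A toy sketch of such a system-level guard; the field names and policy are hypothetical:

```python
# Field combinations that together constitute a violation, even though
# each field is harmless on its own. Hypothetical policy.
FORBIDDEN_COMBINATIONS = [
    {"customer_name", "ssn"},
    {"customer_name", "card_number"},
]

class SystemLevelGuard:
    """Track data access across ALL agents in a session, not per agent."""

    def __init__(self) -> None:
        self.accessed: set[str] = set()

    def request(self, agent_id: str, fields: set[str]) -> bool:
        """Allow the request only if the session-wide union stays legal."""
        combined = self.accessed | fields
        for combo in FORBIDDEN_COMBINATIONS:
            if combo <= combined:
                return False  # the union crosses a line no single agent did
        self.accessed |= fields
        return True
```

Note that per-agent monitoring would approve both requests in the example above; only the shared state makes the violation visible.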
08. Cognitive Architecture Exploitation
Attacking the reasoning layer: prompt injection at the tool call level, context window manipulation, goal hijacking, and attention steering. The most studied class, but still evolving rapidly. 20 base vectors.
Example: The GitHub issue attack — malicious instructions embedded in public GitHub issues hijacked AI assistants that developers asked to review open issues.
09. Temporal & Stateful Attacks
Exploiting conversation history, session state, and long-running agent memory. Includes progressive prompt injection (building up context over multiple turns), state confusion, and memory corruption. 20 base vectors.
Example: Over 50 conversation turns, an attacker gradually shifts an agent's understanding of its permissions, eventually getting it to perform actions it would have refused in turn 1.
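One mitigation for this drift is to make the agent's authority depend on an immutable session-start snapshot rather than on anything said in the conversation. A sketch under that assumption; the class and action names are hypothetical:

```python
class PermissionAnchor:
    """Pin allowed actions at session start; conversation cannot widen them."""

    def __init__(self, granted_actions: set[str]) -> None:
        # Frozen at turn 0: later turns can argue, but this set never grows.
        self._granted = frozenset(granted_actions)

    def authorize(self, action: str) -> bool:
        """Check every tool call against the turn-0 snapshot, not the chat."""
        return action in self._granted
```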
10. Output Manipulation & Weaponization
Altering agent outputs to deliver misinformation, malicious code, or weaponized content. Includes code injection in generated outputs, citation manipulation, and confidence calibration attacks. 17 base vectors.
Example: An agent is manipulated into generating code that includes a subtle backdoor — syntactically valid, passes tests, but contains an exploitable vulnerability.
11. Infrastructure & Runtime Attacks
Attacking the execution environment itself: 0.0.0.0 binding without auth, resource exhaustion, container escape, and denial of service. 15 base vectors.
Example: NeighborJack — hundreds of MCP servers bound to 0.0.0.0 by default, exploitable from any device on the same network with a single curl command.
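The fix for the NeighborJack pattern is a one-line default: bind to loopback explicitly unless network exposure is an informed decision. A sketch using Python's stdlib HTTP server to show the binding, not a full MCP server:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bind to loopback explicitly. The NeighborJack pattern is servers that
# default to "0.0.0.0", exposing them to every device on the local network.
server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)  # port 0 = any free port
host, port = server.server_address
server.server_close()
```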
What surprised us
After six months of cataloging these vectors, three things stood out:
Supply chain is the biggest real-world threat. Everyone talks about prompt injection, but the attacks that have actually caused damage in production are supply chain attacks — fake MCP servers, typosquatted packages, and impersonation. Class 04 has the highest ratio of real-world incidents to research papers.
Temporal attacks are vastly underrated. Almost no monitoring framework tracks how an agent's behavior evolves across a long conversation. Progressive context manipulation — slowly shifting an agent's beliefs over many turns — is trivially easy and almost never detected.
Agent collusion has zero defense frameworks. Multi-agent systems are the next wave of enterprise AI deployment, but there is no inter-agent trust model, no cross-agent anomaly detection, and no way to detect distributed action splitting. This is the class we're most concerned about for 2027.
How to use the taxonomy
The catalog is designed to be practical, not academic. Here's how teams are using it:
Threat modeling. Use the 11 classes as your checklist when designing agent architectures. For each class, ask: "How exposed are we? What controls do we have?"
Pen testing. Use the 200 base vectors as test cases. Each vector includes a description, preconditions, expected behavior, and detection guidance.
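The exact schema lives in the repo; an entry carrying the four fields described above might be shaped like this (identifier, field names, and values are illustrative, not taken from the catalog):

```yaml
id: TC-03-007            # hypothetical identifier
class: rag-memory-poisoning
description: >
  Injected document overrides the agent's system prompt when retrieved.
preconditions:
  - attacker can write to a shared knowledge base
  - retrieval pipeline passes raw chunks into the prompt
expected_behavior: agent refuses or strips the override instructions
detection_guidance: flag retrieved chunks containing instruction-like phrases
```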
Coverage scoring. Run navil test to automatically execute all 200 vectors against your proxy configuration. You get a real number — "84.7% of known patterns blocked" — with gaps broken down by category.
pip install navil && navil test --pool default

CI/CD integration. Add navil test --threshold 90 to your pipeline. If coverage drops below 90%, the build fails. Security coverage becomes a measurable, enforceable standard.
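As a sketch, the CI gate described above could look like this in a GitHub Actions workflow. The workflow scaffolding is an assumption; the two commands are the ones from this post:

```yaml
# .github/workflows/security-coverage.yml (illustrative)
name: agent-security-coverage
on: [push]
jobs:
  coverage:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install navil
      # Fails the build if fewer than 90% of known vectors are blocked.
      - run: navil test --threshold 90
```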
Contributing
The taxonomy is open data (CC BY-SA 4.0). We actively want contributions, especially in three areas:
- Classes 07–09 (Agent Collusion, Cognitive Architecture, Temporal Attacks) — the research is thinnest here and real-world examples are emerging faster than we can catalog them.
- Industry-specific vectors — healthcare, financial services, and legal sectors have unique attack surfaces we haven't fully mapped.
- Novel attack patterns — if you've discovered a vector that doesn't fit any of the 11 classes, we want to hear about it. Open an issue or PR.
Repo: github.com/navilai/navil-threat-catalog
What's next
The catalog is a living document. New vectors are added weekly as we discover them through Navil's community threat network — every Navil proxy that detects an anomaly contributes anonymized signal that strengthens the catalog for everyone.
We're also working on:
- Combinatorial expansion — the 200 base vectors combine across industry, modality, and attacker profile to generate 1M+ test scenarios. We'll publish the expansion methodology next month.
- Detection rule mappings — for each vector, the specific detection logic that catches it. This bridges the gap between "knowing the threat" and "stopping the threat."
- Coverage benchmarks — anonymous, aggregate coverage scores across the Navil network. "The average deployment blocks 73% of known vectors. Here's where the gaps cluster."
The agent era needs its own security language. This is our contribution to building it.
Run your coverage score: pip install navil && navil test
Star the catalog: github.com/navilai/navil-threat-catalog