Navil Watches Your Agents for a Week, Then Writes Your Security Policies
You're shipping a new AI agent to production. Before you flip the switch, you need security policies. But here's the problem: you don't actually know how it's going to behave yet.
You face a choice, and both options are bad.
Option 1: Be permissive. Write minimal policies or none at all. Your agent has access to everything it might need. It ships fast. Then, inevitably, something goes wrong--an agent exfiltrates data, calls a tool it shouldn't, or spirals into unexpected behavior. You're patching security incidents in production.
Option 2: Be restrictive. Lock down everything before day one. Write policies for every tool, every endpoint, every data source. But you've guessed wrong about what your agent actually needs. It gets blocked on legitimate operations. Your team spends the next two weeks relaxing rules and fixing false positives.
Most teams oscillate between these extremes, or worse--they write policies after an incident. Navil does something different.
The Problem: Security Policies Are Guesses
Security policies are control mechanisms. They define what your agents can and cannot do. But writing them before you've observed how agents actually behave is gambling.
Here's what usually happens:
- You write a policy based on what you think your agent will do
- You deploy it and watch it fail on legitimate operations
- You loosen the policy
- Repeat until something goes wrong, then you tighten it back up
- You're now in reactive mode instead of proactive mode
The fundamental issue: you''re writing rules without data.
The Solution: Observe, Then Suggest
Navil's AI Policy Builder flips this on its head. Instead of guessing, it watches. Instead of hand-writing every rule, it learns from behavior and suggests policies based on what it actually observes.
The magic is in a 5-step feedback loop:
Step 1: OBSERVE
Navil runs silently for a week (or however long you configure). It watches your agents interact with their tools in production. It logs which tools they call, how often, what data flows through, when they're called, and what happens after.
This is zero-impact observation. No policies yet. No blocking. Just telemetry that builds a behavior baseline.
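To make the idea concrete, a single observation record might look something like this. This is a hypothetical schema; the field names are illustrative, not Navil's actual log format:

```yaml
# Hypothetical telemetry record -- field names are illustrative
timestamp: 2026-03-18T09:14:07Z
agent: code-review
tool: github.pull_requests
action: list
latency_ms: 412
bytes_out: 18432
outcome: success
```

A week of records like this per agent is the baseline the later steps analyze.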
Step 2: DETECT
Once Navil has observed enough behavior, it analyzes patterns. It asks questions like:
- "Agent A has access to 90 GitHub tools but only uses 3. Which 3?"
- "Agent B calls shell_exec 12 times a day. Is that normal? Should we rate-limit it?"
- "Agent C accessed .env twice in one week. Is that intentional or a leak?"
- "Agent D has fs_write in its scope but never actually calls it. Can we remove it?"
This step identifies anomalies, patterns, and unnecessary access.
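The unused-tool and call-rate checks above are easy to reason about once telemetry is reduced to a list of tool-call events. Here is a minimal sketch of that analysis; the function and variable names are illustrative, not from Navil's codebase:

```python
from collections import Counter

def unused_tools(granted: set[str], calls: list[str]) -> set[str]:
    """Tools the agent is allowed to call but never has -- candidates for removal."""
    return granted - set(calls)

def call_rates(calls: list[str]) -> Counter:
    """How often each tool was actually invoked during the observation window."""
    return Counter(calls)

# One week of (simplified) telemetry for a single agent
granted = {"github.pull_requests", "github.code_search", "github.admin", "fs_write"}
observed = ["github.pull_requests"] * 40 + ["github.code_search"] * 7

print(unused_tools(granted, observed))       # tools to consider scoping out
print(call_rates(observed).most_common(1))   # the agent's most-used tool
```

The real analysis would also look at timing, data volume, and sequences of calls, but the shape is the same: compare what was granted against what was actually used.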
Step 3: SUGGEST
The AI Policy Builder generates specific policy rules based on what it learned. Each suggestion includes:
- What to change -- the specific policy rule
- Why -- reasoning based on observed behavior
- Confidence score -- how confident is Navil that this is the right rule?
For example:
- "Agent code-review uses 3 of 90 GitHub tools -> scope to 3 (confidence: 97%)"
- "Agent deploy calls shell_exec 12x/day -> rate limit to 20/day (confidence: 92%)"
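Concretely, one suggestion could be represented as a small structured record like the following. This shape is hypothetical; Navil's internal format may differ:

```yaml
# Hypothetical suggestion record
id: agent-code-review-github-scope
agent: code-review
change:
  type: scope
  tools:
    - github.pull_requests
    - github.code_search
    - github.issues
reason: "Used 3 of 90 granted GitHub tools over 7 days of observation"
confidence: 0.97
status: pending_approval
```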
Step 4: APPROVE
Here's the critical part: humans always decide. Navil suggests, but humans approve. You review each suggested rule, understand the reasoning, and decide whether to apply it. You can accept all suggestions, modify them, or reject them entirely.
This is human-in-the-loop governance. The AI doesn''t override your decisions. You maintain control.
Step 5: ENFORCE
Once you approve, Navil enforces the policies in real time at the proxy layer. Agents are blocked when they violate policy. Data flows are controlled. You're no longer reactive--you're proactive.
Then the loop continues. As your agents evolve, Navil learns and suggests new policies.
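At the proxy layer, enforcement reduces to an allow/deny decision per tool call. A minimal sketch of that check, assuming the simplified policy shape used in the examples below (illustrative code, not Navil's implementation):

```python
def is_allowed(agent: str, tool: str, policy: dict) -> bool:
    """Deny rules win; otherwise a scope list, if present, must contain the tool."""
    rules = policy.get("agents", {}).get(agent, {})
    if tool in rules.get("deny", []):
        return False
    scope = rules.get("scope")
    return scope is None or tool in scope

policy = {
    "agents": {
        "code-review": {
            "scope": ["github.pull_requests", "github.code_search"],
            "deny": ["github.admin_endpoints"],
        }
    }
}

print(is_allowed("code-review", "github.pull_requests", policy))    # True
print(is_allowed("code-review", "github.admin_endpoints", policy))  # False
```

Because the check runs at the proxy rather than inside the agent, a misbehaving agent cannot bypass it by ignoring its own configuration.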
How It Works: The CLI Commands
There are three main commands you'll use:
navil policy auto-generate
Bootstrap your initial policies from observed baselines:
```
$ navil policy auto-generate --observation-period 7d
Analyzing 7 days of agent behavior...
Generated 12 policy suggestions based on observed baselines.
Run 'navil policy suggest' to review them.
```

This command scans the telemetry from your observation period and generates an initial set of rules.
navil policy suggest
Review pending AI-generated rules and decide what to apply:
```
$ navil policy suggest
Based on 7 days of observed behavior:
✓ Agent "code-review" uses 3 of 90 GitHub tools -> scope to 3 (confidence: 97%)
✓ Agent "deploy" calls shell_exec 12x/day -> rate limit to 20/day (confidence: 92%)
⚠ Agent "research" accessed .env 2x -> recommend deny rule (confidence: 85%)
✓ Agent "slack-bot" never calls fs_write -> remove from scope (confidence: 99%)
Apply all? [y/N] y
Applied 4 rules to policy.auto.yaml
```

Notice the confidence scores. The high-confidence suggestions (99%, 97%) are likely safe to apply automatically. The medium-confidence one (85%) might warrant a closer look before you approve.

When you confirm, rules are written to policy.auto.yaml.
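If you want to pre-filter before reviewing, one simple approach is to auto-accept only above a confidence threshold and queue everything else for manual review. A sketch of that triage, where the threshold and record shape are illustrative:

```python
def triage(suggestions: list[dict], auto_threshold: float = 0.95):
    """Split suggestions into auto-applicable and needs-human-review buckets."""
    auto = [s for s in suggestions if s["confidence"] >= auto_threshold]
    review = [s for s in suggestions if s["confidence"] < auto_threshold]
    return auto, review

suggestions = [
    {"id": "slack-bot-fs-write-remove", "confidence": 0.99},
    {"id": "code-review-github-scope", "confidence": 0.97},
    {"id": "research-env-deny", "confidence": 0.85},
]

auto, review = triage(suggestions)
print([s["id"] for s in auto])    # the two high-confidence rules
print([s["id"] for s in review])  # the .env deny rule, held for a human
```

Even then, auto-applied rules should land in policy.auto.yaml where they remain reviewable and reversible.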
navil policy rollback
If an auto-generated rule breaks something, you can undo it:
```
$ navil policy rollback --policy-id agent-slack-bot-fs-write-remove
Rolled back: agent-slack-bot-fs-write-remove
Policy enforced at 2026-03-25T14:22:00Z removed.
```

Machine-Generated Rules vs. Hand-Written Rules
Navil creates two policy files:
- policy.auto.yaml -- Machine-generated rules from the AI Policy Builder
- policy.yaml -- Your hand-written, manually maintained policies
Here's the critical design principle: your policies always take precedence. If you write a rule in policy.yaml that conflicts with something in policy.auto.yaml, your rule wins.
This means the AI never overrides your decisions. You maintain control. The machine is there to learn and suggest, not to dictate.
```yaml
# policy.yaml (your hand-written rules - highest priority)
agents:
  code-review:
    deny:
      - github.admin_endpoints  # You explicitly deny this
```

```yaml
# policy.auto.yaml (machine-generated - lower priority)
agents:
  code-review:
    scope:
      - github.pull_requests
      - github.code_search
      - github.issues
```

In this example, even though the auto-generated policy allows some GitHub tools, your explicit deny rule takes absolute precedence.
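That precedence rule can be sketched as a two-layer lookup: the engine consults your hand-written policy first and falls back to the auto-generated one only when the manual layer is silent. This is illustrative code, not Navil's implementation:

```python
def decide(agent: str, tool: str, manual: dict, auto: dict) -> bool:
    """policy.yaml (manual) always wins over policy.auto.yaml (auto)."""
    for layer in (manual, auto):  # manual layer is consulted first
        rules = layer.get("agents", {}).get(agent, {})
        if tool in rules.get("deny", []):
            return False          # an explicit deny is final
        scope = rules.get("scope")
        if scope is not None:
            return tool in scope  # a scope list in this layer decides
    return True  # no layer expressed an opinion

manual = {"agents": {"code-review": {"deny": ["github.admin_endpoints"]}}}
auto = {"agents": {"code-review": {"scope": ["github.pull_requests",
                                             "github.code_search",
                                             "github.issues"]}}}

print(decide("code-review", "github.admin_endpoints", manual, auto))  # False
print(decide("code-review", "github.pull_requests", manual, auto))    # True
```

Your deny on github.admin_endpoints blocks the call even though the auto-generated scope would otherwise govern it.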
Bring Your Own LLM
The AI Policy Builder doesn't require you to use a specific model. It works with:
- Anthropic Claude -- Recommended for reasoning and policy generation
- OpenAI GPT-4 -- Full support
- Google Gemini -- Full support
- Ollama -- Run everything locally, no external API calls
Install with LLM support:

```
pip install navil[llm]
```

Then configure your provider:

```
navil config llm --provider anthropic --api-key your-key
# or
navil config llm --provider local --model llama2
```

Want zero external API calls? Run Ollama locally. Your policies are generated entirely on your infrastructure.
The Key Insight
Most teams write security policies after an incident. They're reactive. Someone got breached or accessed something they shouldn't, so now you write rules to prevent it next time.
Navil writes policies before one.
By observing behavior first, then suggesting rules based on actual data, you're building security that matches reality instead of guessing at it. You're not over-constraining your agents. You're not under-securing them. You're finding the optimal balance.
And critically, humans stay in control. The AI learns and suggests. You decide.
In This Series
This is part of the "How Navil Works" series:
- Token Scoping: The Foundation -- How Navil controls token access
- The Policy Engine: From Rules to Enforcement -- How policies are evaluated in real time
- AI Policy Builder: Observations -> Suggestions -> Approval <- You are here
Get Started
The AI Policy Builder is one of Navil''s core features. Deploy an agent, observe its behavior, and let Navil suggest policies based on reality instead of guessing.
```
pip install navil
```

Deploy your first agent with Navil governance today. Write security policies based on data, not luck.
Navil is open-source agent security middleware. Control your agents. Understand your agents. Govern your agents.
Get your coverage score
See how well your AI agents are protected against known threats.