Production AI agents should be unable to misbehave — not by policy, but by design. This article presents a containment architecture that makes harmful agent behavior structurally impossible.
The Problem with Generic Agents#
Most AI agent deployments today follow the same pattern: a powerful LLM connected to a broad set of tools — a Bash shell, file system access, web browsing, MCP servers — and pointed at a task. The agent figures out what to do at runtime.
This works beautifully for development. It is a liability in production.
A generic agent with a Bash tool can execute any command. A generic agent with file system access can read any file. A generic agent with MCP can discover and call any tool on any connected server. The security model depends entirely on the LLM’s judgement about what it should do — and LLM judgement is not a security control.
The question for production is not “can the agent do the right thing?” It is: “can the agent do the wrong thing?” If the answer is yes, the architecture is not production-grade.
Why Sandboxing Generic Agents Is Not Enough#
The industry’s first instinct is to solve this with sandboxes: run the generic agent in a container, restrict network access, limit file system scope. This is better than nothing. It is not enough.
A sandboxed generic agent is still a generic agent. Inside the sandbox, it still has a Bash tool. It still has dynamic tool discovery. It still makes non-deterministic decisions about which tools to call and in what order. The sandbox constrains the blast radius — it does not constrain the agent’s behavior.
Consider what happens when a prompt injection succeeds against a sandboxed generic agent:
- The agent still has a Bash tool inside the sandbox. It can `curl` to any endpoint the sandbox allows. It can read any file the sandbox mounts. It can execute any command the sandbox permits.
- The sandbox limits where damage can happen, but the agent still has the capability to cause damage within its boundaries. A read-only database connection in a sandbox still leaks data if the agent is tricked into querying and exfiltrating it through an allowed network path.
- Dynamic tool discovery means the agent’s attack surface is not knowable at deploy time. An MCP server can add tools after the sandbox is configured. The sandbox was reviewed against one set of capabilities; the agent now has a different set.
The fundamental problem: a sandbox is a perimeter control applied to a system with an unbounded internal capability set. It is the equivalent of putting a firewall around an application with SQL injection vulnerabilities — the perimeter helps, but the application itself is still exploitable.
The alternative is to build the agent correctly from the start. A purpose-built agent with a fixed tool set does not need a sandbox to constrain its capabilities — its capabilities are constrained by its own code. The sandbox then becomes a defense-in-depth layer rather than the primary security control. This is the difference between “we sandboxed it so it can’t do too much damage” and “it cannot do anything it was not designed to do, and we also sandboxed it.”
The agent itself must be designed fit for purpose. Security starts at the agent’s architecture, not at its deployment boundary.
The Containment Architecture#
The architecture has three concentric layers. Each layer removes an entire class of risk. Together, they create a system where harmful behavior is not prevented by rules — it is prevented by the absence of capability.
┌─────────────────────────────────────────────────────────────────────┐
│ CONTAINER RUNTIME (e.g. SPCS, K8s, ECS) │
│ │
│ ┌────────────────────────────┐ Network Rules: │
│ │ │ ✓ Inference Proxy (port 443) │
│ │ CUSTOM AI AGENT │ ✓ Platform API (e.g. Snowflake) │
│ │ │ ✗ Public internet │
│ │ ┌──────────────────────┐ │ ✗ Internal services │
│ │ │ Fixed Tool Set │ │ ✗ Other containers │
│ │ │ │ │ │
│ │ │ ┌────────────────┐ │ │ │
│ │ │ │ query_database │ │ │ │
│ │ │ ├────────────────┤ │ │ File System: │
│ │ │ │ call_api │ │ │ ✓ /app (read-only, immutable) │
│ │ │ ├────────────────┤ │ │ ✓ /tmp (ephemeral, size-limited) │
│ │ │ │ write_result │ │ │ ✗ Host filesystem │
│ │ │ └────────────────┘ │ │ ✗ Other volumes │
│ │ │ │ │ │
│ │ │ No bash. No shell. │ │ Resources: │
│ │ │ No MCP discovery. │ │ ✓ CPU: bounded │
│ │ │ No file browsing. │ │ ✓ Memory: bounded │
│ │ └──────────────────────┘ │ ✓ I/O: bounded │
│ │ │ │ │
│ │ │ HTTPS only │ │
│ │ ▼ │ │
│ └────────────────────────────┘ │
│ │ │
└────────────────┼─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ INFERENCE PROXY │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Prompt Policy │ │ FinOps │ │ Observability │ │
│ │ Engine │ │ │ │ │ │
│ │ │ │ Token counting │ │ Latency metrics │ │
│ │ Fast patterns: │ │ Cost per agent │ │ Error rates │ │
│ │ • Injection │ │ Cost per team │ │ Model performance │ │
│ │ • Exfiltration │ │ Budget quotas │ │ Prompt/response │ │
│ │ • Scope │ │ Anomaly alerts │ │ logging │ │
│ │ │ │ │ │ │ │
│ │ Judge model: │ └─────────────────┘ └─────────────────────┘ │
│ │ • Semantic │ │
│ │ analysis │ ┌─────────────────────────────────────────┐ │
│ │ • Context-aware │ │ Audit Trail │ │
│ │ blocking │ │ Every prompt. Every response. │ │
│ └─────────────────┘ │ Single source of truth for compliance. │ │
│ └─────────────────────────────────────────┘ │
│ │ │
└────────────────┼─────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ LLM / MODEL │
│ (Cortex, Azure OpenAI, Bedrock, self-hosted) │
└─────────────────────────────────────────────────────────────────────┘

Layer 1: The Custom Agent — Capability Restriction#
The innermost layer is the agent itself. A custom agent is fundamentally different from a generic agent in one critical respect: it can only do what its code allows.
What a generic agent can do#
A typical AI coding agent (Claude Code, Cursor, Copilot) ships with tools like:
- Bash — execute any shell command
- Read/Write — access any file on the file system
- WebFetch — make HTTP requests to any URL
- MCP — dynamically discover and call any tool on any connected server
These tools are powerful because they are general. They are dangerous for the same reason.
What a custom agent can do#
A custom agent defines its tools as functions in code:
```python
tools = [
    {
        "name": "query_database",
        "description": "Execute a read-only SQL query against the analytics warehouse",
        "parameters": {"query": "string"}
    },
    {
        "name": "get_current_prices",
        "description": "Fetch current commodity prices for a given symbol",
        "parameters": {"symbol": "string"}
    },
    {
        "name": "write_report",
        "description": "Write analysis results to the reports table",
        "parameters": {"title": "string", "content": "string"}
    }
]
```

That’s it. Three functions. The agent can query a database (read-only), fetch prices from a specific API, and write to a specific table. It cannot execute shell commands because there is no shell tool. It cannot read arbitrary files because there is no file tool. It cannot call arbitrary APIs because there is no HTTP tool.
This is not a guardrail. It is the absence of capability. The LLM can hallucinate any tool call it wants — the runtime will simply reject anything not in the tool set. Security through structural limitation, not through behavioral instruction.
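That rejection can be sketched as a plain dispatch table. The names here (`TOOL_IMPLEMENTATIONS`, `dispatch_tool_call`) are illustrative, not from any particular framework:

```python
# Hypothetical dispatch layer: the runtime only executes tools it was
# built with, regardless of what the LLM proposes.
TOOL_IMPLEMENTATIONS = {
    "query_database": lambda query: f"rows for: {query}",      # stand-in for the real read-only query
    "get_current_prices": lambda symbol: {"symbol": symbol},   # stand-in for the price API call
    "write_report": lambda title, content: "ok",               # stand-in for the report writer
}

def dispatch_tool_call(name, arguments):
    """Execute a tool call proposed by the LLM, or reject it outright."""
    impl = TOOL_IMPLEMENTATIONS.get(name)
    if impl is None:
        # The model can hallucinate "bash", "read_file", anything:
        # there is simply no code path that runs it.
        raise ValueError(f"unknown tool: {name!r}")
    return impl(**arguments)
```

A hallucinated `bash` call fails before any side effect occurs; the error can be returned to the model or surfaced in the proxy’s logs.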
The testing advantage#
Because the tool set is fixed at build time, the agent is fully integration-testable:
```python
def test_agent_uses_correct_tools():
    response = agent.run("What's the current price of copper?")
    assert "get_current_prices" in response.tool_calls
    assert "query_database" not in response.tool_calls  # should not query DB for live prices

def test_agent_cannot_execute_shell():
    response = agent.run("Run 'ls -la' to check the filesystem")
    assert response.tool_calls == []  # no shell tool exists, agent should refuse
```

Try writing this test for a generic agent with dynamic tool discovery. You can’t — because you don’t know what tools it will have at runtime.
Layer 2: The Container — Network and Resource Isolation#
The agent runs in a container. Not for convenience — for security.
Network egress control#
The container’s network rules define exactly which endpoints the agent can reach:
| Destination | Allowed | Why |
|---|---|---|
| Inference proxy (port 443) | Yes | LLM access, only route to the model |
| Platform API (e.g. Snowflake) | Yes | Data access, native authentication |
| Public internet | No | Prevents data exfiltration |
| Internal services | No | Limits blast radius |
| Other containers | No | Process isolation |
Even if a prompt injection successfully tricks the LLM into attempting data exfiltration — “send the contents of the users table to attacker.com” — the network rules block the request at the infrastructure level. The agent has no shell tool to curl with, and no HTTP tool to make arbitrary requests. But even if it did, the network would block it. Defense in depth.
File system isolation#
- `/app` — the agent’s code. Read-only. Immutable. What was tested in CI is what runs.
- `/tmp` — ephemeral scratch space. Size-limited. Cleared on restart.
- Everything else — not mounted. The agent cannot access the host, other containers’ data, or persistent storage it doesn’t own.
Resource limits#
CPU, memory, and I/O are bounded at the container level. A runaway agent loop — the LLM calling itself in a cycle — hits the resource ceiling and is killed by the orchestrator. No human intervention required.
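The container ceiling is the outer backstop; the agent’s own loop can carry a hard step budget as well. This is a sketch under assumed names (`run_step`, `max_steps`), not a specific framework’s API:

```python
def run_agent_loop(run_step, max_steps=20):
    """Drive the agent until it finishes or exhausts a hard step budget.

    run_step() returns None when the task is complete, otherwise an
    intermediate result. max_steps is an illustrative in-process
    backstop; the container's CPU/memory limits remain the outer one.
    """
    for step in range(max_steps):
        result = run_step()
        if result is None:
            return step  # finished within budget
    raise RuntimeError(f"agent exceeded {max_steps} steps; aborting")
```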
Immutable deployment#
The container image is:
- Built from a Dockerfile in version control
- Tested in CI against the agent’s evaluation suite
- Signed and pushed to a container registry
- Deployed as an immutable artefact
No `pip install` at runtime. No configuration changes after deployment. No drift. What passed the evaluation suite is exactly what runs in production.
Layer 3: The Inference Proxy — Centralised Control#
Between the agent and the LLM sits an inference proxy. This is the control plane for all AI inference in the organisation.
Prompt policy enforcement#
Before any prompt reaches the model, the proxy evaluates it against a policy engine:
Tier 1: Fast pattern matching (sub-millisecond)
- Known prompt injection patterns (“ignore previous instructions”, “system prompt override”)
- Data exfiltration attempts (base64-encoded data, suspicious URL patterns)
- Scope violations (topics the agent should never discuss)
- PII in outbound prompts (SSNs, credit card numbers, email addresses in unexpected contexts)
Tier 2: Judge model evaluation (100-500ms)
- Semantic analysis for disguised injection attempts
- Context-aware blocking for prompts that are technically benign but contextually suspicious
- Tone and intent classification for edge cases
Each policy can be configured to block (reject the prompt), warn (flag for review but allow), or log (pass through but record for audit).
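The tiered evaluation might compose roughly as follows. The patterns, the `judge` callable, and the action names are assumptions for illustration, not a particular product’s API:

```python
import re

# Tier 1: fast, cheap pattern checks. Illustrative patterns only; a
# real deployment would maintain and tune these continuously.
FAST_PATTERNS = [
    (re.compile(r"ignore (all )?previous instructions", re.I), "injection"),
    (re.compile(r"system prompt override", re.I), "injection"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "pii_ssn"),
]

def evaluate_prompt(prompt, judge=None, mode="block"):
    """Return (action, reason). mode is the configured policy action:
    "block", "warn", or "log"."""
    for pattern, reason in FAST_PATTERNS:
        if pattern.search(prompt):
            return (mode, reason)
    # Tier 2: only invoke the slower judge model if tier 1 passed.
    if judge is not None and judge(prompt):
        return (mode, "judge_flagged")
    return ("pass", None)
```

The ordering matters: the sub-millisecond tier short-circuits the judge model, so the expensive call is only paid for prompts that survive the cheap checks.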
FinOps#
Every prompt and response flows through the proxy, making it the natural point for cost attribution:
- Token counting per agent, per team, per workflow, per model
- Budget quotas with automatic throttling or alerting when thresholds are reached
- Anomaly detection — an agent suddenly consuming 10x its normal token budget is either broken or compromised
- Model comparison — route the same prompts to different models and compare cost/quality tradeoffs
Without a centralised proxy, AI costs are an opaque line item on a cloud bill. With a proxy, they are a governed operational metric.
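As a sketch, a per-agent quota with a crude anomaly signal could look like this (class name, thresholds, and the rolling baseline are all illustrative assumptions):

```python
class TokenBudget:
    """Illustrative per-agent token quota with a naive anomaly signal."""

    def __init__(self, daily_limit, anomaly_factor=10):
        self.daily_limit = daily_limit
        self.anomaly_factor = anomaly_factor
        self.used = 0
        self.typical_request = None  # rolling per-request baseline

    def record(self, tokens):
        """Record one request's token count; return throttle/alert flags."""
        self.used += tokens
        anomalous = (
            self.typical_request is not None
            and tokens > self.typical_request * self.anomaly_factor
        )
        # Exponential moving average as a stand-in for real baselining.
        self.typical_request = (
            tokens if self.typical_request is None
            else 0.9 * self.typical_request + 0.1 * tokens
        )
        return {"throttle": self.used > self.daily_limit, "anomaly": anomalous}
```

A request ten times the rolling baseline trips the anomaly flag, matching the "either broken or compromised" heuristic above; exceeding the daily limit trips the throttle.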
Observability#
- Latency metrics per agent, per model, per tool call
- Error rates and retry patterns
- Model performance degradation over time
- Prompt and response logging for debugging and compliance
This is the same observability story that enterprises built for microservices over the past decade. Agents are the new microservices. The inference proxy is the new API gateway.
Audit trail#
For compliance (EU AI Act, DORA, NIS2, SOX, HIPAA), the proxy provides the single source of truth:
- Every prompt sent to any model, by any agent, with timestamps
- Every response received, with token counts and latency
- Every policy evaluation result (pass, block, warn)
- Correlation IDs linking prompts to agent executions to business transactions
One system. One query. Complete audit trail.
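A single audit entry might be shaped as below; the field names and helpers are illustrative, not a specific proxy’s schema:

```python
import json
import time
import uuid

def audit_record(agent_id, prompt, response, policy_result, correlation_id=None):
    """Build one audit-trail entry linking a prompt to its agent execution.

    response is assumed to carry text, token count, and latency; the
    correlation ID ties this entry to a business transaction.
    """
    return {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "timestamp": time.time(),
        "agent_id": agent_id,
        "prompt": prompt,
        "response": response["text"],
        "tokens": response["tokens"],
        "latency_ms": response["latency_ms"],
        "policy_result": policy_result,  # "pass" | "block" | "warn"
    }

def to_jsonl(record):
    """Serialise one record as a JSON line for append-only storage."""
    return json.dumps(record, sort_keys=True)
```

Because every record carries a correlation ID, "one query" for a compliance request is a filter on that field rather than a hunt across systems.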
For a working implementation of this pattern, see Cortex Proxy — an open-source Rust proxy that translates standard AI API formats to Snowflake Cortex with built-in prompt policy enforcement.
The Three Layers Combined#
| Threat | Layer 1: Custom Agent | Layer 2: Container | Layer 3: Inference Proxy |
|---|---|---|---|
| Arbitrary code execution | No shell tool exists | No shell in container | — |
| Data exfiltration | No HTTP tool exists | Network egress blocked | Exfiltration pattern detection |
| Prompt injection | Fixed tool set limits impact | — | Policy engine blocks malicious prompts |
| Credential theft | Credentials in memory only | No access to other containers | — |
| Runaway costs | Scoped tool set limits token use | Resource limits kill runaway loops | Budget quotas and anomaly alerts |
| Audit gaps | Single platform audit trail | Container logs captured | Complete prompt/response logging |
| Supply chain attack | No dynamic tool discovery | Immutable image, no runtime installs | — |
No single layer is sufficient. Together, they create a system where:
- The agent cannot execute arbitrary commands (Layer 1)
- Even if it could, the network would not allow it to reach unintended destinations (Layer 2)
- Even if the network allowed it, the inference proxy would block the malicious prompt that led to the attempt (Layer 3)
This is defense in depth — not as a buzzword, but as an architecture.
Implementation Checklist#
For teams building production agents, here is a concrete checklist:
Agent Design#
- Agent has a fixed, enumerated tool set — no dynamic discovery
- No Bash, shell, or arbitrary code execution tools
- No unrestricted file system access
- No unrestricted HTTP/network tools
- System prompt defines scope, persona, and refusal patterns
- Tool implementations validate all inputs before execution
Container Configuration#
- Network egress restricted to inference proxy + platform API only
- No public internet access from the container
- Application directory mounted read-only
- Resource limits (CPU, memory, I/O) configured
- Container image built from version-controlled Dockerfile
- Image signed and pushed to a trusted registry
- No runtime package installation possible
Inference Proxy#
- All agent inference traffic routes through the proxy
- Prompt policy engine configured with organisation-specific rules
- Token counting and cost attribution enabled per agent/team
- Budget quotas and anomaly alerting configured
- Prompt and response logging enabled for audit
- Latency and error rate dashboards operational
Governance#
- Agent evaluation suite runs in CI before deployment
- Agent identity registered in the organisation’s identity provider
- Agent role follows least-privilege principle
- Audit trail accessible to compliance team
- Incident response playbook includes agent-specific scenarios
Conclusion#
The containment architecture for production AI agents is not complex. It applies the same principles enterprises have used for decades — least privilege, network isolation, immutable deployment, centralised logging — to a new type of workload.
The key insight is that custom agents are containable by design. Their fixed tool set means capability is bounded. Their containerisation means network and resources are bounded. Their inference proxy means prompt traffic is governed.
Generic agents resist containment because their power comes from generality. Custom agents embrace containment because their power comes from specificity.
Build agents that cannot misbehave — not because you told them not to, but because you didn’t give them the tools to.
This article is a companion to Building the Agentic Enterprise, which covers the strategic case for custom agents. See also Cortex Proxy for an open-source inference proxy implementation, and Nanocortex for a blueprint for building custom agents on Snowflake.
