The era of AI agents that don’t just answer questions but take action is here. Every enterprise is experimenting. Few are in production. The gap between pilot and production is not a technology problem — it is an architecture problem.
The Agentic Enterprise Is Not a Future State#
The industry’s most consequential voices are aligned: the agentic enterprise is not a future state — it is the present trajectory.
Sridhar Ramaswamy, CEO of Snowflake, has stated that “we are entering the era of the agentic enterprise”, predicting that 2026 will be the year of the “Great Decentralization” — where massive general-purpose models are challenged by specialised, task-oriented agents that prioritise data context over raw compute scale. With the launch of Project SnowWork, Snowflake is backing this vision with a platform to put secure, data-grounded AI agents on every business surface.
Satya Nadella, CEO of Microsoft, frames 2026 as a “pivotal year for AI”, describing an “agentic revolution” where AI doesn’t just respond to prompts but coordinates tools, services, and workflows autonomously — while preserving human agency and organisational control.
Jensen Huang, CEO of NVIDIA, put it most viscerally at GTC 2026: “We’re going to see agents in every single part of every single company.” He predicted that in ten years, NVIDIA’s 75,000 employees will work alongside 7.5 million AI agents — and that every company will need an agentic strategy.
The consensus is clear. The question is no longer whether but how. And this is where the industry conversation needs a sharper edge — because the architecture you choose determines whether your agents are production-grade or permanently stuck in pilot.
We are past the point of debating whether AI agents will transform enterprise operations. They already are — just not in the way most technology strategies anticipated.
The dominant pattern today looks like this: a general-purpose AI model (Claude, GPT, Gemini) connected to a set of tools via a protocol like MCP (Model Context Protocol), pointed at enterprise data, and asked to do useful work. This pattern is powerful for exploration. It is how most organisations discovered what agents can do.
But exploration is not production. And the organisations that are actually deploying agents at scale — in financial services, healthcare, manufacturing, government — are converging on a different architecture. One that prioritises determinism over flexibility, native integration over protocol abstraction, and embedded security over bolted-on controls.
This paper argues that the agentic enterprise will not be built on generic agents with generic protocols. It will be built on purpose-specific agents with native tooling, designed for their task, integrated into the organisation’s security infrastructure, and deployed with the same rigour as any other production workload.
The thesis is simple: custom agents are more effective, more efficient, integrate natively into existing infrastructure and workflows, have a smaller attack surface, are easier to test, and are more tightly scoped to the business problem they solve. They are not harder to build — they are harder to build badly, which is exactly the property you want in a production system.
Where We Are: The Three Phases of Enterprise Agent Adoption#
Phase 1: Experimentation (2024–2025)#
Developers discovered that LLMs can call tools. MCP emerged as a standard for connecting models to external systems. Hundreds of MCP servers appeared — for databases, APIs, file systems, cloud services. The value proposition was compelling: plug any tool into any model, instantly.
This phase produced genuine insights. Organisations learned which workflows benefit from agent automation, which data sources agents need, and where the boundaries of LLM reasoning fall.
Phase 2: Disillusionment (2025–2026)#
The pilots that worked in demos failed in production. Common failure modes:
- Security gaps. MCP servers authenticate independently of the enterprise identity stack. Credentials are managed per-server, not per-user. Audit trails are fragmented across multiple systems.
- Reliability. Generic agents with dynamic tool discovery make non-deterministic decisions about which tools to call. The same prompt produces different tool sequences on different runs. This is acceptable for a coding assistant; it is not acceptable for a financial workflow.
- Compliance. Regulators want to know: which data did the agent access, under whose authority, and was the action auditable? A generic agent calling an MCP server calling a database calling an API produces an audit trail that spans four systems and three authentication boundaries.
- Cost. Every MCP call is a network round-trip. Every dynamic tool discovery adds tokens. Every retry adds latency. At scale, the overhead of protocol abstraction becomes a line item.
Phase 3: Production Architecture (2026+)#
The organisations that break through disillusionment share a common architectural pattern: they stop trying to make generic agents work for specific tasks and instead build specific agents for specific tasks.
This is not a retreat from the agentic vision. It is its maturation.
The Case Against MCP in Production#
MCP is a well-designed protocol. It solved a real problem: how do you connect an AI model to tools without writing custom integration code for each model-tool pair? The answer — a standard protocol for tool discovery, invocation, and result handling — was the right answer for the experimentation phase.
But production has different requirements from experimentation.
Dynamic Tool Discovery Is a Liability#
In production, you do not want an agent to discover tools at runtime. You want the agent to know exactly which tools it has, exactly what those tools do, and exactly what parameters they accept — at build time, not at runtime.
Dynamic discovery means an agent’s capabilities can change without the agent’s code changing. An MCP server can update its tool definitions. A new server can be added to the agent’s configuration. In the security literature, this is called a “rug pull” — and it is a recognised attack vector for MCP deployments.
A purpose-built agent has a fixed tool set. Its capabilities are defined in code, reviewed in pull requests, tested in CI, and deployed as immutable artefacts. The attack surface is the tool implementations, not a protocol layer that can change underneath you.
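A fixed tool set can be as simple as an immutable registry compiled into the agent. The sketch below is illustrative, not a real framework API — `Tool`, `reconcile_invoice`, and the registry names are all hypothetical — but it shows the key property: there is no registration endpoint and no runtime discovery, so changing the agent's capabilities requires a code change that goes through review.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    """A tool definition fixed at build time: name, description, handler."""
    name: str
    description: str
    handler: Callable[..., str]

def reconcile_invoice(invoice_id: str) -> str:
    # Hypothetical domain logic; a real agent would call the platform API here.
    return f"reconciled:{invoice_id}"

# The complete, immutable tool set. No registration API, no runtime discovery:
# extending this tuple is a pull request, not a configuration change.
TOOLS: tuple[Tool, ...] = (
    Tool("reconcile_invoice", "Reconcile one invoice against the ledger", reconcile_invoice),
)

TOOL_INDEX = {t.name: t for t in TOOLS}
```

Because the registry is a frozen constant, a CI test can assert the exact tool inventory and fail the build if it drifts.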
Authentication Delegation Is a Risk#
Every MCP server manages its own authentication. When an agent calls three MCP servers in a single workflow, it delegates credentials to three independent trust boundaries. The agent — or the framework running the agent — must manage these credentials, often storing them in environment variables, config files, or in-memory caches.
A custom agent authenticates once, to the platform it runs on. It holds credentials in process memory. Those credentials are scoped to the agent’s role. There is no delegation, no credential fan-out, no trust boundary multiplication.
The Audit Trail Fragments#
A production agent must produce a single, continuous audit trail that answers: who asked, what was done, which data was accessed, and what was the outcome. When an agent calls an MCP server that calls an API that accesses a database, the audit trail spans four systems. Correlating those logs in a SIEM is possible but expensive. Proving to a regulator that the chain is complete is harder.
A custom agent that uses native platform APIs produces a single audit trail in a single system. Every query, every tool call, every result — logged in one place, under one identity, with one set of governance controls.
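One way to get that single trail is to emit a structured audit event from every tool invocation, under the agent's one service identity. This is a minimal sketch — the identity string, event schema, and `query_ledger` tool are assumptions for illustration — but it shows who/what/outcome captured in one log stream rather than correlated across four systems.

```python
import json
import logging
from datetime import datetime, timezone
from functools import wraps

logger = logging.getLogger("agent.audit")

AGENT_IDENTITY = "svc-reconciliation-agent"  # hypothetical service identity

def audited(fn):
    """Wrap a tool so every call emits one structured audit event."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        event = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "identity": AGENT_IDENTITY,   # one identity for the whole trail
            "tool": fn.__name__,
            "args": args,
            "kwargs": kwargs,
        }
        result = fn(*args, **kwargs)
        event["outcome"] = "ok"
        logger.info(json.dumps(event, default=str))
        return result
    return wrapper

@audited
def query_ledger(account: str) -> str:
    # Hypothetical tool body; a real agent would query the platform here.
    return f"balance:{account}"
```

In production the `logger` would feed the platform's native audit store, so the agent's events sit alongside the platform's own access logs under the same identity.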
Protocol Overhead at Scale#
MCP uses HTTP/SSE for remote servers and stdio for local servers. Each tool call is a network request (remote) or a subprocess communication (local). At low volume, this overhead is negligible. At thousands of agent invocations per hour — which is where the agentic enterprise is headed — protocol overhead becomes a scaling concern.
Custom agents calling native APIs eliminate the protocol layer entirely. A function call is a function call. The serialisation cost is zero. The latency is the latency of the operation itself, not the operation plus the protocol.
The Architecture That Works: Custom Agents with Native Tooling#
The pattern that is emerging in production deployments has five characteristics:
1. Purpose-Specific Design#
A production agent does one thing well. It is not a general-purpose assistant that can do anything. It is a financial reconciliation agent, or a security audit agent, or a supply chain risk monitor. Its system prompt, tool set, and guardrails are all designed for that specific purpose.
This is the same principle that software engineering has applied for decades: specialised components beat monolithic general-purpose systems. An agent is a component, not a platform.
2. Native Tool Integration#
Instead of calling tools through a protocol layer, a production agent calls platform-native APIs directly. If the agent runs on a data platform, it calls the platform’s SQL API. If it needs to send a message, it calls the messaging system’s SDK. If it needs to read a file, it reads a file.
This eliminates an entire class of integration concerns: protocol version compatibility, server availability, authentication delegation, response format translation. The agent speaks the platform’s language natively.
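Concretely, a native tool is just a function that calls the platform driver directly. The sketch below uses a `StubConnection` stand-in for a real driver (such as a data platform's Python connector) so it runs on its own; the table, column names, and `execute` signature are illustrative assumptions, not a real API.

```python
class StubConnection:
    """Stands in for a platform driver connection in this sketch."""
    def execute(self, sql: str, params: tuple) -> list[tuple]:
        # A real driver would run the query on the platform; we return a fixture.
        return [("INV-1", "UNMATCHED")]

def unmatched_invoices(conn, since: str) -> list[tuple]:
    """One direct, parameterised call to the platform's SQL API.
    No tool-discovery round-trip, no protocol serialisation layer."""
    return conn.execute(
        "SELECT invoice_id, status FROM invoices "
        "WHERE status = %s AND created >= %s",
        ("UNMATCHED", since),
    )
```

Swapping `StubConnection` for the real driver is the only change needed to go from test to production — the tool's shape stays the same.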
3. Embedded Security#
Security is not a layer added on top. It is built into the agent’s architecture:
- Identity. The agent authenticates as itself, with its own credentials, scoped to its own role. It does not impersonate users or delegate credentials to external services.
- Authorisation. The platform’s native RBAC controls what the agent can access. Row-level policies, column masking, object grants — all enforced by the platform, not by the agent’s code.
- Secrets. Credentials live in the agent’s process memory or in a secrets manager. They never transit a protocol layer, never appear in tool parameters, never get logged in an MCP server’s request log.
- Audit. Every action produces an audit event in the platform’s native logging system. No log correlation across multiple systems required.
4. Deterministic Tool Execution#
A production agent does not decide at runtime which tools to call based on dynamic discovery. Its tool set is fixed and known. The LLM reasons about when and how to use the tools, but the tools themselves are compile-time constants, not runtime discoveries.
This makes the agent’s behaviour testable. You can write integration tests that verify: given this prompt, the agent calls these tools in this order with these parameters. Try doing that with an agent whose tool set is discovered at runtime from a remote server.
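The kind of test this enables can be sketched as follows. `run_agent` here is a hypothetical, simplified stand-in for the real agentic loop — in practice the LLM decides when to invoke each tool, but because the reachable tool set is fixed, the call sequence is still assertable.

```python
# Record every tool invocation so the test can assert the exact sequence.
calls: list[str] = []

def query_database(sql: str) -> str:
    calls.append("query_database")
    return "rows"

def write_result(data: str) -> str:
    calls.append("write_result")
    return "ok"

def run_agent(prompt: str) -> str:
    # Hypothetical simplified loop: real agents let the LLM sequence these
    # calls, but only ever from the fixed set above.
    rows = query_database("SELECT ...")
    return write_result(rows)

def test_reconciliation_flow():
    calls.clear()
    assert run_agent("reconcile yesterday's invoices") == "ok"
    assert calls == ["query_database", "write_result"]
```

A dynamically discovered tool set cannot give you this: the expected call sequence would depend on whatever a remote server advertised on that run.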
5. Evaluation Before Deployment#
Production agents are evaluated against domain-specific test suites before deployment — not just “does it produce a reasonable response” but “does it produce the correct response for this specific business scenario, with this specific data, within this specific latency budget.”
This is what the industry is starting to call “agent evaluation frameworks.” They are the CI/CD of agentic systems: automated, domain-specific, quantitative, and gating. No agent ships to production without passing its evaluation suite.
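A gating evaluation can be sketched in a few lines: domain-specific cases, a quantitative pass rate, and a hard threshold the CI pipeline enforces. Everything here — the cases, the `run_agent` stub, the threshold — is an illustrative assumption; a real harness would invoke the deployed agent against production-like data.

```python
# Domain-specific evaluation cases: prompt in, exact expected answer out.
CASES = [
    {"prompt": "Total for invoice INV-1?", "expected": "100.00"},
    {"prompt": "Total for invoice INV-2?", "expected": "250.50"},
]

def run_agent(prompt: str) -> str:
    # Stand-in for the agent under test; a real harness calls the agent here.
    answers = {"INV-1": "100.00", "INV-2": "250.50"}
    for key, value in answers.items():
        if key in prompt:
            return value
    return "unknown"

def evaluate(threshold: float = 1.0) -> bool:
    """Gate: return True only if the pass rate meets the threshold."""
    passed = sum(run_agent(c["prompt"]) == c["expected"] for c in CASES)
    return passed / len(CASES) >= threshold
```

Wiring `evaluate()` into the deployment pipeline as a required check is what makes the suite gating rather than advisory: a failing evaluation blocks the release the same way a failing unit test does.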
Why Custom Agents Win: The Six Advantages#
| Advantage | Generic Agent + MCP | Custom Agent + Native Tooling |
|---|---|---|
| Effectiveness | Broad but shallow — can call many tools, masters none | Deep domain expertise — tools designed for the specific task |
| Efficiency | Protocol overhead per call, token cost for tool discovery | Direct API calls, no serialisation layer, minimal token overhead |
| Infrastructure integration | Bolted on via protocol adapters | Native — speaks the platform’s language, uses its SDKs, runs in its compute |
| Workflow integration | Agent is a standalone tool users interact with | Agent is embedded in the business process — triggered by events, writing results back to systems of record |
| Attack surface | Multiple trust boundaries: agent → MCP server → API → data | Single trust boundary: agent → platform. Credentials in memory, not in transit |
| Testability | Non-deterministic tool discovery makes testing probabilistic | Fixed tool set, deterministic execution paths, integration-testable |
Where MCP Still Wins#
This paper is not arguing that MCP should be abandoned. It has clear, enduring value in three contexts:
Development and prototyping. When you are figuring out what an agent should do, MCP’s plug-and-play tool ecosystem is invaluable. Discover tools, experiment with combinations, iterate fast. Then, once you know what works, build the production version with native tooling.
Multi-vendor integration. When an agent genuinely needs to call tools from multiple vendors that have no shared platform, MCP provides a common interface. This is the “glue” use case — and it is real, especially for orchestration layers that coordinate across cloud providers.
Community and ecosystem. The MCP ecosystem has produced hundreds of well-tested tool implementations. Even if you build a custom agent, you can study MCP server implementations as reference architectures for your native tool integrations.
The mistake is not using MCP. The mistake is using MCP as the production integration layer when a native integration would be simpler, faster, more secure, and more auditable.
The Agentic Enterprise Maturity Model#
| Level | Characteristic | Agent Architecture | Governance |
|---|---|---|---|
| 1. Experimentation | Individual developers using AI assistants | Generic agent + MCP servers | None (shadow AI) |
| 2. Standardisation | IT provides approved agent frameworks | Generic agent + curated MCP servers | Basic (approved tools list) |
| 3. Production | Business-critical workflows automated | Custom agents + native tooling | Full (identity, RBAC, audit, evaluation) |
| 4. Enterprise-scale | Agents collaborate across domains | Custom agents + orchestration layer | Platform-enforced (governance as code) |
Most organisations are at Level 1 or 2. The transition to Level 3 requires a mindset shift: from “how do we connect our agent to everything?” to “how do we build an agent that does this one thing reliably, securely, and at scale?”
Level 4 is where the agentic enterprise vision fully materialises — multiple purpose-built agents, each excellent at their domain, coordinated by an orchestration layer that may well use a standard protocol (MCP or its successor) for inter-agent communication. But the individual agents themselves are purpose-built, not generic.
The Security Advantage: Agents That Cannot Misbehave by Design#
One of the most underappreciated advantages of custom agents over generic ones is not just that they can be secured more easily — it is that they are structurally incapable of doing things they were not designed to do.
No Bash Tool, No Arbitrary Execution#
A generic AI coding agent typically has access to a Bash tool, a file system tool, and a broad set of MCP servers. It can, in principle, execute any command, read any file, and call any API it discovers. The security model relies entirely on the LLM’s judgement about what it should do — which is a guardrail, not a guarantee.
A custom agent has none of this. It can only execute the functions defined in its code. There is no Bash tool that can run arbitrary commands. There is no dynamic tool discovery that could introduce unexpected capabilities. The agent’s code is its security boundary. What is not in the code cannot be executed. This is not a policy — it is a technical constraint.
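The enforcement point can be a dispatch function over a compiled-in allowlist: there is no shell, no `eval`, no dynamic import, so an unrecognised tool name is an error rather than a capability. The tool names and handlers below are hypothetical.

```python
# The agent's complete capability surface, fixed at build time.
ALLOWED_TOOLS = {
    "query_database": lambda sql: "rows",
    "write_result": lambda data: "ok",
}

def dispatch(tool_name: str, *args):
    """Execute a tool only if it is in the compiled-in allowlist.
    No shell, no eval, no dynamic import: an unknown name is an error."""
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        raise PermissionError(f"tool not in allowlist: {tool_name}")
    return handler(*args)
```

Even if a prompt injection convinces the LLM to request a `bash` tool, the dispatch layer has nothing to hand it — the refusal is structural, not behavioural.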
Containerisation: Controlling the Blast Radius#
Production agents must be containerised. A container provides:
- Network egress/ingress control. The agent can only communicate with explicitly allowed endpoints. No exfiltration to arbitrary URLs. No inbound connections from untrusted sources. Network rules are enforced at the infrastructure level, not by the agent’s code.
- File system isolation. Any file operations are restricted to the container’s ephemeral or mounted volumes. The agent cannot access the host file system, other containers’ data, or shared storage unless explicitly granted.
- Resource limits. CPU, memory, and I/O are bounded. A runaway agent loop cannot consume unbounded resources.
- Immutable deployment. The container image is built, signed, and deployed as an immutable artefact. What was tested is what runs. No runtime modifications, no package installations, no configuration drift.
This is the same security model that enterprises apply to microservices. Agents are workloads. Treat them as such.
The Inference Proxy: Centralising Control#
Between the agent and the LLM, there should be an inference proxy — a centralised gateway that all AI inference traffic flows through. This is not optional for production deployments. It provides:
- FinOps. Centralised token counting and cost attribution. Know which agent, which team, and which workflow is consuming inference budget. Set quotas. Alert on anomalies. This turns AI spend from an opaque cloud bill line item into a governed operational cost.
- DevOps and observability. Centralised logging of all prompts and completions. Latency metrics. Error rates. Model performance tracking. One dashboard for all AI inference, regardless of which agent or which model.
- Prompt policy enforcement. Before a prompt reaches the LLM, the proxy can evaluate it against a policy engine — blocking prompt injection attempts, data exfiltration patterns, scope violations, and other harmful patterns. Two-tier evaluation works well here: fast pattern matching for known threats (milliseconds), followed by a judge model for deeper semantic analysis of ambiguous cases. Policies can block, warn, or log.
- Audit trail. Every prompt and every response is logged centrally. For compliance, this is the single source of truth for “what did the AI do?”
For a working proof of technology that implements this pattern, see Cortex Proxy — an open-source Rust proxy that translates standard AI API formats to Snowflake Cortex while enforcing prompt policies.
The combination of custom agents (can only do what their code allows) + containerisation (can only reach what the network allows) + inference proxy (can only send prompts that policy allows) creates a defence-in-depth architecture where security is not a layer you add — it is a property of the system.
┌─────────────────────────────────────────────────────────────┐
│ CONTAINER (network-isolated) │
│ │
│ ┌───────────────────────────────────────┐ │
│ │ CUSTOM AI AGENT │ Network Rules: │
│ │ │ ✓ Proxy (443) │
│ │ Fixed tools only: │ ✓ Platform API │
│ │ • query_database() │ ✗ Internet │
│ │ • call_api() │ ✗ Other hosts │
│ │ • write_result() │ │
│ │ │ Filesystem: │
│ │ No bash. No shell. No MCP. │ ✓ /app (r/o) │
│ │ No file browsing. No web access. │ ✓ /tmp (limited)│
│ └──────────────────┬────────────────────┘ ✗ Host FS │
│ │ │
└─────────────────────┼────────────────────────────────────────┘
│ HTTPS only
▼
┌─────────────────────────────────────────────────────────────┐
│ INFERENCE PROXY │
│ │
│ ┌──────────────┐ ┌──────────┐ ┌─────────────┐ ┌────────┐ │
│ │Prompt Policy │ │ FinOps │ │Observability│ │ Audit │ │
│ │Engine │ │ │ │ │ │ Trail │ │
│ │• Injection │ │• Tokens │ │• Latency │ │• Every │ │
│ │• Exfiltration│ │• Cost/ │ │• Errors │ │ prompt│ │
│ │• Scope │ │ agent │ │• Model perf │ │• Every │ │
│ │• PII │ │• Budgets │ │• Dashboards │ │ resp │ │
│ └──────────────┘ └──────────┘ └─────────────┘ └────────┘ │
│ │ │
└─────────────────────┼────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LLM (Cortex, Azure OpenAI, Bedrock) │
└─────────────────────────────────────────────────────────────┘

For a deep dive into implementing each layer — including threat matrices, configuration checklists, and code examples — see the companion article: Containing AI Agents: A Defense-in-Depth Architecture for Production.
What This Means for Technology Leaders#
For CIOs#
The agentic enterprise is not a technology project. It is an operating model change. The question is not “should we deploy AI agents?” — your employees are already using them (shadow AI is the fastest-growing category in every enterprise). The question is: “how do we govern, secure, and scale agent deployments so they produce reliable business outcomes?”
The answer starts with architecture, not procurement. A purpose-built agent for your most painful workflow — deployed with proper identity, governance, and evaluation — will deliver more value than a hundred MCP-connected generic agents running in a developer’s IDE.
For CTOs and Architects#
Invest in the agent development capability, not in the protocol ecosystem. The organisations that build internal competence in custom agent development — understanding the agentic loop, tool design, prompt engineering, evaluation frameworks — will move faster than those that wait for the protocol ecosystem to mature.
The protocol layer will standardise eventually. When it does, your custom agents can adopt it for inter-agent communication. But the core of each agent — its tools, its security model, its domain logic — will always be custom. Start building that muscle now.
For Security Leaders#
AI agents are workloads. They authenticate, authorise, access data, communicate over networks, and must be audited. Apply the same defence-in-depth principles you apply to any other workload: network isolation, identity propagation, least-privilege access, data protection, and continuous monitoring.
The biggest risk is not the agent itself. It is the integration layer between the agent and your data. Every protocol hop, every credential delegation, every trust boundary crossing is an attack surface. Minimise those boundaries by building agents that integrate natively with your platform’s security model.
Conclusion#
The agentic enterprise will be built by organisations that treat agents as first-class production workloads — purpose-built, natively integrated, security-embedded, and rigorously evaluated.
The protocol hype of 2024–2025 served its purpose: it showed us what agents can do. The production reality of 2026 and beyond will be shaped by a different question: not what agents can do, but what they reliably do, under governance, at scale, with accountability.
Generic agents with generic protocols are the scaffolding. Custom agents with native tooling are the building.
It is time to start building.
This paper is part of a series on AI agent architecture. See also: Governing AI Inference in the Data Cloud for the security architecture, and Nanocortex for a hands-on blueprint for building custom agents.
