
Nanocortex: A Blueprint for Building Custom Snowflake Cortex Agents


How a 1,600-line Python file can teach you everything about the Cortex Agent API

Nanocortex is open source: github.com/sfc-gh-kkeller/nanocortex


Building AI-powered applications that interact with your data warehouse is increasingly important. Snowflake’s Cortex Agent API provides a powerful foundation, but the learning curve can be steep. Enter Nanocortex — a minimal, single-file implementation that serves as both a working coding assistant and a blueprint for building your own custom agents.

Why This Matters: Build Any Agent You Can Imagine
#

Nanocortex is a coding assistant, but the patterns it teaches are universal. The Cortex Agent API combined with custom tools means you can build any kind of agent — the only limit is the tools you give it. Here are some real-world examples to spark your imagination:

Enterprise Productivity

  • An agent that generates PowerPoint decks and Excel reports from Snowflake data — ask it “create a quarterly sales review for EMEA” and it queries the data, builds charts, and assembles a polished .pptx via the python-pptx library, all through a custom tool.
  • A meeting summariser that pulls transcripts from your collaboration platform, extracts action items, and writes them back to your project tracker — fully automated, triggered on a schedule.

Intelligent Security & Networking

  • An AI-powered firewall analyst that continuously observes network traffic logs in Snowflake, flags anomalies using Cortex AI functions, and automatically creates incident tickets — think of it as an intelligent layer on top of your SIEM that can reason, not just match rules.
  • A compliance auditor agent that scans your Snowflake account for privilege escalation paths, public access patterns, and policy violations — then generates a remediation report with SQL commands ready to execute.

Data Engineering & Integration

  • Replace classic CDC (Change Data Capture) pipelines with an AI agent that understands schema changes, resolves conflicts intelligently, and adapts transformations on the fly — no more brittle rule-based ETL that breaks when a source system adds a column.
  • A data quality agent that profiles incoming data, detects drift, and either auto-corrects issues or alerts the right team with context — not just “column X has nulls” but “column X started having nulls after the CRM migration on Tuesday, here’s the upstream query that changed.”

Domain-Specific Assistants

  • A financial analyst agent that can pull market data, join it with internal positions from Snowflake, run risk calculations, and produce a morning briefing — all triggered by a single message.
  • A customer support agent that has access to your product database, order history, and documentation — it can look up a customer’s issue, check their order status via SQL, and draft a resolution, all in one conversational flow.

The blueprint is always the same: define tools, register them, let the agent reason about when and how to use them. Nanocortex shows you exactly how that loop works — once you understand it, you can build agents for any domain.


What is Nanocortex?
#

Nanocortex is a ~1,600-line Python CLI that demonstrates the complete lifecycle of a Cortex Agent, so you can build your own custom agent.

This is an educational article, not an official Snowflake document; when you implement custom agents, you are on your own. Snowflake will likely make this easier in the future, so keep an eye out.

But why wait?

If you are a builder or a Snowflake partner who wants to deliver Cortex-based agents, this will help you get started right now.

  • Minimal dependencies (stdlib only; optional snowflake-connector-python for richer SQL execution, with a REST API fallback)
  • Multiple authentication methods (PAT, Private Key/JWT, Workload Identity Federation, External Browser)
  • Client-side tools (bash, read, write, edit, glob, grep, snowflake_sql_execute)
  • Server-side tools (web_search)
  • SSE streaming with full tool execution loop
  • Multi-turn conversations with context management
  • Self-correction loop (reflection mode) for automatic error recovery

Think of it as the "minimum viable Cortex agent".

Architecture Overview
#

┌─────────────────────────────────────────────────────────────────┐
│                        Nanocortex CLI                           │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐  │
│  │   System     │  │   Message    │  │   Context            │  │
│  │   Prompt     │  │   History    │  │   Management         │  │
│  └──────────────┘  └──────────────┘  └──────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                    Tool Registry                          │  │
│  │  ┌─────────────────────┐  ┌─────────────────────────┐    │  │
│  │  │   Client-Side       │  │   Server-Side           │    │  │
│  │  │   - bash            │  │   - web_search          │    │  │
│  │  │   - read/write/edit │  │                         │    │  │
│  │  │   - glob/grep       │  │                         │    │  │
│  │  │   - snowflake_sql   │  │                         │    │  │
│  │  └─────────────────────┘  └─────────────────────────┘    │  │
│  └──────────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────────────────────────────────────────────┐  │
│  │              Cortex Agent API (SSE Streaming)             │  │
│  └──────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘

Key Concepts for Building Your Own Agent
#

1. Authentication Flow
#

Nanocortex supports multiple authentication methods:

PAT (Programmatic Access Token) - simplest for automation:

body = {
    "data": {
        "ACCOUNT_NAME": account_name,
        "AUTHENTICATOR": "PROGRAMMATIC_ACCESS_TOKEN",
        "LOGIN_NAME": self.user,
        "TOKEN": self.pat,
    }
}

Private Key / JWT Authentication - for service accounts:

def generate_jwt_token(account, user, private_key_path, private_key_pwd=None):
    # Load private key and generate public key fingerprint
    private_key = serialization.load_pem_private_key(key_bytes, password=pwd)
    public_key_fp = "SHA256:" + base64.b64encode(
        sha256(public_key_der).digest()).decode()

    # Build JWT payload
    payload = {
        "iss": f"{qualified_user}.{public_key_fp}",
        "sub": qualified_user,
        "iat": now,
        "exp": now + 3600,
    }

    # Sign with RS256 and return JWT
    return f"{header_b64}.{payload_b64}.{signature_b64}"

# Use with SNOWFLAKE_JWT authenticator
body = {"data": {"AUTHENTICATOR": "SNOWFLAKE_JWT", "TOKEN": jwt_token, ...}}
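The elided pieces above (base64url encoding, header assembly, expiry) are easy to get wrong, so here is a stdlib-only sketch of the assembly step. The RS256 signature itself is delegated to a `sign_fn` callback (in real code that would wrap `private_key.sign(...)` from the `cryptography` package), and the exact account/user qualification rules shown here are an assumption; check Snowflake's key-pair authentication docs for the authoritative format.

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def assemble_jwt(account: str, user: str, public_key_fp: str,
                 sign_fn) -> str:
    """Assemble a Snowflake-style JWT. `sign_fn` must produce an
    RS256 signature over the signing input (e.g. via `cryptography`)."""
    qualified_user = f"{account.upper()}.{user.upper()}"
    now = int(time.time())
    header = {"alg": "RS256", "typ": "JWT"}
    payload = {
        "iss": f"{qualified_user}.{public_key_fp}",
        "sub": qualified_user,
        "iat": now,
        "exp": now + 3600,  # one-hour lifetime, matching the snippet above
    }
    signing_input = (b64url(json.dumps(header).encode()) + "." +
                     b64url(json.dumps(payload).encode()))
    return signing_input + "." + b64url(sign_fn(signing_input.encode()))
```

With a real key, `sign_fn` would be something like `lambda data: key.sign(data, padding.PKCS1v15(), hashes.SHA256())`.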

Workload Identity Federation (WIF) - for cloud environments:

def fetch_wif_token(provider="auto"):
    """Fetch token from cloud provider metadata service."""
    endpoints = {
        "gcp": "http://metadata.google.internal/.../token",
        "azure": "http://169.254.169.254/metadata/identity/oauth2/token",
        "aws": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    }
    # Auto-detect cloud environment and fetch token
    return token

# Use with OAUTH authenticator
body = {"data": {"AUTHENTICATOR": "OAUTH", "TOKEN": wif_token, ...}}

External Browser - for interactive use (default fallback).

Connection configuration in ~/.snowflake/connections.toml:

# PAT authentication
[pat_connection]
account = "myorg-myaccount"
user = "myuser"
authenticator = "PROGRAMMATIC_ACCESS_TOKEN"
token_file_path = "~/.snowflake/pat_token"

# Private key authentication
[keypair_connection]
account = "myorg-myaccount"
user = "service_user"
private_key_file = "~/.snowflake/rsa_key.p8"
private_key_file_pwd = "optional_password"

# WIF authentication (for cloud workloads)
[wif_connection]
account = "myorg-myaccount"
user = "wif_user"
authenticator = "WIF"
wif_provider = "auto"  # or "gcp", "azure", "aws"

The token is then used for all subsequent API calls:

headers = {"Authorization": f'Snowflake Token="{self.token}"'}
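Putting that header together with the `agent:run` endpoint used later in this article, a request builder might look like the following sketch. The account URL and the `urllib`-based plumbing are illustrative, not Nanocortex's exact code:

```python
import json
import urllib.request

def agent_run_request(account_url: str, token: str,
                      body: dict) -> urllib.request.Request:
    """Build an authenticated, streaming request for the agent endpoint."""
    return urllib.request.Request(
        f"{account_url}/api/v2/cortex/agent:run",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f'Snowflake Token="{token}"',
            "Content-Type": "application/json",
            "Accept": "text/event-stream",  # ask for SSE streaming
        },
        method="POST",
    )
```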

2. Tool Registration System
#

The Cortex Agent API has two categories of tools:

Built-in Tools (server knows the schema):

BUILTIN_TOOL_TYPES = {"read", "write", "edit", "glob", "grep", "bash",
                      "web_search", "snowflake_sql_execute"}

# SQL executes client-side via Snowflake connector or REST API
# Only web_search truly executes server-side

def build_tools():
    tools = []
    for name in BUILTIN_TOOL_TYPES:
        tools.append({"tool_spec": {"type": name, "name": name}})
    return tools

Custom Tools (you define the schema):

# Adding a custom tool is straightforward
def my_custom_tool(args):
    """Your tool implementation."""
    param1 = args.get("param1")
    return f"Result: {param1}"

CLIENT_TOOLS = {
    "my_tool": (
        "Description of what this tool does",
        {"param1": "string", "param2": "number?"},  # Schema
        my_custom_tool  # Handler function
    ),
}
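Dispatching a call through such a registry takes only a few lines. This sketch (illustrative, not Nanocortex's exact code) unpacks the (description, schema, handler) tuple and returns failures as text rather than raising, so the model can see the error and self-correct:

```python
def execute_custom_tool(registry: dict, name: str, args: dict) -> str:
    """Look up a tool in the registry and run its handler, returning
    errors as strings so the model can recover on the next turn."""
    if name not in registry:
        return f"Error: unknown tool '{name}'"
    _description, _schema, handler = registry[name]
    try:
        return handler(args)
    except Exception as exc:  # surface failures as structured text
        return f"Error: {type(exc).__name__}: {exc}"
```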

3. The Agentic Loop
#

The core of any agent is the execution loop. Here’s the simplified pattern:

def chat(self, user_input: str):
    # 1. Build message with system prompt (first message only)
    content = [{"type": "text", "text": user_input}]
    if not self._system_prompt_sent:
        content.insert(0, {"type": "text", "text": system_prompt})
        self._system_prompt_sent = True

    self.messages.append({"role": "user", "content": content})

    # 2. Agentic loop - continues until no more tool calls
    while True:
        tool_calls = []

        # Stream response from Cortex Agent API
        for event in self._stream_call():
            if event["_type"] == "response.text.delta":
                print(event["text"], end="", flush=True)
            elif event["_type"] == "response.tool_use":
                tool_calls.append(event)

        # 3. Execute tools and collect results
        if not tool_calls:
            break  # No tools = conversation turn complete

        results = []
        for tc in tool_calls:
            if tc["client_side_execute"]:
                result = execute_client_tool(tc["name"], tc["input"])
            else:
                result = get_server_result(tc["tool_use_id"])
            results.append({"tool_use_id": tc["tool_use_id"],
                            "content": result})

        # 4. Send results back, loop continues
        self.messages.append({"role": "user", "content": results})

    # 5. Self-correction: reflect on the output (guarded so the
    #    reflection turn cannot trigger another reflection)
    if self.reflect and user_input != REFLECT_PROMPT:
        self.chat(REFLECT_PROMPT)

4. Self-Correction Loop (Reflection)
#

One of the most powerful patterns in agentic systems is self-correction. Nanocortex implements this with a reflection loop:

REFLECT_PROMPT = """Review your previous response. Did you fully complete
the user's request? Check for:
- Errors in tool outputs (SQL errors, file not found, etc.)
- Incomplete results or missing information
- Incorrect assumptions about the data or context

If everything looks correct, respond with just: LGTM
If there are issues to fix, explain briefly and take corrective action."""

def chat(self, user_input: str, _reflect_iteration: int = 0):
    is_reflection = _reflect_iteration > 0
    # ... main agentic loop ...

    # After completing the response, trigger reflection
    if self.reflect and not is_reflection \
       and _reflect_iteration < self._max_reflect_iterations:
        print("[reflecting...]")
        self.chat(REFLECT_PROMPT,
                  _reflect_iteration=_reflect_iteration + 1)

This can be toggled via:

  • CLI flag: --no-reflect to disable
  • Session command: /reflect or /r to toggle on/off

5. Interrupt and Clarify
#

Users need the ability to interrupt the agent when it’s going in the wrong direction. Nanocortex handles Ctrl+C gracefully:

def chat(self, user_input: str, _reflect_iteration: int = 0):
    interrupted = False
    # ...
    while True:
        try:
            for evt in self._call():
                # Process streaming events...
                pass
        except KeyboardInterrupt:
            interrupted = True
            print("\n[interrupted]")

            # Save partial progress
            if text_buf:
                assistant_content.append({
                    "type": "text",
                    "text": text_buf + " [interrupted]"
                })
            if assistant_content:
                self.messages.append({
                    "role": "assistant",
                    "content": assistant_content
                })

            # Allow user to clarify
            clarification = input(
                "Clarify or press Enter to continue: "
            ).strip()
            if clarification:
                self.messages.append({
                    "role": "user",
                    "content": [{
                        "type": "text",
                        "text": f"[User clarified]: {clarification}"
                    }]
                })
                continue  # Resume with clarification
            else:
                break  # Exit cleanly

This enables a powerful workflow:

  1. User asks: “List all tables”
  2. Agent starts querying ALL databases
  3. User presses Ctrl+C: “[interrupted]”
  4. User clarifies: “Only in the SALES schema”
  5. Agent continues with the correct scope

6. System Prompt Injection
#

The system prompt shapes the agent’s behavior. Nanocortex injects it with the first user message:

SYSTEM_PROMPT = """You are a coding assistant with access to these tools:
- bash: Execute shell commands
- read: Read file contents
- snowflake_sql_execute: Run SQL queries (ONE statement per call)

{snowflake_context}

Current working directory: {cwd}
"""

SNOWFLAKE_CONTEXT_PROMPT = """Snowflake Connection:
- Account: {account}
- Database: {database}
- Schema: {schema}
- Role: {role}

IMPORTANT: When a database/schema is set, query ONLY that context.
"""

7. Context Management
#

Real agents need to track state. Nanocortex manages Snowflake context:

class CortexAgent:
    def __init__(self):
        self.snowflake_context = {}
        self._pending_context_update = None

    def update_context(self, key, value):
        self.snowflake_context[key] = value
        # Queue update for next message
        self._pending_context_update = f"[Context: {key}={value}]"
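A queued update is then drained into the next outgoing message, so the model sees it exactly once. One way to do that, extending the class above into a self-contained sketch (the `build_user_content` method is illustrative, not Nanocortex's exact code):

```python
class CortexAgent:
    def __init__(self):
        self.snowflake_context = {}
        self._pending_context_update = None

    def update_context(self, key, value):
        self.snowflake_context[key] = value
        # Queue update for next message
        self._pending_context_update = f"[Context: {key}={value}]"

    def build_user_content(self, user_input: str) -> list:
        """Prepend any queued context note to the next user message,
        then clear it so it is sent exactly once."""
        content = []
        if self._pending_context_update:
            content.append({"type": "text",
                            "text": self._pending_context_update})
            self._pending_context_update = None
        content.append({"type": "text", "text": user_input})
        return content
```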

Adding Custom Tools: A Practical Example
#

Let’s say you want to add a tool that queries a REST API:

# 1. Define the handler
import urllib.request

def fetch_api(args):
    url = args.get("url")
    method = args.get("method", "GET")

    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode()

# 2. Register it
CLIENT_TOOLS["fetch_api"] = (
    "Fetch data from a REST API endpoint",
    {"url": "string", "method": "string?"},
    fetch_api
)

# 3. Add to tool list (for custom tools, include full schema)
def build_tools():
    tools = []
    # Built-in tools
    for name in BUILTIN_TOOL_TYPES:
        tools.append({"tool_spec": {"type": name, "name": name}})

    # Custom tools with explicit schema
    tools.append({
        "tool_spec": {
            "type": "function",
            "name": "fetch_api",
            "description": "Fetch data from a REST API endpoint",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {
                        "type": "string",
                        "description": "The URL to fetch"
                    },
                    "method": {
                        "type": "string",
                        "description": "HTTP method"
                    }
                },
                "required": ["url"]
            }
        }
    })
    return tools

Advanced Patterns (DIY Extensions)
#

The following patterns are not implemented in Nanocortex but demonstrate how easily you can extend it for your own use cases.

Planning Mode
#

Before executing complex tasks, have the agent create a plan first:

PLAN_PROMPT = """Before executing, create a step-by-step plan:
1. List all steps needed to complete this task
2. Identify which tools you'll use for each step
3. Note any potential risks or side effects
4. Present the plan and wait for approval

Format your plan as:
## Plan
1. [Step 1] - tool: <tool_name>
2. [Step 2] - tool: <tool_name>
...

Then ask: "Proceed with this plan? (yes/no/modify)" """

class CortexAgent:
    def __init__(self):
        self.planning_mode = False

    def chat(self, user_input: str):
        if self.planning_mode \
           and not user_input.startswith("[APPROVED]"):
            # Inject planning requirement
            content = [{
                "type": "text",
                "text": PLAN_PROMPT + "\n\nUser request: " + user_input
            }]
        else:
            content = [{
                "type": "text",
                "text": user_input.replace("[APPROVED] ", "")
            }]
        # ... rest of chat logic

You could toggle with a /plan command:

if inp == "/plan":
    agent.planning_mode = not agent.planning_mode
    status = "ON" if agent.planning_mode else "OFF"
    print(f"Planning mode: {status}")
    continue

Approval Flow (Approve/Deny/Always)
#

Another useful extension — add human-in-the-loop approval for tool execution:

class ApprovalMode:
    ASK = "ask"           # Ask for each tool call
    ALWAYS = "always"     # Auto-approve everything
    NEVER = "never"       # Deny all tool calls (read-only mode)
    SAFE_ONLY = "safe"    # Auto-approve safe tools, ask for others

SAFE_TOOLS = {"read", "glob", "grep"}
DANGEROUS_TOOLS = {"bash", "write", "edit", "snowflake_sql_execute"}

class CortexAgent:
    def __init__(self):
        self.approval_mode = ApprovalMode.ASK
        self.approved_tools = set()

    def request_approval(self, tool_name: str, tool_input: dict) -> bool:
        """Request user approval for a tool call."""
        if self.approval_mode == ApprovalMode.ALWAYS:
            return True
        if self.approval_mode == ApprovalMode.NEVER:
            print("  [blocked: read-only mode]")
            return False
        if tool_name in self.approved_tools:
            return True  # user picked [A]lways for this tool earlier
        if self.approval_mode == ApprovalMode.SAFE_ONLY \
           and tool_name in SAFE_TOOLS:
            return True

        # Show what's about to execute
        preview = json.dumps(tool_input, indent=2)[:200]
        print(f"\n  Tool: {tool_name}")
        print(f"  Input: {preview}")
        print("\n  [a]pprove / [d]eny / [A]lways / [D]eny all: ",
              end="")

        choice = input().strip()  # keep case: 'A'/'D' differ from 'a'/'d'
        if choice == 'a':
            return True
        elif choice == 'd':
            return False
        elif choice == 'A':
            self.approved_tools.add(tool_name)
            return True
        elif choice == 'D':
            self.approval_mode = ApprovalMode.NEVER
            return False
        else:
            return False

Skills and Memory Injection
#

You can extend Nanocortex with domain-specific skills:

SKILLS = {
    "data_analyst": """When analyzing data:
    1. Always check data types first with DESCRIBE TABLE
    2. Sample before aggregating large tables
    3. Use CTEs for complex queries""",

    "security_auditor": """When auditing:
    1. Check GRANTS on sensitive objects
    2. Review ACCOUNT_USAGE views
    3. Flag any public access patterns"""
}

def inject_skill(skill_name):
    if skill_name in SKILLS:
        agent._pending_context_update = \
            f"[Skill activated: {SKILLS[skill_name]}]"

Persistent Memory
#

Add a simple memory system:

import json
from pathlib import Path

MEMORY_FILE = Path("~/.nanocortex/memory.json").expanduser()

def save_memory(key, value):
    memory = json.loads(MEMORY_FILE.read_text()) \
        if MEMORY_FILE.exists() else {}
    memory[key] = value
    MEMORY_FILE.parent.mkdir(exist_ok=True)
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))
    return f"Saved: {key}"

def recall_memory(key):
    memory = json.loads(MEMORY_FILE.read_text()) \
        if MEMORY_FILE.exists() else {}
    return memory.get(key, f"No memory found for: {key}")

# Register as tools
CLIENT_TOOLS["remember"] = (
    "Save information for later",
    {"key": "string", "value": "string"},
    lambda a: save_memory(a["key"], a["value"])
)
CLIENT_TOOLS["recall"] = (
    "Recall saved information",
    {"key": "string"},
    lambda a: recall_memory(a["key"])
)

Using Nanocortex as an SDK
#

You can import Nanocortex components into your own applications:

from nanocortex import CortexAgent, CLIENT_TOOLS, build_tools

# Create agent from connection config
agent = CortexAgent.from_connection(
    "my_connection", model="claude-sonnet-4-6"
)

# Configure reflection (self-correction)
agent.reflect = True  # Enable (default)

# Add custom tools
CLIENT_TOOLS["my_tool"] = ("My tool", {"param": "string"}, my_handler)

# Authenticate and fetch context
agent.authenticate()
agent.snowflake_context = agent.fetch_snowflake_context()

# Run programmatically
agent.chat("List all tables in the SALES schema")
print(agent.messages[-1])  # Get last response

Key Takeaways for Building Your Own Agent
#

  1. Start minimal — Nanocortex proves you can build a capable agent in ~1,600 lines. Don’t over-engineer.

  2. Client vs Server tools — Most tools (including SQL) run client-side for control. Only web_search executes server-side.

  3. The agentic loop is simple — Stream response, collect tool calls, execute, send results, repeat.

  4. System prompts matter — Invest time in crafting prompts that guide the model to use your tools correctly.

  5. Context is king — Track and inject context changes so the model stays aware of the current state.

  6. Streaming is essential — SSE streaming provides real-time feedback. Don’t wait for complete responses.

  7. Error handling at tool boundaries — Always return structured errors from tools so the model can recover.

Conclusion
#

Nanocortex demonstrates that building a Cortex Agent doesn’t require a complex framework. By studying its ~1,600 lines, you can learn:

  • How to authenticate with the Cortex Agent API
  • How to register and execute both built-in and custom tools
  • How to manage multi-turn conversations with context
  • How to stream responses for real-time interaction

Whether you’re building a coding assistant, a data analyst bot, or a custom workflow automation tool, Nanocortex provides the blueprint. Fork it, extend it, or use it as reference — the patterns are all there.


Nanocortex is open source: github.com/sfc-gh-kkeller/nanocortex

Quick Start
#

# Set up connection in ~/.snowflake/connections.toml
[myconnection]
account = "myorg-myaccount"
user = "myuser"
authenticator = "PROGRAMMATIC_ACCESS_TOKEN"
token_file_path = "~/.snowflake/pat_token"
warehouse = "COMPUTE_WH"

# Run nanocortex
python nanocortex.py -c myconnection

# Or with specific context
python nanocortex.py -c myconnection -d MY_DATABASE -s MY_SCHEMA

# Disable reflection (self-correction) if needed
python nanocortex.py -c myconnection --no-reflect

# In-session commands:
# /reflect or /r    - Toggle self-correction on/off
# /db <name>        - Switch database
# /schema <name>    - Switch schema
# /model            - Change model
# /c                - Clear conversation
# /clear-context    - Clear conversation context (or /cc)
# /clear-history    - Clear prompt history (or /ch)
# Ctrl+C            - Interrupt and clarify
# Up/Down arrows    - Navigate prompt history

How It Actually Works: The Full Flow Explained
#

If you are new to AI agents and want to understand what is actually happening under the hood — how a user’s natural language question turns into a SQL query, gets executed, and comes back as a formatted answer — this section walks through the entire flow step by step.

The Big Picture
#

An AI agent is just a loop. The LLM (large language model) receives a message, decides what to do, optionally calls tools to take actions, receives the results, and then decides again — until it has enough information to answer. That is the entire concept. Everything else is plumbing.

┌─────────────────────────────────────────────────────────────┐
│                      The Agentic Loop                       │
│                                                             │
│   ┌──────────┐     ┌──────────────┐     ┌────────────────┐  │
│   │  User    │────▶│ Cortex Agent │────▶│ LLM            │  │
│   │  Message │     │ API          │     │ (e.g. Claude)  │  │
│   └──────────┘     └──────────────┘     └───────┬────────┘  │
│                                                 │           │
│                        ┌────────────────────────┘           │
│                        ▼                                    │
│                 ┌─────────────┐                             │
│                 │  LLM thinks │                             │
│                 │  "Do I need │                             │
│                 │  a tool?"   │                             │
│                 └──────┬──────┘                             │
│                        │                                    │
│                 ┌──────┴───────┐                            │
│                YES             NO                           │
│                 ▼              ▼                            │
│          ┌──────────────┐  ┌──────────────┐                 │
│          │ Call tool(s) │  │ Return text  │                 │
│          │ e.g. execute │  │ response to  │──── DONE        │
│          │ SQL query    │  │ user         │                 │
│          └──────┬───────┘  └──────────────┘                 │
│                 ▼                                           │
│         ┌───────────────┐                                   │
│         │ Tool results  │                                   │
│         │ sent back to  │──── loop back to "LLM thinks"     │
│         │ the LLM       │                                   │
│         └───────────────┘                                   │
└─────────────────────────────────────────────────────────────┘
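That diagram boils down to very little code. Here is a toy, self-contained version of the loop with a stubbed model and a fake SQL tool; all names are illustrative and no real API is involved:

```python
def run_agent(model, tools: dict, user_message: str) -> str:
    """Minimal agentic loop. `model(messages)` returns either
    {"text": ...} (done) or {"tool": name, "input": args} (act)."""
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = model(messages)
        if "text" in reply:          # no tool call: turn is complete
            return reply["text"]
        result = tools[reply["tool"]](reply["input"])  # execute tool
        messages.append({"role": "tool_result", "content": result})

# A fake model: first asks for SQL, then answers from the tool result.
def fake_model(messages):
    if messages[-1]["role"] == "tool_result":
        return {"text": f"You have {messages[-1]['content']} orders."}
    return {"tool": "sql", "input": "SELECT COUNT(*) FROM ORDERS"}

tools = {"sql": lambda query: "4823"}  # pretend query execution
```

Running `run_agent(fake_model, tools, "How many orders?")` walks the YES branch once, then the NO branch, exactly as in the diagram.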

Step 1: Authentication and Context Gathering
#

When Nanocortex starts, it authenticates to Snowflake (via PAT, JWT, WIF, or browser SSO) and immediately queries the account for context:

-- Nanocortex runs these automatically at startup
SELECT CURRENT_WAREHOUSE(), CURRENT_DATABASE(), CURRENT_SCHEMA(), CURRENT_ROLE();
SHOW DATABASES;
SELECT CURRENT_VERSION();

This gives the agent its bearings — it now knows which account, database, schema, and role it is connected to. This information is critical for what happens next.

Step 2: The System Prompt — How the LLM Knows About Snowflake
#

The LLM does not inherently know anything about your Snowflake account. It knows SQL syntax and Snowflake functions from its training data, but it has no idea what tables you have or what database you are in.

Nanocortex solves this by injecting a system prompt with the first message. This is invisible to the user but shapes all of the LLM’s behavior:

You are a coding assistant with access to these tools:
- bash: Execute shell commands
- read: Read file contents
- snowflake_sql_execute: Run SQL queries (ONE statement per call)

Snowflake Connection:
- Account: myorg-myaccount
- Database: PRODUCTION_DB
- Schema: SALES
- Role: ANALYST_ROLE

IMPORTANT: When a database/schema is set, query ONLY that context.

Current working directory: /home/user/project

This is the key insight: the LLM is not magic — it is guided by the context you give it. The system prompt tells it which tools are available, what Snowflake environment it is in, and how it should behave. A well-crafted system prompt is the difference between an agent that works and one that hallucinates.

Step 3: User Asks a Question — Text to SQL
#

Here is where it gets interesting. Say the user types:

“Show me the top 10 customers by revenue this quarter”

The LLM receives this along with the system prompt context. It does not have a text-to-SQL module — there is no special translation layer. Instead, the LLM reasons:

  1. “The user wants customer data sorted by revenue. I am in the SALES schema of PRODUCTION_DB.”
  2. “I do not know the exact table names yet. Let me find out.”
  3. “I will call snowflake_sql_execute with SHOW TABLES IN SCHEMA SALES.”

The LLM generates a tool call — not the final query, but a discovery query first. This is what separates a good agent from a naive one: it explores before it acts.

Step 4: The Cortex Agent API — Streaming and Tool Calls
#

Nanocortex sends the conversation to the Cortex Agent API as a streaming request:

POST /api/v2/cortex/agent:run
Content-Type: application/json
Accept: text/event-stream

{
  "model": "claude-sonnet-4-6",
  "stream": true,
  "messages": [ ... ],
  "tools": [
    {"tool_spec": {"type": "snowflake_sql_execute", "name": "snowflake_sql_execute"}},
    {"tool_spec": {"type": "bash", "name": "bash"}},
    {"tool_spec": {"type": "read", "name": "read"}},
    ...
  ]
}

The API responds with Server-Sent Events (SSE) — a stream of small chunks that arrive in real time:

event: response.text.delta
data: {"text": "Let me look at the tables in your SALES schema.\n"}

event: response.tool_use
data: {
  "name": "snowflake_sql_execute",
  "input": {"query": "SHOW TABLES IN SCHEMA SALES"},
  "tool_use_id": "tool_abc123",
  "client_side_execute": true
}

Notice client_side_execute: true — this means Nanocortex runs the SQL locally, not the Cortex API. The LLM decided what to run, but your client executes it against Snowflake using your authenticated session. This is where the security advantage of custom agents comes in: the SQL runs with your credentials, in your network, under your control.
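Consuming such a stream needs only a small parser. A minimal sketch that pairs `event:` lines with their JSON `data:` payloads (a simplification of full SSE handling, which also allows comments, `id:` fields, and retry hints):

```python
import json

def parse_sse(lines):
    """Yield (event, payload) pairs from an iterable of SSE lines.
    A blank line terminates each event; multiple `data:` lines are
    joined with newlines, per the SSE format."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[6:].strip()
        elif line.startswith("data:"):
            data.append(line[5:].strip())
        elif line == "" and data:
            yield event, json.loads("\n".join(data))
            event, data = None, []
```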

Step 5: Tool Execution and Result Return
#

Nanocortex executes the SQL via the Snowflake connector (or REST API as fallback) and sends the result back to the LLM:

{
  "role": "user",
  "content": [{
    "type": "tool_result",
    "tool_use_id": "tool_abc123",
    "content": [{"type": "text", "text": "name         | kind\nCUSTOMERS    | TABLE\nORDERS       | TABLE\nREVENUE_V    | VIEW"}]
  }]
}
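A helper that produces this message shape might look like the following sketch (illustrative, matching the structure shown above rather than Nanocortex's exact code):

```python
def tool_result_message(tool_use_id: str, output: str) -> dict:
    """Wrap a tool's output in the shape the API expects:
    a `user` turn containing a `tool_result` block."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": [{"type": "text", "text": output}],
        }],
    }
```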

The LLM now sees the table listing. It reasons again:

  1. “There is a CUSTOMERS table and an ORDERS table. Let me check the columns.”
  2. Calls snowflake_sql_execute with DESCRIBE TABLE CUSTOMERS
  3. Gets column info back: CUSTOMER_ID, NAME, REGION, ...
  4. Calls snowflake_sql_execute with DESCRIBE TABLE ORDERS
  5. Gets: ORDER_ID, CUSTOMER_ID, AMOUNT, ORDER_DATE, ...

Now the LLM has enough context to write the actual query:

SELECT c.NAME, SUM(o.AMOUNT) AS total_revenue
FROM CUSTOMERS c
JOIN ORDERS o ON c.CUSTOMER_ID = o.CUSTOMER_ID
WHERE o.ORDER_DATE >= DATE_TRUNC('quarter', CURRENT_DATE())
GROUP BY c.NAME
ORDER BY total_revenue DESC
LIMIT 10;

It calls snowflake_sql_execute one more time, gets the results, and formats them into a readable response for the user.

The entire text-to-SQL flow was 4-5 tool calls — discovery, schema inspection, and finally the actual query. The LLM figured out the schema on its own, joined the tables correctly, and applied the time filter. No hardcoded mappings, no pre-built SQL templates.

Step 6: The Loop Continues Until Done
#

The agentic loop keeps running as long as the LLM requests tool calls. Once it has the final answer and decides no more tools are needed, it emits only text — and the loop ends:

event: response.text.delta
data: {"text": "Here are your top 10 customers by revenue this quarter:\n\n"}

event: response.text.delta
data: {"text": "| Customer | Revenue |\n|---|---|\n| Acme Corp | $1.2M | ..."}

No tool calls → loop breaks → turn complete.

Step 7: Self-Correction (Reflection)
#

This is where Nanocortex goes beyond a basic agent. After the response is complete, if reflection is enabled, the agent automatically sends itself a review prompt:

Review your previous response. Did you fully complete the user's request?
Check for:
- Errors in tool outputs (SQL errors, file not found, etc.)
- Incomplete results or missing information
- Incorrect assumptions about the data or context

If everything looks correct, respond with just: LGTM
If there are issues to fix, explain briefly and take corrective action.

The LLM reviews its own work. If the SQL had an error — say it used REVENUE instead of AMOUNT — the reflection catches it:

  1. LLM sees the SQL error in the previous tool result
  2. Identifies the wrong column name
  3. Calls snowflake_sql_execute again with the corrected query
  4. Returns the fixed results

If everything was fine, the LLM simply responds “LGTM” and the conversation moves on. This self-correction loop is configurable (--no-reflect to disable, /reflect to toggle mid-session) and has a maximum iteration limit to prevent infinite loops.

Step 8: Context Updates Persist
#

When the user switches context mid-conversation — /db ANALYTICS_DB or /schema MARKETING — Nanocortex updates its internal state and injects a context message into the next API call:

[Context: database=ANALYTICS_DB]

The LLM now knows the environment has changed and adjusts its queries accordingly. This is how the agent maintains awareness of state across a multi-turn conversation.

Putting It All Together: A Complete Trace
#

Here is what happens end to end when a user asks “How many orders did we get last week?”:

1. USER: "How many orders did we get last week?"
2. NANOCORTEX: Builds request with system prompt + message history + tools
3. CORTEX API: Streams response via SSE
4. LLM (streaming): "Let me check your orders table."
5. LLM → tool call: snowflake_sql_execute("DESCRIBE TABLE ORDERS")
6. NANOCORTEX: Executes SQL locally → returns column info
7. LLM (streaming): "I can see the ORDERS table has an ORDER_DATE column."
8. LLM → tool call: snowflake_sql_execute(
       "SELECT COUNT(*) AS order_count
        FROM ORDERS
        WHERE ORDER_DATE >= DATEADD(week, -1, CURRENT_DATE())")
9. NANOCORTEX: Executes SQL locally → returns "order_count\n4,823"
10. LLM (streaming): "You received **4,823 orders** in the last week."
11. No more tool calls → loop ends
12. REFLECTION: "Review your response..."
13. LLM: "LGTM" → reflection ends
14. Done. User sees the answer.

The user typed one sentence. Under the hood, Nanocortex made 2 API round-trips to the Cortex Agent API (main turn + reflection), executed 2 SQL queries against Snowflake, and streamed the response in real time. The entire flow — from authentication to context injection to tool calling to self-correction — is handled by ~1,600 lines of Python.


Have questions or want to contribute? Open an issue or PR on GitHub.

Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.