
Splunk + Snowflake: Building a Hybrid Security Data Lake with Federated Queries


You can find all the code and demo scripts for this article on GitHub.


Better Together: Why Splunk + Snowflake

Splunk is the gold standard for real-time security monitoring. Its detection engine, Enterprise Security’s 1,400+ pre-built correlation searches, threat intelligence framework, and sub-second alerting are unmatched. Security teams rely on Splunk because it works — and it keeps getting better.

But as organizations grow, so does log volume. And not every log source needs sub-second alerting. Web access logs, DNS queries, cloud audit trails, netflow data — these are invaluable for investigations and compliance, but they don’t all need to live in the real-time detection tier.

This is where the Splunk + Snowflake partnership shines. Snowflake provides a complementary tier for cost-effective, long-term, queryable storage. Splunk and Snowflake recognized this together — it’s a better together story, not a replacement story.

The architecture is straightforward: send everything to Snowflake for deep storage and analytics. Send the critical, time-sensitive subset to Splunk for real-time detection. When Splunk analysts need historical context, they query Snowflake directly via federated search — without ever leaving the Splunk interface.

You keep Splunk’s world-class detection capabilities. You keep all your logs for years. And you cut your overall security spend by 70-80%.


Architecture

                         ┌─────────────────────────────┐
                         │         Log Sources         │
                         │  (EDR, Firewall, DNS, WAF,  │
                         │   Cloud Audit, Web Access,  │
                         │   Netflow, Auth Logs)       │
                         └──────────────┬──────────────┘
                                        │
                         ┌──────────────▼──────────────┐
                         │        Routing Layer        │
                         │  (Vector / Kafka / Cribl /  │
                         │         Fluent Bit)         │
                         └──────┬──────────────┬───────┘
                                │              │
                     Critical   │              │  ALL logs
                    (20-30%)    │              │  (100%)
                                ▼              ▼
                 ┌──────────────────┐  ┌──────────────────┐
                 │      Splunk      │  │    Snowflake     │
                 │                  │  │                  │
                 │  Real-time       │  │  Historical      │
                 │  detection &     │  │  investigation,  │
                 │  alerting        │  │  compliance,     │
                 │                  │  │  threat hunting  │
                 │  ~$150-200/GB/d  │  │  ~$23-40/TB/mo   │
                 └────────┬─────────┘  └───────▲──────────┘
                          │                    │
                          │     DB Connect     │
                          │     (dbxquery)     │
                          └────────────────────┘
                          Federated Search:
                          SQL runs on Snowflake compute,
                          only results return to Splunk

The routing layer is the key component. Tools like Vector, Cribl, Kafka, or Fluent Bit sit between your log sources and your destinations. They evaluate each event and route it to one or both destinations based on rules you define.

All logs flow to Snowflake — no exceptions. Only the critical subset flows to Splunk. This means Snowflake becomes your system of record, and Splunk becomes your real-time detection engine.
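The routing rule is simple enough to express as a function. This is a sketch of the logic only — a real deployment would express it in Vector, Cribl, or Kafka configuration, and the field names, source list, and severity values here are hypothetical:

```python
# Hypothetical routing rule mirroring the split described above:
# every event goes to Snowflake; only critical sources or severities
# are duplicated to Splunk. Field names are illustrative.

CRITICAL_SOURCES = {"edr", "auth", "firewall", "dlp"}
CRITICAL_SEVERITIES = {"high", "critical"}

def route(event: dict) -> set[str]:
    """Return the set of destinations for one log event."""
    destinations = {"snowflake"}  # Snowflake always gets everything
    if (event.get("source") in CRITICAL_SOURCES
            or event.get("severity") in CRITICAL_SEVERITIES):
        destinations.add("splunk")
    return destinations

if __name__ == "__main__":
    print(route({"source": "edr", "severity": "high"}))
    print(route({"source": "web_access", "severity": "info"}))
```

The important property is that "splunk" is only ever *added* to the destination set — no rule can ever remove "snowflake", which is how the system-of-record guarantee is enforced.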


Cost Comparison

Here is what the numbers look like for a typical 100 GB/day ingestion scenario:

| | Splunk-Only | Hybrid (Splunk + Snowflake) |
|---|---|---|
| Daily ingestion | 100 GB to Splunk | 20-30 GB to Splunk, 100 GB to Snowflake |
| Splunk license cost | ~$15,000-20,000/mo | ~$3,000-6,000/mo |
| Snowflake storage | | ~$70-120/mo (~3 TB/mo) |
| Snowflake compute | | ~$200-500/mo (ad-hoc queries) |
| Total monthly cost | ~$15,000-20,000 | ~$3,500-6,500 |
| Retention | 30-90 days (typical) | 30 days Splunk + years in Snowflake |
| Cost reduction | | 70-80% |

The savings come from routing: Snowflake storage costs ~$23-40/TB/month with no ingestion fee, while Splunk’s value is in real-time detection where it commands a premium. By routing only the 20-30% of data that needs sub-second alerting to Splunk, you optimize spend for both platforms.
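The 70-80% figure can be sanity-checked with a few lines, using the midpoints of the figures above. The flat Snowflake cost of ~$600/mo is an assumption taken from the middle of the storage + compute ranges in the table:

```python
def hybrid_monthly_cost(splunk_only_cost: float, splunk_share: float,
                        snowflake_flat: float = 600.0) -> float:
    """Rough hybrid cost model: Splunk license scales ~linearly with
    ingested volume; Snowflake storage + compute is treated as a
    comparatively small flat cost (assumed midpoint, not a quote)."""
    return splunk_only_cost * splunk_share + snowflake_flat

splunk_only = 17_500  # midpoint of ~$15,000-20,000/mo
hybrid = hybrid_monthly_cost(splunk_only, splunk_share=0.25)
savings = 1 - hybrid / splunk_only
print(f"hybrid ~${hybrid:,.0f}/mo, savings ~{savings:.0%}")
```

With a 25% Splunk share this lands around $5,000/mo — inside the ~$3,500-6,500 band from the table, and roughly a 72% reduction.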

Detailed Snowflake Cost Breakdown

| Component | Monthly Cost | Notes |
|---|---|---|
| Storage (100 GB/day × 30 days = 3 TB) | ~$70-120 | At $23-40/TB/month |
| Compute: Dynamic Tables pipeline (Small warehouse, ~8 hrs effective/day) | ~$250-500 | Auto-suspends between refreshes |
| Compute: Splunk DB Connect queries (XS warehouse, ~2 hrs/day) | ~$60-120 | On-demand for federated queries |
| Compute: Threat hunting (Medium warehouse, ~10 hrs/month) | ~$40-80 | Ad-hoc analyst queries |
| Snowpipe ingestion | ~$50-100 | Serverless, pay per file |
| Total Snowflake | ~$500-900/mo | For 100 GB/day with active detection |

Multi-Year Retention — The Compliance Multiplier

Compliance frameworks (GDPR, DORA, PCI DSS, HIPAA) often require 1-7 years of log retention. This is where the hybrid model delivers the biggest financial impact:

| Retention | Splunk-Only (100 GB/day) | Snowflake Storage |
|---|---|---|
| 90 days (3 TB) | Included in license | ~$70-120/mo |
| 1 year (36 TB) | Significantly increases license | ~$830-1,440/mo |
| 2 years (73 TB) | Often cost-prohibitive | ~$1,680-2,920/mo |
| 5 years (182 TB) | Not practical for most orgs | ~$4,200-7,300/mo |

With Snowflake, keeping 5 years of security logs is economically feasible — and those logs are queryable at any time, either directly or via Splunk’s federated search.
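The rows in the retention table are simple multiplication — monthly storage cost is volume times the per-TB rate:

```python
def snowflake_storage_cost(gb_per_day: float, retention_days: int,
                           price_per_tb_month: float = 23.0) -> float:
    """Monthly storage bill once the retention window is full,
    at the low ($23/TB/month) end of the quoted range."""
    total_tb = gb_per_day * retention_days / 1000
    return total_tb * price_per_tb_month

# 5 years at 100 GB/day: ~182 TB retained
print(f"${snowflake_storage_cost(100, 5 * 365):,.0f}/mo")
```

At the $23/TB end this reproduces the ~$4,200/mo figure for 5 years; plugging in $40/TB gives the upper end of each range.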

Snowflake Storage Lifecycle Policies — Automatic Tiering

Since November 2025, Snowflake offers Storage Lifecycle Policies (GA) that automatically move aging data to lower-cost storage tiers. This further reduces the cost of long-term security log retention:

| Tier | Access Pattern | Cost | Availability |
|---|---|---|---|
| Standard | Frequently accessed (real-time queries, dashboards) | ~$23-40/TB/month | All clouds |
| COOL | Infrequently accessed (quarterly investigations, audits) | Significantly lower than standard | AWS + Azure |
| COLD | Rarely accessed (annual compliance, legal hold) | Lowest cost tier | AWS only |

Storage lifecycle policies are schema-level objects that archive or expire rows based on conditions you define — for example, move security logs older than 90 days to COOL, and logs older than 1 year to COLD:

-- Example: Tiered retention for security logs
CREATE STORAGE LIFECYCLE POLICY security_tiering
    ON TABLE security_data.access_logs
    ARCHIVE_TIER = 'COOL'
    ARCHIVE_AFTER_DAYS = 90
    EXPIRE_AFTER_DAYS = 730;  -- 2 years total retention

This means your first 90 days of security data are in standard (hot) storage for fast queries and detection pipelines. After 90 days, data automatically moves to COOL storage at reduced cost — still queryable but optimized for less frequent access. The data expires after 2 years (or whatever your compliance framework requires).
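To estimate what tiering saves, a blended-cost sketch helps. Note that the COOL price below is a placeholder assumption — the tier table above only says it is "significantly lower than standard":

```python
def tiered_storage_cost(gb_per_day: float, hot_days: int, total_days: int,
                        hot_price: float = 30.0,
                        cool_price: float = 10.0) -> float:
    """Blended monthly storage cost once the retention window is full.
    cool_price is a hypothetical figure, NOT a published Snowflake rate."""
    hot_tb = gb_per_day * hot_days / 1000
    cool_tb = gb_per_day * (total_days - hot_days) / 1000
    return hot_tb * hot_price + cool_tb * cool_price

# 2-year retention at 100 GB/day: 90 days hot, remainder in COOL
print(round(tiered_storage_cost(100, 90, 730)))
```

Under these assumptions, 2 years of retention costs roughly $910/mo blended versus ~$2,190/mo if everything stayed in standard storage at the same $30/TB rate — the bulk of a multi-year archive spends most of its life in the cheap tier.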

For security data lakes with multi-year retention, combining storage lifecycle policies with Apache Iceberg tables provides even more flexibility — Iceberg tables store data in your own cloud storage (S3, ADLS, GCS) while remaining fully queryable through Snowflake.



What Goes Where: Data Routing Strategy

The routing decision comes down to one question: does this event need sub-minute detection and alerting?

Splunk (Critical — 20-30% of volume)

| Log Type | Why It Needs Splunk |
|---|---|
| EDR / endpoint alerts | Real-time threat detection, kill chain correlation |
| Authentication failures | Brute force detection, account lockout alerting |
| Firewall denies / blocks | Active attack detection, immediate response |
| High-severity SIEM alerts | Triage and escalation workflows |
| Privilege escalation events | Immediate investigation required |
| DLP alerts | Data exfiltration response |

Snowflake (Everything — 100% of volume)

| Log Type | Why It Goes to Snowflake |
|---|---|
| Web access logs | High volume, needed for historical investigation |
| DNS query logs | Threat hunting, C2 detection over time |
| Netflow / traffic metadata | Forensics, lateral movement analysis |
| Cloud audit trails (AWS CloudTrail, Azure Activity, GCP Audit) | Compliance, long-term investigation |
| Email gateway logs | Phishing campaign correlation |
| Proxy / URL filtering logs | Historical threat hunting |
| All of the above critical logs too | System of record, backup, long-term retention |

The critical point: Snowflake gets everything. It is not an either/or. The routing layer duplicates critical events to both destinations. Snowflake is always the complete record.


Feature Comparison

Each platform brings distinct strengths. The key is using each for what it does best.

| Capability | Splunk | Snowflake | Notes |
|---|---|---|---|
| Real-time alerting (< 1 min) | Excellent | Improving | Splunk excels here; Snowflake Dynamic Tables target ~1 min lag, triggered tasks can go sub-minute |
| SPL query language | Native | N/A | SPL is purpose-built for security; SQL requires different patterns but is more widely known |
| Splunk ES (SIEM framework) | Full support | Complementary | Notable events, risk-based alerting, adaptive response — Splunk-native. Snowflake enriches, not replaces |
| Threat intelligence | Built-in (ES), 9 intel collections | Marketplace (growing) | Splunk ES threat intel framework is mature; Snowflake Marketplace adds Criminal IP, Cybersixgill, and others |
| Historical investigation | Good (within retention window) | Excellent | Snowflake’s strength — cost-effective queries across years of data |
| Ad-hoc SQL analytics | SPL-based | Full SQL | Snowflake’s SQL engine excels at complex analytical queries and joins |
| Multi-year retention | Not its primary use case | Purpose-built | ~$23-40/TB/month in Snowflake |
| Data sharing / marketplace | Splunkbase ecosystem | Snowflake Marketplace | Both have growing ecosystems; different strengths |
| ML / anomaly detection | MLTK, AI Toolkit, DSDL | Snowpark ML, Cortex AI | Splunk expanded ML with AI Toolkit and Deep Learning (DSDL); Snowflake offers Cortex LLMs and Snowpark ML |
| Compliance reporting | Good for event-based | Excellent for structured | SQL + BI tools complement Splunk’s event-based compliance monitoring |
| Dashboards | Purpose-built SOC dashboards | Streamlit, BI tools | Splunk is the standard for real-time SOC operations dashboards |
| Native cross-platform search | Federated Search for Snowflake (2025) | N/A | Splunk announced native Federated Search for Snowflake — query Snowflake data directly from SPL without DB Connect |

The takeaway: Splunk Enterprise Security’s notable events, risk-based alerting, and adaptive response actions are core to any SOC that runs Splunk. The hybrid approach does not replace Splunk ES — it reduces the volume of data that needs to flow through the real-time tier, while making all historical data accessible to Splunk analysts via federated search.

New in 2025: Splunk announced Federated Search for Snowflake — a native integration that lets analysts query Snowflake data directly from the Splunk UI using SPL-like syntax, without DB Connect. This deepens the partnership and makes the hybrid architecture even more seamless.


Setting Up Splunk DB Connect with Snowflake

Before running federated queries, you need to connect Splunk Cloud to Snowflake via DB Connect. Here’s the step-by-step setup:

Step 1: Install DB Connect and JDBC Driver

Install Splunk DB Connect (v3.x or 4.x) from Splunkbase. Then install the Snowflake JDBC driver:

| | Splunk Cloud | Splunk On-Premise |
|---|---|---|
| DB Connect install | Install via Splunk Web → Apps | Install via Splunk Web or CLI |
| JDBC driver | Pre-packaged as a Splunk add-on. During DB Connect connection setup, install the JDBC Driver for Snowflake add-on directly from within DB Connect — no manual download needed. Alternatively, install it from Splunkbase as an app. | Download the Snowflake JDBC JAR (v3.14.x+) from Maven and place it in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/drivers/, then restart Splunk. |
| JRE | Pre-configured (Java 17) | Must install Java 17 or 21 and set the JRE path |
| Network access | Splunk Cloud can reach Snowflake over the internet by default | May require firewall rules to allow outbound HTTPS to *.snowflakecomputing.com:443 |
| Authentication | Username/password or OAuth (v1.2.3+ supports OAuth Authorization Code) | Username/password, key-pair, or OAuth |

Step 2: Configure General Settings

DB Connect requires Java 17 or 21. The JVM settings differ between Splunk Cloud and on-premise deployments.

Splunk Cloud (validated working configuration):

| Setting | Value |
|---|---|
| JRE Path | /usr/lib/jvm/java-17-openjdk-amd64 |
| Task Server Port | 9998 |
| Task Server JVM Options | -Ddw.server.applicationConnectors[0].port=9998 --add-opens java.base/java.nio=ALL-UNNAMED |
| Query Server JVM Options | -Dport=9999 --add-opens java.base/java.nio=ALL-UNNAMED |

Splunk On-Premise — settings will vary depending on your environment:

| Setting | Notes |
|---|---|
| JRE Path | Path to your Java 17 or 21 installation (e.g., /opt/java/jdk-17/ on Linux, C:\Program Files\Java\jdk-17\ on Windows) |
| Task Server Port | Default 9998 unless another service uses that port |
| Task/Query Server JVM Options | Same --add-opens flag is required. You may also need to adjust heap size: -Xms512m -Xmx2048m for large query workloads |

Critical for both: The --add-opens java.base/java.nio=ALL-UNNAMED flag is required for Apache Arrow library compatibility with Java 17+. Without it, you’ll get NoClassDefFoundError exceptions when querying Snowflake.

Step 3: Create an Identity

Go to Configuration and create an Identity with the credentials for your Snowflake service account:

  • Username: SVC_SPLUNK (or your service account name)
  • Password: The service account password

Step 4: Create a Connection

Create a new connection with these parameters:

| Parameter | Value | Notes |
|---|---|---|
| Connection Name | snowflake_security | Your choice |
| Identity | Select the identity from Step 3 | |
| Connection Type | Snowflake | |
| Host | <account_identifier> | Account ID only — do NOT include .snowflakecomputing.com |
| Default Database | SECURITY_DATA | Your security database |
| Default Schema | PUBLIC | Or your schema |

Step 5: Set Snowflake User Defaults

DB Connect doesn’t support passing warehouse, role, or namespace via JDBC URL parameters. Instead, set defaults on the Snowflake user:

-- Run this in Snowflake
ALTER USER SVC_SPLUNK SET
    DEFAULT_WAREHOUSE = 'SPLUNK_WH'
    DEFAULT_NAMESPACE = 'SECURITY_DATA.PUBLIC'
    DEFAULT_ROLE = 'SPLUNK_READER';

-- Grant minimum required privileges
CREATE ROLE IF NOT EXISTS SPLUNK_READER;
GRANT USAGE ON WAREHOUSE SPLUNK_WH TO ROLE SPLUNK_READER;
GRANT USAGE ON DATABASE SECURITY_DATA TO ROLE SPLUNK_READER;
GRANT USAGE ON SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT ROLE SPLUNK_READER TO USER SVC_SPLUNK;

Step 6: Test the Connection

In Splunk, run a test query:

| dbxquery connection="snowflake_security" query="SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE()"

If you see your user, role, and warehouse returned, the connection is working.

Troubleshooting

| Issue | Solution |
|---|---|
| NoClassDefFoundError | Add --add-opens java.base/java.nio=ALL-UNNAMED to JVM options |
| Queries time out | Increase heap: add -Xms512m -Xmx2048m to JVM options |
| Wrong warehouse/role | Set defaults on the Snowflake user (Step 5) |
| Connection refused | Verify the account identifier format — no .snowflakecomputing.com suffix |

Dynamic Tables: Near-Real-Time Detection in Snowflake

While Splunk handles sub-second alerting, Snowflake’s Dynamic Tables provide a powerful near-real-time detection layer for the data that lives in the Snowflake tier. Dynamic Tables automatically refresh based on a target lag — typically 1-5 minutes for security workloads — and chain together to form multi-stage detection pipelines.

Why Dynamic Tables Matter for Security

Traditional task-based pipelines require you to write scheduling logic, manage dependencies, and handle failures. Dynamic Tables handle all of this declaratively — you define the SQL transformation and the target lag, and Snowflake manages the rest.

Raw Logs (Snowpipe)
    ▼  Target lag: 1 minute
┌─────────────────────────┐
│  Stage 1: Normalized     │  Parse, standardize timestamps, extract fields
│  (Dynamic Table)         │
└─────────────────────────┘
    ▼  Target lag: 2 minutes
┌─────────────────────────┐
│  Stage 2: Enriched       │  Add geo-IP, threat intel, user context
│  (Dynamic Table)         │
└─────────────────────────┘
    ▼  Target lag: 5 minutes
┌─────────────────────────┐
│  Stage 3: Alerts         │  Threshold detection, anomaly flags
│  (Dynamic Table)         │
└─────────────────────────┘
  Webhook notification → Splunk / SIEM / PagerDuty

Example: Brute Force Detection Pipeline

-- Stage 1: Normalize login events (target lag: 1 minute)
CREATE OR REPLACE DYNAMIC TABLE security_data.login_events_normalized
    TARGET_LAG = '1 minute'
    WAREHOUSE = SECURITY_WH
AS
SELECT
    event_timestamp,
    user_name,
    client_ip,
    is_success,
    first_authentication_factor,
    reported_client_type
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
WHERE event_timestamp >= DATEADD('day', -1, CURRENT_TIMESTAMP());

-- Stage 2: Aggregate for brute force detection (target lag: 2 minutes)
CREATE OR REPLACE DYNAMIC TABLE security_data.brute_force_candidates
    TARGET_LAG = '2 minutes'
    WAREHOUSE = SECURITY_WH
AS
SELECT
    client_ip,
    user_name,
    COUNT(*) AS failed_attempts,
    MIN(event_timestamp) AS first_attempt,
    MAX(event_timestamp) AS last_attempt,
    DATEDIFF('minute', MIN(event_timestamp), MAX(event_timestamp)) AS window_minutes
FROM security_data.login_events_normalized
WHERE is_success = 'NO'
  AND event_timestamp >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
GROUP BY client_ip, user_name
HAVING failed_attempts >= 5;

Warehouse Sizing for Security Workloads

| Warehouse Size | Use Case | Approximate Cost/Hour |
|---|---|---|
| X-Small | Light queries, Splunk DB Connect reads | ~$1-2/hour |
| Small | Dynamic Table refreshes for moderate log volume | ~$2-4/hour |
| Medium | High-volume detection pipelines, threat hunting | ~$4-8/hour |

Cost optimization tip: Use a dedicated SECURITY_WH warehouse with AUTO_SUSPEND = 60 (1 minute). Dynamic Table refreshes will auto-resume the warehouse when needed and it suspends automatically between refresh cycles. For periodic threat hunting from Splunk via DB Connect, use a separate SPLUNK_WH so costs are tracked independently.

CREATE WAREHOUSE SECURITY_WH
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    COMMENT = 'Security detection pipeline - Dynamic Tables';

CREATE WAREHOUSE SPLUNK_WH
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE
    COMMENT = 'Splunk DB Connect federated queries';

Federated Queries from Splunk via DB Connect

With the connection set up, Splunk analysts can query Snowflake directly from the Splunk UI using the dbxquery command.

The key benefit is aggregation pushdown. The SQL query runs on Snowflake’s compute, and only the result set comes back to Splunk. You are not pulling raw logs into Splunk — you are pulling aggregated answers.
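A local analogue makes the difference concrete. This hypothetical rollup mirrors the kind of CASE/GROUP BY aggregation used in the examples below — the SIEM receives one row per (hour, category) rather than every raw event:

```python
from collections import Counter

def pushdown_rollup(events):
    """What the warehouse sends back over the wire: a small
    (hour, category) -> count map instead of the raw event rows.
    Field names are illustrative."""
    counts = Counter()
    for e in events:
        hour = e["timestamp"][:13]  # e.g. '2025-11-02T14'
        if 400 <= e["status_code"] < 500:
            category = "client_error"
        elif e["status_code"] >= 500:
            category = "server_error"
        else:
            category = "normal"
        counts[(hour, category)] += 1
    return counts

raw = [{"timestamp": "2025-11-02T14:05:00", "status_code": 404},
       {"timestamp": "2025-11-02T14:07:00", "status_code": 200},
       {"timestamp": "2025-11-02T14:09:00", "status_code": 404}]
print(pushdown_rollup(raw))
```

Three raw rows collapse to two aggregate rows here; at production scale, millions of rows collapse to a few hundred — which is why federated dashboards stay responsive.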

Example 1: Basic Historical Fetch

Pull the last 24 hours of web access logs from Snowflake for a specific source IP under investigation:

| dbxquery
    connection="snowflake_security"
    query="
        SELECT timestamp, src_ip, dst_ip, method, uri, status_code, user_agent
        FROM security_lake.web_access_logs
        WHERE src_ip = '10.0.45.12'
          AND timestamp > DATEADD('hour', -24, CURRENT_TIMESTAMP())
        ORDER BY timestamp DESC
        LIMIT 10000
    "
| table timestamp, src_ip, dst_ip, method, uri, status_code, user_agent

Example 2: Security Overview Dashboard (Aggregation Pushdown)

Build a SOC dashboard panel that shows threat categories over time — the heavy aggregation runs on Snowflake, and Splunk just renders the results:

| dbxquery
    connection="snowflake_security"
    query="
        SELECT
            DATE_TRUNC('hour', timestamp) AS hour,
            CASE
                WHEN status_code >= 400 AND status_code < 500 THEN 'client_error'
                WHEN status_code >= 500 THEN 'server_error'
                WHEN uri ILIKE '%/admin%' THEN 'admin_access'
                ELSE 'normal'
            END AS category,
            COUNT(*) AS event_count
        FROM security_lake.web_access_logs
        WHERE timestamp > DATEADD('day', -7, CURRENT_TIMESTAMP())
        GROUP BY 1, 2
        ORDER BY 1 DESC
    "
| timechart span=1h sum(event_count) by category

Example 3: Threat Classification (SQL Injection, XSS Detection)

Run a threat hunting query across months of historical logs — something that would be impractical in Splunk at this retention depth:

| dbxquery
    connection="snowflake_security"
    query="
        SELECT
            src_ip,
            CASE
                WHEN uri ILIKE '%UNION%SELECT%' OR uri ILIKE '%1=1%' OR uri ILIKE '%DROP%TABLE%'
                    THEN 'sql_injection'
                WHEN uri ILIKE '%<script%' OR uri ILIKE '%javascript:%' OR uri ILIKE '%onerror=%'
                    THEN 'xss'
                WHEN uri ILIKE '%../%' OR uri ILIKE '%/etc/passwd%'
                    THEN 'path_traversal'
                WHEN uri ILIKE '%/admin%' AND status_code = 200
                    THEN 'admin_access'
                ELSE 'other'
            END AS threat_type,
            COUNT(*) AS attempt_count,
            MIN(timestamp) AS first_seen,
            MAX(timestamp) AS last_seen,
            COUNT(DISTINCT uri) AS unique_uris
        FROM security_lake.web_access_logs
        WHERE timestamp > DATEADD('month', -6, CURRENT_TIMESTAMP())
          AND (
              uri ILIKE '%UNION%SELECT%'
              OR uri ILIKE '%<script%'
              OR uri ILIKE '%../%'
              OR uri ILIKE '%/etc/passwd%'
              OR (uri ILIKE '%/admin%' AND status_code = 200)
          )
        GROUP BY 1, 2
        HAVING attempt_count > 5
        ORDER BY attempt_count DESC
        LIMIT 100
    "
| table src_ip, threat_type, attempt_count, first_seen, last_seen, unique_uris
| sort -attempt_count

Example 4: Hybrid Correlation (Splunk + Snowflake Join)

Correlate real-time Splunk alerts with historical Snowflake data. For example, take today’s Splunk EDR alerts and look up whether those source IPs have a history of suspicious activity in Snowflake:

index=edr_alerts severity="critical" earliest=-1h
| stats count AS alert_count, values(alert_name) AS alerts BY src_ip
| map search="
    | dbxquery
        connection=\"snowflake_security\"
        query=\"
            SELECT
                '$src_ip$' AS src_ip,
                COUNT(*) AS historical_events,
                SUM(CASE WHEN uri ILIKE '%UNION%SELECT%' OR uri ILIKE '%<script%' THEN 1 ELSE 0 END) AS suspicious_requests,
                MIN(timestamp) AS first_seen_in_logs,
                MAX(timestamp) AS last_seen_in_logs
            FROM security_lake.web_access_logs
            WHERE src_ip = '$src_ip$'
              AND timestamp > DATEADD('month', -3, CURRENT_TIMESTAMP())
        \"
    "
| table src_ip, alert_count, alerts, historical_events, suspicious_requests, first_seen_in_logs, last_seen_in_logs

Example 5: Scheduled Compliance Report

Run a weekly compliance query that counts authentication events by outcome — useful for auditors who need evidence of monitoring:

| dbxquery
    connection="snowflake_security"
    query="
        SELECT
            DATE_TRUNC('day', timestamp) AS day,
            event_type,
            outcome,
            COUNT(*) AS event_count,
            COUNT(DISTINCT src_ip) AS unique_sources
        FROM security_lake.auth_events
        WHERE timestamp > DATEADD('day', -30, CURRENT_TIMESTAMP())
        GROUP BY 1, 2, 3
        ORDER BY 1 DESC, 4 DESC
    "
| table day, event_type, outcome, event_count, unique_sources

Demo Walkthrough

The GitHub repository includes a complete demo that generates realistic security log data, loads it into Snowflake, and provides the queries to federate from Splunk.

What the Demo Does

  1. Generates 100,000 realistic web access log events using Python — including normal traffic, automated scanners, and injected attack patterns
  2. Injects realistic attack signatures including SQL injection attempts, XSS payloads, path traversal, and privilege escalation patterns
  3. Uploads the data to Snowflake for querying
  4. Provides ready-to-use SPL queries for federated search from Splunk

Running the Demo

The project uses pixi for dependency management. To get started:

git clone https://github.com/sfc-gh-kkeller/splunk-snowflake-security-datalake.git
cd splunk-snowflake-security-datalake
pixi install

Generate the sample data:

pixi run generate

This creates a dataset with events spread across multiple days, including:

  • Normal web traffic — GET/POST requests to typical endpoints with realistic user agents and status codes
  • SQL injection attempts — `UNION SELECT`, `1=1`, `DROP TABLE`, `OR 1=1--` patterns injected into URI parameters
  • XSS payloads — `<script>`, `javascript:`, `onerror=` patterns in request URIs
  • Path traversal — `../../etc/passwd`, `..\\windows\\system32` attempts
  • Privilege escalation — repeated access attempts to /admin, /api/admin, /console endpoints with mixed success/failure status codes
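The generator’s approach can be sketched like this — the repo’s actual implementation may differ, and the payload lists, field names, and attack rate here are illustrative:

```python
import random

# Hypothetical attack signature pools, mirroring the categories above.
ATTACKS = {
    "sql_injection": ["/search?q=1' UNION SELECT password FROM users--",
                      "/login?user=admin' OR 1=1--"],
    "xss": ["/comment?text=<script>alert(1)</script>"],
    "path_traversal": ["/download?file=../../etc/passwd"],
}

def make_event(rng: random.Random, attack_rate: float = 0.05) -> dict:
    """One synthetic web access log event; a small fraction carry
    injected attack signatures, labeled for later validation."""
    if rng.random() < attack_rate:
        kind = rng.choice(sorted(ATTACKS))
        uri = rng.choice(ATTACKS[kind])
        status = rng.choice([200, 403, 404])
    else:
        kind = "normal"
        uri = rng.choice(["/", "/index.html", "/api/items"])
        status = 200
    return {"src_ip": f"10.0.{rng.randint(0, 255)}.{rng.randint(1, 254)}",
            "uri": uri, "status_code": status, "label": kind}

rng = random.Random(42)  # seeded for reproducible datasets
events = [make_event(rng) for _ in range(1000)]
print(sum(e["label"] != "normal" for e in events), "attack events of 1000")
```

Keeping a ground-truth `label` on each synthetic event is what lets you verify that the threat-classification SQL in Example 3 actually finds the injected attacks.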

Upload to Snowflake:

pixi run upload

This creates the security_lake.web_access_logs table in your Snowflake account and loads the generated events. You will need Snowflake credentials configured (the repo README covers the details).

Once the data is in Snowflake, configure Splunk DB Connect with your Snowflake connection and run the federated queries shown above.


When to Use This Pattern (and When Not To)

Good fit

  • High log volume (50+ GB/day) — the cost savings scale linearly with volume; below 50 GB/day the complexity may not be worth it
  • Compliance requirements for multi-year retention — regulators want 1-3 years of logs; Snowflake makes this economically feasible
  • Threat hunting on historical data — your analysts need to search across months of logs, not just the Splunk retention window
  • Budget pressure on SIEM costs — if your Splunk renewal is the single largest line item in your security budget, this is worth exploring
  • Teams already using Snowflake — if your org has Snowflake for data warehousing, adding security logs is a natural extension

Not a good fit

  • Small log volumes (< 10 GB/day) — the operational complexity of running two platforms is not justified at low volumes
  • Heavy reliance on Splunk ES notable events — if your detection logic depends on Splunk ES’s correlation searches and risk framework running against all log sources, splitting data will break those detections
  • Real-time-only use case — if you never do historical investigation and only care about real-time alerts, the Snowflake side adds cost without value
  • No engineering capacity for the routing layer — someone needs to build and maintain the routing rules; if your team is already stretched thin, this is not a weekend project

Trade-offs to understand

  • Federated query latency: dbxquery calls to Snowflake take seconds, not milliseconds. This is fine for investigation and dashboards but not for real-time correlation rules.
  • Two platforms to operate: You now have two systems to monitor, secure, and keep running. Make sure you have the operational capacity.
  • Routing logic complexity: Deciding what is “critical” requires ongoing tuning. Start conservative (send more to Splunk) and reduce over time as you gain confidence.
  • Splunk ES content dependencies: Some Splunk ES correlation searches reference specific log sources. Before routing a source to Snowflake-only, audit which ES searches depend on it and adjust your routing strategy accordingly.


Author: Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.