You can find all the code and demo scripts for this article on GitHub.
Better Together: Why Splunk + Snowflake#
Splunk is the gold standard for real-time security monitoring. Its detection engine, Enterprise Security’s 1,400+ pre-built correlation searches, threat intelligence framework, and sub-second alerting are unmatched. Security teams rely on Splunk because it works — and it keeps getting better.
But as organizations grow, so does log volume. And not every log source needs sub-second alerting. Web access logs, DNS queries, cloud audit trails, netflow data — these are invaluable for investigations and compliance, but they don’t all need to live in the real-time detection tier.
This is where the Splunk + Snowflake partnership shines. Snowflake provides a complementary tier for cost-effective, long-term, queryable storage. Splunk and Snowflake recognized this together — it’s a better together story, not a replacement story.
The architecture is straightforward: send everything to Snowflake for deep storage and analytics. Send the critical, time-sensitive subset to Splunk for real-time detection. When Splunk analysts need historical context, they query Snowflake directly via federated search — without ever leaving the Splunk interface.
You keep Splunk’s world-class detection capabilities. You keep all your logs for years. And you cut your overall security spend by 70-80%.
Architecture#
┌─────────────────────────────┐
│ Log Sources │
│ (EDR, Firewall, DNS, WAF, │
│ Cloud Audit, Web Access, │
│ Netflow, Auth Logs) │
└──────────────┬──────────────┘
│
▼
┌─────────────────────────────┐
│ Routing Layer │
│ (Vector / Kafka / Cribl / │
│ Fluent Bit) │
└──────┬──────────────┬───────┘
│ │
Critical │ │ ALL logs
(20-30%) │ │ (100%)
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Splunk │ │ Snowflake │
│ │ │ │
│ Real-time │ │ Historical │
│ detection & │ │ investigation, │
│ alerting │ │ compliance, │
│ │ │ threat hunting │
│ ~$150-200/GB/d │ │ ~$23-40/TB/mo │
└────────┬─────────┘ └───────▲──────────┘
│ │
│ DB Connect │
│ (dbxquery) │
└────────────────────┘
Federated Search:
SQL runs on Snowflake compute,
only results return to Splunk

The routing layer is the key component. Tools like Vector, Cribl, Kafka, or Fluent Bit sit between your log sources and your destinations. They evaluate each event and route it to one or both destinations based on rules you define.
All logs flow to Snowflake — no exceptions. Only the critical subset flows to Splunk. This means Snowflake becomes your system of record, and Splunk becomes your real-time detection engine.
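The routing decision can be sketched as a simple predicate. The field names and rules below are hypothetical examples (real deployments express this as Vector/Cribl/Kafka configuration, not application code):

```python
# Illustrative routing rule: every event goes to Snowflake; only
# events matching "critical" criteria are duplicated to Splunk.
# Field names (source, severity) are made-up examples.

CRITICAL_SOURCES = {"edr", "auth", "firewall", "dlp"}

def route(event: dict) -> set[str]:
    """Return the set of destinations for one event."""
    destinations = {"snowflake"}  # Snowflake always gets 100% of logs
    if event.get("source") in CRITICAL_SOURCES and event.get("severity") in ("high", "critical"):
        destinations.add("splunk")  # only the critical subset also goes to Splunk
    return destinations

print(sorted(route({"source": "edr", "severity": "critical"})))   # both destinations
print(sorted(route({"source": "web_access", "severity": "info"})))  # Snowflake only
```

The key property: `"snowflake"` is in every result, which is what makes Snowflake the system of record.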
Cost Comparison#
Here is what the numbers look like for a typical 100 GB/day ingestion scenario:
| | Splunk-Only | Hybrid (Splunk + Snowflake) |
|---|---|---|
| Daily ingestion | 100 GB to Splunk | 20-30 GB to Splunk, 100 GB to Snowflake |
| Splunk license cost | ~$15,000-20,000/mo | ~$3,000-6,000/mo |
| Snowflake storage | — | ~$70-120/mo (~3 TB/mo) |
| Snowflake compute | — | ~$200-500/mo (ad-hoc queries) |
| Total monthly cost | ~$15,000-20,000 | ~$3,500-6,500 |
| Retention | 30-90 days (typical) | 30 days Splunk + years in Snowflake |
| Cost reduction | — | 70-80% |
The savings come from routing: Snowflake storage costs ~$23-40/TB/month with no ingestion fee, while Splunk’s value is in real-time detection where it commands a premium. By routing only the 20-30% of data that needs sub-second alerting to Splunk, you optimize spend for both platforms.
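The arithmetic behind the table is easy to reproduce. The per-unit prices below are the same rough ranges used above, not actual vendor quotes:

```python
# Back-of-the-envelope check of the hybrid savings, using the rough
# per-unit ranges from the table above (illustrative, not real pricing).

daily_gb = 100

# Splunk-only: roughly $150-200 per GB/day of capacity per month
splunk_only_low, splunk_only_high = daily_gb * 150, daily_gb * 200

# Hybrid: 20-30% of volume to Splunk, 100% to Snowflake
splunk_low, splunk_high = 20 * 150, 30 * 200
snowflake_tb = daily_gb * 30 / 1000               # ~3 TB of storage per month
storage_low, storage_high = snowflake_tb * 23, snowflake_tb * 40
compute_low, compute_high = 200, 500              # ad-hoc query compute

hybrid_low = splunk_low + storage_low + compute_low
hybrid_high = splunk_high + storage_high + compute_high
reduction_worst = 1 - hybrid_high / splunk_only_high
reduction_best = 1 - hybrid_low / splunk_only_low
print(f"hybrid ~${hybrid_low:,.0f}-{hybrid_high:,.0f}/mo "
      f"({reduction_worst:.0%}-{reduction_best:.0%} lower)")
```

Under these assumptions the hybrid total lands in the ~$3,300-6,600/month range, a reduction in the neighborhood of 70-80%.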
Detailed Snowflake Cost Breakdown#
| Component | Monthly Cost | Notes |
|---|---|---|
| Storage (100 GB/day × 30 days = 3 TB) | ~$70-120 | At $23-40/TB/month |
| Compute: Dynamic Tables pipeline (Small warehouse, ~8 hrs effective/day) | ~$250-500 | Auto-suspends between refreshes |
| Compute: Splunk DB Connect queries (XS warehouse, ~2 hrs/day) | ~$60-120 | On-demand for federated queries |
| Compute: Threat hunting (Medium warehouse, ~10 hrs/month) | ~$40-80 | Ad-hoc analyst queries |
| Snowpipe ingestion | ~$50-100 | Serverless, pay per file |
| Total Snowflake | ~$500-900/mo | For 100 GB/day with active detection |
Multi-Year Retention — The Compliance Multiplier#
Compliance frameworks (GDPR, DORA, PCI DSS, HIPAA) often require 1-7 years of log retention. This is where the hybrid model delivers the biggest financial impact:
| Retention | Splunk-Only (100 GB/day) | Snowflake Storage |
|---|---|---|
| 90 days (3 TB) | Included in license | ~$70-120/mo |
| 1 year (36 TB) | Significantly increases license | ~$830-1,440/mo |
| 2 years (73 TB) | Often cost-prohibitive | ~$1,680-2,920/mo |
| 5 years (182 TB) | Not practical for most orgs | ~$4,200-7,300/mo |
With Snowflake, keeping 5 years of security logs is economically feasible — and those logs are queryable at any time, either directly or via Splunk’s federated search.
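The retention rows above follow directly from the storage price range. A quick sketch of the math (storage only; compute is billed separately):

```python
# Storage-only cost of multi-year retention in Snowflake at the
# ~$23-40/TB/month range quoted above (compute billed separately).

daily_gb = 100

def monthly_storage_cost(years: float, price_per_tb: float) -> float:
    """Monthly storage bill for `years` of retained logs at 100 GB/day."""
    tb_retained = daily_gb * 365 * years / 1000
    return tb_retained * price_per_tb

for years in (1, 2, 5):
    low, high = monthly_storage_cost(years, 23), monthly_storage_cost(years, 40)
    tb = daily_gb * 365 * years / 1000
    print(f"{years} yr ({tb:.0f} TB): ${low:,.0f}-{high:,.0f}/mo")
```

Five years of logs works out to roughly 182 TB, landing in the ~$4,200-7,300/month range shown in the table.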
Snowflake Storage Lifecycle Policies — Automatic Tiering#
Since November 2025, Snowflake offers Storage Lifecycle Policies (GA) that automatically move aging data to lower-cost storage tiers. This further reduces the cost of long-term security log retention:
| Tier | Access Pattern | Cost | Availability |
|---|---|---|---|
| Standard | Frequently accessed (real-time queries, dashboards) | ~$23-40/TB/month | All clouds |
| COOL | Infrequently accessed (quarterly investigations, audits) | Significantly lower than standard | AWS + Azure |
| COLD | Rarely accessed (annual compliance, legal hold) | Lowest cost tier | AWS only |
Storage lifecycle policies are schema-level objects that archive or expire rows based on conditions you define — for example, move security logs older than 90 days to COOL, and logs older than 1 year to COLD:
-- Example: Tiered retention for security logs
CREATE STORAGE LIFECYCLE POLICY security_tiering
ON TABLE security_data.access_logs
ARCHIVE_TIER = 'COOL'
ARCHIVE_AFTER_DAYS = 90
EXPIRE_AFTER_DAYS = 730; -- 2 years total retention

This means your first 90 days of security data are in standard (hot) storage for fast queries and detection pipelines. After 90 days, data automatically moves to COOL storage at reduced cost — still queryable but optimized for less frequent access. The data expires after 2 years (or whatever your compliance framework requires).
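As a sanity check, the tiering timeline defined by that policy can be made explicit in a tiny sketch. Snowflake applies this automatically on its side; the function below just mirrors the thresholds from the example policy:

```python
# Mirrors the example policy: standard storage for the first 90 days,
# COOL from day 90 to day 730, expired after day 730.

def tier_for_age(age_days: int) -> str:
    """Storage state of a row given its age, per the example policy."""
    if age_days >= 730:
        return "EXPIRED"
    if age_days >= 90:
        return "COOL"
    return "STANDARD"

print([tier_for_age(d) for d in (0, 89, 90, 729, 730)])
# → ['STANDARD', 'STANDARD', 'COOL', 'COOL', 'EXPIRED']
```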
For security data lakes with multi-year retention, combining storage lifecycle policies with Apache Iceberg tables provides even more flexibility — Iceberg tables store data in your own cloud storage (S3, ADLS, GCS) while remaining fully queryable through Snowflake.
Documentation:
- Storage Lifecycle Policies
- Create and Manage Storage Lifecycle Policies
- Billing for Storage Lifecycle Policies
- Apache Iceberg Tables
What Goes Where: Data Routing Strategy#
The routing decision comes down to one question: does this event need sub-minute detection and alerting?
Splunk (Critical — 20-30% of volume)#
| Log Type | Why It Needs Splunk |
|---|---|
| EDR / endpoint alerts | Real-time threat detection, kill chain correlation |
| Authentication failures | Brute force detection, account lockout alerting |
| Firewall denies / blocks | Active attack detection, immediate response |
| High-severity SIEM alerts | Triage and escalation workflows |
| Privilege escalation events | Immediate investigation required |
| DLP alerts | Data exfiltration response |
Snowflake (Everything — 100% of volume)#
| Log Type | Why It Goes to Snowflake |
|---|---|
| Web access logs | High volume, needed for historical investigation |
| DNS query logs | Threat hunting, C2 detection over time |
| Netflow / traffic metadata | Forensics, lateral movement analysis |
| Cloud audit trails (AWS CloudTrail, Azure Activity, GCP Audit) | Compliance, long-term investigation |
| Email gateway logs | Phishing campaign correlation |
| Proxy / URL filtering logs | Historical threat hunting |
| All of the above critical logs too | System of record, backup, long-term retention |
The critical point: Snowflake gets everything. It is not an either/or. The routing layer duplicates critical events to both destinations. Snowflake is always the complete record.
Feature Comparison#
Each platform brings distinct strengths. The key is using each for what it does best.
| Capability | Splunk | Snowflake | Notes |
|---|---|---|---|
| Real-time alerting (< 1 min) | Excellent | Improving | Splunk excels here; Snowflake Dynamic Tables target ~1 min lag, triggered tasks can go sub-minute |
| SPL query language | Native | N/A | SPL is purpose-built for security; SQL requires different patterns but is more widely known |
| Splunk ES (SIEM framework) | Full support | Complementary | Notable events, risk-based alerting, adaptive response — Splunk-native. Snowflake enriches, not replaces |
| Threat intelligence | Built-in (ES), 9 intel collections | Marketplace (growing) | Splunk ES threat intel framework is mature; Snowflake Marketplace adds Criminal IP, Cybersixgill, and others |
| Historical investigation | Good (within retention window) | Excellent | Snowflake’s strength — cost-effective queries across years of data |
| Ad-hoc SQL analytics | SPL-based | Full SQL | Snowflake’s SQL engine excels at complex analytical queries and joins |
| Multi-year retention | Not its primary use case | Purpose-built | ~$23-40/TB/month in Snowflake |
| Data sharing / marketplace | Splunkbase ecosystem | Snowflake Marketplace | Both have growing ecosystems; different strengths |
| ML / anomaly detection | MLTK, AI Toolkit, DSDL | Snowpark ML, Cortex AI | Splunk expanded ML with AI Toolkit and Deep Learning (DSDL); Snowflake offers Cortex LLMs and Snowpark ML |
| Compliance reporting | Good for event-based | Excellent for structured | SQL + BI tools complement Splunk’s event-based compliance monitoring |
| Dashboards | Purpose-built SOC dashboards | Streamlit, BI tools | Splunk is the standard for real-time SOC operations dashboards |
| Native cross-platform search | Federated Search for Snowflake (2025) | N/A | Splunk announced native Federated Search for Snowflake — query Snowflake data directly from SPL without DB Connect |
The takeaway: Splunk Enterprise Security’s notable events, risk-based alerting, and adaptive response actions are core to any SOC that runs Splunk. The hybrid approach does not replace Splunk ES — it reduces the volume of data that needs to flow through the real-time tier, while making all historical data accessible to Splunk analysts via federated search.
New in 2025: Splunk announced Federated Search for Snowflake — a native integration that lets analysts query Snowflake data directly from the Splunk UI using SPL-like syntax, without DB Connect. This deepens the partnership and makes the hybrid architecture even more seamless.
Setting Up Splunk DB Connect with Snowflake#
Before running federated queries, you need to connect Splunk Cloud to Snowflake via DB Connect. Here’s the step-by-step setup:
Step 1: Install DB Connect and JDBC Driver#
Install Splunk DB Connect (v3.x or 4.x) from Splunkbase. Then install the Snowflake JDBC driver:
| | Splunk Cloud | Splunk On-Premise |
|---|---|---|
| DB Connect install | Install via Splunk Web → Apps | Install via Splunk Web or CLI |
| JDBC driver | Pre-packaged as a Splunk add-on. During DB Connect connection setup, install the JDBC Driver for Snowflake add-on directly from within DB Connect — no manual download needed. Alternatively, install it from Splunkbase as an app. | Download the Snowflake JDBC JAR (v3.14.x+) from Maven and place it in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/drivers/, then restart Splunk. |
| JRE | Pre-configured (Java 17) | Must install Java 17 or 21 and set the JRE path |
| Network access | Splunk Cloud can reach Snowflake over the internet by default | May require firewall rules to allow outbound HTTPS to *.snowflakecomputing.com:443 |
| Authentication | Username/password or OAuth (v1.2.3+ supports OAuth Authorization Code) | Username/password, key-pair, or OAuth |
Step 2: Configure General Settings#
DB Connect requires Java 17 or 21. The JVM settings differ between Splunk Cloud and on-premise deployments.
Splunk Cloud (validated working configuration):
| Setting | Value |
|---|---|
| JRE Path | /usr/lib/jvm/java-17-openjdk-amd64 |
| Task Server Port | 9998 |
| Task Server JVM Options | -Ddw.server.applicationConnectors[0].port=9998 --add-opens java.base/java.nio=ALL-UNNAMED |
| Query Server JVM Options | -Dport=9999 --add-opens java.base/java.nio=ALL-UNNAMED |
Splunk On-Premise — settings will vary depending on your environment:
| Setting | Notes |
|---|---|
| JRE Path | Path to your Java 17 or 21 installation (e.g., /opt/java/jdk-17/ on Linux, C:\Program Files\Java\jdk-17\ on Windows) |
| Task Server Port | Default 9998 unless another service uses that port |
| Task/Query Server JVM Options | Same --add-opens flag is required. You may also need to adjust heap size: -Xms512m -Xmx2048m for large query workloads |
Critical for both: The --add-opens java.base/java.nio=ALL-UNNAMED flag is required for Apache Arrow library compatibility with Java 17+. Without it, you’ll get NoClassDefFoundError exceptions when querying Snowflake.
Step 3: Create an Identity#
Go to Configuration and create an Identity with the credentials for your Snowflake service account:
- Username: SVC_SPLUNK (or your service account name)
- Password: The service account password
Step 4: Create a Connection#
Create a new connection with these parameters:
| Parameter | Value | Notes |
|---|---|---|
| Connection Name | snowflake_security | Your choice |
| Identity | Select the identity from Step 3 | |
| Connection Type | Snowflake | |
| Host | <account_identifier> | Account ID only — do NOT include .snowflakecomputing.com |
| Default Database | SECURITY_DATA | Your security database |
| Default Schema | PUBLIC | Or your schema |
Step 5: Set Snowflake User Defaults#
DB Connect doesn’t support passing warehouse, role, or namespace via JDBC URL parameters. Instead, set defaults on the Snowflake user:
-- Run this in Snowflake
ALTER USER SVC_SPLUNK SET
DEFAULT_WAREHOUSE = 'SPLUNK_WH'
DEFAULT_NAMESPACE = 'SECURITY_DATA.PUBLIC'
DEFAULT_ROLE = 'SPLUNK_READER';
-- Grant minimum required privileges
CREATE ROLE IF NOT EXISTS SPLUNK_READER;
GRANT USAGE ON WAREHOUSE SPLUNK_WH TO ROLE SPLUNK_READER;
GRANT USAGE ON DATABASE SECURITY_DATA TO ROLE SPLUNK_READER;
GRANT USAGE ON SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA SECURITY_DATA.PUBLIC TO ROLE SPLUNK_READER;
GRANT ROLE SPLUNK_READER TO USER SVC_SPLUNK;

Step 6: Test the Connection#
In Splunk, run a test query:
| dbxquery connection="snowflake_security" query="SELECT CURRENT_USER(), CURRENT_ROLE(), CURRENT_WAREHOUSE()"

If you see your user, role, and warehouse returned, the connection is working.
Troubleshooting#
| Issue | Solution |
|---|---|
| NoClassDefFoundError | Add --add-opens java.base/java.nio=ALL-UNNAMED to JVM options |
| Queries timeout | Increase heap: add -Xms512m -Xmx2048m to JVM options |
| Wrong warehouse/role | Set defaults on the Snowflake user (Step 5) |
| Connection refused | Verify the account identifier format — no .snowflakecomputing.com suffix |
Dynamic Tables: Near-Real-Time Detection in Snowflake#
While Splunk handles sub-second alerting, Snowflake’s Dynamic Tables provide a powerful near-real-time detection layer for the data that lives in the Snowflake tier. Dynamic Tables automatically refresh based on a target lag — typically 1-5 minutes for security workloads — and chain together to form multi-stage detection pipelines.
Why Dynamic Tables Matter for Security#
Traditional task-based pipelines require you to write scheduling logic, manage dependencies, and handle failures. Dynamic Tables handle all of this declaratively — you define the SQL transformation and the target lag, and Snowflake manages the rest.
Raw Logs (Snowpipe)
│
▼ Target lag: 1 minute
┌─────────────────────────┐
│ Stage 1: Normalized │ Parse, standardize timestamps, extract fields
│ (Dynamic Table) │
└─────────────────────────┘
│
▼ Target lag: 2 minutes
┌─────────────────────────┐
│ Stage 2: Enriched │ Add geo-IP, threat intel, user context
│ (Dynamic Table) │
└─────────────────────────┘
│
▼ Target lag: 5 minutes
┌─────────────────────────┐
│ Stage 3: Alerts │ Threshold detection, anomaly flags
│ (Dynamic Table) │
└─────────────────────────┘
│
▼
Webhook notification → Splunk / SIEM / PagerDuty

Example: Brute Force Detection Pipeline#
-- Stage 1: Normalize login events (target lag: 1 minute)
CREATE OR REPLACE DYNAMIC TABLE security_data.login_events_normalized
TARGET_LAG = '1 minute'
WAREHOUSE = SECURITY_WH
AS
SELECT
event_timestamp,
user_name,
client_ip,
is_success,
first_authentication_factor,
reported_client_type
FROM SNOWFLAKE.ACCOUNT_USAGE.LOGIN_HISTORY
WHERE event_timestamp >= DATEADD('day', -1, CURRENT_TIMESTAMP());
-- Stage 2: Aggregate for brute force detection (target lag: 2 minutes)
CREATE OR REPLACE DYNAMIC TABLE security_data.brute_force_candidates
TARGET_LAG = '2 minutes'
WAREHOUSE = SECURITY_WH
AS
SELECT
client_ip,
user_name,
COUNT(*) AS failed_attempts,
MIN(event_timestamp) AS first_attempt,
MAX(event_timestamp) AS last_attempt,
DATEDIFF('minute', MIN(event_timestamp), MAX(event_timestamp)) AS window_minutes
FROM security_data.login_events_normalized
WHERE is_success = 'NO'
AND event_timestamp >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
GROUP BY client_ip, user_name
HAVING failed_attempts >= 5;

Warehouse Sizing for Security Workloads#
| Warehouse Size | Use Case | Approximate Cost/Hour |
|---|---|---|
| X-Small | Light queries, Splunk DB Connect reads | ~$1-2/hour |
| Small | Dynamic Table refreshes for moderate log volume | ~$2-4/hour |
| Medium | High-volume detection pipelines, threat hunting | ~$4-8/hour |
Cost optimization tip: Use a dedicated SECURITY_WH warehouse with AUTO_SUSPEND = 60 (1 minute). Dynamic Table refreshes will auto-resume the warehouse when needed and it suspends automatically between refresh cycles. For periodic threat hunting from Splunk via DB Connect, use a separate SPLUNK_WH so costs are tracked independently.
CREATE WAREHOUSE SECURITY_WH
WAREHOUSE_SIZE = 'SMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Security detection pipeline - Dynamic Tables';
CREATE WAREHOUSE SPLUNK_WH
WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 60
AUTO_RESUME = TRUE
INITIALLY_SUSPENDED = TRUE
COMMENT = 'Splunk DB Connect federated queries';

Federated Queries from Splunk via DB Connect#
With the connection set up, Splunk analysts can query Snowflake directly from the Splunk UI using the dbxquery command.
The key benefit is aggregation pushdown. The SQL query runs on Snowflake’s compute, and only the result set comes back to Splunk. You are not pulling raw logs into Splunk — you are pulling aggregated answers.
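To make the pushdown benefit concrete, here is a toy Python sketch comparing the row counts involved. The data is synthetic; in the real setup the GROUP BY runs inside Snowflake, and only the grouped rows cross the wire to Splunk:

```python
from collections import Counter
from datetime import datetime, timedelta

# Synthetic "raw logs": one week of web events, one per minute.
start = datetime(2025, 1, 1)
raw_events = [
    {"ts": start + timedelta(minutes=i), "status": 500 if i % 7 == 0 else 200}
    for i in range(7 * 24 * 60)
]

# What an aggregation-pushdown query returns: one row per
# (hour, category) group, instead of every raw event.
groups = Counter(
    (e["ts"].replace(minute=0), "server_error" if e["status"] >= 500 else "normal")
    for e in raw_events
)

print(f"raw rows: {len(raw_events)}, rows returned to Splunk: {len(groups)}")
```

Here roughly ten thousand raw events collapse to a few hundred grouped rows; at production volumes the ratio is far larger.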
Example 1: Basic Historical Fetch#
Pull the last 24 hours of web access logs from Snowflake for a specific source IP under investigation:
| dbxquery
connection="snowflake_security"
query="
SELECT timestamp, src_ip, dst_ip, method, uri, status_code, user_agent
FROM security_lake.web_access_logs
WHERE src_ip = '10.0.45.12'
AND timestamp > DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY timestamp DESC
LIMIT 10000
"
| table timestamp, src_ip, dst_ip, method, uri, status_code, user_agent

Example 2: Security Overview Dashboard (Aggregation Pushdown)#
Build a SOC dashboard panel that shows threat categories over time — the heavy aggregation runs on Snowflake, and Splunk just renders the results:
| dbxquery
connection="snowflake_security"
query="
SELECT
DATE_TRUNC('hour', timestamp) AS hour,
CASE
WHEN status_code >= 400 AND status_code < 500 THEN 'client_error'
WHEN status_code >= 500 THEN 'server_error'
WHEN uri ILIKE '%/admin%' THEN 'admin_access'
ELSE 'normal'
END AS category,
COUNT(*) AS event_count
FROM security_lake.web_access_logs
WHERE timestamp > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY 1 DESC
"
| timechart span=1h sum(event_count) by category

Example 3: Threat Classification (SQL Injection, XSS Detection)#
Run a threat hunting query across months of historical logs — something that would be impractical in Splunk at this retention depth:
| dbxquery
connection="snowflake_security"
query="
SELECT
src_ip,
CASE
WHEN uri ILIKE '%UNION%SELECT%' OR uri ILIKE '%1=1%' OR uri ILIKE '%DROP%TABLE%'
THEN 'sql_injection'
WHEN uri ILIKE '%<script%' OR uri ILIKE '%javascript:%' OR uri ILIKE '%onerror=%'
THEN 'xss'
WHEN uri ILIKE '%../%' OR uri ILIKE '%/etc/passwd%'
THEN 'path_traversal'
WHEN uri ILIKE '%/admin%' AND status_code = 200
THEN 'admin_access'
ELSE 'other'
END AS threat_type,
COUNT(*) AS attempt_count,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen,
COUNT(DISTINCT uri) AS unique_uris
FROM security_lake.web_access_logs
WHERE timestamp > DATEADD('month', -6, CURRENT_TIMESTAMP())
AND (
uri ILIKE '%UNION%SELECT%'
OR uri ILIKE '%<script%'
OR uri ILIKE '%../%'
OR uri ILIKE '%/etc/passwd%'
OR (uri ILIKE '%/admin%' AND status_code = 200)
)
GROUP BY 1, 2
HAVING attempt_count > 5
ORDER BY attempt_count DESC
LIMIT 100
"
| table src_ip, threat_type, attempt_count, first_seen, last_seen, unique_uris
| sort -attempt_count

Example 4: Hybrid Correlation (Splunk + Snowflake Join)#
Correlate real-time Splunk alerts with historical Snowflake data. For example, take today’s Splunk EDR alerts and look up whether those source IPs have a history of suspicious activity in Snowflake:
index=edr_alerts severity="critical" earliest=-1h
| stats count AS alert_count, values(alert_name) AS alerts BY src_ip
| map search="
| dbxquery
connection=\"snowflake_security\"
query=\"
SELECT
'$src_ip$' AS src_ip,
COUNT(*) AS historical_events,
SUM(CASE WHEN uri ILIKE '%UNION%SELECT%' OR uri ILIKE '%<script%' THEN 1 ELSE 0 END) AS suspicious_requests,
MIN(timestamp) AS first_seen_in_logs,
MAX(timestamp) AS last_seen_in_logs
FROM security_lake.web_access_logs
WHERE src_ip = '$src_ip$'
AND timestamp > DATEADD('month', -3, CURRENT_TIMESTAMP())
\"
"
| table src_ip, alert_count, alerts, historical_events, suspicious_requests, first_seen_in_logs, last_seen_in_logs

Example 5: Scheduled Compliance Report#
Run a weekly compliance query that counts authentication events by outcome — useful for auditors who need evidence of monitoring:
| dbxquery
connection="snowflake_security"
query="
SELECT
DATE_TRUNC('day', timestamp) AS day,
event_type,
outcome,
COUNT(*) AS event_count,
COUNT(DISTINCT src_ip) AS unique_sources
FROM security_lake.auth_events
WHERE timestamp > DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2, 3
ORDER BY 1 DESC, 4 DESC
"
| table day, event_type, outcome, event_count, unique_sources

Demo Walkthrough#
The GitHub repository includes a complete demo that generates realistic security log data, loads it into Snowflake, and provides the queries to federate from Splunk.
What the Demo Does#
- Generates 100,000 realistic web access log events using Python — including normal traffic, automated scanners, and injected attack patterns
- Injects realistic attack signatures including SQL injection attempts, XSS payloads, path traversal, and privilege escalation patterns
- Uploads the data to Snowflake for querying
- Provides ready-to-use SPL queries for federated search from Splunk
Running the Demo#
The project uses pixi for dependency management. To get started:
git clone https://github.com/sfc-gh-kkeller/splunk-snowflake-security-datalake.git
cd splunk-snowflake-security-datalake
pixi install

Generate the sample data:
pixi run generate

This creates a dataset with events spread across multiple days, including:
- Normal web traffic — GET/POST requests to typical endpoints with realistic user agents and status codes
- SQL injection attempts — UNION SELECT, 1=1, DROP TABLE, OR 1=1-- patterns injected into URI parameters
- XSS payloads — <script>, javascript:, onerror= patterns in request URIs
- Path traversal — ../../etc/passwd, ..\\windows\\system32 attempts
- Privilege escalation — repeated access attempts to /admin, /api/admin, /console endpoints with mixed success/failure status codes
Upload to Snowflake:
pixi run upload

This creates the security_lake.web_access_logs table in your Snowflake account and loads the generated events. You will need Snowflake credentials configured (the repo README covers the details).
Once the data is in Snowflake, configure Splunk DB Connect with your Snowflake connection and run the federated queries shown above.
When to Use This Pattern (and When Not To)#
Good fit#
- High log volume (50+ GB/day) — the cost savings scale linearly with volume; below 50 GB/day the complexity may not be worth it
- Compliance requirements for multi-year retention — regulators want 1-3 years of logs; Snowflake makes this economically feasible
- Threat hunting on historical data — your analysts need to search across months of logs, not just the Splunk retention window
- Budget pressure on SIEM costs — if your Splunk renewal is the single largest line item in your security budget, this is worth exploring
- Teams already using Snowflake — if your org has Snowflake for data warehousing, adding security logs is a natural extension
Not a good fit#
- Small log volumes (< 10 GB/day) — the operational complexity of running two platforms is not justified at low volumes
- Heavy reliance on Splunk ES notable models — if your detection logic depends on Splunk ES’s correlation searches and risk framework running against all log sources, splitting data will break those detections
- Real-time-only use case — if you never do historical investigation and only care about real-time alerts, the Snowflake side adds cost without value
- No engineering capacity for the routing layer — someone needs to build and maintain the routing rules; if your team is already stretched thin, this is not a weekend project
Trade-offs to understand#
- Federated query latency: dbxquery calls to Snowflake take seconds, not milliseconds. This is fine for investigation and dashboards but not for real-time correlation rules.
- Two platforms to operate: You now have two systems to monitor, secure, and keep running. Make sure you have the operational capacity.
- Routing logic complexity: Deciding what is “critical” requires ongoing tuning. Start conservative (send more to Splunk) and reduce over time as you gain confidence.
- Splunk ES content dependencies: Some Splunk ES correlation searches reference specific log sources. Before routing a source to Snowflake-only, audit which ES searches depend on it and adjust your routing strategy accordingly.
Related Articles#
- How to Let Snowflake Raise Security Incidents in Your SIEM or XDR Automatically — automate incident creation in Splunk/Sentinel directly from Snowflake
- Ingest OpenTelemetry Logs, Traces and Metrics Directly into Snowflake — build an observability data lake alongside your security data lake
- Data Sovereignty: Querying On-Prem Iceberg from Snowflake — keep sensitive security logs on-prem while still querying from Snowflake
