Skip to main content
  1. Posts/

lakesh: A DuckDB SQL Shell for Iceberg REST Catalogs with MCP Support

A DuckDB-powered SQL shell for Iceberg REST catalogs and DuckLake metastores — with OAuth2, profile switching, psql-style ergonomics, and an MCP server for LLM agents

lakesh is open source: github.com/KellerKev/lakesh


lakesh — duck captain steering a tugboat across the duckicelake

lakesh is a DuckDB-powered SQL shell built for working with Iceberg REST catalogs and DuckLake metastores. DuckDB does the heavy lifting — Parquet reads, predicate pushdown, joins across namespaces. lakesh adds the ergonomics that make daily catalog work fast: multiple named profiles, OAuth2 token fetch and reuse, S3 credential plumbing pre-configured correctly, psql-style meta-commands, and an MCP server so LLM agents can query your catalogs through the same plumbing.

Tested against duckicelake; works against any Iceberg REST catalog — Polaris, Nessie, Lakekeeper, managed REST — or any DuckLake catalog.


What This Project Is
#

lakesh brings named connection profiles, automatic OAuth2 token management, correctly pre-configured S3 credential plumbing, scriptable one-shot queries with JSON or CSV output, and namespace-scoped autocomplete to DuckDB-based catalog work.

Individually those are thin conveniences. Together they remove enough friction that day-to-day catalog work becomes meaningfully faster — especially when switching between a local duckicelake dev catalog, a staging catalog, and a production catalog in the same session.


Business Problems This Showcases
#

Multi-catalog development workflows. Engineers who work across multiple Iceberg catalogs (local dev, staging, prod) without re-pasting ATTACH statements and OAuth tokens in every session. Profile switching is one flag: lakesh -p prod.

LLM agents over your data lake. The MCP server exposes query, list_tables, describe_table, and list_namespaces to any MCP client — Claude Desktop, Cline, Continue. Read-only by default, so LLM-driven SQL is safe to enable. A data team can give their AI assistant direct catalog access without building custom tooling.

Scripted data pipelines. The exec mode outputs clean JSON or CSV for downstream consumption — lakesh exec -f json -q 'SELECT ...' | jq ... works as expected, no REPL noise.

DuckLake direct access alongside Iceberg REST. Both profile types share the same shell: Iceberg REST for catalog-layer reads, DuckLake direct for full read/write access (INSERT / UPDATE / DELETE) — one tool for both paths.


The Demo
#

REPL session against a local duckicelake catalog — profile switching, \d / \l meta-commands, autocomplete, and a query through the Iceberg REST → DuckDB iceberg-ext path.

lakesh REPL demo — profile switching, meta-commands, live query

Profile Types
#

lakesh has two profile types, both sharing the same REPL and commands.

Iceberg REST — connects via the Iceberg REST Catalog API. OAuth2 token fetch and reuse happen automatically per session. S3 credential plumbing is pre-configured correctly for both cloud and self-hosted MinIO deployments.

default = "local"

[profiles.local]
uri       = "http://127.0.0.1:8181"
warehouse = "lake"

[profiles.local.s3]
endpoint   = "http://127.0.0.1:9000"
region     = "us-east-1"
access_key = "minioadmin"
secret_key = "minioadmin"
path_style = true

[profiles.local.oauth]
client_id     = "demo-client"
client_secret = "demo-secret"

DuckLake direct — bypasses the Iceberg REST layer entirely. Attaches the DuckLake catalog via Postgres + S3 directly. Useful for writes, since the iceberg extension does not support INSERT / UPDATE / DELETE over REST.

[profiles.lake_direct]
type         = "ducklake"
postgres_dsn = "dbname=ducklake host=/path/.pgsock port=55432 user=ducklake"
data_path    = "s3://lakehouse/data/"
catalog      = "lake"

Secrets (client_id, client_secret, access_key, postgres_dsn) can all be sourced from environment variables via *_env siblings — no credentials in config files for production profiles.


The REPL
#

Inside the REPL, \-prefixed lines are meta-commands rather than SQL:

\?                         help
\l                         list namespaces
\d                         list all tables
\d <ns>                    list tables in one namespace
\d <ns>.<tbl>              describe a table (columns + types)
\timing [on|off]           toggle elapsed-time reporting
\format [table|json|csv]   change result format
\q                         quit

SQL autocomplete is scoped to the attached catalog. History persists across sessions in $XDG_STATE_HOME/lakesh/history. Multi-line statements terminate on ;.


The MCP Server
#

lakesh mcp runs a Model Context Protocol server on stdio. Configure any MCP client to spawn it, and the LLM gets these tools:

ToolPurpose
list_profiles()Discover configured catalogs
list_namespaces(profile=None)List schemas
list_tables(profile=None, namespace=None)List tables, optionally scoped
describe_table(namespace, table, profile=None)Columns + types + nullability
query(sql, profile=None, limit=1000, format="json")Run SQL, return results

query is read-only by default — it rejects anything that does not start with SELECT / SHOW / DESCRIBE / WITH / EXPLAIN. Set LAKESH_MCP_WRITE=1 in the server’s environment to unlock writes. Keeps LLM-driven SQL safe by default.

Claude Desktop config:

{
  "mcpServers": {
    "lakesh": {
      "command": "lakesh",
      "args": ["mcp"],
      "env": {
        "LAKESH_CONFIG": "/Users/you/.config/lakesh/config.toml"
      }
    }
  }
}

Same shape for Cline, Continue, and any other MCP client — point them at lakesh mcp on stdio.


Getting Started
#

pip install -e '.[dev]'

# Write an example config:
lakesh config init

# Verify connectivity:
lakesh doctor

# Drop into the REPL:
lakesh

# One-shot query:
lakesh exec -q 'SELECT COUNT(*) FROM analytics.events'

# JSON output for pipes:
lakesh exec -f json -q 'SHOW TABLES' | jq '.[].table_name'

To run against a local duckicelake stack:

cd ../duckicelake
pixi run backends-up && pixi run ducklake-init && pixi run serve &
cd ../lakesh && lakesh

Command Reference
#

CommandPurpose
lakeshREPL against default profile
lakesh -p <name>REPL against a named profile
lakesh exec -q '<sql>'One-shot query, table output
lakesh exec -f json -q '<sql>'One-shot query, JSON output
lakesh exec -f csv -q '<sql>'One-shot query, CSV output
lakesh doctorConnectivity smoke test
lakesh mcpRun as MCP server on stdio
lakesh profiles listList configured profiles
lakesh config initWrite an example config

lakesh is open source: github.com/KellerKev/lakesh

The Iceberg REST catalog it pairs with: github.com/KellerKev/duckicelake

Kevin Keller
Author
Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.
Share this article

Related