lakesh: A DuckDB SQL Shell for Iceberg REST Catalogs with MCP Support

Table of Contents

A DuckDB-powered SQL shell for Iceberg REST catalogs and DuckLake metastores — with OAuth2, profile switching, psql-style ergonomics, and an MCP server for LLM agents

lakesh is open source: github.com/KellerKev/lakesh

lakesh is a DuckDB-powered SQL shell built for working with Iceberg REST catalogs and DuckLake metastores. DuckDB does the heavy lifting — Parquet reads, predicate pushdown, joins across namespaces. lakesh adds the ergonomics that make daily catalog work fast: multiple named profiles, OAuth2 token fetch and reuse, S3 credential plumbing pre-configured correctly, psql-style meta-commands, and an MCP server so LLM agents can query your catalogs through the same plumbing.

Tested against duckicelake; works against any Iceberg REST catalog — Polaris, Nessie, Lakekeeper, managed REST — or any DuckLake catalog.

What This Project Is
#

lakesh brings named connection profiles, automatic OAuth2 token management, correctly pre-configured S3 credential plumbing, scriptable one-shot queries with JSON or CSV output, and namespace-scoped autocomplete to DuckDB-based catalog work.

Individually those are thin conveniences. Together they remove enough friction that day-to-day catalog work becomes meaningfully faster — especially when switching between a local duckicelake dev catalog, a staging catalog, and a production catalog in the same session.

Business Problems This Showcases
#

Multi-catalog development workflows. Engineers who work across multiple Iceberg catalogs (local dev, staging, prod) without re-pasting ATTACH statements and OAuth tokens in every session. Profile switching is one flag: lakesh -p prod.

LLM agents over your data lake. The MCP server exposes query, list_tables, describe_table, and list_namespaces to any MCP client — Claude Desktop, Cline, Continue. Read-only by default, so LLM-driven SQL is safe to enable. A data team can give their AI assistant direct catalog access without building custom tooling.

Scripted data pipelines. The exec mode outputs clean JSON or CSV for downstream consumption — lakesh exec -f json -q 'SELECT ...' | jq ... works as expected, no REPL noise.

DuckLake direct access alongside Iceberg REST. Both profile types share the same shell: Iceberg REST for catalog-layer reads, DuckLake direct for full read/write access (INSERT / UPDATE / DELETE) — one tool for both paths.

The Demo
#

REPL session against a local duckicelake catalog — profile switching, \d / \l meta-commands, autocomplete, and a query through the Iceberg REST → DuckDB iceberg-ext path.

Profile Types
#

lakesh has two profile types, both sharing the same REPL and commands.

Iceberg REST — connects via the Iceberg REST Catalog API. OAuth2 token fetch and reuse happen automatically per session. S3 credential plumbing is pre-configured correctly for both cloud and self-hosted MinIO deployments.

default = "local"

[profiles.local]
uri       = "http://127.0.0.1:8181"
warehouse = "lake"

[profiles.local.s3]
endpoint   = "http://127.0.0.1:9000"
region     = "us-east-1"
access_key = "minioadmin"
secret_key = "minioadmin"
path_style = true

[profiles.local.oauth]
client_id     = "demo-client"
client_secret = "demo-secret"

DuckLake direct — bypasses the Iceberg REST layer entirely. Attaches the DuckLake catalog via Postgres + S3 directly. Useful for writes, since the iceberg extension does not support INSERT / UPDATE / DELETE over REST.

[profiles.lake_direct]
type         = "ducklake"
postgres_dsn = "dbname=ducklake host=/path/.pgsock port=55432 user=ducklake"
data_path    = "s3://lakehouse/data/"
catalog      = "lake"

Secrets (client_id, client_secret, access_key, postgres_dsn) can all be sourced from environment variables via *_env siblings — no credentials in config files for production profiles.

The REPL
#

Inside the REPL, \-prefixed lines are meta-commands rather than SQL:

\?                         help
\l                         list namespaces
\d                         list all tables
\d <ns>                    list tables in one namespace
\d <ns>.<tbl>              describe a table (columns + types)
\timing [on|off]           toggle elapsed-time reporting
\format [table|json|csv]   change result format
\q                         quit

SQL autocomplete is scoped to the attached catalog. History persists across sessions in $XDG_STATE_HOME/lakesh/history. Multi-line statements terminate on ;.

The MCP Server
#

lakesh mcp runs a Model Context Protocol server on stdio. Configure any MCP client to spawn it, and the LLM gets these tools:

Tool	Purpose
`list_profiles()`	Discover configured catalogs
`list_namespaces(profile=None)`	List schemas
`list_tables(profile=None, namespace=None)`	List tables, optionally scoped
`describe_table(namespace, table, profile=None)`	Columns + types + nullability
`query(sql, profile=None, limit=1000, format="json")`	Run SQL, return results

query is read-only by default — it rejects anything that does not start with SELECT / SHOW / DESCRIBE / WITH / EXPLAIN. Set LAKESH_MCP_WRITE=1 in the server’s environment to unlock writes. Keeps LLM-driven SQL safe by default.

Claude Desktop config:

{
  "mcpServers": {
    "lakesh": {
      "command": "lakesh",
      "args": ["mcp"],
      "env": {
        "LAKESH_CONFIG": "/Users/you/.config/lakesh/config.toml"
      }
    }
  }
}

Same shape for Cline, Continue, and any other MCP client — point them at lakesh mcp on stdio.

Getting Started
#

pip install -e '.[dev]'

# Write an example config:
lakesh config init

# Verify connectivity:
lakesh doctor

# Drop into the REPL:
lakesh

# One-shot query:
lakesh exec -q 'SELECT COUNT(*) FROM analytics.events'

# JSON output for pipes:
lakesh exec -f json -q 'SHOW TABLES' | jq '.[].table_name'

To run against a local duckicelake stack:

cd ../duckicelake
pixi run backends-up && pixi run ducklake-init && pixi run serve &
cd ../lakesh && lakesh

Command Reference
#

Command	Purpose
`lakesh`	REPL against default profile
`lakesh -p <name>`	REPL against a named profile
`lakesh exec -q '<sql>'`	One-shot query, table output
`lakesh exec -f json -q '<sql>'`	One-shot query, JSON output
`lakesh exec -f csv -q '<sql>'`	One-shot query, CSV output
`lakesh doctor`	Connectivity smoke test
`lakesh mcp`	Run as MCP server on stdio
`lakesh profiles list`	List configured profiles
`lakesh config init`	Write an example config

lakesh is open source: github.com/KellerKev/lakesh

The Iceberg REST catalog it pairs with: github.com/KellerKev/duckicelake

Author

Kevin Keller

Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.

Share this article

What This Project Is#

Business Problems This Showcases#

The Demo#

Profile Types#

The REPL#

The MCP Server#

Getting Started#

Command Reference#

Related