A DuckDB-powered SQL shell for Iceberg REST catalogs and DuckLake metastores — with OAuth2, profile switching, psql-style ergonomics, and an MCP server for LLM agents
lakesh is open source: github.com/KellerKev/lakesh
lakesh is a DuckDB-powered SQL shell built for working with Iceberg REST catalogs and DuckLake metastores. DuckDB does the heavy lifting — Parquet reads, predicate pushdown, joins across namespaces. lakesh adds the ergonomics that make daily catalog work fast: multiple named profiles, OAuth2 token fetch and reuse, S3 credential plumbing pre-configured correctly, psql-style meta-commands, and an MCP server so LLM agents can query your catalogs through the same plumbing.
Tested against duckicelake; works against any Iceberg REST catalog — Polaris, Nessie, Lakekeeper, managed REST — or any DuckLake catalog.
What This Project Is#
lakesh brings named connection profiles, automatic OAuth2 token management, correctly pre-configured S3 credential plumbing, scriptable one-shot queries with JSON or CSV output, and namespace-scoped autocomplete to DuckDB-based catalog work.
Individually those are thin conveniences. Together they remove enough friction that day-to-day catalog work becomes meaningfully faster — especially when switching between a local duckicelake dev catalog, a staging catalog, and a production catalog in the same session.
Business Problems This Showcases#
Multi-catalog development workflows. Engineers who work across multiple Iceberg catalogs (local dev, staging, prod) without re-pasting ATTACH statements and OAuth tokens in every session. Profile switching is one flag: lakesh -p prod.
LLM agents over your data lake. The MCP server exposes query, list_tables, describe_table, and list_namespaces to any MCP client — Claude Desktop, Cline, Continue. Read-only by default, so LLM-driven SQL is safe to enable. A data team can give their AI assistant direct catalog access without building custom tooling.
Scripted data pipelines. The exec mode outputs clean JSON or CSV for downstream consumption — lakesh exec -f json -q 'SELECT ...' | jq ... works as expected, no REPL noise.
DuckLake direct access alongside Iceberg REST. Both profile types share the same shell: Iceberg REST for catalog-layer reads, DuckLake direct for full read/write access (INSERT / UPDATE / DELETE) — one tool for both paths.
The Demo#
REPL session against a local duckicelake catalog — profile switching, \d / \l meta-commands, autocomplete, and a query through the Iceberg REST → DuckDB iceberg-ext path.

Profile Types#
lakesh has two profile types, both sharing the same REPL and commands.
Iceberg REST — connects via the Iceberg REST Catalog API. OAuth2 token fetch and reuse happen automatically per session. S3 credential plumbing is pre-configured correctly for both cloud and self-hosted MinIO deployments.
default = "local"
[profiles.local]
uri = "http://127.0.0.1:8181"
warehouse = "lake"
[profiles.local.s3]
endpoint = "http://127.0.0.1:9000"
region = "us-east-1"
access_key = "minioadmin"
secret_key = "minioadmin"
path_style = true
[profiles.local.oauth]
client_id = "demo-client"
client_secret = "demo-secret"DuckLake direct — bypasses the Iceberg REST layer entirely. Attaches the DuckLake catalog via Postgres + S3 directly. Useful for writes, since the iceberg extension does not support INSERT / UPDATE / DELETE over REST.
[profiles.lake_direct]
type = "ducklake"
postgres_dsn = "dbname=ducklake host=/path/.pgsock port=55432 user=ducklake"
data_path = "s3://lakehouse/data/"
catalog = "lake"Secrets (client_id, client_secret, access_key, postgres_dsn) can all be sourced from environment variables via *_env siblings — no credentials in config files for production profiles.
The REPL#
Inside the REPL, \-prefixed lines are meta-commands rather than SQL:
\? help
\l list namespaces
\d list all tables
\d <ns> list tables in one namespace
\d <ns>.<tbl> describe a table (columns + types)
\timing [on|off] toggle elapsed-time reporting
\format [table|json|csv] change result format
\q quitSQL autocomplete is scoped to the attached catalog. History persists across sessions in $XDG_STATE_HOME/lakesh/history. Multi-line statements terminate on ;.
The MCP Server#
lakesh mcp runs a Model Context Protocol server on stdio. Configure any MCP client to spawn it, and the LLM gets these tools:
| Tool | Purpose |
|---|---|
list_profiles() | Discover configured catalogs |
list_namespaces(profile=None) | List schemas |
list_tables(profile=None, namespace=None) | List tables, optionally scoped |
describe_table(namespace, table, profile=None) | Columns + types + nullability |
query(sql, profile=None, limit=1000, format="json") | Run SQL, return results |
query is read-only by default — it rejects anything that does not start with SELECT / SHOW / DESCRIBE / WITH / EXPLAIN. Set LAKESH_MCP_WRITE=1 in the server’s environment to unlock writes. Keeps LLM-driven SQL safe by default.
Claude Desktop config:
{
"mcpServers": {
"lakesh": {
"command": "lakesh",
"args": ["mcp"],
"env": {
"LAKESH_CONFIG": "/Users/you/.config/lakesh/config.toml"
}
}
}
}Same shape for Cline, Continue, and any other MCP client — point them at lakesh mcp on stdio.
Getting Started#
pip install -e '.[dev]'
# Write an example config:
lakesh config init
# Verify connectivity:
lakesh doctor
# Drop into the REPL:
lakesh
# One-shot query:
lakesh exec -q 'SELECT COUNT(*) FROM analytics.events'
# JSON output for pipes:
lakesh exec -f json -q 'SHOW TABLES' | jq '.[].table_name'To run against a local duckicelake stack:
cd ../duckicelake
pixi run backends-up && pixi run ducklake-init && pixi run serve &
cd ../lakesh && lakeshCommand Reference#
| Command | Purpose |
|---|---|
lakesh | REPL against default profile |
lakesh -p <name> | REPL against a named profile |
lakesh exec -q '<sql>' | One-shot query, table output |
lakesh exec -f json -q '<sql>' | One-shot query, JSON output |
lakesh exec -f csv -q '<sql>' | One-shot query, CSV output |
lakesh doctor | Connectivity smoke test |
lakesh mcp | Run as MCP server on stdio |
lakesh profiles list | List configured profiles |
lakesh config init | Write an example config |
lakesh is open source: github.com/KellerKev/lakesh
The Iceberg REST catalog it pairs with: github.com/KellerKev/duckicelake
