DuckIceLake: An Iceberg v3 REST Catalog Proxy on Top of DuckLake

A full Iceberg v2/v3 REST Catalog proxy on top of DuckLake — so standard Iceberg clients speak directly to a DuckLake-backed lakehouse without modification

DuckIceLake is open source: github.com/KellerKev/duckicelake


DuckIceLake — ducks around an iceberg in a lake

DuckIceLake is an Iceberg REST Catalog proxy that sits in front of DuckLake — DuckDB’s SQL-native lakehouse format — and materialises DuckLake’s snapshot and schema state into Iceberg-spec manifests on demand. Standard Iceberg clients — PyIceberg, DuckDB’s iceberg extension, Trino, Spark — connect, read rows directly from S3, and write back via register-in-place commits that DuckLake atomically records.

The result: a lightweight, pixi-managed stack (no Docker required) with a real object store, real STS credential vending, OAuth2 + RBAC, Prometheus observability, and full Iceberg v3 support — including Puffin deletion vectors, row lineage, and the new primitive types.


What This Project Is

DuckLake is a compelling lakehouse format: Postgres as the catalog, Parquet on S3 as the data layer, DuckDB as the query engine. No Hive Metastore, no heavyweight infrastructure. But DuckLake speaks its own protocol — standard Iceberg tooling cannot connect to it without translation.

DuckIceLake bridges that gap. It exposes a complete Iceberg REST Catalog API surface in front of DuckLake, materialising manifests and metadata on demand and keeping the DuckLake HEAD == Iceberg current-snapshot-id identity invariant tight. The two extension paths — Iceberg REST and DuckLake direct — see exactly the same data. A row written through one appears in the other automatically.


Business Problems This Showcases
#

Interoperability without migration. Teams running DuckLake who want to connect PyIceberg, Trino, Spark, or any other Iceberg-native client without migrating away from DuckLake’s Postgres-backed catalog.

Iceberg v3 in production today. PyIceberg 0.11.1 still cannot write v3 manifests — the upstream fix stalled. DuckIceLake ships a monkey-patch shim that unblocks v3 writes end-to-end, so teams can adopt deletion vectors, row lineage, and new types now rather than waiting for upstream.

Governed, credential-scoped S3 access. The STS vending layer issues per-table MinIO credentials scoped to exactly the data and metadata paths a client needs — no shared root credentials passed to query engines.

Low-cost open lakehouse stack. Everything runs in a single pixi environment: Postgres, MinIO, the REST proxy, and the query clients. No JVM, no Hadoop, no cloud-managed catalog service. The same stack runs locally for development and on a small cloud VM in production.

Standard tooling, no lock-in. Because the REST surface is spec-complete, any conformant Iceberg client works without modification. Switching away from DuckLake as the backing store is an implementation detail behind the proxy — clients never need to change.


The Architecture at a Glance

  Iceberg REST client (PyIceberg, DuckDB iceberg ext, Trino, Spark, …)
              │  HTTP (Iceberg OpenAPI v3)
       FastAPI proxy (duckicelake.server)  ──▶ Prometheus /metrics
       │     │     │                       ──▶ /healthz /readyz
       │     │     │
       │     │     │  STS AssumeRole (per-table session policy)
       │     │     ▼
       │     │   MinIO STS  ──▶ vended creds (s3.access-key-id, …)
       │     │
       │     │  SQL via DuckDB + ducklake (write conn + read pool)
       │     ▼
       │  Postgres (psycopg pool)
       │     ├── ducklake_*       — schemas, tables, snapshots, stats, deletes
       │     └── duckicelake_*    — properties, tags, branches, partition sidecar
       │  S3 / MinIO (object I/O)
   data/<ns>/<tbl>/metadata/
        ├── vN.metadata.json              ── TableMetadata, versioned per commit
        ├── snap-<id>-<uuid>.avro         ── manifest list (one per snapshot)
        ├── <id>-<uuid>-m0-data.avro      ── data manifest
        └── <id>-<uuid>-m1-deletes.avro   ── delete / DV manifest

The proxy is a FastAPI app served by uvicorn. Its endpoints are sync functions, which FastAPI runs in a worker threadpool, so blocking I/O (Postgres, S3, DuckDB) does not pin the event loop. The serve-hi task boots 4 workers to approximate a production deployment.
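
The effect of moving blocking calls off the event loop can be illustrated with the standard library alone. This is a minimal sketch, not the proxy's code: `blocking_io` is a hypothetical stand-in for a Postgres, S3, or DuckDB call, and `asyncio.to_thread` plays the role of the worker threadpool.

```python
import asyncio
import time


def blocking_io(seconds: float) -> str:
    """Hypothetical stand-in for a blocking Postgres, S3, or DuckDB call."""
    time.sleep(seconds)
    return "done"


async def heartbeat(stop: asyncio.Event) -> int:
    """Count event-loop ticks; the count stalls if the loop is pinned."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple[str, int]:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    # Run the blocking call on a worker thread so the event loop
    # remains free to schedule `heartbeat` in the meantime.
    result = await asyncio.to_thread(blocking_io, 0.2)
    stop.set()
    return result, await beat


result, ticks = asyncio.run(main())
```

If `blocking_io` were called directly inside `main`, the heartbeat would not tick at all until the call returned.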

Everything runs out of a single pixi environment — no Docker, no JVM.


DuckIceLake architectural proof — two clients, one data layer

Real-terminal recording: same Parquet on S3, two extension paths (Iceberg REST and DuckLake direct) reading the same rows. A write via DuckLake direct appears in the Iceberg reader automatically. Ends with the snapshot-id identity check.


The DuckLake Bridge

The core challenge is translation: DuckLake tracks snapshots, files, schemas, and statistics in Postgres using its own schema. Iceberg clients expect versioned vN.metadata.json, manifest lists, and per-file Avro manifests on S3.

DuckIceLake’s materialiser (materialize.py) is lazy on read and eager on commit. On each LoadTable request it checks the in-process LRU cache, which maps (ns, table) → (snap_id, metadata). On a cache miss it reads DuckLake’s Postgres state, writes the Avro manifests and metadata JSON to S3, and populates the cache. Each commit materialises eagerly, so post-commit reads hit the cache immediately.
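
The cache behaviour can be sketched as a small LRU keyed by (namespace, table). Everything here is hypothetical and for illustration only: the `MetadataCache` class, the `materialise` callback, and the assumption that the caller already knows the current DuckLake HEAD snapshot id and uses it to detect stale entries.

```python
from collections import OrderedDict
from typing import Callable

Key = tuple[str, str]        # (namespace, table)
Entry = tuple[int, dict]     # (snapshot id, table metadata)


class MetadataCache:
    """Tiny LRU keyed by (namespace, table); a sketch, not materialize.py."""

    def __init__(self, max_entries: int = 1024) -> None:
        self._max = max_entries
        self._entries: OrderedDict = OrderedDict()

    def get(self, key: Key, head_snapshot_id: int,
            materialise: Callable[[Key, int], dict]) -> dict:
        cached = self._entries.get(key)
        if cached is not None and cached[0] == head_snapshot_id:
            self._entries.move_to_end(key)      # mark as recently used
            return cached[1]
        # Miss or stale snapshot: rebuild the metadata, then cache it.
        metadata = materialise(key, head_snapshot_id)
        self.put(key, head_snapshot_id, metadata)
        return metadata

    def put(self, key: Key, snapshot_id: int, metadata: dict) -> None:
        """Used on a miss, and eagerly right after a commit."""
        self._entries[key] = (snapshot_id, metadata)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)   # evict least recently used
```

A commit path would call `put` directly with the new snapshot id, which is what makes the next read a cache hit.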

The identity invariant is strict: DuckLake HEAD snapshot id == Iceberg current-snapshot-id. Snapshot ids are not random int64s; because the two sides share the same id, an Iceberg snapshot can be traced straight back to its DuckLake snapshot when debugging.

DuckLake HEAD == Iceberg current-snapshot-id == ducklake.snapshot-id property

This is verified at the end of the demo suite. Two clients — one via Iceberg REST, one via DuckLake direct — read from the same Parquet files on S3 and see the same rows.
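
A check of this invariant might look like the following sketch. The function name is hypothetical, `table_metadata` is assumed to be the parsed TableMetadata JSON, and `ducklake_head` is assumed to come from a separate query against the DuckLake catalog.

```python
def check_identity(ducklake_head: int, table_metadata: dict) -> None:
    """Raise if the three snapshot identifiers disagree (illustrative only)."""
    current = table_metadata.get("current-snapshot-id")
    raw = table_metadata.get("properties", {}).get("ducklake.snapshot-id")
    # Iceberg table properties are string-valued, so convert before comparing.
    prop = int(raw) if raw is not None else None
    if not (ducklake_head == current == prop):
        raise AssertionError(
            f"snapshot mismatch: ducklake={ducklake_head} "
            f"iceberg={current} property={prop}"
        )
```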


Full Iceberg Commit Surface

DuckIceLake implements the complete commit-table action set. Every Iceberg action translates to DuckLake SQL:

Action | Translation
add-snapshot (append / overwrite / delete) | ducklake_add_data_files() + snapshot tombstone
Position-delete file | INSERT INTO ducklake_delete_file; v3 tables rewrite to Puffin DV
Equality-delete file | Per-file scan + emit Iceberg position-deletes, sequence-number scoped
add-schema + set-current-schema | Diff by field-id → ALTER TABLE ADD/DROP COLUMN
add-partition-spec | ALTER TABLE … SET PARTITIONED BY
add-sort-order | INSERT into ducklake_sort_info + ducklake_sort_expression
set-properties / remove-properties | Sidecar duckicelake_table_property
set-snapshot-ref type=tag | Sidecar duckicelake_table_tag
upgrade-format-version to 2 or 3 | Sidecar property; manifests re-emit in matching Avro schema
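
As an illustration of the schema row, a diff by field-id can be reduced to set arithmetic on the ids. This is a minimal sketch under stated assumptions: the `diff_columns` helper and its input shape (field-id mapped to name and SQL type) are hypothetical, and it covers only the ADD and DROP cases named in the table.

```python
def diff_columns(table: str,
                 old: dict[int, tuple[str, str]],
                 new: dict[int, tuple[str, str]]) -> list[str]:
    """Each schema maps field-id -> (column name, SQL type)."""
    statements = []
    # Field ids that vanished from the new schema become DROP COLUMN.
    for field_id in sorted(old.keys() - new.keys()):
        statements.append(f'ALTER TABLE {table} DROP COLUMN "{old[field_id][0]}"')
    # Field ids that only exist in the new schema become ADD COLUMN.
    for field_id in sorted(new.keys() - old.keys()):
        name, sql_type = new[field_id]
        statements.append(f'ALTER TABLE {table} ADD COLUMN "{name}" {sql_type}')
    return statements
```

Keying on field-id rather than name is what lets a dropped column and a later column of the same name be told apart.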

Partition transforms are handled end-to-end: identity and bucket[N] pass through DuckLake’s stored values; year / month / day / hour are recomputed server-side; truncate[N] is synthesised via a custom transform since DuckLake has no native equivalent. Partition pruning is verified: with a PyIceberg filter pushed down, the scan reads fewer files.
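
For reference, the time-based and truncate transforms are simple arithmetic as defined in the Iceberg spec. The sketch below follows the spec's definitions and is not taken from the DuckIceLake source; the function names are illustrative and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def truncate(width: int, value):
    """Integers round down to a multiple of width; strings keep a prefix."""
    if isinstance(value, int):
        return value - (value % width)   # Python's % is non-negative here
    return value[:width]


def year(ts: datetime) -> int:
    return ts.year - 1970                # years since 1970


def month(ts: datetime) -> int:
    return (ts.year - 1970) * 12 + (ts.month - 1)   # months since 1970-01


def day(ts: datetime) -> int:
    return (ts - EPOCH) // timedelta(days=1)        # days since epoch, floored


def hour(ts: datetime) -> int:
    return (ts - EPOCH) // timedelta(hours=1)       # hours since epoch, floored
```

Floor division keeps pre-1970 values consistent: the last hour of 1969 maps to -1 for all four time transforms.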


Iceberg v3: The PyIceberg Shim

PyIceberg 0.11.1 refuses to write v3 tables, raising "Cannot write manifest list for table version: 3". The upstream fix (iceberg-python#3070) stalled in March 2026.

DuckIceLake ships pyiceberg_v3.py — a targeted monkey-patch that vendors the essentials:

  • ManifestWriterV3 / ManifestListWriterV3 subclasses
  • SUPPORTED_TABLE_FORMAT_VERSION bumped to 3 in both pyiceberg.table.metadata and pyiceberg.table.update
  • DataFile.from_args rewired to resolve defaults dynamically (V2-shape records into V3 writers caused IndexError)
  • Client-side gates patched in Transaction.upgrade_table_version + _apply_table_update
  • v3 primitive types (variant, geometry, geography) added to PyIceberg’s pydantic validator

One call before any RestCatalog operation:

from duckicelake.pyiceberg_v3 import install
install()

V3 writes — including deletion vectors, row lineage, and new primitive types — work end-to-end through the patched client and the proxy’s matching v3 Avro emit paths.


Puffin Deletion Vectors

For format-version 3 tables, position-delete Parquet files are rewritten into a single Puffin file per snapshot, containing one deletion-vector-v1 blob per affected data file:

  • Roaring64 portable serialisation, Iceberg-spec compatible
  • Magic D1 D3 39 64, big-endian length + CRC-32 framing per spec
  • Manifest entry carries file_format=puffin, content_offset, content_size_in_bytes, referenced_data_file, and record_count (= cardinality)
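
The framing in the second bullet can be sketched with the standard library. This follows the Iceberg spec's description of the deletion-vector-v1 blob (length covers magic plus vector, and so does the checksum) and is not taken from the DuckIceLake source. The serialised Roaring bitmap is treated as opaque bytes, and both function names are illustrative.

```python
import struct
import zlib

DV_MAGIC = bytes([0xD1, 0xD3, 0x39, 0x64])


def frame_deletion_vector(vector: bytes) -> bytes:
    """Wrap an already-serialised bitmap as: length | magic | vector | crc."""
    body = DV_MAGIC + vector
    length = struct.pack(">I", len(body))       # big-endian, magic + vector
    crc = struct.pack(">I", zlib.crc32(body))   # big-endian, magic + vector
    return length + body + crc


def unframe_deletion_vector(blob: bytes) -> bytes:
    """Validate the framing and return the serialised bitmap bytes."""
    (length,) = struct.unpack_from(">I", blob, 0)
    body = blob[4:4 + length]
    (crc,) = struct.unpack_from(">I", blob, 4 + length)
    if body[:4] != DV_MAGIC:
        raise ValueError("bad deletion-vector magic")
    if zlib.crc32(body) != crc:
        raise ValueError("deletion-vector checksum mismatch")
    return body[4:]
```

The manifest entry's content_offset and content_size_in_bytes would then point at this framed blob inside the Puffin file.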

V2 tables keep the legacy Parquet position-delete shape — readers that only understand v2 still work.


Credential Vending

Sending the header X-Iceberg-Access-Delegation: vended-credentials triggers a real MinIO AssumeRole call with a session policy scoped to the table’s data-file keys and its metadata/* prefix. The LoadTable response then returns s3.access-key-id, s3.secret-access-key, s3.session-token, and s3.credentials-expiration in its config map.

Query engines get short-lived, table-scoped credentials — no shared root keys distributed to clients.
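
A per-table session policy of this kind is an ordinary IAM policy document. The sketch below shows one plausible shape; the `session_policy` helper, the choice of actions, and the example bucket and keys are assumptions, not the proxy's actual policy.

```python
import json


def session_policy(bucket: str, table_prefix: str, data_keys: list[str]) -> str:
    """Build a policy limited to listed data files plus <prefix>/metadata/*."""
    prefix = table_prefix.strip("/")
    resources = [f"arn:aws:s3:::{bucket}/{key.lstrip('/')}" for key in data_keys]
    resources.append(f"arn:aws:s3:::{bucket}/{prefix}/metadata/*")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Assumed action set; a read-only client would only need GetObject.
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": resources,
        }],
    }
    return json.dumps(policy)
```

The resulting JSON string is what an STS AssumeRole request accepts as its session policy, so the vended credentials can never reach beyond the listed resources.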


OAuth2 + RBAC

POST /v1/oauth/tokens issues HMAC-signed JWTs. Middleware enforces Authorization: Bearer on every /v1/* route. Scope grammar: ns:<name>:<cap> (per-namespace) or * (superuser), where cap ∈ {r, w, rw, *}.

DUCKICELAKE_OAUTH_CLIENTS="id:secret|ns:analytics:rw,admin:secret|*"
DUCKICELAKE_REQUIRE_AUTH=1   # fail boot if no clients configured
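
The scope grammar is small enough to check in a few lines. This is a sketch of one way to evaluate it, assuming a token carries a list of scope strings and that namespace names contain no colons; the `allows` function is hypothetical.

```python
def allows(scopes: list[str], namespace: str, need: str) -> bool:
    """Return True if any scope grants `need` ('r' or 'w') on `namespace`."""
    for scope in scopes:
        if scope == "*":                      # superuser
            return True
        parts = scope.split(":")
        if len(parts) != 3 or parts[0] != "ns":
            continue                          # ignore malformed scopes
        _, name, cap = parts
        if name == namespace and (cap in ("*", "rw") or cap == need):
            return True
    return False
```

With the example configuration above, the `id` client could read and write the analytics namespace but nothing else, while `admin` passes every check.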

PyIceberg consumes these via the catalog property credential="id:secret"; DuckDB takes a token via CREATE SECRET (TYPE ICEBERG, TOKEN '<token>').
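
On the PyIceberg side, the client credential and the delegation header are both plain catalog properties. The following is a minimal sketch: the helper names, the catalog name, and the localhost URI are assumptions, while `load_catalog` and the `header.` property prefix are standard PyIceberg.

```python
def rest_catalog_properties(uri: str, credential: str,
                            vended: bool = True) -> dict[str, str]:
    """Assemble PyIceberg REST catalog properties for an OAuth2 client."""
    props = {"type": "rest", "uri": uri, "credential": credential}
    if vended:
        # Ask the catalog for table-scoped S3 credentials on LoadTable.
        props["header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    return props


def connect(name: str, props: dict[str, str]):
    """Open the catalog; imported lazily so the sketch loads without PyIceberg."""
    from pyiceberg.catalog import load_catalog
    return load_catalog(name, **props)


# Example values only; "id:secret" mirrors the placeholder used above.
props = rest_catalog_properties("http://localhost:8181", "id:secret")
```

Calling `connect("duckicelake", props)` against a running proxy would exchange the credential for a token at the OAuth endpoint before the first catalog request.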


Performance and Observability

  • In-process LRU metadata cache (DUCKICELAKE_CACHE_MAX, default 1024) — cache-hit LoadTable measured at ~349 req/s at concurrency 32
  • Postgres ConnectionPool via psycopg-pool — most LoadTable work hits PG directly, bypassing the DuckDB write-conn lock
  • DuckDB read pool for parallel equality-delete scans
  • Per-snapshot S3 writes parallelised via thread pool; head_object before put_object skips re-uploads of byte-identical content
  • Single Postgres transaction per commit via contextvars-driven shared cursor
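
The head-before-put optimisation in the list above can be sketched as follows. How the proxy decides that content is byte-identical is not stated, so this version assumes a size and ETag comparison, which only holds where the ETag is the MD5 of the body (single-part, unencrypted uploads). `s3` is any boto3-style client and `put_if_changed` is a hypothetical name.

```python
import hashlib


def put_if_changed(s3, bucket: str, key: str, body: bytes) -> bool:
    """Upload `body` unless an identical object exists; True if uploaded."""
    digest = hashlib.md5(body).hexdigest()
    try:
        head = s3.head_object(Bucket=bucket, Key=key)
    except Exception as exc:
        # botocore's ClientError exposes the error code on `.response`.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code not in ("404", "NoSuchKey", "NotFound"):
            raise                    # a real failure, not a missing object
    else:
        same_size = head.get("ContentLength") == len(body)
        same_hash = head.get("ETag", "").strip('"') == digest
        if same_size and same_hash:
            return False             # identical content already stored
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return True
```

Because materialised manifests are deterministic for a given snapshot, repeated materialisation of the same snapshot would mostly take the early-return path.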

Prometheus exposition at /metrics: per-endpoint latency histograms, request counts by status class, cache hit/miss counters, PG pool state. /healthz and /readyz for liveness and readiness probes.


The lakesh Companion

lakesh is a small DuckDB-powered SQL shell for Iceberg REST catalogs and direct DuckLake connections. It offers profile-based connection management, an interactive REPL with psql-style meta-commands, a one-shot exec mode for scripts, and an MCP server so LLM agents can query your catalogs through the same plumbing.

It pairs naturally with duckicelake — point it at the proxy and get a familiar SQL shell over your Iceberg tables.

lakesh companion demo — SQL shell over the Iceberg REST catalog

Getting Started

git clone https://github.com/KellerKev/duckicelake.git
cd duckicelake
pixi install
pixi run backends-up     # Postgres + MinIO
pixi run ducklake-init   # creates bucket + default namespace
pixi run serve           # Iceberg REST catalog on :8181

In another terminal:

pixi run smoke           # catalog-only smoke
pixi run duckdb-client   # full demo — 20+ assertion blocks across all features
pixi run test            # pytest integration suite (19 tests)

Teardown: pixi run backends-down.

No Docker. No JVM. One pixi install and it runs.


What’s Left

The Iceberg spec surface is effectively complete. Remaining gaps are architectural (DuckLake-blocked: true divergent branches, per-table set-location, real KMS encryption), upstream (Spark v3 writes, DuckDB iceberg-ext v3 features), and production-readiness ops (HA backends, TLS, distributed tracing, shipped Grafana dashboards, Spark/Trino integration tests).

See MISSING.md for the full punch list.


Companion SQL shell: github.com/KellerKev/lakesh

Author: Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.