DuckIceLake: An Iceberg v3 REST Catalog Proxy on Top of DuckLake

A full Iceberg v2/v3 REST Catalog proxy on top of DuckLake, so standard Iceberg clients can talk directly to a DuckLake-backed lakehouse without modification.

DuckIceLake is open source: github.com/KellerKev/duckicelake

DuckIceLake is an Iceberg REST Catalog proxy that sits in front of DuckLake — DuckDB’s SQL-native lakehouse format — and materialises DuckLake’s snapshot and schema state into Iceberg-spec manifests on demand. Standard Iceberg clients — PyIceberg, DuckDB’s iceberg extension, Trino, Spark — connect, read rows directly from S3, and write back via register-in-place commits that DuckLake atomically records.

The result: a lightweight, pixi-managed FastAPI stack with a real object store, real STS credential vending, OAuth2 + RBAC, Prometheus observability, and full Iceberg v3 support — including Puffin deletion vectors, row lineage, and the new primitive types.

What This Project Is

DuckLake is a compelling lakehouse format: Postgres as the catalog, Parquet on S3 as the data layer, DuckDB as the query engine. DuckIceLake extends it with a full Iceberg REST Catalog API surface, materialising DuckLake’s snapshot and schema state into spec-compliant manifests so any Iceberg client connects without modification.

The two extension paths, Iceberg REST and DuckLake direct, see exactly the same data. A row written through one appears in the other automatically, and the invariant DuckLake HEAD == Iceberg current-snapshot-id holds throughout.

Business Problems This Showcases
#

Interoperability without migration. Teams running DuckLake can connect PyIceberg, Trino, Spark, or any other Iceberg-native client without migrating away from DuckLake’s Postgres-backed catalog.

Iceberg v3 in production today. DuckIceLake ships a monkey-patch shim (pyiceberg_v3.py) that enables full v3 writes end-to-end — deletion vectors, row lineage, and the new primitive types — through both the client and the proxy’s matching Avro emit paths.

Governed, credential-scoped S3 access. The STS vending layer issues per-table MinIO credentials scoped to exactly the data and metadata paths a client needs — no shared root credentials passed to query engines.

Low-cost open lakehouse stack. Everything runs from a single pixi environment — Postgres, MinIO, the REST proxy, and the query clients. The same stack runs locally for development and on a small cloud VM in production.

Standard tooling, no lock-in. Because the REST surface is spec-complete, any conformant Iceberg client works without modification. Switching away from DuckLake as the backing store is an implementation detail behind the proxy — clients never need to change.

The Architecture at a Glance

  Iceberg REST client (PyIceberg, DuckDB iceberg ext, Trino, Spark, …)
              │  HTTP (Iceberg OpenAPI v3)
              ▼
       FastAPI proxy (duckicelake.server)  ──▶ Prometheus /metrics
       │     │     │                       ──▶ /healthz /readyz
       │     │     │
       │     │     │  STS AssumeRole (per-table session policy)
       │     │     ▼
       │     │   MinIO STS  ──▶ vended creds (s3.access-key-id, …)
       │     │
       │     │  SQL via DuckDB + ducklake (write conn + read pool)
       │     ▼
       │  Postgres (psycopg pool)
       │     ├── ducklake_*       — schemas, tables, snapshots, stats, deletes
       │     └── duckicelake_*    — properties, tags, branches, partition sidecar
       │
       │  S3 / MinIO (object I/O)
       ▼
   data/<ns>/<tbl>/metadata/
        ├── vN.metadata.json              ── TableMetadata, versioned per commit
        ├── snap-<id>-<uuid>.avro         ── manifest list (one per snapshot)
        ├── <id>-<uuid>-m0-data.avro      ── data manifest
        └── <id>-<uuid>-m1-deletes.avro   ── delete / DV manifest

The proxy is a FastAPI app served by uvicorn. Its endpoints are synchronous, so FastAPI runs them in a worker threadpool and blocking I/O (Postgres, S3, DuckDB) does not pin the event loop. The serve-hi task starts 4 workers to approximate a production deployment.
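
To illustrate why offloading blocking calls keeps the event loop responsive, here is a minimal standard-library sketch. It does not use FastAPI itself: `asyncio.to_thread` stands in for the threadpool dispatch, and `blocking_query`, `heartbeat`, and `handle_request` are hypothetical names chosen for this example.

```python
import asyncio
import time


def blocking_query() -> str:
    """Stand-in for a blocking Postgres / S3 / DuckDB call."""
    time.sleep(0.2)
    return "rows"


async def heartbeat(ticks: list) -> None:
    """Record a tick every 10 ms for as long as the event loop is free to run."""
    while True:
        ticks.append(1)
        await asyncio.sleep(0.01)


async def handle_request() -> tuple:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so the loop keeps ticking.
    result = await asyncio.to_thread(blocking_query)
    beat.cancel()
    return result, len(ticks)


result, tick_count = asyncio.run(handle_request())
print(result, tick_count)
```

If `blocking_query` were called directly inside the coroutine, the heartbeat would record almost no ticks during the 200 ms sleep.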

Everything runs from a single pixi environment.

DuckIceLake architectural proof — two clients, one data layer

Real-terminal recording: same Parquet on S3, two extension paths (Iceberg REST and DuckLake direct) reading the same rows. A write via DuckLake direct appears in the Iceberg reader automatically.

The DuckLake Bridge
#

The core challenge is translation: DuckLake tracks snapshots, files, schemas, and statistics in Postgres using its own schema. Iceberg clients expect versioned vN.metadata.json, manifest lists, and per-file Avro manifests on S3.

DuckIceLake’s materialiser (materialize.py) translates DuckLake’s Postgres catalog state into the full Iceberg metadata chain on S3: versioned vN.metadata.json, manifest list, data manifest, and delete manifest when needed. The snapshot id is deterministic — DuckLake HEAD == Iceberg current-snapshot-id — so operations are debuggable and the two extension paths are always in sync.
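
As a rough illustration of the identity invariant, the sketch below maps a DuckLake snapshot record onto an Iceberg snapshot entry and derives `current-snapshot-id` from it. The Iceberg keys (`snapshot-id`, `sequence-number`, `timestamp-ms`, `manifest-list`, `summary`) follow the table spec; the input record's field names, the example path, and both function names are assumptions made for this example, not the actual code in materialize.py.

```python
def iceberg_snapshot_entry(ducklake_snapshot: dict, manifest_list: str) -> dict:
    """Map a (hypothetical) DuckLake snapshot record to an Iceberg snapshot entry."""
    return {
        # Reusing the DuckLake id is what keeps the two views aligned.
        "snapshot-id": ducklake_snapshot["snapshot_id"],
        "sequence-number": ducklake_snapshot["sequence_number"],
        "timestamp-ms": ducklake_snapshot["committed_ms"],
        "manifest-list": manifest_list,
        "summary": {"operation": ducklake_snapshot["operation"]},
    }


def table_metadata_fragment(entries: list) -> dict:
    """Only the snapshot-related part of TableMetadata; other fields are omitted."""
    return {
        "snapshots": entries,
        "current-snapshot-id": entries[-1]["snapshot-id"] if entries else None,
    }


head = {"snapshot_id": 42, "sequence_number": 7,
        "committed_ms": 1_700_000_000_000, "operation": "append"}
entry = iceberg_snapshot_entry(head, "s3://lake/data/ns/tbl/metadata/snap-42-example.avro")
fragment = table_metadata_fragment([entry])
print(fragment["current-snapshot-id"] == head["snapshot_id"])
```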

Eager materialisation via Postgres NOTIFY

DuckDB clients can also write directly via the ducklake: extension (INSERT / UPDATE / DELETE) without going through the REST proxy. These commits land in ducklake_snapshot immediately. To keep the Iceberg layer current without waiting for the next LoadTable call, the proxy runs an eager materialisation path:

1. An AFTER INSERT trigger on ducklake_snapshot fires NOTIFY duckicelake_snapshot with the new snapshot id.
2. src/duckicelake/notify.py runs an async LISTEN loop. One worker per fleet is elected via pg_try_advisory_lock; the others poll and take over if the elected worker dies.
3. Each notification resolves the snapshot id to the touched (schema, table) pairs and runs materialize_all() per table. The full S3 metadata chain appears within ~1s, with no LoadTable call required.
4. A duckicelake_materialisation_log sidecar tracks done/failed/pending for idempotency and provides an operator-visible audit trail. A startup catch-up scan covers any snapshots that landed during a listener outage.
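
The idempotency and catch-up behaviour described above can be modelled in a few lines. This is an in-memory sketch under stated assumptions: the real log lives in Postgres, and the class name, method names, and status handling here are hypothetical, chosen only to show how duplicate notifications and missed snapshots might be handled.

```python
from enum import Enum
from typing import Callable


class Status(str, Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


class MaterialisationLog:
    """In-memory stand-in for the duckicelake_materialisation_log sidecar."""

    def __init__(self) -> None:
        self.status: dict = {}

    def handle(self, snapshot_id: int, materialise: Callable[[int], None]) -> bool:
        """Process one notification; return True only if work was performed."""
        if self.status.get(snapshot_id) is Status.DONE:
            return False  # duplicate NOTIFY or replay: nothing to do
        self.status[snapshot_id] = Status.PENDING
        try:
            materialise(snapshot_id)
        except Exception:
            self.status[snapshot_id] = Status.FAILED  # retried by the next scan
            return False
        self.status[snapshot_id] = Status.DONE
        return True

    def catch_up(self, all_snapshot_ids: list, materialise: Callable[[int], None]) -> list:
        """Startup scan: process every snapshot that is not yet marked done."""
        return [s for s in sorted(all_snapshot_ids) if self.handle(s, materialise)]


log = MaterialisationLog()
seen: list = []
log.handle(7, seen.append)
log.handle(7, seen.append)                      # duplicate notification is ignored
recovered = log.catch_up([6, 7, 8], seen.append)
print(seen, recovered)
```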

The lazy LoadTable path remains the correctness floor. Set DUCKICELAKE_DISABLE_NOTIFY=1 to turn off the eager path — readers still get correct data on demand. PostgreSQL is required for this feature (the LISTEN/NOTIFY machinery and pg_try_advisory_lock are Postgres-specific).

Full Iceberg Commit Surface

DuckIceLake implements the complete commit-table action set. Each Iceberg action translates to DuckLake SQL or to a write against a duckicelake_* sidecar table:

| Action | Translation |
| --- | --- |
| `add-snapshot` (append / overwrite / delete) | `ducklake_add_data_files()` + snapshot tombstone |
| Position-delete file | `INSERT INTO ducklake_delete_file`; v3 tables rewrite to Puffin DV |
| Equality-delete file | Per-file scan + emit Iceberg position-deletes, sequence-number scoped |
| `add-schema` + `set-current-schema` | Diff by field-id → `ALTER TABLE ADD/DROP COLUMN` |
| `add-partition-spec` | `ALTER TABLE … SET PARTITIONED BY` |
| `add-sort-order` | INSERT into `ducklake_sort_info` + `ducklake_sort_expression` |
| `set-properties` / `remove-properties` | Sidecar `duckicelake_table_property` |
| `set-snapshot-ref type=tag` | Sidecar `duckicelake_table_tag` |
| `upgrade-format-version` to 2 or 3 | Sidecar property; manifests re-emit in matching Avro schema |
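
To make the "diff by field-id" row concrete, here is a minimal sketch of how two Iceberg schemas could be compared and turned into ALTER TABLE statements. The field dictionaries follow the Iceberg schema JSON shape (`id`, `name`, `type`); the type map, the function name, and the generated SQL text are assumptions for illustration and cover only additions and removals, not renames or type promotions.

```python
# Hypothetical mapping from a few Iceberg primitive types to DuckDB types.
TYPE_MAP = {"int": "INTEGER", "long": "BIGINT", "string": "VARCHAR",
            "double": "DOUBLE", "boolean": "BOOLEAN"}


def schema_diff_sql(table: str, old_fields: list, new_fields: list) -> list:
    """Compare schemas by field id and emit DROP / ADD COLUMN statements."""
    old = {f["id"]: f for f in old_fields}
    new = {f["id"]: f for f in new_fields}
    statements = []
    # Ids present only in the old schema were dropped.
    for field_id in sorted(old.keys() - new.keys()):
        statements.append(f'ALTER TABLE {table} DROP COLUMN "{old[field_id]["name"]}"')
    # Ids present only in the new schema were added.
    for field_id in sorted(new.keys() - old.keys()):
        field = new[field_id]
        statements.append(
            f'ALTER TABLE {table} ADD COLUMN "{field["name"]}" {TYPE_MAP[field["type"]]}'
        )
    return statements


old_schema = [{"id": 1, "name": "id", "type": "long"},
              {"id": 2, "name": "note", "type": "string"}]
new_schema = [{"id": 1, "name": "id", "type": "long"},
              {"id": 3, "name": "score", "type": "double"}]
for stmt in schema_diff_sql("events", old_schema, new_schema):
    print(stmt)
```

Keying on the field id rather than the column name is what lets a drop followed by a re-add of the same name be treated as two distinct columns.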

Partition transforms are handled correctly end-to-end: identity and bucket[N] pass through DuckLake’s stored values; year / month / day / hour are recomputed server-side; truncate[N] is synthesised via a custom transform since DuckLake has no native equivalent. Partition pruning is verified — PyIceberg pushdown reduces file reads correctly.
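
For reference, the temporal and truncate transforms mentioned above can be written down directly from the Iceberg spec definitions. This is a sketch of the spec arithmetic, not the proxy's server-side code; the function names are chosen for this example.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_TS = datetime(1970, 1, 1, tzinfo=timezone.utc)


def year(d: date) -> int:
    """Years since 1970."""
    return d.year - 1970


def month(d: date) -> int:
    """Months since 1970-01."""
    return (d.year - 1970) * 12 + (d.month - 1)


def day(d: date) -> int:
    """Days since 1970-01-01."""
    return (d - EPOCH_DATE).days


def hour(ts: datetime) -> int:
    """Hours since the epoch; floor division also handles pre-epoch values."""
    return (ts - EPOCH_TS) // timedelta(hours=1)


def truncate_int(width: int, value: int) -> int:
    """Round down to a multiple of width (Python's % is non-negative here)."""
    return value - (value % width)


def truncate_str(width: int, value: str) -> str:
    """Keep the first `width` code points."""
    return value[:width]


d = date(2017, 11, 16)
print(year(d), month(d), day(d))
print(hour(datetime(2017, 11, 16, 22, tzinfo=timezone.utc)))
print(truncate_int(10, -1), truncate_str(3, "iceberg"))
```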

Iceberg v3: The PyIceberg Shim

DuckIceLake ships pyiceberg_v3.py — a targeted monkey-patch that enables full Iceberg v3 writes from PyIceberg today. It vendors the essentials:

- ManifestWriterV3 / ManifestListWriterV3 subclasses
- SUPPORTED_TABLE_FORMAT_VERSION bumped to 3 in both pyiceberg.table.metadata and pyiceberg.table.update
- DataFile.from_args rewired to resolve defaults dynamically (feeding V2-shape records into V3 writers raised IndexError)
- Client-side gates patched in Transaction.upgrade_table_version + _apply_table_update
- v3 primitive types (variant, geometry, geography) added to PyIceberg’s pydantic validator

One call before any RestCatalog operation:

from duckicelake.pyiceberg_v3 import install
install()

V3 writes — including deletion vectors, row lineage, and new primitive types — work end-to-end through the patched client and the proxy’s matching v3 Avro emit paths.

Puffin Deletion Vectors

For format-version 3 tables, position-delete Parquet files are rewritten into a single Puffin file per snapshot containing one deletion-vector-v1 blob per affected data file:

- Roaring64 portable serialisation, Iceberg-spec compatible
- Magic D1 D3 39 64, big-endian length + CRC-32 framing per spec
- Manifest entry carries file_format=puffin, content_offset, content_size_in_bytes, referenced_data_file, and record_count (= cardinality)
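
The framing around each deletion-vector blob can be sketched with the standard library. As I read the Puffin spec, the layout is a 4-byte big-endian length covering magic plus vector, the magic bytes, the serialised vector, and a 4-byte big-endian CRC-32 over magic plus vector. The bitmap payload is treated as opaque bytes here (no Roaring serialisation is attempted), and the function names are hypothetical.

```python
import struct
import zlib

DV_MAGIC = bytes([0xD1, 0xD3, 0x39, 0x64])


def frame_dv_blob(vector: bytes) -> bytes:
    """Wrap an already-serialised bitmap in length + magic + CRC-32 framing."""
    body = DV_MAGIC + vector
    return struct.pack(">I", len(body)) + body + struct.pack(">I", zlib.crc32(body))


def parse_dv_blob(blob: bytes) -> bytes:
    """Validate the framing and return the serialised bitmap."""
    (length,) = struct.unpack(">I", blob[:4])
    body = blob[4:4 + length]
    (crc,) = struct.unpack(">I", blob[4 + length:8 + length])
    if body[:4] != DV_MAGIC:
        raise ValueError("bad magic")
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    return body[4:]


payload = b"opaque-roaring-bytes"  # placeholder, not a real Roaring64 bitmap
blob = frame_dv_blob(payload)
print(len(blob), parse_dv_blob(blob) == payload)
```

In a manifest entry, content_offset and content_size_in_bytes would then point at this framed blob inside the Puffin file.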

V2 tables keep the legacy Parquet position-delete shape — readers that only understand v2 still work.

Credential Vending

Sending the header X-Iceberg-Access-Delegation: vended-credentials triggers a real MinIO AssumeRole call with a session policy scoped to the table’s data-file keys and its metadata/* prefix. The LoadTable response returns s3.access-key-id / s3.secret-access-key / s3.session-token / s3.credentials-expiration in the config map.

Query engines get short-lived, table-scoped credentials — no shared root keys distributed to clients.
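
A per-table session policy of the kind described above might look like the following. The document format is standard IAM policy JSON, which MinIO also accepts; the bucket name, the choice of actions, the `writable` flag, and the function name are assumptions for illustration rather than the proxy's actual policy.

```python
import json


def session_policy(bucket: str, table_prefix: str, data_keys: list,
                   writable: bool = False) -> str:
    """Build an IAM-style policy limited to one table's data files and metadata."""
    actions = ["s3:GetObject"] + (["s3:PutObject"] if writable else [])
    # One resource per data file, plus the table's metadata/* prefix.
    resources = [f"arn:aws:s3:::{bucket}/{key}" for key in data_keys]
    resources.append(f"arn:aws:s3:::{bucket}/{table_prefix}/metadata/*")
    policy = {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resources}],
    }
    return json.dumps(policy)


policy_json = session_policy(
    bucket="lake",                       # hypothetical bucket name
    table_prefix="data/analytics/events",
    data_keys=["data/analytics/events/part-0001.parquet"],
)
print(policy_json)
```

The resulting string is what would be passed as the session policy of an STS AssumeRole request, so the returned credentials can do no more than the policy allows.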

OAuth2 + RBAC

POST /v1/oauth/tokens issues HMAC-signed JWTs. Middleware enforces Authorization: Bearer on every /v1/* route. Scope grammar: ns:<name>:<cap> (per-namespace) or * (superuser), where cap ∈ {r, w, rw, *}.

DUCKICELAKE_OAUTH_CLIENTS="id:secret|ns:analytics:rw,admin:secret|*"
DUCKICELAKE_REQUIRE_AUTH=1   # fail boot if no clients configured

PyIceberg authenticates with credential="id:secret"; DuckDB uses CREATE SECRET (TYPE ICEBERG, TOKEN '<token>').
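
The scope grammar is small enough to check in a few lines. The sketch below is one possible reading of it, assuming a token carries a list of scope strings and that namespace names contain no colons; the function name and the 'r'/'w' capability argument are hypothetical.

```python
VALID_CAPS = {"r", "w", "rw", "*"}


def allows(scopes: list, namespace: str, need: str) -> bool:
    """Return True if any scope grants `need` ('r' or 'w') on `namespace`."""
    for scope in scopes:
        if scope == "*":
            return True  # superuser
        parts = scope.split(":")
        if len(parts) != 3 or parts[0] != "ns":
            continue  # not of the form ns:<name>:<cap>
        _, name, cap = parts
        if name != namespace or cap not in VALID_CAPS:
            continue
        if cap == "*" or need in cap:
            return True
    return False


print(allows(["ns:analytics:rw"], "analytics", "w"))
print(allows(["ns:analytics:r"], "analytics", "w"))
print(allows(["*"], "finance", "w"))
```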

Performance and Observability

- In-process LRU metadata cache (DUCKICELAKE_CACHE_MAX, default 1024): cache-hit LoadTable measured at ~349 req/s at concurrency 32
- Postgres ConnectionPool via psycopg-pool: most LoadTable work hits PG directly, bypassing the DuckDB write-conn lock
- DuckDB read pool for parallel equality-delete scans
- Per-snapshot S3 writes parallelised via thread pool; head_object before put_object skips re-uploads of byte-identical content
- Single Postgres transaction per commit via contextvars-driven shared cursor
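
A bounded LRU cache with hit and miss counters, of the kind the first item describes, can be sketched as follows. The class name, the cache key shape (namespace, table, snapshot id), and the loader callback are assumptions for this example; the lock is there because sync endpoints run on multiple threads.

```python
import threading
from collections import OrderedDict
from typing import Callable, Hashable


class MetadataCache:
    """Small thread-safe LRU cache that counts hits and misses."""

    def __init__(self, max_entries: int = 1024) -> None:
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0
        self._entries: OrderedDict = OrderedDict()
        self._lock = threading.Lock()

    def get_or_load(self, key: Hashable, loader: Callable[[], object]) -> object:
        with self._lock:
            if key in self._entries:
                self.hits += 1
                self._entries.move_to_end(key)  # mark as most recently used
                return self._entries[key]
            self.misses += 1
            value = loader()  # simplification: loads while holding the lock
            self._entries[key] = value
            if len(self._entries) > self.max_entries:
                self._entries.popitem(last=False)  # evict least recently used
            return value


cache = MetadataCache(max_entries=2)
cache.get_or_load(("analytics", "events", 41), lambda: "metadata-41")
cache.get_or_load(("analytics", "events", 41), lambda: "metadata-41")
print(cache.hits, cache.misses)
```

Including the snapshot id in the key means a new commit naturally misses the cache instead of needing explicit invalidation; whether the project keys its cache this way is not stated in the article.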

Prometheus exposition at /metrics: per-endpoint latency histograms, request counts by status class, cache hit/miss counters, PG pool state. /healthz and /readyz for liveness and readiness probes.

The lakesh Companion

lakesh is a small DuckDB-powered SQL shell for Iceberg REST catalogs and DuckLake direct. It offers profile-based connection management, an interactive REPL with psql-style meta-commands, a one-shot exec mode for scripts, and an MCP server so LLM agents can query your catalogs through the same plumbing.

It pairs naturally with DuckIceLake: point it at the proxy and get a familiar SQL shell over your Iceberg tables.

lakesh companion demo — SQL shell over the Iceberg REST catalog

Getting Started

The entire stack — Postgres, MinIO, the proxy, and all Python dependencies — is managed by a single pixi.toml. pixi install sets everything up; the task runner handles the rest. Local and production environments are identical.

git clone https://github.com/KellerKev/duckicelake.git
cd duckicelake
pixi install
pixi run backends-up     # Postgres + MinIO
pixi run ducklake-init   # creates bucket + default namespace
pixi run serve           # Iceberg REST catalog on :8181

In another terminal:

pixi run smoke           # catalog-only smoke
pixi run duckdb-client   # full demo — 20+ assertion blocks across all features
pixi run test            # pytest integration suite (22 tests)
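
With the proxy running on :8181, a PyIceberg client could be pointed at it roughly as follows. `load_catalog` and the `type`, `uri`, and `credential` properties are part of PyIceberg's REST catalog configuration; the catalog name, the helper functions, and the example credential are assumptions for this sketch. The import is deferred so the helper that builds the properties works without PyIceberg installed.

```python
def catalog_properties(uri: str = "http://localhost:8181",
                       credential: str = "") -> dict:
    """Build the REST catalog properties for the local proxy."""
    props = {"type": "rest", "uri": uri}
    if credential:
        props["credential"] = credential  # "id:secret", as configured for OAuth2
    return props


def connect(name: str = "duckicelake", **overrides: str):
    """Open the catalog; needs pyiceberg installed and the proxy running."""
    from pyiceberg.catalog import load_catalog

    # For v3 writes, the article's install() shim would be called before this.
    return load_catalog(name, **{**catalog_properties(), **overrides})


print(catalog_properties(credential="id:secret"))
```

Calling `connect(credential="id:secret").list_namespaces()` against a running stack should then list the default namespace created by ducklake-init.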

Teardown: pixi run backends-down.

If you want to take the full DuckLake stack further — row-level access control, S3 bucket policies, and a governed deployment on Hetzner for under 10 euros a month — the DuckLake on Hetzner with Pixi article covers the complete pixi-managed setup.

What’s Left

The Iceberg spec surface is effectively complete. Remaining gaps are architectural (DuckLake-blocked: true divergent branches, per-table set-location, real KMS encryption), upstream (Spark v3 writes, DuckDB iceberg-ext v3 features), and production-readiness ops (HA backends, TLS, distributed tracing, shipped Grafana dashboards, Spark/Trino integration tests).

See MISSING.md for the full punch list.

Companion SQL shell: github.com/KellerKev/lakesh

Author

Kevin Keller

Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.
