Skip to main content
  1. Posts/

Python Camouflage: Format-Preserving Encryption for Snowflake Using FF3-1

Code: github.com/sfc-gh-kkeller/python_camouflage — FF3-1 Python UDFs, tag-based masking policies, install scripts, sample data.

This is an MVP and educational proof of technology — a framework you can embrace and extend. For production-grade tokenization at scale, consider Snowflake’s external tokenization partners. Python Camouflage can complement those solutions for less critical datasets or serve as a starting point for building your own.


The Problem: Masking Destroys Information
#

Every organisation with PII in Snowflake faces the same tension: compliance says mask it, analytics says we need to join on it.

Traditional masking is destructive. Take a phone number like (888) 235-8756 and mask it to (***) ***-8756. Now try to join two tables on that column — you get false matches wherever the last four digits happen to collide. The larger the dataset, the worse this gets. Masking destroys the information value of the data.

Tokenization solves this. A tokenized phone number is not the real number, but it is consistent — the same input always produces the same token. You can join across tables, deduplicate, group by, and filter on tokenized values. The information structure of the data is preserved, but the actual values are not readable.

Format-preserving encryption (FPE) goes one step further. The token looks like the original data. A tokenized phone number is still a valid-looking phone number. A tokenized email still has an @ and a .com. A tokenized integer has the same number of digits. This means your downstream systems — dashboards, BI tools, data quality checks, ML pipelines — do not break because the data format changed.

Python Camouflage implements FPE inside Snowflake using the FF3-1 algorithm (based on AES-256), delivered as Python UDFs with tag-based masking policies. Your key, your control, no external service required.


How It Works
#

The FF3-1 Algorithm
#

FF3-1 is a NIST-recommended format-preserving encryption standard. It takes a plaintext value, an AES-256 key, and a 56-bit tweak, and produces a ciphertext of the same length and format as the input. Encryption is deterministic with the same key and tweak — the same input always produces the same output, which is what makes joins work.

The Python implementation uses the mysto/python-fpe library with PyCryptodome for AES primitives.

Architecture
#

┌──────────────────────────────────────────────────────────────────┐
│                         Snowflake                                 │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                    Tag-Based Policies                       │  │
│  │                                                            │  │
│  │  Tag on column     →   Masking policy triggered            │  │
│  │  ff3_encrypt=''    →   encrypt_ff3_string/number/float     │  │
│  │  ff3_data_sc=''    →   format_ff3 (token formatting)       │  │
│  │  email=''          →   format as email                     │  │
│  │  uspostal=''       →   format as US postal code            │  │
│  │  usphone=''        →   format as US phone number           │  │
│  │  sqljoin=''        →   strip metadata, raw token for joins │  │
│  │  fake=''           →   Faker-generated realistic names     │  │
│  │  fake_email=''     →   Faker-generated email addresses     │  │
│  │  fuzzy=''          →   fully encrypt (no partial decrypt)  │  │
│  │  decrypt_this=''   →   decrypt (testing only!)             │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                    │
│                              ▼                                    │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                    Python UDFs                              │  │
│  │                                                            │  │
│  │  encrypt_ff3_string()     encrypt_ff3_number_integer()     │  │
│  │  decrypt_ff3_string()     decrypt_ff3_number_decimal38_8() │  │
│  │  format_ff3_string()      encrypt_ff3_float()              │  │
│  │  sqljoin_ff3_string()     partial_decrypt_ff3_number()     │  │
│  │  format_email_ff3()       format_ff3_number_integer()      │  │
│  │                                                            │  │
│  │  Powered by: ff3 (mysto/python-fpe) + PyCryptodome         │  │
│  └────────────────────────────────────────────────────────────┘  │
│                              │                                    │
│                              ▼                                    │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │                    Role-Based Access                        │  │
│  │                                                            │  │
│  │  ff3_encrypt   →  Can encrypt plaintext data               │  │
│  │  ff3_decrypt   →  Can decrypt back to plaintext            │  │
│  │  ff3_data_sc   →  Sees formatted tokens (for analytics)    │  │
│  │  ff3_masked    →  Sees standard masking (*** redacted ***)  │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

The key design: what you see depends on your role. The same column, the same query — but the masking policy checks your role and the column’s tags to decide whether you get plaintext, a formatted token, a Faker-generated value, or a redacted mask.


What Each Role Sees
#

Imagine a table with an email column containing kevin@example.com:

RoleTag on columnWhat the user seesCan join on it?
ff3_encryptkevin@example.com (encrypts on write)N/A
ff3_decryptkevin@example.com (decrypted)Yes
ff3_data_scemail=''xkf7q@hj3mn.com (token formatted as email)Yes
ff3_data_scfake_email=''sarah.jones@gmail.com (Faker-generated)No (regenerated each query)
ff3_data_scsqljoin=''[C1]dfsgfdghsgtrt45w4004 (raw token)Yes (100% unique)
ff3_masked** masked ** (traditional redaction)No

The same principle applies to numbers, floats, phone numbers, and postal codes — each with its own format-aware UDF.


Data Type Support
#

FF3-1 in Python Camouflage handles three data types, each with its own encryption, decryption, formatting, and partial-decryption UDFs:

Strings
#

FF3-1 can encrypt a maximum of 30 characters at a time. For longer strings, Python Camouflage chunks the input into 30-character pieces and concatenates the encrypted chunks with metadata:

Original:  "kevin.keller@example.com"
Encrypted: "[C1]dfsgfdghsgtrt45w4024"
            ^^^                  ^^^
            chunk marker         original length (024 = 24 chars)
            + padding count

The chunk metadata enables the formatting UDFs to reconstruct tokens that look like the original format — valid email addresses, phone numbers, names.

Integers
#

Encrypted integers embed metadata for format preservation:

Original:  55
Encrypted: 4543402
           ^     ^^
           padding+1  original digit count (02)
           (4 = 3 digits padding + 1)

The formatting UDF strips this metadata and returns a number that has the same order of magnitude as the original.

Decimals and Floats
#

Encrypted decimals preserve the comma position:

Original:  55.78
Encrypted: 24535452242.00000000
           ^          ^^^^
           comma pos  before/after comma counts + original length + padding

Partial Decryption: Analytics on Encrypted Data
#

One of the more interesting features: for number types, Python Camouflage can decrypt just the first digit while keeping the rest encrypted. This gives data scientists enough signal to see trends, run comparisons, and detect outliers — without revealing the actual values.

-- Data scientist sees:
-- Original: 55     →  Token: 5XXXX  (first digit real, rest encrypted)
-- Original: 1200   →  Token: 1XXXX  (first digit real, rest encrypted)
-- Original: 350    →  Token: 3XXXX  (first digit real, rest encrypted)

You can tell immediately that the second value is in the thousands and the first is in the tens — enough for trend analysis, not enough to identify individuals.

Disable this with the fuzzy='' tag on a column to encrypt all digits, including the first.


Tag-Driven Experience
#

The power of Python Camouflage comes from Snowflake’s tag-based masking policies. Instead of writing custom masking policies per column, you set tags on columns and the policies react automatically:

-- Make encrypted emails look like emails
ALTER TABLE customers MODIFY COLUMN email SET TAG email='';

-- Make encrypted postal codes look like US zip codes
ALTER TABLE customers MODIFY COLUMN postcode SET TAG uspostal='';

-- Make encrypted phone numbers look like US phone numbers
ALTER TABLE customers MODIFY COLUMN phone SET TAG usphone='';

-- Use Faker to generate realistic-looking names
ALTER TABLE customers MODIFY COLUMN name SET TAG fake='';

-- Raw tokens for SQL joins (100% unique, no formatting)
ALTER TABLE customers MODIFY COLUMN customer_id SET TAG sqljoin='';

-- Disable partial decryption of first digit
ALTER TABLE customers MODIFY COLUMN salary SET TAG fuzzy='';

Tags are composable. You can change the experience per column, per use case, without touching the masking policy code.


Use Cases
#

1. Hold Your Own Key
#

The AES-256 key never leaves your control. It is used inside Python UDFs that you deploy. Snowflake does not have access to the key material. This satisfies “bring your own key” requirements without external tokenization infrastructure.

2. Protect Against Superusers
#

Even ACCOUNTADMIN cannot read encrypted data without the key. Encryption adds a layer of protection beyond RBAC — a compromised admin account cannot access plaintext without also compromising the key.

3. Data Clean Rooms
#

Two partners in a Snowflake Data Clean Room can agree on the same FF3-1 key and tweak. Both encrypt their customer IDs with the same parameters. Now they can join on the encrypted IDs without either party revealing their actual customer identifiers. The join works because FF3-1 is deterministic — same input, same key, same output.

4. Analytics on Encrypted Data
#

Data scientists with the ff3_data_sc role see formatted tokens that preserve the structure of the original data. They can build dashboards, run aggregations, and identify patterns — all on data that is never decrypted in Snowflake.

5. On-Premise Decryption
#

For maximum control, decrypt outside Snowflake entirely. Python Camouflage works with:

  • PowerBI — load encrypted data from Snowflake and decrypt in the local PowerBI dataset
  • Python scripts — decrypt before ingest or after select using any Python ORM (PyDAL, SQLAlchemy)
  • PostgreSQL — use Postgres as a database proxy with Python UDFs for decryption (same pattern as Column-Level Encryption with Postgres)

Setup
#

# 1. Clone the repo
git clone https://github.com/sfc-gh-kkeller/python_camouflage.git

# 2. Run the install script (macOS or Linux)
./install_macos.sh   # or install_linux.sh

# 3. Follow the prompts:
#    - Snowflake account identifier
#    - Admin username
#    The script will:
#    - Clone the FF3 Python library and package it as ff3.zip
#    - Build the full install.sql from UDF components
#    - Execute via SnowSQL against your account

The installer creates:

  • Database ff3_test_db with schema ff3_test_schema
  • Warehouse ff3_test_wh
  • Roles: ff3_standard (install), ff3_encrypt, ff3_decrypt, ff3_data_sc, ff3_masked, ff3_tag_admin
  • All UDFs for encrypt, decrypt, format, partial-decrypt across strings, numbers, and floats
  • Tag-based masking policies
  • A sample INSURANCE_SOURCE1 table loaded from CSV for testing

Key Takeaways
#

  1. Tokenization preserves joins. Unlike masking, FF3-1 encryption is deterministic — the same input always produces the same token. Join, deduplicate, group by, and filter on encrypted columns.

  2. Format preservation keeps pipelines intact. Encrypted emails look like emails. Encrypted numbers have the same digit count. Downstream systems do not break.

  3. Tag-based policies scale. Set a tag on a column, get the right encryption/formatting behaviour automatically. No per-column policy writing.

  4. Four roles, four views. The same data looks different depending on who queries it — plaintext, formatted token, Faker-generated value, or redacted mask.

  5. Works outside Snowflake too. The Python functions work with any database that can execute Python — PostgreSQL, PyDAL, SQLAlchemy, PowerBI datasets. Decrypt on-premise if you need to.

  6. Your key, your control. The AES-256 key and tweak are yours. No external tokenization service, no key escrow, no third-party dependency.


Python Camouflage builds on the mysto/python-fpe and PyCryptodome libraries. See also Column-Level Encryption with Postgres as a Crypto Proxy for the AES-256-CBC approach.

Kevin Keller
Author
Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.
Share this article

Related