
Decrypting PGP Files Inside Snowflake: Zero-Infrastructure MFT Ingestion

Download: all source files for this post:

| File | Description |
| --- | --- |
| pgp_decrypt.sql | Complete SQL workbook — stage, UDF, stored procedure |
| pgpy.zip | Modified pgpy library bundle for Snowflake (see appendix) |
| MFT24_07-Private.key | Sample PGP private key (demo only) |
| service_volume_report_20240903_000001.csv.gz.gpg | Sample PGP-encrypted gzipped CSV |

What This Is
#

Here is all the code to decrypt a PGP-encrypted, gzip-compressed file directly from a Snowflake stage — no middleware, no external compute, no file shuffling.

Two approaches:

  1. A UDF that decrypts and returns the content as a string. With a slight modification this could also be turned into a UDTF that returns a table directly.
  2. A stored procedure that decrypts and writes the decrypted gzip file back to the stage, ready for COPY INTO.

I had to modify the pgpy library slightly to make it compatible with Snowflake’s Python UDF sandbox. My modified version is attached as a zip file above, along with the SQL workbook, a sample PGP private key, and a sample encrypted file so you can test the full flow end to end.

The stored procedure currently decrypts one file at a time. It could be modified to batch-process files automatically — I have already put a couple of code pointers in the procedure showing how. The hard part is done; what remains is accepting a list of files instead of a single one and writing the loops. Alternatively, wrap the stored procedure in a thin batching procedure, or take the UDF code and nest it in a COPY INTO statement to load the data straight from the encrypted, compressed file into a Snowflake table.

One important note on stage encryption: the stage must be set to SNOWFLAKE_SSE if the customer wants to download the decrypted files for local testing. With the default SNOWFLAKE_FULL encryption, files on internal stages can only be read by Snowflake — they cannot be downloaded via GET. If the decrypted files only need to stay inside Snowflake (which is the normal production case), either encryption type works.


Architecture
#

Before: Traditional MFT Pipeline
#

┌───────────┐     ┌─────────────────────────┐     ┌───────────────┐
│  Partner  │     │     MFT Middleware      │     │   Snowflake   │
│           │     │                         │     │               │
│  Encrypt  │────▶│  SFTP receive           │     │               │
│  with     │     │  PGP decrypt            │────▶│  COPY INTO    │
│  your     │     │  Decompress (.gz)       │     │  target table │
│  pub key  │     │  Stage to cloud storage │     │               │
│           │     │                         │     │               │
└───────────┘     └─────────────────────────┘     └───────────────┘
                  Private key lives here
                  Server must be maintained
                  Network rules, patching, IAM

After: Snowflake-Native Decryption
#

┌───────────┐     ┌─────────────────────────────────────────────┐
│  Partner  │     │                 Snowflake                   │
│           │     │                                             │
│  Encrypt  │     │  ┌────────────┐                             │
│  with     │────▶│  │ @pgp_libs  │  Internal stage             │
│  your     │     │  │            │  (SSE encrypted)            │
│  pub key  │     │  │ *.gpg      │───┐                         │
│           │     │  │ *.key      │   │                         │
└───────────┘     │  └────────────┘   │                         │
                  │                   ▼                         │
                  │  ┌────────────────────────────────────────┐ │
                  │  │  Python UDF / Stored Procedure         │ │
                  │  │                                        │ │
                  │  │  1. Read .gpg from stage               │ │
                  │  │  2. Read private key from stage        │ │
                  │  │  3. Unlock key with passphrase         │ │
                  │  │  4. Decrypt PGP message                │ │
                  │  │  5. Decompress gzip                    │ │
                  │  │  6. Return plaintext / write to stage  │ │
                  │  │                                        │ │
                  │  │  Packages: pgpy, cryptography,         │ │
                  │  │  snowflake-snowpark-python             │ │
                  │  └────────────────────────────────────────┘ │
                  │                   │                         │
                  │                   ▼                         │
                  │  ┌────────────────────────────────────────┐ │
                  │  │  Decrypted CSV on stage or in query    │ │
                  │  │  → COPY INTO target table              │ │
                  │  └────────────────────────────────────────┘ │
                  └─────────────────────────────────────────────┘

The entire decryption pipeline runs inside Snowflake’s secure execution environment. The private key is read from an internal stage and never leaves Snowflake’s compute boundary.


Setup: Stage and Dependencies
#

First, create an internal stage to hold the PGP artefacts. Use SNOWFLAKE_SSE encryption — this is required if you want to be able to GET (download) the decrypted files back to a local machine for testing. If the decrypted files will only ever be consumed inside Snowflake via COPY INTO, SNOWFLAKE_FULL also works.

CREATE STAGE pgp_libs
  ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');

Upload the required files to the stage:

PUT file://MFT24_07-Private.key @pgp_libs AUTO_COMPRESS=FALSE;
PUT file://pgpy.zip @pgp_libs AUTO_COMPRESS=FALSE;
PUT file://service_volume_report_20240903_000001.csv.gz.gpg @pgp_libs AUTO_COMPRESS=FALSE;

After uploading, your stage looks like this:

LIST @pgp_libs;

┌──────────────────────────────────────────────────────┬───────┬──────────────────────────────────┬─────────────────────────────┐
│ name                                                 │  size │ md5                              │ last_modified               │
├──────────────────────────────────────────────────────┼───────┼──────────────────────────────────┼─────────────────────────────┤
│ pgp_libs/MFT24_07-Private.key                        │  3680 │ f236ef65eed67d19d05ea753986a4f24 │ Thu, 5 Sep 2024 12:50:17 GMT│
│ pgp_libs/pgpy.zip                                    │ 81808 │ 76b902de0b863adb0d7cb1c92be8f727 │ Thu, 5 Sep 2024 13:31:03 GMT│
│ pgp_libs/service_volume_report_20240903_000001.csv…  │   208 │ bc7238f7cbf253328b87e545d1ae0b6e │ Thu, 5 Sep 2024 15:23:24 GMT│
│ pgp_libs/service_volume_report_20240903_000001.csv…  │   800 │ aa4e8ce47dfe9e595d1be23bdb278a1d │ Thu, 5 Sep 2024 12:51:08 GMT│
└──────────────────────────────────────────────────────┴───────┴──────────────────────────────────┴─────────────────────────────┘

Why pgpy.zip? Snowflake’s Python UDF sandbox does not include the pgpy library by default. By uploading a pre-built zip of the package to the stage and referencing it via IMPORTS, the UDF can import it at runtime. The zip was built with pip install pgpy -t ./pgpy_pkg && cd pgpy_pkg && zip -r ../pgpy.zip .
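The reason this works is that Python can import packages directly from a zip archive once it is on sys.path, and Snowflake does the equivalent for each file listed in IMPORTS. A minimal local reproduction of the mechanism (the package and variable names here are invented for illustration):

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a tiny package zip with the same shape as pgpy.zip:
# a top-level package directory containing an __init__.py.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "demo_pkg.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "GREETING = 'hello from zip'\n")

# Putting the archive on sys.path makes its packages importable;
# Snowflake performs this step for every IMPORTS entry.
sys.path.insert(0, zip_path)
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.GREETING)  # hello from zip
```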


Approach 1: UDF — Decrypt and Return as String
#

The simplest approach. A scalar UDF reads the encrypted file from the stage, decrypts it, decompresses it, and returns the plaintext as a string. Ideal for ad-hoc inspection or inline SQL processing.

This is a scalar UDF that returns the entire decrypted content as a single string. With a slight modification — changing RETURNS STRING to RETURNS TABLE (...) and yielding rows instead of a single string — this could be turned into a UDTF (User-Defined Table Function) that returns the CSV as a proper table. That opens up a powerful pattern: you can nest the UDTF directly inside a COPY INTO statement to go straight from an encrypted compressed file on stage into a Snowflake table in a single statement, with no intermediate files.

CREATE OR REPLACE FUNCTION decrypt_pgp(
    pgp_key     VARCHAR,
    pgp_password VARCHAR,
    pgp_file_in VARCHAR,
    file_out    VARCHAR
)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = 3.11
PACKAGES = ('cryptography', 'pyasn1', 'six', 'snowflake-snowpark-python')
IMPORTS = ('@pgp_libs/pgpy.zip')
HANDLER = 'udf'
AS $$
from pgpy import PGPMessage, PGPKey
from snowflake.snowpark.files import SnowflakeFile
import gzip
from io import BytesIO


def udf(pgp_key, pgp_password, pgp_file_in, file_out):
    # file_out is accepted for signature parity with the stored
    # procedure below; this scalar variant does not write any file
    # 1. Load the encrypted message and private key from stage
    message = PGPMessage.from_file(pgp_file_in)
    key, _ = PGPKey.from_file(pgp_key)

    # 2. Unlock the private key and decrypt
    with key.unlock(pgp_password):
        decrypted_message = key.decrypt(message).message

    # 3. Decompress the gzip layer
    fgz = BytesIO(decrypted_message)
    with gzip.GzipFile(fileobj=fgz) as fh:
        final_message = fh.read()

    return final_message.decode()
$$;

Call it:

SELECT decrypt_pgp(
    '@pgp_libs/MFT24_07-Private.key',
    'your_passphrase',
    '@pgp_libs/service_volume_report_20240903_000001.csv.gz.gpg',
    '@pgp_libs/service_volume_report_20240903_000001.csv.gz'
);

The function returns the full CSV content as a single string. From here you can parse it with SPLIT_TO_TABLE, wrap it in a view, or use it as input for COPY INTO via a table function.
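What that SQL-side parsing amounts to can be sketched in plain Python. The sample rows below are invented for illustration (they mirror the target table columns used later in this post), not the real report contents:

```python
import csv
import io

# What decrypt_pgp() hands back: the entire CSV as one string.
decrypted = (
    "report_date,service_name,volume,region\n"
    "2024-09-03,ingest,1200,eu-central\n"
    "2024-09-03,export,340,eu-central\n"
)

# Split the single string into rows, as SPLIT_TO_TABLE plus
# column extraction would do on the SQL side.
reader = csv.DictReader(io.StringIO(decrypted))
rows = list(reader)
print(len(rows), rows[0]["service_name"])  # 2 ingest
```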

How the Decryption Works
#

 .csv.gz.gpg file on stage
 ┌──────────────────┐
 │  PGP Envelope    │  ← pgpy decrypts with private key + passphrase
 │  ┌──────────────┐│
 │  │  gzip layer  ││  ← gzip.GzipFile decompresses
 │  │  ┌──────────┐││
 │  │  │ CSV data │││  ← returned as UTF-8 string
 │  │  └──────────┘││
 │  └──────────────┘│
 └──────────────────┘
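The inner two layers can be exercised locally with the standard library alone: gzip-compress some CSV bytes, then unwrap them exactly as step 3 of the UDF does. The PGP envelope is left out here, since that layer needs pgpy and a real key:

```python
import gzip
from io import BytesIO

csv_data = b"report_date,service_name,volume,region\n2024-09-03,ingest,1200,eu-central\n"

# Build the gzip layer, standing in for what key.decrypt(...).message returns.
compressed = gzip.compress(csv_data)

# Unwrap it the same way the UDF does in step 3.
fgz = BytesIO(compressed)
with gzip.GzipFile(fileobj=fgz) as fh:
    final_message = fh.read()

print(final_message == csv_data)  # True
```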

Approach 2: Stored Procedure — Decrypt and Write Back to Stage
#

For larger files or production pipelines, you want the decrypted file written back to the stage as a gzip-compressed file, ready for COPY INTO. This stored procedure currently processes one file at a time. Note the commented-out code pointers for batch processing — the hard part (PGP decryption, stage I/O) is done; extending to batch is just a matter of accepting a list of files and wrapping the core logic in a loop:

CREATE OR REPLACE PROCEDURE decrypt_pgp_writeback(
    pgp_key      VARCHAR,
    pgp_password VARCHAR,
    pgp_file_in  VARCHAR,
    stage_out    VARCHAR,
    file_out     VARCHAR
)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.11'
PACKAGES = ('cryptography', 'pyasn1', 'six', 'snowflake-snowpark-python')
IMPORTS = ('@pgp_libs/pgpy.zip')
HANDLER = 'udf'
EXECUTE AS CALLER
AS $$
from pgpy import PGPMessage, PGPKey
from snowflake.snowpark.files import SnowflakeFile
import os


def udf(session, pgp_key, pgp_password, pgp_file_in, stage_out, file_out):

    # -- For batch processing, accept a list and loop: --
    # counter = 0
    # for pgp_file_in in pgp_files_in:
    #     counter = counter + 1

        # 1. Load encrypted message and private key
        message = PGPMessage.from_file(pgp_file_in)
        key, _ = PGPKey.from_file(pgp_key)

        # 2. Prepare a local scratch path for the decrypted output
        path = os.path.join('/tmp', file_out)

        # -- For batch: derive one output path per input file --

        # 3. Decrypt and write to local temp file
        with key.unlock(pgp_password):
            decrypted_message = key.decrypt(message).message
            with open(path, "wb") as binary_file:
                binary_file.write(decrypted_message)

        # 4. Upload decrypted file back to stage
        session.file.put(path, '@' + stage_out, overwrite=True)

        return path
$$;

Call it:

CALL decrypt_pgp_writeback(
    '@pgp_libs/MFT24_07-Private.key',
    'your_passphrase',
    '@pgp_libs/service_volume_report_20240903_000001.csv.gz.gpg',
    'pgp_libs',
    'service_volume_report_20240903_000001.csv.gz'
);

After the procedure completes, LIST @pgp_libs shows the decrypted .csv.gz file alongside the original encrypted .gpg file. From here, standard COPY INTO takes over:

COPY INTO target_table
FROM @pgp_libs/service_volume_report_20240903_000001.csv.gz
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 COMPRESSION = 'GZIP');

UDF vs Stored Procedure: When to Use Which
#

| Aspect | UDF (decrypt_pgp) | Stored Procedure (decrypt_pgp_writeback) |
| --- | --- | --- |
| Returns | Decrypted content as string | File path on stage |
| Best for | Small files, ad-hoc inspection, inline SQL | Large files, production pipelines |
| Next step | Parse in SQL (SPLIT_TO_TABLE) | COPY INTO from stage |
| Bulk support | One file per call | Easily extended to loop over multiple files |
| Caller context | UDF execution context | EXECUTE AS CALLER — inherits caller's privileges |

Extending to Bulk and Direct-to-Table
#

The hard part — PGP decryption inside Snowflake’s sandbox — is done. There are three ways to extend this for production workloads:

Option 1: Modify the Stored Procedure for Batch
#

Uncomment the loop pointers already in the code, change the signature to accept a list of files instead of one, and iterate; a single call then decrypts them all.
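The loop itself is ordinary Python. A minimal sketch of the batched shape, with the decryption core stubbed out (decrypt_one is a hypothetical stand-in for the body of the existing procedure):

```python
def decrypt_one(pgp_file_in: str) -> str:
    # Hypothetical stand-in for the single-file core: load the key,
    # decrypt, write back to the stage, return the output path.
    return pgp_file_in.removesuffix(".gpg")


def decrypt_batch(pgp_files_in: list[str]) -> list[str]:
    # The batched signature the post describes: take a list of files,
    # run the single-file core once per file, collect the results.
    results = []
    for pgp_file_in in pgp_files_in:
        results.append(decrypt_one(pgp_file_in))
    return results


print(decrypt_batch(["a.csv.gz.gpg", "b.csv.gz.gpg"]))  # ['a.csv.gz', 'b.csv.gz']
```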

Option 2: Wrap in Another Stored Procedure
#

Leave decrypt_pgp_writeback as-is (single file) and write a thin wrapper procedure that queries the stage directory and calls it in a loop:

CREATE OR REPLACE PROCEDURE decrypt_all_pgp_files(
    stage_name VARCHAR, pgp_key VARCHAR, pgp_password VARCHAR
)
RETURNS VARCHAR
LANGUAGE SQL
EXECUTE AS CALLER
AS
$$
DECLARE
    files RESULTSET;
BEGIN
    -- LIST populates the result set that RESULT_SCAN reads below
    EXECUTE IMMEDIATE 'LIST @' || stage_name;
    files := (SELECT "name" FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
              WHERE "name" LIKE '%.gpg');
    -- loop and call decrypt_pgp_writeback for each file
    RETURN 'done';
END;
$$;

Option 3: UDF Nested in COPY INTO (Direct-to-Table)
#

Convert the UDF into a UDTF (table function) and nest it directly inside a COPY INTO. This skips the intermediate stage file entirely — data goes straight from the encrypted compressed file into a Snowflake table in a single statement:

-- Conceptual: UDTF version that yields rows instead of a single string
-- SELECT * FROM TABLE(decrypt_pgp_table(...))
-- can be used as source in COPY INTO or CREATE TABLE AS SELECT
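A Snowflake Python UDTF handler is a class whose process method yields one tuple per output row. The row-yielding part of such a conversion might look like the following pure-Python sketch, with the pgpy decrypt and gzip steps replaced by an already-decrypted string:

```python
import csv
import io


class DecryptPgpTable:
    """Sketch of a UDTF-style handler: yield one tuple per CSV row.

    In the real conversion, `content` would come from the pgpy
    decrypt + gzip decompress steps of the scalar UDF above.
    """

    def process(self, content: str):
        reader = csv.reader(io.StringIO(content))
        next(reader)  # skip the header row
        for row in reader:
            yield tuple(row)


handler = DecryptPgpTable()
rows = list(handler.process("a,b\n1,2\n3,4\n"))
print(rows)  # [('1', '2'), ('3', '4')]
```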

Scheduled Execution with Tasks
#

Whichever approach you choose, combine it with a Snowflake Task for automated processing:

CREATE TASK decrypt_mft_files
  WAREHOUSE = compute_wh
  SCHEDULE = 'USING CRON 0 6 * * * Europe/Zurich'
AS
  CALL decrypt_pgp_writeback(
      '@pgp_libs/MFT24_07-Private.key',
      'your_passphrase',
      '@pgp_libs/latest_report.csv.gz.gpg',
      'pgp_libs',
      'latest_report.csv.gz'
  );

This turns Snowflake into a self-contained MFT endpoint: files arrive encrypted, get decrypted on schedule, and are loaded into tables — all without leaving the platform.


Security Considerations
#

Private Key Storage
#

The private key sits on an internal stage with SSE encryption. Snowflake encrypts it at rest with a platform-managed key. Access is controlled by stage privileges:

-- Only the service role can read the key
GRANT READ ON STAGE pgp_libs TO ROLE mft_decrypt_role;
REVOKE ALL ON STAGE pgp_libs FROM ROLE PUBLIC;

Passphrase Management
#

The examples above pass the passphrase as a literal for clarity. In production, use a Snowflake secret:

CREATE SECRET pgp_passphrase
  TYPE = GENERIC_STRING
  SECRET_STRING = 'your_passphrase';

-- Reference in the UDF via Snowpark's secret access
-- or pass from an external secrets manager via External Access Integration

Audit Trail
#

Every call to the UDF or stored procedure is logged in SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY. You get a full audit trail of who decrypted which file and when — for free.

Network Isolation
#

The Python UDF runs in Snowflake’s sandboxed execution environment. It has no outbound network access by default. The private key cannot be exfiltrated by the UDF code unless you explicitly create an External Access Integration — which itself requires ACCOUNTADMIN privileges and is logged.


The pgpy Library
#

pgpy is a pure-Python implementation of the OpenPGP standard (RFC 4880). It supports:

  • RSA, DSA, ElGamal, and ECDSA/ECDH keys
  • AES, Blowfish, Twofish, CAST5 symmetric ciphers
  • SHA-1, SHA-256, SHA-384, SHA-512 hashes
  • Compression (ZIP, ZLIB, BZ2)

Because it is pure Python with only cryptography, pyasn1, and six as dependencies, it runs cleanly in Snowflake’s Python UDF sandbox. No native extensions, no C compilation, no platform-specific binaries.

Building the pgpy Bundle
#

# Create a clean environment
mkdir pgpy_pkg && cd pgpy_pkg
pip install pgpy -t .

# Remove unnecessary files to keep the bundle small
find . -name "__pycache__" -type d -exec rm -rf {} +
find . -name "*.dist-info" -type d -exec rm -rf {} +

# Package
cd .. && zip -r pgpy.zip pgpy_pkg/

# Upload to Snowflake
PUT file://pgpy.zip @pgp_libs AUTO_COMPRESS=FALSE;

End-to-End Pipeline Example
#

Putting it all together — a complete pipeline from encrypted file drop to queryable table:

-- 1. Partner drops encrypted file to external stage (S3/Azure/GCS)
--    or you PUT it to an internal stage

-- 2. Decrypt and write back to stage
CALL decrypt_pgp_writeback(
    '@pgp_libs/MFT24_07-Private.key',
    'your_passphrase',
    '@pgp_libs/service_volume_report_20240903_000001.csv.gz.gpg',
    'pgp_libs',
    'service_volume_report_20240903_000001.csv.gz'
);

-- 3. Load into table
CREATE TABLE IF NOT EXISTS service_volume_report (
    report_date   DATE,
    service_name  VARCHAR,
    volume        NUMBER,
    region        VARCHAR
);

COPY INTO service_volume_report
FROM @pgp_libs/service_volume_report_20240903_000001.csv.gz
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 COMPRESSION = 'GZIP')
ON_ERROR = 'ABORT_STATEMENT';

-- 4. Verify
SELECT * FROM service_volume_report LIMIT 10;

-- 5. Clean up encrypted source (optional)
REMOVE @pgp_libs/service_volume_report_20240903_000001.csv.gz.gpg;

Key Takeaways
#

  1. Eliminate the middleware. PGP decryption runs inside Snowflake’s Python sandbox. No external servers, no agents, no cron jobs on a VM somewhere.

  2. The private key stays within Snowflake’s security boundary. It is encrypted at rest on an internal stage, access-controlled by RBAC, and never transits an external network.

  3. Two patterns for two use cases. Use the UDF for ad-hoc decryption and inline SQL processing. Use the stored procedure for production pipelines where you need the decrypted file on stage for COPY INTO.

  4. Fully auditable. Every decryption operation is logged in Snowflake’s query history. Pair with access history for a complete chain of custody from encrypted file to table row.

  5. Schedule with Tasks. Combine the stored procedure with Snowflake Tasks and Streams for a fully automated, event-driven MFT pipeline — no orchestrator required.

  6. Production hardening. Replace literal passphrases with Snowflake Secrets, restrict stage access to a dedicated service role, and consider key rotation strategies where the old and new keys overlap during a transition window.


Appendix: pgpy Modifications for Snowflake
#

The stock pgpy library cannot run inside a Snowflake Python UDF out of the box. The reason: pgpy’s from_file() methods use Python’s built-in open() to read files from the local filesystem. Inside Snowflake’s sandbox, files on stages are not local files — they must be accessed through the SnowflakeFile API provided by snowflake.snowpark.files.

Two files were modified. The changes are minimal and surgical — only the file I/O paths were replaced.

1. pgpy/types.py: Armorable.from_file()
#

This is the base class method that PGPMessage.from_file() and PGPKey.from_file() both inherit.

Original (stock pgpy):

@classmethod
def from_file(cls, filename):
    with open(filename, 'rb') as file:
        obj = cls()
        data = bytearray(os.path.getsize(filename))
        file.readinto(data)
    # ...

Modified (Snowflake-compatible):

from snowflake.snowpark.files import SnowflakeFile

@classmethod
def from_file(cls, filename):
    with SnowflakeFile.open(filename, 'rb', require_scoped_url=False) as file:
        obj = cls()
        # os.path.getsize() doesn't work on stage paths,
        # so seek to end to determine file size
        file.seek(0, os.SEEK_END)
        data = bytearray(file.tell())
        file.seek(0)
        file.readinto(data)
    # ...

Two changes here:

  • open() → SnowflakeFile.open() — routes file access through Snowflake’s stage file API instead of the local filesystem.
  • os.path.getsize() → seek-based size detection — stage file paths (like @pgp_libs/key.key) are not real filesystem paths, so os.path.getsize() would fail. Instead, we seek to the end of the file, read file.tell() to get the byte count, then seek back to the beginning.
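The seek-based size trick is easy to verify on any file-like object; here a BytesIO stands in for the SnowflakeFile handle:

```python
import os
from io import BytesIO

# BytesIO stands in for the SnowflakeFile handle, which has no
# filesystem path for os.path.getsize() to inspect.
file = BytesIO(b"-----BEGIN PGP MESSAGE-----")

file.seek(0, os.SEEK_END)   # jump to the end...
size = file.tell()          # ...the offset there is the byte count
file.seek(0)                # rewind before reading

data = bytearray(size)
file.readinto(data)
print(size, bytes(data))  # 27 b'-----BEGIN PGP MESSAGE-----'
```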

2. pgpy/pgp.py: PGPMessage.new()
#

The PGPMessage.new() class method reads a file when constructing a new message from a file path.

Original:

with open(filename, 'rb') as mf:
    mf.readinto(message)

Modified:

from snowflake.snowpark.files import SnowflakeFile

with SnowflakeFile.open(filename, 'rb', require_scoped_url=False) as mf:
    mf.readinto(message)

Same pattern — replace the standard file handle with SnowflakeFile.

Why require_scoped_url=False?
#

By default, SnowflakeFile.open() expects a scoped URL (a temporary, pre-signed URL that Snowflake generates for secure file access). Setting require_scoped_url=False allows the function to accept raw stage paths like @pgp_libs/MFT24_07-Private.key directly. This is necessary because pgpy receives the filename as a plain string — it has no mechanism to request a scoped URL.

Rebuilding the Modified Bundle
#

# Start from stock pgpy
pip install pgpy -t pgpy_build --no-deps

# Apply the two modifications:
#   pgpy_build/pgpy/types.py  — add SnowflakeFile import, patch from_file()
#   pgpy_build/pgpy/pgp.py    — add SnowflakeFile import, patch PGPMessage.new()

# Strip unnecessary files
find pgpy_build -name "__pycache__" -type d -exec rm -rf {} +
find pgpy_build -name "*.dist-info" -type d -exec rm -rf {} +

# Package — zip must contain the pgpy/ directory at the root
cd pgpy_build && zip -r ../pgpy.zip pgpy/

# Upload
PUT file://pgpy.zip @pgp_libs AUTO_COMPRESS=FALSE;

The modified pgpy.zip is available for download at the top of this post.

Author: Kevin Keller
Personal blog about AI, Observability & Data Sovereignty. Snowflake-related articles explore the art of the possible and are not official Snowflake solutions or endorsed by Snowflake unless explicitly stated. Opinions are my own. Content is meant as educational inspiration, not production guidance.