How SecuPi is Using Snowflake Java UDFs to Protect Sensitive Data

As organizations in all industries move more and more sensitive data to the public cloud, securing that data remains a top priority. The Snowflake Data Cloud natively includes a host of best-in-breed security controls. Combined with SecuPi’s security policy and privacy enforcement, these controls allow our joint customers to comply with all relevant security and privacy regulations and ensure that data can be hosted securely in the Snowflake Data Cloud.
How SecuPi Protects Sensitive Data in the Snowflake Data Cloud
SecuPi provides completely independent, high performance, transparent data protection for Snowflake, the Data Cloud Company, with SecuPi’s Hold Your Own Key (HYOK) and Purpose Based Access Control (PBAC) solution. Snowflake customers can now control and manage the encryption keys used to de-identify (pseudonymize) records prior to loading to Snowflake to satisfy geo-fencing, consent and preference management and Schrems-II compliance challenges. The solution is equally relevant for other privacy compliance challenges and cross-border data flows under GDPR, HIPAA, CCPA, PIPEDA, TPA, PDPA, POPI, PDPB and many others.
The solution leverages fully integrated, independent, internal User Defined Functions (UDFs) that do not require external system calls or additional encryption or tokenization servers in the cloud, which ensures seamless integration with any Snowflake analytics workload.
This now enables the most risk averse organizations in highly regulated industries to leverage cloud economics and migrate more of their analytics workloads to Snowflake while simultaneously reducing risk, enabling greater data monetization and privacy optimization.
Joint architecture – Split processing between SecuPi and Snowflake
Encryption and Periodic Re-Keying of Sensitive Data
For customers who require that sensitive data is never stored in cleartext in the Cloud, there is SecuPi HYOK. Snowflake can restrict access only to authorized users, and the data may be masked for unauthorized users. SecuPi can de-identify the data (using encryption) before it is loaded into Snowflake. The data is encrypted using the customer’s preferred Key Management System (KMS), which could be on-prem or a Cloud KMS. The data is then decrypted by SecuPi Policy Enforcement Points (PEP) at runtime only for authorized users.
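The runtime decision described above can be pictured with a small sketch. This is an illustration only, not SecuPi's actual PEP: the class name, the role names, and the pluggable `decryptor` are all assumptions, and a real deployment would take its policy from SecuPi and its key from the customer's KMS.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical sketch of a policy enforcement decision; not SecuPi's actual PEP.
final class PolicyEnforcementSketch {
    private final Set<String> authorizedRoles;      // assumed: roles allowed to see cleartext
    private final UnaryOperator<String> decryptor;  // assumed: backed by a key from the customer's KMS

    PolicyEnforcementSketch(Set<String> authorizedRoles, UnaryOperator<String> decryptor) {
        this.authorizedRoles = new HashSet<>(authorizedRoles);
        this.decryptor = decryptor;
    }

    /** Returns cleartext for authorized roles; any other role gets the stored protected value back. */
    public String read(String storedValue, String role) {
        if (storedValue == null) {
            return null;
        }
        boolean authorized = role != null && authorizedRoles.contains(role);
        return authorized ? decryptor.apply(storedValue) : storedValue;
    }
}
```

The point of the sketch is that the stored value never changes: the cleartext exists only in the result returned to an authorized caller.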
Customers can also use SecuPi to encrypt data already stored in Snowflake, for example, when certain data has been deemed sensitive either by SecuPi’s advanced classification models, or by other Data Discovery tools or methods.
With alternative methods, data is initially encrypted or re-keyed within Snowflake using calls out to an external server to perform the encryption. There are a few drawbacks to this approach:
- The process can take a while, since callouts are needed to external functions
- It is difficult to quickly validate the data quality of the encryption process
- Time Travel info is lost
With SecuPi, completely self-contained Java functions can perform initial data encryption or Data Encryption Key (DEK) rotations, encrypting within Snowflake without exposing the keys to Snowflake Admins or DBAs.
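As a rough illustration of what an in-process key rotation involves, the sketch below decrypts a value with the old DEK and re-encrypts it with the new one without calling out to another server. It is not SecuPi's implementation: the class `DekRotator`, the AES-GCM token layout (Base64 of IV followed by ciphertext), and the `newKey` stand-in for a KMS-issued key are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical re-keying helper; how keys reach the function is out of scope here.
final class DekRotator {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Stand-in for a KMS-issued DEK: a random 128-bit AES key, Base64-encoded. */
    public static String newKey() {
        byte[] key = new byte[16];
        RANDOM.nextBytes(key);
        return Base64.getEncoder().encodeToString(key);
    }

    /** Decrypts with the old DEK and re-encrypts with the new one, all in process. */
    public static String rekey(String token, String oldKey, String newKey) {
        return token == null ? null : encrypt(decrypt(token, oldKey), newKey);
    }

    public static String encrypt(String cleartext, String keyBase64) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every value
            byte[] ct = cipher(Cipher.ENCRYPT_MODE, keyBase64, iv)
                    .doFinal(cleartext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    public static String decrypt(String token, String keyBase64) {
        try {
            byte[] in = Base64.getDecoder().decode(token);
            byte[] iv = Arrays.copyOfRange(in, 0, IV_BYTES);
            byte[] pt = cipher(Cipher.DECRYPT_MODE, keyBase64, iv)
                    .doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            // A wrong key or tampered token fails the GCM tag check and lands here.
            throw new IllegalStateException("decryption failed", e);
        }
    }

    private static Cipher cipher(int mode, String keyBase64, byte[] iv)
            throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, new SecretKeySpec(Base64.getDecoder().decode(keyBase64), "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        return c;
    }
}
```

Because GCM authenticates each value, a value that fails to decrypt under the expected key raises an error instead of silently yielding garbage, which is one way the quality of a re-keying run could be checked.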
A New Way to Do Encryption Using Java UDFs
As part of the launch of Snowpark, Snowflake’s native support for non-SQL languages such as Scala and Java, Snowflake has also introduced native support for Java User Defined Functions (UDFs). These functions allow developers to embed new functionality, or functionality already developed in Java, in JVMs that execute directly inside the Snowflake Data Cloud. This offers better performance, and the ability to execute functions with all of the functionality of the Java language, all without having to rely on additional servers or external processing.
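To make the idea concrete, here is a minimal sketch of what a Java UDF handler for column encryption might look like. It is an assumption-laden illustration, not SecuPi's code: the class `ColumnProtector`, its method names, the AES-GCM token format, and the stage and jar names in the comment are hypothetical. Passing the key as a function argument is a simplification for the example only; the article does not describe how SecuPi supplies keys to its functions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical handler class. In Snowflake it could be registered roughly as:
//   CREATE FUNCTION protect(v VARCHAR, k VARCHAR) RETURNS VARCHAR
//     LANGUAGE JAVA
//     IMPORTS = ('@my_stage/protector.jar')   -- hypothetical stage and jar
//     HANDLER = 'ColumnProtector.protect';
final class ColumnProtector {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts one column value; the result is Base64(IV || ciphertext || tag). */
    public static String protect(String cleartext, String keyBase64) {
        if (cleartext == null) {
            return null; // keep SQL NULLs as NULLs
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key(keyBase64), new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(cleartext.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Reverses protect(); fails if the key is wrong or the token was altered. */
    public static String unprotect(String token, String keyBase64) {
        if (token == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(token);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key(keyBase64),
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] pt = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    private static SecretKeySpec key(String keyBase64) {
        return new SecretKeySpec(Base64.getDecoder().decode(keyBase64), "AES");
    }
}
```

Everything the function needs ships in the jar and runs in the JVM next to the data, which is what removes the round trip to an external encryption server.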
Once a sensitive column such as credit card number or SSN is identified, SecuPi automatically encrypts or tokenizes the column with the appropriate key, using a Java UDF in a Bring Your Own Key (BYOK) model, leveraging an On-Prem HSM or any cloud hosted KMS.
All required SecuPi functionality is loaded into the Java Virtual Machine (JVM) at run-time within Snowflake’s Data Cloud with no other external UDF API calls or agents required.
Once the initial data protection is complete, customers can revert to a full Hold Your Own Key (HYOK) model.
What Does This Mean for Snowflake and SecuPi Customers
By leveraging Snowflake Java UDFs, SecuPi satisfies use cases such as:
- Initial data encryption – encrypt sensitive and regulated data at rest inside Snowflake. New encrypted columns will be added to the table, and after finishing the process all cleartext versions of the sensitive columns are dropped.
- Encryption tool changes – replacing an existing tokenization/encryption method with a different or stronger one is a simple process. The steps are similar to those for initial data encryption.
- Bulk key rotations – a simple and fast way to rotate Data Encryption Keys (DEK).
All of the use-cases above include benefits of reduced data movement, validation of the decrypt/encrypt process, and Snowflake Time-Travel preservation, all while avoiding downtime and greatly reducing total run-time. This further expands on existing SecuPi support for all Snowflake features and connections.
Referential integrity is maintained, enabling data analytics on protected columns while ensuring privacy compliance. All of this is virtually transparent to users, applications and processes accessing Snowflake.
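One common way to keep joins and group-bys working on protected columns is a deterministic keyed transform, where equal inputs always produce equal protected values. The sketch below uses HMAC-SHA256 for this, which is a one-way stand-in chosen for brevity; it is not reversible and is not claimed to be the method SecuPi uses. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical deterministic pseudonymizer: equal inputs map to equal tokens under one key.
final class DeterministicTokenizer {

    /** Returns a URL-safe Base64 token derived from the value and the key. */
    public static String tokenize(String value, String keyBase64) {
        if (value == null) {
            return null;
        }
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(Base64.getDecoder().decode(keyBase64), "HmacSHA256"));
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("tokenization failed", e);
        }
    }
}
```

If the same customer identifier is tokenized this way in two tables, the tokens match, so a join on the protected column returns the same rows it would on cleartext. The trade-off of any deterministic scheme is that equality between values is visible to anyone who can read the column.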
More To Come
Now even the most risk averse organizations in highly regulated industries can enjoy greater data mobility, leverage cloud economics and migrate more of their analytics workloads to Snowflake while simultaneously reducing risk, enabling greater data monetization and privacy optimization.
But this is just the beginning. SecuPi will continue to take advantage of Snowflake’s innovations in data security and privacy features, and to leverage the power of Snowpark and Java UDFs to move as much of the SecuPi Policy Definition Point (PDP) and Policy Enforcement Point (PEP) functionality as possible directly into the Snowflake engine, delivering superior performance and data privacy protection.
In the near future, SecuPi will be able to leverage Snowflake’s new tagging and data classification features to automatically identify sensitive and regulated data and immediately encrypt the affected columns.
Written by Atalia Horenshtien, Director of Sales Engineering at SecuPi and Paul Gancz, Partner Solutions Architect at Snowflake
June 9, 2021