CDO Blog Series: Part 3 – Key Questions on Data Protection
Data protection spans multiple dimensions. These include managing access to sensitive data, enforcing segregation of duties, applying dynamic and at-rest protection at the data level, securing data in development and testing environments across operational and analytical workloads, and ensuring compliance with regulatory requirements such as data sovereignty, privacy, security, and governance, among others.
As a core asset contributing to the success of many organizations, data must be made available to different users (data consumers) to support their various needs and daily operations.
With data scattered across multiple on-premises and cross-cloud data stores, accessed and processed by various data consumers using a variety of tools and technologies, including analytical tools, cloud-native products, business applications, and data sharing agreements, ensuring consistent data protection across the organization requires a delicate balance between business operations, security, and regulatory requirements.
Balancing business operations, data protection, and customer privacy requirements requires Chief Data Officers (CDOs), Chief Information Security Officers (CISOs), and executive leadership to make informed decisions while understanding the business implications of various data protection approaches.
Regulated organizations, alongside other companies handling sensitive data, must gain a comprehensive understanding of various data operations and data processing requirements. These include:
- Privacy regulations such as GDPR, CPRA, PDPA, PDA, and others, which introduce an extensive set of data protection and data subject rights articles. These encompass:
  - Records of processing, involving monitoring and tagging.
  - The right to be forgotten, which includes both soft and hard deletion of customer data.
  - Consent enforcement for specific usage and processing.
  - Security by design and default, utilizing techniques such as Format-Preserving Encryption (FPE) and tokenization.
- Data sharing agreements, which require de-identification of sensitive data when sharing it with partners, vendors, and offshore employees. This applies to both data file exports and imports, as well as providing real-time access.
- Sovereignty laws, which necessitate ensuring that customer data in clear text is never accessible within Cloud platforms and that Cloud account administrators cannot employ any methods that would return clear-text data to the Cloud platform.
UNDERSTANDING THE BUSINESS ECOSYSTEM
To enforce data privacy, security, and governance across an organization’s operations, it is essential to have a comprehensive understanding of the end-to-end flow of data. This involves addressing questions such as:
- What data is stored on-premises, and what is stored in the cloud? Is the data classified?
- Who currently HAS access to sensitive data, and who SHOULD have access?
- How is the data being accessed, and which data processing technologies are being employed?
- Which data should be protected at rest, and who should have access to it in clear-text form?
- Which regulatory frameworks are applicable to the business, such as GDPR, HIPAA, and data sovereignty regulations, among others?
Modeling the technology architecture alongside analyzing data flows and data consumers is crucial for establishing a consistent data protection model across the IT infrastructure, supporting end-to-end business processes.
UNDERSTANDING YOUR DATA
Given the complexity of the technology stack used, which often incorporates both on-premise data stores and cloud platforms, it becomes crucial to undertake key activities during the planning phase. These activities include understanding the type of data being stored and processed, classifying it, and locating it.
Organizations may classify data in various ways to suit their specific needs. For instance, the US Department of Defense (US-DOD) employs three classification levels: Top Secret, Secret, and Confidential. Others adopt different variations, such as Public data, Private data, Internal data, Confidential data, and Restricted data.
Each classification level is defined based on the potential impact on the organization’s customers’ privacy, data security, reputation, and business resilience. Consequently, each level should be treated differently from a data protection perspective. Further categorizing these classifications provides an additional layer of abstraction, simplifying the definition of enforcement policies.
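The idea of grouping classification levels into broader categories can be sketched in a few lines of Python. This is only an illustration: the category names ("open", "controlled", "regulated"), the protection labels, and the choice to treat unclassified data as strictly as possible are assumptions made for the example, not a prescribed scheme.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PRIVATE = "private"
    CONFIDENTIAL = "confidential"
    RESTRICTED = "restricted"


# Hypothetical categories: the extra layer of abstraction over the levels.
CATEGORY_BY_LEVEL = {
    Level.PUBLIC: "open",
    Level.INTERNAL: "controlled",
    Level.PRIVATE: "controlled",
    Level.CONFIDENTIAL: "regulated",
    Level.RESTRICTED: "regulated",
}

# Enforcement policy is defined once per category, not once per level.
PROTECTION_BY_CATEGORY = {
    "open": "none",
    "controlled": "dynamic_masking",
    "regulated": "tokenize_at_rest",
}


def protection_for(level: Optional[Level]) -> str:
    """Return the protection to apply for a classification level.

    Unclassified data (None) is treated as the strictest category
    (an assumption made for this sketch).
    """
    category = "regulated" if level is None else CATEGORY_BY_LEVEL[level]
    return PROTECTION_BY_CATEGORY[category]
```

With this structure, adding a new classification level only requires assigning it to an existing category; the enforcement policies themselves stay unchanged.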
DATA STORAGE AND ITS JOURNEY
Generated from multiple sources, internal and external, data is collected, aggregated, processed, replicated and stored in multiple data-stores, both on-premise and in one or more cloud platforms.
The use of cloud platforms for analytical and operational workloads is surging, enabling organizations to leverage new capabilities and technologies to foster growth. At the same time, it further complicates data security and privacy, as data leaves the well-contained on-premise environment for a shared platform managed by a third-party provider.
The cloud providers’ ‘Shared Responsibility Model’ means the infrastructure security is within the cloud provider’s responsibility, while data-security, access-control, etc. are for the customer to enforce.
As directed by different regulatory frameworks and security guidelines, customers must ensure that sensitive data stored and processed in the cloud is secured at all times, from ingestion to consumption.
WHO HAS ACCESS TO SENSITIVE DATA AND WHO SHOULD?
Protecting data, either at rest (i.e., encrypted or tokenized within the data store) or in use (using dynamic masking and filtering), is merely one capability within your data protection strategy. Access to data needs to be granted on a need-to-know basis, regardless of how the data is stored.
End-to-end visibility into every data access and data processing activity is critical for understanding both your current risk (e.g., everybody has access to everything) and the gap to the desired ‘access on a need-to-know basis’ approach.
Different data consumers require access to data for a specific purpose and, in some cases, subject to consent. Understanding who has access to sensitive data (roles, users, APIs, etc.) is a fundamental step for designing and implementing the desired data access policy.
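The gap between who HAS access and who SHOULD have access can be expressed as a simple set difference. The sketch below assumes you have already exported the current grants and drafted a need-to-know policy; the role names, column names, and the `excess_access` helper are all hypothetical.

```python
# Hypothetical export of current grants: principal -> columns it can read.
CURRENT_GRANTS = {
    "analyst_role": {"customers.email", "customers.national_id", "orders.total"},
    "support_role": {"customers.email", "orders.total"},
    "billing_api": {"customers.national_id", "orders.total"},
}

# Hypothetical need-to-know policy: principal -> columns it should read.
NEED_TO_KNOW = {
    "analyst_role": {"orders.total"},
    "support_role": {"customers.email", "orders.total"},
    "billing_api": {"orders.total"},
}


def excess_access(current, required):
    """Return, per principal, the columns granted beyond need-to-know."""
    report = {}
    for principal, granted in current.items():
        # A principal missing from the policy should have no access at all.
        extra = granted - required.get(principal, set())
        if extra:
            report[principal] = sorted(extra)
    return report


report = excess_access(CURRENT_GRANTS, NEED_TO_KNOW)
```

Here `report` lists only the principals whose grants exceed the policy, which gives a starting point for tightening access rather than a complete risk assessment.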
DATA ACCESS & PROCESSING TECHNOLOGIES
While the state of the data in the data store is important in itself, it is only one part of the overall data process. Different data consumers access and process data stored in various on-premise and cloud data stores, from different locations (office, home, coffee shop, etc.), using a variety of technologies such as business applications and analytical, reporting, support, and DBA tools. Such tools provide basic security controls; however, these controls are limited by nature and require setup, implementation, and maintenance in each technology separately. Moreover, data consumers tend to use more than one access tool for data processing, which means that consistent access policies must be configured across all data stores and access technologies to enable business operations. At the same time, different data consumers, each with a different purpose and need-to-know basis, may be using the same technology, requiring multiple policies to be defined and enforced within each technology separately, creating further complexity and risk.
WHERE IS THE DATA REIDENTIFICATION TAKING PLACE?
The most fundamental question for cloud data security is where the reidentification function is executed. If the answer is ‘in the Cloud’ or ‘using an API/UDF/External Function’, your data is simply not secured.
Once sensitive data is encrypted at rest, granting access to it in clear text requires executing a reidentification function. Cloud service providers offer CloudKMS and Bring Your Own Key (BYOK), enabling reidentification of data upon consumption from the Cloud platform.
This approach means that you, as the customer responsible for the data, have to share your encryption key with the Cloud provider. It raises several critical security and regulatory challenges:
- No Segregation of Duties (SoD): This is practically like giving the keys to your safe to someone else. Once you have shared the encryption key, the cloud administrator can run the decryption function and gain access to the data in clear text.
- Reidentification of the data is done in the cloud: The clear-text data is returned to the data platform and then sent to the user over the internet in clear-text format.
- Data sovereignty: Regulated in multiple countries under different articles (e.g., ePrivacy, Data Secrecy, etc.), data sovereignty requires global organizations and data-sharing processes to ensure that sensitive data cannot be accessed from offshore, by a user of a different nationality, and so on. Using a shared platform means sensitive and regulated data may be accessed by global users.
It is imperative to ensure reidentification of the data is NEVER done in the cloud and that sensitive data is selectively reidentified ONLY for authorized users.
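A minimal sketch of this principle, using vault-based tokenization as a stand-in for whichever de-identification technique you choose (it is not Format-Preserving Encryption). The `OnPremTokenVault` class, the role names, and the in-memory dictionaries are hypothetical; in practice the mapping would live in a hardened store outside the cloud platform. The cloud only ever holds tokens, and the reverse lookup runs outside the cloud, only for authorized roles.

```python
import secrets


class OnPremTokenVault:
    """Hypothetical vault that stays outside the cloud platform."""

    def __init__(self, authorized_roles):
        self._token_by_value = {}
        self._value_by_token = {}
        self._authorized_roles = set(authorized_roles)

    def tokenize(self, value: str) -> str:
        """De-identify a value before it is ingested into the cloud."""
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        # The same value always maps to the same token, so joins still work.
        return self._token_by_value[value]

    def read(self, token: str, role: str) -> str:
        """Re-identify only for authorized roles; others keep the token."""
        if role in self._authorized_roles:
            return self._value_by_token.get(token, token)
        return token


vault = OnPremTokenVault(authorized_roles={"fraud_investigator"})

# What the cloud data store receives: a token, never the clear-text value.
cloud_row = {"customer_id": 17, "national_id": vault.tokenize("123-45-6789")}

clear_value = vault.read(cloud_row["national_id"], role="fraud_investigator")
admin_view = vault.read(cloud_row["national_id"], role="cloud_admin")
```

Because neither the vault nor any key is shared with the cloud provider, a cloud administrator querying the platform sees only the token, while the authorized role receives the clear-text value on the customer side.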
HOW TO PROCEED
As data is distributed throughout the organization and utilized by various data consumers, it is crucial to avoid a narrow, feature-specific, or point-solution approach that only addresses immediate issues. A data-centric security approach demands the following:
- A well-defined, forward-looking scope that aligns with the organization’s data, cloud, and digital transformation initiatives, encompassing cloud analytics workloads, cloud-native applications, multi-cloud environments (e.g., AWS and GCP), and hybrid cloud operations (i.e., both cloud and on-premises). This entails the abstraction of security from the data and cloud platforms and the utilization of a centralized dashboard for managing all cloud data operations.
- Data resides in cloud data warehouses, data lakes, and data lakehouses, in various formats, as well as in commonly used databases. This means your data protection platform must be able to protect both SQL and non-SQL data stores.
- Integration with third parties in your data and IT ecosystem is critical for ensuring business as usual and streamlining operations. The data protection tool must integrate with governance tools such as Collibra, Informatica, and Precisely, and with identity stores such as Okta, Azure AD, and others, providing end-to-end automation that is consistently enforced, within context, over sensitive data, data processing tools, and data consumers.
- Many data consumers (personas) access data for many different reasons. Privacy requirements such as GDPR, sovereignty, and data security do not apply only to data analysts and data scientists or to analytical workloads, but to all pathways to the data. This means various data protection and privacy-enhancing controls must be consistently enforced across all data access and data processing technologies and consumers. This further requires a broader definition of which personas are in scope for monitoring and access control, including:
  - Privileged users: DBAs and other admin users represent a significant risk, as such users have access to the raw data as it is stored across the organization’s data stores. Covering the unique data protection needs of personas such as DBAs, developers, and offshore teams is critical for ensuring a robust sensitive data security posture.
  - Business users: Access to data through cloud applications and through home-grown and packaged applications requires contextual access control, de-identification, and data privacy requirements to be enforced across various business processes.
- Segregation of Duties (SoD): The data security solution should ensure that cloud or platform administrators cannot re-identify and access sensitive data. This means you cannot use an API, External Functions, or BYOK, as all of these ultimately allow the cloud administrator to gain access to your sensitive data in clear text. To mitigate this risk:
  - Use a technology that makes it impossible to ingest sensitive data into the Cloud without de-identifying it first.
  - Ensure the key is not shared with the Cloud service provider or the data platform providers.
  - Avoid using APIs or External Functions to re-identify the data, as these activities will expose the data in clear text while in the cloud.
  - Ensure clear-text access to the sensitive data is made available only to authorized users, back on premises or within your VPC.
- Monitor and analyze all data access and processing activities across the full data sets and data consumers. Detect suspicious activity by setting risk scores, behavior analysis, and by alerting and blocking requests in real-time.
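The monitoring point above can be illustrated with a toy risk-scoring rule. The sketch below is only an assumption of how such scoring might look: the `AccessEvent` fields, the weights, and the alert and block thresholds are hypothetical and would need to be tuned against real access behavior.

```python
from dataclasses import dataclass


@dataclass
class AccessEvent:
    user: str
    rows_read: int
    sensitive_columns: int
    offshore: bool


def risk_score(event: AccessEvent, baseline_rows: int) -> int:
    """Score an access request from 0 to 100 using hypothetical weights."""
    score = 0
    if event.sensitive_columns > 0:
        score += 30  # touches sensitive data
    if event.offshore:
        score += 30  # request originates offshore
    if event.rows_read > 10 * baseline_rows:
        score += 40  # far above this user's usual volume
    return score


def decide(score: int, alert_at: int = 50, block_at: int = 80) -> str:
    """Map a score to a real-time action: allow, alert, or block."""
    if score >= block_at:
        return "block"
    if score >= alert_at:
        return "alert"
    return "allow"


bulk_export = AccessEvent("dba1", rows_read=50_000, sensitive_columns=3, offshore=True)
routine_query = AccessEvent("analyst7", rows_read=50, sensitive_columns=0, offshore=False)

bulk_decision = decide(risk_score(bulk_export, baseline_rows=100))
routine_decision = decide(risk_score(routine_query, baseline_rows=100))
```

In this example the bulk export of sensitive columns from offshore is blocked, while the routine query is allowed; a real deployment would derive the baseline from observed behavior per user rather than a fixed number.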