Organizations must be security conscious for the following reasons:
- Regulatory compliance
- Managing brand risk
- Protecting customers from identity theft and breaches of privacy
- Protecting internal operations, sensitive data and trade secrets
- Ensuring system availability
- Basic ethics
Data security is fundamentally about preventing the exposure of sensitive data such as personally identifiable information (PII) to unauthorized parties. In the context of data integration or data movement, data security should be applied in the pipeline before data is loaded to a destination.
As you decide what data movement platform to adopt, use the following five criteria to evaluate security.
1. Compliance with laws, regulations and industry standards
A number of laws, regulations and industry standards or certifications govern the safe and secure use of data. The following security standards cover a range of industries and jurisdictions:
- SOC 1 Type 2 – a data platform should undergo an annual, independent SOC 1 Type 2 audit. This standard allows customers to process data in the platform that is material for financial reporting.
- SOC 2 Type 2 – a data platform should undergo an annual, independent SOC 2 Type 2 audit. This standard demonstrates common security, availability and confidentiality controls are in place within the platform.
- ISO 27001 – this international information security standard requires a vendor to:
• Systematically account for information security risks
• Design and implement information security controls and contingencies
• Maintain plans to ensure continued and ongoing compliance
- PCI DSS Level 1 – this is the most stringent compliance level of the Payment Card Industry Data Security Standard (PCI DSS), mandatory for merchants that process more than 6 million card transactions a year, such as large retailers.
- HIPAA BAA – although data platforms are not healthcare providers or other HIPAA-covered entities, they should comply with HIPAA’s standards for protected health information (PHI) and should sign a business associate agreement (BAA) with parties who directly handle healthcare data.
- GDPR – an EU-wide privacy regulation granting end users the following basic rights regarding their personal data:
• The right to access
• The right to be informed
• The right to data portability
• The right to be forgotten
• The right to object
• The right to restrict processing
• The right to be notified
• The right to rectification
- CCPA – similar to but more expansive than GDPR, CCPA is a California law that covers household as well as personal data.
Third-party penetration testing can help verify that a vendor’s security claims are credible. Make sure any data platform you evaluate complies with whichever of the standards above are relevant to your organization.
2. Column masking
Column masking allows the user of a data platform to identify and obscure sensitive data before it lands in a destination. It takes two main forms:
- Blocking data by preventing it from entering the destination altogether. Note that primary keys, due to their importance for idempotence and deduplication, cannot be blocked. Blocked data may still pass through the platform’s systems and be stored temporarily but will not be accessible through either the user interface or the destination.
- Hashing data by anonymizing and obscuring it while preserving its analytical value. You will still be able to use hashed columns as keys to join records across data sets, but you will not be able to read the original values. Unlike encryption, hashing is one-way and not intended to be reversible. Make sure the data platform uses a unique salt for every destination so that general knowledge of the hashing algorithm isn’t enough to decode hashed values, as illustrated in the sketch below.
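To make this concrete, here is a minimal sketch in Python of how per-destination salted hashing and column blocking might be applied in a pipeline step before load. The column names, salt value and helper functions are illustrative assumptions, not any particular platform’s implementation.

```python
import hashlib

# Illustrative values only: in a real platform the salt would be generated and
# stored per destination, never shared across destinations.
DESTINATION_SALT = b"unique-salt-for-this-destination"

BLOCKED_COLUMNS = {"ssn"}    # excluded from the destination entirely
HASHED_COLUMNS = {"email"}   # anonymized, but still usable as a join key

def hash_value(value: str) -> str:
    """One-way, salted hash: knowing the algorithm alone is not enough to reverse it."""
    return hashlib.sha256(DESTINATION_SALT + value.encode("utf-8")).hexdigest()

def mask_row(row: dict) -> dict:
    """Apply column masking before the row is loaded to the destination."""
    masked = {}
    for column, value in row.items():
        if column in BLOCKED_COLUMNS:
            continue  # blocked columns never reach the destination
        if column in HASHED_COLUMNS and value is not None:
            masked[column] = hash_value(str(value))
        else:
            masked[column] = value
    return masked

print(mask_row({"id": 42, "email": "jane@example.com", "ssn": "123-45-6789"}))
# -> {'id': 42, 'email': '<64-character SHA-256 digest>'}
```

Because identical inputs produce identical digests under the same salt, hashed columns still work as join keys within a destination, while a different salt in another destination yields unrelated digests.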
Look for a platform that implements these features as far upstream as possible in order to minimize the risk of exposure.
3. Access control
There are two ways to think about access control. The first concerns access by external parties, including the vendor that provides the data movement platform. The second concerns internal access within your own organization.
In the first case, data platforms should securely store your credentials and allow you to revoke access at any time. In the second case, a data platform should offer role-based access control (RBAC). This enables fine-grained control over:
- Onboarding
- Access
- Auditing
- Monitoring traffic
You should be able to create roles with specific, granular kinds of authority and access, and to easily assign users to those roles at scale.
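As a hedged illustration of what role-based access control can look like, the sketch below maps hypothetical roles to granular permissions and assigns users to roles. The role names, permissions and user identities are assumptions for the example only, not any specific platform’s model.

```python
# Hypothetical RBAC model: roles map to granular permissions, and users are
# assigned to roles rather than being granted permissions directly.
ROLES = {
    "admin":   {"manage_users", "configure_connectors", "view_logs", "view_data"},
    "analyst": {"view_data"},
    "auditor": {"view_logs"},
}

USER_ROLES = {
    "alice@example.com": {"admin"},
    "bob@example.com":   {"analyst", "auditor"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user may perform an action if any of their roles grants the permission."""
    return any(permission in ROLES[role] for role in USER_ROLES.get(user, set()))

assert is_allowed("alice@example.com", "configure_connectors")
assert not is_allowed("bob@example.com", "manage_users")
```

Assigning users to roles, rather than granting permissions one by one, is what makes onboarding, auditing and revocation manageable at scale.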
4. End-to-end encryption
A data platform should encrypt data in transit and credentials using the following methods:
- Credentials are encrypted through a key management service and optionally dual-encrypted using a customer master key, to which you can revoke access at any time.
- All communication between the customer’s cloud and the provider’s cloud is conducted through PrivateLink, VPN or SSH.
- Data in transit is encrypted and decrypted using ephemeral keys, meaning that a unique key is generated for each execution.
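The following is a minimal sketch of the ephemeral-key idea using AES-GCM from the open-source cryptography package: a fresh data key is generated for each execution, used once and then discarded. Wrapping that key with a KMS-managed key (and optionally a customer master key) is only indicated in comments, since those details vary by provider.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

def encrypt_batch(plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    """Encrypt one sync's payload with an ephemeral key that is never reused."""
    ephemeral_key = AESGCM.generate_key(bit_length=256)  # fresh key for this execution
    nonce = os.urandom(12)                               # 96-bit nonce for AES-GCM
    ciphertext = AESGCM(ephemeral_key).encrypt(nonce, plaintext, None)
    # In a real pipeline the ephemeral key would itself be wrapped (encrypted) with a
    # KMS-managed key, optionally dual-encrypted with a customer master key.
    return ephemeral_key, nonce, ciphertext

def decrypt_batch(ephemeral_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Decrypt on the receiving side, after which the key can be discarded."""
    return AESGCM(ephemeral_key).decrypt(nonce, ciphertext, None)

key, nonce, ct = encrypt_batch(b"rows extracted for this sync")
assert decrypt_batch(key, nonce, ct) == b"rows extracted for this sync"
```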
The entire architecture should adhere to the principle of “least privilege” – that is, only the minimum permissions necessary for any task are granted. All customer data should be purged from the provider’s infrastructure after it is synced to the destination. The only exceptions should be the following, which are required to maintain the continued functioning of connectors:
- Customer access keys – the provider must access destinations in order to extract and load data to them
- Customer metadata – the provider stores configuration and other account details in order to display them to you
- Data from email and event stream connectors – since these sources don’t persistently store data, the provider does so in case future re-syncs are ever needed
The system’s architecture should also maintain layers of strict separation between anything a user directly interacts with and the pipeline itself, so that, by design, data cannot be erroneously exposed through the user interface.
5. Flexible deployment methods
Specific industries, jurisdictions and use cases may all give you reasons to require data residency in particular regions. The data platform should accommodate a range of cloud providers and regions. These may include AWS, Azure and GCP, with a choice of over 20 major cloud regions worldwide, across North America, Europe, Asia and the Pacific. The platform should also support geographically bounded, US-only access.
Another consideration is private networking through services such as PrivateLink.
If you have a specific need to keep data on premises, consider a data platform that offers private, on-premises deployments for database replication.
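As a purely illustrative way to turn these deployment criteria into an evaluation aid, the sketch below expresses the requirements as data and checks a candidate platform’s capabilities against them. Every field name and value here is a hypothetical example, not a feature list for any particular vendor.

```python
# Hypothetical checklist, expressed as data, of the deployment choices described above.
deployment_requirements = {
    "cloud_providers": {"aws", "azure", "gcp"},           # acceptable providers
    "regions": {"eu-west-1", "europe-west3"},             # data residency constraint
    "private_networking": {"privatelink", "vpn", "ssh"},  # acceptable connection methods
    "on_premises_replication": True,                      # needed for one regulated database
}

def platform_satisfies(capabilities: dict) -> bool:
    """True if a candidate platform covers every stated deployment requirement."""
    return (
        bool(deployment_requirements["cloud_providers"] & capabilities["cloud_providers"])
        and bool(deployment_requirements["regions"] & capabilities["regions"])
        and bool(deployment_requirements["private_networking"] & capabilities["private_networking"])
        and (capabilities["on_premises"] or not deployment_requirements["on_premises_replication"])
    )
```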
[CTA_MODULE]