How to handle HIPAA concerns with cloud data warehouses

Achieve HIPAA compliance and reduce risk while offloading some of your operational burden.
August 24, 2021

How do you balance an accessible data warehouse with data protection and HIPAA compliance? To get the most value from your data, you should make it available to everyone in your organization who can benefit from the analysis and insights it supports. But if you’re a US-based organization that deals with sensitive personal health data, you will need to strike a balance to remain in compliance.

HIPAA regulations are designed to keep sensitive personal health information private, and to require organizations to take precautions that comply with their provisions. Let’s break the process, and those provisions, down in detail.

Any data warehouse whose provider will sign a Business Associate Agreement (BAA) can be configured to be HIPAA-compliant by following certain practices for handling and exchanging protected health information (PHI). Complying with HIPAA is a shared responsibility between your organization and the data warehouse provider you use. That could be Amazon Redshift, Snowflake, Google BigQuery, Microsoft Azure Synapse Analytics, or another provider. Fivetran supports the vast majority of data warehouse providers, including all those just listed.

While a HIPAA-compliant data warehouse provider manages operational issues, it’s incumbent upon each business to properly configure the safeguards necessary for HIPAA compliance. You should also be aware that the US Department of Health and Human Services, which administers HIPAA, doesn’t recognize any certification for HIPAA compliance.

Read the Fivetran Security Whitepaper for a deeper treatment of ETL security.


HIPAA lays out a Security Rule that establishes national standards to protect individuals’ PHI. It specifies administrative, physical, and technical safeguards meant to ensure the confidentiality, integrity, and availability of PHI. Administrative safeguards include risk analysis, risk management, a sanction policy against anyone who fails to comply with the procedures, and reviews of system activity via audit logs, access reports, and security incident tracking reports. Physical safeguards include protecting systems, equipment, and buildings from natural hazards and unauthorized intrusion. Technical safeguards include access control and encryption.

Let’s look at how these safeguards play out when it comes to using a data warehouse.

System access: HIPAA applies a “minimum necessary” standard when it comes to granting access to systems. Users should be granted the least privileges they need to perform their duties. All HIPAA-covered entities and business associates must restrict access to, and disclosure of, PHI to the minimum amount necessary. Thus, if you’re concerned about HIPAA compliance, you should learn about identity and access management (IAM) best practices.

When it comes to covered systems, only necessary staff should have access to the data warehouse. An often overlooked corollary to that rule is that organizations should use separate development and production environments and avoid storing real PHI in development environments.

Most systems support, and most organizations use, role-based access control (RBAC) to restrict access to systems and data. With RBAC, individuals authenticate with their own credentials and are assigned roles that control what rights they have to the data. Different systems provide different roles, and for HIPAA compliance, it’s best to be able to control access not only to systems, databases, and tables, but even to individual columns within tables. The major cloud data warehouses provide these capabilities, which are key to creating a HIPAA-compliant database.
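To make the idea of column-level RBAC concrete, here is a minimal Python sketch. It is not how any particular warehouse implements access control; real systems express this with their own grant statements or policy objects. The role names, user names, table, and columns are all hypothetical.

```python
# Hypothetical roles and grants for illustration only. Each role lists the
# columns it may read, per table.
ROLE_GRANTS = {
    "billing_analyst": {"patients": {"patient_id", "insurance_plan"}},
    "clinician": {"patients": {"patient_id", "name", "diagnosis"}},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {"alice": {"billing_analyst"}, "bob": {"clinician"}}


def allowed_columns(user: str, table: str) -> set[str]:
    """Return the union of columns granted by every role the user holds."""
    columns: set[str] = set()
    for role in USER_ROLES.get(user, set()):
        columns |= ROLE_GRANTS.get(role, {}).get(table, set())
    return columns


def authorize(user: str, table: str, requested: list[str]) -> list[str]:
    """Return the requested columns, or raise if any of them is not granted."""
    denied = set(requested) - allowed_columns(user, table)
    if denied:
        raise PermissionError(
            f"{user} may not read {sorted(denied)} from {table}"
        )
    return requested
```

In this sketch an unknown user holds no roles and therefore sees no columns, which mirrors the least-privilege default described above.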

Which brings up another point: We’ve been focusing on cloud data warehouses, given their advantages. HIPAA compliance is far more complicated for organizations that maintain legacy on-premises data warehouses. If your organization is one of them and you’re looking to add HIPAA compliance, we suggest treating that need as a compelling reason to migrate from your old system to a more scalable, more available, and less expensive cloud data warehouse.

The migration to a cloud-based data warehouse may seem costly, but when it’s complete your analysts will have faster access to data and your operating costs will go down. With the right processes and practices, you also reduce your compliance risk, because the provider takes on responsibility for operational issues.

Data warehouses change all the time, with the addition of new tables and changes in schemas. Keeping your security roles up to date to avoid accidentally exposing data is a challenge. One way to ease the challenge is to avoid replicating new tables and columns from data sources without explicitly specifying how they’ll be secured at the data warehouse destination. You can use column masking or column blocking in the data pipeline to avoid storing sensitive data. Or you can create bucketed datasets, where you group records (for example, “women aged 18-35”) rather than maintain identifiable PHI.
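The three techniques in the previous paragraph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Fivetran or any other pipeline implements them: the column names, the policy table, the age ranges, and the use of a keyed SHA-256 hash as the mask are all hypothetical choices. Note that a keyed hash is a form of pseudonymization and does not by itself de-identify data.

```python
import hashlib
import hmac

# Hypothetical per-column policy. "block" drops the column, "mask" replaces
# the value with a keyed hash, "bucket" coarsens it, "pass" copies it as-is.
COLUMN_POLICY = {
    "patient_id": "mask",
    "ssn": "block",
    "age": "bucket",
    "visit_count": "pass",
}


def mask(value: str, key: bytes) -> str:
    """Keyed hash: the same input yields the same token without exposing it."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def age_bucket(age: int) -> str:
    """Coarsen an exact age into a range (ranges chosen for illustration)."""
    if age < 18:
        return "0-17"
    if age <= 35:
        return "18-35"
    if age <= 64:
        return "36-64"
    return "65+"


def sanitize(row: dict, key: bytes) -> dict:
    """Apply the column policy to one source row before it is loaded."""
    out = {}
    for column, value in row.items():
        # Columns with no explicit policy are blocked, so a newly added
        # source column is not replicated until someone classifies it.
        action = COLUMN_POLICY.get(column, "block")
        if action == "pass":
            out[column] = value
        elif action == "mask":
            out[column] = mask(str(value), key)
        elif action == "bucket":
            out["age_range"] = age_bucket(int(value))
        # "block": the column is simply not copied to the destination.
    return out
```

The default-to-block lookup is the part that addresses schema drift: an unrecognized column never reaches the destination by accident.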

In accordance with the principle of least privilege, you should also have policies in place that allow access to be granted only for a specific purpose, and for a specific length of time. In other words, the fact that a given role has the privilege to see certain data doesn’t mean it should be able to see that data under all circumstances.

Not all data warehouses build in this capability, so it may be something you need to build into your own processes. One approach is to allow no access by default, but when someone with the proper permissions is authenticated, have the system create a user account on the fly that has access only to the necessary resources, and automatically remove or disable it when the task is completed.
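One way to picture that approach is the following Python sketch of purpose-bound, time-boxed grants. It only models the bookkeeping; in practice the grant and revoke steps would create and remove real accounts or roles in the warehouse. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    user: str
    resource: str
    purpose: str          # why access was granted, kept for later review
    expires_at: datetime  # access ends at this instant


class TemporaryAccess:
    """Deny by default; access exists only while an unexpired grant does."""

    def __init__(self) -> None:
        self._grants: list[Grant] = []

    def grant(self, user: str, resource: str, purpose: str,
              ttl: timedelta, now: datetime) -> Grant:
        """Record a grant that expires ttl after now."""
        new_grant = Grant(user, resource, purpose, now + ttl)
        self._grants.append(new_grant)
        return new_grant

    def is_allowed(self, user: str, resource: str, now: datetime) -> bool:
        """True only if a matching grant exists and has not yet expired."""
        return any(
            g.user == user and g.resource == resource and now < g.expires_at
            for g in self._grants
        )

    def revoke_expired(self, now: datetime) -> int:
        """Drop expired grants and return how many were removed."""
        before = len(self._grants)
        self._grants = [g for g in self._grants if now < g.expires_at]
        return before - len(self._grants)
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the expiry logic easy to test.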

Encryption: HIPAA’s Security Rule lists encryption of PHI as an addressable safeguard, and in practice organizations should treat it as a requirement. Data should be encrypted both at rest, within the data warehouse, and in transit between the warehouse and the client systems that run queries against it. Transport Layer Security (TLS), the successor to the older Secure Sockets Layer (SSL) protocol, is the encryption protocol most organizations use for data in transit. Backups and log data should also be encrypted.
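On the client side, encryption in transit largely comes down to refusing weak or unverified connections. As a small sketch, Python’s standard `ssl` module can build a context that verifies the server certificate and rejects anything older than TLS 1.2. The TLS 1.2 floor is our assumption here, not a figure taken from HIPAA, and how you hand the context to a database driver depends on the driver.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate checks enabled."""
    # create_default_context() enables certificate and hostname verification.
    context = ssl.create_default_context()
    # Refuse SSL and early TLS versions (the TLS 1.2 floor is an assumption).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```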

To maintain privacy, some systems support homomorphic encryption, which lets you perform computations on encrypted data without decrypting it first. Only the results are decrypted, so the underlying personally identifiable information (PII) is never exposed during processing.

Audit logging: HIPAA requires that organizations collect and analyze audit logs related to PHI access to detect suspicious activity. Organizations should review these logs on a regular schedule to make sure they remain in compliance.
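What counts as suspicious depends on your organization, but a review usually starts with simple rules. The Python sketch below flags two patterns: PHI reads by users who are not on an approved list, and unusually high read counts by a single user. The event fields, the approved-user list, and the threshold are hypothetical; each warehouse exposes access history through its own views or log exports.

```python
from collections import Counter


def flag_suspicious(events, phi_tables, authorized_users, max_reads_per_user=100):
    """Scan audit events and return a list of findings as tuples.

    events: iterable of dicts with "user" and "table" keys (assumed shape).
    """
    findings = []
    reads = Counter()
    for event in events:
        if event["table"] not in phi_tables:
            continue  # only PHI tables are in scope for this review
        if event["user"] not in authorized_users:
            findings.append(("unauthorized_user", event["user"], event["table"]))
        reads[event["user"]] += 1
    for user, count in reads.items():
        if count > max_reads_per_user:
            findings.append(("high_volume", user, count))
    return findings
```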

Availability: One of the least-considered provisions of HIPAA requires that PHI remain available in an emergency, even during service outages. Organizations should have redundant resources, available backups, and a disaster recovery policy and plan. If the provider has the capability, the data warehouse should be configured for high availability across multiple data centers and availability zones to minimize the impact of a service outage.

Putting best practices into play

To follow these practices, your organization should start by establishing policies and procedures that cover how sensitive data is to be used. You should classify data and state which tables and columns fall into the category of PHI. You can then assign appropriate access controls using the security mechanisms offered by the data warehouse provider.
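A classification can be as simple as a catalog that labels each column and a check that reports anything not yet labeled. The following Python sketch assumes a hypothetical catalog keyed by table and column; the labels and names are illustrative only.

```python
# Hypothetical classification catalog keyed by (table, column).
CLASSIFICATION = {
    ("patients", "patient_id"): "phi",
    ("patients", "name"): "phi",
    ("patients", "signup_source"): "non_phi",
}


def phi_columns(table: str) -> list[str]:
    """Columns of the given table that the catalog labels as PHI."""
    return sorted(
        column
        for (t, column), label in CLASSIFICATION.items()
        if t == table and label == "phi"
    )


def unclassified_columns(schema: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Columns present in the schema but missing from the catalog."""
    return sorted(
        (table, column)
        for table, columns in schema.items()
        for column in columns
        if (table, column) not in CLASSIFICATION
    )
```

Running the second check whenever a schema changes gives you a list of columns that need a decision before access controls are assigned.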

You must monitor data access to ensure that it’s happening only as necessary and only by appropriate people. You should also have your HIPAA data compliance practices audited at least annually to make sure they still comply with HIPAA guidelines.

As we said at the start, a data warehouse should be useful to as many people as possible — but the more security you layer in, the less useful it becomes. The challenge for data analytics professionals is to find a balance.

For example, an organization may maintain compliance by creating a staging table of redacted or anonymized data for analysts to use, or by using column masking or column blocking. But it should try not to make privacy controls so aggressive that they strip out the value analysts need from the data.

The role of data integration software

This post has been about data warehouses, but you can’t have a data analytics stack without data integration software like Fivetran, and a data pipeline can help you achieve HIPAA compliance.

If you identify PHI in a data source, you can decide not to replicate it to your data warehouse, or replicate only certain tables and columns. A reliable automated data pipeline helps ensure data availability and timeliness.

We’ve also focused only on HIPAA here; our security white paper covers other common privacy and security certifications your organization might care about.

Fivetran supports all the major cloud data warehouses: Amazon Redshift, Snowflake, Google BigQuery, Microsoft Azure Synapse Analytics, and more. We can be part of a robust HIPAA-compliant data analytics solution. Ask us for a demo.
