To be competitive, modern companies must be agile and make smart business decisions based on data, not hunches. Unfortunately, important business data is often housed in different departments and managed by disparate teams. This results in siloed thinking, and prevents leaders from gaining a holistic view of the business.
Here’s how the disparity might look: The sales team manages data related to the CRM, sales conversion and more. Support holds the keys to customer success metrics. Procurement knows the facts on supply chain management, and so on.
The ultimate goal is attaining a single source of truth. For this you need a solution that enables data-driven decisions and a quicker time to insight. Plus, it must integrate data from multiple sources for centralized analysis and business intelligence (BI).
This is where an enterprise data warehouse (EDW) comes in.
What is an enterprise data warehouse?
A data warehouse is defined as a centralized data repository, sometimes called a database of databases, for reporting and analytical purposes.
An enterprise data warehouse (EDW) is a database of databases that houses data from all areas of a business.
EDWs store data from multiple departments, sources and applications to make centralized analytics available across an enterprise. This data typically comes from different systems, including: on-premises sources such as production applications and physical records, cloud sources such as customer relationship management (CRM), enterprise resource planning (ERP) and other web-based applications.
The data housed within an EDW is one of a business’s most important assets. It contains critical information that captures a view of the entire business — one that does not exist anywhere else.
Traditionally, data warehouses were hosted in on-premises data centers. The advent of cloud computing, however, has enabled “serverless,” cloud-based data warehouses where compute and storage resources can be independently scaled up and down.
Modern cloud data warehouses have become extremely accessible to organizations with modest resources. Examples include Databricks, Snowflake, Google BigQuery and AWS Redshift.
Why you need an enterprise data warehouse
Without a centralized data warehouse, departments often work in data silos. Teams independently pull in and analyze data from various sources. However, this is an error-prone, unrepeatable approach as the analysis is subject to the biases of the particular data importer.
Instead, enterprises need a data storage solution that enables repeatable and accurate reporting within departments and across the organization, for a 360-degree view. A single source of truth supports the following:
Immediate access to actionable data
Data analysts (and the business teams they support) can quickly build insights off the full dataset.
EDWs and the company-wide data housed in them provide a big-picture view of questions facing the enterprise. For example, in an ecommerce company, marketing might track how ads on social and earned media are driving website traffic upwards.
Department-driven metrics have their place, but a bigger impact comes from joining data between departments. For instance, if sales and marketing share an EDW, both functions could fully track the customer’s online path to purchase and work together to optimize the buyer journey — from discovery to conversion.
In this case and countless others, EDWs provide broader context and demonstrate the relationships between individual data points. This knowledge offers a better understanding of what the information means and how it can be put to use.
Consolidated and standardized data
EDWs are unique in that they store massive data sets from all over a business’s operations. When effective data transformation is applied, the data can be used for accurate comparison across the enterprise.
The standardized data store can also help make sense of seemingly random pieces of data flowing into the organization, saving valuable time by automatically and systematically aggregating the information.
Empowerment of less technical team members
When an EDW is combined with a data-driven culture, insights start to flow. Data professionals and decision-makers collaborate to streamline processes and impact revenue across the organization. An EDW benefits non-technical team members in all departments, including marketing, finance, HR, procurement and more. In these cases, teams can identify new, wider-ranging KPIs and gauge results, allowing key personnel to plan accordingly.
Benefits of a cloud enterprise data warehouse
Now that the business need for an EDW has been established, let’s discuss the benefits of a cloud-based EDW versus an on-premises solution.
Key features that cloud data warehouses offer include:
- Speed and scalability
- Lower total cost of ownership (TCO)
- Cloud elasticity and integration capabilities
- Better ability to enable self-service capabilities for business users
The three main providers of cloud data warehousing are Snowflake, AWS Redshift and Google BigQuery. Here’s how they compare:
Snowflake's platform delivers a fully elastic and highly flexible data warehouse that can collect, store, query and share data sets from a range of disparate sources, from structured data to JSON. With a fully integrated data lake, secure data sharing, data exchange and data application development workloads, Snowflake can easily scale up, down or out. This flexibility allows it to handle the constantly fluctuating data needs of the modern data enterprise, across departments, business units, geographies and clouds.
BigQuery is similar to Snowflake, except there is no concept of a compute cluster, just a configurable number of "compute slots." BigQuery on demand is a pure serverless model, where the user submits queries one at a time and pays per query.
Redshift is a part of the AWS suite of services that is aimed at development teams. With Redshift, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational database and data lake using standard SQL. Redshift lets you easily save the results of your queries back to your S3 data lake using open formats, like Apache Parquet. Then, you can add analytics from other services like Amazon EMR, Amazon Athena and Amazon SageMaker.
For more detailed information, see our 2020 Cloud Data Warehouse Benchmark.
How to evaluate enterprise data warehouses
Along with the main providers mentioned, there’s a slew of vendors offering centralized data storage in a cloud data warehouse, so beginning your search might seem daunting. While each company has specific needs, here are some key selection criteria that can help guide your decision.
Select a warehouse that complements your ecosystem
A cloud data warehouse should align with your business model and fit with existing systems. As mentioned, Snowflake, AWS, and Google Cloud all offer outstanding data warehouse options, as do Microsoft Azure and Databricks. Consider the rest of your infrastructure and existing data tool ecosystem to make certain your company’s data types and existing ecosystem is congruent with your EDW choice.
For example, conduct a compatibility assessment to confirm your cloud data warehouse provider will pair well with your data transformation, BI and data integration/ETL tools of choice.
Cloud data warehouse providers have different ways of calculating costs for compute and storage. Depending on your usage model, these cost configurations can have a drastic impact on your costs from month to month.
Perform a usage audit to check your model against your vendor choice to ensure costs align with your budget. To help you evaluate costs of migrating platforms, see our post on data egress costs.
Also, be sure to consider the labor cost associated with data migration and adoption of your new cloud data warehouse. Does your team have the internal resources to adopt the new data stack? Or will you need to engage with an external consultant? These questions are important to factor into your data warehouse decision.
Factor in the cost of scaling
As your company becomes more data mature, your data capacity and query volume and complexity will only rise. On top of the current data warehouse usage audit, build a projection of how these needs will grow over time. This exercise will help ensure you don’t spend on unneeded capacity.
Armed with this, ask your prospective vendors how cost will change as your needs expand and flex.
Don’t forget security
As your business data usage scales, so does the number of data sources. Security concerns for transferring and storing all this data increase as well.
Choose a cloud data warehouse that has features such as locking schemas, monitoring utilities, remote maintenance capabilities and similar functionality as baseline offerings.
Depending on your company’s use case, additional security components to shop around for include:
- Strong user authentication and authorization that will inhibit unauthorized access
- Data encryption
- Data erasure, which protects data with a complete override on physical storage to prevent malicious recovery
- Protection of data attributes via data shielding
Assess user access
In both regulated and unregulated organizations, ensuring proper data access is very important. When evaluating vendors, check for features such as read-only permissions, custom user groups with locked-down permissions, encrypted columns and tools to enable consistent account audit and clean up.
These access control processes vary widely among cloud data warehouse vendors.
Evaluate fault tolerance
Natural and manmade disasters are unavoidable, so your cloud data warehouse must have adequate fault tolerance. Ask questions such as: does your prospective cloud data warehouse provide adequate high availability and durability in the event of a catastrophic failure at a data center?
Specifically, find out if they provide proper backup as well as fault-tolerant and redundant hardware. These features and equipment will protect your data stack from loss due to power failures and potential equipment malfunctions.
Trust your peers
One last tip: Contact peers in your field — especially those who have similar data usage models to your company’s. Ask them their preferences on availability, ease of use and cost, and how they made their cloud data warehouse decision.