Big data requires big storage.
From data warehouses to data lakes and even data estates, a solid storage infrastructure is crucial for supporting modern businesses and applications — as long as it’s truly “modern,” that is.
But what makes a data warehouse “modern”?
Short answer: cloud infrastructure. Where traditional data warehouses and other storage solutions generally use on-premises infrastructure, modern data warehouses utilize cloud-based solutions to improve scalability, flexibility and, of course, return on investment.
However, the difference between traditional and modern data warehouses isn’t quite as simple as “cloud vs. on-premises.” In this article, we’ll explore the modern data warehouse, including how it differs from a traditional one, its benefits and how the right tools can help you build and migrate to one with little effort.
What is a modern data warehouse?
Modern data warehouses use cloud technologies to deliver more flexible data processing and analytics from various data sources.
Okay, so that’s a pretty dense definition of a modern data warehouse. While our definition does cover every part of what makes a data warehouse “modern,” each part deserves a little more explanation — especially how it contrasts with more traditional data warehousing.
The key difference is that modern data warehouses are cloud-based. All of the other differences (architecture flexibility, data sources, etc.) generally stem from this one.
But before we can truly compare each type of data warehouse, let’s take a closer look at the exact role data warehousing plays in the first place.
The purpose of data warehouses
A data warehouse is any system used for data analytics and reporting.
- Data analytics: Extracting useful information and insights from data that’s been collected, cleansed, transformed and modeled for a specific purpose. Common examples include business intelligence (BI) and machine learning.
- Reporting: Generating human-readable reports from the results of data analytics, often in the form of charts or graphs.
Just like an actual warehouse, data warehouses store data in a convenient location for completing these tasks. This data can come from sources like sensors, APIs and even blogs — similar to how an actual warehouse can take inventory from trucks, trains or other forms of transport.
Many traditional data warehouses are purpose-built for conducting analytics for business intelligence. As a result, they gather only specific sales data from specific sources, then conduct specific analyses to generate specific results like sales forecasts.
Modern data warehouses are far more flexible, providing a “mixed bag” of data from various sources that can be formatted and used for various applications. It’s as if several actual warehouses joined together to stock both pipe fittings and lumber, allowing construction companies to use whatever combination of those materials they might need.
Unfortunately, this is where our real-world analogy drops off. To provide such flexibility, modern data warehouses need to leverage equally flexible architecture.
Modern data warehouse architecture
Spoiler alert: A modern or cloud data warehouse doesn’t have any set architecture.
While that might seem anticlimactic, it’s one of the core benefits — if not the entire point — of modern data warehouses.
Since modern data warehouses are meant to serve any number of purposes, they must be able to adopt whatever architecture the problem at hand calls for. Traditional data warehouses, by contrast, are usually built around the Extract, Transform, Load (ETL) process, in which data is reshaped before it is loaded. That makes the star schema, a central fact table surrounded by dimension tables, perhaps their most common design.
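To make the star schema concrete, here is a minimal sketch using Python’s built-in sqlite3 module as a stand-in for a warehouse. The table names, columns and sample rows are hypothetical and exist only for illustration: one fact table of sales points at two dimension tables, and a typical BI query joins them.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and two dimension tables (all names are illustrative)."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year    INTEGER,
            month   INTEGER
        );
        -- The fact table holds measurements plus a key into each dimension.
        CREATE TABLE fact_sales (
            sale_id    INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id    INTEGER REFERENCES dim_date(date_id),
            amount     REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Pipe fitting", "Plumbing"), (2, "Lumber", "Wood")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(1, 2024, 1), (2, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 100.0), (2, 2, 1, 250.0), (3, 1, 2, 50.0)],
    )


def sales_by_category(conn):
    """Join the fact table to a dimension and aggregate, as a BI report might."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)
```

Calling build_star_schema on an in-memory connection and then sales_by_category returns one total per product category. The same shape carries over to a cloud warehouse, although the exact DDL will differ by vendor.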
Modern data warehouses are much less limited. Not only can they use star schema, but they can also use specialized architectures such as:
- Hybrid architectures: Utilize a combination of on-premises and cloud infrastructure, usually with on-premises resources only serving to augment the cloud where necessary.
- Massively Parallel Processing (MPP) architectures: Data processing is distributed across multiple nodes or servers.
- Lambda architectures: Process vast amounts of data using a combination of three layers (batch, speed and serving). Incoming data is fed to the batch and speed layers simultaneously. The batch layer periodically processes the complete raw data set, while the speed layer provides low-latency processing of recent data that the batch layer has not yet covered. The serving layer then answers queries in real time by combining the results of both. This architecture is common in big data applications.
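The lambda pattern in the last bullet can be sketched in a few lines of plain Python. This is a toy, in-memory illustration under simplifying assumptions: events are just keys to be counted, and the class and method names are hypothetical rather than taken from any particular framework. Each event goes to both the batch and speed layers, and the layer that answers queries merges the two views.

```python
from collections import defaultdict


class LambdaPipeline:
    """Toy lambda architecture that counts events per key."""

    def __init__(self):
        self.master_log = []                # append-only raw events (batch layer input)
        self.batch_view = {}                # counts precomputed by the last batch run
        self.speed_view = defaultdict(int)  # counts for events since the last batch run

    def ingest(self, key):
        # Every event is fed to both layers at the same time.
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        # The batch layer recomputes its view from the complete raw log.
        view = defaultdict(int)
        for key in self.master_log:
            view[key] += 1
        self.batch_view = dict(view)
        # The batch view now covers everything the speed layer had seen.
        self.speed_view.clear()

    def query(self, key):
        # Queries are answered by merging the batch and real-time views.
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)
```

In a real deployment the batch run would be a scheduled job over distributed storage and the speed layer a stream processor, but the division of labor is the same.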
Traditional vs. modern data warehouses
Location and architecture aren’t the only differences between modern and traditional data warehouses.
However, we’ll give credit where credit is due: these two key differences are what lead to all the others, such as differences in purpose, scope and even cost.
Here are some of the major differences to keep in mind.
Where traditional data warehouses are usually located on-site, modern data warehouses use cloud infrastructure to maintain flexibility. However, as mentioned earlier, modern data warehouses can also utilize a hybrid architecture where appropriate.
Since traditional data warehouses are hosted on physical, on-premises servers, they’re often purpose-built for particular workloads and data types. These workloads typically support decision-making for specific business areas.
By contrast, modern data warehouses process high volumes of various data (whether structured, semi-structured or unstructured) from various sources. As a result, they’re highly desirable for workloads where the volume of data exceeds the capacity of more traditional tools — a case that has become increasingly common.
Because they fulfill only specific purposes, traditional data warehouses are also limited to a specific scope, such as BI or online analytical processing (OLAP). Modern data warehouses are far less limited: they’re free to analyze and extract insights from big data, which is characterized by the four Vs (Volume, Variety, Velocity and Veracity).
Traditional data warehouses often source data from more conventional sources, such as operational or transactional databases. While modern data warehouses can also pull data from these sources, they aren’t limited to them.
Modern warehouses can draw on a much wider range of data sources, including social media feeds, sensors, blogs, audio and video. This capability is also made possible by cloud infrastructure, where interfaces and resources can be allocated on demand in a virtual environment.
Since most traditional data warehouses are built around ETL processes, the star schema is usually the most obvious architecture choice. Modern data warehouses can adopt various architectures to suit their workloads, including hybrid, lambda and MPP architectures. Note that most modern data warehouses use Extract, Load, Transform (ELT) rather than ETL: raw data is loaded first and transformed inside the warehouse, which helps them handle larger amounts of data.
While this flexibility may sound expensive, the opposite is often true: Cloud computing (and, by extension, a modern data warehouse) is frequently more affordable than a traditional, on-premises data warehouse.
Benefits of a modern data warehouse
By leveraging the flexibility and scalability of the cloud, organizations can handle larger and more varied workloads without spending time and money maintaining physical, on-premises data centers.
Here are some of the top benefits of modern data warehouses.
Lower upfront costs
Even as hardware costs continue to decrease, it’s still expensive to purchase your own equipment.
If the upfront costs weren’t enough, on-premises equipment also requires regular maintenance, increases power bills and depreciates over time to the point where it needs to be replaced.
By contrast, modern data warehouses’ cloud infrastructure takes advantage of economies of scale in computing. In other words, service providers dedicated to providing cloud resources have already invested millions of dollars in providing massive amounts of computing power — and the ongoing maintenance and upgrades that follow.
As a result, renting a relatively tiny fraction of cloud resources is much less expensive than purchasing and maintaining the equivalent on-premises.
Using cloud infrastructure shifts most ongoing maintenance to the provider. This not only saves money and time but also reduces the errors and downtime that come with scheduled updates, equipment replacement, security breaches and so on.
While cloud providers aren’t necessarily immune to these issues, they can usually guarantee near-constant availability and strong security while keeping hardware upgrades and replacements in the background.
Modern, cloud-based warehousing is typically much faster than its traditional, on-premises counterpart.
While part of this is due to the greater computing power and processing resources available, the use of ELT over ETL is another major contributor. ELT can better leverage data replication tools to load vast amounts of raw data at once and then transform it as needed, rather than transforming everything before it is loaded.
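As a rough illustration of the load-first, transform-later order, the sketch below uses Python’s built-in sqlite3 module as a stand-in for a cloud warehouse. The table names, columns and sample rows are hypothetical. Raw rows are loaded untouched, and the cleanup happens afterward in SQL, inside the “warehouse.”

```python
import sqlite3

# Hypothetical raw extract: everything arrives as text, some of it messy.
RAW_ORDERS = [
    ("1", " 19.50 ", "us"),
    ("2", "5.00", "DE"),
    ("3", "bad", "us"),
]


def load_raw(conn, rows):
    """Load step: copy the rows as-is, with no cleaning or type conversion."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform_in_warehouse(conn):
    """Transform step: build a clean table from the raw one using SQL."""
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT CAST(order_id AS INTEGER)   AS order_id,
               CAST(TRIM(amount) AS REAL)  AS amount,
               UPPER(country)              AS country
        FROM raw_orders
        WHERE TRIM(amount) GLOB '[0-9]*'   -- keep rows whose amount starts with a digit
        """
    )
```

Because the raw table is kept, a different transformation can be run later without re-extracting anything from the source, which is one reason the ELT order suits large, changing workloads.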
Since cloud infrastructure deploys virtual instances on top of physical, distributed hardware, providers can easily switch between various formats, data types and warehouse architectures — and even logically combine them. By contrast, traditional data warehouses that use relational databases are often limited to data of a similar type or format.
Easier to scale
Cloud flexibility also goes hand in hand with scalability. Again, since resources are allocated virtually, it’s possible to allocate more or fewer of them at a moment’s notice. This makes it easier to take on larger workloads and import more data as you find new ways to utilize your data warehouse.
Migrating to a modern data warehouse
Thankfully, modern data warehouses are just as easy — if not easier — to migrate to as they are to use. However, to choose the right solution, you’ll need to consider your needs, goals and processes, as well as select the right architectures and integrations.
While there’s no one solution for everyone, following these steps can help you find the right solution for your organization.
1. Define your data goals
What do you want to achieve with your data? If you already have data processes or a warehouse in place, what are your pain points or bottlenecks? Is any of your data siloed?
These are just a few questions to ask yourself when defining your data goals. Answering them can also give you a better understanding of your current data estate, allowing you to identify problem areas or inefficiencies that might benefit from a modern data warehouse.
2. Identify business needs
The data goals from the previous step are likely linked to some type of business need. Whether that’s analyzing financial data or providing machine learning capabilities for an app, your warehousing solution will need to facilitate your data goals and satisfy the requirements of your customers and stakeholders.
Note that this step isn’t just about IT or even technology: It’s all about business. Try to understand what you’re trying to measure, why it matters to the business and how future trends and digital transformations might impact your insights. In doing so, you’ll be better suited to prioritize your business needs and deliver data solutions to meet them.
3. Know your core data processes
What are your data sources, and how do you use (or plan to use) your data?
This may be a simple question, but the answer will help identify the right warehousing solution. Data processes and business processes are strongly linked here: your customer data will define your data models, which in turn inform your business processes.
4. Assess accessibility
Who and what will access your data — and how?
Though a modern enterprise data warehouse offers more than enough flexibility to adapt to various data sources and formats, you should also consider which connectors you’ll need, what metadata is available and how often you’ll be able to read data from your sources.
As with anything in IT, you should also consider your security governance as it relates to accessing your data warehouse.
5. Select the right architecture
Though modern data warehouses can take on various architectures, you may be limited based on the rest of your data estate.
Most data estates consist of a data lake, a data warehouse and a data mart.
Each of these layers flows into the next through your data pipeline. Raw data flows from the data lake into the data warehouse, where it’s cleaned, formatted and organized. From there, relevant data sets are delivered to business users and applications through the data mart.
Getting a clear view of this topology in your own organization is a great way to establish the architecture(s) you’ll need in your warehouse.
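The lake-to-warehouse-to-mart flow described in this step can be sketched with plain Python. Everything here is a simplified assumption made for illustration: the “lake” is a list of raw JSON strings, the record fields are hypothetical, and the “mart” is a single aggregated view for one region.

```python
import json

# Hypothetical data lake: raw, unvalidated records exactly as they arrived.
DATA_LAKE = [
    '{"user": "ana", "amount": "12.5", "region": "emea"}',
    '{"user": "bo", "amount": "7.5", "region": "EMEA"}',
    '{"user": "cy", "amount": null, "region": "apac"}',
    "not json at all",
]


def lake_to_warehouse(raw_records):
    """Clean, format and organize raw records into consistent rows."""
    cleaned = []
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip records that cannot be parsed
        if record.get("amount") is None:
            continue  # skip records that are missing the measurement
        cleaned.append(
            {
                "user": record["user"],
                "amount": float(record["amount"]),
                "region": record["region"].upper(),
            }
        )
    return cleaned


def warehouse_to_mart(rows, region):
    """Deliver a focused, business-ready data set for one region."""
    total = sum(row["amount"] for row in rows if row["region"] == region)
    return {"region": region, "total_amount": total}
```

Passing DATA_LAKE through lake_to_warehouse and then calling warehouse_to_mart with a region mirrors the three layers: raw in, cleaned in the middle, a narrow slice out.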
6. Use data integration tools
Though modern data warehousing allows you to gather data from a wide range of disparate sources, you’ll still need to combine them in a way that provides a single, unified view for advanced analytics and management.
This process is known as data integration, with ELT being the most well-known example. Most modern data warehouses rely on data integration tools to seamlessly centralize and store data.
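To show what a single, unified view means in practice, here is a minimal sketch that merges records from two hypothetical sources, a CRM export and a billing system, into one record per customer. The field names (Email, FullName, customer_email, ltv_cents) are invented for the example. Real integration tools handle far more, such as schema drift and incremental syncs.

```python
def normalize_crm(record):
    """Map a CRM-style record onto the shared schema."""
    return {
        "email": record["Email"].strip().lower(),
        "name": record["FullName"],
    }


def normalize_billing(record):
    """Map a billing-style record onto the shared schema."""
    return {
        "email": record["customer_email"].strip().lower(),
        "lifetime_value": record["ltv_cents"] / 100,
    }


def integrate(crm_records, billing_records):
    """Combine both sources into one record per customer, keyed by email."""
    unified = {}
    for rec in map(normalize_crm, crm_records):
        unified.setdefault(rec["email"], {}).update(rec)
    for rec in map(normalize_billing, billing_records):
        unified.setdefault(rec["email"], {}).update(rec)
    return unified
```

The key design choice is normalizing each source to a shared schema and a shared key before merging, so that the same customer is recognized even when the sources format the email address differently.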
7. Stay flexible
Your data goals and business needs will likely change over time, and so will your data sources, architectures and warehousing requirements as a result.
Staying on top of these changes means staying flexible in your approach. As you migrate to a modern data warehouse, it may help to start small as you test each new approach and strategy.
The modern data warehouse isn’t just convenient for big data: it’s necessary. Thankfully, it’s never been easier to build and use your own data warehouse.
Fivetran makes it easy to extract and load to your data warehouse from any number of data sources. With a range of connectors and lightning-fast data replication supported by change data capture (CDC), you can quickly move large volumes of data for migrations, advanced analytics and more.
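For readers unfamiliar with change data capture, the general idea can be sketched as follows. This is a generic illustration, not Fivetran’s API or implementation: the change-log format, operation names and function name are all assumptions. Instead of recopying a whole table, only the recorded inserts, updates and deletes are applied to the replica.

```python
def apply_changes(replica, change_log):
    """Apply an ordered list of change events to a replica keyed by primary key."""
    for change in change_log:
        op = change["op"]
        key = change["key"]
        if op in ("insert", "update"):
            # Inserts and updates both leave the latest version of the row in place.
            replica[key] = change["row"]
        elif op == "delete":
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return replica
```

Because only changed rows travel over the wire, this style of replication can keep a warehouse copy current without repeatedly scanning the full source table.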