Data marts are subsets of data warehouses, meant to be accessed by users in specific lines of business or teams. Like data warehouses, data marts are “single sources of truth” that guide and support decisions made by users. Data marts are also meant to easily integrate with business intelligence platforms, allowing analysts to quickly build reports and dashboards. Unlike data warehouses, data marts are not comprehensive. They don’t include all of an organization’s data. Rather, they may include some combination of finished, analysis-ready data models and selections of raw data.
Traditionally, data marts were physically separated from data warehouses. They could be constructed top-down, by spinning off data models from a single enterprise data warehouse into separate machines for specific lines of business to access. They could also be constructed bottom-up, by building data marts for individual business units first and then combining them into a single enterprise-wide data warehouse. This definition prevailed before the emergence of the cloud and is now outmoded.
How to use a data mart
The modern concept of a data mart is an organizational structure within a data warehouse. The purpose of a data mart is to compartmentalize analytics for the convenience of specific teams and business units. As organizations grow, data models become more complex, and teams become more specialized, curating data for specific users becomes more important.
Specifically, curating and compartmentalizing data allows an organization to:
- Manage access and permissions
- Avoid overloading analysts and other end users by offering a set of data models that are easier to search and navigate
- Make data models easier to manage by organizing them hierarchically
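As a rough illustration of the first and third points, the sketch below groups data models under named marts and records which teams may read each one. Everything in it (the `DataMart` class, the `visible_models` helper, and the mart, model, and team names) is hypothetical and exists only for this example. In a real warehouse, the same idea would normally be expressed through schemas and the platform's own permission system rather than application code.

```python
from dataclasses import dataclass, field


@dataclass
class DataMart:
    """A named group of data models that selected teams may read (hypothetical structure)."""
    name: str
    models: list[str]
    readers: set[str] = field(default_factory=set)


# Hypothetical marts, models, and team names, used only for illustration.
MARTS = [
    DataMart("finance", ["revenue_by_month", "open_invoices"], {"finance", "executive"}),
    DataMart("marketing", ["campaign_performance"], {"marketing", "executive"}),
]


def visible_models(team: str) -> list[str]:
    """Return the qualified names (mart.model) of every model a team may read."""
    return [
        f"{mart.name}.{model}"
        for mart in MARTS
        if team in mart.readers
        for model in mart.models
    ]


print(visible_models("marketing"))
print(visible_models("executive"))
```

Each team sees only the models in the marts it has been granted, which keeps the searchable surface small for most users while still allowing broader access where it is needed.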
Traditionally, data marts were separate machines that contained small tables of typically fewer than 100,000 rows and were completely siloed. The modern approach takes place on a single cloud-based data warehouse and involves building logical data models, such as views and materialized views. Combined with rigorous data governance practices, this simpler, more flexible approach does not require separate machines, enables granular levels of access as needed, and leverages the scalability of a cloud-based data warehouse.
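To make the idea of logical data models concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse. The `orders` table, the two mart models, and the team assignments are assumptions made for illustration. SQLite does not support materialized views, so the second model approximates one with a table built from a query. A cloud warehouse that supports materialized views would typically provide its own statement for creating and refreshing them.

```python
import sqlite3

# In-memory SQLite stands in for a cloud data warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# A hypothetical warehouse table holding orders for the whole organization.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, channel TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, channel, amount) VALUES (?, ?, ?)",
    [
        ("EMEA", "web", 120.0),
        ("EMEA", "retail", 80.0),
        ("APAC", "web", 200.0),
    ],
)

# Data mart model for a hypothetical sales team: a view, computed at query time.
conn.execute(
    """
    CREATE VIEW sales_mart_revenue_by_region AS
    SELECT region, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    """
)

# Data mart model for a hypothetical marketing team. SQLite has no materialized
# views, so a table built from a query approximates one; it holds a snapshot and
# must be rebuilt to pick up new data.
conn.execute(
    """
    CREATE TABLE marketing_mart_revenue_by_channel AS
    SELECT channel, SUM(amount) AS revenue
    FROM orders
    GROUP BY channel
    """
)

sales_rows = conn.execute(
    "SELECT region, revenue FROM sales_mart_revenue_by_region ORDER BY region"
).fetchall()
marketing_rows = conn.execute(
    "SELECT channel, revenue FROM marketing_mart_revenue_by_channel ORDER BY channel"
).fetchall()
print(sales_rows)
print(marketing_rows)
```

Both models live in the same database as the underlying `orders` table, so no data is copied to a separate machine. The view always reflects the current contents of `orders`, while the precomputed table trades freshness for faster reads.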
Depending on how an organization implements its technology and organizes its analytics team, the specifics of ownership and access for a data mart can vary. In some cases, teams and business units may be wholly responsible for their own data marts, and the data marts may effectively be siloed. In other cases, boundaries and access may be looser.
Combining a data mart with a data warehouse
Although data marts as traditionally defined are obsolete, the modern approach of using views and materialized views to divide your data into models for specific teams and business units remains valuable. A simple data stack combining data marts with a data warehouse could look like the following, where every data mart is assigned to a separate team or line of business:
- Sources
- Data pipeline
- Data warehouse
- Data marts
- Business intelligence tool
Your configuration will depend on your use case, the size and composition of your company, and the skill sets of your analysts and engineers.
Data warehouses, data marts and data lakes are all destinations for centralizing data. They form the linchpin of the modern data stack, a suite of tools and technologies used to make data from disparate sources available on a single platform. The work of gathering data from those sources and making it available in one place is known as data integration.