Implementing a modern data stack (MDS) — data integration tool, cloud data warehouse and business intelligence platform — is the best way to establish a successful analytics program as data sources and data volumes multiply. An MDS automates data integration, offers cost-effective, infinitely scalable compute and storage for analytics, and makes it easy to create shareable reports and dashboards. Here’s how to set up a modern data stack in three steps:
- Establish success criteria
- Estimate total cost of ownership
- Choose your tools and test them
Step 1: Establish success criteria
Before implementing a modern data stack, you need to understand the concrete benefits it will offer. How much time and money will your organization save? How will an MDS improve your analytics practice? Here are seven benefits every MDS should provide:
- Time, labor and monetary savings compared with the previous solution. A modern data stack should dramatically reduce your data engineering costs, primarily by eliminating the need to build and maintain data connectors and normalize data.
- Expanded capabilities of the data team. By increasing data sources without consuming engineering or analyst resources, a modern data stack should expand the capabilities of your data team.
- Successful execution of new data projects, such as customer attribution models. More time and data mean your team will be able to focus on new analytics projects.
- Reduced turnaround time for reports. A modern data stack should dramatically shorten report generation time, ensuring up-to-date reports.
- Reduced data infrastructure downtime. A modern data stack should improve reliability and eliminate your maintenance burden.
- Higher rates of business intelligence tool adoption within your organization. By combining automated data integration with a modern, intuitive BI tool, a modern data stack should be able to increase data literacy manyfold.
- New metrics that are available and actionable. With additional data sources and an easy-to-use BI tool, a modern data stack should significantly boost the number of metrics used in decision-making.
Step 2: Estimate total cost of ownership
However enamored you may be of what a modern data stack might do for your organization, you still need to persuade your CFO and CEO before making this important decision — which means speaking their language of dollars and cents: How much will it cost? When do we get our money back?
To answer these questions, you’ll need to estimate total cost of ownership (TCO) for each of the core technologies of a modern data stack: data connectors, data warehouse and business intelligence tool. That means comparing your current workflow with available MDS technologies, and considering a range of factors, both quantitative and qualitative.
Data integration TCO
Calculating the cost of your current data pipeline might require a careful audit of prior spending on data integration activities. You’ll need to consider sticker price, costs of configuration and maintenance, and any opportunity costs incurred by failures, stoppages and downtime.
A simpler but useful initial calculation involves figuring out how many data engineers contribute to maintaining your current data pipeline, and what percentage of their time they devote to it. As evidenced by the customer stories in this ebook, you should have a clear line of sight to dramatic annual cost savings that exceed the average salary of a data engineer, not to mention the value of enabling your analysts to make timely, accurate decisions based on up-to-date data.
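That initial calculation can be sketched in a few lines. All figures below are hypothetical placeholders, not benchmarks — substitute your own headcount, salaries and time estimates:

```python
# Rough estimate of annual labor cost for maintaining a hand-built pipeline.
# All figures are hypothetical placeholders -- substitute your own.

def annual_pipeline_labor_cost(num_engineers, avg_salary, pct_time_on_pipeline):
    """Salary dollars per year spent keeping the current pipeline running."""
    return num_engineers * avg_salary * pct_time_on_pipeline

# Example: three engineers at $150,000 each, spending 40% of their time
# building and maintaining connectors.
cost = annual_pipeline_labor_cost(3, 150_000, 0.40)
print(f"${cost:,.0f} per year")  # $180,000 per year
```

If the result rivals or exceeds an automated solution's annual subscription, the switch pays for itself on labor alone.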
When considering automated data integration solutions, keep in mind that the following costs apply:
- The cost of subscription for one year, which may be flat or (more likely) based on consumption. Pricing models vary widely, but consumption pricing by monthly active rows (MAR) is one common standard.
- Although some tools offer out-of-the-box transformations, you may still have to write your own. Your mileage will vary based on how complex your desired reports and dashboards are.
Labor costs for an automated solution should be very low, and measured in minutes or hours per year rather than weeks. Your analysts, engineers and data scientists should be free to spend more time analyzing data or building predictive models.
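To make the consumption side concrete, here is a minimal sketch of estimating a monthly bill under tiered MAR pricing. The tier boundaries and per-row rates are invented for illustration and are not any vendor's actual pricing:

```python
# Estimate a monthly bill under consumption-based (MAR) pricing.
# The tier sizes and per-row rates below are invented for illustration only.

TIERS = [
    (1_000_000, 0.0005),    # first 1M monthly active rows (assumed rate)
    (9_000_000, 0.0002),    # next 9M rows (assumed rate)
    (float("inf"), 0.0001), # everything beyond 10M (assumed rate)
]

def monthly_cost(monthly_active_rows):
    cost, remaining = 0.0, monthly_active_rows
    for tier_size, rate in TIERS:
        rows_in_tier = min(remaining, tier_size)
        cost += rows_in_tier * rate
        remaining -= rows_in_tier
        if remaining <= 0:
            break
    return cost

print(monthly_cost(5_000_000))  # 1M at $0.0005 + 4M at $0.0002, ~1300.0
```

Because consumption pricing scales with usage, run this kind of estimate against your projected row volumes, not just today's.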
Data warehouse TCO
Traditionally, data warehouses were built from scratch and installed on-premise in data centers, a process that incurred substantial hardware, software, labor and expertise costs. Modern cloud data warehouses range from architectures that resemble on-premise data centers (but online), to purely “serverless” architectures that can instantly scale compute and storage resources as needed.
Pricing can be highly variable. For more traditional cloud data warehouses, you may have to forecast your computation and storage needs and carefully design your architecture, much as you would for an on-premise setup. In general, you will most likely need to consult the pricing schedules of individual vendors, and may have to run tests to determine exactly how they calculate compute costs. For some full-service data warehouses, you might be able to find a flat, monthly fee.
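As a back-of-the-envelope version of that forecasting exercise, the sketch below combines compute hours and storage into a monthly estimate. The hourly and per-terabyte rates are hypothetical placeholders, not any vendor's actual pricing:

```python
# Back-of-the-envelope monthly estimate for a usage-priced cloud warehouse.
# Rates are hypothetical placeholders; consult your vendor's pricing schedule.

COMPUTE_RATE_PER_HOUR = 2.00  # dollars per warehouse-hour (assumed)
STORAGE_RATE_PER_TB = 23.00   # dollars per TB-month (assumed)

def monthly_warehouse_cost(compute_hours, storage_tb):
    return (compute_hours * COMPUTE_RATE_PER_HOUR
            + storage_tb * STORAGE_RATE_PER_TB)

# Example: ~8 hours of active querying per weekday (~176 hours/month), 5 TB stored.
print(monthly_warehouse_cost(176, 5))  # 467.0
```

Note that warehouses with decoupled compute and storage only bill compute while it runs, which is why forecasting active query hours matters more than raw data size.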
BI tool TCO
Business intelligence tools typically bill on a monthly basis. Pricing is often adjusted by the number of seats/users your organization reserves, typically with smaller per-person costs as your subscription grows.
Outside of explicit monetary costs, however, the most important considerations for a business intelligence tool are its features, performance, and whether your team has the staffing and expertise to fully leverage it. A poor choice of BI tool can easily produce delays or additional work for your team.
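Seat-based billing with volume discounts is easy to model. In the sketch below, the per-seat price and discount threshold are invented for illustration only:

```python
# Estimate annual BI tool cost under seat-based pricing with a volume discount.
# The per-seat price and discount threshold are invented for illustration.

def annual_bi_cost(seats, base_price_per_seat=30.0):
    # Assume a 20% per-seat discount once you pass 50 seats (hypothetical).
    price = base_price_per_seat * (0.8 if seats > 50 else 1.0)
    return seats * price * 12  # monthly billing over 12 months

print(annual_bi_cost(40))   # 40 seats at full price: 14400.0
print(annual_bi_cost(100))  # 100 seats at the discounted rate: 28800.0
```

Because wider adoption is itself a success criterion, budget for more seats than you currently need — the per-seat discount often makes the marginal user cheap.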
Step 3: Choose your tools and test them
Do your homework! Implementing a modern data stack is one of the most important infrastructure decisions you will make — and if you choose each layer of the stack wisely, you will be set for years. Besides talking to customer references for each data integration tool, data warehouse and business intelligence tool, you should also read industry reports from analyst firms and cloud data warehouse benchmark reports. You can also check peer review sites like Gartner Peer Insights.
As noted above, it’s very important to factor compatibility and future-proofing into your decision. Make sure the data integration tools you are considering are compatible with the data warehouses and BI tools you have or are evaluating, and also consider open-source tools like dbt for transformations.
Here are specific criteria to consider for each technology. Before we look at data integration tools, we’ll start a little downstream and think about what features you will need in a cloud data warehouse and business intelligence tool.
Criteria for choosing a cloud data warehouse
Your data warehouse will be the repository of record for your organization’s structured data, and you should choose one carefully. Here are the nine criteria you should focus on:
- Centralized vs. decentralized data storage. Does the data warehouse store all of its data on one machine, or is it distributed across multiple machines, trading redundancy for performance?
- Elasticity. Can the data warehouse scale compute and storage resources up and down quickly? Are compute and storage independent from each other or coupled together?
- Concurrency. How well does the data warehouse accommodate multiple simultaneous queries?
- Load and query performance. How quickly can you complete typical loads and queries?
- Data governance and metadata management. How does the data warehouse handle permissions and regulatory compliance?
- SQL dialect. Which dialect of SQL does the warehouse use? Does it support the kinds of queries you want to make? Will your analysts have to change their current syntax?
- Backup and recovery support. If your data warehouse somehow gets corrupted or breaks, can you easily revert to a previous state?
- Resilience and availability. What about preventing database failures in the first place?
- Security. Does the data warehouse follow current security best practices?
Criteria for choosing a business intelligence tool
Business intelligence tools enable you to easily build reports and dashboards, but different tools have different strengths and weaknesses. Here are the key factors to consider:
- Seamless integration with cloud data warehouses. Is it easy to connect this BI tool to your cloud data warehouse of choice?
- Ease of use and drag-and-drop interfaces. Ease of use is especially important to popularizing data-driven decisions across your organization.
- Automated reporting and notifications. Does the BI tool allow you to schedule reports to publish automatically? What about alerting users when the data changes?
- Ability to conduct ad hoc calculations and reports by ingesting and exporting data files. Your data team might sometimes want to explore data without the overhead of having to go through a data warehouse first.
- Speed, performance and responsiveness. Basic quality-of-life considerations are important, like dashboards and visualizations loading in a timely manner.
- Modeling layer with version control and development mode. Does the BI tool allow your analysts to work collaboratively by sharing data models and code?
- Extensive library of visualizations. Pie charts, column charts and trendlines can only take you so far. Does the BI tool feature specialized visualizations like heat maps or radar charts? Does it allow you to build your own custom visualizations?
Make sure any data warehouses and BI tools you evaluate are compatible with each other. It also pays to carefully review a range of perspectives on different tools. Publications like Gartner often aggregate such information.
Criteria for choosing a data integration tool
There are many data integration tools in the market, and their technical approaches and feature sets vary significantly. Here are the foremost factors to consider when you’re choosing a data integration tool:
- Data connector quality. Take these factors into account when evaluating connector quality:
- Open-source vs. proprietary. There are more open-source connectors for a wider range of data sources, but proprietary connectors tend to be of higher quality, cover a broader range of data requirements, and integrate more seamlessly with other elements of a data stack.
- Standardized schemas and normalization. Data from API feeds is not usually provided in normalized form, but normalization fosters data integrity and lends itself to schema standardization.
- Incremental vs. full updates. Incremental updates use logs or other forms of change data capture, allowing more frequent updates that do not interfere with business operations or waste resources.
- Support for sources and destinations. Does your prospective tool support your sources and destinations? If not, does the provider offer a way for customers to suggest new sources and destinations? Do they routinely add new ones?
- Configuration vs. zero-touch. Zero-touch, fully managed tools are extremely accessible, with connectors that are standardized, stress-tested and maintenance-free. Configurable tools require engineers who have experience with scripting languages, orchestration, and software development.
- Automation. Integration tools should remove as much manual intervention and effort as possible. Consider whether a tool offers features like automated schema migration, automatic adjustment to API changes, and continuous sync scheduling.
- Transforming within vs. before the data warehouse. Avoid tools that use a traditional ETL (extract-transform-load) approach. Under a more modern approach known as ELT (extract-load-transform), transformations are performed in an elastic, cloud-based warehouse, allowing limitless resource scaling.
- Recovery from failure. Bugs and errors will inevitably crop up over the course of data integration; you don’t want to permanently lose data as a result. Find out whether your prospective tools are idempotent and perform net-additive integration.
- Security and compliance. These are key areas, both in terms of data protection and public perception. Specifically, learn whether prospective tools offer:
- Regulatory compliance
- Limited data retention
- Role-based access
- Column blocking and hashing
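A few of the criteria above — incremental updates, idempotency and net-additive integration — can be made concrete with a small sketch. Below, a cursor-based sync upserts changed rows by primary key, so re-running after a failure never duplicates or loses data. The API shape and field names are hypothetical:

```python
# Minimal sketch of an incremental, idempotent, net-additive sync step.
# The source row shape ("id", "updated_at") is a hypothetical example.

def sync_increment(source_rows, destination, cursor):
    """Upsert rows changed since `cursor`; safe to re-run after a failure."""
    new_cursor = cursor
    for row in source_rows:
        if row["updated_at"] <= cursor:
            continue  # incremental: skip rows already synced
        # Idempotent and net-additive: upsert by primary key, never delete.
        destination[row["id"]] = row
        new_cursor = max(new_cursor, row["updated_at"])
    return new_cursor

dest = {}
rows = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
cursor = sync_increment(rows, dest, 0)
cursor = sync_increment(rows, dest, cursor)  # replay after a "failure": no-op
print(len(dest), cursor)  # 2 20
```

Because the upsert is keyed on the primary key and the cursor only moves forward, replaying any batch converges to the same destination state — exactly the recovery property to look for.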
Many publications offer aggregate reviews and ratings of data integration tools, as they do for data warehouses and business intelligence tools. Be sure to comparison-shop, and make sure the data integration tools you are considering are compatible with your prospective warehouses and BI tools.
Test before you buy
Once you’ve narrowed your search to a few specific tools, test them in a low-stakes manner. Most will offer free trials for a few weeks at a time. Set up data connectors between your sources and data warehouse, and measure how much time and effort it takes to sync your data. Perform some basic transformations. Add a BI tool and create a few dashboards and visualizations.
During this process, be sure to set aside dedicated trial time for your team, and encourage them to stress-test the new systems in every way imaginable. Compare the results of your trials against your technical criteria and standards for success. Don’t hesitate to contact support teams with questions, concerns or requests for validation. There’s no need to wonder whether a modern data stack will help you achieve your analytics goals — you should be able to experience success firsthand.