There are many ways to mentally model the process of building a data team and all of its associated processes and operations. You can envision it as a progression of increasingly sophisticated capabilities and processes, beginning with data extraction and loading and ending with artificial intelligence and business process automation. You can also frame it in terms of moving from the reactive, ad hoc use of data to predictive and innovative uses.
Unfortunately, these models do not offer a great deal of detailed guidance on getting started. Here, we’ll take a step back and discuss practical, concrete steps to begin the process.
The process breaks down into six practical steps:
- Rule out barriers to getting started
- Decide whether to migrate or start fresh
- Evaluate the tools and technologies you will need
- Calculate total cost of ownership and ROI
- Establish success criteria
- Set up a proof of concept
1. Rule out barriers to getting started
At Fivetran, we are proponents of the modern data stack and the extensive use of third-party, cloud-based providers. There are, however, legitimate reasons not to consider this approach.
The first and most obvious reason is if your organization is very small or operates with a very small scale or complexity of data. You might not have data operations at all if you’re a tiny startup still attempting to find product-market fit. The same might be true if you only use one or two applications, are unlikely to adopt new applications and your integrated analytics tools for each application are already sufficient.
A second reason not to rely on a modern data stack is that it may not meet certain performance or regulatory compliance standards. If nanoseconds of latency can make or break your operations, you might avoid third-party cloud infrastructure and build your own hardware.
Lastly, maybe your organization is in the business of producing its own specialized software products, and of using or selling the data that software produces. Consider, for example, a streaming web service that produces terabytes of user data every day and also surfaces recommendations to users.
Even so, in the latter two cases, your organization may still outsource data operations for external data sources. Otherwise, if your organization is large or mature enough to take advantage of analytics, and data refresh cycles of a few minutes or hours are acceptable, proceed.
2. Decide whether to migrate or start fresh
Data integration providers should be able to migrate data from old infrastructure to your new data stack, but the task is a hassle because of the volume and intrinsic complexity of data. Whether your company decides to migrate or simply start a new instance from scratch depends heavily on how important historical data is to you.
Consider the cost of ending existing contracts for products or services. Beyond cost, familiarity with and preference for certain tools and technologies are also important considerations.
Ensure that prospective solutions are compatible with any products and services you intend to keep.
3. Evaluate the tools and technologies you will need
A modern data stack consists of specific tools and technologies. You will need a data warehouse, data integration tool, business intelligence platform and transformation layer. Most commercial offerings are of high quality and differ mainly in the details.
Modern data warehouses are more alike than different in terms of pricing and performance. The choice is largely a matter of trading off between tunability and ease of use.
Chief considerations for choosing a data integration tool include support for sources and destinations, the performance and reliability of data connectors, support for transformations and the degree to which the tool simplifies and automates the process of connecting data from sources to destinations.
For business intelligence platforms, user friendliness, performance, the types of visualizations supported and support for collaboration through version control are key considerations.
On a related note, we highly recommend transforming data at the destination under an ELT (Extract, Load, Transform) architecture. The simplest approach is to create new data models in SQL, but with the help of dedicated data transformation tools, your team can apply CI/CD and other engineering best practices.
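To make the ELT pattern concrete, here is a minimal sketch using SQLite as a stand-in for a cloud warehouse; the table, column and view names are hypothetical, and a real stack would run the same SQL inside the warehouse itself:

```python
import sqlite3

# Stand-in "warehouse": in production this would be a cloud data warehouse;
# SQLite keeps the sketch self-contained and runnable.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows land in the destination untransformed.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 1999, "complete"), (2, 500, "refunded"), (3, 2500, "complete")],
)

# Transform: build a derived data model *in the destination* with SQL.
conn.execute("""
    CREATE VIEW completed_order_revenue AS
    SELECT COUNT(*) AS orders, SUM(amount_cents) / 100.0 AS revenue_dollars
    FROM raw_orders
    WHERE status = 'complete'
""")

orders, revenue = conn.execute("SELECT * FROM completed_order_revenue").fetchone()
print(orders, revenue)
```

Because the transformation lives in the destination as a SQL model rather than in the pipeline, it can be versioned, reviewed and tested like any other code, which is what makes CI/CD practices possible.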
Most importantly, you must ensure that interoperability is possible between the different tools and technologies you’ve chosen. There are many additional considerations that go into choosing the elements of your data stack. For more information, check this more detailed checklist for evaluating each of these tools.
4. Calculate total cost of ownership and ROI
The modern data stack promises substantial savings of time, talent and money. Compare your existing data integration workflow with the alternatives.
Calculate the cost of your current data pipeline. The main factor is likely the amount of engineering time your data team spends building and maintaining data pipelines. This may require a careful audit of your project management practices.
You’ll need to consider the sticker prices of the tools and technologies involved as well, including your data warehouse and BI tool. Finally, you’ll need to account for any opportunity costs incurred by failures, stoppages and downtime.
On the other side of the ledger, you should evaluate the benefits of the potential replacement. Some may not be very tangible or calculable (e.g., improvements in the morale of analysts), but others, such as time and money gains, are readily quantified.
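The comparison above can be sketched as simple arithmetic. All of the figures below are hypothetical placeholders for illustration; substitute the numbers from your own audit:

```python
# Hypothetical figures for illustration only; replace with your audit numbers.
ENGINEER_HOURLY_COST = 75          # fully loaded cost per engineering hour
HOURS_ON_PIPELINES_PER_WEEK = 120  # time spent building/maintaining pipelines
DOWNTIME_COST_PER_YEAR = 20_000    # opportunity cost of failures and stoppages
CURRENT_TOOLING_PER_YEAR = 30_000  # warehouse, BI and infrastructure sticker price

current_tco = (ENGINEER_HOURLY_COST * HOURS_ON_PIPELINES_PER_WEEK * 52
               + DOWNTIME_COST_PER_YEAR + CURRENT_TOOLING_PER_YEAR)

# Proposed stack: assume automation reclaims most pipeline hours.
PROPOSED_TOOLING_PER_YEAR = 60_000
REMAINING_HOURS_PER_WEEK = 10

proposed_tco = (ENGINEER_HOURLY_COST * REMAINING_HOURS_PER_WEEK * 52
                + PROPOSED_TOOLING_PER_YEAR)

annual_savings = current_tco - proposed_tco
roi = annual_savings / PROPOSED_TOOLING_PER_YEAR
print(f"current: ${current_tco:,}  proposed: ${proposed_tco:,}  "
      f"savings: ${annual_savings:,}  ROI: {roi:.1f}x")
```

Even rough inputs make the trade-off visible: labor hours reclaimed usually dominate the calculation, which is why an honest audit of engineering time is the critical first step.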
5. Establish success criteria
An automated data integration solution can serve a number of goals. Base your success criteria on the following:
- Time, money and labor savings – A modern data stack should dramatically reduce your data engineering costs by eliminating the need to build and maintain data connectors. Labor savings may amount to hundreds of hours of engineering time per week, with corresponding savings in salary costs. You can use our calculator to get a high-level estimate.
- Expanded capabilities – A modern data stack (MDS) should expand the capabilities of your data team by making more data sources available without additional labor.
- Successful execution of new data projects, such as customer attribution models – More time and data sources allow your team to build new data models, including those that track the same entities across multiple data sources.
- Reduced turnaround time for reports – A modern data stack should dramatically shorten the turnaround time for reports, ensuring that key decision-makers stay up to date.
- Reduced data infrastructure downtime – A modern data stack should dramatically improve reliability and virtually eliminate your maintenance burden.
- Greater business intelligence usage – By combining automated data integration with a modern, intuitive BI tool, a modern data stack should promote data access, literacy and usage across your organization.
- New available and actionable metrics – With additional data sources and an easy-to-use BI tool, a modern data stack should enable new metrics and KPIs for decision-making.
6. Set up a proof of concept
Once you have narrowed your search to a few candidates and determined the standards for success, test the products in a low-stakes manner. Most products will offer free trials for a few weeks at a time.
Set up connectors between your data sources and data warehouses, and measure how much time and effort it takes to sync your data. Perform some basic transformations. Set aside a dedicated trial time for your team and encourage them to stress-test the system in every way imaginable. Compare the results of your trial against your standards for success.
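A simple harness helps keep trial measurements comparable across candidates. The sketch below assumes a hypothetical `run_initial_sync` helper standing in for whatever API or CLI call the candidate tool exposes; the source names are placeholders:

```python
import time

def run_initial_sync(source: str) -> int:
    """Hypothetical placeholder: in a real trial this would trigger the
    candidate tool's connector sync and return the number of rows loaded."""
    time.sleep(0.01)  # simulate the sync doing work
    return 1_000

# Record wall-clock time per source so candidates can be compared
# against your success criteria on equal footing.
results = {}
for source in ["salesforce", "postgres", "google_ads"]:
    start = time.perf_counter()
    rows = run_initial_sync(source)
    elapsed = time.perf_counter() - start
    results[source] = {"rows": rows, "seconds": round(elapsed, 2)}

for source, stats in results.items():
    print(f"{source}: {stats['rows']} rows in {stats['seconds']}s")
```

Capturing the same metrics (rows synced, wall-clock time, manual effort) for every candidate turns the free-trial period into a structured comparison rather than an impression.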
While you may have ruled out technical barriers, there are also institutional barriers to adopting a modern data stack. Your data team could lack funding, headcount or expertise. Data engineers might be protective of the systems they have already built and help maintain. Leaders might not immediately recognize the power offered by the ability to rapidly scale data integration. It’s important to earn buy-in from someone with the authority to purchase the necessary tools and technologies, and to carefully cultivate a modern data mindset from the very start of your journey.
A carefully constructed minimum viable product (MVP) demonstration that proves the worthiness of the modern data stack on a single data source, report or test case can accomplish exactly that.