The recent Modern Data Stack Conference EMEA, held this past May, demonstrated that a new approach to analytics can deliver spectacular results. But the steps to reaping those rewards aren't always clear. Nas Radev, data architecture expert and CEO of Infinite Lambda, walked his audience through the fundamentals of building a modern data stack from scratch.
Having implemented nearly 30 modern data platforms across 14 industries, Radev has rich experience to share on the practicalities of building a game-changing solution from the ground up. The function of a modern data stack, he explained, is to facilitate the ingestion of data from multiple sources into a centralized place where it can be analyzed. The components of a modern data stack are:
- Data ingestion solution
- Data warehouse or data lake
- Transformation tool
- Business intelligence (BI) tool
Assembling them is about picking the best-fit combinations for your business.
Step 1: Identify data sources and the right ingestion tool
The starting point is working out exactly what you're trying to calculate and measure, according to Radev. This will steer you towards the type of data you need to source, whether it's transactions from a platform like Shopify used to steer ecommerce strategy, or clicks from Facebook Ads to inform marketing spend. “Once you know your use case, you need to understand where the data is that will support that use case,” he said.
There are many ingestion tools to choose from, from simple open-source software to more automated, lower-maintenance solutions. “Obviously you should have a list of the sources you want to connect to and then pick a tool that is able to connect with them out of the box,” said Radev. “That's always the easiest.”
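As a rough illustration (not from the talk), the core of any ingestion step is moving source records into a staging area while preserving an audit trail. The Shopify-style order records and the in-memory staging list below are hypothetical stand-ins for a real connector and warehouse table:

```python
from datetime import datetime, timezone

def ingest(records, destination):
    """Append source records to a staging destination, stamping each
    with a load timestamp so downstream steps can audit freshness."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    for record in records:
        destination.append({**record, "_loaded_at": loaded_at})
    return len(destination)

# Hypothetical ecommerce order records from a source system
orders = [
    {"order_id": 1001, "total_cents": 4999},
    {"order_id": 1002, "total_cents": 1500},
]
staging = []  # stands in for a warehouse staging table
ingest(orders, staging)
```

A managed tool like Fivetran automates exactly this loop (plus schema handling, retries, and incremental loads), which is why Radev suggests picking one that supports your sources out of the box.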
Step 2: Plan for transformation and choose a data warehouse
The function of the data warehouse is to provide a central repository for sourced data and prepare it for reporting and analytics. Radev also recommends using a data lake for all historical data, whether it’s used for analysis immediately or later. With the help of the right data ingestion tool, it should be nearly effortless to move data from your sources to your repositories.
An open-source tool like dbt simplifies the process further, pairing with connectors like Fivetran to transform data once it has been loaded into the warehouse. Different parts of the stack can handle cleansing, integration and preparation for reporting at different stages, but they will only work if the business logic is consistent and the data models are designed to properly represent the real world.
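To make the transformation stage concrete, here is a minimal sketch (in Python rather than the SQL a dbt model would actually use) of the cleanse-then-aggregate business logic described above. The field names and sample rows are hypothetical:

```python
from collections import defaultdict

def transform_orders(staged_rows):
    """Cleanse staged order rows (drop rows missing a customer key) and
    roll them up into revenue per customer -- the kind of consistent
    business logic a dbt model would express in SQL."""
    revenue = defaultdict(int)
    for row in staged_rows:
        if row.get("customer_id") is None:  # cleansing: discard bad rows
            continue
        revenue[row["customer_id"]] += row["total_cents"]
    return dict(revenue)

staged = [
    {"customer_id": "c1", "total_cents": 4999},
    {"customer_id": "c1", "total_cents": 1500},
    {"customer_id": None, "total_cents": 999},  # bad row, filtered out
]
summary = transform_orders(staged)  # {'c1': 6499}
```

The point of the sketch is the separation of concerns: ingestion lands raw data, and a transformation layer applies the agreed business logic in one place, so every report works from the same numbers.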
There is no shortage of “tremendous solutions” when it comes to choosing a cloud-based data warehouse, according to Radev. As with all the modern data stack components, he stressed the importance of finding the best fit for the business. “If you have an internal technical team, make sure you consult with them because it's good to have a cohesive tech strategy as opposed to picking a tool because it has the most features,” he said.
Step 3: Pick a BI tool and address the use case
With a modern data stack in place, you can tackle the practicalities of the use case that prompted the investment. “You can improve your customer support by integrating some of these insights with customer support tools; you can even start pushing the stack towards machine learning tools to get some recommendations and forecasting,” he said.
Key takeaway: Data quality across the stack is critical
Your component choices will depend on your needs, but Radev warned that an abiding principle stays the same. “The most important thing I'm going to say is that if you don't obsess about data quality, you will fail,” he said. “If you can earn trust and people start being more data-driven in day-to-day decision-making, you have succeeded. If they don't because they don't trust your data, that's when you fail.”
Radev recommends quality checks at each stage in every project — from ingestion through to transformation. Different solutions with varying capabilities are fine, but they need to be tested constantly. “Validate, validate and validate again,” he said. “Once you are absolutely certain you are not going to upset anyone by giving them bad data — only then should you actually promote things into a production environment.”
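The kind of stage-by-stage checks Radev describes can be sketched as a small gate that data must pass before promotion. This is an illustrative example, not his implementation; the check names mirror the generic tests tools like dbt ship with (unique, not-null), and the `rows` and field names are hypothetical:

```python
def validate(rows, key, required_fields):
    """Run basic quality checks before promoting data to production:
    non-empty table, unique primary key, no nulls in required fields.
    Returns a list of error messages; an empty list means the data passed."""
    errors = []
    if not rows:
        errors.append("table is empty")
    keys = [r.get(key) for r in rows]
    if len(keys) != len(set(keys)):
        errors.append(f"duplicate values in {key}")
    for field in required_fields:
        if any(r.get(field) is None for r in rows):
            errors.append(f"null values in {field}")
    return errors

rows = [
    {"order_id": 1, "total_cents": 4999},
    {"order_id": 2, "total_cents": 1500},
]
assert validate(rows, "order_id", ["order_id", "total_cents"]) == []
```

Running a gate like this after ingestion and again after transformation is one way to "validate, validate and validate again" before anyone sees the numbers.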