Cloud skeptics and critics are often quick to note that cloud computing isn’t new.
In the 1970s, companies commonly accessed mainframe computers via telephone lines to remotely share managed computing resources — similar to how cloud services are shared today. But the current glut in available data is unique in the history of humanity, and managing unprecedented amounts of information presents dramatically different problems than those faced in the 1970s.
To understand the big data deluge, consider that in 2020, the World Economic Forum (WEF) estimated the total amount of data in existence at 44 zettabytes (ZB). The WEF reported that bytes in the digital universe outnumbered stars in the observable universe by a factor of 40. The rise of IoT devices and huge amounts of consumer tracking data account for a significant share of that growth, and will continue to contribute floods of data for the foreseeable future.
If you’re a chief data officer (CDO), you need to take a hard look at this tidal wave of data and figure out how to surf it before it crashes down on top of you.
Any data management system will need to contend with not only unprecedented scale, but also an enormous diversity of data sources and data types. For companies to uncover insights buried in gargantuan data sets and then share them across different business units, they’ll need to move that data into a central location, transform it into a state that is fit for analysis and then make analytics tools available across the company. In short, they need a modern data stack (MDS) — a set of cloud-native technologies for managing enormous and heterogeneous data sets.
While on-premises servers won’t disappear soon, the cloud and the MDS have the power to fundamentally simplify the way companies approach the analysis of large data sets.
Key considerations for MDS components
Here are some of the key MDS components that help CDOs demonstrate results, fast. For more on this topic, download our free ebook, The Chief Data Officer's Guide to Generating Impact.
Fully managed ELT data pipeline: The MDS begins with a fully managed extract-load-transform (ELT) solution, such as Fivetran. The technology should be able to centralize and transform data from hundreds of SaaS and on-prem data sources to a cloud-based destination. Data engineers can connect new data sources in minutes. Any changes to the data source schemas or APIs are handled automatically, in the background, without affecting the flow of data. That frees up the engineers to focus on more impactful work, like creating new data analysis tools or models — and enables the CDO to demonstrate meaningful results far sooner.
Data warehouse: This MDS component is a cloud-based data platform that enables organizations to store data in a consistent manner and analyze data sets from a wide variety of sources in one place. Some of the top players here include Snowflake, Amazon Redshift and Google BigQuery. Before choosing one, weigh the kinds of data you need to store, the platform's data security and management capabilities, and its price. See our guide on evaluating cloud data warehouses for more detail.
In some cases, such as when you have a lot of unstructured data to manage, a data lake may be a better choice than a data warehouse. Also worth considering is a data lakehouse, a repository that combines characteristics of a data lake and a data warehouse; Databricks is one example.
Data transformation: Data transformation describes the conversion of raw data into formats that make it easier to analyze and interpret. A popular choice here is dbt. You’ll want a solution that provides a clear picture of how a transformation affects tables and also provides the ability to track data lineage. Version control features and clear documentation are critical.
Data visualization: The days are long gone when database experts were the only people responsible for interpreting and then distributing data to the rest of the company. Modern business intelligence and visualization tools must be accessible and easy enough to understand that people in any business group can use them in a self-service way. Tools in this category include Tableau, Looker, Power BI and more.
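To make the ELT and transformation components above more concrete, here is a minimal sketch of the load-then-transform pattern. It is an illustration under stated assumptions, not how any particular vendor implements it: Python's built-in sqlite3 module stands in for a cloud data warehouse, and the sample records, table names (raw_orders, daily_revenue) and function names are all hypothetical. The point it shows is the ordering: raw data lands in the destination unchanged, and the conversion into an analysis-ready table happens afterward, in SQL, inside the destination.

```python
import sqlite3

# Hypothetical raw records, standing in for rows extracted from a SaaS source.
# Everything arrives as text, with inconsistent country casing.
RAW_ORDERS = [
    ("1001", "2024-01-05", "19.99", "us"),
    ("1002", "2024-01-05", "5.00", "US"),
    ("1003", "2024-01-06", "42.50", "de"),
]


def load_raw(conn, rows):
    """Load step: land the extracted rows in the destination unchanged."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, order_date TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """Transform step: build an analysis-ready table with SQL in the destination."""
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               UPPER(country) AS country,
               ROUND(SUM(CAST(amount AS REAL)), 2) AS revenue,
               COUNT(*) AS orders
        FROM raw_orders
        GROUP BY order_date, UPPER(country)
        """
    )


def run_pipeline():
    # An in-memory SQLite database plays the role of the warehouse here.
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ORDERS)
    transform(conn)
    return conn.execute(
        "SELECT order_date, country, revenue, orders "
        "FROM daily_revenue ORDER BY order_date, country"
    ).fetchall()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Because the raw table is kept as loaded, the transformation can be rewritten and rerun later without extracting the data again, which is one reason the transform step comes last in ELT.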
The cloud promises infinite scale and the ability to handle limitless quantities of data. But to manage that data effectively and turn it into insights that make a difference for businesses, CDOs need a technology stack that can handle the scale and velocity of these data flows. That means using a modern data stack to radically shorten the time to value for data — and to begin using data to move the company forward.