Just a few years ago, data professionals relied entirely on remote infrastructure for seemingly infinite compute and storage. Today, armed with nothing more than a modern laptop and Python notebooks, they can rethink that infrastructure and run serious analytics locally. MotherDuck lets users perform high-speed data analysis on their laptops, breaking the traditional reliance on remote servers.
Using the Fivetran Partner SDK, MotherDuck has built a new destination connector so MotherDuck users can consume normalized data from any of Fivetran’s 500+ data sources. That means users can process data quickly on their local machines, reaching out to cloud-hosted data only when necessary.
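For a rough sense of what that local-plus-cloud pattern looks like in a Python notebook, here is a minimal sketch using the duckdb package to attach a MotherDuck database. The database name, schema and table are hypothetical placeholders, not part of the discussion.

```python
import duckdb

# Attach a MotherDuck database; the "md:" prefix tells DuckDB to connect to
# MotherDuck (authentication typically comes from a MotherDuck token in the
# environment). The database name here is a placeholder.
con = duckdb.connect("md:my_analytics_db")

# Query a table kept in sync by a Fivetran destination connector, pulling only
# the small aggregated result back to the laptop. Schema and columns are
# hypothetical.
daily_orders = con.sql("""
    SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM fivetran_shop.orders
    GROUP BY order_date
    ORDER BY order_date
""").df()

print(daily_orders.head())
```

The heavy lifting stays close to where the data lands, while exploration and iteration happen locally.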
MotherDuck’s co-founder and CEO Jordan Tigani sat down with Fivetran’s co-founder and CEO George Fraser to discuss these surprising evolutions in data processing, data lakes and more.
Fast laptops are changing the modern data user experience
The modern data landscape is undergoing a transformation, driven by the impressive capabilities of today's hardware. Compute and storage, including on personal devices, have outpaced the data sizes most teams actually analyze, to the point that an Apple M2-powered laptop can outperform a traditional cloud data warehouse. For many workloads, complex, distributed data warehouse architectures are simply no longer necessary. Tigani points out, "Servers are actually huge now and you don't need to split things up the way you used to."
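As a small illustration of laptop-scale analytics (not drawn from the conversation itself), a single DuckDB process can aggregate large Parquet datasets directly on a notebook, no cluster required. The file path and columns below are placeholders.

```python
import duckdb

# Aggregate local Parquet files directly on the laptop.
# The path and column names are illustrative only.
top_customers = duckdb.sql("""
    SELECT customer_id, SUM(amount) AS total_spend
    FROM read_parquet('data/events/*.parquet')
    GROUP BY customer_id
    ORDER BY total_spend DESC
    LIMIT 10
""").df()

print(top_customers)
```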
Fraser agrees: “The hardware is growing faster than the data sizes people actually routinely use.” He believes that “giant dataset legends are driven by terrible data pipelines.” Poor design and implementation have led to massive duplication of the same data, creating the illusion that infrastructure needs to be far bigger than it is. Fivetran, for example, handles change data capture (CDC) for most data sources on a single node, without sharding or scale-out infrastructure.
“People are starting to wake up to the idea that the most salient feature about data is not how big it is,” notes Tigani. “They’re also realizing they spent a ton of money building infrastructure and not getting much value out of it.” What matters most is the user experience around the data. By designing around user experience rather than dataset size, companies can reach actionable insights, and real value, much faster.
All your use cases belong on one data platform
Data processing workloads like artificial intelligence and machine learning require large data volumes, and companies are turning to centralized data lakes for predictive and generative AI use cases. Data lakes are a great choice for AI initiatives, but only if the architecture is right.
Many companies have “built multiple data platforms for different workloads, which you should not do,” Fraser cautions. It’s better to share one data platform for “as many workloads as possible,” whether it’s AI, ML or analytics and reporting.
Having multiple data platforms inevitably multiplies the cost and effort of managing pipelines. A single data platform simplifies the operational and administrative burden for most data use cases. “It's a lot of work building and maintaining data pipelines,” Fraser adds. “Having two separate systems, just because you have two different goals, is something I'm always urging people not to do.”
Tigani believes having one data platform comes down to data trustworthiness and curation. For example, the biggest benefit of a data lake is that it supports many kinds of structured and unstructured data, but he warns, “If you’re trying to make decisions based on using AI and you haven’t carefully curated what’s in that, it’s very hard to predict what’s going to come out.” It’s much easier — and more affordable — to curate one data platform than two (or more).
We’re just skimming the surface of this in-depth discussion. Listen to the full episode here.