When data lakes first emerged in the 2010s, businesses choosing between data warehouses and data lakes had to contend with a specific set of trade-offs.
Data lakes offered the following benefits:
- Scalability
- The flexibility to support all data types
- Low costs
However, these came at the expense of:
- Limited structure and governance
- Complex pipeline and metadata management
- Query performance issues
- The potential to turn into unusable “data swamps”
By contrast, data warehouses offered:
- Structured, reliable analytics
- Governance and ACID compliance
- Easy-to-use, fully managed platforms
- High-performance SQL at scale
These came at the expense of:
- High storage and compute costs at high volumes
- Proprietary data formats requiring proprietary query engines
- Limited support for unstructured data
- The potential to create silos and duplication across organizations
These trade-offs have become increasingly irrelevant thanks to the modern data lake, in which data is structured through open table formats.
[CTA_MODULE]
Benefits of a modern data lake (with a managed service)
A modern data lake combines the benefits of both data lakes and data warehouses — scalability, flexibility, low costs, structured analytics, governance, and ease of use — while sidestepping the limitations of legacy data repositories.
More importantly, the modern data lake is the essential centerpiece of the Open Data Infrastructure. With open standards, interoperability, and the flexibility to scale and support all forms of data, a modern data lake can be combined with all kinds of query engines and other complementary tools to support the full range of analytical and operational use cases.
An Open Data Infrastructure will become increasingly important as organizations rely more heavily on automation and agentic AI. For artificial intelligence to produce human-like behavior, it must have access to essential context and information that may otherwise be tacit or based on human judgment and discretion. That means organizations must have the flexibility to integrate data in all formats and from all origins, as well as the capacity to easily track and manage that data once it lands in the destination.
The Fivetran Managed Data Lake Service is our solution to this need. It combines a modern data lake with efficient ingestion, saving users up to 30% on data management costs; automated open table format creation and maintenance, including compaction, deduplication, and cleanup; and governance through catalog integration.
How to upgrade from a data warehouse to a modern data lake
The following are a few high-level steps to upgrading your data repository from a data warehouse to a modern data lake. For more technical details, consult our technical guide and consider our professional services.
1. Stand up the new destination and keep the old one running for now
To avoid disrupting your existing operations and analytics, your migration must be incremental. Keep your old destination active while starting up your new one.
2. Copy or create (programmatically, if possible) connections for your new destination
Depending on how many applications, databases, feeds, etc., you’ll need to integrate, you may benefit from using the Fivetran REST API to programmatically build or copy connections to your new destination. You will need to verify that the schemas match in both the old and new destinations.
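The copy-and-verify work in this step can be sketched in Python. This is a minimal, hedged sketch: it assumes the Fivetran REST API's Basic-auth scheme and a `/connectors` create endpoint, and the payload field names (`service`, `group_id`, `config`, `paused`) are assumptions to verify against the current API reference before use.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.fivetran.com/v1"  # Fivetran REST API base URL


def auth_header(api_key: str, api_secret: str) -> dict:
    """Fivetran authenticates with HTTP Basic auth (API key and secret)."""
    token = base64.b64encode(f"{api_key}:{api_secret}".encode()).decode()
    return {"Authorization": f"Basic {token}", "Content-Type": "application/json"}


def clone_payload(old_connector: dict, new_group_id: str) -> dict:
    """Build a create-connection payload that copies an existing connector's
    service and config into the group of the new lake destination.
    Field names here are assumptions; check them against the API docs."""
    return {
        "service": old_connector["service"],
        "group_id": new_group_id,
        "config": old_connector.get("config", {}),
        "paused": True,  # start paused; resume after reviewing the new connection
    }


def schemas_match(old_schema: dict, new_schema: dict) -> bool:
    """Verify the schema config in the new destination mirrors the old one."""
    return old_schema == new_schema


def create_connection(payload: dict, headers: dict) -> dict:
    """POST the payload to the (assumed) connector-creation endpoint."""
    req = urllib.request.Request(
        f"{API_BASE}/connectors",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Looping `clone_payload` and `create_connection` over the list of existing connections lets you rebuild dozens of integrations against the new destination in one pass, with `schemas_match` as the post-copy check.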
3. Additional migration work
Raw data isn’t the only thing you will need to migrate to your new data repository. You will need to port analytics-ready data models — and the transformations that support them — to your new destination, as well.
You will also need to ensure that every tool that relies on your former destination, such as BI platforms, is compatible with your new destination.
4. Populate the catalog
With Fivetran’s Polaris-based Iceberg REST catalog or catalog integrations for Unity Catalog, AWS Glue, BigLake, and OneLake, you can maintain a persistent, accurate inventory of all tables in your data lake.
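The inventory described above can be sketched with PyIceberg, one of several clients that can talk to an Iceberg REST catalog (using PyIceberg here is an assumption, as is the catalog name; the URI and credentials are placeholders, not real endpoints).

```python
def table_inventory(catalog) -> list[str]:
    """Walk every namespace in an Iceberg catalog and return all tables
    as 'namespace.table' strings -- a flat inventory of the lake."""
    inventory = []
    for namespace in catalog.list_namespaces():
        for identifier in catalog.list_tables(namespace):
            # identifiers are tuples like ("fivetran_raw", "orders")
            inventory.append(".".join(identifier))
    return inventory


if __name__ == "__main__":
    # PyIceberg is assumed installed; URI and credential are placeholders.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "lake",
        **{
            "uri": "https://example.com/catalog",       # placeholder REST URI
            "credential": "client-id:client-secret",    # placeholder credential
        },
    )
    print(table_inventory(catalog))
```

Because the function only relies on the standard `list_namespaces`/`list_tables` catalog interface, the same inventory works unchanged whichever Iceberg-compatible catalog backs the lake.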
5. Cut over when everything is present in the new environment
When everything you need is present in the new environment, you can cut over. Fivetran’s normalization and standardization provide you with a high-quality “bronze layer,” a strong foundation from which to continue refining data products for analytics and operations.
In addition, you will still be able to use your former data warehouse’s query engine, and you will now also have the option of using any other engine as needed.
You have now upgraded to an AI-ready Open Data Infrastructure
The financial and labor savings of an Open Data Infrastructure and a managed, modern data lake will provide your team with additional resources to dedicate to downstream, higher-value activities. These could include expanding the scale and scope of your data infrastructure and operations, building data models, and pursuing all other activities that support analytics, machine learning, and AI. You may now choose the best compute engine for each job. This includes the choice of retaining your old data warehouse as a query engine in order to leverage the analytics and AI/ML services your cloud provider bundled with it.
As organizations increasingly reorganize their workflows to leverage the strengths of AI, the Open Data Infrastructure, characterized by commoditized tools, interoperability, and open standards, will only continue to grow in importance. Data lakes, chiefly characterized by affordable, commodity data storage, will be central to this infrastructure, giving organizations the flexibility to build workflows of all kinds around their unique proprietary data.
Fivetran Managed Data Lake Service helps you easily store data once in an open format, separate storage from compute, and preserve flexibility as AI, analytics, and the engines that support them evolve. You will be able to build a cost-efficient, reliable, and flexible foundation for every future AI workload.
[CTA_MODULE]


