The importance of open table formats for modern data lakes

Open table formats offer enterprises the capabilities of data warehouses and the flexibility of data lakes.
April 7, 2025

Open table formats are reshaping how enterprises interact with data lakes. These formats, most notably Delta Lake and Apache Iceberg, bring much-needed ACID compliance, schema evolution, and data versioning to traditionally schema-less, append-only data lakes. Despite growing adoption, the industry has yet to settle on a single technical standard, echoing previous format wars.

This ongoing competition offers both opportunities and challenges for those looking to build scalable, future-proof data architectures in pursuit of advanced analytics, including AI.

What are open table formats and why do they matter?

Open table formats are a layer of abstraction that wraps around data files, organizing them into a database-like structure. Their capabilities transform traditional data lakes into true data lakehouses, blending the best of data warehouses and data lakes through the following (a brief code sketch follows the list):

  • ACID transactions: Guaranteeing data integrity and validity in the midst of concurrent workloads and potential errors
  • Schema evolution: Allowing schema modifications without breaking existing queries and reducing the need to rewrite data files – historically a costly and time-consuming operation for structured data on the data lake
  • Time travel and versioning: Enabling rollback or historical data analysis
  • Partitioning and performance optimizations: Enhancing query speed and efficiency for relational data stored on a data lake
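
To make these capabilities concrete, here is a minimal sketch using PySpark with the delta-spark package. The table path /tmp/events and the sample rows are illustrative assumptions, not taken from a real deployment.

```python
# A minimal sketch of ACID writes, schema evolution, and time travel with
# Delta Lake. Assumes PySpark plus the delta-spark package are installed;
# the path /tmp/events and the sample rows are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("open-table-format-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Each write is an atomic, ACID-compliant commit (this one creates version 0).
spark.createDataFrame([(1, "alice")], ["id", "name"]) \
    .write.format("delta").save("/tmp/events")

# Schema evolution: append a new column without rewriting existing data files.
spark.createDataFrame([(2, "bob", "US")], ["id", "name", "country"]) \
    .write.format("delta").mode("append") \
    .option("mergeSchema", "true").save("/tmp/events")

# Time travel: read the table exactly as it looked at version 0.
spark.read.format("delta").option("versionAsOf", 0).load("/tmp/events").show()
```

Apache Iceberg exposes equivalent capabilities through its own APIs; the underlying mechanics differ, as described below.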

Traditionally, data lakes have provided inexpensive storage for vast amounts of structured and unstructured data. However, their lack of transaction support and governance mechanisms has led to significant challenges in maintaining data integrity, especially as regulatory compliance becomes increasingly stringent and organizations increasingly pursue advanced analytics and AI.

Delta Lake vs. Apache Iceberg

Today, two major contenders in the open table format space stand out: Delta Lake, developed by Databricks, and Apache Iceberg, an open-source project originally developed at Netflix. Both provide robust transactional support and performance enhancements, but they differ in key areas.

Delta Lake evolved closely alongside Databricks and Spark and is highly optimized for both, gaining traction among enterprise customers thanks to its strong performance and support for real-time use cases.

Under the hood, Delta Lake uses a transaction log (_delta_log) to record actions like inserts, deletes, and schema changes. The log is append-only and provides a linear history of changes, making time travel and rollback straightforward to implement. Owing to this linear, file-based log, querying Delta tables does not require a technical catalog (metastore). Data governance, however, still requires a catalog. Unity Catalog, in particular, is optimized for the broader Databricks ecosystem, although Delta Lake can also be combined with other data catalogs.
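
As a hedged illustration of that structure, the sketch below lists the commit files of the hypothetical /tmp/events table from the earlier example; each numbered JSON file in _delta_log records one commit's actions.

```python
# Peek at Delta Lake's append-only transaction log on disk. Assumes the
# hypothetical /tmp/events table from the earlier sketch already exists.
import json
import os

log_dir = "/tmp/events/_delta_log"
for name in sorted(os.listdir(log_dir)):
    if not name.endswith(".json"):
        continue  # skip checkpoint parquet and .crc files
    with open(os.path.join(log_dir, name)) as f:
        # Each line is one JSON action object: commitInfo, metaData, add, remove, ...
        action_types = [next(iter(json.loads(line))) for line in f]
    print(name, action_types)
```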

Apache Iceberg is a table format initially created at Netflix that has since seen contributions from Apple, Amazon, Adobe, and many others. It enjoys broad ecosystem support, spanning query engines like Trino, Presto, and Flink as well as warehouse platforms like Snowflake, BigQuery, and Redshift Spectrum. This makes Iceberg a popular choice for companies looking for a broadly accessible format.

Under the hood, Iceberg organizes metadata in a tree-like structure of snapshots and manifests, which helps it scale efficiently as a table grows. This makes it well-suited for petabyte-scale datasets and high-concurrency environments. Schema evolution happens in place through metadata file updates, rather than by rewriting the data files themselves. However, because Iceberg’s table state is not kept in a single linear log, the format requires a technical catalog for querying: the catalog stores a pointer to the table’s latest metadata file so that engines can resolve the correct state of the table. As with Delta Lake, Apache Iceberg can also be combined with a wide range of technical catalogs for governance.
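
As a rough sketch of how this looks from a client, the snippet below uses the pyiceberg library to walk a table's snapshots and make a metadata-only schema change. The catalog name "demo" and table "db.events" are hypothetical placeholders, with connection details assumed to live in pyiceberg's configuration.

```python
# Walk Iceberg's snapshot metadata and perform an in-place schema change.
# Catalog "demo" and table "db.events" are hypothetical placeholders.
from pyiceberg.catalog import load_catalog
from pyiceberg.types import StringType

catalog = load_catalog("demo")            # endpoint/credentials come from config
table = catalog.load_table("db.events")

# The catalog points at the latest metadata file; each snapshot in that
# metadata references a manifest list, which in turn references manifests.
for snap in table.snapshots():
    print(snap.snapshot_id, snap.timestamp_ms, snap.manifest_list)

# Schema evolution is a metadata-only commit; no data files are rewritten.
with table.update_schema() as update:
    update.add_column("country", StringType())
```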

Despite their respective strengths, neither format has emerged as an undisputed leader. As of April 2025, the Delta Lake GitHub repository shows 7.9k stars and 364 contributors, while Apache Iceberg’s shows 7.1k stars and 572 contributors – effectively neck and neck.

The risks posed by the format wars

For teams that intend to build data lakes with open table formats, the lack of a clear market leader presents some risks: 

  • Interoperability concerns: Choosing one format over another may limit compatibility with certain data processing engines, catalogs, cloud providers, or analytics tools, leading to potential vendor lock-in.
  • Future-proofing data architectures: Investing heavily in a single format may pose migration challenges down the road, especially in the (somewhat unlikely) event one format eventually runs the other out of business.
  • Operational complexity: Different business units within the same enterprise may manage multiple architectures with multiple open table formats, increasing maintenance overhead and the need for specialized expertise.

Fivetran’s approach: Interoperability without complexity

Fivetran’s approach to fully managed data lakes mitigates these risks by writing to both formats, substantially increasing the number of query engines that can operate on the same data without duplicating it or running time-consuming format conversions on larger tables.

The Fivetran Managed Data Lake Service features automated data integration, writing data seamlessly to both Delta Lake and Apache Iceberg. This approach provides enterprises with:

  • Interoperability: A unified pipeline supports multiple query engines (e.g., Spark, Flink, Trino, Presto) and cloud providers.
  • Optionality: You can choose the appropriate format based on evolving needs, without being locked into a single vendor.
  • Simplicity: Through automated data integration, Fivetran handles the complexities of converting data from disparate sources into the open table format of your choice and ongoing schema evolution, reducing the engineering overhead of data teams.

With Fivetran’s approach, organizations gain the ability to support their data lakehouse with an automated, fully managed service without gambling on the outcome of the format wars.

The road ahead for open table formats

Open table formats will play a critical role in shaping the future of data lakes and data catalogs. While the industry has yet to declare a definitive winner between Delta Lake and Apache Iceberg, businesses don’t have to wait for a resolution.

By leveraging solutions like the Fivetran Managed Data Lake Service, companies can enjoy the best of both worlds—ensuring ACID transactions, scalability, and governance without adding unnecessary complexity. Organizations that prioritize flexibility and interoperability will be best positioned for long-term success in the era of open table formats.

[CTA_MODULE]

Experience Fivetran Managed Data Lake Service for yourself.
Sign up