We’re thrilled to expand our support for data lakes with our newest release: Amazon S3 fueled by Fivetran with Apache Iceberg. Amazon S3 offers industry-leading scalability, data availability, security and performance, making it the perfect place to store data.
With expansive storage capacity and support for multiple data formats, the data lake is a highly popular destination for teams analyzing massive data sets or running extensive data science projects that fuel their business. According to 451 Research, “Nearly three quarters of enterprises are currently using or piloting a data lake environment, or plan to do so within the next 12 months.” Of the many enterprise teams that have already put data lakes to work, a majority cite enhanced business agility, improved product and service development, and better customer service and engagement as benefits.
Challenges of data lakes
Data ingestion into a data lake, however, has been one of the most challenging tasks for data teams. The process requires both custom ETL code and ongoing maintenance.
And some of the qualities that make data lakes great — like massive storage — can also present challenges related to compliance and usability, especially for organizations looking to maximize value. 451 Research continues, “Data security is the most cited challenge by enterprises that are already in deployment or proof-of-concept with data lakes (37%), followed by data privacy concerns (33%), and configuring and managing data pipelines (31%).”
Concerns about governance, security and automation inspired Fivetran’s work on this release.
Engineering a modern data lake
We designed a solution that provides a compliant and secure data lake with support for atomic, consistent, isolated and durable (ACID) transactions and granular access control. Fully managed Fivetran pipelines anonymize personally identifiable information (PII) while cleansing, normalizing and automatically loading data into the lake.
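To make the idea of anonymizing PII before load concrete, here is a minimal Python sketch of one common approach: hashing some columns and blocking (dropping) others. It is an illustration under stated assumptions, not a description of how Fivetran implements this. The column names, the `anonymize_row` helper and the salt value are all hypothetical.

```python
import hashlib

# Hypothetical column policy; these names are illustrative only.
HASHED_COLUMNS = {"email"}
BLOCKED_COLUMNS = {"ssn"}


def anonymize_row(row: dict, salt: str = "example-salt") -> dict:
    """Return a copy of `row` with blocked columns dropped and
    hashed columns replaced by a salted SHA-256 hex digest."""
    cleaned = {}
    for column, value in row.items():
        if column in BLOCKED_COLUMNS:
            # Blocked columns are never written to the destination.
            continue
        if column in HASHED_COLUMNS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            cleaned[column] = digest
        else:
            cleaned[column] = value
    return cleaned
```

Because the same input always produces the same digest, hashed columns can still be used for joins and counts without exposing the raw value.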
We now automatically extract, cleanse and deduplicate large volumes of semi-structured data and make them ready for analysis, powering data lakes in the same reliable and secure way our customers get data into their cloud warehouses today. Without structure, governance and accurate data in the lake, organizations cannot realize the full value of what they store there.
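As a rough picture of what deduplication means in practice, the Python sketch below keeps only the most recent version of each record. It is a simplified illustration, not Fivetran’s actual method. The `deduplicate` function and the `id` and `updated_at` field names are assumptions chosen for the example.

```python
def deduplicate(rows, key="id", version="updated_at"):
    """Keep only the most recent row for each value of `key`.

    `rows` is an iterable of dicts. `key` and `version` are
    hypothetical field names used for this example.
    """
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Replace the stored row when this one is at least as recent.
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    return list(latest.values())


rows = [
    {"id": 1, "updated_at": 1, "status": "new"},
    {"id": 2, "updated_at": 1, "status": "new"},
    {"id": 1, "updated_at": 2, "status": "shipped"},
]
unique_rows = deduplicate(rows)  # one row per id, latest version kept
```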
“We are delighted that the accessibility of Amazon S3 with Iceberg continues to grow,” said Greg Khairallah, Director of Analytics at Amazon Web Services. “It’s an easy way for our customers to simplify data ingestion while providing customers the scalability of a data lake and the reliable data transformation of a data warehouse.”
The Fivetran mission is to make access to data as simple and reliable as electricity, and S3, fueled by Fivetran with Iceberg, brings that promise to the world of data lakes.
Ultimately, the new data lake from Fivetran, Amazon S3 and Iceberg removes much of the manual work required to build and maintain pipelines into your S3 destination, as well as the time-consuming effort of cleansing and deduplicating the data once it lands. Tasks such as compaction, adding a thin layer of metadata for cataloging and more are taken off your plate. The result is a data lake driven by automation and governance, with a shorter time to insight.
The modern data lake in action
As data lakes gain in popularity, progressive analytics teams are demanding more from them, and solutions like Amazon S3 and automated pipeline support are equipped to meet that demand.
“The data lake is an easy, affordable, secure and robust way to store all our customers' data,” said Lakshmi Ramesh, Vice President, Data Services at Tinuiti. “The main challenge is in optimizing performance and accessibility but with Fivetran’s support for Amazon S3 with Iceberg it will further optimize our Fivetran pipeline. Since the data lake is our single source of truth, it is critical that all the data ingested from different sources be accessible in the data lake.”
Instead of focusing on all the manual steps required to ingest data, cleanse it, prepare it for usage, hash and block sensitive data, and then start querying it, modern organizations like Tinuiti see great value in reducing data lake management efforts through automation and governance.