Fivetran Managed Data Lake Service and Google’s Cloud Storage power a modern data lake

Build an interoperable data architecture for analytics and AI using Fivetran Managed Data Lake Service and Google’s Cloud Storage.

Charles Wang

April 25, 2025

Until recently, organizations were often forced to maintain separate data architectures for different use cases. Use cases requiring batched data integration, such as reporting and decision support, involve small volumes of structured data and are easily accommodated through data warehouses. Other use cases, such as fraud detection, real-time recommendations, and business process automation, rely on real-time, high-volume feeds including unstructured data, requiring the flexibility of data lakes. Because data lakes are so flexible and can easily absorb large volumes of data, they can be difficult to manage and often become unnavigable, ungoverned data swamps.

Two developments have radically changed this reality: automation and open table formats, a layer of abstraction that wraps around data files and organizes them into a database-like structure. Open table formats bring governance, security, and a query-ready relational structure to a data lake, in effect granting it the capabilities of a data warehouse. Automation, especially in data integration, allows data teams to sidestep the considerable engineering overhead of building and maintaining bespoke data infrastructure. By assembling architectures from interoperable off-the-shelf tools and technologies, data teams can use engineering time for higher-value projects.
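To make the idea of a metadata layer wrapped around data files more concrete, here is a minimal sketch in Python. It is a toy model only: the class names, file paths, and snapshot structure are assumptions made for illustration and do not reflect the actual metadata layout of Apache Iceberg or Delta Lake.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    """One immutable data file in object storage (paths are hypothetical)."""
    path: str
    row_count: int


@dataclass
class ToyTable:
    """Toy stand-in for the metadata layer of an open table format."""
    schema: dict[str, str]
    snapshots: list[tuple[DataFile, ...]] = field(default_factory=list)

    def commit(self, added=(), removed=()):
        """Record a new snapshot; earlier snapshots stay readable."""
        current = self.snapshots[-1] if self.snapshots else ()
        kept = tuple(f for f in current if f not in removed)
        self.snapshots.append(kept + tuple(added))
        return len(self.snapshots) - 1  # id of the new snapshot

    def files(self, snapshot_id=-1):
        """Files a query engine should read for the given snapshot."""
        return self.snapshots[snapshot_id]


table = ToyTable(schema={"sku": "string", "on_hand": "int"})
first = DataFile("inventory/part-000.parquet", 1000)
second = DataFile("inventory/part-001.parquet", 250)
s0 = table.commit(added=[first])
s1 = table.commit(added=[second])
```

The point of the sketch is that a query engine reads the table through its metadata rather than by listing files, which is what gives a data lake a consistent, database-like view of its contents.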

With these enabling technologies, data lakes become a universal storage layer that supports any data use case. This interoperable storage layer will only become more essential as the volume, velocity, and variety of data continue to grow, and advanced analytics, such as AI, become increasingly critical to competitiveness.


Fivetran Managed Data Lake Service makes using Google’s Cloud Storage easy

Although data lakes with open table formats offer considerable capabilities, they also require management. On the data integration side, building and maintaining data pipelines is deceptively complex. It requires a keen understanding of the underlying data, the ability to incrementally capture new records as well as schema changes, and consistent performance and reliability.

On the data management side, ongoing maintenance tasks such as cleaning, deduplicating, compacting, partitioning, and clustering are critical, especially as data continues to be updated.

Fivetran Managed Data Lake Service solves these challenges by automatically extracting data, converting it into either Apache Iceberg or Delta Lake format, and normalizing, compacting, and deduplicating data as it lands in Google’s Cloud Storage. We continuously monitor and maintain your Cloud Storage instance, updating, merging, and deleting as needed to ensure your data is optimized, up-to-date, and query-ready. The Fivetran data ingestion process is so efficient and optimized that we absorb the cost of ingest compute, further reducing users' data management costs.
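As a rough illustration of what updating, merging, deleting, and deduplicating involve, the sketch below applies a batch of change records to a table keyed by primary key. It is not Fivetran's implementation: the function name `merge_batch` and the `_deleted` marker are hypothetical, chosen only to show the logic.

```python
def merge_batch(table, batch, key="id"):
    """Upsert a batch of change records into a table keyed by primary key.

    Later records in the batch win over earlier ones, and records carrying
    the hypothetical `_deleted` marker remove the row.
    """
    merged = dict(table)  # copy so the input table is left untouched
    for record in batch:
        pk = record[key]
        if record.get("_deleted"):
            merged.pop(pk, None)
        else:
            merged[pk] = {k: v for k, v in record.items() if k != "_deleted"}
    return merged


table = {1: {"id": 1, "sku": "A-100", "on_hand": 40}}
batch = [
    {"id": 2, "sku": "B-200", "on_hand": 15},  # insert
    {"id": 1, "sku": "A-100", "on_hand": 38},  # update
    {"id": 2, "sku": "B-200", "on_hand": 12},  # duplicate key, later wins
    {"id": 3, "sku": "C-300", "on_hand": 7},   # insert
    {"id": 3, "_deleted": True},               # delete
]
result = merge_batch(table, batch)
```

Even in this toy form, the ordering rules and delete handling hint at why maintaining such logic across many sources and changing schemas is a real engineering burden.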

Cloud Storage offers simplicity, performance, and intelligence through a unified data platform. Like other data lakes, Cloud Storage is compatible with a wide range of query engines and technical data catalogs. Google provides both BigQuery as a query engine and BigQuery Metastore as a technical data catalog. The tight integration between BigQuery Metastore and Google’s Cloud Storage ensures users have the same experience whether the data is stored in object storage or native BigQuery tables.

In addition, BigQuery features several capabilities that demonstrate Google’s commitment to open table formats, including streaming writes to open table formats, continuous queries for real-time data access, and querying unstructured data directly from Google’s Cloud Storage using object tables. By writing to open table formats and automating their management, Fivetran Managed Data Lake Service directly complements BigQuery’s support for advanced analytics workloads.

Most importantly, the engineering effort saved by the Fivetran Managed Data Lake Service empowers your data professionals to pursue higher-value analytics projects. With a solid data foundation, your Cloud Storage can become the centerpiece of a unified data architecture and, in combination with the Google ecosystem, solve any data use case.

Example: Using Google’s Cloud Storage for JIT inventory management

Data use cases span a very wide spectrum. In order of growing complexity, they include:

Reporting and business intelligence
Predictive analytics and machine learning
Generative AI
Agentic AI

For each of these projects, once data has arrived in Cloud Storage, Google Cloud Platform offers complementary tools and technologies or supports third-party offerings.

Consider inventory management, a complex undertaking that challenges multiple industries from retail to manufacturing. Let’s walk through how companies can leverage Cloud Storage and Fivetran Managed Data Lake Service to build a sophisticated architecture and power just-in-time inventory management and stock replenishment, starting from basic reporting and progressing toward system automation.

Step 1: Build reporting and business intelligence foundations

Reporting and business intelligence are indispensable for understanding and visualizing operations across an organization’s supply chain. The goal is to understand product movement, supplier relations, and customer purchasing trends.

A data stack for reporting and business intelligence comes together like this:

Cloud Storage serves as a centralized, cost-effective storage layer for incoming structured and semi-structured data, including sales data from CRMs, supplier data from spreadsheets, ERP data, and more.
Fivetran Managed Data Lake Service automatically ingests data from hundreds of sources, like Salesforce and SAP, into Cloud Storage in open table formats like Apache Iceberg or Delta Lake, and automatically updates metadata in BigQuery Metastore.
BigQuery performs analytics directly on Cloud Storage data or stages it for further transformation.
Looker visualizes key metrics like low stock alerts, historical sales trends, and supplier lead times.

This architecture, often referred to as the modern data stack, helps teams build reliable inventory reporting pipelines without manually wrangling CSVs or relying on stale data extracts.
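The kind of query behind a low stock alert can be sketched as follows. This example uses Python's built-in `sqlite3` module purely as a local stand-in for a query engine such as BigQuery; the table names, columns, and sample rows are assumptions invented for illustration.

```python
import sqlite3

# In-memory database standing in for query-ready tables in the data lake.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (sku TEXT PRIMARY KEY, on_hand INTEGER, reorder_level INTEGER);
    CREATE TABLE suppliers (sku TEXT, supplier TEXT, lead_time_days INTEGER);
    INSERT INTO inventory VALUES ('A-100', 38, 50), ('B-200', 120, 40), ('C-300', 5, 25);
    INSERT INTO suppliers VALUES ('A-100', 'Acme', 7), ('B-200', 'Globex', 14), ('C-300', 'Initech', 21);
""")

# SKUs below their reorder level, longest supplier lead time first.
LOW_STOCK_SQL = """
    SELECT i.sku, i.on_hand, i.reorder_level, s.supplier, s.lead_time_days
    FROM inventory AS i
    JOIN suppliers AS s ON s.sku = i.sku
    WHERE i.on_hand < i.reorder_level
    ORDER BY s.lead_time_days DESC
"""

low_stock = conn.execute(LOW_STOCK_SQL).fetchall()
for row in low_stock:
    print(row)
```

In a real deployment the same join would run in the warehouse or query engine and feed a dashboard, rather than being printed from a script.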

Step 2: Enable predictive analytics and machine learning

With solid data visibility, companies can initiate predictive analytics, enabling operations teams to proactively make decisions by recognizing inventory patterns and preventing critical shortages. This deeper understanding of an organization’s inventory management requires machine learning models that can forecast demand based on historical patterns and external signals like seasonality or sales promotions.

A data stack for predictive inventory analytics on Google Cloud could include:

Cloud Storage acts as a storage layer for historical sales, lead times, and third-party data used to build models and training/testing sets. Fivetran Managed Data Lake Service supports history mode, enabling you to track changes to the values of records over time.
Vertex AI and its Workbench development environment offer Python-based notebooks for prototyping and training models on data loaded directly from Cloud Storage.

Now, operations teams can anticipate future inventory needs and pre-order products based on expected demand, reducing waste and avoiding inventory shortages.
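To show the shape of the calculation, here is a deliberately simple sketch that forecasts demand with a moving average and derives a reorder quantity from lead time and safety stock. A production model would account for seasonality, promotions, and other signals; the function names, parameters, and sample figures here are assumptions for illustration.

```python
import math


def moving_average_forecast(daily_sales, window=7):
    """Forecast next-day demand as the mean of the last `window` days."""
    if not daily_sales:
        raise ValueError("daily_sales must not be empty")
    recent = daily_sales[-window:]
    return sum(recent) / len(recent)


def reorder_quantity(on_hand, daily_forecast, lead_time_days, safety_stock=0):
    """Units to order so stock covers expected demand over the lead time."""
    needed = daily_forecast * lead_time_days + safety_stock
    return max(0, math.ceil(needed - on_hand))


daily_sales = [12, 15, 11, 14, 13, 16, 17, 18, 20, 19]
forecast = moving_average_forecast(daily_sales, window=5)
order = reorder_quantity(on_hand=60, daily_forecast=forecast,
                         lead_time_days=7, safety_stock=20)
print(forecast, order)
```

With these sample numbers, the last five days average 18 units per day, so covering a seven-day lead time plus 20 units of safety stock from 60 units on hand calls for an order of 86 units.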

Step 3: Layer on generative AI for faster decision making

After gaining a firm grasp of reporting and predictive analytics, we can pursue insights into complex, sometimes qualitative questions such as “what are the top three products with inventory risks?” and “what is a summary of delivery delays over the past quarter?”

A common architecture for generative AI is retrieval-augmented generation (RAG), in which a foundation model is augmented with proprietary data, usually stored in a vector database. Generative AI offers an unparalleled ability to retrieve and synthesize information. It accelerates creative and intellectual work of all kinds and can also support use cases requiring personalization and automated communication.

A Google Cloud stack for generative AI may involve:

Cloud Storage serves as a storage layer for data sets of every kind and medium, including emails, PDFs, and product reviews.
Vertex AI powers the RAG architecture, using Vertex AI Vector Search for retrieval and Google AI Studio to run pre-trained foundation models.
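The retrieval step of RAG can be sketched without any external services. The example below ranks documents against a question and assembles a prompt for a foundation model. The bag-of-words "embedding" is a toy substitute for a real embedding model and vector database, and the documents, function names, and prompt wording are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words term count, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=2):
    """Augment the question with retrieved context for a foundation model."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


documents = [
    "Supplier Acme reported delivery delays of five days in March.",
    "Customer review: the blue kettle arrived quickly and works well.",
    "Warehouse note: SKU C-300 inventory is below the reorder level.",
]
question = "Which delivery delays did suppliers report?"
top = retrieve(question, documents, k=1)
prompt = build_prompt(question, documents, k=1)
```

The resulting prompt would then be sent to a foundation model; that call is omitted here because it depends on the model service in use.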

Step 4: Embrace agentic AI and automated execution

Where generative AI is like a power tool for the human mind, agentic AI is like an additional colleague that doesn’t just inform decisions, but makes them. It requires the utmost mastery over data.

Agentic AI combines the information retrieval and synthesis capabilities of generative AI with the ability to strategize, plan, and execute complex workflows, resulting in an autonomous system with decision-making capability. It shares an architecture and stack with generative AI but also must be able to access systems and orchestrate workflows through APIs.

The architecture closely mirrors the generative AI stack with the addition of orchestration:

API calls connect to internal systems to execute workflows like placing orders, updating information in the CRM, or alerting suppliers.

Imagine an AI assistant that monitors inventory levels, predicts demand, and automatically places purchase orders, all while updating internal systems and notifying stakeholders along the way. At this stage, AI not only synthesizes information but takes action on it to streamline a true JIT inventory management strategy.
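One cycle of such an assistant might be sketched as below. The planning rule is a simple threshold rather than a model-driven decision, and the `place_order` and `notify` callables are hypothetical stand-ins for calls to procurement and messaging APIs; none of these names come from a real product.

```python
from dataclasses import dataclass


@dataclass
class PurchaseOrder:
    sku: str
    quantity: int


def plan_orders(inventory, forecasts, lead_times):
    """Decide which SKUs need replenishment before stock runs out."""
    orders = []
    for sku, on_hand in inventory.items():
        expected_demand = forecasts[sku] * lead_times[sku]
        if on_hand < expected_demand:
            orders.append(PurchaseOrder(sku, expected_demand - on_hand))
    return orders


def run_agent(inventory, forecasts, lead_times, place_order, notify):
    """One cycle: plan, act through injected callables, then report."""
    orders = plan_orders(inventory, forecasts, lead_times)
    for order in orders:
        place_order(order)  # stand-in for an ERP or procurement API call
        notify(f"Ordered {order.quantity} x {order.sku}")
    return orders


placed, messages = [], []
orders = run_agent(
    inventory={"A-100": 38, "B-200": 400},
    forecasts={"A-100": 18, "B-200": 10},  # units per day
    lead_times={"A-100": 7, "B-200": 14},  # days
    place_order=placed.append,
    notify=messages.append,
)
```

Passing the side-effecting actions in as callables keeps the decision logic testable and makes it straightforward to add approval steps or limits before an agent is allowed to act on real systems.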

A managed data lake is your foundation for every data use case

None of these powerful capabilities – BI dashboards, predictive models, chat-based interfaces, or autonomous agents – can exist without high-quality data. Fivetran Managed Data Lake Service makes it easy to land reliable, structured data into Google’s Cloud Storage, with all the governance and performance benefits of open table formats and metadata management.

Instead of spending your engineering cycles on pipeline management or table maintenance, your team can focus on the data work that actually drives value. Whether you're running reports or building the future of AI, it all starts with a trusted foundation, and Fivetran and Google’s Cloud Storage give you exactly that.

Try it for free!

Now, Fivetran users can try our Managed Data Lake Service for Google’s Cloud Storage with free usage from April 9th through May 31st, 2025. Connectors set up for a new Cloud Storage data lake destination will be eligible for this promotion*.

To take advantage of this promotion, you need to:

Have a Fivetran account in good standing, and
Create a new connector with Google’s Cloud Storage as the destination during the Promotion Period (between April 9th, 2025 at 00:01 UTC and May 31st, 2025 at 23:59 UTC).

To get started, head straight to your Fivetran dashboard, sign up for a 14-day free trial of Fivetran, or reach out to sales@fivetran.com with any questions.

Charles Wang

Lead Product Evangelist

Fivetran

Topics

Google Cloud

Data Lakes

‍*Google Cloud Storage Data Lake promotional terms and conditions
