7 things data leaders need to know about Open Data Infrastructure

May 20, 2026

Senior Technical Product Marketing Manager

As AI scales and workloads become continuous, traditional architectures fall short — making Open Data Infrastructure a critical priority for data leaders.

Data leaders don’t usually think of infrastructure as a risky decision, but in practice, it’s one of the biggest bets you’ve already made, and it’s one that becomes harder to change over time.

The architecture you chose for dashboards and analytics now shapes everything that comes next: how quickly your teams can experiment, how predictable your costs are, and whether your AI initiatives scale or stall.

That pressure is increasing. Data no longer just supports reporting; it powers operations and feeds AI systems that run continuously and scale automatically. Architectures that worked for human-driven queries are now being tested by thousands of automated, agent-driven workloads.

At the same time, many platforms are tightening control. Data is stored in proprietary formats, access is metered, and the cost of moving or reusing your own data keeps rising. What looks like a convenient, all-in-one solution at the start becomes a constraint as your needs evolve.

This is the shift data leaders are facing: not just managing data, but maintaining control over it across tools, across teams, and increasingly, across AI systems. Open Data Infrastructure (ODI) is how you do that.

1. If you can’t freely access your data, you don’t control it

Most organizations currently centralize data within the proprietary storage of their chosen cloud data warehouse (CDW). However, this locks your data in proprietary formats inside the vendor’s storage buckets, outside of your control. The only way to reach your data is through the CDW vendor’s tools and interfaces. Even when using third-party tools or your own AI agents, you have to connect via the vendor’s compute and pay for the privilege.

Open Data Infrastructure changes this by storing data in open table formats (such as Apache Iceberg and Delta Lake) within a managed data lake built on commodity cloud object storage such as Amazon S3. Users and agents then use the best compute engine for their specific task, including a cloud data warehouse where appropriate.

The CDW doesn’t go away in this scenario; it becomes a best-of-breed query engine available to your teams, providing the compute engine for your existing transformations and stored procedures — but now with the underlying storage being your managed data lake.

Open Data Infrastructure ensures that every data consumer, human or AI, operates on the same underlying data, regardless of the compute engine, with no unneeded duplication, no conflicting definitions, and no rework to support new tools.

2. Architectures built for dashboards break at agent scale

Agentic AI is changing how data infrastructure gets used. Instead of occasional human queries, systems today must support continuous, automated workloads running at scale.

Agents require fresh data, consistent definitions, and reliable access across multiple compute engines 24/7. What worked for dashboards doesn’t hold up under that pressure: as autonomous agents multiply, so do the data access costs charged by your CDW vendor.

As organizations move from hundreds of human queries to thousands of agent-driven interactions, architecture decisions directly determine whether AI initiatives scale or stall.

3. Vendor lock-in isn’t just a cost problem

When data is stored inside proprietary systems, flexibility is lost. Moving platforms becomes expensive and complex because the only options for populating a new data platform are:

Migrate data from the old platform to the new platform, paying both the extraction costs and the ingestion costs, as well as the time to create the tooling to make this possible.
Recreate the source data layer from the original sources, if this is even possible (API limits, limits on historical retrieval, or lack of access to decommissioned systems can block this).

Open Data Infrastructure separates storage from compute and stores data in open formats inside storage you control. Should you want to move to a new vendor or add another compute engine, you just point it at the open data lake, with no need to migrate data to the new tool. That freedom to evolve is critical in a landscape where tools, especially in AI, are changing rapidly.
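To make the "just point it at the lake" idea concrete, here is a minimal conceptual sketch in Python. It is a toy model, not Fivetran's or any catalog's actual API: the `LakeTable` and `Catalog` classes, the engine names, and the `s3://example-lake/...` path are all hypothetical. It only illustrates that attaching a new engine records a pointer to the same stored table rather than copying data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LakeTable:
    """A table stored once in the lake (all values here are hypothetical)."""
    name: str
    location: str      # object-storage path you control
    table_format: str  # for example "iceberg" or "delta"


@dataclass
class Catalog:
    """Toy catalog: maps table names to their single storage location."""
    tables: dict[str, LakeTable] = field(default_factory=dict)
    engines: set[str] = field(default_factory=set)

    def register_table(self, table: LakeTable) -> None:
        self.tables[table.name] = table

    def attach_engine(self, engine: str) -> None:
        # Attaching an engine records a name only; no data is copied.
        self.engines.add(engine)

    def resolve(self, engine: str, table_name: str) -> str:
        # Every attached engine is handed the same storage location.
        if engine not in self.engines:
            raise KeyError(f"engine {engine!r} is not attached")
        return self.tables[table_name].location


catalog = Catalog()
catalog.register_table(
    LakeTable("orders", "s3://example-lake/sales/orders", "iceberg")
)
catalog.attach_engine("warehouse")
catalog.attach_engine("new_agent_engine")  # added later, no migration step

warehouse_path = catalog.resolve("warehouse", "orders")
agent_path = catalog.resolve("new_agent_engine", "orders")
print(warehouse_path == agent_path)
```

In this model, switching vendors amounts to attaching a different engine name; the table entry and its location never change.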

4. Your data cost model doesn’t scale with AI

As more teams and agents rely on data, the cost of ingesting, storing, and accessing it inevitably grows. With your data locked up inside a data warehouse, you’re forced to pay that vendor’s fees for access to your own data. This is bad enough for human users, but AI compounds the problem. If your AI agents scale 1000x, should you be forced to pay 1000x the cost to access your data?

ODI separates storage and compute so you can choose the right compute engine for each job, improving efficiency and reducing cost. AI agents can access the data directly in your lake using the most cost-effective compute for each use case.
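The arithmetic behind that question can be sketched in a few lines of Python. Every number here is an assumption chosen for illustration: the per-query prices, the query volumes, and the engine names are hypothetical, and real pricing varies by vendor and workload. Prices are held as integer mills (thousandths of a dollar) to keep the arithmetic exact.

```python
# Hypothetical per-query prices in mills (1 mill = $0.001).
PRICE_PER_QUERY_MILLS = {
    "warehouse": 50,          # assumed $0.05 per query
    "lightweight_engine": 5,  # assumed $0.005 per query
}


def monthly_cost_mills(queries_by_engine: dict[str, int]) -> int:
    """Total cost in mills for a mapping of engine name to query count."""
    return sum(
        PRICE_PER_QUERY_MILLS[engine] * count
        for engine, count in queries_by_engine.items()
    )


human_queries = 10_000                 # assumed monthly human query volume
agent_queries = human_queries * 1000   # the 1000x agent scale-up

# Coupled model: every query goes through the warehouse's compute.
single_engine = monthly_cost_mills(
    {"warehouse": human_queries + agent_queries}
)

# Decoupled model: agents read the lake through a cheaper engine.
split_engines = monthly_cost_mills(
    {"warehouse": human_queries, "lightweight_engine": agent_queries}
)

print(f"single engine: ${single_engine // 1000:,}")
print(f"split engines: ${split_engines // 1000:,}")
```

Under these assumed prices, routing agent traffic to the cheaper engine cuts the bill by roughly a factor of ten. The actual ratio depends entirely on the engines and pricing available to you.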

5. You don’t have to rebuild your stack

Moving to Open Data Infrastructure is an evolution in infrastructure rather than a wholesale change like the move from on-prem to cloud. It takes just a few adjustments to set up a managed data lake and begin to populate it. Each existing cloud data warehouse becomes a compute engine running the same queries it always did; you’ve simply moved the storage layer to a managed data lake.

New initiatives move more quickly because they already have access to an open and up-to-date data lake, with no need to duplicate data into another data store and spend time creating pipelines to keep it up to date. Simply point new agents and data tools at the data lake.

6. The real value of ODI is reduced risk

Taken together, these benefits lead to something broader: reduced risk both now and in the future. Open Data Infrastructure reduces risk across three key areas:

Flexibility: Teams can adopt new tools without being locked into a single platform and the costs and limitations that vendor imposes.
Cost: Ingestion compute costs are removed, and teams can use the most cost-efficient compute engine for each use case.
Speed: Teams move faster because they don’t need to rebuild infrastructure or re-ingest data for every new initiative.

For leaders, this is the real value. Open Data Infrastructure doesn’t just improve architecture; it protects the organization from the long-term consequences of choosing the wrong one.

7. The foundation determines whether AI scales or stalls

Open Data Infrastructure starts with a reliable, centralized data foundation. DIY approaches to data ingestion and lake management just don’t scale as needed. Fivetran helps organizations build and maintain that foundation automatically.

With 750+ fully managed connectors, Fivetran automatically ingests data from databases, SaaS applications, files, and events, and keeps it up to date.

The Fivetran Managed Data Lake Service lands that data in open formats (Apache Iceberg and Delta Lake), while handling ongoing maintenance tasks like compaction and snapshot management. It also includes a managed catalog for visibility, governance, and control.

By delivering data into an open, centralized data lake, Fivetran enables teams to store data once, use multiple engines, and scale efficiently without sacrificing openness or control.
