Why every data role needs Open Data Infrastructure

Analysts, data engineers, ML engineers, and data scientists don’t work the same way; they shouldn’t have to.
Today’s data ecosystem includes more roles, more tools, and more specialized workflows than ever before. The days of limiting access to a single warehouse or lake — controlled by a small group of data engineers or analysts — are over.
Each team, whether human or agentic, needs to interact with data differently. The challenge isn’t standardizing how they work but building an underlying data infrastructure that supports all of them: an infrastructure that lets every team use the tools they prefer, without duplicating data or driving up costs.
[CTA_MODULE]
Different users need different tools
Different data users don’t just prefer different tools — they require them.
- Analysts need fast, structured queries and consistent metrics. They expect low-latency SQL access to data to power analytical queries, dashboards, and BI tools like Tableau.
- Data engineers need reliable ingestion, schema evolution, and pipeline orchestration. They rely on tools like Fivetran for automated data movement, DuckDB for exploration, and dbt for productionized transformation.
- Data scientists need to query and explore structured, semi-structured, and unstructured data with ease. Their focus is experimentation: creating and training powerful, business-specific AI models with tools like PySpark and Jupyter notebooks.
- ML engineers need production-grade tooling with access to complete, up-to-date data, so that fresh business context reaches the AI processes powering autonomous, agent-triggered decisions. They work in tools like MLflow and Databricks.
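To make the contrast concrete, here is a minimal sketch of how differently two of these roles might touch the same dataset. The file and column names (orders.parquet, order_date, amount) are illustrative, not from any particular stack:

```python
# A minimal sketch: two roles, two workloads, one dataset.
# File and column names are illustrative.
import duckdb

con = duckdb.connect()

# Analyst: a low-latency SQL aggregate feeding a dashboard.
daily_revenue = con.sql("""
    SELECT order_date, SUM(amount) AS revenue
    FROM 'orders.parquet'
    GROUP BY order_date
    ORDER BY order_date
""").df()

# Data scientist: the same result pulled into a DataFrame
# for iterative feature experimentation.
features = daily_revenue.assign(
    revenue_7d_avg=daily_revenue["revenue"].rolling(7).mean()
)
```

The analyst's query is a fixed, repeatable aggregate; the data scientist's workflow keeps iterating on the result in memory. Neither should dictate the other's tooling.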
These are not variations of the same workload; they are fundamentally different.
Trying to force them into a single system creates tradeoffs:
- Tools become optimized for one use case at the expense of others.
- Workflows get constrained by the limitations of a single engine.
- Teams build workarounds to get the functionality they actually need.
- Tools strain and costs rise as tools are used in ways they were never intended to be.
This is why “one tool for everything” breaks down in practice: it standardizes on the lowest common denominator.
What happens when architecture doesn’t match reality
When infrastructure doesn’t support how people actually work, teams create workarounds to meet their goals:
- Data is copied into multiple different databases and systems to take advantage of the unique capabilities of different engines — resulting in copies inevitably drifting out of sync with no single source of truth. Storage costs spiral, and governance complexity increases.
- Bespoke pipelines and scripts are created to update each copy of the data; time that should be spent analyzing data or training models is instead lost to maintenance and failure recovery.
What starts as a simple stack turns into a fragmented system. This isn’t a failure of the individual tools; it’s a failure of architecture. Even worse, some data stores stop being updated, so decisions end up being made on stale data.
AI is amplifying data fragmentation
The shift to AI amplifies these problems, as data is no longer used only by humans. It now powers systems that operate continuously at scale — models, automation, and autonomous agents.
Teams are experimenting with new tools like Claude Code, OpenAI APIs, LangChain, or custom Python workflows to build prototypes and automate decisions. No single engine can handle all of this effectively — so teams copy data again. They move it into vector databases, feature stores, or separate environments just to test ideas.
That slows down experimentation and increases cost right when speed matters most.
Open Data Infrastructure: One storage layer, many engines
Instead of centering everything around a single platform, Open Data Infrastructure (ODI) separates storage from compute and connects systems through open standards.
- Data is stored once in a managed data lake, ensuring one version of the truth and keeping ingestion and storage costs in check.
- Open formats like Apache Iceberg and Delta Lake ensure data is not locked away in proprietary formats, allowing you to add any compatible tool to your stack without needing to re-architect or replatform.
- Multiple engines operate on the same data directly in the lake without copying or moving it — a cloud data warehouse (CDW) for analyst SQL, notebooks for data scientist exploration, AI agents for autonomous business decisions.
This allows each user to work in the tools that fit their needs: no duplicated data, no parallel ingestion pipelines, and no extra copies sitting in expensive storage and drifting out of sync. That addresses rising storage and compute costs as well as governance concerns.
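As a rough sketch of the “one storage layer, many engines” idea, the snippet below reads a single Iceberg table from two different engines without making a copy. It assumes Spark is configured with an Iceberg catalog named lake; the catalog, table, and storage-path names are all illustrative:

```python
# A sketch: two engines querying one Iceberg table, no copy in between.
# Assumes Spark is configured with an Iceberg catalog named "lake";
# all catalog, schema, table, and storage-path names are illustrative.
from pyspark.sql import SparkSession
import duckdb

# Engine 1: PySpark, for large-scale exploration and model training.
spark = SparkSession.builder.appName("odi-demo").getOrCreate()
orders = spark.table("lake.analytics.orders")  # reads Iceberg metadata in place
orders.groupBy("region").count().show()

# Engine 2: DuckDB, for fast ad hoc SQL over the very same table files.
con = duckdb.connect()
con.sql("INSTALL iceberg")
con.sql("LOAD iceberg")
con.sql("""
    SELECT region, COUNT(*) AS order_count
    FROM iceberg_scan('s3://my-lake/analytics/orders')
    GROUP BY region
""").show()
```

Because the table format is open, either engine could be swapped out tomorrow without touching the data at rest.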
Fivetran as the data foundation for AI
The Fivetran Managed Data Lake Service was designed around ODI principles: load data once in open table formats, and make it instantly accessible to any LLM, any agent, and any compute engine. This prevents lock-in, duplication, and rework.
Fivetran handles the hardest part first: reliable, automated ingestion. Data from databases, events, SaaS applications, and files is continuously synced and kept fresh — without manual pipelines or ongoing maintenance.
Just as important, that data stays performant without added overhead. Fivetran automatically handles ongoing maintenance — including compaction, snapshot management, and other optimizations — so your lake remains fast, reliable, and ready for any workload.
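For a sense of what that maintenance involves, here is a sketch of the equivalent manual work using Apache Iceberg’s documented Spark procedures (rewrite_data_files and expire_snapshots); the catalog and table names are illustrative:

```python
# A sketch of the table maintenance a managed lake automates, done by hand
# with Apache Iceberg's Spark procedures. Catalog and table names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compaction: rewrite many small data files into fewer, larger ones.
spark.sql("CALL lake.system.rewrite_data_files(table => 'analytics.orders')")

# Snapshot management: expire old snapshots so unreferenced files can be cleaned up.
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'analytics.orders',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")
```

Run by hand, this is recurring work that has to be scheduled, monitored, and tuned per table; a managed service takes that off the team’s plate.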
This is what Open Data Infrastructure enables: less friction between tools, and the freedom to adopt new ones without re-architecting your stack, no matter who joins your team.
[CTA_MODULE]

