Leading global enterprises, including 98 of the world’s 100 largest companies, rely on SAP ERP systems to power their operations. SAP systems serve as the foundation for over 430,000 business customers worldwide, managing core processes like finance, inventory management, customer relationship management (CRM), and human capital management (HCM). As the backbone of enterprise operations, data generated using SAP is crucial for analytics of all kinds, from reporting and business intelligence to artificial intelligence.
While SAP is designed with customization in mind, its business practices increasingly limit how customers access and use their own data, particularly in terms of data for analytics. SAP’s apparent strategy is based on keeping data inside SAP systems, severely limiting the ability of SAP customers to extract data from its applications and utilize third-party data repositories.
In this blog series, we will address the following topics in detail:
- SAP’s attempts to limit data movement and how Datasphere, its proprietary data platform, compares with third-party options
- How to efficiently and securely move data from various SAP ERP systems, while adhering to SAP guidelines
- Enterprises that have successfully moved data out of SAP to unlock efficiencies, customer 360, and more
The following is a synopsis of each topic.
SAP Datasphere vs. modern data platforms
Data from ERPs is intrinsically difficult to integrate due to the complexity and customizability of the underlying systems. However, over the last few years, SAP has intensified its efforts to encourage customers to adopt SAP-certified products and move away from generic ELT/ETL solutions. This is part of a concerted approach to promote its own data management and analytics offerings and foreclose the use of alternatives. In a 2023 blog post, SAP stated outright that it would no longer certify non-SAP data integration vendors, limiting the tools customers can use to access their data. In 2024, SAP prohibited the use of ODP (operational data provisioning) via RFC (remote function call), a popular data extraction method.
SAP Datasphere, launched in 2023, is a proprietary data warehouse with native connectors for SAP application data. SAP’s apparent intention is to vertically integrate the entire data stack, discouraging the adoption of third-party data warehouses and data lakes. However, in comparison to third-party offerings, Datasphere suffers from several disadvantages:
- High costs and limited scalability: Costs can balloon due to SAP Datasphere’s reliance on an in-memory database. With compute and storage tightly coupled, scaling becomes both expensive and inefficient.
- Closed ecosystem and limited interoperability: Datasphere supports only SAP-specific modeling tools, offers limited compatibility with third-party BI platforms, and provides restricted, costly support for external data sources.
- Gaps in modern data architecture support: There’s no native support for Delta Lake, Iceberg, or Kafka. Data lake integration is restricted to HANA lake storage, a paid add-on. Schema changes are cumbersome and SQL support is limited. Overall, there is limited functionality compared to modern cloud warehousing tools.
The result is that SAP customers face limited, expensive, and unscalable choices. The bottom line: Modern platforms like Databricks, Snowflake, and BigQuery offer far more capable, flexible, and performant foundations for analytics than SAP’s own tooling. The real challenge isn’t deciding whether to move; it’s figuring out how to get your data out of SAP efficiently and reliably.
A technical deep dive into how Fivetran centralizes SAP data
Data, including that from SAP applications, is most powerful when extracted and combined with other business-critical data sources, such as CRMs, unstructured media files, SaaS applications, and other databases. Data ownership ultimately remains with the customer, and despite existing legal and technical restrictions imposed by SAP, it is still possible to extract and load SAP data into third-party systems while adhering to SAP’s guidelines.
Fivetran achieves data extraction via an SAP RFC (Remote Function Call) connection, which SAP has provided for third-party integrations for decades. This RFC interfaces with a Fivetran SAP Function Module component within the SAP NetWeaver application, known as Fivetran NetWeaver API. This component:
- Collects and processes changed data using SAP change pointers and table triggers, which limits the additional overhead on the SAP source.
- Validates and compresses data for optimized transfer, making the best use of the available networking capacity.
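The two steps above (collecting only changed rows, then validating and compressing them for transfer) can be illustrated with a minimal Python sketch. This is not Fivetran's implementation or an SAP API: the `ChangePointer` record, the in-memory `tables` dictionary, and all function names are hypothetical stand-ins chosen for illustration, and JSON plus zlib is simply one plausible encoding.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangePointer:
    """Hypothetical change-log entry: which row of which table changed."""
    seq: int    # monotonically increasing position in the change log
    table: str  # source table name
    key: str    # primary key of the changed row


def collect_changes(pointers, tables, last_seq):
    """Return (rows, new_last_seq) for every pointer newer than last_seq.

    Each changed row is read once, in its current state, even if it was
    touched several times. A row that no longer exists is marked deleted.
    """
    rows, seen, new_last = [], set(), last_seq
    for p in sorted(pointers, key=lambda p: p.seq):
        if p.seq <= last_seq:
            continue  # already delivered in an earlier sync
        new_last = p.seq
        if (p.table, p.key) in seen:
            continue  # duplicate pointer for a row we already picked up
        seen.add((p.table, p.key))
        row = tables.get(p.table, {}).get(p.key)
        rows.append({"table": p.table, "key": p.key,
                     "row": row, "deleted": row is None})
    return rows, new_last


def validate(rows):
    """Reject malformed records before they are sent over the network."""
    for r in rows:
        if not r["deleted"] and not isinstance(r["row"], dict):
            raise ValueError(f"malformed row for {r['table']}/{r['key']}")
    return rows


def pack(rows):
    """Serialize and compress a batch of changed rows for transfer."""
    return zlib.compress(json.dumps(rows, sort_keys=True).encode("utf-8"))


def unpack(payload):
    """Inverse of pack(), as the receiving side would apply it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

A sync would call `collect_changes` with the sequence number stored from the previous run, pass the result through `validate` and `pack`, and persist the returned sequence number only after the payload has been delivered. Because only rows referenced by new pointers are read, the work on the source scales with the volume of changes rather than with table size.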
Since Fivetran’s solution relies solely on SAP’s standard NetWeaver application layer for data extraction, it works with both enterprise and runtime HANA environments, including systems deployed in SAP RISE Private. For ECC customers with an enterprise license, Fivetran can connect directly to the database (Oracle or SQL Server). Fivetran also supports sourcing from SAP OData endpoints, enabling replication from SAP RISE Public (also known as Grow with SAP).
Destinations include the full range of data lakes and data warehouses like Snowflake, Databricks, Google Cloud Platform, S3, and more. In addition, Fivetran supports workloads and architectures comprising cloud, hybrid, and on-premises systems.
How leading enterprises use Fivetran to break down SAP data barriers
Leading companies across several industries are using Fivetran to securely replicate SAP and non-SAP data to the cloud with speed, scale, and zero vendor lock-in.
For instance, to unify its data landscape, luxury retailer LVMH turned to Fivetran to automate and standardize data integration across dozens of systems, including SAP. The company automated data movement from SAP and 60+ other sources into BigQuery, eliminating manual data pipelines that were expensive to operate and maintain. This meant instant visibility into sales transactions, product delivery, and financial performance — a breakthrough in a historically complex SAP environment.
Building materials company Cemex turned to Fivetran to replicate data from all 7 of its global SAP ECC and HANA instances. With Fivetran, Cemex shifted from hourly batch jobs to near-real-time updates every 2 minutes, dramatically accelerating decision making. Teams can now analyze data from any region on demand, powering critical use cases across the business.
Pitney Bowes, a leader in B2B technology, replaced its custom batch scripts and legacy ETL tools with Fivetran’s automated pipelines, purpose-built for SAP. Batch load times dropped by 95%, shrinking data processing from days to under an hour. With change data capture (CDC), the company now syncs SAP data multiple times a day instead of every 2-3 days — all without impacting source systems.
Across multiple industries, organizations are unlocking the ability to use SAP data outside of the SAP ecosystem and combine it with data from other sources, creating real-time insights without compromising security or performance.
A closer look at SAP’s offerings
SAP applications contain some of an enterprise’s most valuable and sensitive information, from financial records to customer and employee data. The value of this data only grows as it is combined with data from other sources, regardless of whether the data originates from the cloud, private networks, or on-premises. Yet for many SAP customers, this data is extremely difficult to extract and load into their data platforms of choice.
In the next article in this series, we’ll unpack the root of the problem: SAP’s tightly controlled ecosystem. From restrictive architectures to hidden technology costs, we’ll examine how these limitations can prevent customers from uncovering business-critical insights — and what they can do about it.