
Real-time benchmarking for database replication

Look under the hood at live performance benchmarking data for Fivetran data pipelines. With high throughput and low latency, Fivetran replicates large database volumes quickly and efficiently.

High-performance database replication

Historical sync: The initial load of a system's historical data into the destination.
Incremental sync: The replication of change data, after the historical load, using log-based or log-free change data capture.
Overview

We built this benchmark to showcase our performance loading from common OLTP relational databases such as Oracle, PostgreSQL, SQL Server, and MySQL. We use the TPROC-C workload to benchmark and highlight our ability to ingest data from heavily loaded relational databases. The results below show our benchmarks for loading Oracle data into Snowflake.

Data throughput — Historical sync

Consistent high volume throughput across historical syncs

Fivetran efficiently handles the historical sync of large data volumes with a throughput greater than 500 GB/hour. For the best performance and to ensure a database can release additional transactional data while an import is running, Fivetran breaks large tables into consumable chunks, reducing the duration of our transaction during a historical sync. The data below shows Fivetran consistently replicates the data at high throughput levels, saving our users time and ensuring their data is readily available for downstream workflows. 
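To illustrate the chunking idea described above, here is a minimal sketch of splitting a large table into key ranges that can each be read in a short transaction. The function name and chunking-by-primary-key approach are illustrative assumptions, not Fivetran's actual implementation.

```python
def chunk_ranges(min_id, max_id, chunk_size):
    """Split a primary-key range into consumable chunks.

    Each chunk can be read in its own short transaction, so the
    source database can keep committing and releasing transactional
    data while the historical import is running.
    """
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        yield (start, end)
        start = end + 1

# A table with ids 1..10, read in chunks of 4 rows:
print(list(chunk_ranges(1, 10, 4)))  # [(1, 4), (5, 8), (9, 10)]
```

Smaller chunks mean shorter-lived read transactions on the source, at the cost of more round trips; the real chunk size would be tuned to the table and database.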

To measure the impact of a historical data sync, Fivetran runs a performance benchmark roughly once per week that replicates data from Oracle to Snowflake. The throughput values are calculated with the following formula:

Throughput (GB/hr) = Data Volume (GB) / Historical Sync Time (hr)
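The formula above is a straightforward ratio; as a sketch (the example numbers are hypothetical, not benchmark results):

```python
def throughput_gb_per_hr(data_volume_gb, sync_time_hr):
    """Throughput (GB/hr) = Data Volume (GB) / Historical Sync Time (hr)."""
    return data_volume_gb / sync_time_hr

# Example: a hypothetical 1,000 GB historical sync finishing in 2 hours
print(throughput_gb_per_hr(1000, 2))  # 500.0 GB/hr
```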

Sync cycles — Incremental sync

The freshest data, always available

To better understand our syncs, Fivetran captures the total time for each sync during a period of high load on the Oracle database. With 16,000+ transactions written to the database per second, Fivetran incrementally replicates each change to Snowflake for better performance. When one incremental sync finishes, the next incremental sync begins to ensure the data pipeline doesn’t fall behind and users always have access to the freshest data.

Fivetran supports 1 minute sync frequencies. When a sync takes longer than the set sync frequency, the next sync automatically kicks off, ensuring that the data in the destination is always up-to-date.
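The scheduling behavior described above can be sketched as a simple rule: the next sync starts either at the next scheduled slot or, if the current sync overruns its frequency, immediately when it finishes. This is an illustrative model, not Fivetran's scheduler code.

```python
def next_sync_start(sync_started_at_min, sync_duration_min, frequency_min):
    """Return when the next sync begins (minutes on a shared clock).

    If the sync finishes within the configured frequency, the next
    sync waits for the next scheduled slot; if it overruns, the next
    sync kicks off immediately so the pipeline never falls behind.
    """
    return sync_started_at_min + max(sync_duration_min, frequency_min)

# 1-minute frequency, sync takes 3 minutes -> next sync begins at t+3
print(next_sync_start(0, 3, 1))    # 3
# Sync takes 30 seconds -> next sync begins at the 1-minute mark
print(next_sync_start(0, 0.5, 1))  # 1
```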

Latency and throughput — Incremental sync

High-performance pipelines for high-volume workloads

Even with some of the largest volumes of change data that customers replicate (greater than 16,000 transactions per second), Fivetran keeps up with incremental syncs in near real time. Fivetran uses change data capture replication to incrementally update large volumes of data. The benchmark measures the latency and throughput of incremental syncs to highlight how performant Fivetran is even under intense load: 250+ GB/hr throughput and 15 minutes or less of latency. Many enterprises require a 30-minute to 1-hour replication SLA, and Fivetran successfully meets these needs when replicating transactional data into data warehouses or data lakes.

This workload, generated with HammerDB, represents the largest real-world databases, and the benchmarking data volumes fluctuate much as a real-world database's would. The steps we took are documented in our GitHub repository.

Over the course of 2 hours, 16,000+ new records are created per second on the Oracle database for Fivetran to replicate to Snowflake. Given the high volume of change data, Fivetran measures the latency and throughput to ensure all changed data is written to the destination in a timely manner. The graph above shows latency percentiles (P50, P80, and P95) for full transparency, ensuring we quickly and efficiently replicate all changes regardless of data volume.
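For readers who want to reproduce this kind of reporting, percentile latencies can be computed from per-sync latency samples with the standard library. The sample data below is made up for illustration; only the percentile math reflects what the graph reports.

```python
import statistics

def latency_percentiles(latencies_min):
    """Compute P50, P80, and P95 from per-sync latency samples (minutes)."""
    # quantiles(n=100) returns the 99 cut points between percentiles;
    # index k-1 is the k-th percentile.
    qs = statistics.quantiles(latencies_min, n=100, method="inclusive")
    return {"P50": qs[49], "P80": qs[79], "P95": qs[94]}

# Hypothetical samples: one latency reading per incremental sync
samples = [4.2, 5.1, 3.8, 6.0, 4.9, 12.4, 5.5, 4.4, 7.1, 5.0]
print(latency_percentiles(samples))
```

P95 is the more honest headline number for a pipeline SLA than the mean, since it bounds what 95% of syncs experienced.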

Looking at Fivetran by the numbers

500+ GB/hr

Historical sync throughput speeds

9.1+ Petabytes

Amount of data synced per month

22.2M+

Schema changes handled per month

37.7M+

Transformation models run per month

156.5M+

Pipeline syncs per month

10.1T+

Rows synced per month

FAQs

How do I repeat this process using my own systems? 
Fivetran offers the ability to self-serve benchmark your own systems. In our Github repository, you can learn more about the steps that we took for our benchmarking processes and repeat them yourself on your own systems.
What's the value of this process?
This process showcases how performant Fivetran data pipelines are for analytical use cases when replicating data downstream into your data warehouse or data lake.
What determined the workload volumes used in this benchmarking process?
Fivetran wanted to better understand the performance of data replication for our largest customers and their largest workloads so we chose data volumes representative of those use cases. These are in the 99th percentile of database workloads that Fivetran customers have and the benchmark highlights how successful Fivetran is at handling these workloads.
What are the effects on these numbers for smaller workloads?
The numbers displayed here are for a 1 TB database experiencing over 16,000 transactions per second during the benchmark period. For smaller workloads, we would generally expect the throughput to remain high and the latency to decrease.
What efforts is Fivetran making to improve these numbers?
Fivetran is constantly iterating and improving our product to increase throughput rates and lower latency for workloads of all sizes. Join an upcoming product showcase to learn more about new features and performance improvements related to high-volume database replication.

Curious about Fivetran's performance benchmark process?

Dive into Fivetran’s performance benchmarking work with our blog series where we discuss all things throughput and latency.
blog

Benchmarked: A data pipeline latency analysis

blog

Benchmarked: A data pipeline throughput analysis

4.7 Rating

Trusted by data-driven enterprises

Lufthansa