Database replication is the process of creating copies of a database and storing them in multiple locations. This improves data availability and accessibility, because every user connected to the system can access copies of the same up-to-date data.

Database replication is an ongoing process. When a user changes data in the source database, those changes are synced to the replicas. This ensures users are always working with the latest and most accurate data.

Database replication serves many purposes. On the operational side, copying data accurately and efficiently from one environment to another improves disaster recovery, lowers data latency and reduces server load. On the analytical side, database replication allows data to be centralized and analyzed on a platform purpose-built for analytics.

Transactional databases are designed to sustain an organization’s operations, not to support columnar aggregations and analytical queries. Moreover, analytical queries on live production databases compete with transactional queries for resources, jeopardizing the critical operations the database supports. Analytical databases like data warehouses and governed data lakes are designed specifically to accommodate calculations across large numbers of records.

Many customers around the world, including Autodesk, Pitney Bowes, Care.com, JetBlue, Salesloft and Nando’s, count on Fivetran to replicate data from production databases. Internally, we also use our product to deliver production data – consisting largely of account and usage information – into a cloud data warehouse for product analytics.

Our production setup and sync metrics are as follows:

Database information:

Database type: Cloud-based Postgres
CPUs: 64
Memory: 256 GB

Sync information:

Transactions per second: ~26k
Changelog volume: ~25 GB per sync
Primary / Replica: Primary database
Frequency: Sync every 15 minutes
Sync duration: ~10.5 minutes
Type of replication: Logical replication (WAL)
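The sync figures above imply a few derived numbers that help put the workload in perspective. The short Python calculation below works them out. It is a back-of-the-envelope sketch that assumes decimal units (1 GB = 1,000 MB), which the figures above do not specify, and treats the approximate ("~") values as exact.

```python
# Derived figures from the sync metrics above.
# Assumptions: decimal units (1 GB = 1,000 MB) and the "~" values taken as exact.
SYNC_INTERVAL_MIN = 15       # Frequency: sync every 15 minutes
SYNC_DURATION_MIN = 10.5     # Sync duration: ~10.5 minutes
CHANGELOG_GB_PER_SYNC = 25   # Changelog volume: ~25 GB per sync
TPS = 26_000                 # Transactions per second: ~26k

syncs_per_day = (24 * 60) // SYNC_INTERVAL_MIN                # number of syncs in a day
changelog_gb_per_day = syncs_per_day * CHANGELOG_GB_PER_SYNC  # total changelog read per day
transactions_per_interval = TPS * SYNC_INTERVAL_MIN * 60      # transactions between two syncs
busy_fraction = SYNC_DURATION_MIN / SYNC_INTERVAL_MIN         # share of each interval spent syncing
throughput_mb_s = CHANGELOG_GB_PER_SYNC * 1000 / (SYNC_DURATION_MIN * 60)

print(f"Syncs per day:             {syncs_per_day}")
print(f"Changelog per day:         {changelog_gb_per_day:,} GB")
print(f"Transactions per interval: {transactions_per_interval:,}")
print(f"Interval spent syncing:    {busy_fraction:.0%}")
print(f"Effective throughput:      {throughput_mb_s:.1f} MB/s")
```

Under these assumptions, that works out to 96 syncs and about 2,400 GB of changelog per day, with each sync reading roughly 40 MB/s and finishing with about 4.5 minutes of headroom before the next one is due.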

We accomplish this using a standard, off-the-shelf Fivetran cloud SaaS connector. Last month, this database accounted for 2.2B monthly active rows (MAR). Fivetran's internal account has a total of 6.5B MAR, so on Fivetran's tiered pricing curve for the Enterprise plan, the incremental cost of this connector would be $6,000 per month.

The Fivetran approach to database replication

The key to effective database replication is to incrementally identify changes at the source and reproduce the corresponding updates in the destination. This practice is known as change data capture (CDC) and has several benefits:

Real-time operations – By keeping each batch of data small, CDC can move data in near real time, which in turn makes the following benefits possible.
Reduced impact on system resources – Transferring data in small increments avoids the heavy load that large bulk loads place on system resources.
Faster database migrations with minimal downtime – Because changes are replicated continuously, a migration can run alongside the live source database, so the final cutover requires minimal downtime.
Synchronization between multiple systems – Relatedly, CDC enables multiple systems to stay synchronized regardless of where they are located, something especially valuable for time-sensitive applications.
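To make "reproduce the corresponding updates in the destination" concrete, here is a minimal Python sketch of the destination side of CDC. The ChangeEvent shape and the apply_changes function are hypothetical, chosen for illustration; they are not Fivetran's internal format. The destination is modeled as an in-memory dict keyed by primary key, and only the rows named in the change events are touched.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Hypothetical change-event shape for illustration; real CDC tools define their own.
@dataclass
class ChangeEvent:
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row image (None for deletes)


def apply_changes(dest: Dict[int, Dict[str, Any]], events: List[ChangeEvent]) -> None:
    """Replay source changes, in commit order, onto a destination keyed by primary key."""
    for ev in events:
        if ev.op in ("insert", "update"):
            dest[ev.key] = ev.row   # upsert: the latest row image wins
        elif ev.op == "delete":
            dest.pop(ev.key, None)  # tolerate rows that are already gone
        else:
            raise ValueError(f"unknown op: {ev.op}")


destination = {1: {"plan": "free"}, 2: {"plan": "pro"}}
batch = [
    ChangeEvent("update", 1, {"plan": "pro"}),
    ChangeEvent("delete", 2),
    ChangeEvent("insert", 3, {"plan": "free"}),
]
apply_changes(destination, batch)
print(destination)  # {1: {'plan': 'pro'}, 3: {'plan': 'free'}}
```

Because upserts overwrite and deletes tolerate missing rows, applying the same batch twice leaves the destination in the same state as applying it once, which is a useful property when a sync has to be retried.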

There are several methods of CDC: log-based, trigger-based, timestamp-based and difference-based. Log-based CDC is preferable whenever it is available, because it identifies and replicates all changes as they happen and can keep up with busy, mission-critical applications. The Fivetran Postgres connector uses a form of log-based CDC, reading changes from the Postgres write-ahead log (WAL).
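At its core, log-based CDC relies on a reader that remembers how far into the log it has consumed and, on each sync, reads only what was written after that point. The toy Python model below illustrates the idea; the LogReader class and the (position, change) tuples are hypothetical simplifications, not the connector's actual implementation. In Postgres, log positions are log sequence numbers (LSNs), and a logical replication slot retains WAL until the consumer confirms it has processed it.

```python
from typing import List, Tuple

# Hypothetical, simplified log entry: (position, description of the change).
# In Postgres the position would be an LSN in the write-ahead log.
LogEntry = Tuple[int, str]


class LogReader:
    """Toy log-based CDC reader that resumes from its last recorded position."""

    def __init__(self) -> None:
        # Last position handed to the destination. A real system would persist
        # this and advance it only after the destination confirms the write.
        self.confirmed_lsn = 0

    def read_batch(self, wal: List[LogEntry]) -> List[str]:
        # The log is append-only and ordered by position, so every entry past
        # the recorded position is new since the previous sync.
        new_entries = [(lsn, change) for lsn, change in wal if lsn > self.confirmed_lsn]
        if new_entries:
            # Advance the cursor so the next sync neither re-reads nor skips entries.
            self.confirmed_lsn = new_entries[-1][0]
        return [change for _, change in new_entries]


wal: List[LogEntry] = [(1, "INSERT id=1"), (2, "UPDATE id=1"), (3, "DELETE id=1")]
reader = LogReader()
print(reader.read_batch(wal))  # ['INSERT id=1', 'UPDATE id=1', 'DELETE id=1']
wal.append((4, "INSERT id=2"))
print(reader.read_batch(wal))  # ['INSERT id=2']
print(reader.read_batch(wal))  # []
```

Note that the DELETE for id=1 is captured even though the row no longer exists in the table. A timestamp-based approach, which queries the table for rows modified since the last sync, cannot see such hard deletes, which is one reason log-based CDC is preferred.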

From the user's perspective, setting up the Fivetran database connector takes no more than navigating a series of menus, supplying credentials for the source and destination, and setting a sync schedule. It is a set-and-forget system that should remain out of sight and out of mind once it is running. Fivetran connectors come with a 99.9% uptime guarantee and regularly exceed it.
