Replicating sharded databases: A case study of SalesLoft, Citus Data and Fivetran

When SalesLoft faced a daunting challenge — combining sharded tables in AWS Redshift — it turned to Fivetran.

SalesLoft is among the world's leading sales engagement platforms. One of its key mottos is: “We Believe in Sales.” With that credo in mind, SalesLoft says it “created the Modern Sales Engagement Platform.” The data infrastructure that supports this is a Citus Data sharded Postgres cluster, says Mike Sandt, a SalesLoft engineer.

“Citus Data provides SalesLoft with a hosted, multi-tenant, Postgres environment without the overhead of implementing the technology ourselves. We deal with large data volumes at SalesLoft,” Sandt says. “In order to help scale our technology as we drive towards the enterprise, we identified a need to tenant our systems. Citus made this process easier on us by providing the appropriate libraries and operational infrastructure.”

SalesLoft’s challenge was how to combine the sharded tables in a columnar data warehouse, AWS Redshift, in order to perform advanced analytics. The Fivetran off-the-shelf solution was to replicate every database in the cluster individually by using a standard Fivetran Postgres connector. As a result, SalesLoft didn’t have to spend time or resources building or maintaining custom infrastructure to replicate sharded data.

“Fivetran solved the problem of getting the data into our warehouse without having to run the gambit of rolling custom infrastructure for such tasks,” Sandt says. “Fivetran effectively centralized the data for us so that we can provide insight into customer usage.”

The general strategy for replicating sharded databases is simple: Replicate each individual database into its own schema, then use a VIEW to re-combine the tables at query time. This side-steps combining the sharded data in the ETL pipeline, avoiding tricky logic that is difficult to debug and maintain. The data analysts working with the data queries the VIEW as a single logical table even though it’s physically split on disk.

This strategy leverages the fact that modern data warehouses natively handle computations across tables sharded over multiple nodes — it’s how data is stored under the hood. In practice, querying VIEWs that are UNION SELECT’s of the underlying sharded tables is highly performant. The data warehouse query planner is smart enough to avoid materializing the intermediary VIEW.

The final piece to this strategy is automating this whole process.

Fivetran handles the difficult parts by automatically detecting and syncing new schemas and tables in the source databases. A simple Python script was all that was necessary to automate keeping the VIEWs up-to-date as new schemas and tables are created in the Citus Data cluster. This can be scheduled by cron or run on AWS Lambda. Here’s a link to the VIEW creation script

“Fivetran makes it stupid simple to get data from disparate source systems into a centralized warehouse,” Sandt Says. “Fivetran has a very complete vision of the space they operate in, and the required competency with underlying database systems to get us up and running quickly."

The Fivetran Postgres connector is just one of the dozens of connectors Fivetran offers. These connectors are the data from your business applications, and databases, that Fivetran syncs into a data warehouse for you.

About Fivetran: Our mission is to democratize data, to make companies data driven, and to give analysts easy access to disparate data sources to perform advanced analytics.

With as little as a 5-minute setup, Fivetran replicates all your applications, databases, events and file storage into a high-performance data warehouse. Our cloud data pipelines are zero-configuration, zero-maintenance and fully managed by Fivetran.

Using Fivetran, businesses big and small gain complete control and ownership of their data. It’s easy to join data sources, perform agile analytics, and ultimately discover valuable insights using SQL or the business intelligence (BI) tools of choice.

The Fivetran sales team is available to present a demo, and provide a free trial. Sign up here.

Start for free

Join the thousands of companies using Fivetran to centralize and transform their data.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Replicating sharded databases: A case study of SalesLoft, Citus Data and Fivetran

Company size
500-1999
Region
North America
Industry
B2B technology
Key results
  • Saved time and resources not having to build or maintain custom infrastructure
  • Moved data from disparate source systems into a centralized warehouse

SalesLoft is among the world's leading sales engagement platforms. One of its key mottos is: “We Believe in Sales.” With that credo in mind, SalesLoft says it “created the Modern Sales Engagement Platform.” The data infrastructure that supports this is a Citus Data sharded Postgres cluster, says Mike Sandt, a SalesLoft engineer.

“Citus Data provides SalesLoft with a hosted, multi-tenant, Postgres environment without the overhead of implementing the technology ourselves. We deal with large data volumes at SalesLoft,” Sandt says. “In order to help scale our technology as we drive towards the enterprise, we identified a need to tenant our systems. Citus made this process easier on us by providing the appropriate libraries and operational infrastructure.”

SalesLoft’s challenge was how to combine the sharded tables in a columnar data warehouse, AWS Redshift, in order to perform advanced analytics. The Fivetran off-the-shelf solution was to replicate every database in the cluster individually by using a standard Fivetran Postgres connector. As a result, SalesLoft didn’t have to spend time or resources building or maintaining custom infrastructure to replicate sharded data.

“Fivetran solved the problem of getting the data into our warehouse without having to run the gambit of rolling custom infrastructure for such tasks,” Sandt says. “Fivetran effectively centralized the data for us so that we can provide insight into customer usage.”

The general strategy for replicating sharded databases is simple: Replicate each individual database into its own schema, then use a VIEW to re-combine the tables at query time. This side-steps combining the sharded data in the ETL pipeline, avoiding tricky logic that is difficult to debug and maintain. The data analysts working with the data queries the VIEW as a single logical table even though it’s physically split on disk.

This strategy leverages the fact that modern data warehouses natively handle computations across tables sharded over multiple nodes — it’s how data is stored under the hood. In practice, querying VIEWs that are UNION SELECT’s of the underlying sharded tables is highly performant. The data warehouse query planner is smart enough to avoid materializing the intermediary VIEW.

The final piece to this strategy is automating this whole process.

Fivetran handles the difficult parts by automatically detecting and syncing new schemas and tables in the source databases. A simple Python script was all that was necessary to automate keeping the VIEWs up-to-date as new schemas and tables are created in the Citus Data cluster. This can be scheduled by cron or run on AWS Lambda. Here’s a link to the VIEW creation script

“Fivetran makes it stupid simple to get data from disparate source systems into a centralized warehouse,” Sandt Says. “Fivetran has a very complete vision of the space they operate in, and the required competency with underlying database systems to get us up and running quickly."

The Fivetran Postgres connector is just one of the dozens of connectors Fivetran offers. These connectors are the data from your business applications, and databases, that Fivetran syncs into a data warehouse for you.

About Fivetran: Our mission is to democratize data, to make companies data driven, and to give analysts easy access to disparate data sources to perform advanced analytics.

With as little as a 5-minute setup, Fivetran replicates all your applications, databases, events and file storage into a high-performance data warehouse. Our cloud data pipelines are zero-configuration, zero-maintenance and fully managed by Fivetran.

Using Fivetran, businesses big and small gain complete control and ownership of their data. It’s easy to join data sources, perform agile analytics, and ultimately discover valuable insights using SQL or the business intelligence (BI) tools of choice.

The Fivetran sales team is available to present a demo, and provide a free trial. Sign up here.

[CTA_MODULE]

The total economic impact of Fivetran

Learn how automated data movement boosts productivity and accelerates insights for your business.

Download the report
Are you a true data pioneer?

Our quiz will tell you where you are on your data journey and what tools you can unlock to level up.

Take the quiz
Why they chose Fivetran

Further reading
No items found.
No items found.
Related customer stories
Case study

Trinny London achieves up to £260K in savings with Fivetran

Case study

National Australia Bank enhances customer experiences and powers GenAI

Case study

Envision Pharma Group and Fivetran: Pioneering faster paths in pharma

Case study

Raiffeisen Bank International uses real-time data to win more customers

Case study

Saks achieves data efficiency and enables AI with Fivetran

Case study

Cemex connects 1,800+ global facilities in real-time

Case study

Gill Capital unlocks SAP data with Fivetran, boosts sales by 25%

Case study

HubSpot powers GenAI, saves $100,000 with Fivetran

Case study

Deliveroo turns food delivery into a data-driven enterprise

Case study

Paylocity transforms its go-to-market strategy with Fivetran

Let's get you moving data