Replicating sharded databases: A case study of SalesLoft, Citus Data and Fivetran

Company size
500-1999
Region
North America
Industry
B2B technology
Key results
  • Saved time and resources not having to build or maintain custom infrastructure
  • Moved data from disparate source systems into a centralized warehouse

SalesLoft is among the world's leading sales engagement platforms. One of its key mottos is: “We Believe in Sales.” With that credo in mind, SalesLoft says it “created the Modern Sales Engagement Platform.” The data infrastructure that supports this is a Citus Data sharded Postgres cluster, says Mike Sandt, a SalesLoft engineer.

“Citus Data provides SalesLoft with a hosted, multi-tenant, Postgres environment without the overhead of implementing the technology ourselves. We deal with large data volumes at SalesLoft,” Sandt says. “In order to help scale our technology as we drive towards the enterprise, we identified a need to tenant our systems. Citus made this process easier on us by providing the appropriate libraries and operational infrastructure.”

SalesLoft’s challenge was how to combine the sharded tables in a columnar data warehouse, AWS Redshift, in order to perform advanced analytics. The Fivetran off-the-shelf solution was to replicate every database in the cluster individually by using a standard Fivetran Postgres connector. As a result, SalesLoft didn’t have to spend time or resources building or maintaining custom infrastructure to replicate sharded data.

“Fivetran solved the problem of getting the data into our warehouse without having to run the gambit of rolling custom infrastructure for such tasks,” Sandt says. “Fivetran effectively centralized the data for us so that we can provide insight into customer usage.”

The general strategy for replicating sharded databases is simple: Replicate each individual database into its own schema, then use a VIEW to recombine the tables at query time. This sidesteps combining the sharded data in the ETL pipeline, avoiding tricky logic that is difficult to debug and maintain. Data analysts query the VIEW as a single logical table even though it’s physically split on disk.

This strategy leverages the fact that modern data warehouses natively handle computations across tables sharded over multiple nodes; that is how they store data under the hood. In practice, querying VIEWs that UNION together SELECTs over the underlying sharded tables performs well. The data warehouse query planner is smart enough to avoid materializing the intermediary VIEW.
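To make the strategy concrete, here is a minimal Python sketch that builds the SQL for one such VIEW. It is an illustration, not SalesLoft's or Fivetran's actual code: the function name, the schema names (shard_1, shard_2, analytics), the column list and the added source_schema column are all hypothetical. The sketch uses UNION ALL rather than a plain UNION on the assumption that each row lives in exactly one shard, so no de-duplication is needed.

```python
def build_union_view(view_schema, table, shard_schemas, columns):
    """Return CREATE VIEW SQL that recombines one sharded table.

    Assumptions (not from the case study): every shard schema contains
    a table with the same name and columns, and the identifiers passed
    in are trusted, since they are interpolated without quoting.
    """
    if not shard_schemas:
        raise ValueError("need at least one shard schema")
    col_list = ", ".join(columns)
    # One SELECT per shard schema; source_schema records where a row came from.
    selects = [
        f"SELECT '{schema}' AS source_schema, {col_list} FROM {schema}.{table}"
        for schema in shard_schemas
    ]
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE OR REPLACE VIEW {view_schema}.{table} AS\n{body};"


if __name__ == "__main__":
    # Hypothetical example: an "accounts" table split across two shards.
    print(build_union_view("analytics", "accounts",
                           ["shard_1", "shard_2"], ["id", "name"]))
```

Analysts would then query analytics.accounts like any other table, while the warehouse reads from each shard schema behind the scenes.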

The final piece of this strategy is automating the whole process.

Fivetran handles the difficult parts by automatically detecting and syncing new schemas and tables in the source databases. A simple Python script was all that was needed to keep the VIEWs up to date as new schemas and tables are created in the Citus Data cluster. The script can be scheduled with cron or run on AWS Lambda. Here’s a link to the VIEW creation script.
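The linked script is not reproduced here. As a rough sketch of what the automation step could look like, the function below takes (schema, table) rows, such as the result of running SELECT table_schema, table_name FROM information_schema.tables in the warehouse, and regenerates one VIEW statement per sharded table. The shard_ prefix, the analytics schema and the function name are assumptions made for illustration. Using SELECT * presumes every shard's copy of a table has the same column layout.

```python
from collections import defaultdict


def plan_views(catalog_rows, shard_prefix="shard_", view_schema="analytics"):
    """Group (schema, table) rows by table and emit one VIEW per table.

    Assumptions (hypothetical, not from the case study): replicated
    shard schemas share a common name prefix, and identifiers are
    trusted, since they are interpolated without quoting.
    """
    shards_by_table = defaultdict(list)
    for schema, table in catalog_rows:
        # Skip schemas that are not replicated shards (e.g. "public").
        if schema.startswith(shard_prefix):
            shards_by_table[table].append(schema)

    statements = []
    for table in sorted(shards_by_table):
        selects = [
            f"SELECT * FROM {schema}.{table}"
            for schema in sorted(shards_by_table[table])
        ]
        statements.append(
            f"CREATE OR REPLACE VIEW {view_schema}.{table} AS\n"
            + "\nUNION ALL\n".join(selects)
            + ";"
        )
    return statements
```

A scheduled job would run the catalog query, pass the rows to plan_views, and execute each returned statement against the warehouse, so a newly synced shard schema is picked up on the next run.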

“Fivetran makes it stupid simple to get data from disparate source systems into a centralized warehouse,” Sandt says. “Fivetran has a very complete vision of the space they operate in, and the required competency with underlying database systems to get us up and running quickly.”

The Fivetran Postgres connector is just one of the dozens of connectors Fivetran offers. These connectors sync the data from your business applications and databases into a data warehouse for you.

About Fivetran: Our mission is to democratize data, to make companies data-driven, and to give analysts easy access to disparate data sources to perform advanced analytics.

With as little as a 5-minute setup, Fivetran replicates all your applications, databases, events and file storage into a high-performance data warehouse. Our cloud data pipelines are zero-configuration, zero-maintenance and fully managed by Fivetran.

Using Fivetran, businesses big and small gain complete control and ownership of their data. It’s easy to join data sources, perform agile analytics, and ultimately discover valuable insights using SQL or the business intelligence (BI) tools of their choice.

The Fivetran sales team is available to present a demo and provide a free trial. Sign up here.
