What is Fivetran?

And how does it work? Fivetran explained in about 1000 words or 5 minutes.
April 10, 2023

Fivetran is a modern, cloud-based automated data movement platform that lets organizations extract, load and transform data between a wide range of sources and destinations with minimal effort.

This covers both the traditional data integration use case, in which data moves from applications, databases and files to a central repository to create a “single source of truth” for analytics, and the more general task of moving data between databases, data warehouses and data lakes to support business operations.

Moving data from source to destination requires a high-performance system that can be deceptively complex to engineer. Considerations include properly scoping and scaling an environment, ensuring availability, recovering from failures and rebuilding the system in response to changing data sources and business needs. Many common data integration tools provide frameworks for solving these tasks but still demand a considerable degree of configuration and engineering work from end users.

Moreover, it is not uncommon for organizations to use dozens to hundreds of different applications, tools and operational systems, each of which produces data containing valuable digital clues.

These challenges impose significant costs in time, labor and money on organizations that attempt to move data using any kind of bespoke, high-configuration solution. Building an efficient, reliable and scalable data operations infrastructure from scratch, or even with the assistance of a framework, is an exercise in frustration and lost opportunities.

By contrast, an automated data movement solution that works off the shelf obviates the need for an organization to build such a solution in-house.

Automation, reliability and scalability

From the perspective of the end user, the ideal data movement workflow should consist of little more than:

  1. Selecting connectors for data sources from a menu
  2. Supplying credentials
  3. Specifying a schedule
  4. Pressing a button to begin execution
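As a rough illustration of how little the end user has to supply, the four inputs above can be modeled as a single configuration record. This is a minimal sketch, not the Fivetran API: `ConnectorConfig`, `press_start` and `upcoming_syncs` are hypothetical names, the credentials are placeholders, and the schedule is assumed to be a simple fixed interval.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConnectorConfig:
    """Hypothetical record of the four inputs an end user supplies."""
    source: str                   # 1. connector selected from a menu
    credentials: dict[str, str]   # 2. credentials for that source
    sync_every: timedelta         # 3. schedule, assumed here to be a fixed interval
    running: bool = False         # 4. becomes True when the user presses "start"


def press_start(config: ConnectorConfig) -> ConnectorConfig:
    """Return a copy of the configuration with syncing turned on."""
    if not config.credentials:
        raise ValueError("supply credentials before starting the connector")
    return replace(config, running=True)


def upcoming_syncs(config: ConnectorConfig, first: datetime, count: int) -> list[datetime]:
    """Return the next `count` sync times implied by the schedule (none until started)."""
    if not config.running:
        return []
    return [first + i * config.sync_every for i in range(count)]


# Usage with placeholder credentials: three syncs, six hours apart.
config = press_start(ConnectorConfig("postgres", {"user": "analytics", "password": "example"},
                                     timedelta(hours=6)))
for when in upcoming_syncs(config, datetime(2023, 4, 10), 3):
    print(when)
```

Everything else (where the data lands, how it is extracted, when workers run) is left to the platform, which is the point of the workflow.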

The simplicity of this workflow belies considerable complexity under the hood. The Fivetran architecture is strictly divided among the user’s local environment, the Fivetran cloud and the customer cloud. This division is essential for both security and performance. For security, the strict separation between the front end, the back end and the customer cloud ensures that sensitive data cannot be exposed through the front end. For performance, Fivetran, as a cloud-native tool, makes extensive use of on-demand parallelization.

The following architectural diagram lays out the standard cloud-based Fivetran approach to automated data movement:

Note that for organizations with security requirements that limit the ability to use cloud-based SaaS solutions, Fivetran also offers hybrid and on-premises architectures.

A typical workflow follows these steps:

  1. The user accesses the Fivetran front end through the Fivetran.com dashboard or API.
  2. The user creates and configures connectors.
  3. The user’s choices are recorded in the Fivetran production database.
  4. Based on the settings saved in the production database, the Fivetran back end spawns workers on the configured schedule.
  5. Each worker extracts and loads data, with some light processing. Workers expire when they are no longer needed.
  6. Transformations that produce analyst-ready data models are triggered separately and run in the destination. Fivetran data models use our integration with dbt™ to power these transformations.
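Steps 3 through 5 can be sketched in a few lines, assuming an in-memory SQLite table stands in for the production database and a thread pool stands in for on-demand workers. All names here (`record_settings`, `due_connectors`, `worker`, `run_tick`) are hypothetical, the "due when the minute is a multiple of the interval" rule is a simplifying assumption, and step 6 is omitted because transformations run separately in the destination. This is an illustration of the pattern, not a description of how Fivetran's back end is built.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def record_settings(db: sqlite3.Connection, connectors: list[tuple[str, int]]) -> None:
    """Step 3: persist each connector's name and sync interval in minutes."""
    db.execute("CREATE TABLE IF NOT EXISTS connectors "
               "(name TEXT PRIMARY KEY, interval_min INTEGER NOT NULL)")
    db.executemany("INSERT OR REPLACE INTO connectors VALUES (?, ?)", connectors)
    db.commit()


def due_connectors(db: sqlite3.Connection, now_min: int) -> list[str]:
    """Step 4 (simplified): a connector is due when now_min is a multiple of its interval."""
    rows = db.execute("SELECT name, interval_min FROM connectors ORDER BY name").fetchall()
    return [name for name, interval in rows if now_min % interval == 0]


def worker(name: str) -> str:
    """Step 5: placeholder extract, light processing and load for one connector."""
    extracted = [{"id": 1, "email": "  Ada@Example.com "}]        # pretend source rows
    processed = [{**row, "email": row["email"].strip().lower()}   # light processing
                 for row in extracted]
    return f"{name}: loaded {len(processed)} record(s)"           # pretend load


def run_tick(db: sqlite3.Connection, now_min: int) -> list[str]:
    """Spawn one worker per due connector in parallel; the pool shuts down when the block exits."""
    due = due_connectors(db, now_min)
    if not due:
        return []
    with ThreadPoolExecutor(max_workers=len(due)) as pool:
        return list(pool.map(worker, due))


db = sqlite3.connect(":memory:")
record_settings(db, [("salesforce", 60), ("postgres", 15)])
print(run_tick(db, now_min=30))   # only postgres is due
print(run_tick(db, now_min=60))   # both connectors are due
```

Because each tick creates a fresh pool sized to the work at hand, workers exist only while there is something to sync, loosely mirroring the short-lived workers in step 5.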

To ensure that the workflow described above runs smoothly and reliably, Fivetran is also designed around several considerations that an architectural diagram can’t easily capture.

  • Incremental updates keep data fresh while minimizing disruption to source systems. Instead of extracting and loading the entire data source on every sync, Fivetran detects new or modified records and reproduces only those changes at the destination. Full syncs are used only for the initial sync or to fix serious data integrity issues, such as corrupted records. The main mechanism by which Fivetran achieves incremental updates is change data capture (CDC).

  • Idempotence allows a data connector to recover easily from failed syncs. In the context of data movement, idempotence means that applying the same data to a destination multiple times produces the same result. Without idempotence, after a failed sync an engineer must work out which records were and weren’t synced and design a custom recovery procedure to remove duplicate records. With idempotence, the data connector can simply replay any data that might not have reached the destination: if a record is already present, the replay has no effect; otherwise, the record is added.

  • Schema drift handling means accurately representing data even as sources change. It also involves detecting and coercing data types, balancing accurate replication and preservation of data against the reliable functioning of data connectors. Fivetran mainly addresses this problem with live updating, in which the destination is kept in step with changes at the source.

  • Pipeline and network performance depends on minimizing latency and eliminating bottlenecks. Fivetran accomplishes this through algorithmic optimization, parallelization, pipelining and buffering.
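Incremental updates, idempotence and schema drift handling can be illustrated together on a toy destination. This is a minimal sketch under stated assumptions: records carry a primary key `id` and an integer `updated_at` column used as a cursor, and `Destination` and `incremental_sync` are hypothetical names. It swaps in a simple cursor (high-water mark) comparison in place of change data capture (CDC), and it handles schema drift only by adding newly seen columns; it is not Fivetran's implementation.

```python
from typing import Any

Row = dict[str, Any]


class Destination:
    """Hypothetical toy destination table, keyed by the primary key "id"."""

    def __init__(self) -> None:
        self.columns: list[str] = ["id"]
        self.rows: dict[Any, Row] = {}

    def upsert(self, record: Row) -> None:
        # Schema drift (simplified): add any column the destination has not seen before.
        for column in record:
            if column not in self.columns:
                self.columns.append(column)
        # Idempotent write: keyed by primary key, so replaying the same record
        # leaves the table unchanged instead of creating a duplicate.
        self.rows[record["id"]] = {**self.rows.get(record["id"], {}), **record}


def incremental_sync(source: list[Row], dest: Destination, cursor: int) -> int:
    """Copy only records with updated_at newer than `cursor`; return the new cursor."""
    changed = sorted((r for r in source if r["updated_at"] > cursor),
                     key=lambda r: r["updated_at"])
    for record in changed:
        dest.upsert(record)
    return max((r["updated_at"] for r in changed), default=cursor)


# Usage: an initial sync, an incremental sync with a new column, then a replay.
source = [{"id": 1, "name": "Ada", "updated_at": 1},
          {"id": 2, "name": "Grace", "updated_at": 2}]
dest = Destination()
cursor = incremental_sync(source, dest, cursor=0)      # copies both rows; cursor becomes 2

source[0] = {"id": 1, "name": "Ada L.", "plan": "pro", "updated_at": 3}
cursor = incremental_sync(source, dest, cursor)        # copies only id 1 and adds "plan"

before = {key: dict(row) for key, row in dest.rows.items()}
incremental_sync(source, dest, cursor=2)               # replay as if the last sync had failed
assert dest.rows == before                             # same result: the write is idempotent
```

Keying writes on the primary key is what makes the replay safe: the connector can resend anything that might not have arrived without first working out which records already landed.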

Why Fivetran is a platform, not just a pipeline

Fivetran is more than a point solution that solves the single, discrete problem of centralizing data for analytics. Longer term, organizations must also consider democratizing access to data and finding ways to monetize data. Mindful of these needs, Fivetran provides security, governance and extensibility features.

Security features are must-haves for ensuring regulatory compliance, managing brand risk, protecting internal operations and intellectual property, and ethically safeguarding customer information and other business-critical data as it moves between systems. Common platform security features include flexible deployment and secure networking options, security compliance certifications for SaaS platforms, end-to-end data encryption and process isolation.

In a similar vein, data governance enables organizations to know, access and protect their data. Data governance features include easy integration with data catalogs, graphical views of data model lineage, metadata capture and other auditing tools.

Finally, extensibility features enable an organization to programmatically control a growing ecosystem of data management tools and embed data assets into products. As data needs grow in scale and complexity over time, organizations will need the ability to manage users at scale, integrate with other data operations technologies and construct custom processes and workflows that depend on data.
