Introducing the Fivetran async provider in Airflow

A new Fivetran Airflow provider developed by Astronomer allows data engineers to run Fivetran data syncs more efficiently in Airflow 2.2+
January 25, 2023

Fivetran announced the Fivetran Airflow provider just over a year ago, and today over 1,300 Fivetran connectors are orchestrated in Airflow every day alongside other components of the modern data stack. We have made numerous improvements to the provider over the past year, and today Fivetran is happy to announce its collaboration with Astronomer on the Fivetran async provider, making Fivetran the first data integration service to provide a deferrable operator for Airflow. In this post, we discuss the motivation for the new package and the Airflow features that make it possible, and then explain how to add it to your DAGs.

Hear more from Astronomer’s Airflow Engineering Advocate Benji Lampel at this year’s Modern Data Stack Conference on April 4-5, 2023. Register by March 1, 2023 with the discount code MDSCON-ORCHESTSRATE to get 25 percent off!

Airflow architecture before deferrable operators

There are many components that make up an Airflow deployment, including at minimum a metadata database, a webserver, a scheduler and one or more workers. The database records every task and its state, and the scheduler monitors the database to identify when a DAG has a task that is ready to run. The scheduler then passes that task to an executor, which runs it on a worker.

A close look at the role of workers in Airflow helps explain the need for deferrable operators. Most Airflow deployments have a finite number of workers (typically one to three; the number of workers is configured, and can be scaled out, through the executor), and each worker can run only a limited number of tasks at a time (with the CeleryExecutor, this limit is set by the worker_concurrency option). Once every slot is taken, the scheduler queues additional tasks until a worker has the capacity to accept a new one.

The components of an Airflow environment. Source: Airflow docs
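To make the slot arithmetic concrete, here is a simplified model of how ready tasks land on workers. It is a sketch under stated assumptions, not Airflow's scheduler: the `dispatch` function is hypothetical, and it models only the per-worker limit described above, while real Airflow scheduling applies further limits that are out of scope here.

```python
from collections import deque


def dispatch(ready_tasks, num_workers, worker_concurrency):
    """Hypothetical model: fill each worker up to worker_concurrency, queue the rest."""
    queue = deque(ready_tasks)
    running = {}
    for worker in range(num_workers):
        running[worker] = []
        # Each worker accepts tasks until all of its slots are taken.
        while queue and len(running[worker]) < worker_concurrency:
            running[worker].append(queue.popleft())
    # Whatever is left waits in the queue until a slot frees up.
    return running, list(queue)


# Three workers with two slots each: six slots for eight ready tasks.
running, queued = dispatch([f"task_{i}" for i in range(8)], num_workers=3, worker_concurrency=2)
print(sum(len(tasks) for tasks in running.values()), "running; queued:", queued)
```

With three workers and a worker_concurrency of 2, only six of the eight ready tasks run; the last two wait in the queue until a slot frees up.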

When using Fivetran in Airflow, a FivetranOperator starts a Fivetran connector sync, while a FivetranSensor monitors that connector’s status and succeeds once the sync is complete. By default, each FivetranSensor occupies a worker slot for as long as it waits on its connector, so one or more FivetranSensors can block Airflow from running other tasks or starting other DAGs. Luckily, a newer Airflow feature, deferrable operators, makes this process much more efficient.
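The blocking problem can be illustrated with plain Python threads. This is a hypothetical simulation, not Fivetran or Airflow code: a `ThreadPoolExecutor` with two threads stands in for two worker slots, `poking_sensor` stands in for a sensor that polls until a simulated sync finishes, and `downstream_task` is any unrelated task that only needs a free slot.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def poking_sensor(name, sync_done, poke_interval, log):
    """Stand-in for a classic sensor: it keeps its slot while it polls."""
    log.append(f"{name} started")
    while not sync_done.is_set():
        time.sleep(poke_interval)  # sleeping still occupies the worker slot
    log.append(f"{name} finished")


def downstream_task(name, log):
    """Any other task that only needs a free slot to run."""
    log.append(f"{name} ran")


def run_demo():
    log = []
    sync_done = threading.Event()
    # Two worker slots in total, both about to be taken by sensors.
    with ThreadPoolExecutor(max_workers=2) as slots:
        slots.submit(poking_sensor, "sensor_a", sync_done, 0.01, log)
        slots.submit(poking_sensor, "sensor_b", sync_done, 0.01, log)
        slots.submit(downstream_task, "unrelated_task", log)  # has to wait for a slot
        time.sleep(0.1)  # simulated sync duration
        log.append("fivetran sync completes")
        sync_done.set()  # only now can a sensor finish and release its slot
    return log


log = run_demo()
print(log)
```

The unrelated task runs only after the simulated sync completes and a sensor releases its slot, even though it never depended on the sync.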

New architecture and the async provider

Deferrable operators and sensors let Airflow wait on external work asynchronously. Airflow 2.2 added a new component to the architecture described above: the triggerer. Now, when an Airflow task is waiting for a condition to be met, it can defer to the triggerer instead of consuming a worker slot. The triggerer runs the waiting logic of all deferred operators and sensors together in a single Python process that monitors their status asynchronously. This is ideal for I/O-bound waits, such as tracking the data movement that Fivetran performs.
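Airflow's triggerer is built on Python's asyncio, and the core idea can be sketched in a few lines. This is a conceptual model only, assuming a simulated connector whose sync takes a fixed time; `wait_for_sync` and `triggerer` are hypothetical names, not Airflow or Fivetran APIs. Each wait awaits instead of sleeping in a thread, so a single event loop in a single thread can track many syncs at once.

```python
import asyncio
import time
from typing import Dict, List


async def wait_for_sync(connector_id: str, sync_seconds: float, poll_interval: float) -> str:
    """Hypothetical trigger: poll a simulated connector until its sync is done."""
    loop = asyncio.get_running_loop()
    done_at = loop.time() + sync_seconds
    while loop.time() < done_at:
        # Awaiting hands control back to the event loop, so other waits progress.
        await asyncio.sleep(poll_interval)
    return f"{connector_id}: success"


async def triggerer(syncs: Dict[str, float]) -> List[str]:
    """Run every pending wait concurrently on one event loop in one thread."""
    waits = (wait_for_sync(cid, secs, poll_interval=0.01) for cid, secs in syncs.items())
    return await asyncio.gather(*waits)


start = time.perf_counter()
results = asyncio.run(triggerer({f"connector_{i}": 0.2 for i in range(100)}))
elapsed = time.perf_counter() - start
print(len(results), "syncs finished in", round(elapsed, 2), "seconds")
```

Because the hundred waits overlap, they finish in roughly the time of one simulated sync rather than the sum of all of them, and none of them holds a worker slot.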

Once the condition a deferred task is waiting on is met, its trigger fires, and the scheduler resumes the task on a worker by calling a callback method; the task then completes and the DAG can proceed to downstream tasks. For Fivetran, this means a deferrable sensor can wait for a connector to finish syncing data to its destination without holding a worker slot that would otherwise force other tasks and DAGs into a queued state. We have built this deferrable sensor, FivetranSensorAsync, and it is used in DAGs in exactly the same way as the FivetranSensor. Alternatively, FivetranOperatorAsync both starts a Fivetran sync and asynchronously monitors it until it completes. Examples of how to use both are included in the provider and on the Astronomer Registry.

Airflow 2.2+ adds a triggerer, an additional component that lets deferred tasks wait asynchronously.
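The defer-and-resume cycle described above can be modeled without Airflow. In this sketch, `TaskDeferred` mirrors the name of the exception Airflow raises when a task defers (operators usually trigger it by calling `self.defer(...)`), and `execute_complete` follows a common naming convention for the callback. Every class and function here (`SyncCompleteTrigger`, `SketchSensorAsync`, `run_task`) is a hypothetical stand-in, not the actual FivetranSensorAsync implementation or Airflow's internals.

```python
import asyncio


class TaskDeferred(Exception):
    """Stand-in for Airflow's TaskDeferred: carries the trigger and the resume method."""

    def __init__(self, trigger, method_name):
        super().__init__(method_name)
        self.trigger = trigger
        self.method_name = method_name


class SyncCompleteTrigger:
    """Hypothetical trigger that 'polls' a simulated connector a few times."""

    def __init__(self, connector_id, polls_until_done=3):
        self.connector_id = connector_id
        self.polls_until_done = polls_until_done

    async def run(self):
        for _ in range(self.polls_until_done):
            await asyncio.sleep(0)  # a real trigger would await an API status call here
        yield {"connector_id": self.connector_id, "status": "success"}


class SketchSensorAsync:
    """Hypothetical deferrable sensor, split into a start phase and a callback."""

    def __init__(self, connector_id):
        self.connector_id = connector_id

    def execute(self, context):
        # On a worker: hand the wait to the triggerer so the slot can be released.
        raise TaskDeferred(SyncCompleteTrigger(self.connector_id), method_name="execute_complete")

    def execute_complete(self, context, event):
        # Back on a worker: the callback receives the event the trigger emitted.
        if event["status"] != "success":
            raise RuntimeError(f"Sync failed for {event['connector_id']}")
        return event["connector_id"]


async def first_event(trigger):
    """Triggerer side: wait for the trigger's first event."""
    async for event in trigger.run():
        return event


def run_task(task, context=None):
    """Toy scheduler loop: run, defer, then resume via the named callback."""
    try:
        return task.execute(context)
    except TaskDeferred as deferred:
        event = asyncio.run(first_event(deferred.trigger))
        callback = getattr(task, deferred.method_name)
        return callback(context, event=event)


print(run_task(SketchSensorAsync("my_connector_id")))
```

Splitting the sensor into `execute` and a callback is what lets the worker slot be released in between: nothing has to keep running on a worker while the trigger waits.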

Try out the Fivetran async provider today

The code for the new provider is available as open-source software from Astronomer, which also publishes a number of other providers containing deferrable operators and sensors, including for AWS services like Redshift, GCP services like BigQuery, Snowflake and Databricks. The Fivetran async provider can be added to any Airflow 2.2+ environment with pip install airflow-provider-fivetran-async.
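Since deferrable operators require Airflow 2.2 or later, it can be worth confirming the installed version before adding the provider. The helper below is a hypothetical sketch, assuming Python 3.8+ for `importlib.metadata` and that Airflow is installed under the distribution name `apache-airflow`; it ignores pre-release suffixes such as `rc1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn the leading numeric part of a version, e.g. '2.5.1rc1', into (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version: {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def supports_deferrable(airflow_version, minimum=(2, 2)):
    """Deferrable operators and the triggerer need Airflow 2.2 or later."""
    return release_tuple(airflow_version) >= minimum


def installed_airflow_version():
    """Return the installed apache-airflow version, or None if it is not installed."""
    try:
        return version("apache-airflow")
    except PackageNotFoundError:
        return None


current = installed_airflow_version()
if current is None:
    print("apache-airflow is not installed in this environment")
else:
    print(current, "supports deferrable operators:", supports_deferrable(current))
```

Versions are compared as integer tuples rather than strings, because a string comparison would wrongly rank "2.10.0" below "2.2".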

For more information on Astronomer’s new providers, watch a replay of Astronomer’s webinar on the provider, or attend this year’s Modern Data Stack Conference, where Astronomer will be presenting.

See the full agenda and register by March 1, 2023 with the discount code MDSCON-ORCHESTSRATE to get 25 percent off!

Start for free

Join the thousands of companies using Fivetran to centralize and transform their data.
