What to know about the Fivetran async provider in Airflow

The Fivetran Airflow provider developed by Astronomer lets data engineers run Fivetran data syncs more efficiently in Airflow 2.2+.
July 12, 2023

Fivetran announced the Fivetran Airflow provider in 2021. Fast forward to today, and thousands of Fivetran connectors are orchestrated together with other components of the modern data stack in Airflow. Through its collaboration with Astronomer, Fivetran is the first data integration service to provide a deferrable operator for Airflow: the Fivetran async provider. Here, we’ll discuss the motivations for this new package and the new Airflow features that make it possible, before giving some instructions on how it can be added to DAGs.


Airflow architecture before deferrable operators

An Airflow deployment is made up of several components, including at least a database, a webserver, a scheduler and one or more workers. The database maintains a list of all tasks and their state, and the scheduler monitors that database to identify when a DAG has a task that needs to run. The scheduler then sends that task to a worker via an executor.

A close look at the role of workers helps explain the need for deferrable operators. Most Airflow deployments have a finite number of workers (typically 1 or 3; the number can be defined and scaled out via the executor), and each worker can run a finite number of tasks at a time (also defined by the executor, via the worker_concurrency setting). Once this limit is reached, the scheduler queues tasks until a worker has the capacity to accept a new one.
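The capacity limit described above comes down to simple arithmetic. The sketch below is a toy model, not Airflow code: the function name and the numbers are hypothetical, and it only assumes that total capacity is the number of workers multiplied by worker_concurrency.

```python
def split_running_and_queued(pending_tasks, workers, worker_concurrency):
    """Return (running, queued) task counts for a toy Airflow deployment."""
    if pending_tasks < 0 or workers < 0 or worker_concurrency < 0:
        raise ValueError("counts must be non-negative")
    # Every worker contributes worker_concurrency slots.
    total_slots = workers * worker_concurrency
    # Tasks beyond the available slots wait in the queue.
    running = min(pending_tasks, total_slots)
    queued = pending_tasks - running
    return running, queued


# Hypothetical numbers: 3 workers, 16 slots each, 60 tasks ready to run.
running, queued = split_running_and_queued(60, workers=3, worker_concurrency=16)
print(running, queued)  # 48 12
```

Any task that holds a slot while it merely waits reduces the running count available to everything else.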

The components of an Airflow environment. Source: Airflow docs

When using Fivetran in Airflow, a FivetranOperator starts a Fivetran connector sync while a FivetranSensor monitors that connector’s status and returns once the sync is complete. Each FivetranSensor occupies a worker slot for as long as it waits on its connector. As a result, one or more FivetranSensors can block Airflow from running other tasks or starting other DAGs. Luckily, a new Airflow feature makes this process much more efficient: deferrable operators.
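To make the cost of a blocking sensor concrete, here is a minimal sketch of a polling loop in plain Python. It is not the FivetranSensor implementation: wait_for_sync and the fake status source are hypothetical stand-ins, and the calling thread plays the role of the worker slot that stays occupied until the loop returns.

```python
import time


def wait_for_sync(get_status, poke_interval=0.01, timeout=1.0):
    """Poll get_status() until it returns "complete"; return the poke count.

    The calling thread is held for the entire wait, which is the
    behavior that ties up a worker slot.
    """
    deadline = time.monotonic() + timeout
    pokes = 0
    while True:
        pokes += 1
        if get_status() == "complete":
            return pokes
        if time.monotonic() >= deadline:
            raise TimeoutError("connector did not finish in time")
        time.sleep(poke_interval)  # blocks; nothing else runs in this slot


# Hypothetical stand-in for a connector status check.
statuses = iter(["syncing", "syncing", "complete"])
pokes = wait_for_sync(lambda: next(statuses))
print(pokes)  # 3
```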

New architecture and the async provider

Deferrable operators and sensors allow Airflow tasks to wait asynchronously. Airflow 2.2 introduced a new component to the architecture described above: the triggerer. Now, when an Airflow task is waiting for a condition to be met, it can be deferred to the triggerer instead of consuming a worker slot. The triggerer groups all deferred operators and sensors together in a single Python process that monitors their status asynchronously, which is well suited to I/O-bound operations like the movement of data that Fivetran performs.
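The idea of one process watching many waits at once can be illustrated with Python’s asyncio. This is a sketch of the concept only, not the triggerer’s actual code: watch_connector, run_triggerer and the connector names are hypothetical, and asyncio.sleep stands in for a non-blocking status check.

```python
import asyncio
import time


async def watch_connector(connector_id, sync_seconds):
    """Stand-in for an async status check; yields control while waiting."""
    await asyncio.sleep(sync_seconds)
    return connector_id


async def run_triggerer(connectors):
    """Wait on every connector concurrently inside a single event loop."""
    waits = [watch_connector(cid, secs) for cid, secs in connectors.items()]
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(*waits)


# Hypothetical connectors, each taking 0.2 seconds to "sync".
connectors = {"conn_a": 0.2, "conn_b": 0.2, "conn_c": 0.2}
start = time.monotonic()
finished = asyncio.run(run_triggerer(connectors))
elapsed = time.monotonic() - start
print(finished)  # ['conn_a', 'conn_b', 'conn_c']
print(f"elapsed: {elapsed:.2f}s")
```

Because the waits overlap, the total elapsed time is close to the longest single wait rather than the sum of all three.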

Once a deferrable operator’s target status is reached, it triggers a callback function that tells the scheduler the task can continue or the DAG can proceed to another task. With Fivetran, this means a deferrable sensor can wait for a connector to finish syncing data to its destination without consuming resources that would otherwise force other tasks and DAGs into a queued state. We have built this deferrable sensor and named it FivetranSensorAsync; it is called in DAGs in the same manner as FivetranSensor. Another option is FivetranOperatorAsync, which both starts a Fivetran sync and asynchronously monitors it to completion. Examples of how to use both are included in the provider and on the Astronomer Registry.
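The defer-then-resume handoff described above can be sketched as a toy model in plain Python. None of these names come from Airflow or the Fivetran provider: Deferred, ToySyncSensor and run_task are hypothetical, and the sketch only assumes that a task gives up its slot by handing over something to wait on plus the name of the method to call when the wait ends.

```python
import asyncio


class Deferred(Exception):
    """Toy signal: the task gives up its slot until `trigger` completes."""

    def __init__(self, trigger, method_name):
        super().__init__(method_name)
        self.trigger = trigger
        self.method_name = method_name


class ToySyncSensor:
    def __init__(self, connector_id):
        self.connector_id = connector_id

    async def _sync_finished(self):
        await asyncio.sleep(0.01)  # stand-in for async status polling
        return {"connector_id": self.connector_id, "status": "success"}

    def execute(self):
        # Hand the wait over and name the callback to resume in.
        raise Deferred(trigger=self._sync_finished(), method_name="resume")

    def resume(self, event):
        # Callback: runs only after the trigger has produced an event.
        if event["status"] != "success":
            raise RuntimeError(f"sync failed for {event['connector_id']}")
        return f"{event['connector_id']} synced"


def run_task(task):
    """Toy scheduler and triggerer: run, await the trigger, then resume."""
    try:
        return task.execute()
    except Deferred as deferred:
        event = asyncio.run(deferred.trigger)
        return getattr(task, deferred.method_name)(event)


result = run_task(ToySyncSensor("conn_a"))
print(result)  # conn_a synced
```

In this model no slot is held between execute and resume; the only thing kept alive during the wait is the coroutine being awaited.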

An additional component, a triggerer, can make newer versions of Airflow asynchronous.

Try out the Fivetran async provider today

The code for the new provider is open source. Astronomer also maintains open-source software for a number of other providers containing deferrable operators and sensors, including for AWS services like Redshift, GCP services like BigQuery, Snowflake and Databricks. The Fivetran async provider can be added to any Airflow 2.2+ environment with pip install airflow-provider-fivetran-async.

For more information on Astronomer’s Airflow provider, check out their blog post.
