Fivetran announced the Fivetran Airflow provider just over a year ago, and more than 1,300 Fivetran connectors are now orchestrated alongside other components of the modern data stack in Airflow every day. We have made numerous improvements to the provider over the past year, and today Fivetran is happy to announce a collaboration with Astronomer that makes it the first data integration service to provide a deferrable operator for Airflow: the Fivetran async provider. Here, we’ll discuss the motivations for this new package and the new Airflow features that make it possible, before giving some instructions on how it can be added to DAGs.
Hear more from Astronomer’s Airflow Engineering Advocate Benji Lampel at this year’s Modern Data Stack Conference on April 4-5, 2023. Register with the discount code MDSCON-ORCHESTSRATE to get 25 percent off by March 1, 2023!
Airflow architecture before deferrable operators
There are many components that make up an Airflow deployment, including at least a database, a webserver, a scheduler and one or more workers. The database maintains a list of all tasks and their state, and the scheduler monitors that database to identify when a DAG has a task that needs to run. The scheduler then sends that task to a worker via an executor. A close look at the role of workers helps explain the need for deferrable operators. Most Airflow deployments have a finite number of workers (typically 1 or 3; the number can be defined and scaled out via the executor), and each worker can run a finite number of tasks at a time (also defined through the executor, via the worker_concurrency setting). Once this limit is reached, the scheduler queues tasks until a worker has the capacity to accept a new one. A task that is only waiting on an external system, such as a running Fivetran sync, still holds its worker slot for the entire wait.
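To make the slot limit concrete, here is a minimal sketch of the arithmetic. The function name and the example numbers are ours, chosen for illustration; this is not Airflow code, only a model of "workers times tasks per worker".

```python
def split_running_and_queued(num_tasks: int, workers: int, worker_concurrency: int):
    """Return (running, queued) for a burst of tasks.

    Hypothetical helper for illustration: total capacity is modelled as
    workers * worker_concurrency, and anything beyond that waits in the queue.
    """
    total_slots = workers * worker_concurrency
    running = min(num_tasks, total_slots)
    queued = num_tasks - running
    return running, queued


if __name__ == "__main__":
    # Illustrative numbers only: 2 workers with 16 slots each, 40 tasks ready to run.
    print(split_running_and_queued(40, workers=2, worker_concurrency=16))  # (32, 8)
```

If most of those 32 running tasks are simply waiting on a long sync, the 8 queued tasks cannot start even though the workers are doing very little actual work.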
New architecture and the async provider
Deferrable operators and sensors allow Airflow tasks to wait asynchronously. Airflow 2.2 introduced a new component to the architecture described above: the triggerer. Now, when a task is waiting for a condition to be met, it can defer itself to the triggerer instead of consuming a worker slot. The triggerer groups all deferred operators and sensors together in a single Python process that monitors their status asynchronously, which is well suited to I/O-bound operations like the data movement that Fivetran performs.
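The idea of one process watching many waits at once can be sketched with plain asyncio. This is a simplified illustration, not the triggerer's or the Fivetran async provider's actual implementation: the connector names are made up, and asyncio.sleep stands in for polling an external API until a sync finishes.

```python
import asyncio
import time


async def wait_for_sync(connector_id: str, duration: float) -> str:
    # Stand-in for polling an external service; while this coroutine sleeps,
    # the event loop is free to service every other waiting coroutine.
    await asyncio.sleep(duration)
    return f"{connector_id}: done"


async def watch_all(connectors: dict[str, float]) -> list[str]:
    # One event loop in one process watches every wait concurrently.
    # gather returns results in the same order the coroutines were passed in.
    return await asyncio.gather(
        *(wait_for_sync(name, duration) for name, duration in connectors.items())
    )


def demo(count: int = 50, duration: float = 0.2):
    # Hypothetical connector names, all "syncing" for the same duration.
    connectors = {f"connector_{i}": duration for i in range(count)}
    start = time.perf_counter()
    results = asyncio.run(watch_all(connectors))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = demo()
    print(f"{len(results)} waits finished in {elapsed:.2f}s")
```

Run one after another, fifty 0.2-second waits would take about ten seconds. Here they overlap in a single process, so the total stays close to the length of one wait, and no worker slot is held while waiting.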
Try out the Fivetran async provider today
For more information on the Fivetran async provider, check out a replay of Astronomer’s webinar on the provider, or attend this year’s Modern Data Stack Conference, where Astronomer will be presenting.
See the full agenda here and register with the discount code MDSCON-ORCHESTSRATE to get 25 percent off by March 1, 2023!