The Fivetran Airflow provider includes a FivetranOperator, which starts a Fivetran sync and a FivetranSensor, which monitors a connector until it is finished syncing data. With Airflow, it’s easy to program a FivetranSensor to start only after a FivetranOperator has successfully run for a connector. However, Airflow provides no guarantees that a FivetranSensor will run immediately after a FivetranOperator, which can cause issues. In this post, we’ll describe those issues and demonstrate how they are being resolved in the new version of the Fivetran Airflow provider.
Sensors missing syncs in Airflow
Well over 600,000 Fivetran data syncs have run in Airflow since the Fivetran Airflow provider launched in early 2021. Rarely have we seen issues in which a FivetranSensor will keep running and time out after a connector has already finished moving data. But we’ve identified two potential reasons when and why this happens.
First, short running syncs. Some Fivetran connectors take just seconds to move data from its source to destination. If this happens, a FivetranSensor may not start until after the connector it is monitoring has already finished syncing data.
The other reason has to do with Airflow’s limited resources. Some Airflow deployments have a finite number of workers, which are containers that perform the tasks that make up DAGs. If many DAGs are running in parallel, every worker slot may be occupied, and Airflow’s scheduler will queue upcoming tasks until a worker slot is available. If a FivetranSensor is placed in a queued state by a scheduler, a connector’s sync may finish before the FivetranSensor begins running.
In either case, data syncs that a FivetranSensor is supposed to monitor finish before the FivetranSensor can start. One of the very first things that a FivetranSensor does is collect the timestamp of the connector’s previously completed data sync from a connector’s metadata. A FivetranSensor will then monitor this field until Fivetran updates it, signaling the completion of a new data sync. If a connector’s metadata has already been updated before a FivetranSensor starts, which is the case in the examples illustrated above, the sensor will either time out or hang until the connector’s next data sync is started. Both situations can be avoided by utilizing another Airflow mechanism: XCOM.
Ensuring successful sensors with XCOM
By default, every Task in Airflow is entirely independent and isolated. This allows separate tasks to easily be composed together into data pipelines and ensure that any Task can run on any machine, at any time, in any order.
XCOMs in Airflow do the opposite: Tying tasks together by allowing them to pass key value pairs between each other.
Downloading the new Fivetran Airflow provider
If you have any questions about the new version of the Fivetran Airflow provider, please email us at [email protected].