Do you find yourself wasting time looking through your data lineage DAG trying to figure out which models to run? What about saving on the costs of running your dbt models? Do your dbt runs take forever?
To solve this problem for ourselves, the Fivetran analytics team developed "smart run for dbt Core™," our way of running only the models that need to run, without spending brainpower on crafting a dbt run command by hand.
Imagine the following scenario…
You’re working on a code change that touches many models. You’ve changed the models in red below, and you want to know how your changes impact the model in green. Purple models are those that haven’t been touched.
The cheapest way, in terms of both time and compute cost, to run the new sequence is to:
- Copy the tables for the models labeled “C” from the production schema to your development schema. This ensures the models that will be run use data that is as fresh as production. Note that the Copy command is free!
- Run the models labeled “R”.
- Ignore all models labeled “I”.
That’s why the Fivetran Analytics team developed a Python script, “smart run for dbt Core™”.
Now, analysts at Fivetran don’t need to worry about this problem. They run `python3 dbt_smart_run.py` and it does the heavy lifting.
Consider the following example: assume we have changed the xactly_quotas model and nothing else. What is the least expensive way (in both time and cost) to understand the impact of our change on the quota_attainment model at the end?
We actually don’t need to run the entire tree. We can ignore some models, copy others, and run only the necessary models (see image above). That is what smart run does automatically: you specify the target model, and it figures out and does the rest.
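The copy/run/ignore decision can be sketched in a few lines of Python. This is a minimal illustration, not the actual dbt_smart_run.py: the function names `ancestors` and `plan_smart_run`, the `parents` dictionary, and every model name other than xactly_quotas and quota_attainment are hypothetical. It assumes that a model must run if it is a changed model, or downstream of one, on the path to the target, and that any unchanged direct parent of a model that runs is copied from production.

```python
from collections import deque


def ancestors(parents, node):
    """Return every model upstream of `node` (excluding `node` itself)."""
    seen = set()
    queue = deque(parents.get(node, ()))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, ()))
    return seen


def plan_smart_run(parents, modified, target):
    """Split the DAG into models to copy, run, and ignore (assumed rules)."""
    modified = set(modified)
    # Only the target and the models upstream of it can matter.
    relevant = ancestors(parents, target) | {target}
    # Run a relevant model if it changed or if anything upstream of it changed.
    run = {
        m for m in relevant
        if m in modified or ancestors(parents, m) & modified
    }
    # Copy the unchanged direct parents that the models being run read from.
    copy = {p for m in run for p in parents.get(m, ()) if p not in run}
    # Everything else in the DAG is left alone.
    all_models = set(parents) | {p for ps in parents.values() for p in ps}
    ignore = all_models - run - copy
    return {"copy": sorted(copy), "run": sorted(run), "ignore": sorted(ignore)}


# Hypothetical DAG: each model maps to the models it selects from.
parents = {
    "xactly_quotas": ["stg_xactly"],
    "sales_reps": ["stg_salesforce"],
    "bookings": ["stg_salesforce"],
    "quota_by_rep": ["xactly_quotas", "sales_reps"],
    "quota_attainment": ["quota_by_rep", "bookings"],
    "marketing_spend": ["stg_ads"],
}

plan = plan_smart_run(parents, modified={"xactly_quotas"}, target="quota_attainment")
print(plan)
```

In this hypothetical DAG, the plan runs xactly_quotas, quota_by_rep, and quota_attainment, copies bookings, sales_reps, and stg_xactly, and ignores the rest. The lists are sorted alphabetically for readability, not in execution order.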
Now, without wasting any time thinking, the analyst can simply run:
$ python3 dbt_smart_run.py -targets quota_attainment
We (the internal analytics team at Fivetran) chose to develop this for the following reasons:
- The same command is used regardless of how complicated or simple your dbt run is. No need to remember different commands. We encourage our analysts to always use smart run for dbt Core™.
- It does not rely on the manifest.json.
Great, how can I use this on my team?
Check out the code here.