How we execute dbt™ runs faster and cheaper

Introducing smart run for dbt Core™
April 6, 2023

Do you find yourself wasting time looking through your data lineage DAG trying to figure out which models to run? What about saving on the costs of running your dbt models? Do your dbt runs take forever?

To solve these problems for ourselves, the Fivetran analytics team developed "smart run for dbt Core™," our way of running only the models that need to run, without having to work out how to craft the dbt run command by hand.

Imagine the following scenario… 

You’re working on a code change that touches many models. You’ve changed the models in red below, and you want to know how those changes affect the model in green. Purple models are those that haven’t been touched.

The cheapest way to run the new sequence, in terms of both time and compute cost, is to:

  1. Copy the tables for the models labeled “C” from the production schema to your development schema. This ensures that the models you run read data that is as fresh as the production environment. Note that the Copy command is free!
  2. Run the models labeled “R”.
  3. Ignore all models labeled “I”.
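The three labels above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual dbt_smart_run.py code: the function name `plan_smart_run`, the rule that "C" means an unchanged direct parent of a model that runs, and every model name except xactly_quotas and quota_attainment are assumptions made for the example.

```python
from collections import deque


def plan_smart_run(parents, changed, targets):
    """Label every model 'R' (run), 'C' (copy) or 'I' (ignore).

    parents: dict mapping each model to the list of models it selects from.
    changed: set of models whose code was modified.
    targets: models whose new output we want to inspect.
    """
    # Collect the targets together with all of their ancestors.
    relevant = set()
    queue = deque(targets)
    while queue:
        model = queue.popleft()
        if model in relevant:
            continue
        relevant.add(model)
        queue.extend(parents.get(model, ()))

    # A model must run if it changed or if any of its parents must run.
    memo = {}

    def must_run(model):
        if model not in memo:
            memo[model] = model in changed or any(
                must_run(p) for p in parents.get(model, ())
            )
        return memo[model]

    run = {m for m in relevant if must_run(m)}
    # Assumption: copy the unchanged direct parents of the models that run.
    copy = {p for m in run for p in parents.get(m, ()) if p not in run}

    all_models = set(parents) | {p for ps in parents.values() for p in ps}
    return {
        m: "R" if m in run else "C" if m in copy else "I"
        for m in sorted(all_models)
    }


# Hypothetical lineage; only xactly_quotas and quota_attainment are real names.
parents = {
    "stg_quotas": [],
    "stg_deals": [],
    "xactly_quotas": ["stg_quotas"],
    "deals": ["stg_deals"],
    "quota_attainment": ["xactly_quotas", "deals"],
    "unrelated_report": ["deals"],
}
plan = plan_smart_run(parents, {"xactly_quotas"}, ["quota_attainment"])
for model, label in plan.items():
    print(label, model)
```

In this made-up lineage, only xactly_quotas and quota_attainment are run, their unchanged inputs are copied from production, and everything else is left alone.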

That’s why the Fivetran analytics team developed a Python script, “smart run for dbt Core™,” that works out these steps automatically.

Now, analysts at Fivetran don’t need to worry about this problem. They run `python3 dbt_smart_run.py` and it does the heavy lifting.

Consider the following example: assume we have changed the xactly_quotas model and nothing else. What is the least expensive way, in both time and cost, to see how that change affects the quota_attainment model at the end of the lineage?

We don’t need to run the entire tree. We can ignore some models, copy others, and run only the models that are necessary (see image above). That is what smart run does automatically: you specify the target model, and it works out and does the rest.

Now, without wasting any time thinking, the analyst can simply run:

$ python3 dbt_smart_run.py -targets quota_attainment
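For comparison, here is a rough sketch of the dbt command an analyst would otherwise have to assemble by hand once the models to run are known. The helper name `build_dbt_run_command` is hypothetical and is not part of the smart run script; the only dbt feature it relies on is that `dbt run --select` accepts a space-separated list of model names.

```python
import shlex


def build_dbt_run_command(models_to_run):
    """Return the argv list for running exactly the given models."""
    if not models_to_run:
        # Nothing needs to run, so there is no command to build.
        return []
    # dbt orders the selected models by their dependencies itself.
    return ["dbt", "run", "--select", *models_to_run]


cmd = build_dbt_run_command(["xactly_quotas", "quota_attainment"])
print(shlex.join(cmd))
```

Returning a list rather than a string keeps the command safe to pass to `subprocess.run` without shell quoting issues.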

We (the internal analytics team at Fivetran) chose to develop this for the following reasons:

  • The same command is used regardless of how complicated or simple your dbt run is. No need to remember different commands. We encourage our analysts to always use smart run for dbt Core™.
  • This does not rely on the manifest.json.
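The article does not say how the script discovers model dependencies without manifest.json. One possible approach, shown here purely as an assumption, is to scan each model's SQL for `ref()` calls. The function name `parents_from_sql` and the sample SQL are hypothetical, and this sketch handles only the single-argument form `{{ ref('model_name') }}`.

```python
import re

# Matches {{ ref('name') }} or {{ ref("name") }} with optional whitespace.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def parents_from_sql(sql_by_model):
    """Map each model name to the sorted list of models it references."""
    return {
        model: sorted(set(REF_PATTERN.findall(sql)))
        for model, sql in sql_by_model.items()
    }


# Hypothetical model SQL, keyed by model name.
sql_by_model = {
    "xactly_quotas": "select * from {{ ref('stg_quotas') }}",
    "quota_attainment": (
        "select * from {{ ref('xactly_quotas') }} q "
        'join {{ref("deals")}} d on d.rep_id = q.rep_id'
    ),
}
print(parents_from_sql(sql_by_model))
```

In a real project the SQL text would come from the files in the models directory, for example via `pathlib.Path.rglob("*.sql")`.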

Great, how can I use this on my team?

Check out the code here.
