
What is dbt? Everything you need to know

April 28, 2023
dbt (data build tool) is an open-source tool that can change how you transform your data. Learn more about dbt and how it works.

The data analytics stack has several layers: extract, load, transform and analyze. Businesses have multiple software alternatives they can use to let citizen analysts extract and load data and run analytics, but the transformation phase still requires specialized expertise. Data transformation is essential for any company. It involves turning data from various sources into a structured format that supports analytics and modeling.

But to transform data, you need a data engineer or database administrator (DBA) who knows how relational databases work. Fortunately, that transformation work doesn’t have to be a manual process that a DBA rebuilds for every project.

This is where dbt comes in — an open-source tool that makes the data transformation process more accessible to data analysts. But how exactly does dbt work, and what can you do with it? In this article, we’ll explain what dbt is and how it can improve your data pipeline. We’ll also look at how it differs from Extract, Transform, Load (ETL) and provide instructions on getting started.

What is dbt?

dbt (data build tool) is a data transformation and modeling tool from dbt Labs, formerly Fishtown Analytics. The dbt data transformation tool, which is SQL-based, lets data professionals build data models iteratively and automate data transformation. 

Users model their data using SQL SELECT statements in their text editor of choice, create relationships and dependencies between models and materialize those models as tables and views in a data warehouse. From there it’s a simple task to turn the models into business intelligence.
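
For example, here’s a minimal sketch of two model files. Everything below is illustrative: the source, model and column names are hypothetical, and the source() reference assumes the raw table has been declared in a YAML file.

```sql
-- models/stg_orders.sql (illustrative)
-- A model is just a SELECT statement; dbt materializes it as a view or table.
select
    id           as order_id,
    customer_id,
    status,
    order_date
from {{ source('shop', 'raw_orders') }}  -- assumes a 'shop' source declared in YAML
```

```sql
-- models/orders.sql (illustrative)
-- {{ ref() }} declares a dependency, so dbt knows to build stg_orders first.
select
    order_id,
    customer_id,
    status,
    order_date
from {{ ref('stg_orders') }}
where status != 'cancelled'
```

When you run dbt run, dbt compiles these templates into plain SQL and executes them against your warehouse, building the models in dependency order.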

The dbt platform comprises two products. When most people refer to dbt, they’re talking about dbt Core. It takes a dbt project and a command and creates tables or views in your warehouse. dbt Core is free and released under an Apache License as open-source software.

The other product, dbt Cloud, provides a web-based integrated development environment (IDE) and a job scheduler to help teams develop and deploy dbt projects. Some dbt Cloud features are free, while others, aimed at collaboration and enterprise use, are paid.

dbt supports databases like Snowflake, BigQuery and Redshift. Since it uses simple SQL SELECT statements, data analysts can own data transformation and accelerate time to insight.

Here’s a look at the built-in features that dbt offers.

Version control 

Remember when you’d name your college papers “paper-first-draft” and “paper-final-draft2”? This method of “version control” may have worked for reports, but it’s not practical for data models. It’s also confusing for team projects.

dbt integrates with version control tools like GitHub, which allows you to track, organize and manage changes to your dbt projects. It also ensures that your team is working on the right version. If you make a mistake, you can easily revert to a previous version.

Data testing

dbt has a built-in tool that allows you to test the quality of your transformations. It supports four generic tests out of the box:

  • unique: Ensures that there are no duplicate values.
  • not_null: Validates that there are no null values in a column.
  • accepted_values: Validates that every value in a column is in a specified list of accepted values.
  • relationships: Ensures that records in a child table have a related record in a parent table.

Here’s an example of what these tests look for in an orders model. As a rough sketch (the table and column names below are illustrative, not from a real project), they might be declared in a schema.yml file like this:
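
```yaml
# models/schema.yml (illustrative)
version: 2

models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - unique              # no duplicate order IDs
          - not_null            # every order has an ID
      - name: status
        tests:
          - accepted_values:    # status must be one of these values
              values: ['placed', 'shipped', 'completed', 'returned']
      - name: customer_id
        tests:
          - relationships:      # every order points at an existing customer
              to: ref('customers')
              field: id
```

Running dbt test executes each of these checks as a query against your warehouse.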

If a test fails, dbt flags the error, and you can store the failing records in your warehouse (for example, with the store_failures config) to diagnose and debug the problem without affecting your data model.

Sandboxing

dbt has a useful feature called environments. It acts as a “virtual sandbox” where you can freely write and test code without affecting the production data model. You can deploy new changes to the production environment once you’re happy with them.
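
dbt Cloud manages environments through its UI. In dbt Core, a similar separation is usually achieved with targets in profiles.yml. Here’s an illustrative sketch assuming a Snowflake warehouse; every account and credential value is a placeholder:

```yaml
# ~/.dbt/profiles.yml (illustrative; account and credential values are placeholders)
my_analytics:
  target: dev                    # default target used by dbt run
  outputs:
    dev:
      type: snowflake
      account: your_account
      user: your_user
      password: your_password
      database: analytics
      warehouse: transforming
      schema: dbt_dev_yourname   # personal sandbox schema
      threads: 4
    prod:
      type: snowflake
      account: your_account
      user: service_user
      password: service_password
      database: analytics
      warehouse: transforming
      schema: analytics          # production schema
      threads: 8
```

A plain dbt run builds into your development schema, while dbt run --target prod deploys the same models to production.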

Why should you use dbt?

With dbt, you can reference and transform data into new sets without the data ever leaving your warehouse. dbt has become increasingly mainstream, as reflected in the steadily growing number of weekly active dbt projects.

Here’s why more companies are using dbt and what it can do for your data pipeline:

Vastly improve runtimes

Improving runtime performance can help your data analysts work more productively. dbt performs transformations and calculations directly in a data warehouse — tasks that would lead to significantly longer run times if they were performed in a visualization platform. 

What’s more, dbt offers incremental data models. When you run a data model for the first time, a table is built by transforming every row of source data. Incremental models update and insert new records on subsequent runs, so the table gets built “incrementally.” This leads to faster runtimes. An analytics engineer at dbt Labs used incremental data models to cut 90 minutes off their longest-running model.
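
Here’s a sketch of the standard incremental pattern using the is_incremental() macro. The source, table and column names are hypothetical:

```sql
-- models/fct_events.sql (illustrative)
{{ config(materialized='incremental', unique_key='event_id') }}

select
    event_id,
    user_id,
    event_type,
    occurred_at
from {{ source('app', 'raw_events') }}   -- hypothetical source table

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what's already in the table
  where occurred_at > (select max(occurred_at) from {{ this }})
{% endif %}
```

On the first run, dbt builds the full table; on later runs, only new rows are transformed and merged in.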

Simplify data transformations

dbt makes the data transformation process more accessible for analysts. They don’t have to wait for a data engineer to build a pipeline. Data analysts can transform data with SELECT statements in SQL without requiring extensive experience in other languages.

What does this mean for your company? Analysts can transform the data the way they need it to complete their analysis instead of waiting around.

Maintain data documentation

Ever look at a dataset or data model and have no idea what you’re looking at? 

dbt offers a documentation hub, which includes information about your project and your data warehouse. More importantly, you can also add descriptions to your models, columns and sources.

This documentation feature helps build more transparency into your data. Another hidden benefit is that it protects against knowledge loss — if an employee leaves, another team member can pick up where they left off. 

What is data modeling?

As it applies to data analytics, data modeling is the process of imposing order on data from diverse sources. That means transforming tables from different schemas or even databases into a single view or table that describes a business system or process.

Once you’ve modeled the data, it’s easier to derive value from it, either in the form of reports or dashboards or as a basis for predictive or prescriptive analytics. A dbt project is just a directory of SQL and YAML files that dbt uses to transform your data. The YAML file contains project configuration information. Each model contains a single SQL SELECT statement that transforms data in some way.
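
As an illustration, here’s a minimal sketch of what that project configuration file, dbt_project.yml, might contain (the project and folder names are hypothetical):

```yaml
# dbt_project.yml (illustrative)
name: 'my_analytics'
version: '1.0.0'
config-version: 2
profile: 'my_analytics'     # which connection profile in profiles.yml to use

model-paths: ["models"]

models:
  my_analytics:
    staging:
      +materialized: view   # staging models are built as views
    marts:
      +materialized: table  # mart models are built as tables
```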

SQL has long been the language of choice for analysts developing transformations and performing analysis. Therefore, the decision for dbt to have a model file be a SQL SELECT statement means analysts don’t need to learn a new language or tool and can freely transform data the same way they have previously. Using SQL makes for easy collaboration and a better understanding of your transformations.

What are dbt Packages?

dbt packages are standalone dbt projects that take advantage of dbt’s modularity: much like a code library, a package bundles models and macros that tackle a specific problem area so they can be reused. These packages contain pre-built SQL, which drastically reduces the need to hand-code your transformations.

Fivetran develops and maintains open-source dbt packages that model data from single-schema and multi-schema data sources. These Fivetran dbt packages leverage data modeling principles and code modularity to effectively transform data for exploratory, predictive or prescriptive analytics. For more information on the Fivetran dbt packages and how to use them, you can refer to the dbt hub.
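
To give a sense of how packages are pulled in: they’re declared in a packages.yml file and installed with dbt deps. The packages and version ranges below are examples, not recommendations:

```yaml
# packages.yml (illustrative; pin versions that match your dbt version)
packages:
  - package: dbt-labs/dbt_utils      # widely used utility macros
    version: [">=1.0.0", "<2.0.0"]
  - package: fivetran/fivetran_log   # an example Fivetran-maintained package
    version: [">=1.0.0", "<2.0.0"]
```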

The YAML files do more than configure your dbt project: they also curate a data dictionary of your source and transformed data and apply data integrity tests. Used effectively, the final result of your dbt project includes a definition of each raw source table and each transformed table in your dimensional schema, as well as a description of every field within those tables. You can also apply simple tests to certain fields (e.g., a primary key field should always be unique) and run them to ensure there are no data integrity issues.

Models are modular, so you can reproduce a transformation across multiple projects. Any developer can tell you how valuable code reuse is in terms of saving time and effort. A project may also include test, macro and documentation files.

Is dbt an ETL?

ETL describes a three-step process where you extract data from different sources, transform it to make it ready for analytics and load it into a data repository.

So, is dbt an ETL tool?

No, but you can think of dbt as the “T” in ETL. That is, the tool transforms data that already exists in your data warehouse. Its only function is to compile code to SQL and execute it against your database.

dbt is a data transformation tool, but it works with Fivetran, so you can use dbt as part of your end-to-end ELT processes.

We’ve integrated dbt transformations into Fivetran. The feature, called Fivetran Transformations, brings dbt Core directly into Fivetran, so customers can orchestrate the cleaning, testing, transformation, modeling and documentation of data sets with the same tool they use to extract and load the data.

Fivetran Transformations for dbt Core are available for a range of supported destinations. We created a setup guide to make it easy to start taking advantage of Fivetran Transformations.

A promising future for dbt

Data analytics is a hot business segment — witness the buzz around companies such as Snowflake and Databricks. dbt Labs has grown dramatically since it was founded in 2016 by colleagues who worked together at RJMetrics. In June of 2021, the company raised $150 million in a Series C round.

The company was a first mover among data transformation products, but it’s not the only one out there. A younger company called Dataform was a direct competitor until it was snapped up by Google Cloud in late 2020.

Similarly, the analytics platform Looker offers data modeling through its LookML language, but Looker, too, was acquired by Google Cloud, in 2019. Other products do some of what dbt does, but few offer dbt’s full range of capabilities in a package that works the way data engineers like to work.

Get started with dbt

If you’re interested in learning more about dbt, dbt Labs has tons of resources, including its own getting started documentation and links to example projects.

Here are the basic steps to getting started with dbt:

  1. Set up a warehouse (e.g., BigQuery, Redshift or Snowflake)
  2. Connect the warehouse to dbt Cloud
  3. Add a Git repository
  4. Execute a dbt transformation
  5. Schedule a job or transformation
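
If you’d rather start with dbt Core on the command line, here’s a rough sketch of the same flow. The adapter (dbt-snowflake) and project name are placeholders; use the adapter that matches your warehouse:

```sh
# Illustrative dbt Core workflow; swap dbt-snowflake for your warehouse's adapter
pip install dbt-snowflake      # installs dbt Core plus the Snowflake adapter

dbt init my_analytics          # scaffold a new project and set up a connection profile
cd my_analytics

dbt debug                      # verify the warehouse connection
dbt run                        # build the models in your warehouse
dbt test                       # run the tests declared in your YAML files
dbt docs generate              # compile the project's documentation site
```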

While you can try dbt on your own, it’s more powerful as part of the complete data pipeline that Fivetran provides. In addition to its core data integration capabilities, Fivetran also offers Fivetran Transformations powered by dbt, which lets you connect a dbt project to your Fivetran account and orchestrate your end-to-end ELT pipelines from within Fivetran.

Watch the video below to learn more about Fivetran Transformations for dbt Core:

(Video: Fivetran Transformations for dbt Core, https://www.youtube.com/embed/ro3vSC1c_Jk)

Stay tuned for updates and enhancements to Fivetran Transformations; more features will be added in the future. For a real-world example, consider Ritual, whose data team used dbt in conjunction with Fivetran to replace scattered Python scripts and LookML files. Ritual used dbt’s automated testing and data freshness capabilities to drive a 68% increase in new feature development.

Conclusion

dbt has gained widespread popularity. It allows data analysts to own the “transform” step in the ETL process, helping them transform the data they need to support their analysis. Plus, it supports modern cloud data warehouses like BigQuery, Redshift and Snowflake. Sign up for a free Fivetran trial and see how easy it is to connect to more than 150 data sources, extract data, load it into a data warehouse or data lake, transform it with Fivetran Transformations and use it to enhance your business.
