The data analytics stack has several layers: extract, load, transform, analyze. Businesses have multiple software alternatives they can use to let citizen analysts extract and load data and run analytics, but the transformation phase still requires specialized expertise. To transform data, you need a data engineer or database administrator who knows how relational databases work — but that transformation work doesn’t have to be a manual process that a DBA creates anew with every project.
dbt is a data transformation and modeling tool from dbt Labs, formerly Fishtown Analytics. dbt is an open-source, SQL-based tool that simplifies data transformation and stands out for its ability to expedite the process of transforming data and building data pipelines. Users model their data using SQL SELECT statements, create relationships and dependencies between models, and then materialize those models as tables and views in a data warehouse. From there it’s a simple task to turn the models into business intelligence.
What is data modeling?
As it applies to data analytics, data modeling is the process of imposing order on data from diverse sources. That means transforming tables from different schemas or even databases into a single view or table that describes a business system or process. Once you’ve modeled the data, it’s much easier to derive value from it, either in the form of reports or dashboards or as a basis for predictive or prescriptive analytics.
A dbt project is a directory of SQL and YAML files used to transform your data. The YAML file contains project configuration information. Each model contains a single SQL SELECT statement that transforms data in some way.
SQL has long been the language of choice for analysts developing transformations and performing analysis. Therefore, the decision for dbt to have a model file be a SQL SELECT statement means analysts don’t need to learn a new language or tool and can freely transform data the same way they have previously. Using SQL makes for easy collaboration and a better understanding of your transformations.
dbt packages are standalone dbt projects, which take advantage of the projects’ modularity to make modules that tackle a specific problem area available for future use, much like a code library. These packages contain pre-built SQL, drastically reducing the necessity to hand-code transformations. Fivetran develops and maintains open-source dbt packages known as data models that model data from single schema and multi-schema data sources. These Fivetran dbt packages leverage data modeling principles and code modularity to effectively transform data to be used for exploratory, predictive, or prescriptive analytics. Learn more about Fivetran data models and how to use them here.
The YAML files are not only used to configure your dbt project, but are also used to curate a data dictionary of your source and transformed data as well as apply data integrity tests. If used effectively, the final result of your dbt project will have a definition of each raw table from the source and transformed table from your dimensional schema, as well as a description of each field within those tables. Further, you can apply tests to certain fields (ie. a primary key field should always be unique) and test your data to ensure there are no data integrity issues.
Models are modular, so you can reproduce a transformation across multiple projects. Any developer can tell you how valuable code reuse is in terms of saving time and effort. A project may also include test, macro, and documentation files.
dbt has two main products. dbt Core™ is a free, open source tool that includes everything technical teams need to transform data. It provides you with the ability to roll your own ELT pipelines, SQL compilation logic, Jinja templating, database adapters, testing frameworks and documentation.
dbt Cloud™ is their paid product and is great for teams that want to move quickly in a highly coordinated way. It is fast and reliable, has a user-friendly IDE, allows for job scheduling, centralized logging and alerting, and provides CI/CD with Slim CI, among other things.
When assessing which dbt product is right for your organization, consider your team structure and skills. No matter your choice, dbt will help you modernize your approach to data transformations.
Is dbt an ETL?
Is dbt an ETL (extract, transform, load) tool? No, dbt is a data transformation tool. But dbt works with Fivetran, so you can use dbt as part of your end-to-end ELT processes.
In 2020, we launched our integration with dbt and our ever expanding library of available data models. Fivetran’s integration with dbt Core let's data teams orchestrate, manage, and monitor data transformations from the same tool they use to extract and load the data, at no extra charge. And, we enhance the modeling experience with fully integrated scheduling and data lineage graphs, providing teams with increased efficiency and automation.
But, to make the data transformation process even easier, we’ve recently launched Quickstart data models, which enable you to turn your connector data into analytics-ready tables in just a few clicks. No dbt project, 3rd party tools, or coding needed.If you’re interested in learning more about dbt, dbt Labs has tons of resources, including its own getting started documentation and links to example projects.
A promising future
Data analytics is a hot business segment — witness the buzz around companies such as Snowflake and Databricks. dbt Labs has been growing like crazy since it was founded in 2016 by colleagues who first worked together at a company called RJMetrics. In June of 2021, the company raised nearly $150 million in a Series C.
The company was a first mover among data transformation products, but it’s not the only one out there. A younger company called Dataform was a direct competitor until it was snapped up by Google Cloud in 2020. More recently, Coalesce.io entered the space with a drag-and-drop GUI approach to data transformations. While the space continues to evolve, dbt still remains a top choice of those looking to infuse data engineering best practices into their approachable data transformation processes.
You can try dbt on your own, but it’s more powerful as part of the complete data pipeline that Fivetran provides. Consider the case of Ritual, where the data team used dbt in conjunction with Fivetran to replace scattered Python scripts and LookML files. Ritual used the automated testing and data freshness capabilities of dbt to drive a 68 percent increase in new feature development.
Or SpotOn, who were able to turn thousands of lines of manual data modeling code into scale and generate their necessary client-facing reporting 5x faster. They now have the tools they need to meet their current needs and grow without investing in additional headcount.
Sign up for a free Fivetran trial, and see how easy it is to connect to more than 300 data sources, extract data, load it into a data warehouse or data lake, transform it with Fivetran Transformations and use it to enhance your business.