How ChatGPT helps us write better dbt™ data models

How our analytics engineering team uses ChatGPT to write the most efficient dbt packages for your most common analytics use cases.

We recently did a deep dive into how we turn our Fivetran connector loaded data into analytics-ready tables in your warehouse with our dbt data models. We are now revealing how we actually create those data models.

And the secret? While the data models take a lot of manual work and collaboration with you all - our customers and peers - some of that work has been scaled using ChatGPT. That’s right, the nascent AI tool that’s revolutionizing business and dominating the national conversation also helps us model data.

[CTA_MODULE]

Here are a few ways we’ve been able to use ChatGPT to expedite our dbt data model creation:

Documentation and finding the descriptions of columns

When developing new - and updating existing - data models we spend a large amount of time documenting the tables and fields from the raw data layer. This is a cumbersome, research-intensive process.

However, with the advent of AI we have been able to streamline this process by generating a significant amount of source data definitions (and even data integrity tests) by interacting with ChatGPT. We are able to feed data source schemas into ChatGPT directly and ask it to generate a dbt flavored yml that documents the defined tables and fields. ChatGPT is able to generate this yml by leveraging the source destinations API docs. 

A process that would take us significantly longer in the past is now automated to a few minutes. We reallocate time from researching and design to validating results. This allows us to spend more time working on the data transformations and understanding the raw data as opposed to meticulously combing through the API documentation.

Building python scripts to help automate maintenance efforts

AI is very helpful when you have a basic knowledge of a topic but lack expertise to complete specific tasks without a fair amount of research. AI can assist in that research and complete it in a fraction of the time. 

For example, I wanted to use the GitHub API to duplicate a GitHub issue into multiple repositories that all suffered from the same bugs; duplicating the GitHub issues would ensure that the recurring problems were flagged wherever they were present. I generally knew how APIs functioned, but had no experience with GitHub’s. I also had limited experience with creating API scripts.

This presented me with a roadblock. To tackle this use case, I would need to do research and improve my base skills. I would need to read the API documentation, learn about any necessary packages, learn syntax, troubleshoot and iterate my code until I found success.

Instead, I turned to ChatGPT. With AI, I was able to ask simple questions and end up with code I could start testing within a few minutes. I could also ask follow-up questions if I didn’t get expected results, got an error or didn’t understand an output. For example, when I received a “403” api error, ChatGPT was able to tell me what the error meant as well as suggest how to remedy it.

Aid when building complex sql transformations

Many folks may be familiar with the rubber duck debugging philosophy. This is a method where you have a rubber duck on your desk and you explain your code to the duck. While explaining the code you are able to hit points of realization where you need to adjust or update your code to be more accurate and performant. 

I have always been a firm believer in this method, but lately I have adopted a new ChatGPT equivalent which I call “robot duck debugging.” With this, I treat my sessions with ChatGPT as I would my very real rubber duck on my desk. I go through my code and explain it in detail and also highlight what I am trying to achieve with the result. 

The robot duck then provides insights and oftentimes highlights problematic areas of my code. For example, it once called out that one of my joins would likely result in a cartesian join (which it did) and I should consider a different approach to ensure data validity in my transformation. 

The robot duck even once told me that my approach was unfounded and I should consider some new approaches to achieve my desired result. I asked it then what kind of approaches I should take and after some back and forth I arrived at a desirable conclusion. One I had not considered before robot duck debugging. 

In examples like the two I mentioned, ChatGPT is extremely valuable to work through complex code and solutions. It is important to note that ChatGPT can sometimes jump the gun and suggest some solutions that are not practical. I have found correcting the robot duck and coaching it to stay on track to be most effective. I have been able to unlock some new approaches simply because I have had the robot duck to accompany me. 

But, I still talk to my very real rubber duck from time to time.

Understand business outcomes of specific data models

When building out new data models, our analytics team conducts interviews with our customers to understand the business analytics use case for their source data. Because of everyone’s busy schedules, we are only able to interview a few customers. 

Fortunately, ChatGPT is able to help us scale additional business knowledge. With a simple query, we can leverage the scope of the entire history of the internet to understand common pain points and questions others have experienced when leveraging a certain data source. 

For example, when building our latest Twilio dbt data model we were struggling to understand exactly what types of answers customers wanted to answer with the data. ChatGPT revealed that customers frequently want to analyze the breakdown of all messages ever sent to uncover trends and cost over time. Similarly, aggregating data by phone number and account is key for any Twilio analysis.

Help with cross database compatibility

We want to ensure that the data models we make are helpful for all organizations - regardless of the data warehouse you deploy. When building models, there are nuanced differences between the warehouses that dbt doesn’t have a function like dbt.type to automatically handle. Warehouses using JSON data have similar ways of processing the data, but the syntax varies between them. 

So, when modeling our models, we can ask ChatGPT, “How can I make the below code compatible with Databricks?” While it might not give the exact answer, it points us in the right direction.

To operationalize this, we fed our Fivetran Utils macro into ChatGPT and asked it to create a dispatch compatible with a given warehouse. For example, you can take any Fivetran Utils macro and copy the contents of the code and add it to your ChatGPT session and request it provide a new dispatch for a new adapter. 

Using AI to help you get the best analytics-ready tables

Generative AI like ChatGPT is set to become a fantastic productivity aide for anyone who creates content for a living – code, text, images and more. While ChatGPT cannot yet make data models for us from scratch, they certainly help us more efficiently and effectively release and update models for your most needed use cases. These models are free and available to everyone. You can download and deploy them from our library here.

And while AI is helpful, your feedback is invaluable. We would love to hear from you! What data models do you want to see us create? What PRs would improve our existing models? You can submit these ideas in GitHub or start a conversation with us in Fivetran’s Community.

[CTA_MODULE]

Kostenlos starten

Schließen auch Sie sich den Tausenden von Unternehmen an, die ihre Daten mithilfe von Fivetran zentralisieren und transformieren.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Data insights
Data insights

How ChatGPT helps us write better dbt™ data models

How ChatGPT helps us write better dbt™ data models

August 1, 2023
August 1, 2023
How ChatGPT helps us write better dbt™ data models
How our analytics engineering team uses ChatGPT to write the most efficient dbt packages for your most common analytics use cases.

We recently did a deep dive into how we turn our Fivetran connector loaded data into analytics-ready tables in your warehouse with our dbt data models. We are now revealing how we actually create those data models.

And the secret? While the data models take a lot of manual work and collaboration with you all - our customers and peers - some of that work has been scaled using ChatGPT. That’s right, the nascent AI tool that’s revolutionizing business and dominating the national conversation also helps us model data.

[CTA_MODULE]

Here are a few ways we’ve been able to use ChatGPT to expedite our dbt data model creation:

Documentation and finding the descriptions of columns

When developing new - and updating existing - data models we spend a large amount of time documenting the tables and fields from the raw data layer. This is a cumbersome, research-intensive process.

However, with the advent of AI we have been able to streamline this process by generating a significant amount of source data definitions (and even data integrity tests) by interacting with ChatGPT. We are able to feed data source schemas into ChatGPT directly and ask it to generate a dbt flavored yml that documents the defined tables and fields. ChatGPT is able to generate this yml by leveraging the source destinations API docs. 

A process that would take us significantly longer in the past is now automated to a few minutes. We reallocate time from researching and design to validating results. This allows us to spend more time working on the data transformations and understanding the raw data as opposed to meticulously combing through the API documentation.

Building python scripts to help automate maintenance efforts

AI is very helpful when you have a basic knowledge of a topic but lack expertise to complete specific tasks without a fair amount of research. AI can assist in that research and complete it in a fraction of the time. 

For example, I wanted to use the GitHub API to duplicate a GitHub issue into multiple repositories that all suffered from the same bugs; duplicating the GitHub issues would ensure that the recurring problems were flagged wherever they were present. I generally knew how APIs functioned, but had no experience with GitHub’s. I also had limited experience with creating API scripts.

This presented me with a roadblock. To tackle this use case, I would need to do research and improve my base skills. I would need to read the API documentation, learn about any necessary packages, learn syntax, troubleshoot and iterate my code until I found success.

Instead, I turned to ChatGPT. With AI, I was able to ask simple questions and end up with code I could start testing within a few minutes. I could also ask follow-up questions if I didn’t get expected results, got an error or didn’t understand an output. For example, when I received a “403” api error, ChatGPT was able to tell me what the error meant as well as suggest how to remedy it.

Aid when building complex sql transformations

Many folks may be familiar with the rubber duck debugging philosophy. This is a method where you have a rubber duck on your desk and you explain your code to the duck. While explaining the code you are able to hit points of realization where you need to adjust or update your code to be more accurate and performant. 

I have always been a firm believer in this method, but lately I have adopted a new ChatGPT equivalent which I call “robot duck debugging.” With this, I treat my sessions with ChatGPT as I would my very real rubber duck on my desk. I go through my code and explain it in detail and also highlight what I am trying to achieve with the result. 

The robot duck then provides insights and oftentimes highlights problematic areas of my code. For example, it once called out that one of my joins would likely result in a cartesian join (which it did) and I should consider a different approach to ensure data validity in my transformation. 

The robot duck even once told me that my approach was unfounded and I should consider some new approaches to achieve my desired result. I asked it then what kind of approaches I should take and after some back and forth I arrived at a desirable conclusion. One I had not considered before robot duck debugging. 

In examples like the two I mentioned, ChatGPT is extremely valuable to work through complex code and solutions. It is important to note that ChatGPT can sometimes jump the gun and suggest some solutions that are not practical. I have found correcting the robot duck and coaching it to stay on track to be most effective. I have been able to unlock some new approaches simply because I have had the robot duck to accompany me. 

But, I still talk to my very real rubber duck from time to time.

Understand business outcomes of specific data models

When building out new data models, our analytics team conducts interviews with our customers to understand the business analytics use case for their source data. Because of everyone’s busy schedules, we are only able to interview a few customers. 

Fortunately, ChatGPT is able to help us scale additional business knowledge. With a simple query, we can leverage the scope of the entire history of the internet to understand common pain points and questions others have experienced when leveraging a certain data source. 

For example, when building our latest Twilio dbt data model we were struggling to understand exactly what types of answers customers wanted to answer with the data. ChatGPT revealed that customers frequently want to analyze the breakdown of all messages ever sent to uncover trends and cost over time. Similarly, aggregating data by phone number and account is key for any Twilio analysis.

Help with cross database compatibility

We want to ensure that the data models we make are helpful for all organizations - regardless of the data warehouse you deploy. When building models, there are nuanced differences between the warehouses that dbt doesn’t have a function like dbt.type to automatically handle. Warehouses using JSON data have similar ways of processing the data, but the syntax varies between them. 

So, when modeling our models, we can ask ChatGPT, “How can I make the below code compatible with Databricks?” While it might not give the exact answer, it points us in the right direction.

To operationalize this, we fed our Fivetran Utils macro into ChatGPT and asked it to create a dispatch compatible with a given warehouse. For example, you can take any Fivetran Utils macro and copy the contents of the code and add it to your ChatGPT session and request it provide a new dispatch for a new adapter. 

Using AI to help you get the best analytics-ready tables

Generative AI like ChatGPT is set to become a fantastic productivity aide for anyone who creates content for a living – code, text, images and more. While ChatGPT cannot yet make data models for us from scratch, they certainly help us more efficiently and effectively release and update models for your most needed use cases. These models are free and available to everyone. You can download and deploy them from our library here.

And while AI is helpful, your feedback is invaluable. We would love to hear from you! What data models do you want to see us create? What PRs would improve our existing models? You can submit these ideas in GitHub or start a conversation with us in Fivetran’s Community.

[CTA_MODULE]

Data models are created through data transformation. Learn more with our free ebook!
Read me
Not familiar with Fivetran?
Start a 14 day free trial today

Verwandte Beiträge

Fivetran & dbt: The essential duo for modern analytics
Data insights

Fivetran & dbt: The essential duo for modern analytics

Beitrag lesen
How we execute dbt™ runs faster and cheaper
Data insights

How we execute dbt™ runs faster and cheaper

Beitrag lesen
Behind the data model: How we transform your data
Data insights

Behind the data model: How we transform your data

Beitrag lesen
No items found.
How Fivetran ensures GDPR compliance and protects your data
Blog

How Fivetran ensures GDPR compliance and protects your data

Beitrag lesen
Data replication tools: Comparing Fivetran’s deployment options
Blog

Data replication tools: Comparing Fivetran’s deployment options

Beitrag lesen
Fivetran Product Update: September 2024
Blog

Fivetran Product Update: September 2024

Beitrag lesen

Kostenlos starten

Schließen auch Sie sich den Tausenden von Unternehmen an, die ihre Daten mithilfe von Fivetran zentralisieren und transformieren.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.