Every modern data stack needs a modern approach to data transformation. In fact, this is often the most vexing part for companies. Forty-four percent of companies in our 2021 State of Data Engineers Survey agreed. They miss out on insights because key data isn't in the right format. As the responsibility of modeling shifts onto data analysts and analytics engineers, teams are looking for scale. So, how do you optimize your modern data practices to more efficiently and effectively turn your data into insights?
It starts with a modern data stack. In order to complete the “T” in ELT, you need data ingestion, data storage and data transformation tools. Let’s dive into a possible stack and see how you can write faster data models with Fivetran’s Wizard for dbt Core™*.
Fivetran: Automated, modern data ingestion
You first need access to your data. Fivetran, Google Cloud’s 2021 Global Technology Partner of the Year, gives you easy, reliable access to pre-built, fully managed connectors, using change data capture (CDC) technology. Fivetran connects your data sources and moves that data into your cloud destinations or targets. To modernize this approach, we also bake in as much automation, scale and security as possible along the way to allow you to truly let go of all of the work associated with building and managing data pipelines.
We then load (and subsequently scan for incremental updates using system columns for applications or transaction logs for database sources) that data into your data warehouse. One leader in the modern, cloud data warehouse is Google Cloud's BigQuery.
BigQuery: A fully managed, modern data warehouse
BigQuery is Google Cloud’s fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery lets you use SQL to query terabytes in seconds and petabytes in minutes.
It’s an ideal platform for processing data transformations as well as complex reporting requirements. Once Fivetran has loaded data into BigQuery, you can use a modern data transformation tool, like dbt (data build tool), to clean and transform your data. This allows you to prepare it for BigQuery’s advanced analytics features like Geospatial Analysis or Predictive Analytics.
dbt: Modern data transformation, at scale
Most data teams agree that SQL is the de facto language of data modeling. But SQL data modeling leaves room for improvement. Code is often written in silos by individuals, without version control or collaboration, making it difficult to scale or QA.
Many data teams have adopted dbt to accomplish data modeling at scale. dbt is based on SQL and allows data professionals to build data models iteratively and automate data transformation. dbt is built for scale and includes engineering best practices like:
- Reusable macros accelerate time to complete data models
- Version control ensures you have an auditable system of record and the ability to revert undesired changes
- Auto-documentation increases visibility so stakeholders can understand the available data
dbt provides two main ways to use their technology: dbt Core and dbt Cloud. Both are built with analytics engineering best practices. dbt Core is accessed and run via your command line interface (CLI), whereas dbt Cloud is a hosted, browser-based integrated development environment. For those without the technical bandwidth to manage their own infrastructure, dbt Cloud provides the development, scheduling and deployment tools that enable the entire data organization to be part of the modeling process.
With all of these technologies in place, you can use your data to drive business impact. But, we recognized that there was still room for optimization for those working in the command line with dbt. That’s why we built the Wizard for dbt Core™.
Fivetran’s Wizard for dbt Core™ extension for Visual Studio Code
To help facilitate the data modeling process and to ensure you can take full advantage of the data made accessible by Fivetran, we created the free Wizard for dbt Core™ extension for Visual Studio Code. This extension helps analysts write more efficient models in the CLI for your data in BigQuery. We will be adding compatibility for more data warehouses in the near future. Available in the VSCode marketplace, the Wizard for dbt Core™ helps you optimize your modeling by providing: auto-completion of macro, model and source references; error highlighting and suggestions; SQL to ref conversion; and installation of dbt packages with the click of a button.
Auto-completion of macros, models and sources
The Wizard recognizes the macros, models and sources in your project, allowing you to more quickly complete your model. It also helps you with function signatures. If you hover over a SQL function, you can see the definition and required parameters. This ensures you write the most accurate and efficient code.
If you do happen to misspell a column or table name, the Wizard will not only highlight it but suggest fixes. This allows you to quickly remedy issues — reducing QA and accelerating time to complete. SQL syntax errors are also highlighted for your review.
SQL to ref
If a hardcoded table name can be converted to a dbt ref, the Wizard will identify that SQL code and convert it with the click of a button. This streamlines your code, reducing room for error while future-proofing. If you do need to make a change, you only have to change the ref model rather than hunting down instances of the SQL code in your model.
Installation of dbt packages
And these are just four of the advantages of using the Wizard for dbt Core™. You can read more about the features and download the extension here.
A modern data stack: Fivetran + dbt + BigQuery
By combining the always-on access to data of Fivetran, with the scalable modeling of dbt, the ease of use of the Wizard for dbt Core™, and the processing power of BigQuery, you can spend more time on data-driven decisions and less time on data pipeline processes with this modern approach to data.