Introducing the Fivetran dbt package for GitHub

Gain insight into the state of code issues and the complete software delivery process.
May 27, 2020

For an overview of how dbt powers advanced transformations, and information about our other dbt packages, take a look at this recent blog.

dbt package for GitHub

Our dbt package for GitHub helps you track the state of issues, pull requests, and their related assignments so that you can increase the velocity of codebase updates. The package builds on the Fivetran GitHub connector, which ingests the data passed through the GitHub API, and joins the resulting disparate tables to provide:

  • GitHub issues enriched with their assignees and time to completion
  • Time metrics for pull requests that track the life cycle from creation to review to merge
  • A weekly, monthly, and quarterly overview of your opened and closed issues and pull requests

The package’s outputs can help organizations address common engineering challenges, such as:

  • Identifying a disproportionate issue-to-assignee ratio
  • Establishing a potential “cliff” timeline after which a pull request is likely to fall through the cracks
  • Tracking pull request completion at a high level by week, month, and quarter
  • Determining the average time spent in each stage of a pull request to forecast an issue completion timeline
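
To make the last item concrete, here is a minimal Python sketch of averaging the time spent in each pull request stage. It is not the package's implementation (the package does this work in dbt models); the record layout and the field names `created_at`, `first_review_at`, and `merged_at` are assumptions chosen for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical pull request records; the keys are illustrative assumptions,
# not the actual Fivetran GitHub schema.
pull_requests = [
    {
        "created_at": datetime(2020, 5, 1, 9),
        "first_review_at": datetime(2020, 5, 1, 15),
        "merged_at": datetime(2020, 5, 2, 9),
    },
    {
        "created_at": datetime(2020, 5, 4, 9),
        "first_review_at": datetime(2020, 5, 4, 11),
        "merged_at": datetime(2020, 5, 4, 17),
    },
]


def hours_between(start, end):
    """Return the elapsed time between two datetimes, in hours."""
    return (end - start).total_seconds() / 3600


def average_stage_hours(prs):
    """Average the hours spent in each stage across all pull requests."""
    return {
        "creation_to_review": mean(
            hours_between(pr["created_at"], pr["first_review_at"]) for pr in prs
        ),
        "review_to_merge": mean(
            hours_between(pr["first_review_at"], pr["merged_at"]) for pr in prs
        ),
    }


stage_hours = average_stage_hours(pull_requests)
```

Averages like these give a rough baseline: a pull request that has waited much longer than the typical stage duration is a candidate for the “cliff” described above.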

Challenges of the GitHub API

The GitHub API splits contextual information about issues and pull requests, such as assignees and history, across various endpoints. This makes it easier to target an API request at exactly the information you’re looking for, but harder to join the data for analytics.
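
As a rough illustration of that join problem, the sketch below combines issue records with assignee records that arrive separately. The data and the field names (`issue_id`, `title`, `login`) are hypothetical and do not reflect the real API payloads or the Fivetran schema.

```python
from collections import defaultdict

# Hypothetical records shaped like the results of two separate requests;
# the field names are illustrative assumptions.
issues = [
    {"issue_id": 1, "title": "Fix login bug"},
    {"issue_id": 2, "title": "Update docs"},
]
issue_assignees = [
    {"issue_id": 1, "login": "alice"},
    {"issue_id": 1, "login": "bob"},
]


def enrich_issues(issues, issue_assignees):
    """Left-join assignees onto issues, keeping issues with no assignee."""
    assignees_by_issue = defaultdict(list)
    for row in issue_assignees:
        assignees_by_issue[row["issue_id"]].append(row["login"])
    return [
        {**issue, "assignees": assignees_by_issue.get(issue["issue_id"], [])}
        for issue in issues
    ]


enriched = enrich_issues(issues, issue_assignees)
```

Every additional piece of context (history, reviewers, milestones) needs another join of this kind, which is the work the connector and the dbt package take on.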

How Fivetran helps

Our native GitHub connector automatically brings in data about issues, pull requests, and their corresponding contextual information in a pre-defined format (see Fivetran’s documentation for the GitHub schema), so you can start querying your data right away. By replicating your GitHub data into your centralized data warehouse at a frequency you choose, and by using the provided dbt package, you can better track and optimize your development team’s efficiency. Use GitHub as a standalone source, or combine this data with common project tracking software, such as Jira or Asana, to give your organization insight into the complete software delivery process.

Next steps

Get the dbt package for GitHub. It handles the advanced modeling: data transformations, dependencies, and target table creation. The primary output models of the package are described below; intermediate models are used to build them:

  • github_issues: Each record represents a GitHub issue, enriched with data about its assignees, milestones, and time comparisons
  • github_pull_requests: Each record represents a GitHub pull request, enriched with data about its repository, reviewers, and durations between review requests, merges, and reviews
  • github metrics: Each record represents enriched metrics about pull requests and issues that were created and closed during a day, week, month, or quarter
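
For a sense of what a period-level metrics record might contain, here is a small Python sketch that counts issues opened and closed per week. It only illustrates the idea and is not how the package builds its models; the field names `created_on` and `closed_on` and the output keys are assumptions.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical issue records; the field names are illustrative assumptions.
issues = [
    {"created_on": date(2020, 5, 4), "closed_on": date(2020, 5, 6)},
    {"created_on": date(2020, 5, 5), "closed_on": date(2020, 5, 12)},
    {"created_on": date(2020, 5, 13), "closed_on": None},  # still open
]


def week_start(day):
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_metrics(issues):
    """Count issues opened and closed in each week, one row per week."""
    opened = Counter(week_start(i["created_on"]) for i in issues)
    closed = Counter(week_start(i["closed_on"]) for i in issues if i["closed_on"])
    weeks = sorted(set(opened) | set(closed))
    return [
        {"week": w, "issues_opened": opened[w], "issues_closed": closed[w]}
        for w in weeks
    ]


weekly = weekly_metrics(issues)
```

Swapping `week_start` for a function that truncates to the day, month, or quarter would give the other period grains in the same way.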

Note that this dbt package depends on the dbt source package for GitHub, which is downloaded automatically when you download the dbt package for GitHub. The source package lightly cleanses the data, defines tables and columns, and tests your source data.


