How Sigma Computing built its best-in-class modern data stack

Turns out if you’re working hard just to keep the data flowing, you won’t get much else done. Fivetran, dbt™ and Sigma change that.
February 28, 2023

My philosophy about data product development is simple: work hard and your work will pay off. If you use tools that help with the hard work, your work will pay off even more. I believe that Fivetran’s work to abstract away the pain of data pipelines, dbt’s transformative work to enable robust and trusted data models, and Sigma’s collaborative approach to analytics together have the ability to finally get data teams and stakeholders alike into the same state: happy and successful. But it wasn’t always this way.

Hear from Sigma Computing at this year's Modern Data Stack Conference on April 4-5, 2023. Register with the discount code MDSCON-SIGMABLOG to get 20 percent off by March 15, 2023!

The challenge of using legacy tools

My first foray into the world of analytics was about ten years ago as a data analyst using legacy tools like Microsoft Access and Excel. I transitioned to data engineering at a Fortune 50 company and first experienced the monolithic legacy stack common at large enterprises, creating data products using Teradata and Informatica. The challenges with those tools are well known. Business-as-usual went something like this:

  • The enterprise data warehouse was offline from 10am-2pm most days to allow batch data loading jobs time to complete.
  • Most data was, at best, one day old and, more commonly, two days old.
  • Data product development took several months from ideation to implementation.
  • My day-to-day was rather mundane: waiting for data to load and writing custom SQL to help make my team successful.
  • Code changes required saving .txt files and sending them off for various rounds of review (leading to the long path to production mentioned above).

Fast forward to 2018, and the ways of working fundamentally changed. I was fortunate to join a startup that was using Snowflake and Fivetran and was in the early stages of onboarding a new transformation tool called dbt. With Fivetran, our core data engineering team was able to focus on specialized pipelines, since core data from a variety of sources flowed through without a hitch at whatever time interval we specified. This greatly improved our team’s throughput. dbt further supercharged our workflow, allowing our analytics team to ideate, collaborate and build trusted data products in a way that wasn’t possible before.

Snowflake’s powerful Data Cloud gave us essentially limitless storage and compute without having to manage infrastructure or resources. By leveraging managed pipelines in Fivetran and version-controlled transformations in dbt, we were able to do in just a few hours what once took months.

We were able to reallocate hundreds of hours from pipeline build and maintenance to higher-impact projects, like rebuilding our customer-facing data products, refactoring existing high-priority data models for the entire company and reducing our overall Snowflake compute footprint by 30 percent.

Turns out if you’re working hard just to keep the data flowing, you won’t get much else done. Fivetran and dbt changed that.

Build vs. buy for the modern data stack

In 2022, I joined Sigma Computing and was given the opportunity to design and implement a data platform on a modern data stack. Like many data architects, I first faced the biggest decision: build or buy.

As a one-person team, my decision matrix was relatively straightforward.

| Layer | Build | Buy |
| Compute and storage | Provisioning cloud compute infrastructure on a data lake | Snowflake-managed storage and compute |
| Extract and load | Custom infrastructure to support 30+ data sources | Fivetran, with about five hours of total configuration |
| Transformation | Build custom infra using dbt Core | Build and orchestrate in dbt Cloud |
| Observability | Build custom scripts to run anomaly detection | Use Metaplane for its rich, automated anomaly detection |
| Visualization | | Sigma |

The cost-benefit analysis here is straightforward: spend time building and maintaining the infrastructure, or plug into best-in-class tools within a few hours. With these tools in our toolkit, we are able to focus on what matters: building data products for the company and enabling everyone to use Sigma effectively.

The data team’s key directive is to build data models and workbooks that give our organization and customers access to fast and reliable reporting. The modern data stack we chose made it all possible.

Beyond the modern data stack

It’s easy to say that the future of the MDS is a new tool that does something unique, but I have a different take. Up to this point, the hurdles have been:

  1. There are no resources to build and maintain analytics infrastructure.
  2. The time to go from data model → insightful workbook is longer than a stakeholder would prefer.

While the modern data stack has improved the data team’s efficiency, there is still room for evolution, especially in time to insight for the stakeholder. Sure, the time it takes to ingest raw data into your warehouse has been shortened, and for many sources this is a solved problem. But there is still no easy way to spin up reporting on top of this data.

That’s why I’m so excited about Fivetran’s new Quickstart data models. At Sigma, we’ve been using Fivetran for our data ingestion needs and monitoring the health, status and cost of these connectors. Fivetran has always made it easy to view the metadata about its connectors in the Fivetran Log Connector schema, and we relied heavily on custom dbt models to make the metrics and information contained therein report-ready. With this new Quickstart data model, though, we were able to set up a data model built for the Fivetran Log Connector data in just a few clicks.

This model turns the metrics and dimensions about usage and compute that we need into analytics-ready tables, allowing us to quickly understand the Fivetran component of our data platform. The metrics and transform logic are validated by Fivetran and guaranteed to be updated when changes to the underlying data structures occur.

The Quickstart data model gives us a way to ingest, land and model data with less overhead than ever before. Within an hour of activating the feature, we had a fully fledged Sigma workbook showing the health and usage of all of our Fivetran connectors, with automated alerts to monitor for unexpected events.
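To make the idea of alerting on unexpected connector events concrete, here is a minimal sketch in Python. It is not how Sigma’s alerts or Fivetran’s Quickstart data model work internally; the function name, the threshold, the connector names and the daily row counts are all hypothetical and chosen only for illustration. It flags a connector when its latest daily volume sits far from that connector’s own recent history.

```python
from statistics import mean, stdev


def flag_unexpected_usage(daily_rows, threshold=3.0, min_history=3):
    """Return {connector: z_score} for connectors whose latest daily
    volume deviates strongly from their own history.

    daily_rows maps a connector name to a list of daily row counts,
    oldest first. All names and numbers here are hypothetical.
    """
    flagged = {}
    for connector, values in daily_rows.items():
        # Need enough past days to estimate a baseline.
        if len(values) < min_history + 1:
            continue
        history, latest = values[:-1], values[-1]
        baseline = mean(history)
        spread = stdev(history)
        if spread == 0:
            # Perfectly flat history: any change at all is unexpected.
            if latest != baseline:
                flagged[connector] = float("inf")
            continue
        z_score = (latest - baseline) / spread
        if abs(z_score) > threshold:
            flagged[connector] = round(z_score, 2)
    return flagged


# Illustrative data only, not real connector metrics.
usage = {
    "salesforce": [1000, 1050, 980, 1020, 1010],
    "zendesk": [200, 210, 190, 205, 900],
}
alerts = flag_unexpected_usage(usage)
print(sorted(alerts))
```

In this made-up data, only the connector with the sudden spike is flagged. A managed observability or BI tool replaces this kind of hand-written check, which is the point of the buy column in the decision matrix above.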

Our team was so thrilled with the insights we’d gained that we decided to bring this functionality to every Sigma user with a new Fivetran Usage Template. With this template, sharing knowledge of how to model and visualize Fivetran Log Connector data is easy. By leveraging Fivetran’s Quickstart data model, any organization can go from no data to complete stakeholder insights in just a matter of clicks, within an hour or less. That’s the fast, reliable reporting every team needs. This automation of end-to-end data product development really excites me, and I think we’ll continue to see tighter integrations across modern data stack tools in the coming years. These integrations will undoubtedly create a better experience for those who matter most: our stakeholders.

The biggest opportunity is getting stakeholders closer to data and creating a better understanding of what is available and represented in the data warehouse. Of course, I strongly believe that Sigma is best positioned to solve this, and I’m excited about the work we have done and what lies ahead on our roadmap.

There are, of course, challenges ahead, but would it be any fun if there weren’t?

Start your free trial with Sigma Computing today.
