Terraform and the modern data stack as code

Create and maintain dev, test and prod modern data stack deployments with Fivetran’s Terraform provider.
January 26, 2022

The modern data stack is a suite of tools and processes used to integrate data into a data warehouse for analysis. Recent developments and technologies have made it easier than ever to get started building one. However, this ease of use also makes it easier for a data warehouse to change in unchecked and unintended ways. Terraform by HashiCorp is an open-source project that provides infrastructure as code and can be used to implement change management for the modern data stack. Used with Fivetran, Terraform can separate a modern data stack into different deployment environments, ensuring that changes or additions at the data source level are verified before they are published to a production data warehouse. Let’s look at the different ways to manage a modern data stack as code and segment it into dev/test/prod environments with Terraform.

Change management with dev/test/prod environments

In software development, it is common to maintain multiple deployments of the same technology to facilitate change management, with three environments (development, testing and production) being the standard. With this setup, new features and functionality are created first in a development environment, or dev. These changes are then tested in a separate environment, test. Once sufficiently tested, changes can move to an externally facing production environment, prod. Multicloud, container and infrastructure-as-code technologies have extended this three-environment standard further: many organizations now maintain multiple instances of their dev, test and prod environments, possibly across clouds and geographies. This furthers the need for a tool like Terraform to quickly and easily create and manage the cloud resources among them all.

The same process can be applied to the modern data stack, a concept called data as code. As a data warehouse matures, it can become a very dynamic environment, with source schemas and their resulting transformations changing and new data sources being brought in all the time. By managing data and data pipelines as code, an organization can gain more visibility into these changes, track their lineage, reproduce them and verify their intended outcomes before they reach production.

Implementation options

There are two ways to separate modern data stack environments, depending on which data abstraction layer is used to separate the deployments. When separation is performed at the logical level, the dev, test and prod environments are all stored within the same data warehouse. All that is needed to extend a single modern data stack into multiple ones at this level of abstraction is Terraform. Terraform and the Fivetran Terraform provider can be used to duplicate Fivetran connectors and differentiate their schema names and user permissions within a single data warehouse depending on the environment. Terraform then maintains a single destination with multiple versions of the same connector in separate logical deployments, each with its own Terraform state. If you would like to separate your Fivetran environments as well, Terraform can create a different Fivetran group for each deployment, and the connector names within each group can be the same (or different).

Separation can also happen at the physical level, with a different data warehouse used for each deployment. This also requires only Terraform and the Fivetran Terraform provider, but multiple data warehouses are needed. Terraform can create these data warehouses and add each of them as a destination in Fivetran, giving the option to maintain multiple modern data stacks across clouds or cloud projects. Once each destination is created and associated with Fivetran, Terraform can add and manage a single version of each data source’s connector for each data warehouse. Right now we are seeing most users opt for logical separation and keep every modern data stack in the same warehouse, but either option can be implemented with Terraform in three main ways: within a module, with workspaces, or with repositories.
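As a rough illustration of physical separation, the sketch below creates one Fivetran group and one destination per environment, each pointing at a different warehouse. It is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the warehouses already exist, the variable names and the warehouse_hosts map are hypothetical, and the service, region and config fields shown are examples for a Postgres-style destination. The exact arguments of fivetran_destination depend on the destination service and the provider version, so check the provider documentation before relying on them.

```hcl
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
  }
}

# Fivetran API credentials, passed in as variables (names are illustrative).
variable "fivetran_api_key" {
  type      = string
  sensitive = true
}

variable "fivetran_api_secret" {
  type      = string
  sensitive = true
}

provider "fivetran" {
  api_key    = var.fivetran_api_key
  api_secret = var.fivetran_api_secret
}

# Hypothetical map of environment name => host of that environment's warehouse.
variable "warehouse_hosts" {
  type = map(string)
  default = {
    dev  = "dev-warehouse.example.com"
    test = "test-warehouse.example.com"
    prod = "prod-warehouse.example.com"
  }
}

variable "warehouse_password" {
  type      = string
  sensitive = true
}

# One Fivetran group per environment.
resource "fivetran_group" "env" {
  for_each = var.warehouse_hosts
  name     = "${each.key}_stack"
}

# One destination per group, each backed by a different warehouse.
resource "fivetran_destination" "env" {
  for_each = var.warehouse_hosts

  group_id         = fivetran_group.env[each.key].id
  service          = "postgres_rds_warehouse" # assumed service; swap in your warehouse type
  region           = "GCP_US_EAST4"           # assumed region
  time_zone_offset = "0"
  run_setup_tests  = true

  # Config fields are service-specific; these are assumed for a Postgres-style destination.
  config {
    host            = each.value
    port            = 5432
    user            = "fivetran"
    password        = var.warehouse_password
    database        = "analytics"
    connection_type = "Directly"
  }
}
```

Connectors for each environment would then reference the matching group ID, so the same connector definition lands in a different warehouse per environment.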

Differentiation within module

The most straightforward way to separate modern data stacks is to create all the necessary resources for each one in a single module. The deployments can then be separated explicitly by their resource and schema names. This is the fastest way to get started, but it does not take advantage of Terraform’s modular architecture and its benefits as modern data stacks grow. For that, either workspaces or repositories are needed.
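A single-module layout might look something like the sketch below, where every environment's connector lives in the same configuration and is told apart only by its schema name. This is a minimal sketch, not a definitive implementation: the variable fivetran_group_id, the environment list and the choice of the amplitude service are assumptions for illustration, and some provider versions expect further arguments on the connector (for example scheduling settings or a service-specific config block).

```hcl
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
  }
}

# Assumes the fivetran provider is configured elsewhere with its API key and secret.

# ID of an existing Fivetran group whose destination is the shared warehouse (illustrative).
variable "fivetran_group_id" {
  type = string
}

locals {
  # Environments kept side by side in this one module.
  environments = toset(["dev", "test", "prod"])
}

# One connector per environment, distinguished by its destination schema name.
resource "fivetran_connector" "amplitude" {
  for_each = local.environments

  group_id = var.fivetran_group_id
  service  = "amplitude" # example service; any supported connector type could be used

  destination_schema {
    name = "${each.key}_amplitude" # dev_amplitude, test_amplitude, prod_amplitude
  }

  # Service-specific config block (credentials and so on) omitted from this sketch.
}
```

Writing three explicit resource blocks instead of using for_each works equally well. Either way, all environments share one Terraform state, which is the main limitation the next two approaches address.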

Differentiation with workspaces

With Terraform, infrastructure resources can be grouped together as workspaces that operate independently of each other. Each Terraform project starts with a single workspace, called “default,” that is used to manage infrastructure resources. As the number of resources Terraform is responsible for continues to grow, new workspaces can be created to better organize resources and limit the need to rewrite code. The same code can be used to duplicate infrastructure resources into different workspaces, with Terraform maintaining every workspace’s state individually. 

Multiple modern data stacks can be created by leveraging Terraform’s workspaces. Workspaces can be created with the command: terraform workspace new <workspace-name> and selected with the command: terraform workspace select <workspace-name>.

Terraform can then create resources that are unique to each workspace but are duplicates across workspaces by referencing terraform.workspace in the creation of these resources as shown in the image below.

A Terraform module to create a Fivetran connector (left) that will be named according to the workspace it is created in. The creation of a new workspace (right) and a plan to add a new connector with a dynamically named schema.
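A configuration along the lines of the one described in the caption above might look like the following. It is a minimal sketch rather than the exact module from the image: the variable fivetran_group_id and the amplitude service are assumptions for illustration, and depending on the connector type and provider version the schema block may need different or additional arguments.

```hcl
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
  }
}

# Assumes the fivetran provider is configured elsewhere with its API key and secret.

# ID of the Fivetran group to create the connector in (illustrative).
variable "fivetran_group_id" {
  type = string
}

locals {
  # terraform.workspace resolves to the name of the currently selected workspace,
  # so this becomes "dev_amplitude" in a workspace named "dev".
  schema_name = "${terraform.workspace}_amplitude"
}

resource "fivetran_connector" "amplitude" {
  group_id = var.fivetran_group_id
  service  = "amplitude" # example service

  destination_schema {
    name = local.schema_name
  }

  # Service-specific config block omitted from this sketch.
}

# Expose the computed schema name so it is visible in the plan output.
output "schema_name" {
  value = local.schema_name
}
```

After running terraform workspace new dev, a terraform plan against this configuration would propose a connector whose schema name starts with dev. Selecting another workspace and planning again proposes a separate connector tracked in that workspace's own state.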

This can be used to create Fivetran connectors (or groups of Fivetran connectors) for each deployment that send data to different schemas and/or warehouses. Fivetran will track the state of each connector in these environments, and Terraform will track the state of each Fivetran resource. Workspaces can be used with both versions of Terraform, Terraform open source and Terraform Cloud. The difference is that workspaces in Terraform Cloud also provide additional governance by allowing separate individuals or teams to access a particular workspace (read more in The Recommended Terraform Workspace Structure and on the differences between these two types of workspaces).

This setup can be used to both logically and physically separate modern data stacks. It is an easy-to-manage option because there is only one repository of code to maintain, but because every environment shares that code, it may be difficult to introduce differences between the environments.

Differentiation with repositories

Environments can be made more flexible if they are broken out into separate repositories or maintained in different sub-repositories. Each environment can work off of a default repo and be extended to include the specific resources, variables and configurations needed for a particular deployment. For a modern data stack, this would include the creation of duplicate Fivetran connectors for each deployment, with their names changed between environments and/or created in separate groups. These changes among repos and their deployments can be managed automatically with a CI/CD tool like GitHub Actions. This type of implementation works well for separating modern data stack deployments at the physical level mentioned earlier, where a production data warehouse may be on a different cloud, project or VPC network than a development environment. It can also work for logically separated environments that reference the same data warehouse.

Because each environment operates out of its own repository, Terraform maintains each deployment’s state independently.
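One way to picture the repository approach is a small root configuration per environment that pulls shared resources from the default repo as a module. The sketch below shows two files that would live in different repositories. It is a sketch under stated assumptions only: the repository names, the Git URL, the tag and the module's variable names are all hypothetical, and the connector arguments may differ by connector type and provider version.

```hcl
# ---------------------------------------------------------------
# File 1: main.tf in the shared default repo (hypothetical name:
# data-stack-base). It defines the connectors every environment gets.
# ---------------------------------------------------------------
terraform {
  required_providers {
    fivetran = {
      source = "fivetran/fivetran"
    }
  }
}

variable "environment" {
  type = string
}

variable "fivetran_group_id" {
  type = string
}

resource "fivetran_connector" "amplitude" {
  group_id = var.fivetran_group_id
  service  = "amplitude" # example service

  destination_schema {
    name = "${var.environment}_amplitude"
  }

  # Service-specific config block omitted from this sketch.
}

# ---------------------------------------------------------------
# File 2: main.tf in an environment repo (hypothetical name:
# data-stack-prod). It pins the base module and sets prod values.
# Provider credentials are assumed to be configured in this repo.
# ---------------------------------------------------------------
variable "prod_fivetran_group_id" {
  type = string
}

module "data_stack" {
  # Hypothetical URL of the default repo, pinned to a hypothetical tag.
  source = "git::https://example.com/your-org/data-stack-base.git?ref=v1.0.0"

  environment       = "prod"
  fivetran_group_id = var.prod_fivetran_group_id
}

# Prod-only resources, variables and settings would be added below.
```

Pinning each environment repo to a tag of the default repo lets dev move to a new version first, with test and prod following once the change has been verified.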

Implementing with the Fivetran Terraform provider

There are a number of other ways to separate modern data stacks into different environments and just as many reasons to do so. For instance, these same concepts can be applied to build and manage a distinct modern data stack per client for organizations that manage multiple clients. Regardless of how and why a modern data stack needs to be split up, doing so with Terraform allows these changes to happen as code, where scale can be achieved effectively and managed efficiently.

The HashiCorp Terraform verified provider for Fivetran can create the Fivetran connectors, destinations, users and groups used to build any number of modern data stacks, which can then be organized into separate environments with Terraform. There are a number of different ways to do so, and code examples for each approach discussed in this post can be found here.

Check out the Terraform Registry to create your modern data stack deployments with Fivetran’s Terraform provider.  
