“Reverse ETL is the process of copying data from a cloud data warehouse to operational systems of record, including but not limited to SaaS tools used for growth, marketing, sales and support.”
With the rise of the modern data stack, it’s become easier than ever for companies to build a single source of truth (SSOT) in the data warehouse.
- Data warehouses like Snowflake and BigQuery enable companies to store and query petabytes of data without thinking about complex infrastructure.
- Data integration solutions like Fivetran enable companies to replicate all their data into a warehouse without dealing with third-party APIs.
- Data transformation solutions like dbt enable companies to build clean, organized models in the data warehouse with just SQL.
However, it’s historically been really challenging to get data out of the warehouse and into the operational tools used by business teams, like Salesforce, Iterable, Marketo, and Facebook Ads. Ultimately, business teams are more apt to use these tools to solve their day-to-day problems than to look at reports in BI tools.
At Hightouch and Fivetran, we’ve seen hundreds of companies hit a barrier where their most important data is stuck in their warehouse. In this post, you’ll learn about Reverse ETL, a new solution to the “last mile” problem in analytics: making data actionable.
Reverse ETL is the inverse of ETL/ELT solutions like Fivetran. Instead of moving data into the data warehouse, Reverse ETL moves data out of the warehouse into the operational tools that your business teams rely on.
ETL: Building a source of truth
The core premise for investing in ETL has long been analytics — extracting, transforming, and loading data into a warehouse to analyze it, build reports, and make business decisions.
ELT, ETL’s successor, is the modern approach, but the premise is still analytics: data still moves from the same sources to the same destination in the same direction.
If you’re looking for a comprehensive overview of various data integration technologies, including ETL, ELT, and Reverse ETL, this guide we recently published is just what you need to read.
The Circle of Data Integration
Reverse ETL: Operationalizing your source of truth
Reverse ETL completes the loop of data integration by copying data from the warehouse into systems of record to enable teams to finally act on the same data that has been powering all the beautiful reports they have been consuming.
But why make data actionable, and why does doing so require moving data back from the warehouse into the systems where it came from in the first place?
Why: Delighting customers
The core reason to make your data actionable is to build delightful, data-driven customer experiences.
It’s one thing to analyze data, derive insights into customer behaviour, and make data-informed decisions, like which features to kill or which channels to prioritize.
But it’s a different matter altogether to let data power customer experiences across every touchpoint. It’s an approach that requires accurate data to be available in downstream systems used by various teams to engage with customers. This method now has a widely accepted name: operational analytics.
Operational analytics refers to feeding insights from analytics to business teams in their usual workflow so they can make more data-informed decisions. What’s really interesting is that traditional analytics and the new kid on the block, operational analytics, rely on the same core data infrastructure.
The same data from the data warehouse that powers the reports in a BI tool can be operationalized or made actionable by syncing to downstream SaaS tools used by sales, marketing, and customer success teams.
Why now: Increasing expectations
The exchange between a buyer and a seller is probably as old as time. But never in history has the buyer had so many options and such high expectations from businesses.
At the same time, high-performing business teams today are also increasingly demanding better tools and resources to enable them to provide delightful customer experiences.
Gone are the days of blasting a hundred thousand emails with the exact same copy and raking it in. Today’s savvy buyer — whether buying a cup of coffee worth a few dollars or buying enterprise software worth thousands — expects a fluid, personalized experience. And data, particularly customer data, is what makes personalized experiences possible. With the increasing amount of customer data being collected and the decreasing costs of storing data in cloud data warehouses such as Snowflake, BigQuery, and Redshift, businesses already have what it takes to meet the expectations of demanding customers.
All a business needs to do is to take data models from the warehouse that power analysis in a BI tool and sync them to downstream SaaS tools used for growth, marketing, sales, and support. Once enriched customer data is made available in these tools, teams will have no excuse to continue flat, linear customer experiences, because they will have the data they require to do better in their preferred engagement tools.
Business teams are most able to provide better customer experiences when the required data is made available in the tools they rely on, whether that’s a CRM, an advertising platform, an ERP, or even a workplace messenger like Slack. Moreover, since data is constantly changing and becoming stale, data requests (more like “data demands”) keep coming in.
Here are a few common examples of everyday data requests from various teams:
- Growth wants email interaction data from Customer.io in Mixpanel
- Sales wants to see credit consumption on contacts and accounts in Salesforce
- Customer Success wants product-usage data inside Gainsight
- Support wants to see data about accounts with premium support on Zendesk
- Product wants a Slack feed of customers who have seemingly become inactive
- Accounting wants customers’ attributes to be synced to NetSuite
- Finance wants a CSV of rolled up transaction data to use in Excel or Google Sheets
I could go on.
The data needed is already available in the data warehouse, and SQL, the Swiss Army knife of data manipulation, is all you need to extract it and sync it to external tools via Reverse ETL. That makes Reverse ETL the simplest way to fulfill these requests.
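To make the idea concrete, here is a minimal sketch of the pattern in Python, using the Product team’s request for a feed of inactive customers. Everything in it is an assumption for illustration: an in-memory SQLite database stands in for the warehouse, the `customers` table and its columns are made up, and `send` is a placeholder for whatever API call the destination tool would actually need. It is not how any particular Reverse ETL product works under the hood.

```python
import sqlite3

# Stand-in for a cloud warehouse: an in-memory SQLite database.
# The table and column names are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        email TEXT,
        last_seen_days_ago INTEGER
    );
    INSERT INTO customers VALUES
        (1, 'ana@example.com', 3),
        (2, 'bo@example.com', 45),
        (3, 'cy@example.com', 90);
""")

# The "model": a plain SQL query describing which rows to sync.
INACTIVE_CUSTOMERS_SQL = """
    SELECT id, email, last_seen_days_ago
    FROM customers
    WHERE last_seen_days_ago >= 30
    ORDER BY id
"""


def extract(conn, sql):
    """Run the query and return each row as a dict keyed by column name."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def sync(rows, send):
    """Hand each row to the destination; `send` stands in for an API call."""
    for row in rows:
        send(row)
    return len(rows)


outbox = []  # collects what would have been sent to the destination
synced = sync(extract(warehouse, INACTIVE_CUSTOMERS_SQL), outbox.append)
print(synced, [row["email"] for row in outbox])
```

In a real pipeline, `send` is where the hard parts live: authentication, rate limits, retries, and the quirks of each third-party API.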
Now that you’re (hopefully) sold on the Reverse ETL paradigm, it’s time to explore your tooling options.
How: Building vs. buying
If you’ve ever bought developer tools, you know there are always pros and cons to both buying a purpose-built solution and building one in-house.
If you’re leaning toward the DIY camp and are one of the few companies with spare data engineering resources, you can build an in-house Reverse ETL pipeline. In fact, we spoke to the Engineering Team Lead at Datadog, Romoli Bakshi, who was kind enough to walk us through her process of doing this in-house using Luigi and Spark; here’s the full conversation.
However, if you prefer to spend that time solving your business’s unique data challenges rather than learning gotchas in third-party APIs, consider a tool that syncs data from your data warehouse to the tools that your business teams rely on using just SQL.
Here are some important features to consider:
- Just use SQL. No more coding, APIs, etc. Paste a SQL query or select a table and use a visual mapper to specify how data should look in tools like Salesforce.
- Robust integrations catalog. Consider a tool with an extensive catalog of battle-tested integrations for moving data into SaaS tools or internal systems (databases, webhooks, etc.).
- Better than building in-house. Quality-of-life features include a live debugger, detailed API request logs, a bidirectional git integration, and out-of-the-box monitoring via Slack, PagerDuty, DataDog, etc.
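The mapping step mentioned in the first bullet can be pictured as a simple rename from warehouse columns to destination fields. The sketch below is only an illustration of that idea in Python, not any tool’s actual implementation: the column names, the destination field names (including the Salesforce-style `Credits_Used__c`), and the `map_row` helper are all hypothetical.

```python
# Hypothetical mapping from warehouse column names to destination field names.
FIELD_MAPPING = {
    "email": "Email",
    "credits_used": "Credits_Used__c",
}


def map_row(row, mapping):
    """Rename the mapped columns and drop everything that is not mapped."""
    missing = [col for col in mapping if col not in row]
    if missing:
        raise KeyError(f"row is missing mapped columns: {missing}")
    return {dest: row[src] for src, dest in mapping.items()}


# Rows as they might come back from a SQL query against the warehouse.
rows = [
    {"email": "ana@example.com", "credits_used": 120, "internal_score": 0.7},
    {"email": "bo@example.com", "credits_used": 15, "internal_score": 0.2},
]

records = [map_row(row, FIELD_MAPPING) for row in rows]
print(records)
```

Columns that are not in the mapping, such as `internal_score` here, never leave the warehouse, which is one reason an explicit mapping is useful.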
Alternatives to reverse ETL
Besides building a custom pipeline and investing in a purpose-built Reverse ETL tool like Hightouch, there are a couple of alternative solutions that come up time and again in the context of copying data into SaaS tools.
Integration platform (iPaaS)
iPaaS solutions include Tray, Workato, Zapier, Integromat, and many others.
These integration platforms are incredibly powerful and offer a visual interface that anybody can use to stitch together different tools to move data between them. I say anybody because these tools don’t even require you to write SQL queries, bringing more flexibility for business folks.
So what’s the caveat? iPaaS solutions are meant for point-to-point integrations (e.g. directly connect Salesforce to HubSpot) and primarily solve workflow automation challenges. The event-based architecture of these solutions isn’t ideal for most operational analytics or data replication use cases.
And while iPaaS solutions offer a visual interface to programming, it is still programming, meaning you hit all the same challenges as your integrations grow in complexity and scale.
Finally, these solutions lack the capability to build custom data models required to accurately sync data to downstream applications that expect data to arrive in a specific shape and format.
My colleague wrote a guide that provides an overview of all the data integration technologies — check it out if you’d like to learn more about iPaaS and how it differs from ETL, Reverse ETL, and CDP.
Customer data platform (CDP)
CDP vendors include Segment (Personas), mParticle, Lytics, Tealium, and many others.
Considering how large the CDP market is, many people assume that a Customer Data Platform is an answer to all data woes. To give credit where it’s due, there are several use cases where CDPs shine and my colleague even wrote an article covering why one might want to invest in a CDP. On the surface, the capabilities of a CDP may seem to solve some of the challenges of operational analytics and reverse ETL. However, the data model that a CDP is built upon is rigid and cannot handle more complex use cases that require custom data models. Like iPaaS solutions, most CDPs are also built for event processing rather than batch processing.
I actually helped build Segment Personas and am talking from first-hand experience. I also recently published another article on the Fivetran blog that makes a case for why your data warehouse should be your CDP. It covers the pros and cons of each approach.
While growing in importance, Reverse ETL is still a new category in the data space and like any hot category (looking at you, CDP), many companies and products will try to ride this wave. There is nothing wrong with competition, but the wide range of choice can make it harder for businesses to determine the right solution to easily solve their data challenges.
For a more in-depth look at Reverse ETL and what this new trend means for data teams building a modern data stack, check out our upcoming webinar on Reverse ETL.
If you want to start syncing data from your warehouse into 40+ SaaS tools with just SQL and no scripts, sign up at Hightouch.io. Every new account comes with a demo database, and it only takes a few minutes to get data flowing and see the value for yourself.