How data teams can jumpstart investment in data infrastructure

Whether your organization’s growth is led by product, marketing or sales, small proofs of concept can springboard you towards better data infrastructure
June 9, 2023

If you’re part of a data team that wants reliable, powerful data infrastructure with tools like Snowflake, Fivetran, dbt, Monte Carlo, Looker and Hex, but is struggling to justify the cost to CFOs in a down market—this guide is for you.

The solution sounds obvious: create data practices that tangibly drive ROI, and link the practices to infra spending. But the problem is a chicken-and-egg situation. Absent great infrastructure tools, how can data prove its worth?

The approach that worked for me as a data scientist at Airbnb and Webflow may sound unintuitive to a young data team building a brand of trust: stand up clear money-making activities, even if they’re built on low-quality data and manual CSV pulls. In a new world of hyperactive CFO scrutiny, use cases worth the investment come before the infrastructure required to make them robust.

With that in mind, this post will cover:

  • Which data practices are revenue-driving
  • How to stand up those data practices with minimal infrastructure (with examples)

Data practices that drive revenue

Data practices that drive revenue include all work that enables an organization to a) increase its earnings or b) control its costs.

In my experience, cost control is a fine use of data resources, but costs are highly specific to the business. Airbnb’s biggest cost in its growth years was customer support. Stitch Fix’s was its supply chain and inventory inputs. Webflow, as a pure technology SaaS, didn’t have much cost at all, so we didn’t do much data work on cost control. If you find an opportunity for data to lower costs through better customer support routing or supply chain management, by all means invest in it.

But this post is primarily about growth, which can be divided into three* categories:

  1. Product-led growth
  2. Marketing-led growth
  3. Sales-led growth

Each of these growth channels comes with its own opportunities for data to directly drive metrics and promote growth. I’ll quickly summarize each growth mechanism and then go over fast ways to implement metric-driving data practices.

*Note that there are other growth motions like affiliate-led, B2B2C, or partnering with retailers who sell your product. This post will focus on these big three.

Product-led growth (PLG)

Biggest ROI lever: AB experiments in product

The easiest way to know if you are product-led is to figure out how many customers you acquired without them ever talking to a sales rep or encountering a paid advertisement. Companies with primarily organic, self-serve customers include Airbnb, Webflow and Figma.

In each case, customers who reach particular milestones (“aha moments”) in the product tend to bring on more customers. We found that Airbnb travelers who actually went on a stay tended to continue using Airbnb, would expose others to Airbnb when traveling together, and would talk about their Airbnb vacations to friends. At Webflow, designers who finished, published and shared their first website would start to use Webflow with other clients, and use it for their portfolios.

Data teams have a huge lever for ROI in product-led growth: tie every product change to driving “aha” moments via AB experiments. Experimentation is so powerful in PLG, because:

  • The ROI counterfactual is clear. Empirically, only 20-30% of product changes improve metrics and drive “aha” moments, while another 20-30% actively worsen them. Without running experiments, product teams would launch the bad changes and fail to recognize the good ones.
  • Experimentation analysis is the one avenue to link metrics to product decisions that removes all caveats. The magic of randomizing users controls for seasonality, macro conditions and contemporaneous product/marketing initiatives.
  • Data teams can change decisions and improve metrics at scale, creating repeatable workflows for setting up, analyzing and reporting results.

While tools like Eppo can leap you forward to enterprise-grade experimentation, you can get started with the simplest experiment method:

  1. Randomize users into groups based on the last digit of a subject’s user_id, using a tool of choice. This is a fine way to start but is not advisable once you’ve run more than a handful of experiments.
  2. If you’re up for it, randomize with a hashing function like Python’s hashlib.sha1, passing in concat(user_id, experiment_name) and converting the result to an integer.
  3. Have an engineer expose the new product change to one group and the old product to another group.
  4. Tabulate your business’ most important metrics for each group. To keep things extra simple (and avoid outlier issues and more complicated statistical treatment), prefer binary metrics like “users making a purchase” over count metrics like “total purchases per user.”
  5. Put the numbers into an online calculator of z-scores, such as Evan Miller’s.
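The steps above can be sketched in a few lines of Python. This is a minimal illustration, not production experimentation tooling: the function and metric names are hypothetical, and the z-score here is the standard pooled two-proportion test that calculators like Evan Miller’s implement.

```python
import hashlib

def assign_variant(user_id: str, experiment_name: str, n_variants: int = 2) -> int:
    # Hash user_id + experiment_name (step 2) so each user's assignment is
    # stable within an experiment but independent across experiments.
    digest = hashlib.sha1(f"{user_id}{experiment_name}".encode()).hexdigest()
    return int(digest, 16) % n_variants

def two_proportion_z(conversions_a: int, n_a: int,
                     conversions_b: int, n_b: int) -> float:
    # Pooled two-proportion z-score (step 5); |z| > 1.96 is significant
    # at the 5% level for a two-sided test.
    p_a, p_b = conversions_a / n_a, conversions_b / n_b
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    return (p_b - p_a) / se
```

Because the assignment is a pure function of the user ID and experiment name, any system that sees the same inputs (the product code exposing the change, the warehouse query tabulating metrics) reaches the same split without coordinating state.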

Marketing-led growth

Biggest ROI lever: Channel optimization

Marketing spend exists in nearly every company. Some companies are “marketing-led” where the majority of users find the product via digital ads. Examples include ecommerce companies who primarily grow through Instagram ads, or companies like Stitch Fix after their initial word-of-mouth growth spurt.

Whether marketing-led or just marketing-boosted, data teams can reliably make the company money with marketing support, because:

  • Marketing spend is typically large, such that 5-10% improvements add up to significant sums
  • Calculating lifetime value (LTV) and payback periods typically requires curated lifecycle data that’s unavailable in marketing tools. At Webflow, the only source of lifetime value was Stripe data, which was only available in Snowflake.
  • If there’s never been prior data work, 5-10% efficiency improvements can usually be found with simple comparisons of spend and lifetime value, split by common business segments

The trick with marketing optimization is that only a few channels will be tractable right away. Click-through channels like Google search ads are usually a good place to start, and the same work often yields SEO insights as well.

This dbt blog post on modeling attribution is a great introduction to getting a model set up quickly to tag users by their initial channel touch point. From here, data workers can use standard metrics and tables to understand how much revenue these users accumulate each month. Compare this tabulated revenue for a given week/month cohort to the spend, and there’s likely a set of channels to retarget, grow further or spin down.
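The core of that workflow fits in a short sketch. The data shapes below are illustrative stand-ins for warehouse models (in practice these would be dbt models over event and billing tables), but the logic is the same: tag each user with their first-touch channel, attribute their revenue to it, and divide by channel spend.

```python
# Hypothetical first-touch attribution sketch. touchpoints is a list of
# (user_id, timestamp, channel) rows; revenue and spend are simple dicts.
def first_touch(touchpoints):
    """Map each user to the channel of their earliest touchpoint."""
    channels = {}
    for user_id, ts, channel in sorted(touchpoints, key=lambda t: t[1]):
        channels.setdefault(user_id, channel)  # keep only the first touch
    return channels

def channel_roi(touchpoints, revenue_by_user, spend_by_channel):
    """Revenue attributed to each channel's first-touch users, divided by spend."""
    channels = first_touch(touchpoints)
    earned = {ch: 0.0 for ch in spend_by_channel}
    for user_id, amount in revenue_by_user.items():
        ch = channels.get(user_id)
        if ch in earned:
            earned[ch] += amount
    return {ch: earned[ch] / cost for ch, cost in spend_by_channel.items()}
```

A channel with ROI well above 1 for a mature cohort is a candidate for more spend; one stuck below 1 after the payback window is a candidate to spin down.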

Marketing channel optimization is another case with a clear counterfactual. Absent the data team’s deeper view of lifetime value, low-value marketing channels continue to look good while spending company dollars.

Sales-led growth

Biggest ROI lever: Data enrichment

Although sales teams are often versed in finding their way around tools like Salesforce to derive the insights they need, they’re usually less equipped to produce insights from data that sits outside those tools. Data teams can help sales teams create more efficient processes, maximize new deal volume and size, and minimize churn by making that outside data accessible within the tools sales already uses.

Generally speaking, some of the highest value work a data team can do for a sales team is to enrich all of the lead, opportunity and account data sitting in tools like Salesforce. This process typically involves building data models within a data warehouse and then syncing these models to tools like Salesforce. This is done very easily with tools like Hightouch, which allows you to link columns in a data model to fields within Salesforce and merge them using a unique identifier.

So, for example, if you want to enrich lead data within Salesforce, you’d sync your raw lead data to your data warehouse. Once there, you would build a model that contains valuable data points for every unique email in your lead pool. Once built, you’d simply tell Hightouch to sync this model to lead objects within Salesforce using an email address as the unique identifier. This general pattern can be applied to any and all objects within Salesforce, from leads to accounts, although the unique identifiers would obviously vary for each object.

So, syncing models to tools like Salesforce is easy. But what are some examples of helpful models? One extremely common model for pre-sales is a “lead scoring” model. These models typically combine a lead’s behavioral data on the marketing website with other important characteristics known about the lead, such as company size or sector, to estimate the lead’s likelihood to convert and/or their potential LTV. With these scores, sales teams can better prioritize leads and route them to the right members of the team to maximize efficiency and revenue: smaller, simpler deals go to more junior reps while larger, more complex deals flow to more experienced ones.
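A first-pass lead score doesn’t need machine learning. Here is a minimal sketch with hand-picked, purely illustrative weights and feature names; a real version would be fit on historical conversion data (e.g. a logistic regression) rather than hard-coded.

```python
import math

# Illustrative weights only -- in practice these would be learned from
# historical conversions, and the feature names are hypothetical.
FEATURE_WEIGHTS = {
    "pricing_page_views": 2.0,
    "docs_sessions": 0.5,
    "is_target_sector": 3.0,
}

def lead_score(features: dict) -> float:
    """Squash a weighted sum of lead attributes into a 0-1 conversion proxy."""
    raw = sum(w * features.get(name, 0.0) for name, w in FEATURE_WEIGHTS.items())
    # The offset is an arbitrary calibration point, not a fitted intercept.
    return 1 / (1 + math.exp(-(raw - 4.0)))
```

Even a crude score like this, synced onto the Salesforce lead object, gives reps a consistent sort order, and it creates the labeled history you’ll later need to train a proper model.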

On the post-sales side of things, “account health scores” are one of the most valuable models a data team can develop early on for their sales stakeholders. These models provide a weekly or monthly score quantifying the health of an account. The score is typically derived from an index that uses behavioral data to check whether an account’s users are behaving like other healthy customers across multiple dimensions. These scores give account managers insightful, timely context on their accounts: they are alerted to any negative change in an account’s health score and can intervene soon after.
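The index behind such a score can start very simple. The sketch below (metric names and benchmarks are made up for illustration) compares each usage metric to a “healthy customer” benchmark and averages the capped ratios into a 0-100 score.

```python
# Hypothetical account-health index: ratio of each usage metric to a
# healthy-customer benchmark, capped at 1.0, averaged into a 0-100 score.
def account_health(weekly_usage: dict, benchmarks: dict) -> float:
    ratios = [min(weekly_usage.get(metric, 0.0) / target, 1.0)
              for metric, target in benchmarks.items() if target > 0]
    return round(100 * sum(ratios) / len(ratios), 1)
```

Snapshotting this score weekly per account makes the alerting trivial: a drop between consecutive snapshots is the signal an account manager acts on.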

Tying it all together

The power of analytics depends on high-volume, high-quality data. By demonstrating value with simple, manual methods and a very modest amount of data, you help your CFO and other gatekeepers picture what becomes possible once these projects are made repeatable: greater scope, scale and sophistication in your data operations, faster turnaround for projects and richer, higher-quality analysis.

With time, and proven worth, you’ll open up the opportunity to invest in building a modern data stack to vastly improve the ROI possibilities. 
