On paper, it’s easy to create a perfectly optimized data stack with a warehouse, ETL, reverse ETL, data modeling, data visualization and observability. In reality, that same stack often creates friction between data and business teams.
The rising popularity of centralized, petabyte-scale data warehousing over the past decade has pushed data teams into a tough balancing act: Enforce governance strict enough to keep enterprise-wide chaos at bay while still making progress toward the dream of business user self-service.
In many modern data organizations, data teams over-index on governance (or want to) in an effort to minimize data chaos and rein in the storage and compute costs of one-off views. But locking down data so tightly that you create a black box of access (and visibility) only inflates the number of meetings (read: hours) data teams have to sit through with stakeholders to help non-technical users see the value of data and align on metrics.
This interdisciplinary game of resource tug-of-war is one of the most common challenges we see in modern data stack (MDS) companies. Data teams don't want a backlog of tickets and requests to implement in the warehouse. Business users want the ability and privileges to feed data into their downstream applications without relying on a technical user. Rather than streamlining operations between units, this difference in needs often impedes progress and productivity across the organization.
If we sit too far on either side of the data governance spectrum, we risk wasting personnel hours fixing data problems. This raises the question: How do we strike a balance between data governance and business user self-service?
In this article, I'll dive deeper into why this problem happens, how it’s unique to MDS companies and what an ideal, Goldilocks future state might look like for safely loosening data governance requirements just enough to foster collaboration between data and business teams.
Just because data teams love it doesn’t mean business users do
While the list of tools loved by data teams has grown rapidly over the last few years, many of these same tools are at the root of stakeholder frustration.
This is actually somewhat by design. While data teams are tasked with making their stakeholders’ lives as frictionless as possible, the same tools data teams love inherently restrict how well business users can actually access data (for good reasons). Tools such as dbt and Looker have gained immense popularity among data engineers for three key reasons (a sketch of what this looks like in practice follows the list):
- They make it easy to centralize data definitions/transformations in SQL.
- The data models are version controlled. Any changes to the agreed-upon definitions need to go through a review and approval process.
- Their models can easily be leveraged in downstream workflows. For instance, Looker has in-house dashboards, alerts and embedded reports all tying back to the centralized data model. dbt materializes its tables and data marts directly in the warehouse to be used downstream.
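To make that concrete, here’s a minimal sketch of what a centralized dbt model might look like; the model and column names (orders, user_id, amount) are hypothetical:

```sql
-- models/user_facts.sql (hypothetical dbt model)
-- One governed, version-controlled definition of per-user metrics
-- that dashboards, reports and reverse ETL syncs can all reference.
select
    user_id,
    count(*)    as total_orders,
    sum(amount) as total_dollars
from {{ ref('orders') }}  -- dbt resolves this to the governed orders model
group by user_id
```

Because this definition lives in one version-controlled file, any change to total_dollars goes through review once and propagates everywhere downstream.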
All these shiny features may make the data team happier, but they don’t necessarily make business users’ lives easier. While centralized data definitions and standardized tables let data teams safely slice and dice queries downstream in Tableau or Looker’s Explore, it’s impossible to predict every manual change business users will need to make to the data model. This often means more process and review, making it harder and harder for business users to edit definitions and deploy changes; instead, they fall back on requests to the data team (and the request backlog floods again).
All this back and forth just leads to the original problem of data teams and business users needing more meetings to define and agree on data definitions. And it leaves us wondering, “When is enough, enough?”
Technical challenges posed by data governance controls
This one will probably feel counterintuitive: Too much data governance can create technical challenges on the way to solving data quality issues.
Let’s take a marketing use case as an example. Marketers often want to segment users by demographics and by filtered metrics based on a “golden record.” The simplest segments can already be built in just about every analytics tool, since the info is typically already included in the “user” table.
But the more complex the filters (and the tools needed to create them), the more technical challenges marketing folks face. If a marketer wants to segment records based on total metrics from a transactions table (e.g. total dollars, total orders, total clicks), the data team would need to aggregate the transactions table grouped by user and then join it to the “user” table. Sounds easy, right?
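Concretely, that query might look something like this sketch (table and column names are hypothetical):

```sql
-- Aggregate the transactions table per user, then join the totals
-- back onto the "user" table so marketers can segment on them.
with user_totals as (
    select
        user_id,
        sum(dollars) as total_dollars,
        count(*)     as total_orders,
        sum(clicks)  as total_clicks
    from transactions
    group by user_id
)
select
    u.user_id,
    u.age_bracket,  -- stand-ins for whatever demographic columns exist
    u.region,
    t.total_dollars,
    t.total_orders,
    t.total_clicks
from users u
left join user_totals t on t.user_id = u.user_id
```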
It is – until the marketer wants to create or edit a metric themselves (e.g. adding tax to total dollars). To fully capture these changes in the data model, the data team has to update the “user fact” table, along with all the associated review and approval processes.
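That one “small” edit lands inside the governed model itself, which is exactly why it triggers the full review cycle. A sketch, assuming a hypothetical tax column on the transactions table:

```sql
-- The metric definition in the governed "user fact" model changes,
-- so the edit has to go through review and approval before it ships.
select
    user_id,
    sum(dollars + tax) as total_dollars  -- previously: sum(dollars)
from transactions
group by user_id
```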
Alternatively, the user might just circumvent the data model altogether, make the changes themselves in Google Sheets – and we’re back to square one with data chaos. And if the marketer wants to take it a step further and create a segment based on a filtered metric (e.g. total dollars between Black Friday and Cyber Monday), there’s even more detail lost in the aggregation process.
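The filtered metric makes the detail loss concrete: once the table is aggregated to one row per user, the per-transaction dates are gone, so any date-bounded metric has to be pre-computed upstream. A sketch, using hypothetical 2022 dates for Black Friday through Cyber Monday:

```sql
-- A date-filtered metric can only be computed before aggregation;
-- the user-level table no longer knows when each transaction happened.
select
    user_id,
    sum(case
            when transaction_date between '2022-11-25' and '2022-11-28'
            then dollars
            else 0
        end) as bfcm_total_dollars
from transactions
group by user_id
```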
TL;DR: Data folks cannot predict every iteration of a metric that a business user will request. The marketing folks often don’t even know what level of detail is possible from the data because so much of it is lost when aggregating to the user level.
So, what’s the solution?
We cannot stress this enough: There is no one-size-fits-all solution. The correct balance between data governance and end-user self-service is going to vary from company to company. But you can find where you should be on the sliding scale by categorizing governance needs into foundational, intermediate and advanced.
Foundational: Starting at one end of the sliding scale, the data team services all requests manually: it writes all the queries, creates the dashboards and reports, etc. At the foundational stage, everything – and we mean everything – is tracked and governed in dbt. This establishes the baseline: a single source of truth for the data.
Intermediate: Slowly walk this control back and open the door to self-service. Allow users to start filtering out underlying records. If that goes off without a hitch, try letting them make lightweight changes by combining metrics. From there, see if they can safely create filtered metrics. Most of these final outputs aren’t going to be captured in the “official” data governance process, but they are still created under the umbrella of the approved data model (see the sketch below).
Advanced: At the far opposite end of the sliding scale, end users are allowed to make changes to the actual data model and metric definitions in their downstream tools (e.g. one-off CSV uploads to Tableau or exports to Google Sheets). These changes completely circumvent the data governance layer, so they need a strict review process to ensure permanent changes are captured upstream. Most companies might never get to this stage, and that’s completely okay! Not every company has the culture or personnel to facilitate this effectively (or the actual need).
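To sketch what the intermediate stage might look like in practice: the user queries the governed model and derives new metrics on top of it without touching its definitions (user_facts and its columns are the hypothetical model from earlier):

```sql
-- Self-service on top of the approved model: filter records and combine
-- existing metrics without editing the model's own definitions.
select
    user_id,
    total_dollars,
    total_orders,
    total_dollars / nullif(total_orders, 0) as avg_order_value  -- combined metric
from user_facts
where total_orders >= 1  -- lightweight record filtering
```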
At each step, there needs to be a feedback loop to make sure downstream changes that aren’t captured in the governance layer don’t compromise data integrity. If issues surface at any step in the process, determine whether it’s a “people problem” or a tool problem. For instance, disagreements over how the data is modeled and how specific fields and metrics are defined are more of a people problem than a tool problem.
Of course, some products move the needle closer toward the business user while keeping a balance with data governance. For example, Sigma and Omni give BI folks a bit more freedom, while Census segmentation is the go-to for Data Activation.
The future of modern data management is a balance
As the data landscape and the requests made of data teams evolve, modern data management will continue to be a balancing act.
While too little governance is much worse than too much governance, the data team’s ability to successfully serve data stakeholders will rely on finding the sweet spot between the two. Doing so ensures data accuracy, consistency and security while enabling users to access and analyze data without relying too much on the data team.
Yes, this means more meetings with your stakeholders. But rather than these meetings coming about when something breaks, they should be proactive conversations and collaborations around the metrics business users need most, which ones they can answer via self-service tools and which ones the data team can help them with (all framed inside responsible data governance).
Truthfully, finding the right balance isn’t really a tool problem, it’s a people problem. Thankfully, the data community is larger than ever. If you’re looking for a way to connect with folks solving similar problems to you, come say hi in The Operational Analytics Club, our dedicated community for mid- to senior-level data professionals.
Looking to attend this year's Modern Data Stack Conference on April 4-5, 2023? Register with the discount code MDSCON-CENSUSBLOG to get 20 percent off by March 31, 2023!