Three perspectives on data governance

How to identify and resolve conflicting incentives and priorities over data governance
June 12, 2023

Data governance refers to the policies, processes, roles and technology that businesses use to ensure the availability, usability, integrity and security of data. It fundamentally consists of the following needs, each of which grows in difficulty and complexity as an organization grows its headcount and the volume, variety and velocity of its data:

  • Knowing data – maintaining a full inventory of all data and its context
  • Accessing data – ensuring that the appropriate parties in an organization have access to fresh, relevant data
  • Protecting data – eliminating misuse or unwanted exposure of data, including by parties within an organization

Within any organization, there are three interest groups who will prioritize these needs to differing degrees.

  1. Data controllers are the security and legal stakeholders responsible for regulatory compliance, especially in the face of continued regulatory changes. Their main goal is to protect company, employee and customer data by avoiding improper exposure and misuse of data. The worst-case scenario for security and legal teams is to be the subject of an external audit with inadequate answers.
  2. Data consumers such as analysts want unrestricted access to data to facilitate their projects. They are frustrated by slow turnaround and stale data, and often end up finding ways to circumvent existing data service processes, accessing and producing data products in ways that may not be sanctioned. Left unchecked, these ‘rogue data teams’ produce spurious or duplicative data products – models, dashboards, reports and more – creating multiple and conflicting “sources of truth” with unclear provenance from ungoverned data, ultimately resulting in untrusted insights. 
  3. Data producers, typically teams of data engineers, are responsible for managing a growing queue of data programs. They are forced to balance the competing interests of the security and legal teams, on the one hand, and analysts on the other. For security and legal teams, they must provide monitoring capabilities and answers for audits. For analysts, they must make it possible to access and explore new data sets to serve the needs of their business units.

The access vs. compliance tradeoff

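Without the assistance of highly capable data governance tools, organizations are often forced to choose between access and compliance. You can situate the three parties previously described on a spectrum, with data consumers pushing for access at one end, data controllers pushing for compliance at the other and data producers balancing the two in between.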

Many organizations err in the direction of compliance, as the stakes grow with the business. It is common for businesses to impose limitations on access, creating necessary safeguards to control authorized data access. They may centralize authority and control over access to data, usually in central data teams which are sometimes supported by analysts or data engineers embedded in different business units or departments, i.e. a hub-and-spoke model. This model intentionally creates information silos to prevent leaks of sensitive data. The downside is that parties who need the data face longer project turnaround times and stale data.

Overcoming the access vs. compliance tradeoff

Overcoming this tradeoff and ensuring both access and compliance fundamentally comes down to finding some way to provide both visibility and control. This ultimately requires a technological solution that automates or programmatically manages access and compliance at scale. Simple, repeatable and robust processes managed through software can bring an organization to its ideal end state, wherein:

  • The concerns of the data controllers are fully addressed, with full visibility for security and legal to conduct internal and external audits. 
  • Data consumers have access to all, and only, the data they require and can self-serve with minimal turnaround times.
  • Data producers are able to implement and enforce policies for compliance and can scalably manage the onboarding and monitoring of their data programs as they mature. 

Let’s review the capabilities and features required for knowing, accessing and protecting data.

Knowing data

All three interest groups – data controllers, consumers and producers – have an interest in observing, knowing and understanding data. Security and legal teams need to audit incoming data and monitor access. Data engineers need to ensure that they are meeting their obligations to analysts and other stakeholders, particularly by analyzing how upstream data pipeline changes might affect data models downstream. Data analysts need to understand the context of their data and what questions can and can’t be answered.

The technological solution to observing data consists of ensuring that all metadata is exposed in the process of moving data from source to destination as well as organizing all of the data into data catalogs. Specifically, this includes:

  • Exposing column-level data lineage in the form of graphs
  • Real-time metadata capture and logging of keys, tables, columns and data types
  • End-to-end audit trails logging all access, behaviors and changes to a data pipeline
  • A metadata API enabling programmatic management of data movement
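
To make the impact analysis mentioned above concrete, here is a minimal sketch of how column-level lineage can be treated as a graph and walked to find everything downstream of a changed column. The lineage map and all column names are hypothetical and purely illustrative; a real metadata API would supply this information in its own format.

```python
from collections import deque

# Hypothetical column-level lineage: each key feeds the columns in its list.
LINEAGE = {
    "crm.accounts.email": ["staging.accounts.email_hash"],
    "crm.accounts.id": ["staging.accounts.account_id"],
    "staging.accounts.account_id": [
        "marts.revenue.account_id",
        "marts.churn.account_id",
    ],
    "staging.accounts.email_hash": ["marts.churn.contact_key"],
}


def downstream_columns(lineage, changed_column):
    """Return every column reachable from changed_column, breadth-first."""
    seen = set()
    order = []
    queue = deque([changed_column])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which downstream columns should be reviewed if crm.accounts.id changes?
impacted = downstream_columns(LINEAGE, "crm.accounts.id")
print(impacted)
```

A data engineer could run a check like this before changing a source column, then notify the owners of the impacted models.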

Accessing data

Business users rely on fresh and relevant data to make decisions. In order to serve their business users, analysts depend entirely on their ability to access and explore new data. Data teams are the main gatekeepers for analysts, managing approval workflows for interested parties. As a business grows and scales its operations, the core data team can become a major bottleneck for data access and management.

The way to scale your onboarding and access management without scaling your data team is to leverage technology and automation. Streamline and automate your user provisioning with integrated SCIM providers such as Okta and Azure AD. Quickly provide new users with the tools they need to perform their jobs and ensure past employees’ access is revoked on departure. Ensure new users have access to the specific data they need by automatically assigning access based on their team’s permissions. This obviates the need to manually create accounts and configure permissions, which can otherwise be prone to human error and delay.
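
As a rough illustration of team-based access assignment, the sketch below assumes a directory sync (for example through a SCIM provider) has already produced a list of users with a team and an active flag. It then derives which grants and revocations are needed. The team names, dataset names, `DirectoryUser` type and both functions are hypothetical; this is not any provider's actual API.

```python
from dataclasses import dataclass

# Hypothetical mapping from team to the datasets that team may read.
TEAM_PERMISSIONS = {
    "marketing": {"campaigns", "web_events"},
    "finance": {"invoices", "revenue"},
}


@dataclass(frozen=True)
class DirectoryUser:
    email: str
    team: str
    active: bool


def desired_access(users):
    """Map each active user to their team's datasets; inactive users get none."""
    access = {}
    for user in users:
        if user.active:
            access[user.email] = set(TEAM_PERMISSIONS.get(user.team, set()))
    return access


def plan_changes(current, desired):
    """Compare current and desired grants; return (grants, revocations)."""
    grants, revocations = {}, {}
    for email in set(current) | set(desired):
        have = current.get(email, set())
        want = desired.get(email, set())
        if want - have:
            grants[email] = want - have
        if have - want:
            revocations[email] = have - want
    return grants, revocations


users = [
    DirectoryUser("ana@example.com", "marketing", True),   # new hire
    DirectoryUser("raj@example.com", "finance", False),    # has left the company
]
current = {"raj@example.com": {"invoices", "revenue"}}
grants, revocations = plan_changes(current, desired_access(users))
print(grants)
print(revocations)
```

Because access is derived from the directory rather than configured by hand, a new hire receives their team's datasets and a departed employee loses theirs in the same pass.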

Protecting data

Security and legal teams are mainly concerned with protecting data, and there is considerable overlap between data governance and security. Data teams share this concern, as they directly manage access control and handle sensitive information.

Technological capabilities that are critical to protecting data include:

  • Compliance with laws, regulations and standards across the relevant jurisdictions, such as SOC 1 and SOC 2, GDPR, HIPAA, ISO 27001, PCI DSS Level 1 and more
  • Role-based access control in order to finely control who can move, load and transform data
  • Blocking and hashing at the column level
  • Automated, centralized user provisioning with granular permissions based on roles
  • Automatic tagging and categorization of data, especially PII
  • Data residency restrictions
  • End-to-end encryption
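
Column-level blocking and hashing, listed above, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the policy dictionary, column names and salt are hypothetical, and a salted SHA-256 digest stands in for whatever hashing scheme a given platform actually uses.

```python
import hashlib

# Hypothetical per-column policy: "block" drops the column entirely,
# "hash" replaces the value with a digest, anything else passes through.
COLUMN_POLICY = {"ssn": "block", "email": "hash"}


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of the value."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def apply_policy(row, policy, salt):
    """Return a copy of row with blocked columns removed and hashed columns masked."""
    governed = {}
    for column, value in row.items():
        action = policy.get(column, "allow")
        if action == "block":
            continue
        if action == "hash" and value is not None:
            governed[column] = hash_value(value, salt)
        else:
            governed[column] = value
    return governed


row = {"id": 7, "email": "ana@example.com", "ssn": "123-45-6789"}
safe_row = apply_policy(row, COLUMN_POLICY, salt="demo-salt")
print(sorted(safe_row))
```

Hashing is deterministic, so analysts can still join and count on the masked column without seeing the underlying value, while blocked columns never reach the destination at all.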

You can have your cake and eat it, too

The purpose of technology is to multiply human capabilities in order to entirely bypass the ugly tradeoffs of the past. Data governance does not need to be a matter of either accepting data anarchy or clamping down so tightly that basic data projects become complicated. With the aid of specific technologies, your organization can know, access and protect its data while simultaneously satisfying the concerns of security and legal teams, data producers and data consumers alike.

Governed data movement

Traditionally, data governance has primarily been enforced in the data warehouse. But to expand data governance upstream, Fivetran is committed to delivering governed data movement to enforce data governance at the earliest stage in the ecosystem – as soon as data leaves the source.

Learn more about how data movement platforms like Fivetran can help you solve these challenges with a demo or a trial.

