Data governance refers to the policies, processes, roles, and technology that businesses use to ensure the availability, usability, integrity and security of data.
In a world where data drives business decisions, and where governments regulate how data must be handled, data governance and good data stewardship are becoming ubiquitous. In fact, data governance should be a core component of any organization’s data management strategy.
According to TDWI, 84% of organizations say data governance is a top priority. It’s critical that businesses consider who can access what data throughout its entire lifecycle: collection, storage, preparation and distribution.
In recent years, growing concerns about data security and privacy have engendered a slew of data privacy laws across the globe, such as Europe’s GDPR, Brazil’s LGPD and California’s CCPA. Noncompliance can result in consumer mistrust, reputational damage and massive fines. In the last year alone, EU data protection authorities have handed out more than a billion dollars in fines over data breaches that violated the GDPR.
Different industries also regulate security and privacy, which is why organizations get certified for compliance with standards such as HIPAA, PCI DSS and SOC 2.
Data governance addresses data security and privacy risks so that organizations can get the most benefit from their data. It helps ensure that businesses stay in compliance with applicable regulations, safeguarding personal information and user data appropriately. While many companies implement data governance for compliance, another critical use case is data quality: bringing together better data for better insights and decisions.
Without effective data governance, businesses are likely to have to deal with inconsistencies in data definitions and content across different systems. For example, some systems might use European-style dd/mm/yyyy dates, while others use American mm/dd/yyyy format. Inconsistencies like this undermine data integrity and reduce the accuracy of business intelligence (BI) and reporting dashboards.
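As a small illustration of the date problem, the sketch below normalizes dates from two systems into ISO 8601. The system names and the format registry are hypothetical; the point is only that each source has to declare its format, because the string alone is ambiguous.

```python
from datetime import datetime

# Hypothetical registry: each source system declares the date format it emits.
SOURCE_DATE_FORMATS = {
    "crm_eu": "%d/%m/%Y",      # European-style dd/mm/yyyy
    "billing_us": "%m/%d/%Y",  # American-style mm/dd/yyyy
}


def normalize_date(raw: str, source_system: str) -> str:
    """Parse a date using its source's declared format and return ISO 8601."""
    fmt = SOURCE_DATE_FORMATS[source_system]
    return datetime.strptime(raw, fmt).date().isoformat()


# The same string means two different days depending on where it came from.
print(normalize_date("03/04/2023", "crm_eu"))      # 2023-04-03
print(normalize_date("03/04/2023", "billing_us"))  # 2023-03-04
```

Storing one canonical format downstream means dashboards never have to guess which convention a value follows.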
Data governance best practices will improve an organization’s data quality, lower data management costs and help the organization make relevant data accessible and available for nearly every employee.
The four dimensions of data governance
To improve data integrity, accuracy, consistency and availability, organizations should assess their data governance practices and determine whether they have adequate procedures in place. To do that, they must examine four dimensions: policies, processes, roles and technology.
Data governance policies are documented guidelines for ensuring that an organization's data is managed and used properly. Policies govern data workflows and answer questions about how data should be handled, who can access what data, and for how long.
Data governance policies fall into four subcategories.
- A data governance structure outlines the laws and restrictions that everyone in the organization must follow. It also defines who manages data governance at the organization.
- A data usage policy covers the ethical use of company data, and should include rules that require people to access data only for business purposes.
- A data access policy outlines the rules about who can access what data.
- Data integrity policies address the reliability and accuracy of data.
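One way to picture a policy is as a small record that answers the questions above: who can access a dataset, for what purpose, and for how long the data is kept. The sketch below is only an illustration; the class, field names and example values are assumptions, not a standard schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DatasetPolicy:
    """Hypothetical policy record for one dataset."""
    dataset: str
    allowed_roles: frozenset      # access policy: who may use the data
    business_purposes: frozenset  # usage policy: why they may use it
    retention_days: int           # how long records may be kept

    def permits(self, role: str, purpose: str) -> bool:
        # Both the role and the stated purpose must be allowed.
        return role in self.allowed_roles and purpose in self.business_purposes

    def is_expired(self, collected_on: date, today: date) -> bool:
        # A record is past retention once today is later than its cutoff date.
        return today > collected_on + timedelta(days=self.retention_days)


salary_policy = DatasetPolicy(
    dataset="hr.salaries",
    allowed_roles=frozenset({"hr_admin"}),
    business_purposes=frozenset({"payroll"}),
    retention_days=7 * 365,
)

print(salary_policy.permits("hr_admin", "payroll"))    # True
print(salary_policy.permits("hr_admin", "marketing"))  # False
```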
Organizations need to customize policies for their use cases and datasets. Healthcare data is different from financial data, which is different from HR data, and so on.
Healthcare is a particularly tricky area, because protected health information (PHI), a form of personally identifiable information (PII), is heavily regulated by HIPAA and other laws. This means that some tables cannot be opened up to everyone who wants to view or create reports using them, and some fields have to be masked or otherwise obfuscated. This is precisely the purview of data governance software, which can help ensure data privacy for these key fields and tables.
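To make field masking concrete, here is a minimal sketch that redacts or partially masks sensitive fields before a record reaches a report. The field names and rule names are hypothetical, and real governance tools typically apply such rules at the database or pipeline layer rather than in application code.

```python
# Hypothetical masking rules keyed by field name.
MASKING_RULES = {
    "patient_name": "redact",  # hide the value entirely
    "ssn": "partial",          # keep only the last four characters
}


def mask_value(value: str, rule: str) -> str:
    if rule == "redact":
        return "[REDACTED]"
    if rule == "partial":
        if len(value) <= 4:
            # Too short to reveal anything safely; mask every character.
            return "*" * len(value)
        return "*" * (len(value) - 4) + value[-4:]
    raise ValueError(f"unknown masking rule: {rule}")


def mask_record(record: dict) -> dict:
    """Return a copy of the record with governed fields masked."""
    return {
        field: mask_value(str(value), MASKING_RULES[field])
        if field in MASKING_RULES else value
        for field, value in record.items()
    }


record = {"patient_name": "Ada Lovelace", "ssn": "123-45-6789", "diagnosis_code": "J45"}
masked = mask_record(record)
print(masked["ssn"])  # *******6789
```

Fields without a rule pass through unchanged, so analysts can still report on non-sensitive columns.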
While the goal of data governance for every organization is better data management, data governance policies are not “one size fits all.” Every organization’s data is different, and anyone setting up data governance has to be aware of how different teams and applications handle data within their organization, while keeping in mind any relevant regulations.
Data governance policies are the behind-the-scenes rules that ensure that the right people have access to the right data and that data is being handled appropriately. By identifying and tagging PII and applying relevant policies, businesses can control how they manage sensitive data.
Processes, in relation to data governance, refer to the flow of data within the organization. To implement data governance, you have to know all of the processes that touch on business data. Data governance defines processes that address how data is:
- expired or deleted
Roles are the way data governance software determines who can be involved in any of the defined processes — who can access data or modify it, for instance. Using role-based access, you assign all employees roles that determine which schemas, tables and fields they can access, and whether they can read or modify those objects.
Typical roles include:
- data owner or administrator, who can do just about anything
- data steward, who can access and modify data to ensure its accuracy
- data custodian, who is responsible for tasks like backups
- data users, who can see data that pertains to their job functions
Roles also apply to business divisions. For instance, employees in the human resources department must be able to load and access sensitive information about employees, while sales teams must not. Rather, sales teams need access to account contact information and email addresses.
Note that data governance roles define how people relate to specific data regardless of their business roles. Just because someone is a CEO doesn’t mean they should be able to see (let alone change or delete) any and all data. Similarly, an HR administrator might be a data steward for certain HR-related data, but only a data user for other data, while someone in IT is actually the designated data administrator. Access to different processes depends on the rules set forth in the data governance policies.
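The idea that roles attach to specific data, not to job titles, can be sketched as a lookup keyed by person and dataset. The permission sets, people and dataset names below are illustrative assumptions, not the model of any particular governance product.

```python
# Permissions carried by each governance role (hypothetical mapping).
ROLE_PERMISSIONS = {
    "owner": {"read", "modify", "delete", "grant"},
    "steward": {"read", "modify"},
    "custodian": {"backup"},
    "user": {"read"},
}

# Roles are assigned per (person, dataset), not per job title.
ASSIGNMENTS = {
    ("hr_admin", "hr.employees"): "steward",
    ("hr_admin", "sales.accounts"): "user",
    ("it_admin", "hr.employees"): "owner",
    ("ceo", "sales.accounts"): "user",
    # The CEO has no assignment for hr.employees, so access is denied.
}


def is_allowed(person: str, dataset: str, action: str) -> bool:
    role = ASSIGNMENTS.get((person, dataset))
    if role is None:
        return False  # deny by default when no role is assigned
    return action in ROLE_PERMISSIONS[role]


print(is_allowed("hr_admin", "hr.employees", "modify"))    # True
print(is_allowed("hr_admin", "sales.accounts", "modify"))  # False
print(is_allowed("ceo", "hr.employees", "read"))           # False
```

Denying by default keeps the policy safe when a new dataset or employee appears before anyone has assigned a role.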
Just as different organizations have different policies, there are also different roles. Like policies, roles have to be customized to the organization, though most companies have certain common roles, like the ones we just touched on.
To be successful, data governance should be integrated across the technologies that make up the modern data stack. Software for data governance can either be purpose-built or baked into applications that make up the modern data stack. Purpose-built tools can include data catalogs, such as Alation and Collibra, and data observability tools. In many cases, these tools are offered as part of larger suites that incorporate metadata management and data lineage features that track where data originates and how it flows through an organization's systems.
Other tools in the modern data stack, such as data integration solutions like Fivetran, are extending their data governance capabilities to better support, automate and ensure an organization’s overall governance strategy is effective.
Rethinking data governance for data integration
In thinking about how to make data governance more successful for organizations that want to use data analytics, it’s helpful to think about how data governance applies to a data pipeline.
Some organizations create data catalogs as part of their data governance efforts. Data catalogs make it easier for users to find data, but they’re only as good as the data they know about. If a data catalog can’t see what happens with data as it moves within an organization, it’s challenging to apply policies and control workflows.
A data integration tool like Fivetran can help expose metadata for a data catalog and make the data more trustworthy — and that, after all, is the purpose of data governance.
Data governance policies ensure the right people have access to the right data and that data is being handled appropriately. By tagging personally identifiable information (PII) from the source and applying the relevant policies, data stewards can control if and how sensitive data is loaded to the destination. Additionally, if something is tagged at the integration step, that tag should be carried through to the warehouse to ensure downstream workflows handle it appropriately.
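A rough sketch of tagging at the integration step follows. It is not Fivetran's API; the column policy table, the action names (`load`, `hash`, `block`) and the `load_row` function are assumptions made for illustration. Each tagged column is either hashed or blocked before loading, and the tag travels with every column that reaches the destination.

```python
import hashlib

# Hypothetical column-level tags and handling rules set by a data steward.
COLUMN_POLICY = {
    "email": {"tag": "pii", "action": "hash"},
    "ssn": {"tag": "pii", "action": "block"},
}
DEFAULT_POLICY = {"tag": None, "action": "load"}


def load_row(row: dict) -> tuple:
    """Return (destination_row, destination_tags) for one source row."""
    destination_row, destination_tags = {}, {}
    for column, value in row.items():
        policy = COLUMN_POLICY.get(column, DEFAULT_POLICY)
        if policy["action"] == "block":
            continue  # never load this column into the destination
        if policy["action"] == "hash":
            value = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        destination_row[column] = value
        if policy["tag"]:
            # Carry the tag through so downstream workflows can honor it.
            destination_tags[column] = policy["tag"]
    return destination_row, destination_tags


row = {"id": 1, "email": "ada@example.com", "ssn": "123-45-6789"}
loaded, tags = load_row(row)
print(sorted(loaded))  # ['email', 'id']
print(tags)            # {'email': 'pii'}
```

Note that an unsalted hash can still be reversed for low-entropy values, so a production setup would likely need salting or tokenization.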
Supporting distributed data teams
Many organizations now support multiple data teams in different departments, divisions or countries. Distributed data teams let business units “own” their own data. However, all corporate data still has to comply with company, industry and legislative requirements. Policies should be set up by a centralized data team, with the help of legal and security experts, and pushed down to distributed teams.
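One possible way to push central policies down is to merge them with each team's local settings so that a team can tighten a rule but never loosen it. That "tighten only" rule is an assumption of this sketch, as are the setting names.

```python
# Hypothetical baseline set by the centralized data team.
CENTRAL_POLICY = {
    "retention_days": 365,
    "mask_pii": True,
    "allowed_regions": {"eu", "us"},
}


def effective_policy(central: dict, team: dict) -> dict:
    """Combine central and team settings, keeping the stricter value of each."""
    return {
        # Shorter retention is stricter.
        "retention_days": min(
            central["retention_days"],
            team.get("retention_days", central["retention_days"]),
        ),
        # Masking stays on if either side requires it.
        "mask_pii": central["mask_pii"] or team.get("mask_pii", False),
        # Only regions approved by both sides remain.
        "allowed_regions": central["allowed_regions"]
        & team.get("allowed_regions", central["allowed_regions"]),
    }


team_settings = {"retention_days": 90, "mask_pii": False, "allowed_regions": {"eu", "apac"}}
print(effective_policy(CENTRAL_POLICY, team_settings)["retention_days"])  # 90
```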