How to craft the ultimate business case for data governance - Part 1
Explaining why good governance matters to different types of stakeholders
Selling data governance can feel like an uphill battle. It’s a big investment that often gets turned down because the benefits seem abstract compared to the upfront costs. Yet the importance of data governance is undeniable. It’s not just about avoiding risks — it’s about unlocking the full potential of your data.
Today, along with Austin Kronz (Director of Data Strategy at Atlan), I’m kicking off a series about selling data governance. We’ve created a toolkit to advocate for the resources you need to make your governance initiative a success, based on our experience helping hundreds of data leaders.
We’ll focus on the first part of tackling governance — explaining its importance to different types of stakeholders. This includes assessing current gaps in your data practices and explaining the consequences of ignoring good governance. Then, in Part 2, we’ll explain how to showcase success and positive outcomes once your governance initiative is running.
🔒 Goal #1: Safeguard data security and compliance
Target audience: Use this argument with people who care most about shadow data use and your company’s reputation (e.g. Legal, Brand, and PR teams).
The key issue here is data silos. Without visibility into your data estate, it’s difficult to identify and protect sensitive information at scale, leading to unusable data, subpar models, and fines.
How to help stakeholders think about this issue
This is a two-part process: first you prompt stakeholders to recognize the weaknesses in your current data governance process, and then you explain why these gaps are worth addressing.
First, ask stakeholders to rate the following on a scale from 1 to 5 (1 = "completely disagree", 5 = "completely agree"):
Sensitive data: We have clear visibility of our entire data estate to identify sensitive data at scale, and are confident there are no silos we can’t see into.
Data management: We are confident about our management and use of data, and we have never been in violation of any data regulations.
Data use by producers: Users have access to enough data to effectively conduct analysis or build models.
Data requests: Our process for handling requests for data is automated, well-documented, and rarely results in errors (e.g. data deletion or sensitive data being shared).
For any low-rated statements, explain why those gaps matter using the points below (a small scoring sketch follows this list):
Limited data quality: Without proper governance, analysis is often based on incomplete or unreliable data, leading to poor insights and weaker data models.
Heavy fines and reputational damage: Failing to protect sensitive data can result in steep fines and long-lasting reputational damage.
Missed revenue opportunities: In regulated markets like healthcare or financial services, data access or compliance issues can prevent companies from conducting meaningful analysis and tapping into new markets.
Low team productivity: Manual compliance processes drain resources and hurt team productivity.
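If you run this assessment at any scale (say, across several teams), even a few lines of code can turn the ratings into a prioritized gap list. Below is a minimal sketch in Python; the statement names, example scores, and the threshold of 2 are illustrative choices, not part of any formal framework:

```python
# Minimal sketch: tally stakeholder ratings and surface governance gaps.
# The statement names, example scores, and threshold are illustrative
# choices, not part of any formal framework.

RATINGS = {
    "Sensitive data": 2,
    "Data management": 4,
    "Data use by producers": 3,
    "Data requests": 1,
}

LOW_SCORE_THRESHOLD = 2  # 1 = "completely disagree", 5 = "completely agree"

def flag_gaps(ratings: dict[str, int], threshold: int = LOW_SCORE_THRESHOLD) -> list[str]:
    """Return the statements rated at or below the threshold, worst first."""
    low = [(score, name) for name, score in ratings.items() if score <= threshold]
    return [name for score, name in sorted(low)]

for gap in flag_gaps(RATINGS):
    print(f"Gap to address: {gap} (rated {RATINGS[gap]}/5)")
# -> Gap to address: Data requests (rated 1/5)
#    Gap to address: Sensitive data (rated 2/5)
```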
How to demonstrate the “art of the possible” with good governance
Sometimes avoiding problems isn’t enough for stakeholders — they also need to envision a better future. Ask stakeholders the following questions to help them imagine the positive impact of a comprehensive governance strategy:
What could our team achieve if policy execution were automated and programmatic? (See the sketch after this list.)
How would transparent data handling boost our brand's reputation? Which fines could be eliminated entirely?
What revenue opportunities could open up in regulated markets like healthcare or regions like the EU?
Which teams would benefit most from comprehensive access to the data estate? Why?
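To make “automated and programmatic policy execution” feel concrete for stakeholders, a toy example helps. The sketch below flags columns whose names match common PII patterns so a masking policy could be applied automatically; the patterns, table names, and the “apply-masking-policy” action are invented for illustration, and real classifiers also inspect data values, not just names:

```python
import re

# Toy sketch of programmatic policy execution: flag columns whose
# names match common PII patterns so a masking policy can be applied
# automatically. Patterns, table names, and the action label are
# illustrative; production classifiers also inspect values.

PII_PATTERNS = [
    re.compile(r"(^|_)(ssn|social_security)", re.IGNORECASE),
    re.compile(r"(^|_)e?mail", re.IGNORECASE),
    re.compile(r"(^|_)(phone|mobile)", re.IGNORECASE),
    re.compile(r"(^|_)(dob|birth_date)", re.IGNORECASE),
]

def classify_columns(table: str, columns: list[str]) -> list[tuple[str, str]]:
    """Return (table.column, action) pairs for columns that look like PII."""
    flagged = []
    for col in columns:
        if any(p.search(col) for p in PII_PATTERNS):
            flagged.append((f"{table}.{col}", "apply-masking-policy"))
    return flagged

# Example with a made-up table schema:
print(classify_columns("crm.customers", ["id", "email", "phone_number", "plan"]))
# -> [('crm.customers.email', 'apply-masking-policy'),
#     ('crm.customers.phone_number', 'apply-masking-policy')]
```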
😎 Goal #2: Trusted data for confident decisions
Target audience: Use this argument with people who care most about turning data into immediate business impact and decisions (e.g. team leaders and directors).
The hallmark of this problem is uncertainty. People don’t trust the data, resulting in poor data culture and decision-making. You often hear complaints like “The dashboard looks wrong,” even when the data is accurate.
How to help stakeholders think about this issue
Ask stakeholders to rate the following on a scale from 1 to 5:
Self-service: Our domain users can self-serve data and generate insights with little intervention from a dedicated data team.
Data understanding: Decision-makers know the source, definition, and quality of data they see.
Downstream impact: There is clarity around the impact on downstream assets when changes are implemented upstream.
Trust in data: Our organization can confidently rely on the data available to make decisions, and our dashboards and insights rarely show conflicting or incorrect results.
For any low-rated statements, highlight these consequences:
Misunderstood data: Using data in the wrong way can lead to poor decisions, bad customer experiences, and increased risks with shareholders and regulators.
Missed opportunities: When data doesn’t appear as decision-makers expect, they may assume it is an error and overlook valuable insights.
Less agile teams: Data teams might hesitate to make necessary upstream changes, reducing overall agility.
Low morale: Distrust between business users and the data team can lead to inaccurate shadow analytics and harm morale.
How to demonstrate the “art of the possible” with good governance
Who in the organization would exceed their KPIs if they made better data-driven decisions?
What’s the benefit of everyone—engineers, analysts, executives, board members—speaking the same language when it comes to data?
How would previews of downstream impact and automated alerts change the team culture? (See the lineage sketch after this list.)
What does a trusted data team look like?
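A “preview of downstream impact” is, at its core, a walk over the lineage graph. The toy sketch below shows the idea with a hand-made graph of invented asset names; in practice, lineage would come from your catalog or orchestration tool:

```python
from collections import deque

# Toy sketch of a downstream-impact preview: given a lineage graph
# (asset -> assets that consume it), find everything affected by a
# change to one upstream table. The graph and asset names are
# invented; real lineage comes from a catalog or orchestrator.

LINEAGE = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["marts.revenue", "marts.churn_features"],
    "marts.revenue": ["dashboards.exec_kpis"],
    "marts.churn_features": ["models.churn_predictor"],
}

def downstream_assets(changed: str, lineage: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk collecting every asset downstream of `changed`."""
    seen, queue, impacted = {changed}, deque([changed]), []
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted

print(downstream_assets("raw.orders", LINEAGE))
# -> ['staging.orders_clean', 'marts.revenue', 'marts.churn_features',
#     'dashboards.exec_kpis', 'models.churn_predictor']
```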
🚀 Goal #3: Accelerate insights, data projects, and AI
Target audience: Use this argument with people who care most about the business value of your company’s data (e.g. C-Suite and Finance team).
The main issue here is inefficiency. Without clear data visibility, teams spend too much time searching for data, delaying business insights and innovation.
How to help stakeholders think about this issue
Ask stakeholders to rate the following on a scale from 1 to 5:
Data visibility: We have full visibility into our data estate’s contents, usage, and value.
Data use by consumers: Data consumers can easily find and understand data on their own, and we have clarity around which data is being used to generate business insights.
Productivity: The data team is mostly doing high-value work, effectively utilizing resources and generating new assets (e.g. tables, dashboards, and models) rather than maintaining old ones.
For any low-rated statements, highlight these consequences:
Delays in impact: Limited understanding of available data can cause project delays and push back potential revenue impacts.
Confusion among data consumers: The data team can spend most of their time answering repetitive questions while data consumers struggle to quickly launch new projects.
Wasted time for the data team: Data people may end up spending most of their time maintaining existing pipelines, leaving little bandwidth for revenue-generating data projects.
How to demonstrate the “art of the possible” with good governance
How would easily discoverable data assets and products accelerate time to insight and time to value?
If the data team spent all their time on creating new assets and models, what could they deliver for the business this year?
How would adding detailed metadata to each data asset improve the performance of natural language queries over your data? (See the sketch below.)
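To see why metadata matters for natural language querying, consider that a query like “monthly recurring revenue” has nothing to match against when assets are undocumented. The sketch below uses simple keyword overlap to make the point; real systems rely on embeddings or an LLM, and the asset names and descriptions are invented:

```python
# Toy sketch: why richer metadata helps natural language search.
# A real system would use embeddings or an LLM; keyword overlap is
# enough to make the point. Asset names and descriptions are invented.

ASSETS = {
    "fct_mrr": "Monthly recurring revenue by customer, computed from subscriptions.",
    "dim_cust": "",  # undocumented asset: invisible to search
    "fct_churn": "Monthly customer churn events with cancellation reasons.",
}

def tokenize(text: str) -> set[str]:
    """Lowercase, strip simple punctuation, split into a set of words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())

def search(query: str, assets: dict[str, str]) -> list[str]:
    """Rank assets by keyword overlap between the query and their description."""
    q = tokenize(query)
    scored = sorted(
        ((len(q & tokenize(desc)), name) for name, desc in assets.items()),
        reverse=True,
    )
    return [name for score, name in scored if score > 0]

print(search("monthly recurring revenue by customer", ASSETS))
# -> ['fct_mrr', 'fct_churn']  (the undocumented dim_cust never surfaces)
```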
💸 Goal #4: Realize ROI on the data stack
Target audience: Use this argument with people who care most about your data infrastructure (e.g. CDOs and data leaders).
Achieving ROI on an expensive data stack is critical. The hallmark of this issue is poor performance, often caused by data sprawl, poor visibility, and a lack of documentation, which leave companies with suboptimal ROI and wasted resources.
How to help stakeholders think about this issue
Ask stakeholders to rate the following on a scale from 1 to 5:
Data sprawl: There is minimal data sprawl across systems and tools, making it easy to find and use data assets.
Data documentation: Data assets have rich metadata and documentation for better understanding and modeling.
Use cases: We are focused on high-value use cases and can confidently make big bets on leading edge capabilities like AI.
For any low-rated statements, highlight these consequences:
Overspending on data team and infrastructure: A sprawling data estate can lead to overspending on redundant or low-value data assets, while the data engineering team loses productivity and morale.
Low ROI: Poor documentation and understanding can result in underperforming BI tools and data assets.
Weak adaptability: When it’s difficult to adopt the latest tools and architectures, you can end up with subpar results for AI models and other critical use cases.
How to demonstrate the “art of the possible” with good governance
What company objectives would benefit from better data insights and models? Would increased BI tool usage indicate this?
What is the projected spend on your data stack next year? What value would saving 25% of it offer? (A quick calculation follows this list.)
What could be unlocked by achieving the remaining percentage of your vision for a modern data stack?
How do you measure the performance of data and AI models? What could improvements in performance mean for your business?
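The 25% question above is easy to make concrete with back-of-the-envelope math. The spend figure in the sketch below is purely illustrative:

```python
# Back-of-the-envelope savings calculation for the 25% question above.
# The projected spend figure is purely illustrative.

projected_spend = 2_000_000   # hypothetical data stack spend next year, in dollars
savings_rate = 0.25           # the 25% reduction posed in the question

savings = projected_spend * savings_rate
print(f"Projected spend: ${projected_spend:,.0f}")
print(f"Value of a {savings_rate:.0%} reduction: ${savings:,.0f}")
# -> Projected spend: $2,000,000
#    Value of a 25% reduction: $500,000
```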
📚 More from my reading list
How many data roles does it take to screw in a dashboard? by Elliott Stam
The point of a dashboard isn't to use a dashboard by Terence Eden
Rethinking analyst roles in the age of generative AI by Ben Lorica
How to communicate data effectively by Olga Berezovsky and Thomas Schmidt
How we created data science personas for Spotify’s analytics platform by Federica Luraschi and Serena Fang
Mastering dashboard design: from good to unmissable data visualizations by Seoyeon Jun
Your guide to AI: September 2024 by Nathan Benaich and Alex Chalmers
How serious is your org about data quality? on r/dataengineering
Top links from last week:
Mapping the data technology landscape by Dylan Anderson
The life cycle in a data science project by David Andrés
A crash course on graph neural networks by Avi Chawla