Why data governance fails in today’s AI world
The 4 patterns that lead to a failed data governance deployment, according to interviews with data leaders
Welcome back to this cozy corner of the internet where I share my (meta 😛) thoughts on everything metadata.
You may have noticed it’s been a while since the last issue. We’ve been on a bit of an unexpected hiatus. Over the past year, boards have consistently asked CIOs and CDOs about their AI roadmaps, and those leaders have realized that the main hurdle isn’t AI models but the lack of AI-ready data: data enriched with business context, trust, and security. As demand for AI data readiness and governance skyrocketed, we were catapulted into that exciting hockey stick growth, and I’ve spent the better part of this year preparing us for our next phase of scale.
Last week, we announced our $105 million Series C. (Read the coverage in the Wall Street Journal, TechCrunch, Reuters, and Datanami.) I’ve been pinching myself as I look at how far we’ve come, not just from our last fundraise but from our very first days. I’m so grateful for everyone who has trusted us, from our customers who see us as partners, to our phenomenal global team, to our investors who believe in our vision.
I can’t wait to see what lies ahead, not just for Atlan but for the data world as a whole. There’s so much to talk about in today’s AI-superpowered world, and you know I’m bringing all the spicy takes and hot debates to Metadata Weekly.
I’m also excited to open up this newsletter to include more voices and perspectives from the Humans of Atlan. We now have an amazing tribe of 300 passionate Atlanians who are building the future of (meta)data and data governance.
Today, I’m excited to welcome Sharif Karmally to Metadata Weekly. Sharif has been interviewing dozens of data leaders to find the root cause of why traditional data governance fails in today’s era, and we think we’ve found the 4 patterns that make for a failed data governance deployment.
✨ The 4 reasons that data governance fails
Can you guess what data leaders are focusing on in 2024? We’ll give you a hint — despite the hype, it’s not solely AI.
At this year’s Gartner Data & Analytics Summit, we surveyed over 600 data leaders. Their biggest priority, across industries and companies, is actually data governance.
With the explosive growth of AI, CIOs and CDOs are constantly being asked about their AI roadmap. The main hurdle, they’ve realized, isn’t the AI models. It’s the lack of AI-ready data, or data that is trustworthy, secure, and enriched with business context. According to studies from IBM and AWS, the key obstacles hindering generative AI implementations are data privacy, data lineage, and data quality — all key elements of data governance.
In interviews with data governance leaders, we’ve learned some surprising patterns about why data governance is such a challenge. Here are the 4 reasons that most data governance efforts fail in enterprises today.
1. Companies lack a “common currency” for talking about data
Data governance relies on information flowing from person to person and team to team. However, one of the key problems is that different groups of people talk about data differently.
For example, data producers may come into a conversation talking about data in terms of data pipelines or Python code, while data consumers are talking about SQL code and their data visualization needs. Meanwhile, data stewards are talking about policies and privacy regulations, such as HIPAA or GDPR. In short, each group has its own “currency”, or language for exchanging information about data.
Data engineers are often tasked with sharing their particular currency, i.e. their pipeline or code knowledge, with the rest of the company. However, they’re usually asked to do this by manually filling out documentation, which is impractical and puts too much of a burden on them.
The result: data producers can’t effectively translate their knowledge into a common currency, and incomplete or missing documentation leaves everyone in the dark. Here’s how one data leader described what it took to understand a data set before using it:
“You needed to determine who knew in which database your data was located, assuming it existed. So you were trying to find someone that you didn’t know existed, and you didn’t know if that person wanted to talk to you or had the time. If you found that person, you wouldn’t find anything that resembled metadata. You wouldn’t know what a column meant.”
– Data Governance Analyst at an insurance and banking company
Governance relies on documentation to create a shared currency, but writing it all by hand is just too difficult nowadays. While data leaders certainly can (and do!) ask that producers spend time on this, the reality is that it usually won’t happen without some level of embedded automation.
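To make “embedded automation” a little more concrete, here’s a minimal sketch of what it could look like: generating a documentation stub straight from schema metadata and flagging the columns nobody has described yet. Everything here is hypothetical and purely for illustration, including the `Table` and `Column` classes, the `claims` table, and its column names. It isn’t how any particular tool works.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""  # filled in by a producer or steward, if at all


@dataclass
class Table:
    name: str
    owner: str
    columns: list[Column] = field(default_factory=list)


def undocumented_columns(table: Table) -> list[str]:
    """Return the names of columns that have no description yet."""
    return [c.name for c in table.columns if not c.description.strip()]


def render_doc_stub(table: Table) -> str:
    """Build a plain-text documentation stub from schema metadata."""
    lines = [f"Table: {table.name}", f"Owner: {table.owner}", "Columns:"]
    for c in table.columns:
        desc = c.description.strip() or "TODO: needs a description"
        lines.append(f"  {c.name} ({c.dtype}): {desc}")
    return "\n".join(lines)


# Hypothetical example table; in practice this metadata would be read
# from a warehouse or pipeline rather than typed in by hand.
claims = Table(
    name="claims",
    owner="claims-data-team",
    columns=[
        Column("claim_id", "string", "Unique identifier for a claim"),
        Column("amt", "decimal"),
        Column("status_cd", "string"),
    ],
)

print(render_doc_stub(claims))
print(undocumented_columns(claims))
```

The point of the sketch is the division of labor: the structure of the documentation comes from the system, so producers only need to fill in the meaning that a machine can’t infer.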
2. Governance is too rigid, expecting people to adapt to it
In today’s fast-paced data world, flexibility isn’t just a nice-to-have. It’s a necessity. At Atlan we support hundreds of organizations on their data governance, and I’ve yet to come across two governance practices that look exactly the same. Rigid data governance frameworks and applications aren’t able to accommodate the diversity of people (e.g. domains and roles), processes, and technology within a company.
Ownership, stewardship, and responsibility are more than likely completely separate (or non-existent) things when you’re just getting started. Some domains might have a data steward, but that person isn’t responsible for the quality of the data. Some data teams might own the management of a particular application’s data, but they are not stewarding how the data is used.
“You might find in any company across the globe that there are dashboards, KPI owners, data fields, data health checks, data consumers, and not everything is connected. They’re flowing in an ocean, and no one understands their meaning or how they’re related.”
– Data Governance Lead at a digital experiences platform
One of the biggest tasks for data governance leaders today is moving from prescriptive governance to flexible processes and technology that adapt to their company’s specific structures and needs, rather than expecting people to adapt to them.
This will often involve identifying technical data owners, identifying and enabling (and sometimes temporarily playing the role of) data stewards, and building relationships with the domain leaders who are ultimately responsible for the data they produce and consume.
3. Governance is siloed and disconnected from daily work
In most companies, the data landscape is like a disconnected archipelago where the policies governing data live apart from data work. These essential rules are often housed in obscure corners of a company’s repositories, languishing as static digital files or even in paper documents, detached from the daily tasks that data people carry out.
This means that, no matter how well-intentioned a company and its employees are, data governance policies and rules are often not actually put into practice. When these rules aren’t built into data people’s routines, it can be tough to follow them to a tee. People may miss important aspects of data governance while deep in daily work, or they may forget about them entirely.
“One of the biggest challenges is context propagation from different business teams to the data producers. The goal is to transfer context from policy definers who know the scope of a regulation directly to data producers’ work, which can keep the dataset compliant and roll out access in line with policies.”
– Data Leader at a financial services company
Bringing policies from data stewards’ documents into data producers’ processes, in a way that reflects the way they work, is one of the keys to making data governance a reality. Ideally, this involves modern technology that automates these policies within daily data work, through embedded code rather than distant, oft-forgotten PDFs.
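As a rough illustration of policies living in code rather than in a PDF, here’s a small sketch where a steward defines rules once and an access check applies them automatically. The tags, roles, and the `readable_columns` function are all assumptions made up for this example. Real policy engines are far richer, and this isn’t meant to represent any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    tag: str                  # classification the policy applies to
    allowed_roles: frozenset  # roles permitted to read tagged columns


@dataclass(frozen=True)
class Column:
    name: str
    tags: frozenset


# Hypothetical policies a steward might define once, in code.
POLICIES = [
    Policy(tag="PHI", allowed_roles=frozenset({"clinical_analyst"})),
    Policy(tag="PII", allowed_roles=frozenset({"clinical_analyst", "support"})),
]


def readable_columns(columns, role, policies=POLICIES):
    """Return names of columns the role may read under every matching policy."""
    allowed = []
    for col in columns:
        matching = [p for p in policies if p.tag in col.tags]
        # A column with no matching policy is treated as unrestricted here;
        # a stricter setup might deny by default instead.
        if all(role in p.allowed_roles for p in matching):
            allowed.append(col.name)
    return allowed


# Hypothetical dataset with classification tags attached to each column.
patients = [
    Column("patient_id", frozenset({"PII"})),
    Column("diagnosis", frozenset({"PHI"})),
    Column("visit_date", frozenset()),
]

print(readable_columns(patients, "support"))
print(readable_columns(patients, "clinical_analyst"))
```

Because the check runs wherever access is granted, nobody has to remember the rule while deep in daily work. The policy travels with the data.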
4. Governance is closed off from the latest technology
Closed data systems are like walled gardens. They may offer security and control, but they also limit innovation and collaboration.
In search of privacy and protection, enterprises often lock their data in closed, legacy systems. However, this can hinder them from using their data. Without the ability to openly build on top of their data platform, companies end up locking themselves out of new initiatives like AI.
“We need a tool that won’t just be adopted by our business users of today, but support our future state initiatives, like AI readiness when LLM is a data consumer, and our data products.”
– Vice President at a Fortune 100 company
Of course, protecting your data is key. But as organizations seek to build modern governance systems for modern technology (such as generative AI or data products), overly rigid systems often end up holding them back. Just imagine working in a Word document while your colleagues get to play with the latest technology. This is often the case for data governance people today!
Open, extensible foundations are key to good data governance, since they allow technical teams to build applications, create custom systems, and use the latest tech to govern complex data ecosystems, including governing AI.
🎤 Join us at Re:Govern tomorrow
Tomorrow we’re taking on these problems and more at Re:Govern, the industry conference for a new era of data and AI governance, featuring data leaders from General Motors, Dropbox, Patagonia, Accenture, and more.
Join us to learn about the future of data and AI governance technology — one that enables responsible AI, inspires collaboration on business outcomes, and embraces flexibility.
📅 Tuesday May 14, 2024
⏰ 1 pm ET
📍 Virtual
📚 More from my reading list
The 2024 MAD (machine learning, AI & data) landscape by Matt Turck
Machine unlearning by Ken Liu
How do machines ‘grok’ data? by Anil Ananthaswamy
How to build successful business cases by Yordan Ivanov
The unbearable lightness of… naming things by Elena Dyachkova
AI and the workplace by Charlie Guo
Everything’s small data again by Randy Au
Women Lead Data podcast by Lindsay Murphy