How companies are making Forrester’s idea of modern data cataloging a reality
The unified control plane in action
Last week, I explored a major shift in the data world — a transformation that extends beyond just data cataloging to encompass governance, quality, security, observability, and more.
This shift was captured in The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 report, which marked a fundamental rethinking of what a data catalog can and should be. Forrester explained that we’re now in the midst of a “transformation” in the data cataloging space, driven by the rise of GenAI, increasingly fragmented data estates, the diverse needs of data users, and the critical nature of business use cases.
As a result, the best data catalogs are no longer just catalogs. They’re evolving into comprehensive platforms that go beyond simple cataloging and use AI and automation to integrate and manage a range of metadata-driven capabilities. I think of this as a “unified control plane”, a vital layer for managing context, governance, and compliance across an organization’s data landscape.
But what does this modern data catalog or unified control plane look like in action? How are real companies implementing it today?
In my role as co-founder of Atlan, I’ve had the chance to work with hundreds of organizations on their data cataloging, governance, and overall data strategy. Today, I’ll share stories from two of these companies (anonymized for confidentiality) that are leading the way in making the modern data and AI control plane a reality.
🎥 Streamlining the Streaming Era: How a Unified Control Plane Powers a Multi-Cloud Media Empire
“This is not just a catalog. This is a governance platform as well.”
Prior state: Failed implementation of an open-source catalog
Problems to solve: Data silos, conflicting metrics, and a lack of alignment across teams
Results: Reduced manual effort (an estimated 50-60% of man-hours saved), coupled with increased data trust and dashboard usage
In the fast-paced world of media and entertainment, where millions of viewers tune in across various platforms, the ability to manage and leverage data effectively is crucial. For this major media company, data is at the heart of everything, from personalized marketing to monetizing data products. However, managing their vast, complex data was no easy task.
With 300 million monthly active users, 6 million concurrent users, and 50,000 data points ingested per second, the scale was immense. Their data estate included sources from acquired business units, first-party data, third-party data, and vendor data, all of which needed to be unified to provide actionable insights.
Before adopting Atlan, they attempted to implement an open-source solution. Unfortunately, this approach failed to deliver the consistency and control they needed. Analysts were overwhelmed, struggling to use even a fraction of the available data for domain-level analysis. Data silos and the inability to enforce policies at scale left them inundated with ad-hoc requests, while executives were frustrated by conflicting metrics in critical dashboards, eroding trust in the company’s data.
The transition to a modern data catalog marked a turning point. “This is not just a catalog. This is a governance platform as well,” said a data governance leader at the company. The new platform acted as a central control layer for data producers, consumers, and stewards, enabling trust and transparency across domains and teams. This made it “easy to achieve those governance objectives of having standardization, having set those policies, and automating the process with your metadata”.
Automation was key to their success. For example, they set up a rule so that any asset in the Bronze layer is automatically tagged with the Bronze schema. This didn’t just streamline processes; it freed up data stewards to focus on driving business outcomes. As one data leader put it, “I’m actually using the AI-generated questions as a data testing tool, and it’s amazing. Not just asking the business questions; you can use it for any purpose. There are endless possibilities. This is going to be a game changer.”
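To make the Bronze-layer rule concrete, here is a minimal Python sketch of what a layer-based tagging rule could look like. This is not Atlan's API or the company's actual implementation. The `Asset` class, the `LAYER_TAGS` table, and the `apply_layer_tags` function are all hypothetical names that I made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for a catalog asset (not a real Atlan type)."""
    name: str
    layer: str  # e.g. "bronze", "silver", "gold"
    tags: set[str] = field(default_factory=set)


# Hypothetical rule table: layer name -> tag to apply automatically.
LAYER_TAGS = {"bronze": "Bronze"}


def apply_layer_tags(assets, layer_tags=LAYER_TAGS):
    """Tag every asset whose layer has a rule; return names of assets changed."""
    changed = []
    for asset in assets:
        tag = layer_tags.get(asset.layer.lower())
        if tag and tag not in asset.tags:
            asset.tags.add(tag)
            changed.append(asset.name)
    return changed


# Illustrative asset names only.
assets = [Asset("raw_events", "bronze"), Asset("daily_sessions", "gold")]
changed = apply_layer_tags(assets)
```

The point of the sketch is that the rule is declared once, in a table, and then applied to every matching asset, so stewards don't have to tag assets one by one. Running it a second time changes nothing, which is the behavior you'd want from a rule that runs on a schedule.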
The results were undeniable. The company saw a significant reduction in manual effort, according to the same leader: “I’m confident that it’s saving at least 50-60% of man hours. Before, [we] were spending 2 hours building or writing definitions for a view. Now, a couple of views hardly take 10-15 minutes.” At the same time, usage of their dashboards has increased and they have been able to provide trusted data products to advertisers.
🚗 Accelerating Autonomous Innovation: How an Automaker Used Governance to Unlock a Transformation
“We are currently undergoing a transformation … from an automaker to a platform innovator.”
Prior state: Failed implementation of IBM Knowledge Catalog
Problems to solve: Unclear data usage policies, duplicative efforts, and lack of standardization
Results: Their broad data transformation initiative, including Atlan for governance, has reduced time-to-insight from 28 days to less than 3 hours, and added $300+ million to their bottom line.
In the rapidly evolving world of autonomous and electric vehicles, data isn’t just important — it’s essential. For one of the world’s largest automotive manufacturers, data has become the backbone of their vision to create a safer and cleaner world.
Their data estate is vast, including one of the largest Databricks deployments globally. However, their initial attempt to manage this data with a legacy data catalog failed to gain traction, leaving them with siloed tools and inconsistent practices across business units and regions.
Two of the company’s business units were at the forefront of this challenge. They needed to leverage telemetry data from millions of vehicles to achieve their top strategic goals, including a platform for in-house and third-party development and personalized driver experiences. But there was a significant barrier: each year, internal teams spent the equivalent of 200 FTEs just finding data, let alone understanding it and the policies governing it.
To overcome this, the company embarked on a transformation, with Atlan as the governance platform for these two business units. It quickly became the central nervous system of their data and analytics governance stack, integrating seamlessly with their cloud data warehouses and lakehouses, policy management tools, quality rule engines, and pre-existing metadata repositories. The open platform allowed them to automatically govern data from vehicles at the source using APIs, providing stewards with a view of data lineage and usage across the entire data ecosystem, with end-to-end visibility from the cloud all the way back to on-prem systems.
Their broad data transformation initiative, including Atlan for governance, has reduced time-to-insight from 28 days to less than 3 hours, and added $300+ million to their bottom line. Two additional business units have also adopted Atlan based on this demonstrated success.
Interested in learning more about The Forrester Wave: Enterprise Data Catalogs, Q3 2024? Join us on September 26 for an exclusive conversation with Jayesh Chaurasia, author of the groundbreaking report!
📚 More from my reading list
"We ran out of columns" - The best, worst codebase by Jimmy Miller
What do Large Language Models “understand”? by Tarik Dzekman
The life cycle in a data science project by David Andrés
Eight basic rules for causal inference by Peder M. Isager
Explaining why data & models aren’t always right & getting leaders to act on them by Vin Vashishta
How I use “AI” by Nicholas Carlini
Engineer’s guide to convincing your product manager to prioritize technical debt by Gregor Ojstersek and Robert Ta
Mapping the data technology landscape by Dylan Anderson
You guys have no idea just how much people hate generative AI by Alberto Romero
A crash course on graph neural networks by Avi Chawla
The much broader world of datetime formats by Randy Au
Top links from last week:
Data products = the future of data engineering by Zach Wilson
10 ways to be data illiterate (and how to avoid them) by Jason Liu and Eugene Yan
An open course on LLMs, led by practitioners, by Hamel Husain