What the recent Forrester Wave means for data catalogs
A massive transformation — data cataloging now includes governance, quality, security, monitoring, and more
Quick announcement: Metadata Weekly now has over 11,000 subscribers across Substack and LinkedIn! I’m so thankful for all of you who read and support this newsletter, and I’m excited to keep writing about all things data and metadata. 💙
In the last issue, I talked about why data catalogs are falling short today — in short, the modern data ecosystem and its users are more diverse than ever before, and metadata is itself evolving into big data. Whether they’re technical or universal, even the best data catalogs just can’t keep up, and companies still end up with widespread confusion and silos.
I’m not trying to be a Captain Negative, so what’s the solution? This is where some major news comes into play.
The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 was just released. In this report, Forrester examined today’s most significant catalogs and emerged with plenty of thoughts about what it means to be a modern data catalog.
In today’s issue, let’s examine what we believe this Forrester Wave means for not just data cataloging, but also the data governance, quality, security, and observability categories.
💥 A “massive transformation” in the data cataloging space
Most data people have firsthand experience with the “data wiki”, a data catalog that aims to inventory and document all of a company’s data. It’s expensive to buy, slow to set up, a pain to populate… and ultimately people just don’t want to use it.
For the last few years, analysts have focused on what to add to these traditional data catalogs to make them successful. Forrester talked about machine learning data catalogs in 2018 and 2020, then focused on data catalogs for DataOps in 2022. Meanwhile, Gartner moved from traditional “metadata management” in 2020 to focusing on active metadata in 2022.
And yet, none of these additions seem to have fixed the problem with data catalogs. That’s why Forrester just announced a major transformation in the way it thinks about Enterprise Data Catalogs.
“Like other data management sectors, enterprise data catalogs (EDCs) are witnessing a transformation driven by AI advancements, fragmented and complex data estates, accessibility needs, and strategic imperatives to harness data for competitive advantage. The exponential surge in the velocity, variety, veracity, and volume of data demands solutions that transcend traditional metadata repositories and technical user bases. Customers seek solutions that can bridge the gap between complex datasets, governance, business insights, and AI enablement. Vendors are offering intelligent solutions, integrated AI and ML to automate and enhance data discovery, semantics curation, impact analysis, quality assessment, among other catalog functionalities. They are also improving user experiences to cater to both technical and nontechnical users, thereby supporting the goal of data democratization and self-service.”
– The Forrester Wave™: Enterprise Data Catalogs, Q3 2024 (emphasis added)
Let me highlight that: Forrester said that EDCs today need to “transcend traditional metadata repositories and technical user bases”. In other words, catalogs can’t just be data wikis for technical data people anymore.
So what should a modern data catalog look like?
First, Forrester talked about how basic cataloging is no longer enough. Instead, EDCs need to automatically catalog, analyze, and govern your entire data ecosystem, from traditional databases to SaaS platforms, unstructured data, AI/ML repositories, and more.
“Advanced solutions offer features like automated metadata harvesting, cross-platform semantic mapping, policy enforcement, quality validation, and end-to-end lineage. This holistic approach ensures a complete view of all data assets, including AI/ML models, to enhance governance, compliance, and use across the organization.”
Second, this holistic approach can’t be powered by data stewards doing manual work. Instead, AI and automation are key to quickly rolling out catalogs and creating value with them. Note that this isn’t just about cataloging — it’s also about powering data governance and quality efforts, all within the catalog rather than in separate governance and quality tools.
“Modern solutions… offer advanced capabilities, including AI-assisted data discovery, generative AI (genAI) augmentation, ML-driven profiling, automated anomaly detection, predictive tagging, and proactive compliance reporting. These technologies are crucial for streamlining data governance, enhancing data quality, and unlocking actionable insights.”
Forrester then evaluated various cataloging tools based on what it deemed to be the key capabilities of a modern EDC. But instead of focusing only on the standard aspects of a data catalog (e.g. metadata management, data discovery, data lineage), it also expected capabilities from what we often think of as separate spaces and tools, such as data governance, security, and privacy. Here’s the list of evaluation criteria under “Current Offering” (emphasis is my own):
Metadata management
Data products
Data discovery and profiling
Data lineage
Governance, risk, and compliance
User interface and user experience (UI/UX)
Deployment and time to value
Data quality and observability
Monitoring and alerts
Data privacy and security
Workflow and task management
Integration
Collaboration capabilities
Marketplace and exchange
Real time, IoT, and edge
Advanced capabilities
In short, Forrester is drawing a line in the sand, arguing that we are now witnessing a “transformation” in the data cataloging space, driven by GenAI, fragmented data estates, diverse user needs, and business-critical use cases. As a result, the best data catalogs can’t just be catalogs anymore. Instead, they should use AI and automation to take over other metadata-driven capabilities like data governance, security, observability, and monitoring.
This is a huge shift, but I think it’s a good one. The data space is incredibly fragmented these days, so if we can merge several different spaces and tools into one, it’s ultimately better for users.
I personally think of this new idea of the EDC, the catalog that’s more than just a catalog, as a unified control plane — a comprehensive layer that can manage context, governance, and compliance across diverse tools and for diverse users.
⛰️ Recognition of the impact customers have with Atlan
Not to bury the lede but… we were named a Leader in the Forrester Wave™: Enterprise Data Catalogs, Q3 2024, with the highest scores across all vendors in the “Current Offering” and “Strategy” categories!
Click here to view a complimentary copy of the report, including the Wave graphic.
Atlan got the highest score possible in 15 criteria, including Data lineage; Governance, risk, and compliance (where we were the only company to score a 5/5); Adoption; and Deployment and time to value. The report recognized us as “an unparalleled partner” for organizations “aiming for democratization and AI-enhanced self-service to governed data”.
“Atlan differentiates itself with a personalized, AI-driven catalog, providing quick value… Atlan’s Third-Gen Data Catalog is quickly outpacing established players by adeptly anticipating and addressing strategic customer needs through automation. Atlan is a visionary player with a clear, ambitious goal: to become the data and AI control plane enabling complex business use cases.”
With the highest possible scores in criteria like Vision, Innovation, and Roadmap, we’re more confident than ever about our vision of building a data and AI control plane, powered by active metadata, with complete configurability, interoperability, and openness to power every data team in every industry, however unique and complex their need.
📚 More from my reading list
World’s first major AI law enters into force by Ryan Browne
10 ways to be data illiterate (and how to avoid them) by Jason Liu and Eugene Yan
The space between data dogmatism and data nihilism by Jason Ganz
Data products = the future of data engineering by Zach Wilson
A crash course on model calibration by Avi Chawla
Mixed model arts - the convergence of data modeling disciplines by Joe Reis
We need positive visions for AI grounded in wellbeing by Joel Lehman and Amanda Ngo
An open course on LLMs, led by practitioners, by Hamel Husain
US regulators fine Citi $136 million for failing to fix longstanding data issues by Michelle Price, et al.
Top links from last week:
The data professional’s cheat sheet for working with stakeholders by Jerrie Kumalah
How top data teams are structured by Mikkel Dengsøe
The three biggest data problems companies face by Dylan Anderson