Why we failed 3x to set up a data catalog, the potential of data warehouse, and more

✨ Spotlight: We Failed to Set Up a Data Catalog 3x. Here’s Why.

Jan 26, 2022

Welcome to this week's edition of the ✨ Metadata Weekly ✨ newsletter.

Every week I bring you my recommended reads and share my (meta?) thoughts on everything around metadata.

In this edition, I am sharing some of my recent favorite reads, the lessons from our attempts to build a data catalog, and more. Let’s get started! 👇

We Failed to Set Up a Data Catalog 3x. Here’s Why.

For those of you new here, before Atlan, I started a data for social good company. Our team acted as a “data team” for our customers, so we experienced all the chaos and frustration of dealing with large-scale data firsthand. We were awoken with crisis calls every couple of days for even tiny issues, from troubleshooting why a number on a dashboard was incorrect to trying to get access to the right data set.

We worked with a wide variety of data, everything from 600+ government data sources to unstructured data sources like satellite imagery. Our data grew faster than we expected, and we hadn’t really planned how to store or access it beforehand. We quickly realized we needed a central repository to help our team discover, understand, and build trust in all the data sets we were working with.

We thought it would be easy enough to figure this out, but we couldn’t have been more wrong. Here’s the story of how it took us 4 attempts and 5 years to finally implement a data catalog that worked for our team. Some of the lessons we learned along the way:

The real challenge lies in building relevancy into data discovery — i.e. being able to curate and tag datasets and metadata so that our knowledge graph could build meaningful relationships and our search algorithms could understand what data was actually relevant to a user.
To create true shared context, the right solution needs to be inclusive. On one hand, the solution needs to make it easy for an engineer to push in metadata via APIs in their pipeline tool. On the other hand, business users need an easy UI or Slack integration to add their context.
Start with the user experience, make something that your team would want to use, and enable data teams to become more productive through embedded collaboration. Embedded collaboration is about work happening where you are, with the least amount of friction.
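
To make the first two lessons a little more concrete, here is a minimal sketch of what "pushing metadata from a pipeline" and "adding context as tags" could look like, and how tags might feed search relevance. Everything in it is hypothetical: `TinyCatalog`, its method names, the dataset names, and the scoring weights are illustrative assumptions, not how Atlan or any real catalog works.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record: a dataset name plus human-added context."""
    name: str
    description: str = ""
    tags: set[str] = field(default_factory=set)


class TinyCatalog:
    """Hypothetical in-memory catalog, for illustration only."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, name: str, description: str = "") -> DatasetEntry:
        # Meant to be called from pipeline code whenever a dataset is produced.
        entry = self._entries.setdefault(name, DatasetEntry(name))
        if description:
            entry.description = description
        return entry

    def add_tags(self, name: str, *tags: str) -> None:
        # Meant to sit behind a simple UI or chat integration for business users.
        self._entries[name].tags.update(tag.lower() for tag in tags)

    def search(self, query: str) -> list[str]:
        # Assumed weights: a curated tag counts more than a name match,
        # which counts more than a description match.
        terms = query.lower().split()
        scored = []
        for entry in self._entries.values():
            score = 0
            for term in terms:
                if term in entry.tags:
                    score += 3
                if term in entry.name.lower():
                    score += 2
                if term in entry.description.lower():
                    score += 1
            if score:
                scored.append((score, entry.name))
        # Highest score first; ties broken alphabetically by name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored]


catalog = TinyCatalog()
catalog.register("district_rainfall", "Monthly rainfall by district")
catalog.register("satellite_tiles", "Raw satellite imagery tiles")
catalog.add_tags("satellite_tiles", "imagery", "remote-sensing")

print(catalog.search("imagery"))
```

The point of the sketch is the split of responsibilities: `register` is the engineer-facing path and `add_tags` is the business-user path, and both write to the same record so that search can draw on shared context.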

Read more about the lessons from our multiple failed attempts to build a data catalog for our team here.

👀 Fav Weekly Read

We’ve only scratched the surface of the full potential for the data warehouse by Mikkel Dengsøe

Across teams at Atlan, we rely on 50+ SaaS applications. Most of them try to build their own version of "primitive" analytics or dashboards, and the results are pretty poor. As the data warehouse becomes an integral part of companies' technology stacks, is there a case for services and products built on top of it? What if the future looked like a set of "add-ons" built on the data warehouse?

“The data warehouse already collects data from all sources. It will soon be able to send that data to any tool you want and operational and sales & marketing teams will use it as the core decision engine for how they work. Finance teams will be able to trust the data and start pushing for moving away from spreadsheet to using the data warehouse. What happens in this world?”

We’re clearly entering a new era, where the data warehouse is the new backend. I loved Mikkel’s breakdown of the five phases of the data warehouse: business intelligence, operational tools, sales & marketing, finance, and everything else.

“The data warehouse will be the control centre for companies in the future. It will expand from analytics and become the core of sales, operations, finance and much more.”

I love articles like this that open up conversations about where we're headed, and they make me excited about the future of the modern data stack. We do have a long way to go, but there are some very exciting possibilities ahead of us!

📚 Other articles from my reading list

How the cloud will be reshuffled by Erik Bernhardsson
An experience of a “Data Ecosystem” by Antriksh Goel
Creating a data road map by Ilan Man

🐦 From Twitter

This tweet from Rahul Jain might be the most profound tweet I’ve read in 2022! When we launched Atlan, the first tagline on our website said, “Data can be chaos. Work shouldn’t be.”

I personally believe that embracing the reality and imperfections of the data space is one of the most important mental shifts you need to be successful (and maintain sanity!). I’d highly recommend that every data person read about the concepts of wabi-sabi and Anekantavada.

Rahul Jain @rahulj51

Two eastern philosophical concepts that have helped me with comprehending and navigating the data space better /1

I'll see you next week with more interesting stuff around the modern data stack. Meanwhile, you can subscribe to the newsletter on Substack and connect with me on LinkedIn here.
