Happy New Year!

Best of my readings and writings from 2021 ✨

Jan 04, 2022

Happy New Year, folks! 🤩 Hope you spent some time reflecting, celebrating, and acknowledging all that you were able to do in 2021.

Every week, I bring you my recommended reads and share my (meta?) thoughts on everything around metadata. Super excited to continue this journey through the evolution of the modern data stack in 2022.

Before I go back to our regular dose of Metadata Weekly next week, I wanted to ask you to share the newsletter with two of your friends/colleagues – help them start their new year with a dose of modern data! :)

In last week’s edition of the newsletter, I shared a list of my favorite blogs from the year 2021, along with some follow-up reading, to help you stay updated on this year’s emerging ideas around the modern data stack. Check it out here.

Maybe it's just me, but I honestly still can’t believe that we’re ALREADY in 2022! Wow. What a year it has been. 2021 was the first year that I managed to stick to a regular writing schedule, and I even launched this newsletter! So, given that this week might be the last week I can get away with a reminiscing post, I want to take a walk down memory lane and share my most-read blogs of 2021.

On data culture and dream data teams

Data teams are built from analysts, engineers, analytics engineers, scientists, business users, product managers, and more — all with their own tooling preferences, skillsets, and limitations. The result is a mess of collaboration overhead and data chaos.

This makes me believe that, as we enter 2022, the conversation needs to shift from the need for better tooling to the next “delta” that will finally help us create dream data teams — the modern data culture stack. These are the best practices, values, and cultural rituals that will help us diverse humans of data come together and collaborate effectively.

In this talk at dbt Coalesce 2021, I introduced the term "the modern data culture stack" and shared my experience building dream data teams, how we created our culture stack, and the best practices we found after lots of trial and error.

Read this blog here to learn how these practices enabled our team to innovate, build trust, and collaborate better. ✨

Additional Reading:

Data documentation woes? Here’s a framework by me
Building a data-centered culture at Ironclad by Jessica Cherny
How Postman’s data team works by me

On data governance

Data governance has a serious branding problem! It's seen by most as a way to impose control over data, processes, and people. It is usually framed around protection and risk (i.e. we have to govern our data to decrease our risk). Over the years, data governance has lost its identity. People fear it, when they should be celebrating it — because fundamentally it’s about creating better data teams, not controlling them.

As data teams become more mainstream and the modern data stack makes it easier to ingest and transform data, the lack of data governance practices has become one of the top barriers preventing data teams from being agile and driving impact.

For the first time, the need for governance is being felt bottom-up by practitioners, instead of being enforced top-down due to regulation. This bottom-up adoption is an opportunity for us to finally get data governance right. I wrote about this and more in my blog.

Additional Reading:

Overcoming data silos by Bita Motamedian
{Podcast} The keys to good data quality with Travis Lawrence

On active metadata and Data Catalogs 3.0

In 2021, we witnessed an incredible moment for #metadata in the modern data stack! Gartner took a huge step by scrapping its Magic Quadrant for Metadata Management Solutions and replacing it with a Market Guide for Active Metadata.

With Gartner's introduction of Active Metadata as a new category for the future, the modern data stack took a transformational leap in how it approaches metadata. Active metadata will become the force behind augmented data catalogs, autonomous DataOps, data fabric and data mesh, data and analytics governance, and consumerization of data tools.

I believe the active metadata platform of the future will be built on a few principles:

One size doesn’t fit all in augmented data management: Every company, industry, and context is different, and a single ML algorithm won’t solve all your data management problems. Data Catalog 3.0 era tools understand this and build programmability and customization into AI/ML algorithms.
Open by default will drive infinite metadata-driven use cases: Metadata will be key to unlocking several futuristic operational use cases in the modern data stack, like auto-tuning pipelines and CI/CD pipelines. For this, the fundamental metastore needs to be open to allow teams to innovate. I'm really bullish about the idea of a metadata lake (a unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph) and excited to see this catching on in the industry!
Context should be embedded into teams’ daily workflows: Nobody wants to go to a separate "catalog" to get context about their data assets. Active metadata platforms understand this and ensure that context is available in the tools users use every day — BI tools, Slack, JIRA, and more!

Additional Reading:

Data Catalog 3.0: Modern metadata for the modern data stack by me
Data Catalog for Data Mesh by Arup Nanda
We failed to set up a data catalog 3x. Here’s why by me

On data strategy

There’s no one path to creating a data strategy. Every company is unique, every business is unique, every industry is unique, and so every company’s path is going to be unique. I think the key is examining your own needs and focusing on building the right "advantages" for you — i.e. the data investments that will help you build sustainable competitive advantages to outperform your competitors.

I sketched out the Data Advantage Matrix, a framework to help leaders and companies figure out what types of data advantages they want to build and how far to advance them. When you think about finding your data strategy, start with the basics. Consider the different types and levels of advantages that you could build, start with the lowest-hanging fruit that can create a meaningful impact, and just keep iterating from there.

Additional Reading:

The 4 kinds of “Data Moats” your company can build by me
Why do most analytics efforts fail by Crystal Widjaja

See you next week!
