There is no single “single pane of glass” in the data world, WTF happened at Data Council, and more

✨ Spotlight: What is a dataset?

Apr 05, 2022

Welcome to this week's edition of the ✨ Metadata Weekly ✨ newsletter.

Every week I bring you my recommended reads and share my (meta?) thoughts on everything metadata! ✨

If you’re new here, subscribe to the newsletter and get the latest from the world of metadata and the modern data stack.

There is no single “single pane of glass” in the data world

I was on a panel at this year’s Data Council with other data leaders from the community – all of us sharing our own thoughts on what a dataset actually is, and which aspect of the data stack should own the “single pane of glass” or single source of truth about our data assets.

My take: There is no single “single pane of glass” in the data world.

Here’s why: Data operates at the intersection of technology and business. Successful data projects require very different personas and skill sets: data engineers, analysts, scientists, and business users. Each of these personas is incredibly different, and context means different things to each of them.

For example, consider something as simple as a “table”. For a data engineer, the context that matters is: Where does this data asset come from? Are the connected pipelines working, or did they break? Was the data delivered within its SLAs?

For the same table, context means something fundamentally different to a data analyst. The analyst wants to know: What do the column names mean? What are the “gotchas” I should know about this table before I start analyzing it? Are there missing values? What is the frequency distribution for this variable? What other kinds of analysis have been derived from this table?

The “single pane of glass” for an analyst is incredibly different from the “single pane of glass” for a data engineer. I could go on about how the experience should be different for an analytics engineer, and so on.
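To make the table example concrete, here is a minimal Python sketch of one shared metadata record projected into different views per persona. Everything in it is a hypothetical illustration: the `TableMetadata` fields, the persona names, the `view_for` helper, and the sample `orders` table are assumptions made for this sketch, not the schema or API of any real catalog.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TableMetadata:
    """Shared metadata for one table (hypothetical fields, for illustration)."""
    name: str
    upstream_sources: List[str]            # where the data comes from
    pipeline_status: str                   # e.g. "healthy" or "broken"
    sla_met: bool                          # was the data delivered on time?
    column_descriptions: Dict[str, str]    # what each column means
    gotchas: List[str]                     # known caveats before analysis
    missing_value_rate: Dict[str, float]   # share of missing values per column


# Which fields each persona sees. The persona names and the field
# selection are assumptions made for this sketch.
PERSONA_FIELDS: Dict[str, List[str]] = {
    "data_engineer": ["upstream_sources", "pipeline_status", "sla_met"],
    "data_analyst": ["column_descriptions", "gotchas", "missing_value_rate"],
}


def view_for(persona: str, table: TableMetadata) -> dict:
    """Project the shared record down to the context one persona cares about."""
    fields = PERSONA_FIELDS[persona]
    return {"table": table.name, **{f: getattr(table, f) for f in fields}}


# Made-up sample data for a single table.
orders = TableMetadata(
    name="orders",
    upstream_sources=["shop_db.orders"],
    pipeline_status="healthy",
    sla_met=True,
    column_descriptions={"amount": "Order total in USD"},
    gotchas=["Refunds appear as negative amounts"],
    missing_value_rate={"amount": 0.02},
)

print(view_for("data_engineer", orders))
print(view_for("data_analyst", orders))
```

The point of the sketch is that both views read from the same underlying record. Only the projection changes per persona, so adding a new persona (say, an analytics engineer) would mean adding one more entry to `PERSONA_FIELDS` rather than building a separate tool.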

Our problem in the data tooling space is that we haven’t fully embraced that diversity isn’t a bug. It's a fundamental feature of our data teams. And when we realize that this diversity is our biggest strength, we can stop approaching data tooling as an either/or problem (e.g. building for data engineers or analysts) and instead build tools that make it easier for these diverse personas to come together and collaborate seamlessly.

I believe there’s a lot we have to learn here from consumer experiences like Netflix. For example, each and every one of us has a unique personalized experience on Netflix. Can’t we create similar, personalized experiences for different data users based on their roles, teams, and projects? We absolutely can, and this is where principles of metadata activation and personalization really come alive.

So my hot take: We don’t need a single “single pane of glass”. We need a common platform that allows us to deliver personalized experiences to diverse data people 🙂

On that note, if you’re like me and geek out on the future of information discovery, I really enjoyed the article from Minn Kim on “Personalized discovery engines for knowledge”.

“The next massive search engine will be a personalized discovery engine. As we experiment with new platforms like AR/VR and explore new models of engagement in web3, I imagine that the next wave of human-computer interaction will focus more on ‘pushing’ us the right, relevant information versus the user ‘pulling’ for information... This Internet of the future will feel a lot more like a collaborative library — a place to browse and unearth ideas, publish your thoughts, encounter new opportunities, and make connections across bodies of knowledge.”

❤️ Fave links from last week

WTF happened at Data Council and what does this mean for our community?

Everybody has been asking, what really happened at Data Council?! There were some Wednesday fights, some FOMO... As Drew called out in his post a few weeks ago:

“Data Council has long billed itself as ‘the no-bullshit data conference’: it has historically been a conference of, by, and for practitioners featuring deeply technical talks on the state of the art in data. This year was... I don’t know... not that. The resounding takeaway that I heard from people in the hallways and happy hours was: ‘There are a lot of vendors and VCs here.’ I felt it too; it was palpable and ironic in a ‘you’re not sitting in traffic, you are the traffic’ kind of way, but it was true nonetheless.”

After two years of the pandemic, this was our first in-person event as a community. And for better or for worse, it wasn’t how our data community expected it to be. If I’m being honest, I had moments where I asked myself existential questions — “Is this modern data stack thing real or is it just a group of vendors fueled by VC money who are building up this crazy hype?”, “Are we living in an echo chamber?”, “Is data twitter really a thing, or is it the same people, saying the same thing?”, “Do data leaders actually care about unbundling vs bundling?”, “Where are the data practitioners in this whole thing?”

Anna Filippova’s piece in this week’s Analytics Engineering Roundup is the most nuanced take on the subject, and I think I fully subscribe to it.

“This is a normal cycle in an industry. The folks that gathered last week are just one cohort. The folks starting new jobs in applied data today are another... There were over 128K jobs of various levels posted last week in the US alone.”

The reality is that many data practitioners (especially the vocal ones) have made the transition from practitioner to vendor, mainly due to their ability to take what worked in a single organization and share it across the ecosystem. The current cohort is made up of a core set of people who all know each other and are (mostly) friends. We all follow the same people on Twitter, read the same Medium posts, and follow the same Substack newsletters. We understand the insider jokes about “Friday fights” and “bundling & unbundling”.

We often forget that Twitter and these newsletters are not the data universe. And if we’re not careful, we’ll end up with an insider track that is an echo chamber. This is why I loved Anna’s commitment to discovering new communities and encouraging non-tool-specific meetups.

My ask: Let’s all commit to meeting five people every month who aren’t from regular “Data Twitter”, to make our community bigger and more diverse!

Executing a data strategy with OKRs by Chris Brown
The rise of data reliability engineer by Alvin Lee
The 10 steps to building a great data team by Ethan and Ben Rogojan (aka The Seattle Data Guy)
Guardians of the Data by David Jayatillake
AI is not coming for analyst jobs anytime soon by Amit Prakash

I’ve also added some more to my data stack reading list. If you haven’t checked out the list yet, you can find and bookmark it here.

🎧 Podcast recommendations

The bundling vs unbundling debate: If you have been following this discussion on Twitter, Reddit... and everywhere else, you will like this chat between Tristan Handy (dbt Labs), Benn Stancil (Mode), and David Jayatillake.
On interoperability, governance, and divergent data teams: I chat with Sam Ramji for the amazing DataStax podcast on the power of technology/software, open-source data, and more. Tune in here.

🤓 Lastly, this tweet for all data viz nerds

And don’t forget to follow the #30DayChartChallenge on Twitter.

I'll see you next week with more interesting updates from the modern data stack! 👋 Meanwhile, you can subscribe to the newsletter on Substack and connect with me on LinkedIn here.
