The Data Advantage Matrix, reshaping data engineering (and the rise of the metadata engineer), measuring analytical work, and more
Before we jump into the links, I wanted to take a moment to thank you for the lovely messages and amazing response to the first edition of this newsletter! It means the world. 💙
✨ Spotlight: Data Advantage Matrix – A New Way to Think About Data Strategy
“How do I get started with my data strategy? Where do we start? What do we prioritize?”
As the co-founder of two data startups, I hear these questions often. But here’s the painful truth that I’ve learned from working with data leaders on over 200 data projects: There’s no one path to creating a data strategy.
Instead of looking at what other companies have done or focusing on a potential project's ROI, I think the key is examining your own needs and building the right "advantages" for you — i.e. asking which data investments will help you build sustainable competitive advantages over your competitors.
I first sketched the Data Advantage Matrix on a piece of paper while helping a fellow startup founder think through their data strategy. It's a framework to help leaders and companies figure out what types of data advantages they want to build and how far to advance them.
Want to know how this works? I wrote an in-depth article about these 4 types of data advantages and 3 stages of depth, plus examples of how two different kinds of companies — a SaaS company and a cab aggregator — could go about building a data strategy. Read the full article here.
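If it helps to see the shape of the framework, here's a minimal sketch of the types-by-stages idea in code. The advantage names and stage labels below are placeholders I'm inventing purely for illustration; the framework's actual categories are in the full article.

```python
from dataclasses import dataclass, field
from enum import Enum

# NOTE: The stage labels and advantage names here are illustrative
# placeholders, not the framework's actual categories.

class Stage(Enum):
    BASIC = 1
    INTERMEDIATE = 2
    ADVANCED = 3

@dataclass
class DataAdvantageMatrix:
    # Map each type of data advantage to the depth you plan to build it to.
    targets: dict = field(default_factory=dict)

    def prioritize(self):
        """Order data investments by target depth, deepest first."""
        return sorted(self.targets.items(), key=lambda kv: kv[1].value, reverse=True)

# A hypothetical SaaS company might go deep on one advantage
# and keep the others at a basic stage:
matrix = DataAdvantageMatrix(targets={
    "product_analytics": Stage.ADVANCED,
    "operational_reporting": Stage.BASIC,
})
print(matrix.prioritize())
```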
❤️ Fave Links for This Week
How the Modern Data Stack is Reshaping Data Engineering by Max Beauchemin
Max, in his article about the trends reshaping data engineering, wrote about the specialization of data roles — a topic that I've been following closely. His article aligns with the sentiment I shared in my last newsletter: there's a lot data teams can learn from software engineering teams about building specialization.
"Given the greater investment in data teams, we’re seeing further specialization around 'data professionals' (which I define as people with the word 'data' or 'analytics'-derivative somewhere in their title). 'Data ops', 'data observability', 'analytics engineering', 'data product manager', 'data science infrastructure'.
A clear trend is developing around applying some of the DevOps learning to data and creating a new set of roles and functions around 'DataOps'. We’re also seeing all sorts of tooling emerge around DataOps, data observability, data quality, metadata management. Extrapolating on this trend, maybe it’s just a matter of time before someone writes about 'the rise of the metadata engineer'."
The interesting thing that caught my eye was the title of "metadata engineer" suggested by Max. Now, I realize that this might be a pun, given Max's very famous "rise of the data engineer" blog post from a few years ago. That being said, I've started to see titles like "Data Engineer, Metadata", "Data Platform Lead (Metadata)", and even Stripe's cool "Backend Engineer, Metadata" come up more and more.
As metadata grows more ubiquitous, transforms from passive to active, and powers more use cases like data discovery, lineage, observability and even pipeline tuning, I wonder if there's actually a case to be made for a specialized metadata engineer.
What would a metadata engineer do, and how would their role differ from a traditional data engineer's? Well, I'm calling dibs on writing the blog post titled "The Rise of the Metadata Engineer"! 🙋
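To make "active metadata" a bit more concrete, here's a toy sketch of metadata driving an observability decision: a staleness check that reads a table's catalog metadata and decides whether to alert. The get_table_metadata helper is hypothetical and stands in for whatever your metadata store or catalog actually exposes.

```python
import datetime as dt

# A toy "active metadata" example: catalog metadata drives a pipeline
# decision instead of just sitting in documentation.
# NOTE: get_table_metadata is a hypothetical helper, not a real API.

def get_table_metadata(table: str) -> dict:
    # In practice, this would query your metadata store or catalog.
    return {
        "last_updated": dt.datetime(2021, 8, 10, 6, 0),
        "expected_refresh_hours": 24,
    }

def is_stale(table: str, now: dt.datetime) -> bool:
    """Flag a table that has missed its expected refresh window."""
    meta = get_table_metadata(table)
    age = now - meta["last_updated"]
    return age > dt.timedelta(hours=meta["expected_refresh_hours"])

if is_stale("analytics.orders", dt.datetime(2021, 8, 12, 9, 0)):
    print("analytics.orders looks stale; time to alert the pipeline owner")
```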
A Method for Measuring Analytical Work by Benn Stancil
I've always found it ironic that it's so hard to measure the true value of a data team when our jobs are about making things measurable. In this blog post, Benn argues that the key metric for data analysts should be speed — how quickly they can drive a decision.
"The moment an analyst is asked a question, a timer starts. When a decision gets made based on that question, the timer stops. Analysts’ singular goal should be to minimize the hours and days on that timer. That’s the only metric we should measure ourselves on—straight up, without qualification or caveat. The lower the number is across all the decisions we’re involved in, the better we’re performing."
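To make the metric concrete, here's a minimal sketch of how you might compute it. The records and field names below are made up for illustration, not from Benn's post.

```python
import statistics
from datetime import datetime

# A minimal sketch of the time-to-decision metric: log when each question
# was asked and when its decision was made, then summarize the gaps.
# The records below are illustrative.

decisions = [
    {"asked": datetime(2021, 8, 2, 9, 0), "decided": datetime(2021, 8, 2, 15, 30)},
    {"asked": datetime(2021, 8, 3, 10, 0), "decided": datetime(2021, 8, 5, 12, 0)},
    {"asked": datetime(2021, 8, 4, 14, 0), "decided": datetime(2021, 8, 4, 17, 0)},
]

hours = [(d["decided"] - d["asked"]).total_seconds() / 3600 for d in decisions]
print(f"median time-to-decision: {statistics.median(hours):.1f} hours")
```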
Here's my take: I fully agree that analysts' true impact is driving decisions because it forces them to truly understand the problem they're solving. (In our data team days, we had a value called "problem first, solution second" to remind us of this North Star.) I also believe that velocity is one of the few things a data team truly can, and should, measure.
Here's where I disagree: I don't think it's fair to tie success only to the speed at which an analyst can drive a decision, because some data problems and questions are inherently more complex than others. Answering a (relatively) simple operational question, like why ARR numbers dropped, is far easier than answering a complex strategic question, like where a company should expand its operations. It's in these scenarios that I believe the proposed metric falls short.
For example, I'm pretty sure my highest-impact data project was working with India's national government to open 10,000 new clean cooking fuel distribution centers (watch the TEDx talk here). It was an incredibly complex problem, balancing profitability and accessibility, and it took several months and many iterations to get right. As the analysts on that project, we drove only one decision in three months. That's slow velocity, but is it fair to say the work wasn't impactful?
Looking for more articles? Read my full list of reads from this week on my Notion.
There are a lot of super interesting jobs open on amazing data teams. Check out the full list of curated roles in this weekly list from my partner in crime, Surendran.
There's so much happening in the fintech space right now! 😍 Prachi and her team at Splash Financial are hiring a Senior Data Engineer to help build their digital lending platform.
Chris and the Bambee team are looking for a Senior Data Analyst (based in Los Angeles, California) who will support and enable the company's data team, as well as build and develop data models.
Brett and the Classy team are hiring a Product Analytics Lead (US remote) to head the team's Product Analytics function.
Stay tuned till next Tuesday for more interesting stuff around the modern data stack. Meanwhile, follow the usual drill... like, share, subscribe. 😉 You can also connect with me on LinkedIn here.