🤩 Reading list: The top 5 must-read data blogs from 2022
A curated list of the year’s best articles from the world of the modern data stack ✨
Just like that, we’re at the end of 2022! And what a rollercoaster ride it has been, with major changes and uncertainty across every industry (especially for users of the bird app 😛).
A lot happened in the world of the modern data stack this year. We talked about job titles, thought about saying goodbye to data science, debated centralized vs. embedded data teams and bundling vs. unbundling, kickstarted important discussions like the technical pay gap, and so much more.
Continuing our tradition from last year, we’re sharing the top blogs from 2022 along with some follow-up reading to keep you thinking. Happy reading!
P.S. Metadata Weekly is going on holiday next week. We’ll see you back here in 2023!
🖥️ On data as a product
Data product in changing environments: rethinking and updating investments by Eric Weber
“The last few years have been full of ‘here’s what we need to do next’ or ‘once we have this team, we can do this’. We plan how we’d support more personas and areas of the business with more investment, but we don’t think about what we’d do if we had to cut support. I get it. That doesn’t feel very comfortable. But just like succession planning for people, we need to have a plan for what we’d do in hard situations. In some cases, you might drop support for particular personas on a product. In others, you might drop support for a product altogether. It isn’t easy to say what the ‘right answer’ is. But spending time thinking about your answer is important.”
More follow-up reading:
Making data actionable: the immense challenge of good data products by Eric Weber
What’s the big deal about data products? by Willem Koenders
Building more effective data teams using the JTBD framework by Emilie Schario
Types of data products by Luke Lin
🛠️ On working with data
Should we be grateful for the modern data stack? by Benn Stancil
“That’s the paradox we need to solve. Why has data technology advanced so much further than value a data team provides? Does all of this new tooling actually hurt, by causing us to lose focus on the most important problems (e.g., the data in Salesforce) in favor of the shiny new things that don’t actually matter (e.g., the data in our twenty-fifth SaaS app)? Has the industry’s talent not caught up with the capacity of its tools, and we just need to be patient? Is the problem more fundamental? I’m not sure. But if our 2032 selves want to be as grateful for 2020s as we should be for the 2010s, those are the next questions we need to answer.”
More follow-up reading:
How to design your data stack for curiosity by Amit Prakash
Data management is context management by Randy Au
Build or buy: how we developed a platform for A/B tests by Olga Berezovsky
Data systems tend towards production by Ian Macomber
Not all data requests are urgent, so start by asking these 5 questions by Marie Lefevre
📜 On data contracts
The rise of data contracts by Chad Sanderson
“Data Contracts are API-like agreements between Software Engineers who own services and Data Consumers that understand how the business works in order to generate well-modeled, high-quality, trusted, real-time data.
Instead of data teams passively accepting dumps of data from production systems that were never designed for the purpose of analytics or Machine Learning, Data Consumers can design contracts that reflect the semantic nature of the world composed of Entities, events, attributes, and the relationships between each object.
This abstraction allows Software Engineers to decouple their databases/services from analytical and ML-based requirements. Engineers no longer have to worry about causing production-breaking incidents when modifying their databases, and data teams can focus on describing the data they need instead of attempting to stitch the world together retroactively through SQL.”
More follow-up reading:
An engineer's guide to data contracts - pt. 1 by Chad Sanderson and Adrian Kreuziger
An engineer's guide to data contracts - pt. 2 by Chad Sanderson and Adrian Kreuziger
Why data contracts are obviously a good idea by Yali Sassoon
📚 On building and leading a data team
Growing data teams from reactive to influential by Emily Thompson
“Data teams tend to be a fairly scrappy bunch, and often default to rolling up their sleeves and building what they need in order to get unblocked. But there is an opportunity here to start influencing roadmaps on other teams. Rather than filling in the technology gaps themselves with messy workarounds, my team’s charter also prescribed that they make technical recommendations to the teams we depended on.
Because the data team was now required to proactively drive the conversation, they made the time to work with partners and propose cross-functional solutions. Foundational work was considered part of the backlog of ‘impact-driving’ work, which led to specific quarterly goals, and progress was tracked just as every other initiative owned by the data team.”
More follow-up reading:
Good data citizenship doesn’t work by Benn Stancil
Managing the first year by Alex K Gold
How I learned to stop worrying and love being a manager by Brittany Bennett
Executing a data strategy with OKRs by Chris Brown
Dealing with difficult stakeholders by Oscar Baruffa
Leaders show their work by Ben Balter
BONUS: We talked with four amazing data leaders about what it takes to succeed in your first 365 days as a data leader: Stephen Bailey (Data Engineer at Whatnot), Erica Louie (Head of Data at dbt Labs), Taylor Murphy (Head of Data at Meltano), and Gordon Wong (Founder of Wong Decision Intelligence; formerly Senior Leader of Business Intelligence at HubSpot). Download the Secrets of a Modern Data Leader ebook to read what they shared.
📂 On metrics, data catalogs, active metadata, and more
People-first data stacks by Ilan Man
“The problem is your stakeholders, while giving you the thumbs up the whole time and claiming they’d love an easier way to discover data, are no longer using the tools you’ve painstakingly researched and implemented. They fall into their old habits and inevitably you see an incorrectly defined metric on a Powerpoint slide somewhere.
We need to ensure stakeholders adopt data tools in the ways they should. Reading documentation and taking a training is not enough. We need to reinforce good data-tooling hygiene. I’ve seen many instances of folks starting out in a BI tool, and a few months later they’re back in Excel, pivoting a CSV and pasting it into a presentation. There should always be room for creative solutions and serendipity, but the Data team needs to keep an eye on how stakeholders use the tools they implement. Data models and BI tools need to adapt to business changes.”
More follow-up reading:
Data's trillion dollar question mark by Benn Stancil
How to measure data quality by Mikkel Dengsøe
The many layers of data lineage by Borja Vazquez
The future of data catalogs by Prukalpa Sankar (aka me!)
✨ Bonus picks ✨
The important purple people outside the data team by Mikkel Dengsøe
A framework for embedding decision intelligence into your organization by Erik Balodis
AI is not coming for analyst jobs anytime soon by Amit Prakash
Manifesto for the data-informed by Julie Zhuo
Why are we still struggling to answer how many active customers we have? by Seattle Data Guy
Data teams: break out of your bubble by Mary MacCarthy
The future history of data engineering by Matt Arderne
Why it matters where you randomize users in A/B experiments by Adam Stone
Special shoutout to everyone who shared their data experiences, learnings, views, and observations this year! Now’s the time to have more open conversations about what we want for the future of data, and we’re so thankful for all the data practitioners who give their time to share insights, spark debate, and keep our industry moving forward.
🤖 Last week in Atlan: Supercharged automation for your data estate
At last week’s Atlan Activate, our quarterly product webinar, we launched a ton of new automation features to superpower your data and reduce the manual work that slows data teams down. ICYMI, here are the five new features in Atlan you should know about:
Metadata Playbooks for rule-based actions: Like Zapier for data, this is the first low-code/no-code metadata automation for data teams.
Atlan + AWS EventBridge event-based actions: Create production-grade, event-driven automations for the world of metadata, such as alerts when ownership changes or automatic tagging of classifications.
Profiling and Popularity Insights: Use new column-level profiling, popularity, and usage metrics to assess data quality, find the most widely used queries, identify top users, and more.
Atlan to GitHub integration: Bring metadata right to GitHub to minimize risk and increase transparency before any changes are made to your data.
Trident AI powered by GPT-3: Say goodbye to manual documentation with increasingly intelligent automated descriptions, business terms, READMEs, and more.
Have any interesting articles that especially got you thinking? I’d love to read them! Just send them my way on LinkedIn.
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social.