The future of the modern data stack in 2023
4 emerging trends and 6 big trends from last year that you should know
It’s that time of year. No, not for making resolutions — though Metadata Weekly is on a mission to lose 10 pounds of buzzwords by summer 😛 It’s the time for reading all the annual roundups, looking back on the year that was, and eyeing the future with cautious hope.
During the holidays, I spent some downtime reflecting on what happened in the data world last year. And let’s be real — it was a wild year.
The data community was busy stirring up controversy with hot takes, debating tech and community, raising important conversations, and duking it out on Twitter with Friday fights. We were in growth mode, always searching for the next new thing and vying for a chunk of the seemingly infinite data pie.
But now, as we look into 2023, we can see a different world, one of recession and layoffs and budget cuts that 98% of CEOs expect will last 12–18 months. Companies are preparing for war, amping up the pressure and shifting from growth mode to efficiency mode.
In 2023, we’ll face a new set of challenges — improving efficiency, refocusing on immediate impact, and making data teams the most valuable resource in every organization. So what does this mean for the data world?
In Atlan’s annual report, we broke down the 10 big trends in the modern data stack this year: 4 emerging trends that will be a big deal in the coming year, and 6 existing trends that are poised to grow even further. The TL;DR is below, or get the full report here.
I’m excited to share that there will be some new names in Metadata Weekly soon. This year, I’ll be opening up this newsletter to guest authors — some of my amazing colleagues who have been helping to build this newsletter over the past year. I can’t wait to hand over the mic and let them share their data stories and insights!
✨ Spotlight: The future of the modern data stack in 2023
4 new trends that will emerge in 2023
👉 Optimizing data spend will become a major priority.
Storage has always been one of the biggest costs for data teams. Snowflake and Databricks have already started investing in product optimization, and we’ll see more improvements to help customers cut costs in 2023. We’ll also see a lot more tooling from independent companies and storage partners to further reduce data costs.
👉 Data teams will start being run around ROI and metrics.
In the past few years, data teams have been able to run free with less regulation and oversight, powered by an overwhelming belief in the power and value of data. However, as budgets tighten, data teams and their stacks will get more attention and scrutiny. In 2023, companies will get more serious about measuring data ROI, and data team metrics (e.g. proxy metrics around usage, satisfaction, and trust) will start becoming mainstream.
👉 The modern data stack will start consolidating.
For years, the modern data stack has been growing. However, in 2023, we’ll see fewer data companies and tools launching and slower expansion for existing companies. With limited funds, companies will have to focus on what they do best and partner with other companies for everything else, rather than trying to tackle every data problem in one platform — which will lead to the creation of the “best-in-class modern data stack” in 2023.
👉 Modern data stack companies will start expanding into on-prem connectors.
To expand their revenue, modern data stack companies will have to start reaching outside their comfort zone to find new customers. In 2023, the modern data stack will start to integrate with Oracle and SAP, the two legacy, enterprise data behemoths. (This may sound controversial, but it already began with Fivetran’s acquisition of HVR.)
6 trends that will carry through from 2022
👉 Active metadata will replace the “data catalog” category.
Where once this category was new and mostly ignored, many companies are now competing to claim it. This happened in part because analysts latched onto and amplified the idea of active metadata, and the market started to clearly separate “modern” data catalogs from traditional ones last year. As the data world aligns on the importance of modernizing metadata, we’ll see the rise of a distinct active metadata category, likely with a dominant active metadata platform.
👉 Data contracts and data governance will start shifting “left”.
While data contracts are an important issue in their own right, they’re part of a larger conversation about how to ensure data quality. In 2023, data governance will start shifting “left”, and data standards will become a first-class citizen in orchestration tools. Major tools have recently made changes that support this idea (e.g. dbt’s YAML files, Airflow’s OpenLineage integration, Fivetran’s Metadata API, our GitHub extension), and we’ll see even more in the coming year.
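To make “shifting left” a bit more concrete, here is a minimal sketch of what a data contract check could look like when it runs at the producer, before data ever lands in the warehouse. Everything in it is an assumption made for illustration: the `FieldRule` class, the `ORDERS_CONTRACT` list, and the `validate_record` function are hypothetical names, not part of dbt, Airflow, Fivetran, or any other tool mentioned above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldRule:
    """One column in a hypothetical data contract."""
    name: str
    expected_type: type
    nullable: bool = False


# Hypothetical contract for an illustrative "orders" dataset.
ORDERS_CONTRACT = [
    FieldRule("order_id", int),
    FieldRule("customer_email", str),
    FieldRule("discount_code", str, nullable=True),
]


def validate_record(record: dict[str, Any], contract: list[FieldRule]) -> list[str]:
    """Return human-readable contract violations (empty list if the record is valid)."""
    violations = []
    for rule in contract:
        # A field that is absent entirely is a violation, even if it is nullable.
        if rule.name not in record:
            violations.append(f"missing field: {rule.name}")
            continue
        value = record[rule.name]
        # Explicit nulls are only allowed where the contract says so.
        if value is None:
            if not rule.nullable:
                violations.append(f"null not allowed: {rule.name}")
            continue
        if not isinstance(value, rule.expected_type):
            violations.append(
                f"wrong type for {rule.name}: "
                f"expected {rule.expected_type.__name__}, got {type(value).__name__}"
            )
    return violations
```

A producing team could run a check like this in CI or at write time and block the change when the list of violations is not empty, which is the “left” part: the problem is caught where the data is created rather than downstream in a dashboard.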
👉 The semantic layer will enter “adoption” mode, albeit slowly.
In October 2022, dbt Labs made a big splash at their annual conference by announcing their new Semantic Layer. This was a huge step forward for the modern data stack since it paves the way for metrics to become a first-class citizen. Progress has been measured since then, but in 2023, we expect the first set of Semantic Layer implementations to go live.
👉 Data activation will replace CDPs as marketing spend becomes more important.
In 2022, some of the main players in reverse ETL worked to redefine and expand their category with “data activation”, a new take on the “customer data platform” (CDP). In a world where data is commonly stored in a central data platform, CDPs and other specialized, pre-built SaaS data platforms are losing ground. Instead, the idea of data activation becomes powerful, where data can be “activated” from the warehouse to handle both traditional CDP functions and other diverse use cases.
👉 The first wave of data mesh implementations will start going live.
The data mesh was everywhere the last few years, and in 2022, the conversation started to shift from “What is it?” to “How can we implement it?” We’re still in the early phases, but in 2023, we predict that the first wave of data mesh “implementations” will go live, with “data as a product” front and center. We’ll start seeing more real data mesh architectures, and the data world will start to converge on a best-in-class reference architecture and implementation strategy for the data mesh.
👉 Data observability and quality will converge in a “data reliability” category.
In 2022, data observability continued to grow alongside adjacent categories, with existing companies getting bigger, new companies going mainstream, and new tools launching every month. As all of these companies compete to define and own this space, we’ll continue to see more confusion in the first part of 2023. However, in the near future, data observability and quality will start to converge in a larger “data reliability” category centered around ensuring high-quality data.
🔥 Don’t miss out: January with Bob Muglia, Doug Laney, and more
We’ve got a great slate of events planned to help you start the new year right! Kick off January with candid conversations and spicy debates about the year ahead in data.
JAN 12: Live Q&A on the secrets of a modern data leader
Coming this week!
There’s no playbook for modern data leaders. As a community, we’re still figuring out what it means to be a great data leader — and this is even harder in times of pressure and turmoil, like this year. Whether you’re looking to do better in 2023 or starting new in the role, this candid chat with three experienced data leaders will help you feel less overwhelmed and more ready to take on the year ahead.
Erica Louie (Head of Data, dbt Labs)
Gordon Wong (Founder, Wong Decision Intelligence; former BI Leader, HubSpot & Fitbit)
Stephen Bailey (Data Engineer, Whatnot)
JAN 24: Hot takes on the future of the modern data stack
We’ve assembled a panel of rockstars to share their thoughts and advice on the future of data this year. Get ready to choose sides, learn, and challenge the panelists in our first Great Data Debate of 2023!
Barr Moses (CEO & Co-Founder, Monte Carlo)
Benn Stancil (Chief Analytics Officer, Mode)
Bob Muglia (former CEO, Snowflake)
Doug Laney (Innovation Fellow, West Monroe; author of Infonomics & Data Juice)
Tristan Handy (CEO & Founder, dbt Labs)
📚 More from my reading list
As we strolled into a new year, a lot of people were thinking about what data people really need to focus on in 2023. Is it our data stack, culture, team organization, company processes, or something entirely different?
Elbows of data by Katie Bauer
Dear stakeholder by David Jayatillake
The meaning of metrics by Sean Byrnes
The modern data graph by Stephen Bailey
Data science has a tool obsession by Randy Au
Money for somethin’ by Joe Reis
The future history of data engineering by Matt Arderne
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social! If someone shared this with you, subscribe to upcoming issues here.