Why the modern data stack will fail, and how generative AI will change everything — with Bob Muglia and Tristan Handy
Snippets from Atlan’s first Great Data Debate of 2023
Last week, I joined some of my favorite folks in the data ecosystem as a part of Atlan’s first Great Data Debate of 2023 to chat about where the data world is going this year. It was truly the highlight of my week!
In the first part of this debate, I chatted with Bob Muglia (former CEO, Snowflake) and Tristan Handy (Founder & CEO, dbt Labs) about the hottest topics in the modern data stack. This conversation wasn’t just fun but also gave me great food for thought on how data will change this year.
This year, I think that the modern data stack is going to go through some major changes. Not just because of economic changes, though I think that will prompt some important refocusing and consolidation. It’s also because generative AI recently burst onto the scene — a trend I’ve personally been following and thinking about a lot.
Keep reading for some of my favorite snippets from our chat on the future of generative AI and the modern data stack, or check out the full discussion.
We just published The Future of the Modern Data Stack in 2023. Check out the insights that Towards Data Science featured in their top 3 reads of last year. Download the report here.
✨ Spotlight: Snippets on the future of the modern data stack from Bob Muglia and Tristan Handy
How generative AI will change data
The TL;DR version
If you’ve been on the internet recently, you probably saw how OpenAI’s ChatGPT is taking the world by storm. From writing books and designing rooms to debugging code and explaining data concepts, it seems that automation and AI can now answer any question you ask. The result is a ton of hype around how generative AI will change every industry (e.g. education, healthcare, and cybersecurity).
“I think it's the biggest change I've seen in my lifetime.” —Bob Muglia
“I think it's going to be very possible to self-serve on the last mile of analytics with a ChatGPT-style interface. We've been needing that for such a long time.” —Tristan Handy
“I think one thing is, there's likely going to be a GitHub Copilot version for every kind of practice. What does the Copilot for a data engineer, analytics engineer, or analyst look like?” —Me
Bob Muglia:
I think it's the biggest change I've seen in my lifetime. I'll say that, and I'm older than most people — certainly in this panel. I never predicted that we would have systems that could understand English and respond to written language, the way we've seen progress in neural networks and deep learning and these generative AI systems.
Tristan Handy:
Back in undergrad, I worked for the University of Maryland Board of Regents, and I helped curate data about the 13 University of Maryland system schools and make it available to this board of regents. And this was all in Microsoft Access, it was very high tech.
My boss at the time was like, “It's great you've built this system that has drop-downs and stuff like that, but can't you just let the people type in a question and hit Enter into it?” And I was just like, “I know I'm only like a college junior, but I'm pretty sure that's not possible.” Just not at all possible today. I thought I would go through my entire 30-year career and it would continue to not be possible.
The thing I'm really excited about is: at least for some period of time, I think humans are going to continue to build business representations, dbt models, and once you build that kind of vocabulary in your system, I think it's going to be very possible to self-serve on the last mile of analytics with a ChatGPT-style interface. We've been needing that for such a long time.
Prukalpa Sankar:
I think that, to me, is the clearest part of the puzzle today — the final layer of the democratization, as a buzzword, of data or analytics. I think that it's very clear what the role of AI will be there, and I think that that final last mile can get addressed there.
What isn't as clear to me is what will the role of AI be in actually changing the lives of data practitioners. I think one thing is, there's likely going to be a GitHub Copilot version for every kind of practice, right?
Tristan Handy:
We ourselves are not working on anything like that... But I think there will be a lot of relevance to a Copilot-style interface to write the first 80% of a dbt model. I think that the last 20% is going to be increasingly challenging.
I think that's what we've seen with AI's tackling of a lot of real-world problems. The first bit is very tractable at this point, and with self-driving cars, it turns out the last 1% actually matters a lot. I think that there will, in not too long, probably be an interface where you can get a long way, but then you'll need a human to actually make sure that it's correct.
Bob Muglia:
I think the “human in the loop” is a really critical element of these generative AI systems, and we're discovering that human-aided training is critical to get the kind of results that we need.
I'm super bullish. I'm very bullish on where this is going to go. Maybe overly bullish. I admit I've really drunk the Kool-Aid. But I do think it's a massive change, and we can now begin to model things.
These systems can help us model the semantics of data and ultimately of business, which were so difficult to achieve before, because they can work with so much information and help assist that process. I think that it's critical to what you're trying to do, Tristan, in terms of building the semantic model on top of dbt and the modern data stack.
Tristan Handy:
Prukalpa, I know that we're talking about semantic understandings, and it strikes me that the thing that you folks are spending all your time building is one particular interface — at least, the initial manifestation of this one particular interface to semantic concepts behind data.
I have to imagine that the idea of a catalog becomes different when it doesn't have to be static, like, here's this noun. It’s like, I can understand how all of these things are integrated together.
Prukalpa Sankar:
We actually launched this beta thing called Trident AI, powered by GPT-3. It basically auto-documents your data, and it auto-creates business meaning on top of your metrics, and things like that. We were surprised. It was an experiment, and we were surprised by the quality of results that we were getting.
In the governance world, I think everything changes. All the things that were really complicated to do, because of which you couldn't get the value that you needed downstream, suddenly change. That can actually be transformative.
We think about this a lot: how do you actually activate metadata and get value from it, rather than spend all your time creating it? I think there’s a ton of value that can get generated.
Bob Muglia:
And these systems will start to act on the metadata too, right? They'll help us to make sure our systems stay in sync.
I think one of the major challenges that users have is understanding what's in the data systems that they have, maintaining and controlling them. All of the different tools and catalogs and things we have today are very early. They don't fit together quite as well as they need to. I think this is a key technology that will help enable it.
I continue to believe that we'll see database innovation as well to help us work with the complex relationships required, the graph-oriented relationships that you really can't handle in SQL. I think that's critical as we start to work with metadata — and, as you say, to activate metadata. But these machine-learning systems are a huge part of that activation. They will be doing a lot of that activation and driving a lot of that, as essentially data applications responding to data — metadata in this case.
Tristan Handy:
I think that what data has needed for a very long time is the ability to push the value that we all know is present in all of the systems that we work on into the hands of humans who are actually able to act on it, or push it into systems of record that can take action without human intervention. So for anybody who is involved in building systems like this, I think seeing a greater path to value is tremendously generative for all of us. I think it's a boon to the entire ecosystem.
Why the modern data stack will fail
The TL;DR version
We talk a lot about what's going well in the modern data stack and how much innovation is happening. Yet we’re still struggling with the same problems as we were a few years ago — self-service, democratization, governance, etc. Is it possible that the modern data stack is all hype, set to crash and burn in the coming years?
“I think the modern data stack is here to stay.” —Bob Muglia
“There's going to be evolution, but not revolution without some massive change.” —Tristan Handy
“Without the modern data culture stack, we'll have the technology, but we still won’t reach the promised land.” —Me
Tristan Handy:
In order for the modern data stack to not work, it's not going to be displaced by something that is more scalable. The idea of the modern data stack is that you have cloud infrastructure — you bring your data to that, and you leave your data there, and the compute happens there, and it scales forever.
Until you have some very significant change in paradigm, which I don't know exactly where that's gonna come from, then we're going to keep doing what we're doing here. There's going to be evolution, but not revolution without some massive change.
Bob Muglia:
I think the modern data stack is here to stay. It will continue to improve — it's far from perfect, it has missing elements. But I think that the idea is going to be around for a very long time.
Eventually, we may work with something other than SQL to actually work with data. But I think it's when that evolves, 10–15 years from now, that we'll move forward from this. But I think this architecture is going to be present for a full generation or more of IT, which is 15–20 years. Once things get stuck in IT, they tend to stick around for a really long time.
Prukalpa Sankar:
I agree with what both of you are saying. There are two things that keep me up at night, like if 10 years later, everything was just a hype cycle.
One is recent, which is what is AI gonna do? I think every tool in the current modern data stack will have to fundamentally evolve. I think if that doesn't happen, then the modern data stack looks very different. So I think that's one, but hopefully we'll all innovate and that's a solvable problem.
The second one that really keeps me up at night is: all this technology innovation, it’s all to reach this promised land, right? And the promised land is, everybody makes data-driven decisions.
I think we've actually made a ton of progress in the last five years on the data stack. I think the next evolution… My last Coalesce talk was about this: I think about the data culture stack, which is the people and the process and that side of it, and I don't think we're spending enough time actually even figuring that stuff out.
How does a data analyst grow? What does a career path look like? How do you run a data team standup? How does that even work? How do you work with your stakeholders? How do you actually do a PRD on a data project?
I think that stuff is really important to figure out. Otherwise we'll have the technology, but we still won’t reach the promised land. And then I think that that will lead to the Trough of Disillusionment.
Bob Muglia:
I think Atlan is doing a great job of helping with that. I think one of the attributes of Atlan that has always attracted me to the work that you and the team are doing is the fact that it really does involve people interacting with the data systems and how that interaction's taking place.
I've given Tristan a hard time, but dbt has had such a gigantic impact on the industry and will continue to have a gigantic impact.
It really is about how people work together with these systems. I mean, I'm a big believer in machine learning and AGI and everything else, but it is about people and the way we influence these systems that matters.
While these were my two favorite topics, we talked about so much more — hot takes on the latest buzzwords, whether the data mesh is hype or reality, the missing pieces of the modern data stack, and our predictions for 2025.
Then in part 2, Barr Moses (Co-Founder and CEO, Monte Carlo), Benn Stancil (Co-Founder, Mode), and Douglas Laney (Data & Analytics Strategy, West Monroe) had a great conversation about the future of data teams, culture, and ROI. Stay tuned for their thoughts in next week’s Metadata Weekly!
📚 More from my reading list
Product market misfit by David Jayatillake
Data contracts for the warehouse by Chad Sanderson and Daniel Dicker
Roadmapping as a tool for data leaders by Brittany Bennett
Introducing the metrics playbook by Ergest Xheblati
Context, and the lack thereof by Matt Arderne
Data governance, but make it a team sport by Maggie Hays
dbt: how we improved our data quality by cutting 80% of our tests by Noah Kennedy
The biggest data science, data engineering and analytics conferences not to miss in 2023 by Olga Berezovsky
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social! If someone shared this with you, subscribe to upcoming issues here.