The past, present, and future of the semantic layer 🚀
What is a semantic layer and why should you care?
If you caught last week’s issue of this newsletter, you know that we, along with much of the data community, were at dbt Coalesce a couple of weeks ago. It was a really special week for us, not just because of the incomparable dbt community but also because something we were super excited about finally came to life: the dbt Semantic Layer.
But what actually is it? The term “semantic layer” (also known as a “metrics layer”) has been around for decades — dbt didn’t invent the concept, nor the word, but their version is certainly worth paying attention to. In today’s Metadata Weekly, we’ll break down what a semantic layer is in simple terms and why you should care about dbt’s Semantic Layer.
✨ Spotlight: What is a semantic layer, and where did it come from?
Semantic layer is a very literal term – it’s the “layer” in a data architecture that uses “semantics” (words) that the business user will actually understand. Instead of raw tables with column names like “A000_CUST_ID_PROD”, data teams build a semantic layer and rename that column “Customer”.
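To make the renaming idea concrete, here is a minimal Python sketch. It is only an illustration: the mapping, the `to_business_view` helper, and the second column name are made up for this example and do not come from any particular BI tool.

```python
# Hypothetical mapping from raw warehouse column names to business-friendly labels.
# "A000_ORD_AMT_PROD" is an invented column, added only so the example has two fields.
SEMANTIC_NAMES = {
    "A000_CUST_ID_PROD": "Customer",
    "A000_ORD_AMT_PROD": "Order Amount",
}


def to_business_view(rows):
    """Rename the keys of each raw row using the semantic mapping.

    Columns without a mapping keep their raw name.
    """
    return [
        {SEMANTIC_NAMES.get(col, col): value for col, value in row.items()}
        for row in rows
    ]


raw_rows = [{"A000_CUST_ID_PROD": "C-1001", "A000_ORD_AMT_PROD": 250.0}]
business_rows = to_business_view(raw_rows)
print(business_rows)
```

The business user only ever sees "Customer" and "Order Amount", while the cryptic raw names stay behind the layer.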
Back in the day (I’m talking about the ‘90s and early 2000s), we had pretty basic data tech. It was very slow and very hard to use if you didn’t have a deep IT background. Big companies like IBM, SAP, and Oracle built Business Intelligence (BI) tools, which would take smaller chunks of data from a clunky data warehouse and let IT people build these semantic layers for business users.
Essentially, they were more human-readable data layers for business users. This was a necessity because trying to run even a basic report across an entire data warehouse could take hours or even days. (Yes, days.)
Enter the first problem: old-school semantic layers took wayyyyy too long to build, since people depended on IT to set up and modify them. To make matters worse, they were cumbersome to maintain since business needs were always changing.
The business users’ solution… export to Excel! Enter fancy new BI tools like Tableau, Qlik, and Power BI. The theory was that if we empower the business users to “self-serve” with low-code or no-code BI tools, the IT bottleneck will go away and analytics will officially be democratized! At least, that was the idea.
Enter the second problem: we abandoned the semantic layer concept for years, in favor of agility.
Unlike old IT tools, more personas could buy and use these new BI tools. Instead of 1 BI tool using 1 semantic layer, built by 1 team from 1 data warehouse, we had multiple BI tools, being used by all kinds of teams with no real semantic layer.
Just picture this scenario, which probably seems all too real to most data people. I bring my Tableau dashboard to a meeting, someone else brings their Excel workbook, and someone else brings a Power BI dashboard. We all then show different numbers for “total revenue last quarter”. Uh oh!
After years in which data teams alternately ignored and chased the self-service BI dream, the semantic layer blew up as a topic in the data world again. People across the data community were talking about the “missing metrics layer”, and companies like Airbnb, LinkedIn, Uber, and Spotify announced that they had been building home-grown metrics platforms.
This was such a hot topic that we flagged it as one of the six big ideas from 2022 in our Future of the Modern Data Stack report, and hosted a Great Data Debate on the topic with a fiery discussion between Drew Banin and Nick Handel.
The result has been a big open question in the data and analytics world — how can we bring back all the great things that IT loved about semantic layers (consistency, clear governance, and trusted reliable data) without compromising the agility that analysts and business users demand?
The dbt Semantic Layer
Enter dbt Labs and its new Semantic Layer. The core concept: define things once, and use them anywhere.
Why does that make people happy? This brings the concept of a semantic layer and its universal metrics into dbt’s transformation layer. As dbt Labs put it, “Data practitioners can define metrics in their dbt projects, then data consumers can query consistently defined metrics in downstream tools.”
Regardless of what BI tool they use, analysts and business users can then grab data and go into that meeting, confident that their answer will be the same because they pulled the metric from a centralized place.
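Here is a rough Python sketch of the "define once, use anywhere" idea. To be clear, this is not dbt's API or syntax. The `METRICS` registry, the `query_metric` helper, and the revenue rule (completed orders, net of refunds) are all assumptions invented for illustration.

```python
def total_revenue(orders):
    # The single agreed definition (an assumed rule for this sketch):
    # completed orders only, net of refunds.
    return sum(
        order["amount"] - order["refund"]
        for order in orders
        if order["status"] == "completed"
    )


# Hypothetical central registry: every metric is defined exactly once, here.
METRICS = {"total_revenue": total_revenue}


def query_metric(name, orders):
    """Look up a metric by name and compute it from the shared definition."""
    return METRICS[name](orders)


orders = [
    {"amount": 100.0, "refund": 0.0, "status": "completed"},
    {"amount": 50.0, "refund": 10.0, "status": "completed"},
    {"amount": 75.0, "refund": 0.0, "status": "cancelled"},
]

# Three different "tools" ask for the same metric by name
# instead of each re-implementing the calculation.
tableau_number = query_metric("total_revenue", orders)
excel_number = query_metric("total_revenue", orders)
power_bi_number = query_metric("total_revenue", orders)
```

Because nobody re-implements the calculation in their own tool, there is no room for one dashboard to count cancelled orders while another forgets refunds. Everyone walks into the meeting with the same number.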
P.S. We just launched a new integration with the dbt Semantic Layer to bring metrics into the rest of the modern data stack and increase context, visibility, and self-service for diverse data teams. Learn more here 👉
😈 Just in time for Halloween…spooky data scares
On one dark and stormy day, a data team was sure they had prepared everything for an important review meeting with their CFO… but they woke up to a dozen Slack messages, a broken dashboard, and frantic calls from stakeholders. 😰
Cold sweats and confused looks: the meeting turned out to be a nightmare. So… which data monster attacked?
Check out what some of today’s data devils can look like. We’re screaming, erm… streaming them here. 👉
🌎 Upcoming: Atlan in the wild
Catch the Atlan team at the Snowflake Data World Tour in Toronto (today!) and San Francisco (November 7). Learn more and register here.
Missed our webinar with Snowflake, Fivetran, and dbt about the role of metadata in the modern data stack? We got you covered. Watch the recording here.
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social.