Improving GDPR compliance in hours, not months — Tide’s story of automating privacy
How a fast-growing fintech embedded privacy-first policies into automated processes with Atlan
One of the reasons I’m so excited about active metadata is how it takes metadata’s use cases beyond just data catalogs. In the past year, one of my personal highlights has been working with data leaders and seeing the amazing things they do with metadata automation.
Today, I’d like to shine the Metadata Weekly spotlight on two such data leaders: Hendrik Brackmann and Michal Szymanski from Tide, a fast-growing mobile-first financial platform focused on small business users, who are using active metadata to enable GDPR compliance.
Why I love what they did:
Taking data protection from an “afterthought” to embedding it in their processes: Privacy and protection are typically an afterthought, discussed only at the end of projects, which doesn’t scale. Instead, the team at Tide built metadata-driven automation that makes protection part of how their data operates.
From manual (and human error–prone) to automated: Setting up this automation meant identifying and tagging PII data. Doing it manually would be accurate but time-consuming; making assumptions would be faster but too risky. A single automated yet precise workflow was the answer.
A dramatically faster time to go live: They estimated that doing this process manually would have taken 50 days. Instead, they did it automatically in just 5 hours!
Keep reading for the TL;DR or get the full story here. Happy reading!
✨ Spotlight: How Tide embedded privacy into automated data processes
“We wanted to embed data protection and privacy into our running processes, rather than discussing it at the end of projects.” (Michal Szymanski, Tide)
Like every company, it’s critical that Tide is compliant with GDPR. A key component is the right to erasure, more commonly known as the “Right to be forgotten”, which gives Tide’s customers across the EU and UK the right to ask for their personal data to be deleted.
This was important but far from easy. Whenever someone wanted to delete PII data, the production support team would go through Tide’s back-end databases and delete personal data fields. They had a script to handle a lot of this, but it didn’t catch everything. The script caught personal data in the key data source, but it had trouble capturing data from all the new sources that kept appearing in the organization. Tide’s team had to manually go through secondary systems to find and delete local projections of the personal data fields.
As Tide continued to grow, its technology stack and architecture grew more complicated, new products and services were introduced, and its customer base expanded. All of this meant the process took ever more time and effort.
In an ideal world, when a customer exercised their right to be forgotten, a single click of a button would automatically identify and delete or archive all data about the customer in accordance with GDPR. Immense manual effort, and the risk of delays or human error, would be eliminated.
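The ideal workflow described above can be sketched as a simple orchestration: given a customer ID, walk every registered data store and null out the PII fields it holds. This is a minimal illustrative sketch, not Tide’s actual implementation; the store names, field mappings, and `InMemoryStore` helper are all hypothetical.

```python
# Hypothetical sketch of a one-click "right to be forgotten" workflow.
# Store names and PII field mappings are illustrative, not Tide's real systems.

# Registry of which PII fields live in which store, derived (in the real
# world) from a metadata catalog rather than hand-maintained.
PII_FIELDS_BY_STORE = {
    "customers_db": ["full_name", "email", "phone"],
    "analytics_warehouse": ["email", "ip_address"],
}


class InMemoryStore:
    """Stand-in for a real database; rows are keyed by customer id."""

    def __init__(self, rows):
        self.rows = rows

    def null_out(self, customer_id, fields):
        """Null the given fields for one customer; return how many were erased."""
        row = self.rows.get(customer_id)
        if row is None:
            return 0
        erased = 0
        for field in fields:
            if row.get(field) is not None:
                row[field] = None
                erased += 1
        return erased


def erase_customer(customer_id, stores):
    """Erase PII for one customer across all registered stores.

    Returns a per-store report of how many fields were erased, which a real
    system would keep as an audit trail for the erasure request.
    """
    report = {}
    for store_name, fields in PII_FIELDS_BY_STORE.items():
        report[store_name] = stores[store_name].null_out(customer_id, fields)
    return report
```

The key design point is that the erasure logic is driven by a central registry of PII locations, so adding a new data source means registering it once rather than updating a deletion script by hand.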
And that’s what they built!
Here’s the TL;DR of their implementation:
Driving a common understanding: One thing complicating this process was different definitions of personal data. They brought teams together and built a glossary to document everything once and for all. As Michal explained, “We said: Okay, our source of truth for personal data is Atlan. We were blessed by Legal. Everyone, from now on, could start to understand personal data.”
Using automated column-level lineage: They used this to find where PII data lived and how it moved through their architecture. From Michal: “This was very useful. It showed us how much data we have in our data warehouse, and then we could also extrapolate this to the upstream sources of Snowflake. We knew we had it in Snowflake because it’s coming from this and this database. So we informed the teams that they had a lot of personal data and we needed to come up with a design.”
Automated PII tagging using Playbooks: Michal learned about Atlan Playbooks, a new capability we launched to create the first low-code, Zapier-like automations for data team workflows. Instead of spending 50 days manually identifying and then tagging personally identifiable information, Tide used Playbooks to identify, tag, and then classify the data in a single, automated workflow.
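The lineage step above boils down to a graph traversal: once a source column is known to hold PII, every downstream column derived from it inherits that classification. Here is a minimal sketch of that idea using breadth-first search; the lineage edges and column names are hypothetical examples, not Tide’s actual pipeline.

```python
from collections import deque

# Hypothetical column-level lineage: source column -> downstream columns.
# In practice these edges would come from an automated lineage tool.
LINEAGE = {
    "postgres.users.email": ["snowflake.raw_users.email"],
    "snowflake.raw_users.email": ["snowflake.marts.contacts.email"],
    "snowflake.raw_users.signup_date": ["snowflake.marts.cohorts.signup_date"],
}


def downstream_pii(seed_columns):
    """BFS over lineage edges: return every column that inherits PII from the seeds."""
    seen = set(seed_columns)
    queue = deque(seed_columns)
    while queue:
        column = queue.popleft()
        for child in LINEAGE.get(column, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

Starting from one known PII column in the source database, the traversal surfaces every warehouse and mart column that carries the same personal data, which is exactly the “extrapolate to upstream and downstream sources” insight Michal describes.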
The Tide team was ready to spend 50 days of effort on a task that would make clear improvements to Tide’s risk profile. But after integrating their data estate with Atlan and driving consensus on definitions, they used Playbooks’ automation to accomplish their goal in just a few hours.
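To make the “identify, tag, and classify in one workflow” idea concrete, here is one common way such an automation can work: match column names against a rule set and emit classification tags. This is a generic illustrative sketch, not how Atlan Playbooks is implemented; the rule patterns and tag names are assumptions.

```python
import re

# Hypothetical rule set: column-name patterns mapped to classification tags.
# A real playbook could also use data profiling, not just name matching.
PII_RULES = [
    (re.compile(r"email", re.I), "PII:Email"),
    (re.compile(r"(^|_)(phone|mobile)", re.I), "PII:Phone"),
    (re.compile(r"(first|last|full)_?name", re.I), "PII:Name"),
    (re.compile(r"(ssn|passport|national_id)", re.I), "PII:GovernmentID"),
]


def tag_columns(columns):
    """Return {column: tag} for every column matching a PII rule.

    Columns matching no rule are left untagged, which in practice would
    route them to a human review queue rather than silently passing.
    """
    tags = {}
    for column in columns:
        for pattern, tag in PII_RULES:
            if pattern.search(column):
                tags[column] = tag
                break  # first matching rule wins
    return tags
```

Run over a whole warehouse schema, a rule-driven pass like this is what turns a 50-day manual tagging exercise into a job measured in hours, with humans only reviewing the ambiguous leftovers.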
Here’s a nugget of advice for fellow data leaders from Hendrik to wrap up: “Focus on business value, and the actual value you’re generating for your organization rather than finding a process everyone in the industry follows and adopting the same thing. Don’t try to do governance everywhere. Figure out what data sets are relevant to you, and focus on these ends.”
📚 More from my reading list
Looking past data infrastructure - how to deliver value with data by Seattle Data Guy (a conversation with friend of Metadata Weekly, Gordon Wong 💙)
Understanding the business as a data analyst w/ Olivia Höwing on the Analytics Anonymous podcast
Meaningful metrics: how data sharpened the focus of product teams by Erin Gustafson
The missing features in your data product by Chad Isenberg
On data products and how to describe them by Max Illis
The disillusionment of data careers by Ergest Xheblati
The evolving role of the data engineer by Tony Baer
On sight and insight by Stephen Bailey
Big data is dead by Jordan Tigani
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social! If someone shared this with you, subscribe to upcoming issues here.