Driving Trusted Self-Service Data
Residential construction technology leader demystifies how to evaluate the Active Metadata Management market, then implement a solution successfully
Having supported more than one million construction professionals since its founding in 2006, Buildertrend offers market-leading construction management technology, providing project and materials management, financial tools, and sales and service support for more than two million construction projects across the globe.
In this edition of the newsletter, read more from Preston Badeer (Director of Data Engineering at Buildertrend) on how their team’s data stack evolved and how they implemented a data catalog (Atlan) to drive self-service and trust in their team.
You can read the complete story here →
For five years, Preston has been a “jack of all trades” at Buildertrend. He initially joined as a Product Strategist, working closely with a two-person data science team to ensure strategy decisions were data-driven. Moving into a blended role of Data and Product Strategy, Preston then worked to commercialize new data products for Buildertrend, before joining a burgeoning data team as a Data Architect and later becoming Director of Data Engineering.
“I like to attach myself to the biggest problem I can find and that I feel like I can have an impact on,” Preston shared. “And as I moved into the data team, it became clear that the biggest thing I could have an impact on was enabling our data scientists to do more, faster, with better data engineering. We didn’t have any tools and didn’t have any sort of documentation. It was just, kind of, the wild west.”
Starting with just two Data Engineers under the data science team umbrella, Preston was tasked with building a team to support all 20+ data scientists and 10+ customer researchers and help Buildertrend live up to the high expectations they had for their enterprise data.
With an initiative underway to ensure the work of every team at Buildertrend was customer-centric and data-driven, continuing to rely on the data science team to support not only its own work, but everything from data engineering to responding to requests for data, was untenable.
“The goal for the team that I’m on is to democratize our data. We’ve gotten to a point where the data science team can’t keep up, nor can they scale fast enough to serve the data needs of everyone in the company. We’re trying to split the load, and make what we do with data more scalable. But we really want to get more data into the hands of the business. If they want an answer to a question, they won’t have to submit a ticket and wait. They can find answers really quickly on their own, and then use Data Science for what they’re great at, which is more complex analysis and modeling.”
An Evolving Data Stack 🌀
Buildertrend’s data technology has grown by leaps and bounds. Mere years ago, their data scientists would create notebooks on their local machines, writing basic Python scripts, or queries in SQL Server. To better support their analysis, the team adopted Tableau, but was still writing queries against a replica of their production databases, and then publishing reports.
“The first major change we did in tooling was an enterprise data science environment. We ended up buying Dataiku, and that made a huge difference. We stopped throwing spreadsheets around and were storing tables for intermediate transformations,” Preston shared.
The adoption of cloud-based, collaborative tooling meant that Buildertrend’s data team was now using shared resources, could back up its work, and could share its analysis. But the next leap forward would take the form of a data engineering function and technology stack.
“Our philosophy is to avoid tribal knowledge and specialization as much as possible,” Preston explained. “Everyone on the team should be able to pick up any project that anyone has worked on without any kind of ‘Joe knows about that thing and he’s on vacation,’ or ‘I know you’re on vacation, but only you know this so I’m going to bug you,’ anymore.”
With a consistent work environment and toolset, Buildertrend’s data engineers are well-versed in team best practices and coding frameworks, have access to shared IDE plugins and standards, and can pick up any ticket and complete the task at hand. Supporting this new approach is a growing workbench of modern, flexible data technology.
“The sort of new stack we’re implementing is dbt for basically everything. Our database engine is in BigQuery, so we’ve used that as our warehouse because it’s easy, requires no management, and is scalable. Then we run Python scripts and dbt jobs in GitHub Actions, which we migrated to in days and was more than 12 times cheaper for us to run. Then lastly, we chose Fivetran and have been super happy with it, as it’s the best tool for us because of a lot of the dbt-specific things they do.”
Rounding out Buildertrend’s modern data stack is Hightouch. While the majority of the data engineering team’s work is SQL, there was a significant amount of non-SQL custom code dedicated to Reverse ETL. The adoption of Hightouch ensured they would remain focused on enabling their colleagues, rather than writing and maintaining bespoke code.
“The short story of all of this is that we’re trying to keep our team small and efficient. I prefer to throw tools at problems before people,” Preston shared.
Searching for a Data Catalog 🕵🏻
With a growing team, a significant increase in requests for data, growing confusion about the nature of their data, and an array of market-leading data technology, Preston and his team began to search for a single place to ensure the data they provided was trusted and understood.
“Something that was always a high priority for me was how we identify a source of truth. How do we say that a data set is trustworthy or not, and where does that live?” Preston explained.
Prior to COVID lockdowns and remote work, resolving questions about data rested on in-person interactions with or within Buildertrend’s data science team. While this collaborative way of working had some positive effects, a combination of remote work and a tripling in team size meant that a question-and-answer approach to data was unsustainable.
“We needed to scale data at Buildertrend, period. So, we started our search by looking at all the products we already had that offered data catalogs,” Preston shared. “Unsurprisingly, most of them have no way of ingesting metadata from anywhere else, which was ridiculous to me. I can’t give people 16 catalogs with different navigation systems.”
Buildertrend’s search for a data catalog continued with a thorough evaluation of the market. Preston learned that many of the available solutions were either mature but fell short of the team’s high user experience standards, or too immature to support their complex use cases. In Atlan, Preston and his team found a platform that met their standards for both user experience and product maturity, and that offered the right purchasing and evaluation process.
“Atlan immediately stuck out. As a product guy, I’m a big hands-on person, and I don’t want to sit through a demo. I want a trial,” Preston explained. “Having somewhat of an interactive tour was powerful for me because I learned more from that tour than I did about some other products during their demos.”
Preston and his team quickly worked to create a weighted matrix of requirements, placing particular emphasis on search experience, product experience, API maturity, and pace of product development.
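As a rough illustration of how a weighted requirements matrix works, here is a minimal Python sketch. The four criteria are the ones named above; the weights, the 1-to-5 scores, and the candidate names (`catalog_a`, `catalog_b`) are hypothetical placeholders, not Buildertrend’s actual figures.

```python
# Hypothetical weights: the criteria come from the article, the numbers do not.
WEIGHTS = {
    "search_experience": 0.35,
    "product_experience": 0.30,
    "api_maturity": 0.20,
    "pace_of_development": 0.15,
}


def weighted_score(scores, weights=WEIGHTS):
    """Return the weighted average of per-criterion scores (e.g. on a 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(candidates, weights=WEIGHTS):
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Illustrative scores only; these are made-up values for two unnamed products.
candidates = {
    "catalog_a": {
        "search_experience": 5,
        "product_experience": 4,
        "api_maturity": 4,
        "pace_of_development": 5,
    },
    "catalog_b": {
        "search_experience": 2,
        "product_experience": 3,
        "api_maturity": 5,
        "pace_of_development": 3,
    },
}

if __name__ == "__main__":
    for name in rank(candidates):
        print(name, round(weighted_score(candidates[name]), 2))
```

Weighting the criteria up front makes the trade-offs explicit: a product with a strong API but weak search cannot outrank one that does well on the criteria the team cares about most.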
“Atlan became the bar that I was feature comparing everybody else with,” Preston shared. “One of my test criteria was what happens when somebody enters something other than a table or column name in a search box, and every other product I looked at returned zero results. If I’m a data scientist looking up a specific table, that’s great, but that’s not search, that’s auto-complete. The product experience also really set it apart, and an example of that was the API having good coverage and public documentation, which is a real sign of maturity for me.”
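The distinction Preston draws between search and auto-complete can be sketched in a few lines of Python. This is only an illustration of the test criterion, not how Atlan or any other catalog implements search; the `Asset` type, the sample catalog entries, and both functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    description: str


# Hypothetical catalog entries, invented for this example.
CATALOG = [
    Asset("fct_invoices", "Monthly revenue billed to customers"),
    Asset("dim_projects", "Construction projects and their status"),
]


def autocomplete(query, assets):
    """Match only asset names that start with the query (auto-complete behavior)."""
    q = query.lower()
    return [a.name for a in assets if a.name.lower().startswith(q)]


def keyword_search(query, assets):
    """Match assets whose name or description contains every query term."""
    terms = query.lower().split()
    hits = []
    for a in assets:
        haystack = f"{a.name} {a.description}".lower()
        if all(t in haystack for t in terms):
            hits.append(a.name)
    return hits


if __name__ == "__main__":
    # A business user types a concept, not a table name.
    print(autocomplete("revenue", CATALOG))
    print(keyword_search("revenue", CATALOG))
```

A user who already knows a table name is served by either function, but only the second returns anything for a business term such as "revenue", which is the case the test criterion was designed to expose.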
Read more about Buildertrend’s implementation of Atlan as a data catalog →
📚 From Our Reading List
Riverbed: Optimizing Data Access at Airbnb’s Scale by Amre Shakim
How to Craft a Data Story? by Melis - Data Detective
Automating Data Analytics with ChatGPT by James-Giang Nguyen
P.S. Liked reading this edition of the newsletter? I would love it if you could take a moment and share it with your friends on social! If someone shared this with you, subscribe to upcoming issues here.