An Open Letter to Data Ninjas - Yes, You Need to Implement a Data Contract System
With lots of love from your fellow data engineer
Not too long ago, I worked for a famous messaging platform; you may be typing on it as you read this blog. I took a break from data engineering and focused on system monitoring and observability for some time. Little did I know that I would end up building data pipelines anyway, this time to measure system efficiency instead of business efficiency.
As the famous saying goes:
Once a data engineer, always a data engineer.
I worked with many incredible humans in reliability engineering. It’s fun, and there is always a ton to learn every day. I learned state-of-the-art incident management processes and saw the quest for operational excellence that goes into running large-scale distributed systems. That is where I met the “Reliability Ninjas.”
The Reliability Ninjas:
Incidents are brutal and stressful. I’ve seen companies pay people to be on call or offer a day off as an incentive after a stressful incident. We have all seen it, and we have all gone through it.
However, I have seen a set of people I fondly call “The Reliability Ninjas.” When the pager goes off, they light up. It is their call: time to roll up their sleeves, time for a quick espresso, and yay, it is incident time.
Reliability Ninjas, let’s admit it: you love incidents.
The Data Ninjas:
Was it 2010? I don’t remember the exact year. I was consulting for an online media company that was building a search engine for curated recipe content. The chefs entered their recipes in an internal content management system (CMS), and our job was to build a search & recommendation platform. Soon we ran into a search efficiency problem. The CMS provided a simple free-text box for the chefs to enter the ingredients, measurements, and other metadata. There was no structure, which made it harder for us to bring relevancy to the search.
We embarked on a project to apply NLP to extract the structure. At that time, I asked the data engineering leader,
Me: “Hey, you know what? The data creation is in our control. Why can’t we provide a simple UI to enter the ingredient, quantity & unit of measure? Why do we need NLP?”
The answer I got: “Ananth, if we do so, where is the technology challenge for us?”
That is where I met my first Data Ninja, and it has never stopped since.
I have seen the Data Ninjas come out in force in the Data Contract conversations. Some blame people for not entering the ingredients properly, which turns it into a people problem. Some say: buckle up, data engineers; let the upstream send all the data in whatever format it likes. We have the dbt superpower to make things work; just ignore the Snowflake bill.
Listening to all these conversations a decade later, I still wonder why we can’t introduce simple “data contract” tooling that enables high-quality data creation and captures the ingredients properly at the source. Am I a worthy enough data engineer?
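To make the idea concrete, here is a minimal sketch of what a contract for the ingredient example might look like, assuming Python. The names Ingredient, ALLOWED_UNITS, and validate_recipe are hypothetical and are not taken from any real tool; the point is only that the structure is enforced when the data is created, not reconstructed afterwards.

```python
from dataclasses import dataclass

# Hypothetical list of accepted units; a real contract would define its own.
ALLOWED_UNITS = {"g", "kg", "ml", "l", "tsp", "tbsp", "cup", "piece"}


@dataclass(frozen=True)
class Ingredient:
    """One structured ingredient entry, validated when it is created."""
    name: str
    quantity: float
    unit: str

    def __post_init__(self):
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        if not isinstance(self.quantity, (int, float)) or self.quantity <= 0:
            raise ValueError("quantity must be a positive number")
        if self.unit not in ALLOWED_UNITS:
            raise ValueError(f"unknown unit: {self.unit!r}")


def validate_recipe(rows):
    """Split raw dict rows into valid Ingredient objects and error messages."""
    valid, errors = [], []
    for i, row in enumerate(rows):
        try:
            valid.append(Ingredient(**row))
        except (TypeError, ValueError) as exc:
            # TypeError covers missing or unexpected fields in the row.
            errors.append(f"row {i}: {exc}")
    return valid, errors
```

Calling validate_recipe on the rows a form submits would return the accepted ingredients together with a list of rejected rows, so the chef can fix a bad entry on the spot instead of a pipeline guessing at it later.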
The difference between Reliability Ninjas and Data Ninjas:
Is being a Data Ninja bad? No, not at all. However, one vital characteristic I observe in the Reliability Ninjas is what they do after the incident. They go through a look-back analysis, understand why the incident happened, and figure out a way to prevent it from happening again.
I still remember the mantra: It’s okay to fail. Just don’t fail twice the same way.
Data Ninjas, on the other hand, double down on building patches for every leak, since it is just a matter of building another dbt model. With every data patch, we lose a little more of what the data team is for, and we restrict ourselves to delivering dashboards and building dbt models.
The Data Team exists to provide an end-to-end data solution for an organization, starting from the origination of the data itself. The quest to build a high-quality end-to-end data system is what separates successful data-driven organizations from the rest.
Now it is time to ask yourself which persona you are. Are you a Reliability Ninja or a Data Ninja? Let me know in the comments.
That is all excellent, Ananth. I’m a data leader, and I want to implement end-to-end data solutions. Where should I start?
When I wrote Schemata, this was at the top of my mind. As a data leader, should I focus on teaching the product engineers about data engineering or build tooling to abstract it? How should I build systematic feedback to enable executive buy-in for data creation?
I will be writing more about this in the coming weeks. Meanwhile, if this resonates with you and you want to discuss it further,
Book me a slot here: https://calendly.com/apackkildurai
Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.