The Data Founder Story: TUKAN
By Patricio Davila Barberena
Beginning of our journey
The year is 2018 and we are wrapping up the courses we need to graduate from university. We keep hearing from a multitude of professors that ‘cleaning data will keep the bread in our mouths’ and that we will enter the labor market as glorified data cleaners disguised as licensed mathematicians.
Garbage In, Garbage Out
Ilan and I were analyzing the statistical significance of a national referendum in Mexico. As we were completing our experiments, we realized that our data was wrong. After the initial panic subsided, we took to manually inspecting our pipelines to figure out what had gone wrong.
It turned out that one of the sources labeled state data differently than another, so records were mistakenly matched when the two were merged. We implemented a 'hotfix' that mapped the geographic values used by the electoral institute to those used by the national institute of statistics, and re-ran our data pipelines.
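A minimal sketch of that kind of fix, assuming two sources that label the same states differently. The crosswalk entries, function names, and numbers below are hypothetical and made up for illustration; they are not the actual values or code we used. The idea is to map every label to one canonical key before merging, and to fail loudly on anything unmapped or unmatched instead of letting a silent mismatch through.

```python
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents, and collapse whitespace in a label."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


# Hypothetical crosswalk: normalized label variants -> one canonical key.
CROSSWALK = {
    "ciudad de mexico": "CDMX",
    "distrito federal": "CDMX",
    "cdmx": "CDMX",
    "nuevo leon": "NL",
    "n.l.": "NL",
}


def canonical_state(label: str) -> str:
    """Return the canonical key for a label, or raise if it is unmapped."""
    key = normalize(label)
    if key not in CROSSWALK:
        raise KeyError(f"unmapped state label: {label!r}")
    return CROSSWALK[key]


def merge_by_state(left: dict, right: dict) -> dict:
    """Merge two {state_label: value} mappings on the canonical state key."""
    left_c = {canonical_state(k): v for k, v in left.items()}
    right_c = {canonical_state(k): v for k, v in right.items()}
    unmatched = set(left_c) ^ set(right_c)
    if unmatched:
        # Refuse to merge rather than silently drop or mismatch states.
        raise ValueError(f"states present in only one source: {sorted(unmatched)}")
    return {state: (left_c[state], right_c[state]) for state in left_c}


# Made-up illustrative values, not real referendum or census data.
electoral = {"Distrito Federal": 120, "Nuevo León": 80}
statistics_office = {"Ciudad de México": 9.2, "N.L.": 5.8}

merged = merge_by_state(electoral, statistics_office)
print(merged)
```

Raising on unmapped or unmatched labels is a deliberate choice in this sketch: a merge that fails is easier to debug than one that quietly pairs the wrong states.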
This experience would mark the first of many encounters with manually parsing web data to fit existing data models. The three of us who would eventually launch TUKAN worked in different industries, yet we were all writing ETL jobs to fix many of the same data quality issues.
We were distraught calculating how much time and effort we could expect to spend cleaning data. Surely there is a better way for talented professionals to budget their time.
After realizing this was a shared pain point, we could not ignore the call to solve it. TUKAN is a Mexican startup that aims to standardize the world's public data.
Zero to One
After a brief stint in data journalism, we took up the challenge of building a data product and hired our first software developers to help us with our MVP. These first hires allowed us to launch our data catalog, which grants users access to standardized public data, including regulatory, financial, macroeconomic, and demographic data, among many other categories.
Our data catalog enables TUKAN users to integrate web data from continuously changing and cumbersome data sources.
Adapting and Rethinking
TUKAN’s initial launch landed a handful of clients. We are fortunate to be able to focus on maintaining the data catalog and expanding it by aggregating diverse and scattered data points. We are rebuilding our search module by integrating active metadata into our search fields to facilitate data discovery. Furthermore, we are focusing on our ability to monitor external data sources for changes and outages so that we can respond quickly to a changing data ecosystem.
TUKAN’s built-in hierarchy of data table attributes allows our users not just to query individual entities but also to compare them to their parent attributes. For example, a user can compare an individual bank’s profit and loss with the average of the entire commercial banking sector in a single query within our application. Our tools are making operational analytics ubiquitous for stakeholders across multiple industries.
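To make the parent comparison concrete, here is a small sketch of the idea in plain Python. It is not TUKAN's API or data model: the `PARENT` mapping, the `compare_to_parent` helper, the bank names, and the figures are all hypothetical and made up for illustration. Each entity points to its parent attribute, and one call returns the entity's value alongside the average across everything under the same parent.

```python
from statistics import mean

# Hypothetical hierarchy: each entity points to its parent attribute.
PARENT = {
    "Bank A": "Commercial banking",
    "Bank B": "Commercial banking",
    "Bank C": "Commercial banking",
}

# Made-up profit-and-loss figures, for illustration only.
PROFIT_AND_LOSS = {"Bank A": 10.0, "Bank B": 20.0, "Bank C": 30.0}


def compare_to_parent(entity: str, parent_of: dict, values: dict) -> dict:
    """Return an entity's value next to the average of its parent group."""
    parent = parent_of[entity]
    # Collect values for every entity that shares the same parent.
    group = [values[e] for e, p in parent_of.items() if p == parent and e in values]
    parent_average = mean(group)
    return {
        "entity": entity,
        "parent": parent,
        "value": values[entity],
        "parent_average": parent_average,
        "difference": values[entity] - parent_average,
    }


result = compare_to_parent("Bank A", PARENT, PROFIT_AND_LOSS)
print(result)
```

In this sketch the parent average includes the entity itself; whether to exclude it is a design choice that depends on the comparison a user wants.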
All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. The contents of this data founder story are provided by the featured data founders, and Data Engineering Weekly is not responsible for any compliance with applicable laws, rules, and regulations.