Data Engineering Weekly #76
Weekly Data Engineering Newsletter
Data Council - Austin 2022
Data Council published the Austin 2022 schedule here. Data Engineering Weekly readers can get a 20% discount using promo code
Let’s start this week’s edition with this excellent thread on the challenges of collecting SaaS metrics and how to approach solving them.
Bence Arató: Fundraising by data companies in 2021
2021 was an exciting year for data startups. The blog is a compilation of the funding rounds raised by data startups.
John Cutler: The Data-Informed Product Cycle
What does the data-informed product loop look like? The author walks through its lifecycle.
Have a strategy
Translate that into models
Add minimally viable measurement
Identify leverage points
Vortexa: Choosing an Analytics Tool. Metabase Vs. Superset Vs. Redash
Vortexa writes about its process for selecting an analytics tool and why it decided on Metabase.
Sarah Krasnik: Choosing a Data Quality Tool
The blog is an excellent overview of the data quality tool landscape. It is a good reference article if you're in the process of choosing a data quality tool.
Vimeo: Monitoring data quality at scale using Monte Carlo
Staying on the data quality story, Vimeo writes about monitoring data quality with Monte Carlo. The CI/CD flow for data quality checks is an exciting read; I'm curious to read more about the feedback loop for error correction.
Adam Marcus: Data diffs: Algorithms for explaining what changed in a dataset
What changed and how much it changed are the first questions we ask when looking at data. The blog describes explanatory algorithms for finding the difference between two datasets. The author implemented an open-source version of the Diff paper and explains how it can help solve the problem.
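To make the "what changed and how much" question concrete, here is a minimal sketch in Python. It is not the algorithm from the Diff paper or the author's implementation; it only groups two snapshots by a single dimension and ranks the groups by how much a metric moved. The function name `explain_change` and the sample `country` and `revenue` columns are hypothetical.

```python
from collections import defaultdict


def explain_change(before, after, dimension, metric):
    """Rank values of `dimension` by how much `metric` changed between snapshots.

    A simplified illustration only: real explanation algorithms search over
    combinations of columns and score candidate segments more carefully.
    """
    # totals[key] holds [sum in the old snapshot, sum in the new snapshot]
    totals = defaultdict(lambda: [0.0, 0.0])
    for row in before:
        totals[row[dimension]][0] += row[metric]
    for row in after:
        totals[row[dimension]][1] += row[metric]

    deltas = [(key, new - old) for key, (old, new) in totals.items()]
    # Largest absolute movement first, regardless of direction
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical sample data: two daily revenue snapshots
before = [
    {"country": "US", "revenue": 100.0},
    {"country": "DE", "revenue": 50.0},
]
after = [
    {"country": "US", "revenue": 105.0},
    {"country": "DE", "revenue": 20.0},
    {"country": "IN", "revenue": 10.0},
]

ranked = explain_change(before, after, "country", "revenue")
for key, delta in ranked:
    print(f"{key}: {delta:+.1f}")
```

In this toy data the overall revenue drop is dominated by a single segment, which is the kind of answer an explanation algorithm tries to surface automatically across many columns at once.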
Sponsored: RudderStack - Data Modeling In the Warehouse For Data Engineers
Many companies still struggle to answer even basic questions with their data. These data modeling best practices from RudderStack will help you build a well-defined core data layer, enabling teams to answer harder questions while ensuring a better experience for every end-user.
Netflix: Robust Foundation for Data Pipeline at Scale - Lessons learned from Netflix
The QCon talk about the Netflix data pipeline is now available. Workflow support for both event-driven and scheduled triggers is an exciting approach for an orchestration engine.
Stephen Bailey: Kicking the tires on dbt Metrics
The metrics layer is an exciting development, and I'm curious to see how it progresses. The author shares the experience of trying the dbt metrics layer, with one concern: there is a lot of YAML!
Dream 11: Data Feast — A Highly Scalable Feature Store at Dream11
Dream 11 writes about its feature store Data Feast. The choice of HBase as a feature store is interesting. TIL about
, and looking forward to reading more on it.
PyTorch: Introducing TorchRec, a library for modern production recommendation systems
PyTorch announced TorchRec, a PyTorch domain library for recommendation systems. The TorchRec library provides common sparsity and parallelism primitives, enabling researchers to build state-of-the-art personalization models and deploy them in production.
Outerbounds: Notebooks In Production With Metaflow
Metaflow, an orchestration engine for ML pipelines, popularized deploying notebooks in production. Outerbounds introduces Notebook Cards, which let data scientists use notebooks to visualize and debug production workflows and help bridge the MLOps divide between prototype and production.
Expedia: Practical Schema Evolution with Avro
The article is an excellent, practical guide to Avro schema evolution. It is a comprehensive reference for anyone who needs to manage Avro schema changes.
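For readers new to the topic, the core rule behind Avro schema evolution can be sketched in a few lines of Python. This is a simplified model of record-level schema resolution and not the Avro library or code from the article: a field the reader expects but the writer did not write is filled from the reader's default, a field the writer wrote but the reader does not know is ignored, and a missing field with no default is an error. The `resolve_record` helper and the dictionary-based schema format are hypothetical.

```python
def resolve_record(datum, writer_fields, reader_fields):
    """Project a record written with `writer_fields` onto `reader_fields`.

    Schemas are modeled as {field_name: {"type": ..., "default": ...}} with
    "default" optional. Type promotion, aliases, and unions are out of scope.
    """
    resolved = {}
    for name, spec in reader_fields.items():
        if name in writer_fields:
            # Field exists on both sides: take the written value
            resolved[name] = datum[name]
        elif "default" in spec:
            # Field is new to the reader: fall back to the reader's default
            resolved[name] = spec["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    # Writer-only fields are never copied, so they are silently dropped
    return resolved


# Hypothetical schema versions for a user record
user_v1 = {"id": {"type": "long"}, "name": {"type": "string"}}
user_v2 = {**user_v1, "email": {"type": ["null", "string"], "default": None}}
user_v2_strict = {**user_v1, "email": {"type": "string"}}  # no default

old_record = {"id": 1, "name": "Ada"}
new_record = {"id": 2, "name": "Grace", "email": "grace@example.com"}

# New reader, old data: the added field is filled from its default
upgraded = resolve_record(old_record, user_v1, user_v2)

# Old reader, new data: the unknown field is ignored
downgraded = resolve_record(new_record, user_v2, user_v1)

# Adding a field without a default breaks reading old data
try:
    resolve_record(old_record, user_v1, user_v2_strict)
    strict_failed = False
except ValueError:
    strict_failed = True
```

The practical takeaway is that giving every newly added field a default keeps both old and new readers working.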
Sponsored: RudderStack - The Data Stack Show Live: Is Reverse ETL Just Another Data Pipeline?
You’ve heard about Reverse ETL. Here’s your chance to learn all about the tooling from the folks who are creating it. Join hosts Eric and Kostas for a live recording of The Data Stack Show on March 9th to get insights from experts at Census, Hightouch, and Workato.
Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.