Data Engineering Weekly

Data Engineering Weekly #89

www.dataengineeringweekly.com

The Weekly Data Engineering Newsletter

Ananth Packkildurai
Jun 20, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Airbnb: Graph Machine Learning at Airbnb

We can frame many real-world machine learning and data analytics problems as graph problems. It has long puzzled me that we still do not leverage graph modeling much and tend to treat it as an independent entity. Airbnb writes an exciting blog describing the benefits of using graphs for machine learning.

https://medium.com/airbnb-engineering/graph-machine-learning-at-airbnb-f868d65f36ee


LinkedIn: Towards data quality management at LinkedIn

LinkedIn writes about its data health monitor architecture, which tracks data freshness and schema changes to detect the health of the data. It will be interesting to see how the architecture evolves from “monitoring-after-the-fact” to a preventive solution.
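
The two signals mentioned here, freshness and schema changes, can be sketched in a few lines. This is a minimal illustration and not LinkedIn's implementation; the function names, the 24-hour threshold, and the column names are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated: datetime, max_age: timedelta, now: datetime) -> bool:
    """Return True when the dataset has not been refreshed within max_age."""
    return now - last_updated > max_age


def schema_changes(expected: dict, observed: dict) -> dict:
    """Compare column-name -> type mappings and report the differences."""
    return {
        "added": sorted(observed.keys() - expected.keys()),
        "removed": sorted(expected.keys() - observed.keys()),
        "retyped": sorted(
            col
            for col in expected.keys() & observed.keys()
            if expected[col] != observed[col]
        ),
    }


# Hypothetical dataset metadata, hard-coded so the example is self-contained.
now = datetime(2022, 6, 20, 12, 0, tzinfo=timezone.utc)
last_updated = datetime(2022, 6, 19, 6, 0, tzinfo=timezone.utc)
stale = is_stale(last_updated, timedelta(hours=24), now)

changes = schema_changes(
    expected={"member_id": "bigint", "country": "string"},
    observed={"member_id": "string", "country": "string", "region": "string"},
)
```

Both checks run after the data has landed, which is the "monitoring-after-the-fact" mode; a preventive variant would presumably run the same comparison before a change is published.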

https://engineering.linkedin.com/blog/2022/towards-data-quality-management-at-linkedin


Spotify: How We Built Infrastructure to Run User Forecasts at Spotify

Forecasting key business metrics quarterly, weekly, or daily helps a business monitor performance, make decisions, and improve its product offerings. Spotify writes about its forecasting infrastructure and the lessons learned.

https://engineering.atspotify.com/2022/06/how-we-built-infrastructure-to-run-user-forecasts-at-spotify/


Grab: Automated Experiment Analysis - Making experimental analysis scalable.

Grab writes about automating the basic analytics of its experiments. The separation of metrics configuration from computation, the classification of datasets into bronze and gold, and the star schema modeling practices are exciting to read.
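
To make the configuration/computation split concrete, here is a minimal sketch. It is not Grab's system; the metric names, the config format, and the `compute_metrics` helper are assumptions for illustration. The idea is that adding a metric means editing configuration only, while one generic routine does the computing.

```python
from statistics import mean

# Hypothetical metric definitions: pure configuration, no computation logic.
METRICS = {
    "avg_order_value": {"column": "order_value", "agg": "mean"},
    "total_orders": {"column": "orders", "agg": "sum"},
}

# The only place that knows how an aggregation is actually computed.
AGGREGATIONS = {"mean": mean, "sum": sum}


def compute_metrics(rows, metrics=METRICS):
    """Apply every configured metric to each experiment variant."""
    by_variant = {}
    for row in rows:
        by_variant.setdefault(row["variant"], []).append(row)
    return {
        variant: {
            name: AGGREGATIONS[spec["agg"]]([r[spec["column"]] for r in group])
            for name, spec in metrics.items()
        }
        for variant, group in by_variant.items()
    }


rows = [
    {"variant": "control", "order_value": 10.0, "orders": 1},
    {"variant": "control", "order_value": 14.0, "orders": 2},
    {"variant": "treatment", "order_value": 18.0, "orders": 3},
]
results = compute_metrics(rows)
```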

https://engineering.grab.com/automated-experiment-analysis


Sponsored: Firebolt - Embedded Analytics vs Data Apps

Data Apps is still a loosely defined term, and there’s a lot of debate and confusion about what it really means and how it differs from traditional dashboarding and embedded analytics. Boaz Farkash shares his point of view on the subject.

https://www.firebolt.io/blog/embedded-analytics-vs-data-apps


Etsy: Using Real-Time Streaming to Power Etsy's Offsite Ads

Streaming data from a transactional database to analytical applications is always challenging, and the infrastructure that stitches the two together is expensive to maintain. Etsy writes about one of the challenges it faced while building an analytical application using change data capture.
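
For readers new to change data capture, the core idea is replaying row-level change events from the transactional database to keep a downstream copy current. The sketch below is a generic illustration, not Etsy's pipeline; the event shape (`op`, `key`, `row`) and the listing fields are assumptions.

```python
def apply_change(state: dict, event: dict) -> None:
    """Apply one change event to an in-memory copy of the table."""
    key = event["key"]
    if event["op"] == "delete":
        state.pop(key, None)
    else:
        # "insert" and "update" are both treated as an upsert.
        state[key] = event["row"]


# Hypothetical change events, in the order the source database committed them.
events = [
    {"op": "insert", "key": 1, "row": {"listing_id": 1, "price": 20}},
    {"op": "update", "key": 1, "row": {"listing_id": 1, "price": 25}},
    {"op": "insert", "key": 2, "row": {"listing_id": 2, "price": 40}},
    {"op": "delete", "key": 2},
]

state = {}
for event in events:
    apply_change(state, event)
```

The result is only correct if events for the same key are applied in commit order, which is one reason this kind of infrastructure is costly to operate.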

https://www.etsy.com/codeascraft/using-real-time-streaming-to-power-etsy-offsite-ads


Gradient Flow: Distributed Computing for AI - A Status Report

Gradient Flow maps the distributed computing roles and needs at each stage of the ML lifecycle to show how AI and distributed computing overlap.

https://gradientflow.com/distributed-computing-for-ai-a-status-report/


Petr Janda: A path towards a data platform that aligns data, value, and people

The rapid growth of data sources and use cases brings challenges to modern cloud data infrastructure. The blog narrates these challenges and explains why the data product approach addresses them.

https://petrjanda.substack.com/p/a-path-towards-a-data-platform-that


Chad Sanderson: The Death of Data Modeling - Pt. 1

Is data modeling dead? The author narrates the importance of data modeling and why it’s hard to adopt in modern data technologies. The traditional approach of a centralized data modeling team won’t scale, and the author calls for rethinking data modeling.

https://dataproducts.substack.com/p/the-death-of-data-modeling-pt-1


Sponsored: RudderStack - What is the Growth Stack?

A detailed guide to building the Growth Stack, an architecture to centralize every data point into a comprehensive source of truth and activate that centralized data in downstream tools. The growth stack is phase two of RudderStack's Data Maturity Journey framework.

https://www.rudderstack.com/blog/what-is-the-growth-stack


Jarek Potiuk: Airflow Summit 2022 — The Best Of

Airflow Summit hosted some excellent talks on data engineering, and the author summarizes the best of them here.

https://potiuk.com/airflow-summit-2022-the-best-of-373bee2527fa


Zalando: Accelerate testing in Apache Airflow through DAG versioning

Zalando writes about how to version the Airflow DAGs on a single server through isolated pipeline and data environments to enable more convenient simulation and testing.

https://engineering.zalando.com/posts/2022/06/accelerate-apache-airflow-testing-through-dag-versioning.html


Altexsoft: Customer Churn Prediction Using Machine Learning - Main Approaches and Models

An excellent overview of how SaaS companies handle customer churn prediction, covering the main machine learning approaches and models.

https://www.kdnuggets.com/2019/05/churn-prediction-machine-learning.html


HomeToGo Engineering: How HomeToGo connected dbt and Superset to make metadata more accessible and reduce analytical overhead

HomeToGo writes about integrating dbt metadata with Superset, making the metadata easier for data consumers to access. The approach of moving metric definitions out of Superset into a source-controlled solution and using the dbt manifest as the source of truth is fascinating.
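
As a rough illustration of treating the dbt manifest as the source of truth, the sketch below pulls model and column descriptions out of a manifest-shaped document. The JSON is a heavily trimmed, hypothetical fragment (a real `manifest.json` carries many more fields), `model_docs` is a name made up for this example, and the step that pushes the result into Superset is not shown.

```python
import json

# A heavily trimmed, hypothetical fragment shaped like dbt's manifest.json.
MANIFEST_JSON = """
{
  "nodes": {
    "model.analytics.bookings": {
      "resource_type": "model",
      "name": "bookings",
      "description": "One row per confirmed booking.",
      "columns": {
        "booking_id": {"name": "booking_id", "description": "Primary key."}
      }
    },
    "test.analytics.not_null_bookings_booking_id": {
      "resource_type": "test",
      "name": "not_null_bookings_booking_id"
    }
  }
}
"""


def model_docs(manifest: dict) -> dict:
    """Collect table and column descriptions for every dbt model node."""
    docs = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, and other node types
        docs[node["name"]] = {
            "description": node.get("description", ""),
            "columns": {
                col: meta.get("description", "")
                for col, meta in node.get("columns", {}).items()
            },
        }
    return docs


docs = model_docs(json.loads(MANIFEST_JSON))
```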

https://engineering.hometogo.com/how-hometogo-connected-dbt-and-superset-to-make-metadata-more-accessible-and-reduce-analytical-2223af539cc6


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
