Data Engineering Weekly

Data Engineering Weekly #84

www.dataengineeringweekly.com

The Weekly Data Engineering Newsletter

Ananth Packkildurai
Apr 25, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Ananth Packkildurai: Back To The Future - Emerging Trends In Data Engineering

I gave a talk on emerging trends in data engineering at CrunchConf last October, and the video is now published.

Speaker Deck: https://speakerdeck.com/vananth22/back-to-the-future-emerging-trends-in-data-engineering


Meta: Inside Meta's AI optimization platform for engineers across the company

Meta writes about Looper, an AI platform that supports the complete machine learning lifecycle, from model training, deployment, and inference all the way to evaluation and tuning of products.

Hello again, Bundling vs. Unbundling!

A couple of things stand out in the blog:

  1. It is a declarative AI system, which means that product engineers only need to declare the functionality they want. The system fills in the software implementation based on the declaration.

  2. While other AI platforms often perform inference offline in batch mode, Looper operates in real time.

https://ai.facebook.com/blog/looper-meta-ai-optimization-platform-for-engineers/
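The two points above are easier to see in code. Below is a minimal, hypothetical Python sketch, not Looper's actual API: `DecisionSpec`, `DeclarativePlatform`, and the choice of an epsilon-greedy bandit are all assumptions for illustration. The product engineer only declares the decision and its candidate actions; the platform fills in the implementation and decides online, updating after every observed reward instead of in an offline batch job.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DecisionSpec:
    """Hypothetical declaration: *what* to decide, not *how* to decide it."""
    name: str
    actions: list[str]    # candidate choices the product can make
    epsilon: float = 0.1  # exploration rate; a platform could tune this itself


@dataclass
class DeclarativePlatform:
    """Toy platform that fills in the implementation: an epsilon-greedy bandit."""
    spec: DecisionSpec
    seed: int = 0
    counts: dict[str, int] = field(default_factory=dict)
    totals: dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)
        for action in self.spec.actions:
            self.counts[action] = 0
            self.totals[action] = 0.0

    def mean_reward(self, action: str) -> float:
        n = self.counts[action]
        return self.totals[action] / n if n else 0.0

    def decide(self) -> str:
        # Per-request (online) decision: explore with probability epsilon,
        # otherwise exploit the action with the best observed mean reward.
        if self.rng.random() < self.spec.epsilon:
            return self.rng.choice(self.spec.actions)
        return max(self.spec.actions, key=self.mean_reward)

    def record(self, action: str, reward: float) -> None:
        # Online update: the very next decide() call already uses this feedback.
        self.counts[action] += 1
        self.totals[action] += reward


# The product engineer writes only the declaration ...
spec = DecisionSpec(name="prefetch_policy", actions=["prefetch", "skip"])
# ... and the platform supplies the decision logic.
platform = DeclarativePlatform(spec)

# Simulated traffic (hypothetical): "prefetch" pays off 70% of the time, "skip" 30%.
env = random.Random(1)
success_rate = {"prefetch": 0.7, "skip": 0.3}
for _ in range(1000):
    action = platform.decide()
    reward = 1.0 if env.random() < success_rate[action] else 0.0
    platform.record(action, reward)

print({a: round(platform.mean_reward(a), 2) for a in spec.actions}, platform.counts)
```

A real platform would pick among many model types and tune them automatically; epsilon-greedy is simply the smallest online learner that shows the declare-then-fill-in split.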


Lyft: Challenges in Experimentation

Customers, competitors, and the economy's direction are each unpredictable in their own way. Experimentation is vital for testing product changes and building the evidence that drives significant decisions. Lyft writes an exciting blog on the challenges of supporting a culture of experimentation.

https://eng.lyft.com/challenges-in-experimentation-be9ab98a7ef4


dbt labs: A Good Problem to Have…

The scheduler is a core part of data transformation. dbt Labs writes about the scalability challenges with dbt Cloud and the recent improvements. I'm looking forward to part 2 to understand the dbt Cloud scheduler better!

https://www.getdbt.com/blog/a-good-problem-to-have/


Sponsored: Firebolt - Database Performance is Not About Performance

In this blog, we argue that performance is actually not about performance at all! We’ll contextualize real-world customer needs for data warehouse performance, and we’ll even make a bold prediction about the future of data warehousing (preview - it’s all about the new CDW).

https://www.firebolt.io/blog/future-of-performance-is-not-about-performance


Zalando: Machine Learning Platform - Architecture and tooling behind machine learning at Zalando

Zalando writes about the architecture and tooling behind its ML platform. ZFlow, built on top of AWS Step Functions, and the custom web interface built on top of Backstage look interesting.

https://engineering.zalando.com/posts/2022/04/zalando-machine-learning-platform.html
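To illustrate what building a pipeline tool on Step Functions involves, here is a rough sketch (not ZFlow's actual API) that chains an ordered list of pipeline steps into an Amazon States Language (ASL) definition, the JSON format Step Functions executes. The step names, the `build_state_machine` helper, and the Lambda ARNs are hypothetical placeholders.

```python
import json


def build_state_machine(steps: list[tuple[str, str]], comment: str = "") -> dict:
    """Chain (state_name, resource_arn) pairs into a sequential ASL definition."""
    if not steps:
        raise ValueError("a state machine needs at least one step")
    states = {}
    for i, (name, resource) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource}
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]  # hand off to the following step
        else:
            state["End"] = True  # the last step terminates the execution
        states[name] = state
    return {"Comment": comment, "StartAt": steps[0][0], "States": states}


# Placeholder ARNs for a hypothetical training pipeline.
pipeline = [
    ("PrepareData", "arn:aws:lambda:eu-central-1:123456789012:function:prepare-data"),
    ("TrainModel", "arn:aws:lambda:eu-central-1:123456789012:function:train-model"),
    ("EvaluateModel", "arn:aws:lambda:eu-central-1:123456789012:function:evaluate-model"),
]
definition = build_state_machine(pipeline, comment="Hypothetical training pipeline")
print(json.dumps(definition, indent=2))
```

The resulting JSON string could then be registered with Step Functions, for example through the boto3 `sfn` client's `create_state_machine`. A production pipeline would also declare `Retry` and `Catch` fields on each task.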


DoorDash: Building the Model Behind DoorDash’s Expansive Merchant Selection

DoorDash writes about the model behind its expansive merchant selection, which identifies high-value merchants to onboard so that the selection in every market matches customer demand. The strategy of modeling customer preference to decide which merchants to onboard looks interesting, but I wonder how the team maintains algorithmic fairness. Any AI bias can lead to social imbalance, yet the blog does not mention how it handles fairness.

https://doordash.engineering/2022/04/19/building-merchant-selection/


Sponsored: Rudderstack - The Data Stack Show Live: Solving the Data Quality Problem

Data quality issues are universal, and dealing with them at scale is toil. Join The Data Stack Show on Wednesday at 10 PT for a live recording with some of the brightest minds working to solve the problem. Leaders at Bigeye, Great Expectations, Lightup, and Metaplane will discuss why data quality is so challenging and how to fix it.

https://datastackshow.com/live-data-quality/


Blinkit: Evolution of Redash at Blinkit

Blinkit writes about its use of Redash, the challenges of running the SQL dashboarding tool, and how it solved them.

https://lambda.blinkit.com/evolution-of-redash-at-blinkit-fb50a64770bf


Mikkel Dengsøe: Data tests and the broken windows theory

Building trust in data is the most crucial function of a data team in an organization. The author applies the broken windows theory, which holds that visible, unrepaired disorder invites more disorder, to data testing.

https://mikkeldengsoe.substack.com/p/broken-windows
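In the spirit of keeping small failures visible and fixed, here is a minimal sketch of the kind of checks a data team might run. The orders table, column names, and rules are hypothetical, and tools such as dbt tests or Great Expectations provide production versions. The Python below runs not-null, uniqueness, and accepted-values checks and reports every failure rather than stopping at the first one.

```python
from collections import Counter
from typing import Any, Iterable

Row = dict[str, Any]


def check_not_null(rows: list[Row], column: str) -> list[str]:
    """Fail for every row where the column is missing or None."""
    return [f"row {i}: {column} is null" for i, row in enumerate(rows) if row.get(column) is None]


def check_unique(rows: list[Row], column: str) -> list[str]:
    """Fail for every non-null value that appears more than once."""
    counts = Counter(row.get(column) for row in rows if row.get(column) is not None)
    return [f"{column}={value!r} appears {n} times" for value, n in counts.items() if n > 1]


def check_accepted_values(rows: list[Row], column: str, allowed: Iterable[Any]) -> list[str]:
    """Fail for every non-null value outside the allowed set."""
    allowed_set = set(allowed)
    return [
        f"row {i}: {column}={row.get(column)!r} is not an accepted value"
        for i, row in enumerate(rows)
        if row.get(column) is not None and row.get(column) not in allowed_set
    ]


def run_tests(rows: list[Row]) -> list[str]:
    # Collect every failure instead of stopping at the first one,
    # so no "broken window" is left unreported.
    failures: list[str] = []
    failures += check_not_null(rows, "order_id")
    failures += check_unique(rows, "order_id")
    failures += check_accepted_values(rows, "status", {"placed", "shipped", "delivered"})
    return failures


# Hypothetical orders table with one problem of each kind.
orders = [
    {"order_id": 1, "status": "placed"},
    {"order_id": 2, "status": "shipped"},
    {"order_id": 2, "status": "lost"},
    {"order_id": None, "status": "delivered"},
]
for failure in run_tests(orders):
    print(failure)
```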


Lil’Log: Learning with not Enough Data

Perfectly labeled data is often hard to obtain because of the cost and human effort involved, yet labeled data is critical for supervised learning. The author discusses approaches to take when there is not enough labeled data in a three-part series.

Learning with not Enough Data Part 1: Semi-Supervised Learning

Learning with not Enough Data Part 2: Active Learning

Learning with not Enough Data Part 3: Data Generation
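Two of the ideas in the series boil down to simple selection rules over a model's predicted class probabilities. The sketch below is an illustration, not code from the series: `pseudo_label` implements the confidence-threshold pseudo-labeling step used in self-training (a semi-supervised method), and `least_confident` implements least-confidence uncertainty sampling for active learning. The 0.9 threshold, the budget `k`, and the example probabilities are arbitrary assumptions.

```python
def pseudo_label(probs: list[list[float]], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Self-training step: keep unlabeled examples whose top-class probability
    clears the threshold and treat the argmax class as their label."""
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)
        if p[label] >= threshold:
            selected.append((i, label))
    return selected


def least_confident(probs: list[list[float]], k: int) -> list[int]:
    """Active learning: return the k examples with the lowest top-class probability,
    i.e. the ones a human annotator should label next."""
    ranked = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return ranked[:k]


unlabeled_probs = [
    [0.97, 0.03],  # confident -> pseudo-label as class 0
    [0.55, 0.45],  # uncertain -> good candidate for a human label
    [0.08, 0.92],  # confident -> pseudo-label as class 1
    [0.40, 0.60],  # uncertain
]
print(pseudo_label(unlabeled_probs))        # [(0, 0), (2, 1)]
print(least_confident(unlabeled_probs, 2))  # [1, 3]
```

The two rules are complementary: confident predictions are cheap labels for the model to learn from, while the least confident ones are where a human label is worth the most.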


Sponsored: Monte Carlo Data - The Modern Data Leader’s Playbook

Learn how today’s best data engineering and analytics leaders are staying ahead of the competition in our exclusive guide.

Download the modern data leader’s playbook


Booking.com: Overtracking and trigger analysis - reducing sample sizes while INCREASING the sensitivity of experiments

An exciting article from Booking.com on overtracking: tracking users who can never be affected by the treatment affects the variance of the experiment metrics and dilutes the treatment effect, making it harder to detect. The article shows how trigger analysis reduces sample sizes while increasing the sensitivity of experiments.

https://booking.ai/overtracking-and-trigger-analysis-how-to-reduce-sample-sizes-and-increase-the-sensitivity-of-71755bad0e5f
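The dilution argument can be made concrete with the textbook sample-size formula for comparing two means, n = 2(z_{1-α/2} + z_{1-β})²σ²/δ² users per arm. If only a fraction p of tracked users can be affected and the effect among them is δ, the effect measured over all tracked users shrinks to about pδ, so the required sample grows by roughly 1/p². This is a back-of-the-envelope sketch, not the article's method: it assumes the same per-user σ for triggered and non-triggered users (ignoring the variance effect the article describes), and the trigger rate, effect, and σ are made-up numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm for a two-sided test on a difference in means (normal approximation):
    n = 2 * (z_{1 - alpha/2} + z_{power})**2 * sigma**2 / delta**2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma**2 / delta**2)


# Illustrative assumptions, not numbers from the article:
trigger_rate = 0.10  # share of tracked users who can actually see the change
delta = 0.02         # treatment effect among those users
sigma = 0.5          # per-user standard deviation, assumed equal for everyone

# Analyzing every tracked user: the effect is diluted to trigger_rate * delta.
overtracked = n_per_arm(trigger_rate * delta, sigma)
# Trigger analysis: analyze only users who could be affected.
triggered = n_per_arm(delta, sigma)
# Tracked traffic needed to collect that many triggered users.
traffic_for_triggered = ceil(triggered / trigger_rate)

print(overtracked, triggered, traffic_for_triggered)
print(round(overtracked / triggered), round(overtracked / traffic_for_triggered))
```

With a 10% trigger rate, the first ratio lands near 1/p² = 100 (users analyzed) and the second near 1/p = 10 (users that must be tracked), which is why trigger analysis can shrink sample sizes while making the experiment more sensitive.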


Meryam Bukhari: What's the role of an ML PM?

Many companies adopt a product-over-project strategy and treat internal platforms as products. The author discusses the role of a product manager in building ML-based products.

https://meryam.substack.com/p/whats-the-role-of-an-ml-pm


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.

© 2023 Ananth Packkildurai