Data Engineering Weekly #227

The Weekly Data Engineering Newsletter

Jul 07, 2025

The Data Platform Fundamentals Guide

A comprehensive guide for data platform owners looking to build a stable and scalable data platform, starting with the fundamentals and wrapping up with real-world examples illustrating how teams have built in-house data platforms for their businesses.


Philipp Schmid: The New Skill in AI is Not Prompting, It's Context Engineering

Building powerful and reliable AI Agents is becoming less about finding a magic prompt or model updates. It is about engineering context and providing the right information and tools in the right format at the right time.

The author explains why context engineering is crucial in the development of AI agents.

https://www.philschmid.de/context-engineering

Piethein Strengholt: Unstructured Data Management at Scale

Unstructured data management will be the next significant challenge in big data management as we continually enhance our ability to parse and understand various forms of data. The author describes how to process unstructured data in line with the medallion architecture and discusses Tensor Lake and LlamaParse.

https://piethein.medium.com/unstructured-data-management-at-scale-4c612f822f70

Médéric Hurier (Fmind): The Great Data Divergence: Why Generative AI Demands a New Approach Beyond the Data Lake

The article offers a different perspective from the previous one, questioning whether the data lake is a valid approach for emerging Gen AI use cases. Freshness, context, and low-latency access are key to the success of Gen AI applications, and the author questions whether the data lake's medallion architecture fits those needs.

https://mlops.community/the-great-data-divergence-why-generative-ai-demands-a-new-approach-beyond-the-data-lake/

Datadog: How we built reliable log delivery to thousands of unpredictable endpoints

Datadog writes about building a system that delivers logs to external endpoints, drawing inspiration from package delivery networks. Writing a small microbatch per destination, wrapped in an envelope, makes an interesting design case study for fan-out writes.

https://www.datadoghq.com/blog/engineering/reliable-log-delivery/
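
To make the per-destination microbatch idea concrete, here is a minimal sketch. It is not Datadog's implementation: the `Envelope` class, the `build_microbatches` function, the endpoint names, and the batch size of 2 are all hypothetical, chosen only to show logs being grouped by destination before a fan-out write.

```python
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical wrapper: one destination plus the logs batched for it."""
    destination: str
    logs: list = field(default_factory=list)


def build_microbatches(records, max_batch_size=2):
    """Group (destination, log) pairs into small per-destination envelopes.

    An envelope is sealed as soon as it reaches max_batch_size; whatever
    is still open at the end is flushed as a partially filled envelope.
    """
    open_batches = {}
    sealed = []
    for destination, log in records:
        env = open_batches.setdefault(destination, Envelope(destination))
        env.logs.append(log)
        if len(env.logs) >= max_batch_size:
            sealed.append(open_batches.pop(destination))
    # Flush envelopes that never filled up.
    sealed.extend(open_batches.values())
    return sealed


# Made-up sample input: three logs for one endpoint, one for another.
records = [
    ("endpoint-a", "log 1"),
    ("endpoint-b", "log 2"),
    ("endpoint-a", "log 3"),
    ("endpoint-a", "log 4"),
]
batches = build_microbatches(records)
for env in batches:
    print(env.destination, env.logs)
```

Because each envelope carries exactly one destination, a slow or failing endpoint only delays its own envelopes, which is presumably part of the appeal of this layout for fan-out delivery.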

Gojek: Introducing xkafka — Kafka, but Simpler (for Go)

What if we could make using Kafka in Go feel more like writing a simple HTTP service?

Gojek details its Kafka SDK abstraction, which follows the ‘batteries included’ pattern.

https://medium.com/gojekengineering/introducing-xkafka-kafka-but-simpler-for-go-91f4ce3edade

Uber: Reinforcement Learning for Modeling Marketplace Balance

Uber shares insights on leveraging reinforcement learning (RL) to enhance driver-rider matching by modeling it as an infinite-horizon Markov Decision Process (MDP) and applying a DQN-inspired value iteration method with temporal difference learning. Innovations include utilizing negative signals from driver idle states in reward modeling, employing contrastive loss for smoother geospatial embeddings, and validating models through a custom Monte Carlo-based evaluation pipeline. The global deployment resulted in a 0.52% increase in driver earnings and a 2.2% decrease in rider cancellations.

https://www.uber.com/en-IN/blog/reinforcement-learning-for-modeling-marketplace-balance/
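
For readers new to temporal difference learning, the core update can be sketched in a few lines. This is a tabular TD(0) toy, not Uber's DQN-inspired model: the two states, the rewards (including the negative reward for idling), and the `alpha` and `gamma` values are illustrative assumptions.

```python
def td_update(values, state, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    td_target = reward + gamma * values[next_state]
    td_error = td_target - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state world: a driver is either idle or on a trip.
values = {"idle": 0.0, "on_trip": 0.0}

# Made-up transitions as (state, reward, next_state).
# Idling carries a negative reward; completing a trip a positive one.
transitions = [
    ("idle", -1.0, "on_trip"),
    ("on_trip", 10.0, "idle"),
]

for state, reward, next_state in transitions:
    td_update(values, state, reward, next_state)

print(values)
```

Each update nudges the value of a state toward the observed reward plus the discounted value of the state that followed, so estimates improve from single transitions without waiting for an episode to end.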

Wix: Advancing Enterprise AI: How Wix is Democratizing RAG Evaluation

Wix open-sources WixQA, a realistic benchmark suite derived from customer support interactions, and RAGXplain, an evaluation framework turning metrics into human-readable insights for enterprise Retrieval-Augmented Generation (RAG) systems. WixQA includes expert-written, simulated, and synthetic datasets paired with a synchronized knowledge base, while RAGXplain provides clear, actionable recommendations based on six performance metrics.

https://www.wix.engineering/post/advancing-enterprise-ai-how-wix-is-democratizing-rag-evaluation

Deliveroo: Deliveroo's Machine Learning Platform: Powering the Future of ML

Deliveroo shares insights from building their centralized Machine Learning Platform, designed to standardize workflows, boost engineer productivity by 2-3x, and accelerate model deployment. Combining open-source technologies like Kubernetes, Argo, and Metaflow with custom-built tools, such as Inferoo (a real-time inference service handling over 1 billion daily requests) and a dedicated Feature Store, the platform emphasizes automation, cohesion, and self-service. The team is now enhancing the platform with an internal ML portal for unified model management, monitoring, and deployment.

https://deliveroo.engineering/2025/07/02/deliveroo-ml-platform.html

Henry Ko: TPU Deep Dive

The author delves into the architecture and design of Google's Tensor Processing Units (TPUs), specifically TPUv4, highlighting how systolic arrays, pipelining, and ahead-of-time compilation with XLA enable high throughput and energy efficiency in AI workloads. The article covers the TPU hierarchy, from single-chip TensorCores and memory buffers (CMEM, VMEM) to multi-chip trays, racks, pods, and multislice setups, emphasizing flexible interconnects such as Inter-Core Interconnect (ICI) and Optical Circuit Switching (OCS). It also explains how XLA abstracts complex distributed topologies (such as 3D torus and twisted torus) to optimize various parallelism strategies.

https://henryhmko.github.io/about/about.html
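
As a rough illustration of how a systolic array computes a matrix product, the sketch below simulates a simplified output-stationary grid in plain Python. It is an assumption-laden toy and does not claim to match the TPU's actual dataflow: the function name `systolic_matmul` is hypothetical, and the only property it models is that operands arrive at each processing element one step later per row and column.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Processing element (i, j) holds one accumulator. Rows of `a` flow in
    from the left and columns of `b` from the top, each skewed by one
    cycle per row/column, so at cycle t the element sees a[i][k] and
    b[k][j] with k = t - i - j.
    """
    n, k_dim, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]
    # The last operand pair reaches the far corner at cycle n + m + k_dim - 3.
    total_cycles = n + m + k_dim - 2
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                k = t - i - j
                if 0 <= k < k_dim:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, total_cycles


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result, cycles = systolic_matmul(a, b)
print(result, cycles)
```

In hardware, all elements work in parallel within a cycle and only pass data to neighbors, which is where the throughput and energy benefits come from; the nested loops here merely replay that schedule sequentially.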

All rights reserved, ProtoGrowth Inc., India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
