Data Engineering Weekly #241

The Weekly Data Engineering Newsletter

Oct 13, 2025

The data platform playbook everyone’s using

We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating undering a single platform.

In this book, you’ll learn:

- How composable architectures allow teams to ship faster
- Why data quality matters and how you can catch issues before they reach users
- What observability means, and how it will help you solve problems more quickly

Get the guide now

Netflix: Data as a Product: Applying a Product Mindset to Data at Netflix

Many organizations struggle to extract consistent value from their data because it’s treated as an afterthought rather than a managed asset. Netflix reframes this challenge by applying a product management mindset to data — defining clear purpose, ownership, lifecycle management, usability, and reliability for every dataset, metric, and model. The article demonstrates how the “data as a product” approach builds trust, reduces data debt, and ensures data consistently drives meaningful business outcomes and innovation across the company.

https://netflixtechblog.medium.com/data-as-a-product-applying-a-product-mindset-to-data-at-netflix-4a4d1287a31d

Meta: Introducing OpenZL: An Open Source Format-Aware Compression Framework

Compression frameworks often struggle to balance the efficiency of format-specific codecs with the simplicity of universal tools. Meta introduces OpenZL, a format-aware, open-source compression framework that learns a data’s structure through configurable transforms and offline training, achieving domain-specific efficiency with a single universal decoder. The benchmark shows the approach delivers higher compression ratios and faster speeds for structured data while maintaining operational simplicity, security, and backward compatibility across evolving datasets.

https://engineering.fb.com/2025/10/06/developer-tools/openzl-open-source-format-aware-compression-framework/

Jack Vanlightly: Beyond Indexes: How Open Table Formats Optimize Query Performance

Many teams misapply OLTP “index” thinking to Iceberg/Delta/Hudi and wonder why analytical queries still scan too much data. The article shows that open table formats win performance through layout and pruning—partitioning, sorting, compaction, Parquet row-group stats, Bloom filters, and platform features such as materialized views—rather than through secondary indexes. The author explains by designing tables and models (e.g., star schemas) for data skipping and augmenting with auxiliary structures, teams cut I/O, accelerate scans, and gain a clear path to future gains via richer standardized metadata and disciplined maintenance.

https://jack-vanlightly.com/blog/2025/10/8/beyond-indexes-how-open-table-formats-optimize-query-performance

Sebastian Raschka: Understanding the 4 Main Approaches to LLM Evaluation (From Scratch)

Evaluating large language models remains complex, as no single metric captures reasoning, correctness, and user preference equally well. The author outlines four main approaches: multiple-choice benchmarks like MMLU for knowledge recall; verifiers for objectively scoring free-form outputs in math or code; leaderboards such as LM Arena that rank models by human preferences; and LLM judges that score responses using rubric-based evaluation by another model. Each balances scalability, objectivity, and realism differently, showing that meaningful LLM assessment requires combining complementary methods aligned with real-world use cases.

https://magazine.sebastianraschka.com/p/llm-evaluation-4-approaches

NYT: Scaling Subscriptions at The New York Times with Real-Time Causal Machine Learning

Every subscription-based business struggles to balance engagement, registration, and subscription goals when static paywalls blunt discovery and conversion. The New York Times writes about replacing its batch “Dynamic Meter” with real-time, causal ML: always-on RCTs train S-learner predictors per action (paywall, reg wall, open), a multi-objective optimizer tunes action weights under business constraints (re-tuned daily), and IPW estimates plus A/B tests validate lift. The system makes millisecond decisions that outperform random policies, resulting in raising subscription and registration rates while holding wall-show rates within targets.

https://open.nytimes.com/scaling-subscriptions-at-the-new-york-times-with-real-time-causal-machine-learning-5f23a7b24ff4

Booking.com: How to estimate correlation between metrics from past A/B tests

Experimentation teams often rely on fast-moving proxy metrics, but naive correlations between proxies (like clicks) and goals (like bookings) can be badly misleading. Booking.com shows that observed experiment correlations mix true causal effects with correlated measurement noise, potentially creating false positives in A/A tests. By applying the Total Covariance Estimator—subtracting the noise covariance from observed effects—they recover the true correlation between metrics, revealing when proxies are directionally misaligned. This bias-corrected approach prevents the shipment of misleading “successful” experiments and underpins a standardized framework for trustworthy metric selection across the company.

https://booking.ai/how-to-estimate-correlation-between-metrics-from-past-a-b-tests-0a5d99e5a11d

Milvus: From Word2Vec to LLM2Vec: How to Choose the Right Embedding Model for RAG

RAG lives or dies by embedding quality: pick the wrong model, and you amplify noise and hallucinations rather than retrieving grounding context. The article maps the landscape—from sparse, dense, and hybrid (BGE-M3) to LLM-based embeddings (LLM2Vec, NV-Embed)—and lays out eight practical selection levers (context window, tokenization, dimensionality, vocabulary, training data, cost, MTEB, domain fit) plus stress tests, then prescribes a workflow: shortlist via MTEB, validate on your data, check database compatibility, and plan for long-document chunking.

https://milvus.io/blog/how-to-choose-the-right-embedding-model-for-rag.md

Shopify: Beyond classification: How AI agents are evolving Shopify’s product taxonomy at scale

E-commerce taxonomies drift as products, attributes, and naming conventions evolve faster than manual curation can keep up, degrading search, filters, and classifier accuracy at scale. Shopify writes about deploying a multi-agent system that jointly analyzes taxonomy structure and real merchant data, detects equivalences (e.g., category = broader category + attribute filters), and routes proposals through domain-specific AI judges for automated QA before human review.

https://shopify.engineering/product-taxonomy-at-scale

Helpshift: How Data Powers Agent Productivity

Helpshift writes about support agents' productivity by building custom engagement, idle, and anomaly metrics based on real browser interaction events. The blog is a classic example of how instrumentation and measurements go hand in hand. By capturing focus, blur, and ticket activity signals, maintaining state across windows, and computing derived metrics via Spark, the team turned fragmented behavioral data into actionable insights that now power staffing, performance, and operational decisions for customers.

https://medium.com/helpshift-engineering/how-data-powers-agent-productivity-f310f414872d

Agoda: Why We Bet on Rust to Supercharge Feature Store at Agoda

Rust is making a prominent impact on data infrastructure. Agoda writes about rebuilding its feature-serving service in Rust, leveraging the compiler’s safety guarantees, async efficiency, and AI-assisted development to overcome a steep learning curve. The result: 5× higher traffic capacity with only 40% of prior CPU use, 84% lower compute costs, and stable sub-10 ms latencies—proving Rust’s reliability and performance gains far outweigh migration complexity for critical, low-latency systems.

https://medium.com/agoda-engineering/why-we-bet-on-rust-to-supercharge-feature-store-at-agoda-ed4a70d2efb7

All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.

Data Engineering Weekly

Discussion about this post

Ready for more?