<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Engineering Weekly]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com</link><image><url>https://substackcdn.com/image/fetch/$s_!AdQk!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png</url><title>Data Engineering Weekly</title><link>https://www.dataengineeringweekly.com</link></image><generator>Substack</generator><lastBuildDate>Fri, 12 Jun 2026 08:50:04 GMT</lastBuildDate><atom:link href="https://www.dataengineeringweekly.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ananth Packkildurai]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dataengineeringweekly@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dataengineeringweekly@substack.com]]></itunes:email><itunes:name><![CDATA[Ananth Packkildurai]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ananth Packkildurai]]></itunes:author><googleplay:owner><![CDATA[dataengineeringweekly@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dataengineeringweekly@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ananth Packkildurai]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Data Engineering Weekly #273]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-273</link><guid 
isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-273</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 08 Jun 2026 02:44:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=06_07_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Volg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Volg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Volg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Volg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Volg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19448,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=06_07_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Volg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Volg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Volg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Volg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a52045b-d0b2-45c7-acd1-9a4f6e56eee3_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How to Build a Data Platform</h1><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data 
teams, operating under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=06_07_26_data_engineering_weekly">Download your free copy now</a></strong></p><div><hr></div><h1>Fei-Fei Li: A Functional Taxonomy of World Models</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OzEc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OzEc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 424w, https://substackcdn.com/image/fetch/$s_!OzEc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 848w, https://substackcdn.com/image/fetch/$s_!OzEc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!OzEc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OzEc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic" width="1374" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1374,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8672,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OzEc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 424w, https://substackcdn.com/image/fetch/$s_!OzEc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 848w, 
https://substackcdn.com/image/fetch/$s_!OzEc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 1272w, https://substackcdn.com/image/fetch/$s_!OzEc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdcaa3f-563c-4883-8601-7976b150d04b_1374x670.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>&#8220;World model&#8221; is one of the most overloaded terms today. 
The author breaks world models down into three functions:</p><ol><li><p>Renderer: A renderer outputs observations in the form of pixels meant for human eyes, and the quality that matters most is visual fidelity.</p></li><li><p>Simulator: A simulator outputs a state: a geometrically, physically, or dynamically faithful representation of the world that humans and computer programs can both compute on and interact with.</p></li><li><p>Planner: A planner outputs actions.</p></li></ol><p>From a data warehouse perspective, the metric tree is an attempt to build a simpler version of a business simulator. We need to think about extending that simulation to the entire business as it emerges.</p><p><strong><a href="https://drfeifei.substack.com/p/a-functional-taxonomy-of-world-models">https://drfeifei.substack.com/p/a-functional-taxonomy-of-world-models</a></strong></p><div><hr></div><h1>Gorgias: When Event Time Meets Reality: Lessons from Building Billing on Apache Flink</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8GYh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8GYh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 424w, https://substackcdn.com/image/fetch/$s_!8GYh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 848w, 
https://substackcdn.com/image/fetch/$s_!8GYh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 1272w, https://substackcdn.com/image/fetch/$s_!8GYh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8GYh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic" width="1400" height="241" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:241,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8GYh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 424w, 
https://substackcdn.com/image/fetch/$s_!8GYh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 848w, https://substackcdn.com/image/fetch/$s_!8GYh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 1272w, https://substackcdn.com/image/fetch/$s_!8GYh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c1cbf32-0bf3-4bae-8d1d-108fd6aa43dd_1400x241.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In real-time pipelines, we often assume that the watermark configuration alone preserves ordering, but reprocessing historical data with keyBy operators breaks source-level alignment downstream. The author aligns deduplication and consolidation on the same customer-user key, removes intermediate repartitioning, and extends consolidation by 1 day when event lag exceeds 15 minutes. Built on lag-aware timer registration with explicit cleanup, the combined fix cuts overlapping consolidation windows by 10x without delaying real-time billing.</p><p><strong><a href="https://medium.com/gorgias-engineering/when-event-time-meets-reality-lessons-from-building-billing-on-apache-flink-581ff895c60d">https://medium.com/gorgias-engineering/when-event-time-meets-reality-lessons-from-building-billing-on-apache-flink-581ff895c60d</a></strong></p><div><hr></div><h1>Cloudflare: Our billing pipeline was suddenly slow. 
The culprit was a hidden bottleneck in ClickHouse.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-eJt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-eJt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 424w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 848w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 1272w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-eJt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic" width="1456" height="1444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1444,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30344,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-eJt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 424w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 848w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 1272w, https://substackcdn.com/image/fetch/$s_!-eJt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f191275-51bd-44e2-b9d9-2228e1d23214_1999x1983.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Cloudflare writes that multi-tenant ClickHouse partitioning hides query-planner coupling at scale: the part-list mutex and vector copy serialize concurrent queries, even when each query reads only a few parts. 
Cloudflare patches ClickHouse&#8217;s MergeTreeData hotspot with three upstream fixes &#8212; replacing the exclusive lock with std::shared_lock, deferring the parts-vector copy, and binary-searching the sorted namespace prefix.</p><p><strong><a href="https://blog.cloudflare.com/clickhouse-query-plan-contention/">https://blog.cloudflare.com/clickhouse-query-plan-contention/</a></strong></p><div><hr></div><h1>Sponsored: AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=06_07_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Y83!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!6Y83!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!6Y83!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!6Y83!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6Y83!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=06_07_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6Y83!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!6Y83!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!6Y83!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!6Y83!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe18c399-757f-4169-8e0c-3e0a2b7d8fad_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=06_07_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Helpshift: Migrating from a Monolithic Orchestrator to Apache Airflow</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5cKR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5cKR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 424w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 848w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 1272w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5cKR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic" width="1456" height="728" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15960,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5cKR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 424w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 848w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 1272w, https://substackcdn.com/image/fetch/$s_!5cKR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F114a8c61-1c70-4879-9d22-78e26fc9e34d_1774x887.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Helpshift describes migrating its monolithic, Clojure-based scheduler to Apache Airflow, covering the journey and its impact on reliability, availability, performance, data quality, and developer productivity.
</p><p><strong><a href="https://medium.com/helpshift-engineering/migrating-from-a-monolithic-orchestrator-to-apache-airflow-30fde94bcdc0">https://medium.com/helpshift-engineering/migrating-from-a-monolithic-orchestrator-to-apache-airflow-30fde94bcdc0</a></strong></p><div><hr></div><h1>Indeed: Distilling Long-Tail User Behavior into Scalable Embeddings for Job Search</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fKnq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fKnq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 424w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 848w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 1272w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fKnq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic" width="1456" height="617" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17235,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fKnq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 424w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 848w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 1272w, https://substackcdn.com/image/fetch/$s_!fKnq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe14d7007-fa81-4f32-ad58-2cf1a9fa1a8b_1927x816.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p><em><strong>We have long relied on distillation to compress complexity. But while humans reason within finite cognitive dimensions, foundation models now navigate data at an order-of-magnitude greater scale. We are moving from a paradigm of static, structural data modeling to a new era where the embedding model itself has become our primary, high-dimensional data model. 
- Ananth</strong></em></p></blockquote><p>This case study perfectly illustrates the thesis: Indeed&#8217;s shift from traditional feature engineering to a unified, transformer-based UBM architecture demonstrates the evolution from static, structural data design to the embedding model as the primary, high-dimensional data model for reasoning.</p><p><strong><a href="https://engineering.indeedblog.com/blog/2026/06/distilling-long-tail-user-behavior-into-scalable-embeddings-for-job-search/">https://engineering.indeedblog.com/blog/2026/06/distilling-long-tail-user-behavior-into-scalable-embeddings-for-job-search/</a></strong></p><div><hr></div><h1>Netflix: Dynamic Repartitioning for Time Series Workloads</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-CFI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-CFI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 424w, https://substackcdn.com/image/fetch/$s_!-CFI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 848w, https://substackcdn.com/image/fetch/$s_!-CFI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!-CFI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-CFI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic" width="1400" height="1109" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1109,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23394,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-CFI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 424w, https://substackcdn.com/image/fetch/$s_!-CFI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 848w, 
https://substackcdn.com/image/fetch/$s_!-CFI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 1272w, https://substackcdn.com/image/fetch/$s_!-CFI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce578fc6-9ba8-4670-adc9-433ebad861d9_1400x1109.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Netflix discusses how static Cassandra partitioning breaks down under evolving time-series workloads when a small percentage of 
IDs accumulate disproportionate event volumes, driving wide partitions that cause tail-latency timeouts. Netflix layers two runtime adjustments onto its TimeSeries Abstraction: a background worker retuning future Time Slice partitions, and per-ID dynamic splitting that routes reads via Bloom filters.</p><p><strong><a href="https://netflixtechblog.com/dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads-0eded064f456">https://netflixtechblog.com/dynamically-splitting-wide-partitions-in-cassandra-for-time-series-workloads-0eded064f456</a></strong></p><div><hr></div><h1>Jack Vanlightly: Broker-Visible vs Client-Local Parallelism</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oHxw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oHxw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 424w, https://substackcdn.com/image/fetch/$s_!oHxw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 848w, https://substackcdn.com/image/fetch/$s_!oHxw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 1272w, https://substackcdn.com/image/fetch/$s_!oHxw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oHxw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic" width="736" height="343" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:343,&quot;width&quot;:736,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oHxw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 424w, https://substackcdn.com/image/fetch/$s_!oHxw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 848w, https://substackcdn.com/image/fetch/$s_!oHxw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!oHxw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecff15a2-66fb-47c7-acd6-b4cef7ee7c7b_736x343.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Messaging system parallelism has a hidden cost &#8212; where the unit of parallelism lives determines whether scaling stays cheap or explodes into broker-managed TCP connections and state. 
The author compares broker-visible parallelism with client-local parallelism using virtual threads, showing that the latter cuts the consumer count for a 60K msg/sec workload from 60,000 to 60. Client-local parallelism shifts coordination cost off the broker but adds client complexity, so share groups need a parallel-processing library layer that builds on per-message acknowledgment semantics.</p><p><strong><a href="https://jack-vanlightly.com/blog/2026/6/3/broker-visible-vs-client-local-parallelism">https://jack-vanlightly.com/blog/2026/6/3/broker-visible-vs-client-local-parallelism</a></strong></p><div><hr></div><h1>Criteo: Introducing CLEPR, our model for semantic understanding</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xDXM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xDXM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 424w, https://substackcdn.com/image/fetch/$s_!xDXM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 848w, https://substackcdn.com/image/fetch/$s_!xDXM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!xDXM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xDXM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic" width="1400" height="468" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:468,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11942,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/201079948?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xDXM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 424w, https://substackcdn.com/image/fetch/$s_!xDXM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 848w, 
https://substackcdn.com/image/fetch/$s_!xDXM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 1272w, https://substackcdn.com/image/fetch/$s_!xDXM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F538f5bd6-56fd-43d5-a8d5-2f287efe6566_1400x468.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Criteo writes about building CLEPR, a two-tower contrastive embedding model with separate keyword and product encoders &#8212; 
training on session-stitched pseudo-clicks and deduplicating repeated pairs to strip popularity bias. CLEPR scores 10 billion keyword-product pairs daily as a semantic guardrail before performance ranking, lifting CTR by 6%, anchored to embeddings that generalize to human-judged relevance.</p><p><strong><a href="https://medium.com/criteo-engineering/introducing-clepr-our-model-for-semantic-understanding-d3984eed84c8">https://medium.com/criteo-engineering/introducing-clepr-our-model-for-semantic-understanding-d3984eed84c8</a></strong></p><div><hr></div><h1>WMG: Why we shrank our TimescaleDB chunks from 30 days to 7</h1><p>Storage configuration choices made when a database is small calcify silently as ingest grows, turning yesterday&#8217;s reasonable defaults into compression failures and backfill storms. WMG Lab shrinks TimescaleDB chunk intervals from 30 days to 7 across its hot hypertables &#8212; using set_chunk_time_interval to retune future chunks only, without rewrites or locks. The smaller chunks let compression finish cleanly and reduce backfill to a single 7-day decompression instead of a month, anchored to the 25% active-chunk memory rule.</p><p><strong><a href="https://tech.wmg.com/why-we-shrank-our-timescaledb-chunks-from-30-days-to-7-07cab8afefc5">https://tech.wmg.com/why-we-shrank-our-timescaledb-chunks-from-30-days-to-7-07cab8afefc5</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #272]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-272</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-272</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 01 Jun 2026 03:18:34 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_31_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D0DX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!D0DX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!D0DX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!D0DX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D0DX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19038,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_31_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!D0DX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!D0DX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!D0DX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!D0DX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ed9039-4c5e-4591-a213-9a7f0847cba8_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>How to Build a Data Platform</h1><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_31_26_data_engineering_weekly">Download your free copy now</a></strong></p><div><hr></div><h1>Netflix: High-Throughput Graph Abstraction at Netflix</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6Yav!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6Yav!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 424w, 
https://substackcdn.com/image/fetch/$s_!6Yav!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 848w, https://substackcdn.com/image/fetch/$s_!6Yav!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 1272w, https://substackcdn.com/image/fetch/$s_!6Yav!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6Yav!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic" width="1400" height="1016" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1016,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37305,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6Yav!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 424w, https://substackcdn.com/image/fetch/$s_!6Yav!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 848w, https://substackcdn.com/image/fetch/$s_!6Yav!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 1272w, https://substackcdn.com/image/fetch/$s_!6Yav!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff1a8d83-8077-4a71-a2fb-c567c0016a8a_1400x1016.heic 1456w" sizes="100vw"></picture>
</div></a></figure></div><p>One emerging approach exposes the organization&#8217;s data model as a property graph abstraction, even though the underlying data may be stored in a relational database, key/value store, or document store. GraphQL is one such abstraction, but it introduces many complications when integrating the API. Netflix writes about its high-throughput graph abstraction and explains how it handles read-aside and write-aside caching.</p><p><strong><a href="https://netflixtechblog.com/high-throughput-graph-abstraction-at-netflix-part-i-e88063e6f6d5">https://netflixtechblog.com/high-throughput-graph-abstraction-at-netflix-part-i-e88063e6f6d5</a></strong></p><div><hr></div><h1>Slack: Slack AI - The Path to Multi-Cloud</h1><p>Building AI infrastructure for a multi-tenant stack with enterprise security requirements is challenging in itself. Slack writes about evolving from SageMaker-managed endpoints to Bedrock provisioned throughput, Bedrock on-demand spillover, and a multi-cloud Vertex AI architecture with normalized APIs, model hierarchies, circuit breakers, health-aware routing, and feature-level model selection. 
The platform focuses on reducing infrastructure lock-in, enabling same-day model migration, improving the quality of complex reasoning by about 10%, cutting latency for low-token workloads by about 67%, and increasing resilience against regional and provider-wide failures.</p><p><strong><a href="https://slack.engineering/slack-ai-the-path-to-multi-cloud/">https://slack.engineering/slack-ai-the-path-to-multi-cloud/</a></strong></p><div><hr></div><h1>Sponsored: Agents for Data Engineering</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mw3X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!Mw3X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!Mw3X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 1272w, https://substackcdn.com/image/fetch/$s_!Mw3X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Mw3X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic" width="1100" height="578" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14322,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mw3X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!Mw3X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!Mw3X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Mw3X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2572a6a7-84f6-4cde-9211-262e80c5e8c7_1100x578.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI agents are transforming data engineering &#8212; but they need the right tools to do it reliably. <br><br>Altimate Code is an open-source project that gives any agent 100+ deterministic tools for SQL, lineage, dbt, and warehouse connectivity, with a proven #1 ranking on ADE-Bench. One install. Tech-stack agnostic. No hallucinations. 
Production-ready from day one.</p><p><strong><a href="https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship">Try it out today &gt;</a></strong></p><div><hr></div><h1>Uber: Solving the Identity Crisis for AI Agents</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ecHS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ecHS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 424w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 848w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 1272w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ecHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic" width="1456" height="1463" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1463,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30787,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ecHS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 424w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 848w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 1272w, https://substackcdn.com/image/fetch/$s_!ecHS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F312d8ea1-aba7-4ffb-9da8-acf96ad6c63e_3015x3030.heic 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The article feels timely, as I recently finished a similar design for AI identity. The conventional starting point is to assign the user&#8217;s role to the agent so that it can act on the user&#8217;s behalf, but that model soon fails. The service-centric IAM role also falls apart, since the same service can potentially assume multiple roles. Uber provides an in-depth account of how it handles these challenges.
</p><p><strong><a href="https://www.uber.com/us/en/blog/solving-the-agent-identity-crisis/">https://www.uber.com/us/en/blog/solving-the-agent-identity-crisis/</a></strong></p><div><hr></div><h1>Mehul Batra: A field journal on Ray Data and Daft for multimodal data lake</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7VHH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7VHH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 424w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 848w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 1272w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7VHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic" width="1400" height="788" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7VHH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 424w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 848w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 1272w, https://substackcdn.com/image/fetch/$s_!7VHH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19f38f4c-ad51-4575-982e-47d53cdc270d_1400x788.heic 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Multimodal data lake pipelines on Kubernetes require engines that can handle mixed CPU, GPU, and async inference workloads, along with cataloging and checkpointing, across text, PDF, image, audio, video, and LLM metadata. The author compares Ray Data 2.55.1 and Daft 0.7.13 on the same KubeRay, DigitalOcean, Gravitino, Iceberg, Lance, and H100 setup, measuring wall time, GPU occupancy, memory, code size, native primitives, concurrency behavior, and completion reliability across eight use cases. 
Ray Data scored 56 out of 70 versus Daft&#8217;s 47: the two engines tied on most multimodal batch workloads and Daft delivered stronger native media ergonomics, but Ray Data won the decisive async LLM inference workload by completing the 50k-email job with enforced concurrency, while Daft failed to finish.</p><p><strong><a href="https://mehulbatra.medium.com/a-field-journal-on-ray-data-and-daft-for-multimodal-data-lake-e0c26839f5b5">https://mehulbatra.medium.com/a-field-journal-on-ray-data-and-daft-for-multimodal-data-lake-e0c26839f5b5</a></strong></p><div><hr></div><h1>Sponsored: AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_31_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qsl0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!qsl0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!qsl0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!qsl0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!qsl0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_31_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qsl0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!qsl0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!qsl0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!qsl0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13894556-eedd-415c-ae67-ffed4e67f8a0_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_31_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>LakeOps: Routing Multiple Query Engines with Iceberg</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CEF6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CEF6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 424w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 848w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 1272w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!CEF6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28488,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CEF6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 424w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 848w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 1272w, https://substackcdn.com/image/fetch/$s_!CEF6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F406d7963-ce24-4723-ac35-b5c96327b84f_1672x941.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Multi-engine Iceberg deployments let Spark, Trino, Flink, DuckDB, Athena, Snowflake, and StarRocks share the same tables. Still, they leave query placement, cost control, dialect compatibility, and workload isolation to each client team. LakeOps writes about QueryFlux, a Rust-based SQL routing proxy that supports existing database protocols, selects engine groups via routing rules, translates SQL dialects with sqlglot, enforces concurrency limits, and dispatches queries based on cost, latency, throughput, or health. 
</p><p><strong><a href="https://lakeops.dev/blog/routing-multiple-query-engines-with-iceberg">https://lakeops.dev/blog/routing-multiple-query-engines-with-iceberg</a></strong></p><div><hr></div><h1>Rion Williams: Prepare for Launch: Enrichment Strategies for Apache Flink</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wcjj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wcjj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 424w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 848w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 1272w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wcjj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic" width="1456" height="805" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:805,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wcjj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 424w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 848w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 1272w, https://substackcdn.com/image/fetch/$s_!Wcjj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F154d6183-f70a-429a-8ab0-f085c93f75af_1958x1082.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Flink enrichment jobs become fragile when they depend on referential data that is not already present in state, forcing teams to choose between slow per-record lookups, incomplete early results, or complex state preloading. The article compares external enrichment, CDC-backed gradual enrichment, State Processor API&#8211;based two-phase bootstrapping, and gated enrichment as ways to manage the launch-time pain of warming the state before live traffic depends on it. 
</p><p><strong><a href="https://rion.io/2026/01/27/prepare-for-launch-enrichment-strategies-for-apache-flink/">https://rion.io/2026/01/27/prepare-for-launch-enrichment-strategies-for-apache-flink/</a></strong></p><div><hr></div><h1>Agoda: How Agoda Simulates Booking Flows to Test Flight Integrations</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XMWJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XMWJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 424w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 848w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 1272w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XMWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic" width="1400" height="611" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:611,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XMWJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 424w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 848w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 1272w, https://substackcdn.com/image/fetch/$s_!XMWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9579918c-00a9-4ed4-b042-e669a775f844_1400x611.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Agents perform well when they have a fast feedback loop to tune against, but in many business cases the transactions are too sensitive to experiment on directly. Simulation is therefore a critical engineering capability, not only for testing but also as a reinforcement learning (RL) environment for agents, and digital twins are emerging as a key building block of simulation. Agoda describes a system that simulates booking flows to test flight integrations. 
</p><p><strong><a href="https://medium.com/agoda-engineering/how-agoda-simulates-booking-flows-to-test-flight-integrations-204ec4f2e128">https://medium.com/agoda-engineering/how-agoda-simulates-booking-flows-to-test-flight-integrations-204ec4f2e128</a></strong></p><div><hr></div><h1>Databricks: Introducing Arrow UDFs in PySpark: A Faster, Leaner Replacement for Pandas UDFs</h1><p>PySpark Pandas UDFs improved Python execution with Arrow serialization and batching, but still incur Pandas/Arrow conversion costs, add memory copies, and struggle with complex data types. Databricks writes about the introduction of native Arrow UDFs, Arrow aggregate functions, Arrow UDTFs, and DataFrame-level <code>mapInArrow</code> and <code>applyInArrow</code> APIs that operate directly on PyArrow arrays, record batches, and tables. Arrow UDFs preserve columnar execution end-to-end, run about 10% faster than Pandas UDFs, use about 40% less memory, and support complex datatype and table-in/table-out transformations with familiar Python and SQL interfaces.</p><p><strong><a href="https://www.databricks.com/blog/introducing-arrow-udfs-pyspark-faster-leaner-replacement-pandas-udfs">https://www.databricks.com/blog/introducing-arrow-udfs-pyspark-faster-leaner-replacement-pandas-udfs</a></strong></p><div><hr></div><h1>Giannis Polyzos: When Tables Became the Language of Time</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5IZz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5IZz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 424w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 848w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 1272w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5IZz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic" width="1230" height="597" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:597,&quot;width&quot;:1230,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/200062203?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5IZz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 424w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 848w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 1272w, https://substackcdn.com/image/fetch/$s_!5IZz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ed493d4-7161-4c60-95dd-5c98c6e66332_1230x597.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Batch and streaming architectures create duplicated logic, fragmented corrections, and inconsistent state when systems treat logs as the source of truth instead of modeling how data changes over time. The author reframes streams and tables as two views of the same history, using Flink for streaming-first compute, lakehouse commits for durable memory, and Apache Fluss as table-first streaming storage with native updates, deletes, schemas, snapshots, and columnar access. Table-centric streaming collapses batch jobs, replays, late corrections, and real-time dashboards into different temporal views of evolving state, making unification an architectural consequence rather than a glue layer.</p><p><strong><a href="https://ipolyzos.substack.com/p/when-tables-became-the-language-of">https://ipolyzos.substack.com/p/when-tables-became-the-language-of</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #271]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-271</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-271</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 25 May 2026 16:05:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_24_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TS4P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!TS4P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!TS4P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!TS4P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TS4P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20296,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_24_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!TS4P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!TS4P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!TS4P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!TS4P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fecfaca37-0a9c-4e5c-a72c-27f244fac0f3_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>How to Build a Data Platform</h1><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_24_26_data_engineering_weekly">Download your free copy now</a></strong></p><div><hr></div><h1>Netflix: The Evolution of Cassandra Data Movement at Netflix</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dC_s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dC_s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 424w, 
https://substackcdn.com/image/fetch/$s_!dC_s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 848w, https://substackcdn.com/image/fetch/$s_!dC_s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 1272w, https://substackcdn.com/image/fetch/$s_!dC_s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dC_s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic" width="1400" height="514" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:514,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22132,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!dC_s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 424w, https://substackcdn.com/image/fetch/$s_!dC_s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 848w, https://substackcdn.com/image/fetch/$s_!dC_s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 1272w, https://substackcdn.com/image/fetch/$s_!dC_s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d9df320-c115-4e1e-bf5c-2871799b3de6_1400x514.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Change Data Capture (CDC) from an operational store is often expensive, involving multiple staging hops and a costly merge operation in Iceberg. Netflix shares one such case study on Cassandra, covering the challenges of capturing operational data into Iceberg tables and its layered approach to avoiding partition skew.</p><p><strong><a href="https://netflixtechblog.medium.com/the-evolution-of-cassandra-data-movement-at-netflix-6e13329c80a1">https://netflixtechblog.medium.com/the-evolution-of-cassandra-data-movement-at-netflix-6e13329c80a1</a></strong></p><div><hr></div><h1>Grab: The Hugo evolution: Engineering Grab&#8217;s unified, one-click data ingestion platform with Apache Flink</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jdHB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jdHB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 424w, https://substackcdn.com/image/fetch/$s_!jdHB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 848w, 
https://substackcdn.com/image/fetch/$s_!jdHB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 1272w, https://substackcdn.com/image/fetch/$s_!jdHB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jdHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24919,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jdHB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 424w, 
https://substackcdn.com/image/fetch/$s_!jdHB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 848w, https://substackcdn.com/image/fetch/$s_!jdHB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 1272w, https://substackcdn.com/image/fetch/$s_!jdHB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F926e0fc8-a051-4280-b8b8-9168553e606d_1672x941.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Grab describes a similar CDC challenge: data ingestion was fragmented across multiple operational data stores, with schema management and ingestion issues. The unified Flink-based pipeline auto-detects schema changes and ingests the data into Hive tables.</p><p><strong><a href="https://engineering.grab.com/one-click-data-ingestion-platform-with-apache-flink">https://engineering.grab.com/one-click-data-ingestion-platform-with-apache-flink</a></strong></p><div><hr></div><h1>Sponsored: Agents for Data Engineering</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Qsp-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!Qsp-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!Qsp-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 1272w, https://substackcdn.com/image/fetch/$s_!Qsp-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Qsp-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic" width="1100" height="578" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15293,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Qsp-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!Qsp-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!Qsp-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Qsp-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F948b323a-7bb5-4b08-86da-cf2143550a58_1100x578.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI agents are transforming data engineering &#8212; but they need the right tools to do it reliably. <br><br>Altimate Code is an open-source project that gives any agent 100+ deterministic tools for SQL, lineage, dbt, and warehouse connectivity, with a proven #1 ranking on ADE-Bench. One install. Tech-stack agnostic. No hallucinations. 
Production-ready from day one.</p><p><strong><a href="https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship">Try it out today &gt;</a></strong></p><div><hr></div><h1>Meta: A Blueprint for Valuing Content When A/B Tests Are Not an Option</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XiTe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XiTe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 424w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 848w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 1272w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XiTe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic" width="1400" height="362" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16828,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!XiTe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 424w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 848w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 1272w, https://substackcdn.com/image/fetch/$s_!XiTe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6750f43b-f271-4cc4-ae75-a4ae417c89be_1400x362.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Content is a primary driver of the Quest ecosystem. With the recent announcement at Google I/O about seamless shopping integration with content, it is evident that content-driven commerce has reached the mainstream. How do you value content when A/B testing is not an option? Meta writes about implementing the <strong><a href="https://docs.doubleml.org/stable/index.html">DoubleML method</a></strong> to tackle the challenge. 
</p><p><strong><a href="https://medium.com/@AnalyticsAtMeta/meta-a-blueprint-for-valuing-content-when-a-b-tests-are-not-an-option-7880bac721f1">https://medium.com/@AnalyticsAtMeta/meta-a-blueprint-for-valuing-content-when-a-b-tests-are-not-an-option-7880bac721f1</a></strong></p><div><hr></div><h1>Uber: Scaling Real-Time Traffic Forecasting with a Graph-Aware Transformer</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5i_Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5i_Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 424w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 848w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 1272w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5i_Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic" width="1412" height="1300" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1300,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5i_Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 424w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 848w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 1272w, https://substackcdn.com/image/fetch/$s_!5i_Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd48af68c-1524-4049-8e51-f9b509129460_1412x1300.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Uber writes about rebuilding the traffic forecasting stack, DeepETT, a real-time traffic forecasting system. 
DeepETT approaches forecasting as a fixed-input graph-aware transformer that combines pre-aggregated segment, road-graph, regional, historical, real-time, and event features with continuous Flink-based calibration.</p><p><strong><a href="https://www.uber.com/us/en/blog/scaling-real-time-traffic/">https://www.uber.com/us/en/blog/scaling-real-time-traffic/</a></strong></p><div><hr></div><h1>Sponsored: Free Course: AI-Driven Data Engineering</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_24_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T4Tz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!T4Tz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!T4Tz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!T4Tz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!T4Tz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8174,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_24_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T4Tz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!T4Tz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 848w, 
https://substackcdn.com/image/fetch/$s_!T4Tz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!T4Tz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7de3f0e9-e092-4a8b-9446-1acba1066825_1200x630.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI coding agents are changing how data engineers work. 
This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_24_26_data_engineering_weekly">Get started now</a></strong></p><div><hr></div><h1>Airbnb: Scaling Airbnb&#8217;s identity graph with a unified knowledge graph infrastructure</h1><blockquote><p>Counting and Finding Unique Users are the two hard problems in Data Engineering. </p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!a7Te!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!a7Te!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 424w, https://substackcdn.com/image/fetch/$s_!a7Te!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 848w, https://substackcdn.com/image/fetch/$s_!a7Te!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!a7Te!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!a7Te!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic" width="1036" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1036,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11664,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!a7Te!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 424w, https://substackcdn.com/image/fetch/$s_!a7Te!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 848w, 
https://substackcdn.com/image/fetch/$s_!a7Te!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 1272w, https://substackcdn.com/image/fetch/$s_!a7Te!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F677f80d4-2e92-4db2-86f4-3bf080c45716_1036x788.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>One of the long-standing questions in data engineering is: since many real-world systems are fundamentally about connections, why 
can&#8217;t we model them using the graph data model? Airbnb explains the causes of its graph scalability issues and its adoption of JanusGraph, using DynamoDB as the backend.</p><p><strong><a href="https://medium.com/airbnb-engineering/scaling-airbnbs-identity-graph-with-a-unified-knowledge-graph-infrastructure-ebac467b7836">https://medium.com/airbnb-engineering/scaling-airbnbs-identity-graph-with-a-unified-knowledge-graph-infrastructure-ebac467b7836</a></strong></p><div><hr></div><h1>Pinterest: Making User-Sequence Data More Cost-Efficient, Faster, and Easier to Use</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jW2V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jW2V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 424w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 848w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 1272w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jW2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic" width="1400" height="947" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:947,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11568,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jW2V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 424w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 848w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 1272w, https://substackcdn.com/image/fetch/$s_!jW2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcd1d72fb-a6f8-40c4-af8d-4ee88f724120_1400x947.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A user&#8217;s sequence of actions (the user journey) is one of the most important signals for analyzing user behavior. Pinterest published a comprehensive case study on treating user-sequence data as a product, along with the architectural patterns behind it. 
</p><p><strong><a href="https://medium.com/pinterest-engineering/making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use-2a56a928cae1">https://medium.com/pinterest-engineering/making-user-sequence-data-more-cost-efficient-faster-and-easier-to-use-2a56a928cae1</a></strong></p><div><hr></div><h1>Yelp: How Partition Access Visualizations Reduced our Data Lake S3 Cost by 33%</h1><p>Usage-driven data retention &amp; storage class optimization is a must-have tool for your Lakehouse management, given the growing need to ingest more data. Yelp applies the art and science of table management by collecting usage metrics at the table-partition level to optimize storage. </p><p><strong><a href="https://engineeringblog.yelp.com/2026/05/partition-access-visualizations.html">https://engineeringblog.yelp.com/2026/05/partition-access-visualizations.html</a></strong></p><div><hr></div><h1>LinkedIn: Crosscheck: Benchmarking AI Models in the Real World</h1><p>Static AI benchmarks lose signal as models optimize toward them, collapsing role-, industry-, and task-specific performance into one number that answers no professional&#8217;s actual question. 
LinkedIn writes about Crosscheck, which extends the Bradley-Terry comparison model with time-decay weighting, low-data regularization, and confidence-aware ordinal tiering &#8212; surfacing only differences supported by 95% statistical evidence.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/ai/crosscheck-benchmarking-ai-models-in-the-real-world">https://www.linkedin.com/blog/engineering/ai/crosscheck-benchmarking-ai-models-in-the-real-world</a></strong></p><div><hr></div><h1>Jack Vanlightly: Introducing Dimster, a performance benchmarking tool for Apache Kafka</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GtB5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GtB5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 424w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 848w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 1272w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GtB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic" width="980" height="503" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:503,&quot;width&quot;:980,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13897,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/199198908?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GtB5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 424w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 848w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 1272w, https://substackcdn.com/image/fetch/$s_!GtB5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3bad468e-5eb1-472c-bea5-c842b922d2c9_980x503.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kafka performance benchmarks rarely travel &#8212; results lack the configuration, hardware, and version metadata that another engineer needs to reproduce or trust them. The author builds Dimster, a Kafka benchmarking tool centered on dimensional testing &#8212; sweeping config axes like batch.size or consumer type while emitting self-contained result bundles. 
Dimster runs explore, drain-backlog, and correctness modes on Kubernetes as a portable runtime, making benchmark campaigns reproducible across any cloud or laptop, anchored to traceable result artifacts.</p><p><strong><a href="https://jack-vanlightly.com/blog/2026/5/20/introducing-dimster-a-performance-benchmarking-tool-for-apache-kafka">https://jack-vanlightly.com/blog/2026/5/20/introducing-dimster-a-performance-benchmarking-tool-for-apache-kafka</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #270]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-270</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-270</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 18 May 2026 01:54:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_17_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!fgL1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!fgL1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!fgL1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!fgL1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fgL1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_17_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fgL1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!fgL1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!fgL1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!fgL1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F09b34458-254f-46a9-8f1d-18c281487b36_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>The Data Platform Fundamentals Guide</h1><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. 
<br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_17_26_data_engineering_weekly">Download your free copy now</a></strong></p><div><hr></div><h1>Airbnb: Viaduct 1.0 and the future of Airbnb&#8217;s data mesh</h1><p>Here, the term Data Mesh refers to the platform &amp; services layer: Viaduct aims to be a multi-tenant GraphQL system, unlike the data mesh as we know it in data engineering. However, Viaduct&#8217;s first principle is something data teams should drive towards to bring a more connected experience across business functions. </p><blockquote><p><em>One global schema. 
Independent teams contribute schema and resolvers </em></p></blockquote><p><strong><a href="https://medium.com/airbnb-engineering/viaduct-1-0-and-the-future-of-airbnbs-data-mesh-6bab4ec98b89">https://medium.com/airbnb-engineering/viaduct-1-0-and-the-future-of-airbnbs-data-mesh-6bab4ec98b89</a></strong></p><div><hr></div><h1>Netflix: Data Projects: Managing Data Assets at Netflix Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZ3p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZ3p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 424w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 848w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 1272w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZ3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic" width="1400" 
height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10941,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RZ3p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 424w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 848w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 1272w, https://substackcdn.com/image/fetch/$s_!RZ3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91b80e8a-c191-4467-8a8f-f764babc0552_1400x556.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Netflix writes about the operational challenges of enforcing ACLs at the individual user level. When pipelines run under an On-Behalf-Of permission model, access frequently breaks when employees change teams or leave the organization. To address this, Netflix adopts the concept of Data Projects, replacing user-bound identities with durable, team-owned application identities.</p><p><strong><a href="https://netflixtechblog.medium.com/data-projects-managing-data-assets-at-netflix-scale-7ca25888591e">https://netflixtechblog.medium.com/data-projects-managing-data-assets-at-netflix-scale-7ca25888591e</a></strong></p><div><hr></div><h1>Sponsored: Agents for Data Engineering</h1><p>AI agents are transforming data engineering &#8212; but they need the right tools to do it reliably. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g56H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g56H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!g56H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!g56H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 1272w, https://substackcdn.com/image/fetch/$s_!g56H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g56H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic" width="1100" height="578" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1100,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17363,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g56H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 424w, https://substackcdn.com/image/fetch/$s_!g56H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 848w, https://substackcdn.com/image/fetch/$s_!g56H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 1272w, https://substackcdn.com/image/fetch/$s_!g56H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe44ff102-a3f6-47c8-a255-6a8cfc11453f_1100x578.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Altimate Code is an open-source project that gives any agent 100+ deterministic tools for SQL, lineage, dbt, and warehouse connectivity, with a proven #1 ranking on ADE-Bench. One install. Tech-stack agnostic. No hallucinations. 
Production-ready from day one.</p><p><strong><a href="https://www.altimate.sh/reddit?utm_source=data-engineering-weekly-newsletter-sponsorship&amp;utm_medium=email&amp;utm_campaign=altimate-code-launch-newsletter-sponsorship">Try it out today &gt;</a></strong></p><div><hr></div><h1>Meta: Migrating Data Ingestion Systems at Meta Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GQOL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GQOL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 424w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 848w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 1272w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GQOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic" width="1406" height="550" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:550,&quot;width&quot;:1406,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GQOL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 424w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 848w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 1272w, https://substackcdn.com/image/fetch/$s_!GQOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe69dda67-2474-48e9-bc33-b314f10f4eb6_1406x550.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Meta writes about successfully migrating tens of thousands of mission-critical pipelines at petabyte scale. The blog focuses on how a shadow testing mode ensures the end-to-end robustness of the migration process. The partition-level metadata flags are an interesting approach. 
If a partition was marked as &#8220;bad,&#8221; the system automatically halted new delta landings and forced merges with older, known-good partitions, stopping the bleeding instantly.</p><p><strong><a href="https://engineering.fb.com/2026/05/12/data-infrastructure/migrating-data-ingestion-systems-at-meta-scale/">https://engineering.fb.com/2026/05/12/data-infrastructure/migrating-data-ingestion-systems-at-meta-scale/</a></strong></p><div><hr></div><h1>Databricks: The Convergence of Open Table Formats and Open Catalogs: Catalog Commits is Generally Available</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TJAR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TJAR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 424w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 848w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 1272w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!TJAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic" width="1456" height="617" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TJAR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 424w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 848w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 1272w, https://substackcdn.com/image/fetch/$s_!TJAR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3eff149-daa8-44a8-8376-d1ea44e2f5dc_1920x814.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Data infrastructure has gone through a full cycle: from the Hive Metastore, to S3 as the primary metastore, and back to catalogs. Databricks writes about the convergence of table formats and catalogs, and how Unity Catalog and Delta Lake interplay. 
</p><p><strong><a href="https://www.databricks.com/blog/convergence-open-table-formats-and-open-catalogs-catalog-commits-generally-available">https://www.databricks.com/blog/convergence-open-table-formats-and-open-catalogs-catalog-commits-generally-available</a></strong></p><div><hr></div><h1>Sponsored: Free Course - AI-Driven Data Engineering</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_17_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QKre!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!QKre!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!QKre!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!QKre!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QKre!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_17_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QKre!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!QKre!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!QKre!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!QKre!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb19afd95-b0c6-49d2-aa2a-4e5266774297_1200x630.heic 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI coding agents are changing how data engineers work. 
This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_17_26_data_engineering_weekly">Get started now</a></strong></p><div><hr></div><h1>Eric Sun: A Query Proxy for Analytical and Fast Data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!47jB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!47jB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 424w, https://substackcdn.com/image/fetch/$s_!47jB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 848w, https://substackcdn.com/image/fetch/$s_!47jB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!47jB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!47jB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic" width="799" height="561" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/283d4015-231a-4661-9a83-d06994f48c28_799x561.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:799,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!47jB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 424w, https://substackcdn.com/image/fetch/$s_!47jB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 848w, 
https://substackcdn.com/image/fetch/$s_!47jB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 1272w, https://substackcdn.com/image/fetch/$s_!47jB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F283d4015-231a-4661-9a83-d06994f48c28_799x561.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The author writes about the Query Proxy, which fronts Snowflake, Databricks, StarRocks, and Iceberg via gRPC, with 
service-credential-based RBAC, and returns async query IDs and presigned Parquet URLs to clients. The proxy collapses per-engine clients into a single interface, federates hot Postgres data with warm Iceberg partitions, and tracks query-shape telemetry, all anchored to a stateless gateway model.</p><p><strong><a href="https://eric-sun.medium.com/a-query-proxy-for-371907878996">https://eric-sun.medium.com/a-query-proxy-for-371907878996</a></strong></p><div><hr></div><h1>The Craft: How we rebuilt search ranking at Faire with deep learning</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LpjC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LpjC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 424w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 848w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 1272w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LpjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic" width="1400" height="563" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18490,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LpjC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 424w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 848w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 1272w, https://substackcdn.com/image/fetch/$s_!LpjC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8dd885dd-6725-4221-ab76-3841b16988bc_1400x563.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Faire writes about replacing XGBoost with a Deep and Cross Network trained with session-normalized listwise cross-entropy, and then accelerating new-ranker development by fine-tuning Brand Page models from Product Search. 
The rebuild increases Product Search order volume by 2.14% in North America and 1.54% in Europe while cutting new-surface launch cycles by half through a shared multi-task representation.</p><p><strong><a href="https://craft.faire.com/how-we-rebuilt-search-ranking-at-faire-with-deep-learning-14f080679c83">https://craft.faire.com/how-we-rebuilt-search-ranking-at-faire-with-deep-learning-14f080679c83</a></strong></p><div><hr></div><h1>Alibaba: When MySQL Meets the Columnar Storage Engine DuckDB in the AI Era</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0cdW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0cdW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 424w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 848w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 1272w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0cdW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic" width="1080" height="606" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:606,&quot;width&quot;:1080,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13989,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/198193665?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0cdW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 424w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 848w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 1272w, https://substackcdn.com/image/fetch/$s_!0cdW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd409aff1-b3b0-492b-97e9-bdb0522923aa_1080x606.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Alibaba writes about AliSQL integrating DuckDB as a pluggable storage engine, transforming MySQL into a lightweight Hybrid Transactional and Analytical Processing (HTAP) database. MotherDuck recently released a <strong><a href="https://motherduck.com/blog/duckdb-client-server/">client-server protocol</a></strong> for DuckDB, as DuckDB slowly moves toward being a warehouse-style engine in its own right. 
</p><p><strong><a href="https://www.alibabacloud.com/blog/when-mysql-meets-the-columnar-storage-engine-duckdb-in-the-ai-era_603117?spm=a2c65.11461433.0.0.2e5d5355O7iepw">https://www.alibabacloud.com/blog/when-mysql-meets-the-columnar-storage-engine-duckdb-in-the-ai-era_603117?spm=a2c65.11461433.0.0.2e5d5355O7iepw</a></strong></p><div><hr></div><h1>Marc Bowes: Aurora DSQL: Meet Coupler</h1><p>The author writes about the design of Coupler, a highly scalable CDC solution for Aurora DSQL. Traditionally, CDC is hard on databases and degrades performance. Because DSQL&#8217;s architecture already decouples its read and write paths using a low-latency journal fan-out, Coupler acts as just another subscriber, which keeps it highly performant. </p><p><strong><a href="https://marc-bowes.com/dsql-coupler.html">https://marc-bowes.com/dsql-coupler.html</a></strong></p><div><hr></div><h1>Pete: Full-Text Search with DuckDB</h1><p>The author writes about DuckDB&#8217;s Full-Text Search (FTS) extension for data workflows, such as exploratory text mining and document archive analysis. The development is worth watching, as I regularly run into use cases that need both free-text analytics and typical exploratory analysis. Apache Pinot &amp; ParadeDB are the two systems I know of that try to handle these cases. </p><p><strong><a href="https://peterdohertys.website/blog-posts/full-text-search-w-duckdb.html">https://peterdohertys.website/blog-posts/full-text-search-w-duckdb.html</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #269]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-269</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-269</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 11 May 2026 02:50:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_10_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ObXU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!ObXU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!ObXU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!ObXU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ObXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32846,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_10_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ObXU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!ObXU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!ObXU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!ObXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F92fb4558-4875-4979-bca7-f2ce58e233e7_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>The Data Platform Fundamentals Guide</h1><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=05_10_26_data_engineering_weekly">Download your free copy now</a></strong></p><div><hr></div><h1>Meta: How We Built an AI Second Brain for 60K Knowledge Workers</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HLYm!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HLYm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 424w, 
https://substackcdn.com/image/fetch/$s_!HLYm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 848w, https://substackcdn.com/image/fetch/$s_!HLYm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 1272w, https://substackcdn.com/image/fetch/$s_!HLYm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HLYm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic" width="1126" height="892" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/faf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1126,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15160,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!HLYm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 424w, https://substackcdn.com/image/fetch/$s_!HLYm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 848w, https://substackcdn.com/image/fetch/$s_!HLYm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 1272w, https://substackcdn.com/image/fetch/$s_!HLYm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaf7c910-8ba2-4877-9001-576968c9c658_1126x892.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p> Meta writes about building an AI Second Brain that combines a PARA-based workspace model, authenticated infrastructure access via MCPs and CLIs, agentic execution loops, and reusable markdown-defined skills that continuously organize, retrieve, and act on structured work context across tools and projects. The article is a classic case of data engineering evolving into context engineering, making a broader impact in the industry. </p><p><strong><a href="https://medium.com/@AnalyticsAtMeta/how-we-built-an-ai-second-brain-for-60k-knowledge-workers-78c507dd795b">https://medium.com/@AnalyticsAtMeta/how-we-built-an-ai-second-brain-for-60k-knowledge-workers-78c507dd795b</a></strong></p><div><hr></div><h1>Salesforce: How Informatica Built a Multi-Agent AI System to Reduce Data Workflows from Months to Days</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ijjK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ijjK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 424w, 
https://substackcdn.com/image/fetch/$s_!ijjK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 848w, https://substackcdn.com/image/fetch/$s_!ijjK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 1272w, https://substackcdn.com/image/fetch/$s_!ijjK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ijjK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic" width="1456" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15075,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ijjK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 424w, https://substackcdn.com/image/fetch/$s_!ijjK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 848w, https://substackcdn.com/image/fetch/$s_!ijjK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 1272w, https://substackcdn.com/image/fetch/$s_!ijjK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe6c2ded-329b-4c95-b834-013e832f0980_2048x933.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Enterprise data workflows break down when discovery, governance, data quality, and orchestration span disconnected systems, requiring repeated human coordination across tools, teams, and execution stages. Informatica writes about the CLAIRE platform, which addresses fragmentation through a multi-agent architecture that combines orchestration agents, semantic context modeling, deterministic tool routing, and specialized execution agents to coordinate workflows involving 50&#8211;60 model calls with validation checkpoints and adaptive planning. </p><p><strong><a href="https://engineering.salesforce.com/how-informatica-built-a-multi-agent-ai-system-to-reduce-data-workflows-from-months-to-days/">https://engineering.salesforce.com/how-informatica-built-a-multi-agent-ai-system-to-reduce-data-workflows-from-months-to-days/</a></strong></p><div><hr></div><h1>Netflix: Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qvTO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qvTO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 424w, 
https://substackcdn.com/image/fetch/$s_!qvTO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 848w, https://substackcdn.com/image/fetch/$s_!qvTO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 1272w, https://substackcdn.com/image/fetch/$s_!qvTO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qvTO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic" width="682" height="496" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:496,&quot;width&quot;:682,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!qvTO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 424w, https://substackcdn.com/image/fetch/$s_!qvTO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 848w, https://substackcdn.com/image/fetch/$s_!qvTO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 1272w, https://substackcdn.com/image/fetch/$s_!qvTO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab60205d-5b8a-4cca-b12f-9ce4c05b3173_682x496.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>ML platforms fragment across registries, orchestrators, feature stores, and experimentation systems, forcing practitioners to traverse multiple tools to trace lineage or reuse assets across domains. Netflix writes about its Metadata Service, which ingests events via a unified AIP URI scheme and materializes a Model Lifecycle Graph in Datomic, connecting models, features, pipelines, and experiments through asynchronous enrichment. The graph collapses multi-system investigations into a single GraphQL traversal from models to features to pipelines to A/B tests, standardizing cross-domain discovery anchored in a normalized entity model.</p><p><strong><a href="https://netflixtechblog.com/democratizing-machine-learning-at-netflix-building-the-model-lifecycle-graph-5cc6d5828bb1">https://netflixtechblog.com/democratizing-machine-learning-at-netflix-building-the-model-lifecycle-graph-5cc6d5828bb1</a></strong></p><div><hr></div><h1>Sponsored: Free Course: AI-Driven Data Engineering</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_10_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vGeo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 
424w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vGeo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_10_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:
false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vGeo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!vGeo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55d662d9-b294-4418-bf6e-3a098e364c00_1200x630.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI coding agents are changing how data engineers work. This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_10_26_data_engineering_weekly">Get started now</a></strong></p><div><hr></div><h1>Confluent: Stream Processing vs. Real-Time OLAP: Flink, ClickHouse &amp; Pinot Compared</h1><p>Real-time data platforms routinely collapse when teams treat stream processors and OLAP engines as substitutes, exposing dashboards to stateful pipelines or routing continuous ETL through scan engines. Confluent writes about drawing the boundary at computation timing: stream processors evaluate data in motion through event-time windows and watermarks, whereas OLAP engines evaluate data at rest through columnar scatter-gather scans. 
</p><p><strong><a href="https://www.confluent.io/blog/stream-processing-vs-real-time-olap-flink-clickhouse-and-pinot-compared/">https://www.confluent.io/blog/stream-processing-vs-real-time-olap-flink-clickhouse-and-pinot-compared/</a></strong></p><div><hr></div><h1>Kirill Bobrov: The Power of Data Sketches: A Comprehensive Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!z6d8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z6d8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 424w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 848w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 1272w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z6d8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic" width="1456" height="526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:526,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17264,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!z6d8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 424w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 848w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 1272w, https://substackcdn.com/image/fetch/$s_!z6d8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10271724-1a98-4e5c-89d5-0c98a60c86d4_1922x694.heic 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Exact answers to cardinality and quantile queries become infeasible at the billion-event scale because COUNT DISTINCT shuffles every unique value across the cluster rather than aggregating partially. The author walks through Apache DataSketches &#8212; Theta, HyperLogLog, CPC, KLL, and Frequent Items &#8212; replacing full-data shuffles with kilobyte hash summaries that merge across partitions. 
Sketches shift analytics from query-time scans to precomputed dimensional cubes, trading bounded approximation for parallel ingestion and late-data merges, anchored in a fixed-memory streaming model.</p><p><strong><a href="https://luminousmen.com/post/the-power-of-data-sketches-a-comprehensive-guide/">https://luminousmen.com/post/the-power-of-data-sketches-a-comprehensive-guide/</a></strong></p><div><hr></div><h1>Whatnot: The ML Feature Pipeline That Got Slower and No One Noticed</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yh4y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yh4y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 424w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 848w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 1272w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!yh4y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic" width="1400" height="469" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:469,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9922,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yh4y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 424w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 848w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 1272w, https://substackcdn.com/image/fetch/$s_!yh4y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb56a61f-829c-4f06-a7ed-dc71cecd347c_1400x469.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Hourly pipelines leave a shorter window to catch errors, compressing the failure window and exposing silent degradation &#8212; zero-row regressions, runtime drift, and cadence slippage that never trigger threshold alerts. 
Whatnot writes about handling such pipeline failures by splitting monitoring into two tiers &#8212; zero-row signals that page on-call, and distribution shifts routed to Slack &#8212; backed by a 2-day Redis TTL that absorbs missed updates.</p><p><strong><a href="https://medium.com/whatnot-engineering/the-ml-feature-pipeline-that-got-slower-and-no-one-noticed-8e90c224eae3">https://medium.com/whatnot-engineering/the-ml-feature-pipeline-that-got-slower-and-no-one-noticed-8e90c224eae3</a></strong></p><div><hr></div><h1>Pinterest: Enhancing Ad Relevance: Integrating Real-Time Context into Sequential Recommender Models</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bYc_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bYc_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 424w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 848w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 1272w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!bYc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic" width="1400" height="787" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:787,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bYc_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 424w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 848w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 1272w, https://substackcdn.com/image/fetch/$s_!bYc_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa1f815af-8346-493a-b4a0-f6663262d6c8_1400x787.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Sequential recommender models trained on offline behavioral history fail on contextual surfaces because the user embedding ignores what the viewer is currently browsing. Pinterest writes about injecting a context layer into the two-tower query tower &#8212; concatenating Transformer history with subject-Pin features and training with synthetic context from positive labels. 
The hybrid flow precomputes the Transformer offline and runs the context layer online, lifting Recall@K by 3-10x and ROAS by 0.7%, anchored to dynamic user embeddings.</p><p><strong><a href="https://medium.com/pinterest-engineering/enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models-bc3a2f9b682e">https://medium.com/pinterest-engineering/enhancing-ad-relevance-integrating-real-time-context-into-sequential-recommender-models-bc3a2f9b682e</a></strong></p><div><hr></div><h1>Grab: Enhancing Flink deployment with shadow testing.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!52_9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!52_9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 424w, https://substackcdn.com/image/fetch/$s_!52_9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 848w, https://substackcdn.com/image/fetch/$s_!52_9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 1272w, https://substackcdn.com/image/fetch/$s_!52_9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!52_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic" width="1456" height="936" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:936,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23873,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!52_9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 424w, https://substackcdn.com/image/fetch/$s_!52_9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 848w, https://substackcdn.com/image/fetch/$s_!52_9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!52_9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e6688a-d92c-4788-a25d-f1d397198ab7_1600x1029.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Staging environments cannot reproduce production-specific failure modes in streaming applications &#8212; checkpoint incompatibility, traffic shape, and environment drift only surface under live traffic. 
Grab writes about adding a Shadow Testing stage to its Flink deployment pipeline &#8212; running a parallel shadow job in an isolated Kubernetes namespace with prefixed Kafka consumer groups and dedicated shadow sinks. The pre-deployment shadow run absorbs the failure window that caused 10-minute rollback outages, raising Deployment Frequency and cutting Change Failure Rate, anchored to connector-level isolation.</p><p><strong><a href="https://engineering.grab.com/enchancing-flink-shadow-testing">https://engineering.grab.com/enchancing-flink-shadow-testing</a></strong></p><div><hr></div><h1>Halodoc: Building Self-Healing Data Pipelines at Halodoc</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IIiL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IIiL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 424w, https://substackcdn.com/image/fetch/$s_!IIiL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 848w, https://substackcdn.com/image/fetch/$s_!IIiL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 1272w, https://substackcdn.com/image/fetch/$s_!IIiL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IIiL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic" width="1456" height="231" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:231,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17447,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/197165483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IIiL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 424w, https://substackcdn.com/image/fetch/$s_!IIiL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 848w, https://substackcdn.com/image/fetch/$s_!IIiL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!IIiL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F351fce95-294f-419f-91bc-6bfa8d60e42a_1649x262.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Generic retry-everything pipeline recovery breaks down at scale because CDC restarts, OOM failures, orphaned warehouse queries, and cascading backfills each demand distinct checkpoint logic. Halodoc layers six targeted recovery mechanisms, including CDC restart with three-gate eligibility checks, file-size-aware mini-batching, OOM-classified retry scaling, and watermark-based warehouse lock cancellation. The platform cuts CDC recovery to under 5 minutes, drops on-call alerts by 80%, and collapses backfill setup to 15 minutes, anchored to per-failure-mode contracts.</p><p><strong><a href="https://blogs.halodoc.io/building-self-healing-data-pipelines-at-halodoc/">https://blogs.halodoc.io/building-self-healing-data-pipelines-at-halodoc/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #268]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-268</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-268</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 04 May 2026 01:09:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_03_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hWH6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!hWH6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 848w, 
https://substackcdn.com/image/fetch/$s_!hWH6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!hWH6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hWH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9750,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_03_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!hWH6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!hWH6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!hWH6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!hWH6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50e1021a-0285-47f2-a072-1fd9498ef009_1200x630.heic 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h1>Free Course: AI-Driven Data Engineering</h1><p>AI coding agents are changing how data engineers work. This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=05_03_26_data_engineering_weekly">Get started now</a></strong></p><div><hr></div><h1>Event Highlight: Don't Miss <strong>AI Council</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iVR5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iVR5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 424w, 
https://substackcdn.com/image/fetch/$s_!iVR5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 848w, https://substackcdn.com/image/fetch/$s_!iVR5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 1272w, https://substackcdn.com/image/fetch/$s_!iVR5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iVR5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic" width="1456" height="820" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:820,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24070,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!iVR5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 424w, https://substackcdn.com/image/fetch/$s_!iVR5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 848w, https://substackcdn.com/image/fetch/$s_!iVR5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 1272w, https://substackcdn.com/image/fetch/$s_!iVR5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe106fdb1-95f9-4cb7-be9a-fe82e9a7383c_2100x1182.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em><strong>- The technical conference for humans who ship</strong></em></p><p>Join the people actually building AI &amp; data infrastructure - and hear them share what&#8217;s working, what broke in prod, and what&#8217;s coming next. May 12&#8211;14 in San Francisco.</p><p><strong>Speakers include:</strong> The co-inventor of ChatGPT. Creator of DuckDB. Creator of Codex. Engineers from ClickHouse, Databricks, Datadog, and LangChain.</p><p><strong><a href="https://aicouncil.com/sf-2026">&#8594; Save 20% on your ticket with code DATAEW20 through 5/5</a></strong></p><div><hr></div><h1>Grab: Data Mesh at Grab Part II: The Foundational Tools behind Certification</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ky2o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ky2o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 424w, https://substackcdn.com/image/fetch/$s_!ky2o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 848w, 
https://substackcdn.com/image/fetch/$s_!ky2o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 1272w, https://substackcdn.com/image/fetch/$s_!ky2o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ky2o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic" width="1456" height="567" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:567,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ky2o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 424w, 
https://substackcdn.com/image/fetch/$s_!ky2o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 848w, https://substackcdn.com/image/fetch/$s_!ky2o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 1272w, https://substackcdn.com/image/fetch/$s_!ky2o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fa2ebd0-ab0c-4815-8cef-57c386fededd_1674x652.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Scaling a data mesh across hundreds of thousands of assets demands enforceable trust, consistent governance, and reliable quality without centralized bottlenecks. Grab operationalizes certification through three integrated platforms &#8212; Hubble for metadata and event-driven certification, Genchi for continuous quality validation, and a Data Contract Registry enforcing producer-consumer agreements. The system converts data mesh principles into a metadata-driven workflow, reducing dataset sprawl and driving convergence toward certified, reusable assets anchored to analytics and AI workloads.</p><p><strong><a href="https://engineering.grab.com/data-mesh-2">https://engineering.grab.com/data-mesh-2</a></strong></p><div><hr></div><h1>Doug Turnbull: Can agents replace the search stack?</h1><p>Search APIs depend on layered pipelines for query understanding, retrieval, and reranking, creating complexity that struggles to adapt across heterogeneous user intents. The author writes about testing GPT-5 and GPT-5-mini agents with basic BM25 and e5 embedding tools on Amazon ESCI, lifting NDCG from 0.289 to 0.453 &#8212; exploration prompts and duplicate-query rejection further closed the gap on smaller models. The findings reframe retrieval for product-style &#8220;finding things&#8221; workloads as an agent-driven loop. 
However, knowledge-gap tasks like MSMarco still favor traditional embedding stacks anchored in dense-retrieval quality.</p><p><strong><a href="https://softwaredoug.com/blog/2026/04/28/search-apis-replaced-by-agents">https://softwaredoug.com/blog/2026/04/28/search-apis-replaced-by-agents</a></strong></p><div><hr></div><h1>Pinterest: Optimizing ML Workload Network Efficiency (Part I): Feature Trimmer</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZxPR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZxPR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 424w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 848w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 1272w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZxPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic" 
width="1400" height="661" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:661,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15892,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZxPR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 424w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 848w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 1272w, https://substackcdn.com/image/fetch/$s_!ZxPR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe4edcaa2-0a26-4e33-8e38-55896b77a801_1400x661.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Root-leaf ML serving architectures unlock GPU specialization but bottleneck on network bandwidth when fanning out feature payloads to partitioned model inference. Pinterest writes about building Feature Trimmer, a &#8220;Send What You Use&#8221; system that treats the model signature as the source of truth &#8212; version-aware allowlists sync with bundle deployments through the same staged rollout, fallback, and concurrency safeguards as model configs. 
</p><p><strong><a href="https://medium.com/pinterest-engineering/optimizing-ml-workload-network-efficiency-part-i-feature-trimmer-ae20beb08d69">https://medium.com/pinterest-engineering/optimizing-ml-workload-network-efficiency-part-i-feature-trimmer-ae20beb08d69</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_03_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HwU0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HwU0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_03_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HwU0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!HwU0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F68a07d28-6188-4bc2-b1e9-d54c4d107c4d_2400x1260.heic 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Will your data platform accelerate your AI initiatives or become their biggest bottleneck? 
Learn how to build a data platform that's ready for AI:<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=05_03_26_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Pinterest: From Clicks to Conversions: Architecting Shopping Conversion Candidate Generation at Pinterest</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tCIN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tCIN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 424w, https://substackcdn.com/image/fetch/$s_!tCIN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 848w, https://substackcdn.com/image/fetch/$s_!tCIN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!tCIN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tCIN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic" width="1400" height="704" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:704,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17698,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tCIN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 424w, https://substackcdn.com/image/fetch/$s_!tCIN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 848w, 
https://substackcdn.com/image/fetch/$s_!tCIN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 1272w, https://substackcdn.com/image/fetch/$s_!tCIN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32b228a0-5f83-4065-967f-b74f7c8ae361_1400x704.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Optimizing ad retrieval for offsite conversions requires modeling sparse, noisy, delayed signals that engagement-based candidate 
generators were never designed to surface. Pinterest writes about building a two-tower shopping conversion retrieval model that unifies conversions and click-duration-weighted engagement under a single multi-task head, paired with a parallel DCN v2 and MLP cross-layer architecture and an advertiser-level loss to stabilize sparse Pin-level supervision.</p><p><strong><a href="https://medium.com/pinterest-engineering/from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation-at-pinterest-04cae5e1455b">https://medium.com/pinterest-engineering/from-clicks-to-conversions-architecting-shopping-conversion-candidate-generation-at-pinterest-04cae5e1455b</a></strong></p><div><hr></div><h1>Fivetran: How we accelerated transpilation by compiling SQLGlot with mypyc</h1><p>Fivetran writes about compiling SQLGlot with mypyc, which transpiles type-annotated Python into C extensions. The work included contributing five string primitives upstream and optimizing hot paths with sentinel tokens, native i64 integers, and pre-built dispatch dictionaries, all while preserving the pure Python install path. The compiled <code>sqlglot[c]</code> package speeds up parsing by 5x, SQL generation by 2.5x, and optimizer passes by 2-2.5x, while a dual-distribution model keeps SQL transpilation portable across multi-engine data lake architectures.</p><p><strong><a href="https://www.fivetran.com/blog/how-we-accelerated-transpilation-by-compiling-sqlglot-with-mypyc">https://www.fivetran.com/blog/how-we-accelerated-transpilation-by-compiling-sqlglot-with-mypyc</a></strong></p><div><hr></div><h1>Robin Moffatt: Materialized Tables in Apache Flink</h1><p>Streaming SQL frameworks split table definitions from the population logic, leaving INSERT jobs orphaned across restarts and forcing operators to manage schema evolution and lifecycle as separate concerns. 
The author walks through Flink 2.2&#8217;s Materialized Tables, which bind the refresh query to the table definition and support CONTINUOUS or FULL refresh modes, partition-scoped reloads, suspend/resume via savepoints, and unified batch-streaming semantics through a single <code>FRESHNESS</code> parameter. The construct collapses three legacy patterns (CREATE/INSERT, CTAS, and external schedulers) into a single durable object. However, both catalog support beyond Paimon and the embedded scheduler remain at an early stage of maturity.</p><p><strong><a href="https://rmoff.net/2026/04/28/materialized-tables-in-apache-flink/">https://rmoff.net/2026/04/28/materialized-tables-in-apache-flink/</a></strong></p><div><hr></div><h1>Alexey Makhotkin: 5NF and Database Design</h1><p>Traditional database normalization tutorials present 5NF through contrived table-splitting exercises that obscure the underlying business logic and leave practitioners unable to apply it in practice. The author reframes 5NF design around two logical patterns: the AB-BC-AC triangle for independent M:N relationships across three anchors, and the ABC+D star pattern, where a fourth entity binds three 1:N links. Table construction is thereby driven directly from business requirements rather than from post hoc decomposition. The approach replaces 5NF reasoning with a deterministic logical-to-physical workflow built on anchor-link modeling, producing normalized schemas without invoking decomposition theorems.</p><p><strong><a href="https://kb.databasedesignbook.com/posts/5nf/">https://kb.databasedesignbook.com/posts/5nf/</a></strong></p><div><hr></div><h1>ultrathink: SQLite in Production: Lessons from Running a Store on a Single File</h1><p>Single-file embedded databases promise operational simplicity, but their filesystem-level locking model breaks down when modern container orchestration introduces concurrent writers across overlapping deploys. 
Ultrathink writes about running a production e-commerce store on Rails 8 with four SQLite databases on a shared Docker volume, diagnosing lost orders through <code>sqlite_sequence</code> after eleven rapid Kamal blue-green deploys caused overlapping containers to corrupt WAL writes despite successful Stripe charges.</p><p><strong><a href="https://ultrathink.art/blog/sqlite-in-production-lessons">https://ultrathink.art/blog/sqlite-in-production-lessons</a></strong></p><div><hr></div><h1>Capital One: Spark tuning: executor optimization for performance</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0XLS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0XLS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 424w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 848w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 1272w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0XLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic" width="1400" height="621" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22340,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/196363682?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0XLS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 424w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 848w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 1272w, https://substackcdn.com/image/fetch/$s_!0XLS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d828012-c8c8-4f4a-a8b6-63a8e41f4b10_1400x621.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Spark applications often underperform when executors run with default settings, leaving CPU cores and memory underutilized while introducing fault-tolerance risks and network overhead across worker nodes. Capital One Tech walks through executor sizing trade-offs &#8212; fat executors maximize data locality but concentrate failure risk, thin executors improve parallelism but flood the network &#8212; and codifies a configuration recipe that reserves cores and memory for OS overhead, caps executors at 3-5 cores, and accounts for executor memory overhead (the larger of 384 MB or 10% of executor memory). 
The framework converts executor tuning from guesswork into a deterministic sizing exercise, anchored to balanced parallelism, fault tolerance, and resource utilization across distributed Spark clusters.</p><p><strong><a href="https://medium.com/capital-one-tech/spark-tuning-executor-optimization-for-performance-c757b39f0efe">https://medium.com/capital-one-tech/spark-tuning-executor-optimization-for-performance-c757b39f0efe</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #267]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-267</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-267</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 27 Apr 2026 00:52:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wyeh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!Wyeh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wyeh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8368f05-80d8-475f-b78a-690146213041_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wyeh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!Wyeh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8368f05-80d8-475f-b78a-690146213041_1200x630.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>Free Course: AI-Driven Data Engineering</h1><p>AI coding agents are changing how data engineers work. This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=04_26_26_data_engineering_weekly">Enroll today</a></strong></p><div><hr></div><h1>Monzo: A &#8220;meshy&#8221; approach to Data: Enabling 100+ teams to build Data Models</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DwjA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DwjA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 424w, 
https://substackcdn.com/image/fetch/$s_!DwjA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 848w, https://substackcdn.com/image/fetch/$s_!DwjA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 1272w, https://substackcdn.com/image/fetch/$s_!DwjA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DwjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic" width="1456" height="558" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!DwjA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 424w, https://substackcdn.com/image/fetch/$s_!DwjA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 848w, https://substackcdn.com/image/fetch/$s_!DwjA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 1272w, https://substackcdn.com/image/fetch/$s_!DwjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1a63be1-f2de-4527-9ab0-34757e2c60b8_2400x919.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Decentralized data ownership breaks down when cross-team dependencies remain implicit and upstream schema changes silently cascade through downstream models. Monzo introduces Interfaces&#8212;explicitly declared, tested dbt models that serve as governed data contracts&#8212;stabilizing cross-domain consumption across its 12,000-model warehouse. The migration has already reduced processing costs by 40% and accelerated data landing times by 25%, proving that formalized contracts scale distributed data modeling.</p><p><strong><a href="https://monzo.com/blog/a-meshy-approach-to-data">https://monzo.com/blog/a-meshy-approach-to-data</a></strong></p><div><hr></div><h1>Aparna Dhinakaran: Context Management in Agent Harnesses</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DI_t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DI_t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 424w, https://substackcdn.com/image/fetch/$s_!DI_t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 848w, 
https://substackcdn.com/image/fetch/$s_!DI_t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 1272w, https://substackcdn.com/image/fetch/$s_!DI_t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DI_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic" width="1200" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7874,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DI_t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 424w, 
https://substackcdn.com/image/fetch/$s_!DI_t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 848w, https://substackcdn.com/image/fetch/$s_!DI_t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 1272w, https://substackcdn.com/image/fetch/$s_!DI_t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64255fd7-bcc7-40cd-b1c7-3bda46e5948a_1200x480.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Long-running AI agents degrade as context windows fill with unbounded tool outputs, stale conversation history, and redundant file reads. The author analyzes five agent frameworks&#8212;Pi, OpenClaw, Claude Code, Letta, and Arize's Alyx&#8212;revealing convergence on hard file caps, token-triggered compaction, and isolated sub-agents. These patterns mirror the classical memory hierarchy&#8212;registers, cache, and swap&#8212;suggesting that context management is maturing into an invisible system-level discipline.</p><p><strong><a href="https://x.com/aparnadhinak/status/2048492731929149929">https://x.com/aparnadhinak/status/2048492731929149929</a></strong></p><div><hr></div><h1>Shopify: Flow generation through natural language: An agentic modeling approach</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lA4P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lA4P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 424w, https://substackcdn.com/image/fetch/$s_!lA4P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 848w, https://substackcdn.com/image/fetch/$s_!lA4P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!lA4P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lA4P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic" width="1456" height="158" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:158,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12195,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lA4P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 424w, https://substackcdn.com/image/fetch/$s_!lA4P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 848w, 
https://substackcdn.com/image/fetch/$s_!lA4P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 1272w, https://substackcdn.com/image/fetch/$s_!lA4P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd91f01a0-a081-4a08-bbce-6601de743cc6_1999x217.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>LLMs struggle to reason over deeply nested domain-specific schemas that lack representation in pretraining data. Shopify builds a bidirectional transpiler that converts its Flow automation JSON into Python&#8212;improving syntactic correctness by 22% and semantic correctness by 13% for its fine-tuned Qwen3-32B model. The approach delivers a Sidekick assistant that runs 2.2x faster and 68% cheaper than the closed-source frontier model it replaces.</p><p><strong><a href="https://shopify.engineering/fine-tuning-agent-shopify-flow">https://shopify.engineering/fine-tuning-agent-shopify-flow</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O9vw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O9vw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 424w, 
https://substackcdn.com/image/fetch/$s_!O9vw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!O9vw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!O9vw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O9vw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19451,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!O9vw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!O9vw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!O9vw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!O9vw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F373df4a4-882f-4e16-bf44-65c5512d7558_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Will your data platform accelerate your AI initiatives or become their biggest bottleneck? Learn how to build a data platform that's ready for AI:<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_26_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Pratish Yadava: Data agents - When enterprise analytics learns to reason</h1><p>Traditional dashboards answer predefined questions but struggle to diagnose root causes or recommend actions within live business workflows. The author outlines continuous data agents that interpret intent and make bounded decisions&#8212;anchored in governed semantic layers and modular, domain-specific orchestrators. 
This operating model moves analytics from passive reporting into decision-adjacent automation with explicit guardrails and escalation paths.</p><p><strong><a href="https://medium.com/data-science-at-microsoft/data-agents-when-enterprise-analytics-learns-to-reason-13345ec8998e">https://medium.com/data-science-at-microsoft/data-agents-when-enterprise-analytics-learns-to-reason-13345ec8998e</a></strong></p><div><hr></div><h1>Pinterest: Smarter URL Normalization at Scale: How MIQPS Powers Content Deduplication at Pinterest</h1><p>Content platforms waste significant compute re-fetching identical pages disguised by URL variations from tracking tags, session tokens, and click identifiers. Pinterest engineers MIQPS&#8212;a data-driven algorithm that renders pages with and without each query parameter to empirically separate content-changing parameters from noise. The system strips redundant parameters at runtime using maps precomputed offline, reducing duplicate fetches and improving catalog deduplication at scale.</p><p><strong><a href="https://medium.com/pinterest-engineering/smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication-at-pinterest-4aa42e807d7d">https://medium.com/pinterest-engineering/smarter-url-normalization-at-scale-how-miqps-powers-content-deduplication-at-pinterest-4aa42e807d7d</a></strong></p><div><hr></div><h1>Meltwater: Doing More With Less: Rethinking Entity-Level Sentiment at Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6P_u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!6P_u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 424w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 848w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 1272w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6P_u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic" width="1200" height="518" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:518,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13317,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6P_u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 424w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 848w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 1272w, https://substackcdn.com/image/fetch/$s_!6P_u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb54662d3-3eff-4d74-bfab-25b3777a13c3_1200x518.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The cost of entity-level sentiment analysis scales linearly with the number of entities when systems re-encode the same document once per entity, multiplying inference costs without improving accuracy. Meltwater extracts per-entity embeddings from a single shared Transformer forward pass, showing that local mention context carries sufficient sentiment signal. The approach reduces inference costs by 45.5% and improves accuracy by 3.02%, converting linear per-entity scaling into near-constant-time processing.</p><p><strong><a href="https://underthehood.meltwater.com/blog/2026/04/23/rethinking-entity-level-sentiment-at-scale/">https://underthehood.meltwater.com/blog/2026/04/23/rethinking-entity-level-sentiment-at-scale/</a></strong></p><div><hr></div><h1>Halodoc: Implementing Apache Yunikorn on EMR on EKS at Halodoc</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cHCi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cHCi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 424w, 
https://substackcdn.com/image/fetch/$s_!cHCi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 848w, https://substackcdn.com/image/fetch/$s_!cHCi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 1272w, https://substackcdn.com/image/fetch/$s_!cHCi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cHCi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic" width="1456" height="851" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:851,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20045,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!cHCi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 424w, https://substackcdn.com/image/fetch/$s_!cHCi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 848w, https://substackcdn.com/image/fetch/$s_!cHCi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 1272w, https://substackcdn.com/image/fetch/$s_!cHCi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e2392d1-74f4-4b40-a8f1-96d93cd7423b_1592x930.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Kubernetes-native Spark workloads trigger aggressive node scaling when the default scheduler evaluates pods independently&#8212;causing cost whiplash from rapid scale-outs followed by immediate underutilization. Halodoc adopts Apache YuniKorn's bin-packing strategy to fill existing nodes before provisioning new ones, paired with hierarchical queues that govern cross-team resource boundaries. Node utilization reaches 96%, with a 10% reduction in EC2 costs and increased Spot instance adoption due to improved scheduling predictability.</p><p><strong><a href="https://blogs.halodoc.io/implementing-apache-yunikorn-on-emr-on-eks/amp/">https://blogs.halodoc.io/implementing-apache-yunikorn-on-emr-on-eks/amp/</a></strong></p><div><hr></div><h1>Netflix: Scaling Camera File Processing at Netflix</h1><p>Media production pipelines struggle to manage massive daily camera footage when raw metadata remains unconformed and unsearchable across downstream workflows. Netflix integrates FilmLight's API into its Media Production Suite to parse and normalize camera metadata at ingest&#8212;conforming it to a standardized schema that enables automated retrieval and pipeline validation. 
The system deploys as stateless serverless functions on CPU-only instances, scaling elastically to handle spiky VFX plate generation without dedicated GPU infrastructure.</p><p><strong><a href="https://netflixtechblog.com/scaling-camera-file-processing-at-netflix-6dab2b1e80be">https://netflixtechblog.com/scaling-camera-file-processing-at-netflix-6dab2b1e80be</a></strong></p><div><hr></div><h1>Z1: Airflow DAG Bundles: Managing DAGs Across Teams Without Helm Upgrades</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VWoQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VWoQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 424w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 848w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 1272w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!VWoQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic" width="1020" height="459" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:459,&quot;width&quot;:1020,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9989,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/195572824?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VWoQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 424w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 848w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 1272w, https://substackcdn.com/image/fetch/$s_!VWoQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed901627-cc85-4397-b034-1ab1b005a7d3_1020x459.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Onboarding new data pipelines in Airflow typically requires Helm upgrades, pod restarts, and infrastructure tickets&#8212;turning every DAG addition into a deployment bottleneck. The author leverages Airflow 3.x DAG bundles with an S3-backed sidecar sync pattern that hot-reloads pipeline configurations without downtime or centralized repository dependencies. 
New DAGs appear in the Airflow UI within 30 seconds of a commit, decentralizing the entire pipeline lifecycle to self-service.</p><p><strong><a href="https://blog.platform.zerotoone.ai/blog/airflow-dag-bundles-without-helm-upgrades/">https://blog.platform.zerotoone.ai/blog/airflow-dag-bundles-without-helm-upgrades/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #266]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-266</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-266</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 20 Apr 2026 03:09:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=04_19_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!T2OR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T2OR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8661,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=04_19_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T2OR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!T2OR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a102af9-a633-4dbb-993a-95f666299715_1200x630.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Free Course: AI-Driven Data Engineering</h1><p>AI coding agents are changing how data engineers work. 
This Dagster University course shows how to build a production-ready ELT pipeline from prompts while learning practical patterns for reliable AI-assisted development.<br><br>This course is designed for engineers exploring agentic coding workflows and engineers who want to learn Dagster or become Dagster power users.</p><p><strong><a href="https://courses.dagster.io/courses/ai-driven-data-engineering?utm_campaign=Dagster%20University&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_driven_engineering_course&amp;utm_content=04_19_26_data_engineering_weekly">Enroll today</a></strong></p><div><hr></div><h1>Animesh Kumar: AI-Ready Data vs. Analytics-Ready Data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WCAU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WCAU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 424w, https://substackcdn.com/image/fetch/$s_!WCAU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 848w, https://substackcdn.com/image/fetch/$s_!WCAU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!WCAU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WCAU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic" width="1400" height="1149" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1149,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21826,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WCAU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 424w, https://substackcdn.com/image/fetch/$s_!WCAU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 848w, 
https://substackcdn.com/image/fetch/$s_!WCAU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 1272w, https://substackcdn.com/image/fetch/$s_!WCAU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3059a21-3aeb-44cf-8837-17128f27b419_1400x1149.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Analytics and AI systems fail when teams treat data readiness as a single maturity path instead of two distinct axes optimized for different 
consumers. The author distinguishes analytics-ready data&#8212;designed for human interpretation through aggregation, stability, and explainability&#8212;from AI-ready data, which requires contextual completeness, timeliness, and semantic richness often lost in aggregation pipelines. </p><p><strong><a href="https://medium.com/@community_md101/ai-ready-data-vs-analytics-ready-data-f67ef0804341">https://medium.com/@community_md101/ai-ready-data-vs-analytics-ready-data-f67ef0804341</a></strong></p><div><hr></div><h1>Whatnot: The model is the easy part: Building the LLM Platform at Whatnot</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4qIC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4qIC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 424w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 848w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 1272w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4qIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic" width="1400" height="695" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:695,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20720,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4qIC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 424w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 848w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 1272w, https://substackcdn.com/image/fetch/$s_!4qIC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F81652a8f-38d2-4d9f-9919-086cba54255e_1400x695.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Production LLM systems fail primarily in the surrounding infrastructure rather than in the model layer: non-deterministic outputs, unconstrained inputs, and missing feedback loops undermine reliability and trust. Whatnot describes how it structures its LLM platform around three pillars: velocity, reliability, and trust. The platform uses post-exposure A/B logging to isolate divergent outputs, a reusable tool registry, and LLM-as-a-judge calibration workflows to detect production drift early. 
</p><p><strong><a href="https://medium.com/whatnot-engineering/the-model-is-the-easy-part-building-the-llm-platform-at-whatnot-ec8730fa9bdf">https://medium.com/whatnot-engineering/the-model-is-the-easy-part-building-the-llm-platform-at-whatnot-ec8730fa9bdf</a></strong></p><div><hr></div><h1>Slack: Managing context in long-run agentic applications</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oup5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oup5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 424w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 848w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 1272w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oup5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic" width="1456" height="1061" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1061,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16064,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oup5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 424w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 848w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 1272w, https://substackcdn.com/image/fetch/$s_!Oup5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5e8e64c8-1860-4dcb-aba3-b1a0d4750367_1596x1163.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Long-running multi-agent systems lose investigative coherence as stateless LLM APIs force context accumulation, degrading reasoning quality and amplifying hallucinations across rounds. 
Slack writes about addressing this with three context channels&#8212;a Director&#8217;s Journal for working memory, a Critic&#8217;s Review that scores findings on a five-level credibility rubric, and a Critic&#8217;s Timeline that prunes incoherent findings by enforcing narrative consistency.</p><p><strong><a href="https://slack.engineering/managing-context-in-long-run-agentic-applications/">https://slack.engineering/managing-context-in-long-run-agentic-applications/</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_19_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GWyV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!GWyV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!GWyV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!GWyV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GWyV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14580,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_19_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GWyV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!GWyV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!GWyV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!GWyV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3196207-e093-4261-9196-e3a8d9180942_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Will your data platform accelerate your AI initiatives or become their biggest bottleneck? 
Learn how to build a data platform that's ready for AI:<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_19_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Atlassian: Engineering the Forge Billing Platform for Reliability and Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iwiU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iwiU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 424w, https://substackcdn.com/image/fetch/$s_!iwiU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 848w, https://substackcdn.com/image/fetch/$s_!iwiU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 1272w, https://substackcdn.com/image/fetch/$s_!iwiU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iwiU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic" width="1456" height="1197" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1197,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iwiU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 424w, https://substackcdn.com/image/fetch/$s_!iwiU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 848w, https://substackcdn.com/image/fetch/$s_!iwiU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!iwiU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf3cf9a3-7a76-4df7-b007-8962e32183c5_1864x1532.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Billing pipelines, especially usage-based ones, are often complex because billing is so sensitive to correctness. 
Atlassian describes Forge Billing as a deterministic pipeline&#8212;routing 300M daily usage events through StreamHub and UTS for deduplication and schema validation, then splitting them into cold-tier raw storage and StarRocks hot-tier aggregations for fast developer queries. The architecture handles counter and gauge metrics through idempotency keys and last-write-wins windowing, enabling full charge traceability from Developer Console back to raw events while scaling to 1B daily events within a year.</p><p><strong><a href="https://www.atlassian.com/blog/atlassian-engineering/engineering-the-forge-billing-platform-for-reliability-and-scale">https://www.atlassian.com/blog/atlassian-engineering/engineering-the-forge-billing-platform-for-reliability-and-scale</a></strong></p><div><hr></div><h1>Giannis Polyzos: From Events To Real-Time Profiles On Apache Fluss</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CNpN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CNpN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 424w, https://substackcdn.com/image/fetch/$s_!CNpN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 848w, https://substackcdn.com/image/fetch/$s_!CNpN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!CNpN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CNpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic" width="1456" height="753" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:753,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20054,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CNpN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 424w, https://substackcdn.com/image/fetch/$s_!CNpN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 848w, 
https://substackcdn.com/image/fetch/$s_!CNpN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 1272w, https://substackcdn.com/image/fetch/$s_!CNpN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf805158-73d6-42e6-bf5e-3b214765a3c5_1456x753.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Real-time entity decisions fail when profile state lives in a separate OLAP layer, introducing ingestion lag between event arrival 
and visibility. The approach builds risk profiles directly in Apache Fluss by mapping identifiers to integers, encoding group membership as Roaring Bitmaps, and merging updates at write time using the Aggregation Merge Engine without stateful Flink jobs. This design removes the need for a separate profile store, reduces state latency from hours to seconds, and ensures correct recovery through replay-safe inverse operations in the UndoRecoveryOperator.</p><p><strong><a href="https://ipolyzos.substack.com/p/from-events-to-real-time-profiles">https://ipolyzos.substack.com/p/from-events-to-real-time-profiles</a></strong></p><div><hr></div><h1>Thiago Baldim: The journey to Agentic BI</h1><p>Agentic BI tools amplify data quality issues&#8212;ambiguous schemas, undocumented columns, and fragmented sources of truth&#8212;rather than resolving them at query time. SafetyCulture writes about rebuilding its data platform on Kimball architecture with SCD Type 2 dimensions, over 90% dbt test and documentation coverage, and column-level ownership aligned to business stakeholders, reducing pipeline execution time from 14 hours to 1.5 hours. 
</p><p><strong><a href="https://medium.com/@thiagobaldim/the-journey-to-agentic-bi-617975c106b7">https://medium.com/@thiagobaldim/the-journey-to-agentic-bi-617975c106b7</a></strong></p><div><hr></div><h1>Pinterest: Scaling Recommendation Systems with Request-Level Deduplication</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uSs9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uSs9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 424w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 848w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 1272w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uSs9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic" width="1132" height="676" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6156870-6248-4989-b72e-7cd81c372998_1132x676.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:676,&quot;width&quot;:1132,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6645,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uSs9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 424w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 848w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 1272w, https://substackcdn.com/image/fetch/$s_!uSs9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6156870-6248-4989-b72e-7cd81c372998_1132x676.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Scaling recommendation models creates infrastructure pressure because request-level data, especially long user action sequences, is redundantly stored and processed once per candidate item across storage, training, and serving. Pinterest writes about applying request-level deduplication across the ML lifecycle using request-sorted Iceberg datasets; SyncBatchNorm and user-level masking to preserve training correctness; and a Deduplicated Cross-Attention Transformer that caches user context for reuse across ranked items. 
</p><p><strong><a href="https://medium.com/pinterest-engineering/scaling-recommendation-systems-with-request-level-deduplication-93bd514142d9">https://medium.com/pinterest-engineering/scaling-recommendation-systems-with-request-level-deduplication-93bd514142d9</a></strong></p><div><hr></div><h1>Just Eat: Daedalus and the Data Labyrinth</h1><p>Data governance fails when organizations share data without sharing definitions, ownership, lineage, and reusable business logic across the systems humans and AI agents rely on. Just Eat frames modern governance as a layered navigation system&#8212;combining a business glossary, data catalog, metadata, data quality signals, lineage, and a semantic layer&#8212;to connect business language with trusted data assets and machine-usable definitions. These governance layers turn complex data platforms from opaque labyrinths into navigable systems, enabling consistent analytics, more reliable AI-generated queries, and a lower risk of conflicting metrics.</p><p><strong><a href="https://medium.com/justeattakeaway-tech/daedalus-and-the-data-labyrinth-2c166b1d9866">https://medium.com/justeattakeaway-tech/daedalus-and-the-data-labyrinth-2c166b1d9866</a></strong></p><div><hr></div><h1>Teads: We Let AI Agents Orchestrate Our ML Experiments</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kMcN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kMcN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 424w, 
https://substackcdn.com/image/fetch/$s_!kMcN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 848w, https://substackcdn.com/image/fetch/$s_!kMcN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 1272w, https://substackcdn.com/image/fetch/$s_!kMcN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kMcN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic" width="1400" height="942" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:942,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19278,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194751980?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!kMcN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 424w, https://substackcdn.com/image/fetch/$s_!kMcN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 848w, https://substackcdn.com/image/fetch/$s_!kMcN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 1272w, https://substackcdn.com/image/fetch/$s_!kMcN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61fd0a94-8241-415d-b0ea-a63861b40427_1400x942.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Teads writes about extending its Datakinator platform with agentic orchestration by exposing APIs through MCP, enriching them with dataset probing and error-retrieval tools, and adding cost guardrails that estimate and gate expensive runs. This approach scales experiment throughput from hundreds to thousands, enables autonomous retry and correction of failed runs, and delivers 5&#8211;10% model improvements that translate to nearly $1M in margin gains despite increased cloud usage.</p><p><strong><a href="https://medium.com/teads-engineering/we-let-ai-agents-orchestrate-our-ml-experiments-fc8606816fde">https://medium.com/teads-engineering/we-let-ai-agents-orchestrate-our-ml-experiments-fc8606816fde</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[A Response to Our Reader Survey]]></title><description><![CDATA[Your feedback prompted us to review 233 articles across 25 issues, clarify our editorial scope, and raise the bar for what belongs in Data Engineering Weekly.]]></description><link>https://www.dataengineeringweekly.com/p/a-response-to-our-reader-survey</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/a-response-to-our-reader-survey</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Wed, 15 Apr 2026 16:26:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Thank you to every reader who took a few minutes to fill out the survey. Your feedback is the clearest signal we have &#8212; and this round gave us a lot to act on.</p><p>One piece of feedback came through clearly enough that we&#8217;d be wrong to gloss over it: several of you feel we&#8217;ve leaned too heavily into AI coverage. That&#8217;s a fair critique. So we did what data engineers do: we audited ourselves.</p><p>We reviewed 233 articles published across 25 issues, from July 2025 through April 2026. The audit made one thing clear &#8212; while the majority of our content has remained relevant to data engineers, we have not been as precise as we should be. 
That precision gap showed up in 44 articles, or 18.9% of the total, which we now classify as outside the newsletter&#8217;s editorial scope: prompting tutorials, generic agent framework walkthroughs, model-release commentary, and LLM inference economics that follow the AI news cycle more than the data engineer&#8217;s day-to-day reality.</p><p>The survey also returned encouraging numbers. Data Engineering Weekly earned a <em><strong>Net Promoter Score of +17.3, and 79.6% of respondents rated their experience as Satisfied or Very Satisfied</strong></em>. We don&#8217;t take those numbers for granted. They reflect real signals from practitioners who are selective about where they invest their attention. But the critique is what drove this post, and the critique is what we are acting on.</p><div><hr></div><h2>What the Data Says</h2><p>Across 233 articles published in 25 issues, by category (articles, share of the 233 total):</p><ol><li><p>Core DE: 99 articles (42.5%)</p></li><li><p>Adjacent but Relevant: 60 articles (25.8%)</p></li><li><p>Not DE: 44 articles (18.9%)</p></li><li><p>Context Engineering (DE extension): 30 articles (12.9%)</p></li></ol><p>The 44 out-of-scope articles are the problem we are fixing. The 189 in-scope articles are the standard we are holding ourselves to.</p><p>The survey revealed two truths at once: readers were right to call out drift, and we still believe that some of the work at the intersection of data infrastructure and AI belongs squarely within the future of data engineering. 
The rest of this post explains exactly where we draw that line &#8212; and what changes now.</p><p>You can read the complete analysis here:<br><strong><a href="https://docs.google.com/spreadsheets/d/1kLq_touW4Z0SXDjCWEg4zt1DZzBoZFyTM2d2KJAR_tQ/edit?usp=sharing">https://docs.google.com/spreadsheets/d/1kLq_touW4Z0SXDjCWEg4zt1DZzBoZFyTM2d2KJAR_tQ/edit?usp=sharing</a></strong></p><div><hr></div><h2>Our Editorial Categories</h2><p>Our test is straightforward: if a working data engineer cannot reasonably connect a piece to the systems they build, operate, govern, or evolve, it does not belong in the newsletter.</p><p>We use four categories to apply that test. They are not new instincts &#8212; they formalize the editorial judgment we have always tried to apply, and they give us, and you, a shared framework for accountability.</p><h3>Core DE</h3><p><strong>Classical data engineering topics: ingestion, storage, orchestration, batch/streaming, table formats, query engines, data modeling, quality, governance, and platform reliability.</strong></p><p>This is the foundation. Articles in Core DE cover the systems data engineers own every day &#8212; pipeline architecture, storage layer decisions, table format trade-offs, query engine internals, data quality frameworks, and the operational realities of running data infrastructure at scale.</p><h3>Context Engineering (DE extension)</h3><p><strong>Context/semantic layers, ontologies, knowledge graphs, NL-to-SQL, data agents, and other systems that turn enterprise data into governed machine-usable context.</strong></p><p>One category deserves a more explicit explanation, because it sits at the center of both the feedback we received and the direction this field is heading.</p><p>The reader survey did not change my view that Context Engineering belongs in Data Engineering Weekly. 
It forced a more disciplined articulation of that view &#8212; and a much stricter standard for what qualifies.</p><p>If you have read <strong><a href="https://www.dataengineeringweekly.com/p/etl-is-dead">ETL is Dead</a></strong> or <strong><a href="https://www.dataengineeringweekly.com/p/data-engineering-after-ai">Data Engineering After AI</a></strong>, you know where I stand. AI coding agents are reshaping how we work with data, and that shift is not temporary. The data engineer&#8217;s value is migrating from pipeline reliability to semantic reliability &#8212; from &#8220;the job ran&#8221; to &#8220;the meaning is right.&#8221; ETL is dead, the way landlines are dead: the pipelines still run, but nobody builds their data strategy around them anymore. The new leverage lies in designing systems that make AI-generated code trustworthy and grounded in a governed, machine-usable context. Extract, Contextualize, Link rather than Extract, Transform, Load.</p><p>I would not be doing justice to our Data Engineering Weekly community if I excluded the engineering questions that come with that shift. The issue was never whether DEW should cover AI-adjacent work. The issue was whether each piece remained grounded in the systems, semantics, governance, and infrastructure that data engineers actually own.</p><p>That is the line. In scope: systems that make enterprise data structured, governed, semantic, and machine-readable. Out of scope: prompting tactics, agent framework tutorials, model release commentary, and generic application-layer AI workflows.</p><h3>Adjacent but Relevant</h3><p><strong>AI/ML platform, feature store, search/retrieval infrastructure, eval/observability, or AI governance topics that materially touch DE skills or infrastructure without being pure DE.</strong></p><p>Data engineers do not work in isolation. The systems they build feed ML platforms, power search infrastructure, and underpin AI governance frameworks. 
We include this category selectively &#8212; only when the infrastructure implications are concrete, the engineering depth is substantial, and the connection to a working data engineer&#8217;s responsibilities is immediate rather than abstract. An article on feature store architecture belongs here. A think-piece on the AI market does not.</p><h3>Not DE</h3><p><strong>Prompting, generic agent frameworks, model-news/benchmark/market commentary, app-layer AI orchestration, pure ML modeling, or other pieces with no substantial data-infrastructure center of gravity.</strong></p><p>This is the content we are eliminating. It may be technically credible and widely read, but it is not written for data engineers and does not serve your work.</p><div><hr></div><h2>What Changes Now</h2><p>Here is what is different, starting with the next issue:</p><p><strong>Zero Not DE articles per issue.</strong> Every article will be evaluated against the four categories above before publication. If it does not clear the bar, it does not go in.</p><p><strong>A stricter standard for Adjacent but Relevant.</strong> The infrastructure angle must be concrete, the engineering substance must be real, and the relevance to a working data engineer must be direct &#8212; not abstract or aspirational.</p><p><strong>Strict vendor neutrality.</strong> Data Engineering Weekly will remain strictly vendor-neutral. We generally exclude content published primarily to promote a product, platform, or company, with rare exceptions for work of exceptional technical depth and broad practitioner value. 
The content we select is written by practitioners for practitioners and published on personal blogs, company engineering blogs, or open platforms, where the goal is to share real experience, not to drive adoption.</p><p><strong>Periodic transparency reports.</strong> We will share our category breakdown across recent issues so you can hold us to this standard.</p><div><hr></div><h2>Help Us Build the Signal</h2><p>Data Engineering Weekly is a community effort. The best issues are shaped by readers who surface work worth sharing.</p><p>If you have read something &#8212; a post, an engineering deep-dive, a company blog &#8212; that fits clearly within Core DE, Context Engineering, or Adjacent but Relevant, submit it. Contributions are reviewed weekly, and we read everything that comes in. The process is lightweight: open a pull request with the article title, link, author details, and topic tags using our <strong><a href="https://github.com/Data-Engineering-Weekly/dataengineeringweekly?tab=readme-ov-file#contributing-guide">Contributing Guide</a></strong>. The same editorial categories and vendor-neutrality standard described here apply to all submissions.</p><p>As we advance, each issue should feel sharper, more grounded, and more consistently useful to working data engineers. That is the standard we are setting for ourselves, and the standard we invite you to hold us to.</p><p>If you have feedback on this policy or on individual articles, reply directly to any issue. We read every response.</p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #265]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-265</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-265</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:24:30 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>This week: Multi-Tenancy for Modern Data Platforms</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2erQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2erQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!2erQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 848w, 
https://substackcdn.com/image/fetch/$s_!2erQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 1272w, https://substackcdn.com/image/fetch/$s_!2erQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2erQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20580,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2erQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 424w, 
https://substackcdn.com/image/fetch/$s_!2erQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 848w, https://substackcdn.com/image/fetch/$s_!2erQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 1272w, https://substackcdn.com/image/fetch/$s_!2erQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fad9d9a-b67e-4e81-a00e-e22a0a9a7603_1920x1080.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Join Brooklyn Data Co. and Dagster Labs for a live deep dive on multi-tenancy for modern data platforms. We&#8217;ll cover:</p><p><br>- Code location isolation and project structure patterns<br>- Managing dependencies across tenants (including AI models)<br>- Operational strategies that scale with your organization<br>- Lessons learned from real production implementations<br><br>Save your spot for practical guidance that you can apply immediately.</p><p><strong><a href="https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=04_12_26_data_engineering_weekly">Reserve your spot now</a></strong></p><div><hr></div><h1>dbt: Semantic Layer vs. Text-to-SQL: 2026 Benchmark Update</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vWVf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vWVf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 424w, https://substackcdn.com/image/fetch/$s_!vWVf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 848w, 
https://substackcdn.com/image/fetch/$s_!vWVf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 1272w, https://substackcdn.com/image/fetch/$s_!vWVf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vWVf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic" width="970" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:970,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12492,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vWVf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 424w, 
https://substackcdn.com/image/fetch/$s_!vWVf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 848w, https://substackcdn.com/image/fetch/$s_!vWVf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 1272w, https://substackcdn.com/image/fetch/$s_!vWVf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f305db0-e60f-4200-86d2-10e804d1ef7d_970x480.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" 
y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI will write all the code generated in data engineering. It is the fundamental shift we all have to prepare for. dbt published its benchmark update, claiming GPT-5.3-Codex with the Semantic Layer achieves 100.0% accuracy.</p><p><strong><a href="https://docs.getdbt.com/blog/semantic-layer-vs-text-to-sql-2026?version=1.10">https://docs.getdbt.com/blog/semantic-layer-vs-text-to-sql-2026?version=1.10</a></strong></p><div><hr></div><h1>Rill: Introducing Metrics SQL: A SQL-based semantic layer for humans and agents</h1><p>Staying with metrics and the semantic layer, Rill introduces Metrics SQL to define logic once in YAML and expose it through standard SQL, automating aggregations, enforcing row-level security, and serving governed definitions to AI agents via an MCP server without exposing raw schemas. Deterministic metric resolution eliminates inconsistencies across consumers, while a semantic pushdown roadmap targets native MEASURE support in OLAP engines like ClickHouse and Snowflake.</p><p><strong><a href="https://www.rilldata.com/blog/introducing-metrics-sql-a-sql-based-semantic-layer-for-humans-and-agents">https://www.rilldata.com/blog/introducing-metrics-sql-a-sql-based-semantic-layer-for-humans-and-agents</a></strong></p><div><hr></div><h1>Meta: How Meta Used AI to Map Tribal Knowledge in Large-Scale Data Pipelines</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5FAg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!5FAg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 424w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 848w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 1272w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5FAg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic" width="1456" height="1248" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1248,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23001,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5FAg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 424w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 848w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 1272w, https://substackcdn.com/image/fetch/$s_!5FAg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76224763-0a16-4b46-81a4-469a86ea5cef_1580x1354.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI coding assistants fail in proprietary codebases because tribal knowledge&#8212;implicit design decisions and legacy constraints&#8212;remains absent from training data and documentation. Meta Platforms deploys a swarm of 50 specialized agents to map a 4,100-file pipeline into concise context artifacts, using tiered explorer, analyst, critic, and fixer roles with automated decay detection. 
This system increases context coverage from 5% to 100%, captures over 50 non-obvious patterns, reduces tool-call volume by 40%, and cuts codebase research time from two days to 30 minutes.</p><p><strong><a href="https://engineering.fb.com/2026/04/06/developer-tools/how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines/">https://engineering.fb.com/2026/04/06/developer-tools/how-meta-used-ai-to-map-tribal-knowledge-in-large-scale-data-pipelines/</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_12_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xtWw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!xtWw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!xtWw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!xtWw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xtWw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14581,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_12_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xtWw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!xtWw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!xtWw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!xtWw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F97b6788c-3818-4bff-b818-cf8000dba970_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=04_12_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Netflix: Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QyEH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QyEH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 424w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 848w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 1272w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!QyEH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic" width="1400" height="774" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:774,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17702,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QyEH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 424w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 848w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 1272w, https://substackcdn.com/image/fetch/$s_!QyEH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0ec2667-cecb-4d9d-82fb-5f542ae3d2ce_1400x774.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Real-time dashboards with rolling windows defeat traditional caches because shifting intervals cause repeated misses on otherwise unchanged historical data. Netflix builds an interval-aware caching proxy that decomposes Druid queries into one-minute buckets, serves historical buckets from Cassandra, fetches only the uncached tail from Druid, and assigns cached buckets exponential TTLs ranging from 5 seconds to 1 hour. 
The system achieves 82% partial cache hit rates, reduces Druid query volume by 33%, improves P90 latency by 66%, and shifts the bottleneck from compute-heavy Druid to low-cost Cassandra storage.</p><p><strong><a href="https://netflixtechblog.com/stop-answering-the-same-question-twice-interval-aware-caching-for-druid-at-netflix-scale-22fadc9b840e">https://netflixtechblog.com/stop-answering-the-same-question-twice-interval-aware-caching-for-druid-at-netflix-scale-22fadc9b840e</a></strong></p><div><hr></div><h1>Booking.com: Scaling Experimentation Quality at Booking.com</h1><p>Underpowered experiments, premature peeking, and inconsistent reporting degrade decision quality as experimentation scales without statistical rigor. Booking.com embeds experimental quality across design, execution, and decision-making through data science ambassadors, peer-review practices, and tooling such as a Quality Tab that enforces power calculations and pre-registered hypotheses in real time. These changes increase the share of high-quality experiments, with the largest gains in design, where proper power improves the reliability of results and decision confidence.</p><p><strong><a href="https://booking.ai/scaling-experimentation-quality-at-booking-com-726152ee4ef0">https://booking.ai/scaling-experimentation-quality-at-booking-com-726152ee4ef0</a></strong></p><div><hr></div><h1>Andros Fenollosa: From zero to a RAG system: successes and failures</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ncCX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ncCX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 424w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 848w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 1272w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ncCX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic" width="1456" height="481" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:481,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12400,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/194026123?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ncCX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 424w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 848w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 1272w, https://substackcdn.com/image/fetch/$s_!ncCX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87ae63e3-a099-4ae2-af0a-9add719a524a_1670x552.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Legacy engineering knowledge locked in unstructured simulation files and technical documents remains inaccessible when confidentiality requirements prohibit sending proprietary data to external LLM APIs. The author describes building a local RAG system with Ollama, LlamaIndex, and ChromaDB&#8212;filtering out non-text files to reduce indexable load by 54% and serving source documents from Azure Blob Storage while staying within a 100GB disk constraint. The architecture delivers confidential retrieval over 1TB of legacy engineering data while establishing batch checkpointing and error-tolerant ingestion as the critical patterns for production RAG deployments at scale.</p><p><strong><a href="https://en.andros.dev/blog/aa31d744/from-zero-to-a-rag-system-successes-and-failures/">https://en.andros.dev/blog/aa31d744/from-zero-to-a-rag-system-successes-and-failures/</a></strong></p><div><hr></div><h1>All Things Distributed: S3 Files and the changing face of S3</h1><p>The introduction of S3 Files certainly generated a lot of interest on my reading list last week. I&#8217;m still studying the impact of S3 Files on data pipeline engineering. One thing to note: S3 Files breaks read-after-write consistency across the two interfaces (write through S3 Files, read through S3). I wonder if the data infrastructure really wants to go back to that world; nonetheless, this is an exciting blog post to read to understand the thought process behind S3 Files. 
</p><p><strong><a href="https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html">https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html</a></strong></p><div><hr></div><h1>Apache Kafka: KIP-848: The Next Generation of the Consumer Rebalance Protocol</h1><p>Whether you like or dislike Apache Kafka, its KIPs are among the best learning materials for distributed systems. Consumer rebalancing is one of the most hotly debated topics in the Kafka world. KIP-848 moves rebalance logic from consumer clients to the Group Coordinator&#8212;introducing a ConsumerGroupHeartbeat API, three layers of epochs (group, assignment, and member), and server-side Range and Uniform assignors that drive incremental partition reconciliation without global synchronization barriers.</p><p><strong><a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol">https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #264]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-264</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-264</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 06 Apr 2026 01:55:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=04_05_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DOKS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 424w, https://substackcdn.com/image/fetch/$s_!DOKS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 848w, 
https://substackcdn.com/image/fetch/$s_!DOKS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 1272w, https://substackcdn.com/image/fetch/$s_!DOKS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DOKS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=04_05_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!DOKS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 424w, https://substackcdn.com/image/fetch/$s_!DOKS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 848w, https://substackcdn.com/image/fetch/$s_!DOKS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 1272w, https://substackcdn.com/image/fetch/$s_!DOKS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69bf59c3-807b-49c0-9155-94987c87402c_2880x1620.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How data teams are solving multi-tenancy</h1><p>As data teams grow and serve multiple teams, clients, or business units from a shared platform, maintaining isolation and velocity without sacrificing either becomes a defining architectural challenge.<br><br>In this Deep Dive, Dagster Labs and Brooklyn Data Co. will cover the patterns, trade-offs, and real-world implementations behind multi-tenant data platforms built on Dagster. Attendees will leave this session with practical guidance they can take back to their own teams.</p><p><strong><a href="https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=04_05_26_data_engineering_weekly">Reserve your spot now</a></strong></p><div><hr></div><h1>Editorial Note: Help Us Make Data Engineering Weekly Better</h1><p>We&#8217;re working to make Data Engineering Weekly more useful, more relevant, and more worth your time every Sunday. If you have 2 minutes, please share your feedback through this short survey. 
Your input will directly shape what we cover, how we write, and where we improve next.</p><p><strong><a href="https://forms.gle/cgeww7czFAVBiVmV7">https://forms.gle/cgeww7czFAVBiVmV7</a></strong></p><div><hr></div><h1>Meta: Inside Meta&#8217;s Home-Grown AI Analytics Agent</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E2lG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E2lG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 424w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 848w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 1272w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E2lG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic" width="1136" height="1152" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1152,&quot;width&quot;:1136,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E2lG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 424w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 848w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 1272w, https://substackcdn.com/image/fetch/$s_!E2lG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd2ff2220-e21f-42f3-9ece-339ac8e89958_1136x1152.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Routine analytical queries dominate enterprise data science workloads, yet agents fail as warehouse scale grows without a bounded, structured context. Meta Platforms seeds per-user memory from historical query logs and organizes domain knowledge into cookbooks, recipes, and ingredients that encode validated analyst logic. 
This approach drives 77% weekly adoption within six months as community-authored recipes expand coverage across domains.</p><p><strong><a href="https://medium.com/@AnalyticsAtMeta/inside-metas-home-grown-ai-analytics-agent-4ea6779acfb3">https://medium.com/@AnalyticsAtMeta/inside-metas-home-grown-ai-analytics-agent-4ea6779acfb3</a></strong></p><div><hr></div><h1>Michel Tricot: Beyond ETL - The Case for Context</h1><p>Agentic data infrastructure exposes a meaning gap that traditional ETL never addressed, as autonomous agents propagate poor context across queries at scale. The author validates the ECL framework through real-world failures and reframes the Context Store as a materialized view that pre-replicates SaaS data into versioned semantic structures for agent consumption. Existing data engineering primitives&#8212;incremental replication, schema normalization, and tenant isolation&#8212;support this model, shifting the data engineer&#8217;s role from data movement to context architecture.</p><p><strong><a href="https://agentblueprint.substack.com/p/beyond-etl-the-case-for-context">https://agentblueprint.substack.com/p/beyond-etl-the-case-for-context</a></strong></p><div><hr></div><h1>Chris Gambill: Medallion Architecture Isn&#8217;t As New As You Think</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H16B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H16B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 424w, 
https://substackcdn.com/image/fetch/$s_!H16B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 848w, https://substackcdn.com/image/fetch/$s_!H16B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 1272w, https://substackcdn.com/image/fetch/$s_!H16B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H16B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28242,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!H16B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 424w, https://substackcdn.com/image/fetch/$s_!H16B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 848w, https://substackcdn.com/image/fetch/$s_!H16B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 1272w, https://substackcdn.com/image/fetch/$s_!H16B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6dba8a06-0dfc-4949-b164-3b2b16abbcaa_1456x794.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI tools amplify data-quality failures at scale when pipelines lack clear boundaries among the raw capture, transformation, and business-consumption layers. The author reframes Medallion Architecture as a disciplined evolution of staging and reporting models: Bronze preserves raw audit trails, Silver enforces schema contracts, and Gold delivers business-ready KPIs. This separation provides a reliable context for AI systems and reduces the downstream cost of bad data beyond incremental storage overhead.</p><p><strong><a href="https://gambilldataengineering.substack.com/p/medallion-architecture-isnt-as-new">https://gambilldataengineering.substack.com/p/medallion-architecture-isnt-as-new</a></strong></p><div><hr></div><h1>Sponsored: The Data Platform Fundamentals Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=04_05_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bwfu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Bwfu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Bwfu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Bwfu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bwfu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20296,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=04_05_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;o
ffset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bwfu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Bwfu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Bwfu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Bwfu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F116e691c-b991-4de6-8071-f6badff33555_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Learn the fundamental concepts to build a data platform in your organization.<br><br>- Tips and tricks for data modeling and data ingestion patterns<br>- Explore the benefits of an observation layer across your data pipelines<br>- Learn the key strategies for ensuring data quality for your organization</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=04_05_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Zapier: Lessons from using the outbox pattern at scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q0Q6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 424w, https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 1272w, https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic" width="1456" height="1029" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1029,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17976,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 848w, https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 1272w, https://substackcdn.com/image/fetch/$s_!Q0Q6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffab0180f-962f-4f77-a42e-82f5e768fae0_2266x1602.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>High-throughput event pipelines require durable buffering between producers and brokers to prevent data loss during failures or maintenance windows. Zapier implements a transactional outbox in its Go-based Events API using sharded SQLite on EBS-backed Kubernetes StatefulSets, with WAL mode, 50 shards per pod, and per-shard mutexes sustaining 15,000 events per second during Kafka outages. Operational limits from static sharding and StatefulSet constraints push a shift toward a sidecar pattern that writes to S3 on failure and replays via SQS.</p><p><strong><a href="https://zapier.com/blog/lessons-from-using-outbox-pattern-at-scale/">https://zapier.com/blog/lessons-from-using-outbox-pattern-at-scale/</a></strong></p><div><hr></div><h1>Lyft: Predicting Rider Conversion in Sparse Data Environments with Bayesian Trees</h1><p>Sparse contextual data causes standard ML models to overfit and generate unstable predictions across long-tail combinations of location, time, and demand. Lyft models rider conversion using a Bayesian Tree that organizes context hierarchically and applies Gaussian priors with L2 regularization to balance sparse leaf signals against stable parent trends. 
This approach delivers localized accuracy in dense data and degrades to broader signals in sparse regions, while enforcing monotonicity constraints to ensure consistent, interpretable predictions.</p><p><strong><a href="https://eng.lyft.com/predicting-rider-conversion-in-sparse-data-environments-with-bayesian-trees-07227ff92789">https://eng.lyft.com/predicting-rider-conversion-in-sparse-data-environments-with-bayesian-trees-07227ff92789</a></strong></p><div><hr></div><h1>LinkedIn: Building LinkedIn&#8217;s CTV Ads: Scaling professional reach to the big screen</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M-Xu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M-Xu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 424w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 848w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 1272w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!M-Xu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic" width="683" height="425" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:683,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7012,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M-Xu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 424w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 848w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 1272w, https://substackcdn.com/image/fetch/$s_!M-Xu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd419cb91-72e2-43d8-a4c7-ac3016c10661_683x425.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>B2B advertisers struggle to reach professional audiences on connected TV while maintaining the targeting precision and measurement fidelity of digital environments. LinkedIn extends its identity graph to CTV through private marketplace supply, cross-device household mapping, and transcoding pipelines that meet CBR encoding and native frame rate standards. 
The platform delivers 99% brand-safe inventory, achieves 11x cost efficiency over linear TV, and scales from manual deals to self-serve inventory via Campaign Manager.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/marketing/building-linkedins-ctv-ads">https://www.linkedin.com/blog/engineering/marketing/building-linkedins-ctv-ads</a></strong></p><div><hr></div><h1>Netflix: Synchronizing the Senses: Powering Multimodal Intelligence for Video Search</h1><p>Video search across large productions requires unifying outputs from multiple ML models into a low-latency retrieval system that editors can query in real time. Netflix pipelines multimodal annotations through Cassandra for high-throughput ingestion, Kafka for temporal bucketing into one-second intervals, and Elasticsearch for hierarchical indexing that combines character, scene, and dialogue signals. The system enables semantic vector search via HNSW, supports match-phrase dialogue queries, and applies union&#8211;intersection logic to reconstruct scene boundaries across billions of data points.</p><p><strong><a href="https://netflixtechblog.com/powering-multimodal-intelligence-for-video-search-3e0020cf1202">https://netflixtechblog.com/powering-multimodal-intelligence-for-video-search-3e0020cf1202</a></strong></p><div><hr></div><h1>Salesforce: Inside Informatica&#8217;s Spark-Based Data Integration Platform: Running 250K Enterprise Pipelines Daily</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nnTZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!nnTZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 424w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 848w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 1272w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nnTZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic" width="652" height="440" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:440,&quot;width&quot;:652,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nnTZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 424w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 848w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 1272w, https://substackcdn.com/image/fetch/$s_!nnTZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a42a2f-149f-476f-94db-cb5f5f4fa81e_652x440.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Enterprise data integration platforms struggle at petabyte scale when single-node execution engines lack distributed compute and automated resource optimization. Informatica migrates CDI to Spark++ on Kubernetes, preserving graphical mappings while introducing row-level fault isolation, ephemeral VPC-bound clusters, and automated FinOps tuners that optimize infrastructure and Spark parameters based on historical workloads. The distributed system supports 5,500 enterprise clients across 250,000 daily tasks, reduces infrastructure costs by 1.65x, and maintains 99.9% control plane availability.</p><p><strong><a href="https://engineering.salesforce.com/inside-informaticas-spark-based-data-integration-platform-running-250k-enterprise-pipelines-daily/">https://engineering.salesforce.com/inside-informaticas-spark-based-data-integration-platform-running-250k-enterprise-pipelines-daily/</a></strong></p><div><hr></div><h1>ZeroToOne: Taming S3 Shuffle at Scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d3xC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d3xC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 424w, 
https://substackcdn.com/image/fetch/$s_!d3xC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 848w, https://substackcdn.com/image/fetch/$s_!d3xC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 1272w, https://substackcdn.com/image/fetch/$s_!d3xC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d3xC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic" width="1020" height="510" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b409518c-0636-4080-8238-160c2da22ac0_1020x510.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:510,&quot;width&quot;:1020,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9844,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/193304081?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!d3xC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 424w, https://substackcdn.com/image/fetch/$s_!d3xC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 848w, https://substackcdn.com/image/fetch/$s_!d3xC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 1272w, https://substackcdn.com/image/fetch/$s_!d3xC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb409518c-0636-4080-8238-160c2da22ac0_1020x510.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>S3-based Spark shuffle suffers from quadratic scaling of GET requests, prefix throttling, and executor hangs, which drive high API costs and instability at production scale. ZeroToOne reduces shuffle costs by 95% by coalescing map tasks and expanding S3 prefixes from 10 to 500, then hardens the shuffle plugin with ConcurrentHashMap-based atomic locking and prefetch iterator timeouts to eliminate race conditions and deadlocks. These changes stabilize spot instance execution and reduce per-stage API costs from $72 to near zero in large backfill workloads.</p><p><strong><a href="https://blog.platform.zerotoone.ai/blog/taming-s3-shuffle-at-scale/">https://blog.platform.zerotoone.ai/blog/taming-s3-shuffle-at-scale/</a></strong></p><div><hr></div><h1>Radim Marek: Production query plans without production data</h1><p>Just the other day, I was in a design discussion about building a routing engine for SQL query execution based on the query plan, and how to back this up in the CI pipeline to catch expensive queries earlier. It is one of the critical problems that I wish all data warehouses and lakehouses solved out of the box.</p><p><strong><a href="https://boringsql.com/posts/portable-stats/">https://boringsql.com/posts/portable-stats/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[The Missing Interface in Data Platform Engineering]]></title><description><![CDATA[How data leaders should design the boundary between platforms and dependent teams.]]></description><link>https://www.dataengineeringweekly.com/p/the-missing-interface-in-data-platform</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/the-missing-interface-in-data-platform</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Thu, 02 Apr 2026 03:36:07 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!f16i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RZl6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RZl6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!RZl6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 848w, 
https://substackcdn.com/image/fetch/$s_!RZl6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!RZl6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RZl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13817,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192920698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RZl6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 424w, 
https://substackcdn.com/image/fetch/$s_!RZl6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!RZl6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!RZl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fae408719-1fee-45d6-8455-64c78acf6219_1536x1024.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A familiar pattern plays out inside many platform organizations. A data platform team ships what it sees as a milestone: a self-service stack with governed datasets, reusable pipelines, access automation, lineage, templates, and documentation. Leadership sees leverage. The platform team sees scale.</p><p>Then requests start arriving.</p><p>Can someone help model the first few datasets? Can someone validate the ownership setup? Can someone walk us through the abstractions? Can the platform team handle the initial rollout for this one use case?</p><p>Platform teams often call that resistance to self-service. The real problem is simpler: the interface is incomplete.</p><p>The tooling exists. The technical path exists. But the consumer team still cannot tell where responsibility begins, where support ends, what failure looks like, or what the team must operate independently.</p><p>The platform team sees a reusable capability. The consumer team sees a system that still depends on human interpretation.</p><p>Both teams are acting rationally. The platform team has built a technical interface. The consumer team is still looking for an operating interface.</p><p>That gap accounts for more platform friction than most platform strategies acknowledge.</p><p>Data platform engineering often gets framed as a systems problem: storage layers, compute engines, orchestration, metadata, governance, access control, and developer tooling. Those components matter. Once a platform becomes shared infrastructure, however, the harder problem shifts. 
The platform becomes a dependency surface across teams, applications, workflows, and operational responsibilities.</p><p>At that point, the key question shifts from &#8220;What did we build?&#8221; to &#8220;How should other teams depend on it?&#8221;</p><p>That question defines the real interface in data platform engineering.</p><h1><strong>Data Platform Engineering is Coordination Engineering</strong></h1><p>A mature data platform is not just a collection of capabilities. A mature data platform creates a shared system that other teams must trust, integrate with, and operate against. Every dependency on that platform carries assumptions: what stays stable, what can change, who responds when something breaks, how fast a team can expect support, what a consumer must understand, and what remains the platform team&#8217;s responsibility.</p><p>Teams carry those assumptions whether they write them down or not. When teams leave them implicit, engineers reconstruct them through tickets, Slack threads, tribal knowledge, escalations, and repeated misunderstandings. When teams make them explicit, the assumptions become part of the platform&#8217;s operating interface.</p><p>The operating interface has two parts.</p><p>One part defines the explicit rules that govern the relationship: schemas, APIs, freshness guarantees, ownership boundaries, compatibility expectations, escalation paths, and adoption responsibilities.</p><p>The other part defines the communication pattern through which teams use those rules: reactive ticketing, temporary embedding, joint execution, self-service federation, or community contribution.</p><p>Most platform failures begin when teams underdesign one or both parts.</p><p>The platform team thinks it has published a reusable capability. The consumer team experiences an ambiguous boundary. The schema may be documented, but the operational expectations are not. The self-service path may exist, but the adoption model does not. 
The API may be stable, but teams still negotiate failure semantics socially every time they matter.</p><p>Platform maturity, then, depends on more than better tooling. Platform maturity depends on how well teams design the dependency boundary between the platform and the groups that rely on it.</p><h1><strong>A contract is only one layer of the operating interface</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yJJ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yJJ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yJJ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic" width="1456" 
height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192920698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yJJ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 424w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 848w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 1272w, https://substackcdn.com/image/fetch/$s_!yJJ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed1b094b-cbaa-40b9-a01f-f791b5ea8c52_1536x1024.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Platform discussions often stall because engineers use the word <em>contract</em> too narrowly. In data work, many engineers hear &#8220;contract&#8221; and immediately think of a data contract: schema shape, field semantics, compatibility rules between producer and consumer, and perhaps a validation mechanism.</p><p>That category matters. That category does not cover the whole problem.</p><p>A stack of interface layers governs a data platform dependency, and each layer answers a different question.</p><h2><strong>1. Technical interface</strong></h2><p>The technical interface is the layer most teams already know how to discuss. It includes APIs, schemas, tables, events, payloads, SDKs, versioning rules, authentication mechanisms, and compatibility expectations. 
The technical interface defines the shape of interaction.</p><p>When people say a platform has a clear interface, they often mean only that layer.</p><p>Teams can still fail operationally even when the technical interface is clear.</p><h2><strong>2. Operational contract</strong></h2><p>The operational contract defines runtime expectations. How fresh should the data be? What latency matters for a given workflow? How should retries behave? What happens when a dependency degrades? Which failures does the platform absorb, and which failures propagate to consumers? Which SLOs, error budgets, or maintenance windows apply?</p><p>The operational contract separates descriptive interoperability from dependable interoperability.</p><p>Two teams may agree on a schema and still disagree completely on whether a six-hour delay is acceptable, whether stale reads are tolerable, or whether a breaking change in behavior requires a coordinated rollout.</p><h2><strong>3. Ownership model</strong></h2><p>The ownership model defines authority and accountability. Who approves interface changes? Who owns backward compatibility? Who responds during incidents? Who decides when a consumer must migrate? Who can reject a new use case because it violates platform constraints?</p><p>Many recurring platform conflicts are ownership failures disguised as technical disputes.</p><p>A consumer team says, &#8220;The platform changed under us.&#8221; The platform team says, &#8220;You were never supposed to rely on that behavior.&#8221; In most cases, unclear ownership boundaries create the conflict long before the disagreement surfaces.</p><h2><strong>4. Adoption model</strong></h2><p>The adoption model defines what a consuming team must do to use the platform successfully. Is the platform truly self-service? Does first adoption require embedding? Must the consumer own pipeline logic, operational monitoring, data quality checks, and incident response? 
How much platform literacy must a team build before independence becomes realistic?</p><p>Most platform design documents ignore that layer even though it often determines whether adoption succeeds.</p><p>A workflow is not self-service because a platform engineer no longer types the commands. Self-service begins when a consumer team can understand, operate, and recover within the platform&#8217;s boundaries independently.</p><h2><strong>5. Communication pattern</strong></h2><p>Every platform also has a practical communication mode. Teams may collaborate through tickets, pairing, embedded work, shared planning, interfaces, or contribution models. Those patterns are not secondary to the platform. Those patterns determine how the platform behaves in practice.</p><p>When teams do not consciously design that layer, habits and local workarounds define it by default.</p><p>Together, those layers form the platform&#8217;s operating interface: the real boundary through which teams depend on one another.</p><h1><strong>Every Platform Already has a Communication Model</strong></h1><p>Platform teams often speak as though better tooling will eventually eliminate communication. Tooling never eliminates communication. Tooling only changes its shape.</p><p>A ticket queue is a communication system. An embedding is a communication system. An API with onboarding guides and escalation rules is a communication system. An internal RFC process also serves as a communication system.</p><p>For that reason, a platform maturity model is also a communication maturity model. The model describes not only what the platform team has built, but also how dependency information moves between the platform and its consumers.</p><p>Communication may remain human-mediated, partially codified, or increasingly interface-led. No single mode is always superior. 
The right mode depends on the capability&#8217;s maturity, the consumer&#8217;s readiness, and the complexity of the work.</p><h1><strong>Five ways platform teams and dependent teams actually work</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f16i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f16i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 424w, https://substackcdn.com/image/fetch/$s_!f16i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 848w, https://substackcdn.com/image/fetch/$s_!f16i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 1272w, https://substackcdn.com/image/fetch/$s_!f16i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f16i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic" width="1456" height="1639" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1639,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192920698?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f16i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 424w, https://substackcdn.com/image/fetch/$s_!f16i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 848w, https://substackcdn.com/image/fetch/$s_!f16i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 1272w, https://substackcdn.com/image/fetch/$s_!f16i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdae8ea51-3898-4ca0-aa70-eb4580cce90e_1920x2161.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>No single collaboration pattern fits every internal platform. The right mode depends on the capability in question, the consuming team, and the type of dependency between them. Strong platform organizations usually operate across several modes at once.</p><p>Teams rarely fail because they sit at the &#8220;wrong&#8221; level. Teams fail because they assume every consumer can interact with the platform through the same interface, when reality clearly shows otherwise.</p><h2><strong>Level 1: Reactive &#8212; The service desk</strong></h2><p>At Level 1, the platform team operates primarily through request fulfillment. Teams file tickets. 
Platform engineers provision resources, troubleshoot access, define ingestion patterns, or implement parts of the first workflow manually. Knowledge about how the platform works lives mostly in people&#8217;s heads.</p><p>Many people dismiss that mode as immaturity. That judgment misses the point. Level 1 is where many new platform capabilities should begin.</p><p>When a capability is still emerging, the platform team does not yet know what the stable interface ought to be. Manual repetition helps the team discover the pattern worth codifying. The first few onboarding efforts reveal which inputs remain stable, which edge cases arise frequently, which assumptions break under real-world workloads, and which abstractions are premature.</p><p>The real danger is not Level 1 itself. The real danger is staying there after repetition becomes obvious.</p><p>Level 1 fails when demand scales linearly. Every new consumer increases direct demand on the platform team. The team becomes a fulfillment bottleneck. Consumers experience the platform as a queue instead of a leverage point. Platform engineers spend more time context-switching than building reusable capabilities.</p><p>Teams should move beyond Level 1 when repetition becomes predictable. Once the same task repeats enough times to reveal a stable pattern, some part of the operating interface should move out of people&#8217;s heads and into a reusable form.</p><h2><strong>Level 2: Coordinated &#8212; The embedding</strong></h2><p>At Level 2, the platform team transfers capability through direct collaboration. A platform engineer temporarily works with a consumer team to bootstrap adoption, interpret abstractions, and help the team operate inside the intended boundary.</p><p>Level 2 is not just support. Level 2 is a deliberate adoption model.</p><p>Embedding works when the platform capability is ready enough to be reused but still requires high-context interpretation. 
Embedding also works when the platform team needs to learn from consumers before it can fully stabilize the interface. The interaction runs in both directions: the platform teaches the intended path, and the consumer exposes where the path is incomplete.</p><p>Level 2 fails when dependency persists. The embedded engineer becomes the permanent translator. The team learns to route every ambiguity to a familiar person rather than building independent platform fluency. Once the embedding ends, the team slides back into ticketing.</p><p>Teams should move beyond Level 2 when understanding becomes repeatable. Once the same questions keep recurring, lack of exposure is no longer the main problem. Interface clarity is the problem. The platform then needs better runbooks, better failure handling, clearer ownership boundaries, or a more legible self-service path.</p><h2><strong>Level 3: Partnership &#8212; The joint mission</strong></h2><p>At Level 3, platform and consumer teams align around a shared objective for a bounded period. Level 3 is not request fulfillment, nor is it simple enablement. Level 3 is a temporary joint execution model.</p><p>Level 3 works when the dependency boundary itself is part of the problem. Teams often need Level 3 when they launch a new real-time product feature that requires changes across ingestion, serving, governance, and application behavior; when they stand up an experimentation platform that affects both platform architecture and domain logic; or when they build a new cross-cutting data product whose responsibilities cannot yet be separated cleanly.</p><p>Level 3 creates speed under complexity. Instead of negotiating everything through a queue, the teams create a shared execution context.</p><p>Level 3 fails when the temporary mission becomes a permanent entanglement. What was supposed to be a time-boxed collaboration becomes a staffing model. The platform roadmap drifts toward one team&#8217;s local priorities. 
The consumer team stops building independent ownership.</p><p>Teams should move beyond Level 3 when reusable patterns emerge. Once joint work starts producing structures that other teams will need, the organization should ask which parts belong in a generalized operating interface rather than in a persistent bespoke relationship.</p><h2><strong>Level 4: Federation &#8212; The self-service operating interface</strong></h2><p>At Level 4, teams collaborate primarily through explicit interfaces rather than constant human mediation. The platform publishes technical interfaces, operational expectations, ownership rules, onboarding guidance, and support boundaries clearly enough that consuming teams can adopt capabilities independently.</p><p>Level 4 is where platform economics start to work.</p><p>The marginal cost of onboarding new teams drops because the interface does more of the teaching. The platform team shifts away from request fulfillment and toward maintaining compatibility, reliability, documentation, tooling, and interface evolution.</p><p>Many organizations fool themselves at Level 4.</p><p>A team can publish an API, a portal, or a template and claim self-service while leaving the actual operating interface incomplete. The consumer can create the resource, but does not know how to handle failure. The documentation describes the happy path, but not the migration path. The schema is versioned, but the escalation model is still social. The ownership boundary exists in theory, but not in behavior.</p><p>That condition is not federation. That condition is a ticket queue with better branding.</p><p>Level 4 fails when self-service arrives too early. The platform exposes an interface before it has done enough repeated work to understand which parts are stable and which parts still require human judgment. 
Consumers adopt the easy 60% and escalate the hard 40%, forcing the platform team to run Level 1 and Level 4 simultaneously.</p><p>A healthy Level 4 shows more than usage. A healthy Level 4 shows independent operation. A consuming team should be able to adopt the capability, reason about normal failure, understand the support model, and make routine changes without renegotiating the relationship each time.</p><h2><strong>Level 5: Ecosystem &#8212; The internal commons</strong></h2><p>At Level 5, teams do not only consume the platform. They extend it. The platform becomes a stewarded commons with contribution pathways, governance, standards, RFCs, and shared maintenance expectations.</p><p>That operating model looks attractive in strategy decks because it suggests scale through internal open-source behavior. In practice, organizations struggle to sustain it.</p><p>Contribution requires more than technical maturity. Contribution requires governance maturity. Teams need clarity on how to adopt the contribution, how to review its quality, who maintains the artifact over time, how support obligations are assigned, and how the platform distinguishes production-grade extensions from abandoned experiments.</p><p>Level 5 fails when the commons turns into unmanaged sprawl. Shared repositories fill with unevenly maintained components. The boundary between the core platform and the contributed surface becomes blurry. Consumers cannot distinguish between governed and incidental capabilities.</p><p>For many organizations, Level 4 is the durable steady state. Level 5 becomes valuable only when culture, incentives, and governance can support shared stewardship.</p><h1><strong>The missing variable is contract literacy</strong></h1><p>Most discussions of platform maturity focus on the platform side. That view is incomplete.</p><p>A platform can be highly mature in its own design and still fail in practice because the consuming team is not ready to operate against that interface. 
A Level 4 platform paired with a Level 1 consumer often behaves like a Level 1 system.</p><p>Consumer readiness matters, but teams should define the term more precisely than &#8220;platform familiarity&#8221; alone.</p><p>Consumer readiness is really a form of contract literacy. Consumer readiness measures a team&#8217;s ability to understand the operating interface, interpret the support boundaries, reason about failure modes, absorb ownership, and use the self-service path without relying on informal rescue.</p><p>A mature platform with a new team often needs Level 2 interaction first. The capability may be stable, but the team lacks the context to operate within it.</p><p>An early platform with a strong consumer base may benefit from a Level 3 partnership. The capability is yet to enter production, but the team is strong enough to co-develop the future interface.</p><p>A mature platform with a mature consumer can operate effectively through Level 4 and, in some cases, Level 5.</p><p>An early platform with a new consumer should not pretend to be anything other than Level 1 for a while.</p><p>The diagnostic question is not &#8220;How mature is the platform?&#8221; The better question is, &#8220;How mature is the dependency relationship, given both sides of the interface?&#8221;</p><h1><strong>Why platforms fail even when the interface exists</strong></h1><p>Many platform incidents seem surprising only when the platform team mistakes the technical interface for the whole interface.</p><p>A schema can remain stable even as the operational contract breaks down. A consumer receives the expected fields but cannot tolerate the freshness lag introduced by the new implementation.</p><p>An API can be correct while the ownership model remains unclear. Both teams assume the other team is responsible for migration sequencing, and the rollout fails in the gap.</p><p>A portal can be self-service while the adoption model remains incomplete. 
The consumer can provision the resource, but does not know which observability, alerting, backfill policy, or quality checks now belong to the consuming team.</p><p>Documentation can be extensive while the communication pattern remains reactive. The written material explains the happy path, but the only reliable way to get edge-case answers is still to message a platform engineer directly.</p><p>Each example points to the same problem. The platform appears mature on paper and unstable in practice because one layer of the operating interface is missing.</p><p>That pattern also explains why many arguments about &#8220;data contracts&#8221; feel unsatisfying. A schema contract may remove one class of ambiguity while leaving the dependency relationship fundamentally underdesigned. Platforms do not scale on descriptive clarity alone. Platforms scale when the operating interface becomes explicit enough for teams to coordinate predictably.</p><h1><strong>Why this matters more in an agentic enterprise</strong></h1><p>As organizations move toward AI-mediated operations, autonomous workflows, and increasingly automated decision loops, the cost of implicit interfaces rises sharply.</p><p>Human teams can absorb ambiguity through judgment, relationships, escalation habits, and informal context. Human teams can often compensate for missing rules because they know whom to ask, which unwritten assumptions are in place, and when a local exception is acceptable.</p><p>Automated systems do not compensate in the same way. Automated systems require explicit state, explicit boundaries, and explicit expectations. A platform boundary that depends on tribal knowledge, undocumented ownership, or socially negotiated failure handling already strains human teams. That same boundary becomes structurally limiting when an organization wants automation to operate consistently across it.</p><p>No company needs a grand operating model of the enterprise to benefit from that insight. 
Organizations do need a simpler discipline. Teams need to make the dependency interface between systems and teams legible enough to operate without constant interpretation.</p><h2><strong>Maturity is reversible</strong></h2><p>We tend to view maturity models as ladders. Real organizations behave more like shifting systems.</p><p>A platform that reached Level 4 can slide back toward Level 1 when documentation rots, examples stop working, and teams stop trusting the interface. An embedding that once worked can decay when the engineers who learned the system leave and the knowledge never becomes fully externalized. A clear ownership model can collapse after a reorganization. A rearchitecture can reset operational assumptions so thoroughly that teams must return to manual coordination until the new boundary stabilizes.</p><p>Those events are not unusual. Those events are the normal dynamics of organizational life.</p><p>A maturity model is useful not because it promises to eliminate regression. A maturity model is useful because it helps teams name regression quickly and respond deliberately.</p><p>If a self-service platform has quietly reverted to a ticket queue, the problem is not just support volume. Some part of the operating interface has decayed: the technical surface, the operational contract, the ownership model, the adoption model, or the communication pattern.</p><p>Once teams name that decay, they can turn frustration back into design work.</p><h1><strong>The real platform interface is organizational</strong></h1><p>The hardest part of a data platform is rarely the infrastructure itself. The hardest part is designing the boundary between the platform and the teams that depend on it.</p><p>That boundary cannot be reduced to schemas or APIs alone. 
The boundary includes operational expectations, support models, ownership rules, adoption assumptions, and the communication pattern through which those expectations become real.</p><p>When teams leave those layers implicit, they do not avoid design work. They push the work into tickets, escalations, workarounds, and repeated misunderstandings. When teams make those layers explicit, the platform becomes easier to trust, adopt, and scale.</p><p>Data platform engineering is not just infrastructure engineering. Data platform engineering is interface design at the level of teams, systems, and responsibilities.</p><p>A maturity model should not rank organizations morally or insist that every platform must reach some final stage. A maturity model should give teams a vocabulary for the dependency relationships they actually have, the interfaces they are really exposing, and the ones they need to design next.</p><p>The missing interface in data platform engineering is not another layer of tooling. The missing interface is the operating interface that defines how dependent teams rely on one another.</p><p>Until teams make that interface explicit, most platform scale remains performative.</p><p>Once teams make that interface explicit, platform scale becomes real.</p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #263]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-263</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-263</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 30 Mar 2026 02:18:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=03_29_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aZgx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 424w, https://substackcdn.com/image/fetch/$s_!aZgx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 848w, 
https://substackcdn.com/image/fetch/$s_!aZgx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 1272w, https://substackcdn.com/image/fetch/$s_!aZgx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aZgx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72035,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=03_29_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!aZgx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 424w, https://substackcdn.com/image/fetch/$s_!aZgx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 848w, https://substackcdn.com/image/fetch/$s_!aZgx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 1272w, https://substackcdn.com/image/fetch/$s_!aZgx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac92e0d0-af5f-40dc-87dc-c959cb6157c3_2880x1620.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How data teams are solving multi-tenancy</h1><p>As data teams grow and serve multiple teams, clients, or business units from a shared platform, maintaining isolation and velocity without sacrificing either becomes a defining architectural challenge.<br><br>In this Deep Dive, Dagster Labs and Brooklyn Data Co. will cover the patterns, trade-offs, and real-world implementations behind multi-tenant data platforms built on Dagster. Attendees will leave this session with practical guidance they can take back to their own teams.</p><p><strong><a href="https://dagster.io/events/multi-tenancy-for-modern-data-platforms?utm_campaign=39250561-26-04-WBNR_DEEP_DIVE_BROOKYLN_DATA&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_brooklyn_data&amp;utm_content=03_29_26_data_engineering_weekly">Reserve your spot now</a>.</strong></p><div><hr></div><h1>Aurimas Grici&#363;nas: State of Context Engineering in 2026</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ewfo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ewfo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 424w, 
https://substackcdn.com/image/fetch/$s_!ewfo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 848w, https://substackcdn.com/image/fetch/$s_!ewfo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 1272w, https://substackcdn.com/image/fetch/$s_!ewfo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ewfo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic" width="1456" height="826" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:826,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16223,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ewfo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 424w, https://substackcdn.com/image/fetch/$s_!ewfo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 848w, https://substackcdn.com/image/fetch/$s_!ewfo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 1272w, https://substackcdn.com/image/fetch/$s_!ewfo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb3a1207-28b8-4518-adab-e7648764dd68_1456x826.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>LLM reasoning degrades with oversized context, forcing developers to manage attention through structured context engineering rather than scaling model size. The author outlines five patterns&#8212;<strong>progressive disclosure, compression, routing, agentic RAG, and tool management</strong>&#8212;that control how context is selected and applied. Layered orchestration across discovery, activation, and execution enables complex agent behavior within fixed context limits while preserving reasoning quality.</p><p><strong><a href="https://www.newsletter.swirlai.com/p/state-of-context-engineering-in-2026">https://www.newsletter.swirlai.com/p/state-of-context-engineering-in-2026</a></strong></p><div><hr></div><h1>Joe Reis: AI Is Here, But The Hard Parts Haven&#8217;t Changed</h1><p>AI is accelerating coding velocity, but it&#8217;s also exposing structural weaknesses that data teams have ignored for years&#8212;legacy systems, misaligned leadership, and poor business context modeling. Data from Joe Reis&#8217;s March 2026 survey reinforces the gap: teams are shipping code faster, yet many still lack clarity on production value, while data modeling and semantic layers are emerging as the next critical frontier. 
Data engineering now faces a reset moment&#8212;improving end-to-end delivery efficiency matters more than optimizing isolated pipelines, a direction I&#8217;ve been exploring in &#8220;<strong><a href="https://www.dataengineeringweekly.com/p/data-engineering-after-ai">Data Engineering After AI</a></strong>&#8221; and &#8220;<strong><a href="https://www.dataengineeringweekly.com/p/etl-is-dead">ETL is Dead</a>.</strong>&#8221;</p><p><strong><a href="https://joereis.substack.com/p/ai-is-here-but-the-hard-parts-havent">https://joereis.substack.com/p/ai-is-here-but-the-hard-parts-havent</a></strong></p><div><hr></div><h1>Hamel Husain: The Revenge of the Data Scientist</h1><p>LLM API accessibility enables rapid AI feature development but obscures reliability requirements grounded in evaluation and experimental design. The author argues modern AI development reuses core data science practices: analyzing production traces, validating LLM-as-judge with precision and recall, grounding test sets in real data, and using domain experts to define criteria. 
Teams that avoid synthetic benchmarks and over-automation focus on inspecting data to identify failure modes, reinforcing the role of data scientists as reliability gatekeepers.</p><p><strong><a href="https://hamel.dev/blog/posts/revenge/">https://hamel.dev/blog/posts/revenge/</a></strong></p><div><hr></div><h1>Sponsored: The Data Platform Fundamentals Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=03_29_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Nyg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!0Nyg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!0Nyg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!0Nyg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Nyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic" 
width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22370,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=03_29_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Nyg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!0Nyg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!0Nyg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!0Nyg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F91567ab5-d5ff-469b-b90e-b288422e2c4a_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. 
<br><br>In this book, you&#8217;ll learn:<br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=03_29_26_data_engineering_weekly">Download your free copy now</a>.</strong></p><div><hr></div><h1>Figma: Redefining impact as a data scientist</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1EWS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1EWS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 424w, https://substackcdn.com/image/fetch/$s_!1EWS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 848w, https://substackcdn.com/image/fetch/$s_!1EWS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 1272w, https://substackcdn.com/image/fetch/$s_!1EWS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1EWS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23211,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1EWS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 424w, https://substackcdn.com/image/fetch/$s_!1EWS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 848w, https://substackcdn.com/image/fetch/$s_!1EWS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!1EWS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb670c688-fcae-4d85-b859-05f71a62feb6_2160x1215.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Data science impact in mission-critical systems like billing depends on domain expertise and observability rather than experimentation, shifting focus from models to correctness and clarity. 
The author describes Figma&#8217;s full-stack approach, where data scientists build consistency checks, create applications that explain system behavior, and define correctness criteria. Embedding these practices into operational systems scales their impact through tools rather than reports.</p><p><strong><a href="https://www.figma.com/blog/redefining-impact-as-a-data-scientist/">https://www.figma.com/blog/redefining-impact-as-a-data-scientist/</a></strong></p><div><hr></div><h1>BlaBlaCar: Beyond the dashboard: how BlaBlaCar PMs use AI to self-serve data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UGBw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UGBw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 848w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 1272w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UGBw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic" width="1400" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22161,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UGBw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 848w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 1272w, https://substackcdn.com/image/fetch/$s_!UGBw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc7f9a29-a7ed-4b76-8406-95f90bbb0ebb_1400x764.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Relieving data analyst bottlenecks in fast-moving organizations means letting non-technical users self-serve without compromising data integrity or introducing hallucinations. BlaBlaCar evolves its approach from generic LLM usage to structured JSON schema documentation and few-shot learning on expert query histories, teaching the system to map natural language to business rules. 
A three-zone autonomy framework&#8212;safe, risky, and dead zones&#8212;combined with SQL literacy training for PMs reduces error rates from 32% to 15% and shifts analysts from reactive ticket handling to strategic work.</p><p><strong><a href="https://medium.com/blablacar/beyond-the-dashboard-how-blablacar-pms-use-ai-to-self-serve-data-95ccd33ab1f9">https://medium.com/blablacar/beyond-the-dashboard-how-blablacar-pms-use-ai-to-self-serve-data-95ccd33ab1f9</a></strong></p><div><hr></div><h1>Expedia: Operating Trino at Scale With Trino Gateway</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xi8G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xi8G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 424w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 848w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 1272w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!xi8G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic" width="1038" height="583" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:583,&quot;width&quot;:1038,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12041,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xi8G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 424w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 848w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 1272w, https://substackcdn.com/image/fetch/$s_!xi8G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf7f51d7-f538-486a-a47f-fa1b65e51922_1038x583.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Managing Trino at scale requires isolating workloads to prevent resource contention across analytical, ETL, and BI queries. Expedia writes about operating Trino Gateway, a fork of Lyft&#8217;s Presto Gateway, as a single-endpoint proxy that routes queries to dedicated clusters using configurable rules. 
This design eliminates noisy-neighbor failures, supports zero-downtime deployments, and provides real-time visibility into cluster health.</p><p><strong><a href="https://medium.com/expedia-group-tech/operating-trino-at-scale-with-trino-gateway-41824af788de">https://medium.com/expedia-group-tech/operating-trino-at-scale-with-trino-gateway-41824af788de</a></strong></p><div><hr></div><h1>LangChain: How we build evals for Deep Agents</h1><p>Building reliable AI agents requires evals that target specific production behaviors rather than optimizing for aggregate benchmark scores. LangChain's Deep Agents harness defines behavior-first evals sourced from production errors, BFCL, and hand-written unit tests &#8212; then scores agents on correctness and Ideal Trajectory ratios for step and tool-call efficiency. Teams run tagged eval subsets via pytest in GitHub Actions and trace every run in LangSmith to isolate failure modes and control evaluation cost.</p><p><strong><a href="https://blog.langchain.com/how-we-build-evals-for-deep-agents/">https://blog.langchain.com/how-we-build-evals-for-deep-agents/</a></strong></p><div><hr></div><h1>LinkedIn: The LinkedIn Generative AI Application Tech Stack: Personalization with Cognitive Memory Agent</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7gFb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7gFb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 424w, 
https://substackcdn.com/image/fetch/$s_!7gFb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 848w, https://substackcdn.com/image/fetch/$s_!7gFb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!7gFb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7gFb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic" width="1024" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8816,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!7gFb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 424w, https://substackcdn.com/image/fetch/$s_!7gFb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 848w, https://substackcdn.com/image/fetch/$s_!7gFb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!7gFb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbf2b6aa4-229c-4de3-af46-8398b3079b18_1024x571.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI agents lose personalization across sessions because they lack structured memory that separates conversational, episodic, semantic, and procedural signals. LinkedIn&#8217;s Cognitive Memory Agent ingests activity traces through streaming and batch pipelines, then uses an LLM-based orchestrator to retrieve and reason across all four memory layers. This architecture enables the Hiring Assistant to auto-populate role requirements and generate recruiter-specific insights from historical hiring activity.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/ai/the-linkedin-generative-ai-application-tech-stack-personalization-with-cognitive-memory-agent">https://www.linkedin.com/blog/engineering/ai/the-linkedin-generative-ai-application-tech-stack-personalization-with-cognitive-memory-agent</a></strong></p><div><hr></div><h1>Ilia Gusev: Change Data Capture: Stop Copying 50M Rows to Move 5K Changes</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M3Z_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M3Z_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 424w, 
https://substackcdn.com/image/fetch/$s_!M3Z_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 848w, https://substackcdn.com/image/fetch/$s_!M3Z_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 1272w, https://substackcdn.com/image/fetch/$s_!M3Z_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M3Z_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24582,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!M3Z_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 424w, https://substackcdn.com/image/fetch/$s_!M3Z_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 848w, https://substackcdn.com/image/fetch/$s_!M3Z_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 1272w, https://substackcdn.com/image/fetch/$s_!M3Z_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82f01620-35ea-4a36-a7c2-30d0ad5f6e3c_1456x813.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Nightly full-table copies introduce fragility and place increasing load on source databases as data volumes scale. The article contrasts timestamp, trigger, and log-based CDC approaches, recommending Debezium with Postgres WAL or MySQL binlog as the production standard for near-real-time replication without impacting OLTP performance. Log-based CDC captures hard deletes, handles DDL changes, and decouples replication throughput from transactional write load.</p><p><strong><a href="https://podostack.com/p/change-data-capture-cdc-intro">https://podostack.com/p/change-data-capture-cdc-intro</a></strong></p><div><hr></div><h1>Micheal Lanham: The Markdown File That Beat a $50M Vector Database</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bpVg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bpVg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 424w, https://substackcdn.com/image/fetch/$s_!bpVg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 848w, 
https://substackcdn.com/image/fetch/$s_!bpVg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!bpVg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bpVg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24703,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/192563052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bpVg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 424w, 
https://substackcdn.com/image/fetch/$s_!bpVg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 848w, https://substackcdn.com/image/fetch/$s_!bpVg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!bpVg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb38c0966-c4ad-48b6-a0d6-b58eda873aed_1376x768.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Agentic workflows expose the cost and operational overhead of managed vector databases when used for single-threaded memory and state management. The author shows how Manus, OpenClaw, and Claude Code converge on Markdown files as the primary memory layer, leveraging KV-cache efficiency, filesystem hierarchy for scoped retrieval, and sqlite-vec for lightweight semantic search. This file-first architecture reduces token costs by nearly 10x and defers vector database adoption to scenarios that require multi-user concurrency.</p><p><strong><a href="https://medium.com/@Micheal-Lanham/the-markdown-file-that-beat-a-50m-vector-database-38e1f5113cbe">https://medium.com/@Micheal-Lanham/the-markdown-file-that-beat-a-50m-vector-database-38e1f5113cbe</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #262]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-262</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-262</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 23 Mar 2026 01:31:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5Wb-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!5Wb-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!5Wb-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!5Wb-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5Wb-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!5Wb-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!5Wb-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!5Wb-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!5Wb-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84708f0c-572e-443b-b9c6-432c2fc67219_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>This week: Orchestrating Databricks across multiple workspaces</h1><p>In this hands-on deep dive, you'll learn how to build a cross-workspace control plane for Databricks using Dagster &#8212; connecting multiple workspaces, dbt, and Fivetran into a single observable asset graph with zero code changes to get started.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_22_26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Pinterest: Building an MCP Ecosystem at Pinterest</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EeQV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EeQV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 424w, https://substackcdn.com/image/fetch/$s_!EeQV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 848w, 
https://substackcdn.com/image/fetch/$s_!EeQV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 1272w, https://substackcdn.com/image/fetch/$s_!EeQV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EeQV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic" width="1400" height="824" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:824,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11677,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EeQV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 424w, 
https://substackcdn.com/image/fetch/$s_!EeQV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 848w, https://substackcdn.com/image/fetch/$s_!EeQV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 1272w, https://substackcdn.com/image/fetch/$s_!EeQV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F17e84bf7-acff-4f77-a138-5bb677b30f09_1400x824.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Agent tooling at scale requires decentralizing context across domain-specific servers while maintaining security, discoverability, and governance across production systems. Pinterest&#8217;s MCP ecosystem deploys specialized servers (Presto, Spark, Airflow) behind a central registry, routes requests via JWT end-user tokens and SPIFFE mesh identities, and enforces human approval for sensitive actions such as data overwrites. The system handles 66,000+ monthly invocations while saving engineers 7,000 hours monthly, validating decentralized tooling as the production pattern for agentic workflows.</p><p><strong><a href="https://medium.com/pinterest-engineering/building-an-mcp-ecosystem-at-pinterest-d881eb4c16f1">https://medium.com/pinterest-engineering/building-an-mcp-ecosystem-at-pinterest-d881eb4c16f1</a></strong></p><div><hr></div><h1>Julien Simon: Still Missing Critical Pieces</h1><p>Tool standardization protocols face re-fragmentation when architectural constraints&#8212;token overhead, stateless scaling, weak auth&#8212;force enterprises toward custom implementations for production workloads. The author argues that MCP won adoption but lacks enterprise readiness: Cloudflare's native MCP costs 244,000 tokens versus 1,000 in "Code Mode"; sticky routing defeats load balancers; and missing governance leaves security to individual teams. 
Companies like Perplexity and Cloudflare are abandoning MCP's tool-calling layer in favor of direct APIs and code generation, signaling that production-scale enterprises require deterministic execution patterns that MCP cannot provide.</p><p><strong><a href="https://julsimon.medium.com/still-missing-critical-pieces-7a78077235e5">https://julsimon.medium.com/still-missing-critical-pieces-7a78077235e5</a></strong></p><div><hr></div><h1>Databricks: Breaking the Microbatch Barrier: The Architecture of Apache Spark Real-Time Mode</h1><p>Real-time analytics infrastructure traditionally required separate engines for throughput (Spark) and sub-100ms latency (Flink), leading to duplicated tooling and operational complexity. Apache Spark 4.1's Real-Time Mode eliminates this trade-off by using longer epochs with boundary checkpointing, concurrent map-reduce stages, and non-blocking operators that emit results continuously rather than buffering them. The unified engine handles both massive ETL and low-latency fraud detection while preserving Spark's lineage-based fault tolerance, consolidating the data stack for single-engine architectures.</p><p><strong><a href="https://www.databricks.com/blog/breaking-microbatch-barrier-architecture-apache-spark-real-time-mode">https://www.databricks.com/blog/breaking-microbatch-barrier-architecture-apache-spark-real-time-mode</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!tvJc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tvJc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tvJc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!tvJc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615b85af-451f-47b0-a8d2-6c215f0195d2_2400x1260.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_22_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Etsy: Making Ads Count: Using MMoE and Auxiliary Tasks to Better Connect Buyers &amp; Sellers</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pGk0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pGk0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 424w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 848w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 1272w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!pGk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic" width="346" height="658" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:346,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5716,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pGk0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 424w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 848w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 1272w, https://substackcdn.com/image/fetch/$s_!pGk0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e02470a-a6c5-459b-908c-0d17a415dbab_346x658.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Multi-objective ranking in marketplaces faces metric conflicts (optimizing for clicks often degrades conversions), which calls for task-specific expert routing while managing data sparsity across event hierarchies. Etsy's MMoE architecture routes CTR and purchase prediction tasks through specialized experts with gated selection, then bridges sparse purchase signals using auxiliary add-to-cart tasks that correlate strongly with intent. 
The system achieved a 3.5% lift in purchase AUC and a 1% lift in click AUC while reducing inference cost through model pruning, enabling more accurate auto-bidding for sellers.</p><p><strong><a href="https://www.etsy.com/codeascraft/making-ads-count-using-mmoe-and-auxiliary-tasks-to-better-connect-buyers--sellers">https://www.etsy.com/codeascraft/making-ads-count-using-mmoe-and-auxiliary-tasks-to-better-connect-buyers--sellers</a></strong></p><div><hr></div><h1>Meta: Ranking Engineer Agent (REA): The Autonomous AI Agent Accelerating Meta&#8217;s Ads Ranking Innovation</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jBFq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jBFq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 424w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 848w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 1272w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jBFq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jBFq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 424w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 848w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 1272w, https://substackcdn.com/image/fetch/$s_!jBFq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F338f85d1-de4a-4734-9722-fd66e1033277_1463x720.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>ML model iteration at production scale requires balancing hypothesis generation, resource constraints, and infrastructure resilience across multi-day experiment cycles. Meta&#8217;s Ranking Engineer Agent combines a Dual-Source Hypothesis Engine (historical experiments + novel ML research proposals) with autonomous debugging and cost-aware planning to execute long-horizon ranking workflows without human supervision. 
The system doubled average model accuracy across six models while enabling three engineers to maintain eight production models&#8212;a 5x productivity gain over traditional team structures.</p><p><strong><a href="https://engineering.fb.com/2026/03/17/developer-tools/ranking-engineer-agent-rea-autonomous-ai-system-accelerating-meta-ads-ranking-innovation/">https://engineering.fb.com/2026/03/17/developer-tools/ranking-engineer-agent-rea-autonomous-ai-system-accelerating-meta-ads-ranking-innovation/</a></strong></p><div><hr></div><h1>Rahul Garg: Context Anchoring</h1><p>AI-assisted development degrades over long sessions as models lose reasoning context ("why") despite retaining technical choices ("what"), trapping developers in single conversations to avoid context loss. The author proposes Context Anchoring&#8212;externalizing decision rationale, rejected alternatives, and constraints into lightweight Feature Documents outside the chat interface. Teams adopting external anchoring achieve warm starts in seconds, reduce token costs by 98%, enable multi-developer alignment, and validate logic through forced documentation, eliminating session anxiety as a design anti-pattern.</p><p><strong><a href="https://martinfowler.com/articles/reduce-friction-ai/context-anchoring.html">https://martinfowler.com/articles/reduce-friction-ai/context-anchoring.html</a></strong></p><div><hr></div><h1>Dropbox: How we optimized Dash&#8217;s relevance judge with DSPy</h1><p>Relevance scoring at scale faces a model-cost-quality trade-off: premium models like o3 are accurate but expensive, while cheaper open-weight models degrade without model-specific prompt tuning. Dropbox's DSPy-based optimization automates prompt refinement against human judgments using NMSE metrics and GEPA feedback loops, reducing manual tuning from weeks to 1&#8211;2 days. 
The system cut relevance error by 45%, reduced JSON formatting failures by 97%, and enabled 10&#8211;100x data scaling by shifting to cheaper models while maintaining quality through systematic prompt compilation.</p><p><strong><a href="https://dropbox.tech/machine-learning/optimizing-dropbox-dash-relevance-judge-with-dspy">https://dropbox.tech/machine-learning/optimizing-dropbox-dash-relevance-judge-with-dspy</a></strong></p><div><hr></div><h1>Zalando: Search Quality Assurance with AI as a Judge</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mPcT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mPcT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 424w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 848w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 1272w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!mPcT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic" width="1456" height="944" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:944,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22409,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mPcT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 424w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 848w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 1272w, https://substackcdn.com/image/fetch/$s_!mPcT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3c2467-a469-46ab-8b86-d517403946cc_1500x973.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Search quality assurance in a new domain often lacks historical user data, forcing teams to rely on manual testing and reactive fixes post-launch. Zalando's framework automates evaluation by generating NER-clustered queries translated into target languages, then routes the results through GPT-4o as a multimodal judge that assesses product metadata and images against a 0&#8211;4 relevance scale. 
The system evaluates 1,500 search segments (37,500 results) in 3&#8211;5 hours for $250, enabling proactive root-cause identification across languages and ensuring Day 1 quality without local market expertise.</p><p><strong><a href="https://engineering.zalando.com/posts/2026/03/search-quality-assurance-with-llm-judge.html">https://engineering.zalando.com/posts/2026/03/search-quality-assurance-with-llm-judge.html</a></strong></p><div><hr></div><h1>Andrey Novitskiy: Volga - A Rust Rewrite of a Real-Time AI/ML Data Engine (DataFusion, Arrow, SlateDB) with a Chronon + OpenMLDB&#8211;Style Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6UyA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6UyA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 424w, https://substackcdn.com/image/fetch/$s_!6UyA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 848w, https://substackcdn.com/image/fetch/$s_!6UyA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 1272w, https://substackcdn.com/image/fetch/$s_!6UyA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!6UyA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic" width="1456" height="1069" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/feb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1069,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6UyA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 424w, https://substackcdn.com/image/fetch/$s_!6UyA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 848w, https://substackcdn.com/image/fetch/$s_!6UyA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!6UyA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffeb0b8f8-9c48-4fa9-b8f0-6cd21d900048_1456x1069.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Real-time ML feature computation requires unified streaming-batch execution with low-latency serving, forcing platforms to choose between specialized engines (Flink for latency, Spark for batch, Redis for serving). 
Volga's Rust implementation pairs DataFusion SQL execution with SlateDB (an embedded LSM on object storage) and Request Mode, embedding serving logic directly into operator state to eliminate external cache round trips. The system handles month-year windows via tiling, includes native ML aggregations (top-k, categorical sums), and achieves compute-storage separation by consolidating the feature pipeline, batch training, and real-time serving into a single Rust binary.</p><p><strong><a href="https://volgaai.substack.com/p/volga-a-rust-rewrite-of-a-real-time">https://volgaai.substack.com/p/volga-a-rust-rewrite-of-a-real-time</a></strong></p><div><hr></div><h1>Max Halford: Lower your warehouse costs via DuckDB transpilation</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k0yk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k0yk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 424w, https://substackcdn.com/image/fetch/$s_!k0yk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 848w, https://substackcdn.com/image/fetch/$s_!k0yk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!k0yk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k0yk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic" width="1456" height="631" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:631,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191816580?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k0yk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 424w, https://substackcdn.com/image/fetch/$s_!k0yk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 848w, 
https://substackcdn.com/image/fetch/$s_!k0yk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 1272w, https://substackcdn.com/image/fetch/$s_!k0yk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F992e1193-d2a8-4957-8af6-7c820a7acb1e_2151x932.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Cloud warehouse compute costs escalate during development and testing cycles, incentivizing hybrid approaches that separate cheap 
storage from expensive query execution. Max Halford&#8217;s &#8220;Quack Mode&#8221; transpiles warehouse SQL (BigQuery &#8594; DuckDB) using SQLGlot, pulls only upstream dependencies into local DuckDB instances, and executes transformations at near-zero cost, optionally pushing results back. The pattern dramatically reduces development compute spend while maintaining warehouse portability, though pulling tables &gt;100GB remains a bottleneck until zero-copy solutions like Iceberg are available.</p><p><strong><a href="https://maxhalford.github.io/blog/warehouse-cost-reduction-quack-mode/">https://maxhalford.github.io/blog/warehouse-cost-reduction-quack-mode/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #261]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-261</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-261</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 16 Mar 2026 00:49:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_15_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lnqf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Lnqf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Lnqf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Lnqf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lnqf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24982,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_15_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lnqf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Lnqf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Lnqf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Lnqf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfb4b54f-52d4-4bdd-a7ee-b8cfed5fee04_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How to Orchestrate Databricks Across Multiple Workspaces</h1><p>As Databricks deployments scale, a familiar pattern emerges: multiple workspaces, multiple teams, and no reliable way to manage the dependencies between them.<br><br>In this hands-on deep dive, we'll show you how to build a cross-workspace control plane using Dagster on top of your 
existing Databricks environment. Demo-heavy and practitioner-focused, you'll leave with working patterns you can apply to your own platform the same day.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_15_26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Editor&#8217;s Note: Introducing Data Engineering After AI Podcast Series</h1><p>Lately, I&#8217;ve been thinking a lot about the intersection of data architecture and AI. To dig deeper into this, I&#8217;m launching a <strong>new podcast series</strong> called <strong><a href="https://www.dataengineeringweekly.com/p/data-engineering-after-ai">Data Engineering After AI.</a></strong></p><p>I&#8217;m looking for guests who are in the trenches. If you have strong opinions on where the industry is heading, or if you are actively building solutions in this space (either in-house or as a product), let&#8217;s talk.</p><p><strong>Please note:</strong> my goal is to foster an authentic discussion about how AI is reshaping data engineering from the ground up. This isn&#8217;t a space for promotional product pitches, and I want to keep the conversation strictly focused on the technology, the challenges, and the architectural shifts.</p><p>If you are passionate about the future of our field and want to share your insights, DM me on <strong><a href="https://www.linkedin.com/in/ananthdurai/">LinkedIn</a></strong>.</p><div><hr></div><h1>Joseph M. Hellerstein: AI and the Mixed-Consistency Future</h1><p>In my recent article, <strong><a href="https://www.dataengineeringweekly.com/p/etl-is-dead">ETL is dead</a></strong>, I projected that the data modeling techniques that got us here may not be sufficient for the AI era. 
The consistency model is one of the biggest gaps in emerging file-based system designs built around AI agents. We saw a similar shift in the move from the Hadoop file system to the Lakehouse model. The author suggests that we may be entering a Mixed-Consistency future.</p><p><strong><a href="https://jhellerstein.github.io/blog/ai-mixed-consistency/">https://jhellerstein.github.io/blog/ai-mixed-consistency/</a></strong></p><div><hr></div><h1>Milan Mosny: Ontology, Taxonomy, Data Model, Context Graph &amp; Friends</h1><p>Context Engineering is the hot topic in the industry. I found the author's recap of ontology, taxonomy, data model, and context graph excellent. As the famous saying goes, it is all data engineering.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YbOb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YbOb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 424w, https://substackcdn.com/image/fetch/$s_!YbOb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 848w, https://substackcdn.com/image/fetch/$s_!YbOb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!YbOb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YbOb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic" width="590" height="202" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:202,&quot;width&quot;:590,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5085,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YbOb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 424w, https://substackcdn.com/image/fetch/$s_!YbOb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 848w, 
https://substackcdn.com/image/fetch/$s_!YbOb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 1272w, https://substackcdn.com/image/fetch/$s_!YbOb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78af57ca-d7bc-4843-a7b4-26f1009d92ae_590x202.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong><a href="https://medium.com/response42/ontology-taxonomy-data-model-context-graph-friends-56a605e14355">https://medium.com/response42/ontology-taxonomy-data-model-context-graph-friends-56a605e14355</a></strong></p><div><hr></div><h1>Jason Cui &amp; Jennifer Li: Your Data Agents Need Context</h1><p>Contextual grounding&#8212;standardized terminology, data lineage, operational semantics&#8212;determines whether natural language agents answer analytics questions reliably. The authors propose a &#8220;Context Layer&#8221; combining LLM-powered metadata construction with human refinement to map business knowledge onto warehouse schemas. 
Organizations adopting context-aware agent architectures unlock self-serve analytics without brittleness, enabling agents to reason consistently across disparate schemas.</p><p><strong><a href="https://www.a16z.news/p/your-data-agents-need-context">https://www.a16z.news/p/your-data-agents-need-context</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_15_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IM5z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!IM5z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!IM5z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!IM5z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IM5z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic" 
width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_15_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IM5z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!IM5z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!IM5z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!IM5z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a3722ad-fd1c-4213-8232-bb4fff037001_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_15_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Robin Moffatt: Claude Code isn&#8217;t going to replace data engineers (yet)</h1><p>We see some degree of success with Claude Code in software engineering. Is it ready for prime time in data engineering? The author notes gaps in trust &amp; accuracy, silent data loss, non-determinism, technical flaws, and maintenance. Data engineering still lacks an efficient sandbox environment to bridge these gaps, which is a must for brownfield projects.</p><p><strong><a href="https://rmoff.net/2026/03/11/claude-code-isnt-going-to-replace-data-engineers-yet/">https://rmoff.net/2026/03/11/claude-code-isnt-going-to-replace-data-engineers-yet/</a></strong></p><div><hr></div><h1>Snap: Agent Format: A Declarative Standard for AI Agents</h1><p>Speed and correctness in execution always trade off against each other. Snap writes about how different teams adopted different AI frameworks to move fast, and how a standard interface design makes everything work together. I believe that as long as the pendulum swings between speed and efficiency, software engineering is safe. 
We will always build the next best abstraction.</p><p><strong><a href="https://eng.snap.com/agent-format">https://eng.snap.com/agent-format  </a></strong></p><div><hr></div><h1>LinkedIn: Engineering the next generation of LinkedIn&#8217;s Feed</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V6EO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V6EO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 424w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 848w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 1272w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V6EO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic" width="459" height="510" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:510,&quot;width&quot;:459,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7915,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V6EO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 424w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 848w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 1272w, https://substackcdn.com/image/fetch/$s_!V6EO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05ec2daf-48a5-4f61-bc52-3180ecd05256_459x510.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Feed personalization at massive scale requires unifying disparate retrieval signals into semantic representations while maintaining sub-second latency for over a billion members. LinkedIn's architecture consolidates keyword matching, collaborative filtering, and engagement signals into dual-encoder LLM retrieval, paired with a Generative Recommender transformer that sequences 1,000+ historical interactions to capture professional trajectories. 
Custom infrastructure&#8212;Flash Attention variants, GPU-optimized data loaders, decoupled nearline pipelines&#8212;enables semantic ranking at sub-second latency for 1.3 billion members while reducing training memory by 37%.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/feed/engineering-the-next-generation-of-linkedins-feed">https://www.linkedin.com/blog/engineering/feed/engineering-the-next-generation-of-linkedins-feed</a></strong></p><div><hr></div><h1>Spotify: Inside the Archive: The Tech Behind Your 2025 Wrapped Highlights</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WeU7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WeU7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!WeU7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 848w, https://substackcdn.com/image/fetch/$s_!WeU7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 1272w, https://substackcdn.com/image/fetch/$s_!WeU7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!WeU7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17157,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WeU7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!WeU7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 848w, https://substackcdn.com/image/fetch/$s_!WeU7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!WeU7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F82b0f593-056b-4088-a8de-ee34c04bb6ef_1920x1080.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Generating personalized narratives at billion scale requires balancing creative consistency, latency constraints, and data fidelity without relying on human review. 
Spotify's Wrapped Archive distills frontier LLM outputs into smaller production models via Direct Preference Optimization (DPO), grounds narratives in heuristic-ranked "remarkable days" from distributed pipelines, and uses layered prompts to enforce tone while preventing hallucinations. Column-oriented storage with per-day qualifiers, pre-scaled compute, and automated Judge-model sampling of 165,000 reports enable 1.4 billion unique narratives at launch latency while catching systemic failures such as timezone bugs.</p><p><strong><a href="https://engineering.atspotify.com/2026/3/inside-the-archive-2025-wrapped">https://engineering.atspotify.com/2026/3/inside-the-archive-2025-wrapped</a></strong></p><div><hr></div><h1>LinkedIn: Driving data enhancement &amp; recruitment success with LinkedIn&#8217;s unified integrations</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iXUn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iXUn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 424w, https://substackcdn.com/image/fetch/$s_!iXUn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 848w, https://substackcdn.com/image/fetch/$s_!iXUn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!iXUn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iXUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic" width="1200" height="465" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9979,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!iXUn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 424w, https://substackcdn.com/image/fetch/$s_!iXUn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 848w, 
https://substackcdn.com/image/fetch/$s_!iXUn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 1272w, https://substackcdn.com/image/fetch/$s_!iXUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F484e89d1-98e6-4217-8e01-22b4d4f8990f_1200x465.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Recruitment data fragmentation&#8212;disparate ATS schemas, semantic conflicts, and partner integration overhead&#8212;blocks AI 
agents from reliably reasoning across hiring pipelines. LinkedIn's unified platform standardizes partner data into canonical schemas via hybrid push/pull models (BuildIn for speed, BuildOut with Temporal orchestration for reliability), assigns stable Integration IDs to decouple identity, and reconciles multi-source conflicts into single-truth serving layers. The system cut onboarding from 12 months to 4, expanded job field coverage 1.8x, and dropped resume gaps below 10%, enabling agents to reason and act consistently across enterprise hiring systems.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/talent/driving-data-enhancement-and-recruitment-success-with-linkedins-unified-integrations">https://www.linkedin.com/blog/engineering/talent/driving-data-enhancement-and-recruitment-success-with-linkedins-unified-integrations</a></strong></p><div><hr></div><h1>Uber: Transforming Ads Personalization with Sequential Modeling and Hetero-MMoE at Uber</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lZPM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lZPM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 424w, https://substackcdn.com/image/fetch/$s_!lZPM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 848w, 
https://substackcdn.com/image/fetch/$s_!lZPM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 1272w, https://substackcdn.com/image/fetch/$s_!lZPM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lZPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic" width="1456" height="707" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:707,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12616,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lZPM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 424w, 
https://substackcdn.com/image/fetch/$s_!lZPM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 848w, https://substackcdn.com/image/fetch/$s_!lZPM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 1272w, https://substackcdn.com/image/fetch/$s_!lZPM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdf070bf-2430-4223-b746-637eb1edc9e7_1536x746.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Ads ranking at scale requires capturing sequential user intent over long behavioral histories while simultaneously optimizing competing objectives such as clicks and conversions. Uber's system pairs target-aware transformers with Multi-Head Latent Attention (reducing sequence complexity from O(N&#178;) to O(N&#215;L)) to compress engagement histories, then routes the compressed signals through Hetero-MMoE&#8212;blending DCN and CIN experts to capture low- to high-order feature interactions across multimodal inputs. Online experiments yielded +0.93% AUC on predicted CTR and +0.66% AUC on predicted click-to-order, validating sequential modeling at ranking scale.</p><p><strong><a href="https://www.uber.com/en-EG/blog/transforming-ads-personalization/">https://www.uber.com/en-EG/blog/transforming-ads-personalization/</a></strong></p><div><hr></div><h1>Databricks: LogSentinel: How Databricks uses Databricks for LLM-Powered PII Detection and Governance</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FI4x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FI4x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 424w, https://substackcdn.com/image/fetch/$s_!FI4x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 848w, 
https://substackcdn.com/image/fetch/$s_!FI4x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 1272w, https://substackcdn.com/image/fetch/$s_!FI4x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FI4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic" width="1456" height="910" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19202,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/191078036?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FI4x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 424w, 
https://substackcdn.com/image/fetch/$s_!FI4x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 848w, https://substackcdn.com/image/fetch/$s_!FI4x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 1272w, https://substackcdn.com/image/fetch/$s_!FI4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F294b3671-32e1-4898-813e-de0295ae45f2_1999x1250.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>PII discovery and compliance monitoring at data warehouse scale require automating label classification as schemas evolve, without manual audit cycles. Databricks&#8217; LogSentinel orchestrates multiple LLM &#8220;experts&#8221; in parallel&#8212;augmented with Vector Search context and AI-generated column comments&#8212;to classify data across 100+ granular, hierarchical, and residency labels, selecting predictions by confidence voting. The system achieves 92% precision and 95% recall while reducing manual review cycles from weeks to hours, enabling real-time governance as schemas drift.</p><p><strong><a href="https://www.databricks.com/blog/logsentinel-how-databricks-uses-databricks-llm-powered-pii-detection-and-governance">https://www.databricks.com/blog/logsentinel-how-databricks-uses-databricks-llm-powered-pii-detection-and-governance</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[ETL is Dead]]></title><description><![CDATA[Why the shift from human-operated to agent-operated data warehouses demands a new architecture]]></description><link>https://www.dataengineeringweekly.com/p/etl-is-dead</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/etl-is-dead</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Wed, 11 Mar 2026 14:42:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BI6v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>More ETL pipelines will run in 2027 than in any year in history. AI will generate more extraction jobs, more transformation logic, and more loading routines than any team of data engineers could write by hand. The volume of ETL will explode.</p><p>And ETL is still dead.</p><p>Not dead the way Latin is dead &#8212; no one speaks it. Dead, the way landlines are dead &#8212; they still work, millions exist, but nobody builds their communication strategy around one. ETL is dead as the defining work of data engineering. Dead as the thing we hire for, build careers around, and organize teams to do. The pipelines keep running. The professional identity built around them does not survive.</p><h1>The Warehouse Was Always a Metaphor. 
Now the Metaphor Is Breaking.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BI6v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BI6v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 424w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 848w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1272w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" width="1456" height="578" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26399,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BI6v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 424w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 848w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1272w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We literally called it a data <em>warehouse</em>. And that wasn&#8217;t just naming &#8212; we replicated the entire physical warehouse operating model into the digital world. Racks became tables. Inventory management became catalogs. Forklifts became ETL pipelines. Floor workers became data engineers. 
Shift supervisors became analytics leads.</p><p>Every technique we built &#8212; <strong><a href="https://en.wikipedia.org/wiki/Dimensional_modeling">star schemas</a></strong>, slowly changing dimensions, <strong><a href="https://www.databricks.com/glossary/medallion-architecture">medallion architectures</a></strong>, conformed dimensions &#8212; served the same purpose as aisle markers and shelf labels in a physical warehouse: help a <em>human</em> walk in, find what they need, and carry it out.</p><p>Data modeling organized information so humans could discover it. Data catalogs provided wayfinding so humans could navigate it. The medallion architecture created a pick-pack-ship assembly line where humans inspected and validated data at each station. Naming conventions &#8212; fact_orders, dim_customers &#8212; acted as signage so humans could read the shelves at a glance.</p><p>Every design decision was optimized for human cognition. And then the operator changed.</p><h1>What Happened When Robots Entered the Physical Warehouse</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGa4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 424w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 848w, 
https://substackcdn.com/image/fetch/$s_!YGa4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic" width="1456" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24175,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGa4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 424w, 
https://substackcdn.com/image/fetch/$s_!YGa4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 848w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When Amazon deployed <strong><a href="https://en.wikipedia.org/wiki/Amazon_Robotics">Kiva robots</a></strong>, they didn&#8217;t replace human tasks one-for-one. They <strong><a href="https://spectrum.ieee.org/amazon-ai-robotics">redesigned the entire warehouse</a></strong> around a different operator.</p><p>Physical warehouses built for humans had wide aisles because humans need space to walk. They grouped items logically because humans need to remember where things are. They placed high-demand products at eye level because humans have ergonomic constraints. They posted signage everywhere because humans need wayfinding.</p><p>Robotic warehouses <strong><a href="https://www.aboutamazon.com/news/operations/amazon-robotics-robots-fulfillment-center">threw all of that out</a></strong>. Aisles shrank because robots don&#8217;t need shoulder width. Shelving went floor-to-ceiling because robots don&#8217;t have ergonomic limits. Logical grouping became unnecessary because robots navigate by coordinates, not memory. Signage disappeared because robots don&#8217;t read signs &#8212; they read instructions.</p><p>But the biggest gains weren&#8217;t physical. They were <em>cognitive</em>. Human warehouse workers carried an enormous cognitive load &#8212; remembering locations, making routing decisions, prioritizing picks, and mentally handling exceptions. Robots eliminated that cognitive burden entirely. The warehouse didn&#8217;t just move faster. 
It became a fundamentally different system that could handle complexity no human floor operation could manage.</p><h1>The Data Warehouse Is Still Designed for Human Forklift Operators</h1><p>Now look at our data warehouse through this lens.</p><p><strong><a href="https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/">Star schemas and dimensional modeling</a></strong> exist so a human analyst can visualize how tables relate. A human needs to see the star &#8212; the fact table at the center, dimensions radiating outward. An agent doesn&#8217;t need a star. It needs a validated semantic definition of what each entity means and how entities connect.</p><p>Data catalogs are digital signage. We built them because humans need to browse and discover what&#8217;s in the warehouse. An agent doesn&#8217;t browse a catalog the way a human walks an aisle. It queries for a validated meaning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Ube!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Ube!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 424w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 848w, 
https://substackcdn.com/image/fetch/$s_!7Ube!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic" width="1456" height="464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Ube!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 424w, 
https://substackcdn.com/image/fetch/$s_!7Ube!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 848w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The <strong><a href="https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion">medallion architecture</a></strong> &#8212; Bronze to Silver to Gold &#8212; is an assembly line designed for human inspection at each station. Raw data lands, gets progressively cleaned, and arrives ready for consumption. Each station assumes a human will inspect, validate, and pass the data forward. And at each handoff, context erodes &#8212; the original meaning collapses a little more, like a game of telephone played silently in the pipeline.</p><p>We optimized every layer of the data warehouse for human cognitive constraints. And just like the physical warehouse, those very optimizations become limitations when the operator changes.</p><h1>Where the Analogy Holds &#8212; and Where It Breaks</h1><p>I want to be precise about this, because imprecise analogies are how our industry ends up with decade-long hype cycles built on half-truths.</p><p>The analogy holds powerfully for <em>navigation and discovery</em>. Physical warehouses organized shelves for human wayfinding. Data warehouses organize tables for human querying. Robots don&#8217;t need aisle signs. Agents don&#8217;t need star schemas to find data. That part maps cleanly.</p><p>But here&#8217;s where it breaks: physical goods don&#8217;t change meaning based on how you store them. A box of shoes is a box of shoes, whether it sits on shelf A3 or shelf Z9. Data is different. How you structure data shapes what questions you can ask of it. A normalized schema enables different analytical patterns than a denormalized one. A slowly changing dimension preserves the temporal context that a snapshot table destroys.</p><p>Structure still matters for agent-operated data. It just serves a different purpose. 
Instead of organizing for human navigation &#8212; &#8220;how do I find the data?&#8221; &#8212; you organize for agent operation &#8212; &#8220;what data and context does this agent need for this task?&#8221; Think about how AI tools work with a scoped working folder. You don&#8217;t reorganize your filesystem into an agent-friendly layout. You give the agent a well-scoped boundary, and it operates within it. The structure shifts from navigational to operational &#8212; from shelf labels to access boundaries.</p><h1>The Thinking Survives. The Format May Not.</h1><p>I took the last class Ralph Kimball taught before his retirement. I remember the vivid conversation around HBase (which was popular at the time) and the notion of versioning to handle slowly changing dimensions. I&#8217;ve internalized dimensional modeling deeply enough to know which parts are permanent and which parts are artifacts of their era.</p><p>Kimball didn&#8217;t start the training with the star schema and slowly changing dimensions. Kimball&#8217;s <strong><a href="https://www.kimballgroup.com/wp-content/uploads/2013/08/2013.09-Kimball-Dimensional-Modeling-Techniques11.pdf">dimensional modeling process</a></strong> starts with two steps: <em><strong>identify the business process and select the grain</strong></em>. These steps ask the most fundamental questions in data engineering &#8212; what does the business actually do, and at what level of detail does it matter? Only after answering those do you design the dimensions, the facts, and the star schema.</p><p>Steps one and two are context architecture. They always were. Identifying the business process means understanding the semantic reality of what the organization does. Selecting the grain means choosing the level of meaning that matters. 
That thinking is more relevant today than it was in <strong><a href="https://www.wiley.com/en-us/The+Data+Warehouse+Toolkit:+The+Definitive+Guide+to+Dimensional+Modeling,+3rd+Edition-p-9781118530801">1996</a></strong>.</p><p>Steps three and four &#8212; the star schema, the dimension tables, the fact tables &#8212; were a rendering choice. They were the best output format for the consumer of that era: a human analyst writing SQL against a relational database. The star schema serialized business understanding into a structure that humans could query using the available tools.</p><p><em><strong>The consumer has changed or is changing.</strong></em> The rendering should too. When the consumer is an AI agent, the same analytical thinking about business processes and grain produces a Context Store entry &#8212; a validated, versioned, queryable semantic definition &#8212; not a fact table. The thinking survives. The format may not.</p><p>Dismissing dimensional modeling entirely would be ignorant. 
Clinging to its output format when the consumer has fundamentally changed would be equally so.</p><h1>The Pendulum</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dIWJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 424w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 848w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1272w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic" width="1456" height="836" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:836,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31547,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 424w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 848w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1272w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every era of data architecture has tried to solve the same tension: semantic precision versus operational flexibility.</p><p>The relational era chose precision. ERDs, primary keys, foreign keys, referential integrity, constraints &#8212; the schema <em>was</em> the semantic contract. <strong><a href="https://en.wikipedia.org/wiki/Bill_Inmon">Bill Inmon&#8217;s Corporate Information Factory</a></strong> formalized this into an enterprise architecture. It worked. It encoded business meaning directly into the physical structure. But it was rigid. I remember interviewing at a company in the pre-Hadoop era and asking what their current priority was. The interviewer told me they were working on implementing a schema change in a day rather than a month. 
That was the state of the art &#8212; a month to add a column, because the semantic contracts were so tightly welded to the physical structure that touching one meant touching everything.</p><p><strong><a href="https://www.databricks.com/discover/data-lakes/history">Hadoop&#8217;s</a></strong> answer was brute force. Sheer machine power, schema-on-read, commodity hardware &#8212; throw everything in and figure it out later. It broke the operational rigidity overnight. And it also broke every semantic contract the relational era had built. We traded meaning for speed and went too far. The data lake became a <strong><a href="https://cacm.acm.org/blogcacm/why-the-data-lake-is-really-a-data-swamp/">data swamp</a></strong> because nobody could remember what anything meant &#8212; the constraints that encoded that meaning were gone.</p><p>The lakehouse tried to find a middle ground. <strong><a href="https://iceberg.apache.org/">Iceberg</a>, <a href="https://delta.io/">Delta</a>, <a href="https://hudi.apache.org/">Hudi</a></strong> &#8212; the flexibility of the lake with some of the warehouse&#8217;s structure. Better. But the semantic layer remained an afterthought:</p><blockquote><p><em><strong>catalogs, documentation, and governance overlays that nobody maintained because nobody&#8217;s career depended on them being right.</strong></em></p></blockquote><p>Even recent efforts like Snowflake&#8217;s <strong><a href="https://www.snowflake.com/en/blog/open-semantic-interchanges-specs-finalized/">Open Semantic Interchange</a></strong> initiative acknowledge the gap &#8212; the industry is only now trying to standardize how semantic meaning travels between tools.</p><p>Each swing of the pendulum traded one problem for another. Rigidity for meaninglessness. Meaninglessness for partial structure. What none of them achieved was <em>decoupling</em> &#8212; semantic precision that doesn&#8217;t require physical rigidity. 
Context that travels alongside the data but isn&#8217;t welded to the table structure. Change the schema in seconds. The context updates through the Contextualize pipeline. The meaning stays current without the rigidity.</p><p>That decoupling is what ECL provides. It&#8217;s the first architecture that doesn&#8217;t force you to choose between knowing what your data means and being able to change it.</p><h1>The Graveyard of Good Intentions</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A88K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A88K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 424w, https://substackcdn.com/image/fetch/$s_!A88K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 848w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1272w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic" width="1456" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A88K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 424w, https://substackcdn.com/image/fetch/$s_!A88K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 848w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1272w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I know what the skeptics are thinking, because I&#8217;ve thought it myself: we&#8217;ve heard this before.</p><p>Bill Inmon literally wrote the book on this in 2007 &#8212; <strong><a href="https://www.goodreads.com/en/book/show/1982171">Business Metadata: Capturing Enterprise Knowledge</a> </strong>&#8212; which covers semantics, ontologies, business rules, and the capture of tacit knowledge. He laid out a complete methodology for capturing it. The methodology was sound. 
The economics weren&#8217;t there yet.</p><p>Business glossaries in the 2000s promised to capture institutional knowledge. They became static documents that nobody updated. Semantic layers in the 2010s promised a unified layer of meaning. They became another piece of middleware to maintain. Data catalogs promised discoverability and governance, but many soon <strong><a href="https://www.dataengineeringweekly.com/p/data-catalog-a-broken-promise">proved to be useless</a></strong> and became expensive shelfware. Enterprise knowledge graphs <strong><a href="https://www.cutter.com/article/knowledge-graph-implementation-costs-obstacles">promised connected meaning</a></strong>. Most never made it past the proof-of-concept stage.</p><p>Every generation of data practitioners has pointed at the same north star: capture business meaning as a first-class artifact. Every generation has underestimated the organizational gravity that pulls teams back to &#8220;just get the data there, and we&#8217;ll figure out what it means later.&#8221;</p><blockquote><p><em><strong>So what makes this time structurally different? One thing: the consumer changed from forgiving to unforgiving.</strong></em></p></blockquote><p>When the consumer was a human analyst, missing context was inconvenient. The analyst would Slack a colleague, read the dbt code, ask in standup, or check the wiki. Humans are remarkably good at filling semantic gaps through social channels. Bad metadata produced frustrated analysts, not system failures.</p><p>When the consumer is an AI agent, missing context produces systematic errors at scale. The agent doesn&#8217;t Slack anyone. It has no access to tribal knowledge. It sees a column called rev_adj, makes its best inference, and acts &#8212; confidently, consistently, and potentially wrong across every downstream decision. Bad context doesn&#8217;t produce frustration. 
It produces hallucination at an enterprise scale.</p><p>For the first time, the cost of missing context exceeds the cost of maintaining it. That economic inversion is what none of the previous attempts had. Business glossaries failed because humans bore the cost of maintaining them, while the benefit was diffuse. The Context Store succeeds or fails based on whether agents produce reliable results &#8212; and that feedback loop is immediate, measurable, and impossible to ignore.</p><p>The graveyard is real. But the economics changed.</p><h1>What Replaces It</h1><p>ETL asked: Did the data land? ECL asks: Can the data be trusted? I introduced the <strong><a href="https://www.dataengineeringweekly.com/p/data-engineering-after-ai">ECL framework</a></strong> in my earlier article on data engineering after AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jOKr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jOKr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 424w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 848w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic" width="1456" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21454,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jOKr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 424w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 848w, 
https://substackcdn.com/image/fetch/$s_!jOKr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1272w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Extract remains. Data still moves from source systems to analytical environments. 
That work still requires engineering judgment about reliability, latency, and failure modes. AI handles more of the mechanical construction. Humans make the architectural decisions.</p><p>Contextualize is the new center of gravity. A dedicated, agentic pipeline that builds and maintains a living store of semantic context. It isn&#8217;t documentation. It isn&#8217;t a catalog. It&#8217;s an engineering artifact with its own trigger model, validation layer, and storage &#8212; the Context Store.</p><p>The Context Store holds two types of objects. Context objects capture long-lived semantic definitions &#8212; what &#8220;revenue&#8221; means, who validated that definition, when, and at what confidence level. These compound in value over time. Decision objects capture what agents produce when they act on context &#8212; which definitions they used, what they inferred, and what they recommended. These create the audit trail.</p><p>Link connects entities across the data landscape &#8212; and emerging standards like <strong><a href="https://www.anthropic.com/news/model-context-protocol">Model Context Protocol (MCP)</a></strong> are starting to standardize how agents access data without moving it. Not just table joins &#8212; semantic relationships between business entities across systems. A customer in CRM is linked to a user in your product, linked to a session in your support tool. Whether you implement that as a graph, a mapping table, or a markdown file matters less than whether the linkage is validated and the semantic relationship is explicit.</p><p>And because data is inherently social, you don&#8217;t build this all at once. You start with one business flow. One critical table. Early bind where you control the data and can hold producers accountable for meaning. 
Late bind where data comes from outside your accountability boundary &#8212; third-party feeds, undocumented internal systems, legacy data where the person who knew what the fields meant left five years ago. Even one table, well contextualized, starts compounding as you connect it to the next one, and the next one.</p><h1>Long Live the Context Architect</h1><p>The physical warehouse workers who resisted robotics didn&#8217;t save their jobs. They delayed their own transition. Those who moved into robotics coordination, system design, and exception architecture found themselves more valued, more strategic, and more central to the operation than they were when driving forklifts.</p><p>Data engineers who built their identity around moving data from one bucket to another have felt that identity under pressure for a while now. That pressure isn&#8217;t going away. AI will write your Spark jobs. AI will generate your dbt models. <strong><a href="https://www.elitebrains.com/blog/aI-generated-code-statistics-2025">AI will build more pipelines</a></strong> in a year than your team could build in a decade.</p><p>But AI cannot decide what &#8220;revenue&#8221; means for your organization. It cannot negotiate data contracts between producing and consuming teams. It cannot design the appropriate level of context for an agent addressing a specific business problem. It cannot build the organizational agreements that make semantic definitions stick. That work requires institutional knowledge, cross-functional coordination, and architectural judgment. That work is context architecture.</p><p>The data engineer&#8217;s value migrates from pipeline reliability to semantic reliability. From &#8220;the job ran&#8221; to &#8220;the meaning is right.&#8221; From operating the warehouse floor to designing the system that makes robotic operation trustworthy.</p><p>The frontier is genuinely open. Nobody has this figured out yet. 
The practitioners who invest in the architecture of meaning &#8212; not just the mechanics of movement &#8212; will define this discipline for the next decade.</p><div class="pullquote"><p><strong>ETL is dead. Long live the Context Architect.</strong></p></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #260]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-260</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-260</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 09 Mar 2026 04:31:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!bayI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!bayI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30259,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bayI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!bayI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Best practices for orchestrating Databricks at scale</h1><p>As Databricks deployments scale, a familiar pattern emerges: multiple workspaces, multiple teams, and no reliable way to manage the dependencies between them.<br><br>In this hands-on deep dive, we'll show you how to build a cross-workspace control plane using Dagster on top of your existing 
Databricks environment. Demo-heavy and practitioner-focused, you'll leave with working patterns you can apply to your own platform the same day.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly">Save your spot now</a></strong></p><div><hr></div><h1>underCurrent: <a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">A one-day conference for data engineers and architects</a></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YgTh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YgTh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 848w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic" width="1456" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16850,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YgTh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 848w, 
https://substackcdn.com/image/fetch/$s_!YgTh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1272w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Confluent is hosting a free one-day conference with a catch: there&#8217;s no catch. 
It&#8217;s a single-track event with no sponsors and no product pitches&#8212;just technical talks for data engineers and architects.<br><br>&#127897;&#65039; Speakers include <strong>Joe Reis, Holden Karau, and Max Beauchemin</strong><br>&#128683; No vendors. No sales pitches<br>&#10024; 100% free to attend <br>&#128197; <strong>March 26</strong> <br>&#128205; San Francisco<br>&#127903;&#65039; <strong>Limited to 100 seats</strong> &#8212; <strong><a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">register for free here</a></strong></p><div><hr></div><h1>Vinoth Govindarajan: OpenClaw Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P3Mc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 424w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 848w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12218,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 424w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 848w, 
https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1272w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Production AI agents fail at scale because uncontrolled state mutations corrupt execution and create unpredictable behavior. 
In &#8220;The Agent Stack,&#8221; Vinoth Govindarajan outlines OpenClaw&#8217;s architecture, in which isolated execution contexts and strict invariants prevent state leakage, while sessions enable async pause-resume semantics. The pattern standardizes how teams decouple short-term context from persistent state, ensuring agents reliably rehydrate their mental model and enforce authorization boundaries that gate tool access to user privilege levels.</p><p><strong><a href="https://theagentstack.substack.com/p/openclaw-architecture-part-1-control">Part 1</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-2-concurrency">Part 2</a>, <a href="https://openclawunboxed.com/p/openclaw-architecture-part-3-memory">Part 3.1</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-3-memory">Part 3.2</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-4-security">Part 4</a></strong></p><div><hr></div><h1>Pinterest: Unified Context-Intent Embeddings for Scalable Text-to-SQL</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EtBR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EtBR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 424w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 848w, 
https://substackcdn.com/image/fetch/$s_!EtBR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1272w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic" width="1400" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EtBR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 424w, 
https://substackcdn.com/image/fetch/$s_!EtBR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 848w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1272w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Navigating sprawling data warehouses forces analysts to choose between slow manual exploration and unreliable keyword-based search. Pinterest Engineering built a production Analytics Agent that embeds historical SQL queries as semantic intent signatures, injecting business glossary terms and extracting structural patterns (join keys, filters, usage signals) to retrieve contextually relevant tables at scale. The system reached 40% internal adoption within two months by standardizing discovery through an asset-first pattern, converting years of institutional SQL knowledge into a searchable, governance-aware library.</p><p><strong><a href="https://medium.com/pinterest-engineering/unified-context-intent-embeddings-for-scalable-text-to-sql-793635e60aac">https://medium.com/pinterest-engineering/unified-context-intent-embeddings-for-scalable-text-to-sql-793635e60aac</a></strong></p><div><hr></div><h1>Francesca Lazzeri: AI evals platforms: A comparative guide for production AI systems</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kftj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kftj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 424w, https://substackcdn.com/image/fetch/$s_!kftj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 848w, 
https://substackcdn.com/image/fetch/$s_!kftj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1272w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic" width="753" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:753,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kftj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 424w, 
https://substackcdn.com/image/fetch/$s_!kftj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 848w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1272w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Production AI systems fail silently in ways demos never expose, forcing teams to replace manual testing with automated evaluation as the enterprise LLM market scales toward $71.1 billion by 2034. A comparative analysis of six leading eval platforms reveals a consolidation around open standards (OpenTelemetry, OpenInference) and specialized architectures&#8212;Microsoft AI Foundry embeds red-teaming agents into Azure workflows, while Galileo replaces expensive LLM judges with smaller consensus models (Luna) to reduce eval latency. The shift standardizes safety as a structural property of development, enabling teams to catch jailbreaks and data leaks early while choosing platform fit based on stack priorities: simulation-first, research rigor, or ecosystem depth.</p><p><strong><a href="https://medium.com/data-science-at-microsoft/how-do-you-know-your-ai-actually-works-b1a380a07825">https://medium.com/data-science-at-microsoft/how-do-you-know-your-ai-actually-works-b1a380a07825</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uymw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 424w, 
https://substackcdn.com/image/fetch/$s_!uymw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18459,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uymw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!uymw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>AI is reshaping how data teams operate. But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Netflix: MediaFM - The Multimodal AI Foundation for Media Understanding at Netflix</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p3o3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p3o3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 424w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 848w, 
https://substackcdn.com/image/fetch/$s_!p3o3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1272w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic" width="1400" height="1172" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1172,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p3o3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 424w, 
https://substackcdn.com/image/fetch/$s_!p3o3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 848w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1272w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Understanding content at scale requires machine-readable representations that capture narrative structure, not just visual features, a challenge intensified as streaming catalogs exceed tens of thousands of titles. Netflix built MediaFM, a tri-modal transformer that fuses video frames, audio (wav2vec2), and subtitles into shot-level embeddings using Masked Shot Modeling, with a [GLOBAL] token injecting title-level context (synopsis, genre) to ground each segment. The model powers ad placement, clip ranking, content tagging, and cold-start recommendations by contextualizing shots within narrative sequence, outperforming external benchmarks and enabling machine-readable understanding across Netflix&#8217;s entire catalog.</p><p><strong><a href="https://netflixtechblog.com/mediafm-the-multimodal-ai-foundation-for-media-understanding-at-netflix-e8c28df82e2d">https://netflixtechblog.com/mediafm-the-multimodal-ai-foundation-for-media-understanding-at-netflix-e8c28df82e2d</a></strong></p><div><hr></div><h1>Nabin Debnath: Building a Least-Privilege AI Agent Gateway for Infrastructure Automation with MCP, OPA, and Ephemeral Runners</h1><p>AI agents in infrastructure automation bypass traditional guardrails by making runtime decisions without human validation, risking silent resource destruction or credential exfiltration at scale. The author describes an Agent Gateway that treats agents as untrusted requesters, layering Model Context Protocol (MCP) for tool discovery, Open Policy Agent (OPA) for intent-based authorization, and ephemeral Kubernetes runners for isolated execution. 
The pattern enforces least privilege by mediating all API calls through policy code, validates plan integrity against immutable hashes, and surfaces decision reasoning via OpenTelemetry, standardizing agent governance with SLO targets (100 ms policy decisions, 5 s runner startup) that prevent silent bypasses.</p><p><strong><a href="https://www.infoq.com/articles/building-ai-agent-gateway-mcp/">https://www.infoq.com/articles/building-ai-agent-gateway-mcp/</a></strong></p><div><hr></div><h1>Dropbox: Using LLMs to amplify human labeling and improve Dash search relevance</h1><p>Enterprise search ranking requires massive labeled datasets, but traditional human annotation is prohibitively slow and cannot scale to sensitive content across billions of internal documents. Dropbox Dash uses LLMs as labeling force multipliers: it calibrates them against a small human-labeled set, generates millions of relevance judgments offline, and then trains lightweight production models (XGBoost) on the synthetic labels. The pattern standardizes judgment consistency by pairing contextual research tools (for acronyms and ambiguous queries) with programmatic prompt optimization (DSPy), enabling continuous ranking improvements while keeping human oversight as the ground truth rather than replacing it.</p><p><strong><a href="https://dropbox.tech/machine-learning/llm-human-labeling-improving-search-relevance-dropbox-dash">https://dropbox.tech/machine-learning/llm-human-labeling-improving-search-relevance-dropbox-dash</a></strong></p><div><hr></div><h1>Zalando: Why We Ditched Flink Table API Joins: Cutting State by 75% with DataStream Unions</h1><p>Declarative SQL joins in Flink multiply state across operators, forcing teams to choose between snapshot overhead and operational instability, a scaling bottleneck for pipelines enriching millions of real-time product records. 
Zalando replaced chained Table API joins with a custom KeyedProcessFunction that unions all streams into a single keyed DataStream, storing each product&#8217;s enriched state once in RocksDB instead of redundantly across join operators. The shift cut state size by 75% (235GB to 56GB), reduced snapshot time by 77% (11 minutes to 2.5 minutes), and lowered AWS costs by 13%&#8212;demonstrating how imperative control over stream topology recovers efficiency when declarative abstractions misalign with physical execution.</p><p><strong><a href="https://engineering.zalando.com/posts/2026/03/why-we-ditched-flink-table-api-joins-cutting-state.html">https://engineering.zalando.com/posts/2026/03/why-we-ditched-flink-table-api-joins-cutting-state.html</a></strong></p><div><hr></div><h1>Aihua Xu &amp; Andrew Lamb: Variant Type in Apache Parquet for Semi-Structured Data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4lQG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4lQG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 424w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 848w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic" width="1024" height="633" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea671ece-9073-414d-adc3-952731dc5248_1024x633.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:633,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53802,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4lQG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 424w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 848w, 
https://substackcdn.com/image/fetch/$s_!4lQG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1272w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Semi-structured data in columnar formats forces a choice between slow JSON parsing and rigid schemas that block evolution, creating 
friction in pipelines handling heterogeneous records. Apache Parquet&#8217;s new Variant type uses binary-encoded metadata plus value fields, enabling direct nested field access without full-document parsing while preserving native types (timestamps, integers) that JSON loses. The type standardizes schema flexibility through &#8220;shredding&#8221;&#8212;extracting hot fields into strongly-typed columns for predicate pushdown and pruning&#8212;allowing heterogeneous records to coexist in one column, reducing migration overhead and accelerating adoption across DuckDB, Spark 4.0, and Snowflake.</p><p><strong><a href="https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/">https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/</a></strong></p><div><hr></div><h1>Pranav Mehta: Silent Data Loss in ClickHouse: 3 Reasons Your Distributed Queue Keeps Growing</h1><p>ClickHouse distributed inserts silently fail when coordination-service downtime, execution timeouts, or concurrency limits block the async flush pipeline, leaving data trapped in on-disk queues while clients receive no error signals. The author identifies three failure modes: <em>Keeper/ZooKeeper downtime forcing ReplicatedMergeTree tables into read-only mode, oversized insert blocks exceeding max_execution_time that cork sequential queue processing, and exhausted user concurrency slots starving background INSERT workers</em>. 
The pattern demands proactive monitoring of DistributedFilesToInsert (alert at 50+ files), debugging via system.distribution_queue.last_exception, and inode-aware filesystem choice (XFS over ext4) to prevent silent data loss and system crashes from queue explosion.</p><p><strong><a href="https://medium.com/@pranavmehta94/silent-data-loss-in-clickhouse-3-reasons-your-distributed-queue-keeps-growing-9bf6b8af88e5">https://medium.com/@pranavmehta94/silent-data-loss-in-clickhouse-3-reasons-your-distributed-queue-keeps-growing-9bf6b8af88e5</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #259]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-259</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-259</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 02 Mar 2026 03:57:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OlRC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24006,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OlRC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>AI is moving fast. Is your data platform ready?</h1><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly">Download the AI Modernization Guide</a></strong></p><div><hr></div><h1>underCurrent: <a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">A one-day conference for data engineers and architects</a></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Ad5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 848w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic" width="1456" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16850,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 848w, 
https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1272w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Confluent is hosting a free one-day conference with a catch: there&#8217;s no catch. 
It&#8217;s a single-track event with no sponsors and no product pitches&#8212;just technical talks for data engineers and architects.<br><br>&#127897;&#65039; Speakers include <strong>Joe Reis</strong>, <strong>Holden Karau</strong>, and <strong>Max Beauchemin</strong><br>&#128683; No vendors. No sales pitches<br>&#10024; 100% free to attend <br>&#128205; San Francisco &#128197; March 26 <br>&#127903;&#65039; Limited to 100 seats &#8212; register for free <strong><a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">here</a></strong></p><div><hr></div><h1>Netflix: DataJunction as Netflix&#8217;s answer to the missing piece of the modern data stack</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VaGI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VaGI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 424w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 848w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp" width="512" height="354" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VaGI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 424w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 848w, 
https://substackcdn.com/image/fetch/$s_!VaGI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1272w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Metric inconsistency and definition sprawl across distributed teams create onboarding bottlenecks and fragment analytics workflows. 
Netflix built DataJunction, an open-source semantic layer that decouples metric definitions from compute through a graph-based metadata model and SQL generation engine. This standardizes metrics across the experimentation platform, reducing onboarding from weeks to hours, while enabling expansion across all business verticals and LLM integration for auditable metric lineage.</p><p><strong><a href="https://netflixtechblog.medium.com/datajunction-as-netflixs-answer-to-the-missing-piece-of-the-modern-data-stack-92af926b40a5">https://netflixtechblog.medium.com/datajunction-as-netflixs-answer-to-the-missing-piece-of-the-modern-data-stack-92af926b40a5</a></strong></p><div><hr></div><h1>Benoit Pimpaud: Specs Should Be Equations, Not Essays</h1><p>As AI automates code generation, the engineering bottleneck shifts from writing implementation to defining precise specifications. The author argues that natural-language specifications create compounding ambiguity when parsed by LLMs and proposes layered specifications that combine text, diagrams, and mathematical notation as constraint definitions for AI iteration. 
Mathematical specs eliminate interpretation drift, enabling AI agents to generate correct programs by satisfying invariants rather than reconstructing intent from prose.</p><p><strong><a href="https://fromanengineersight.substack.com/p/specs-should-be-equations-not-essays">https://fromanengineersight.substack.com/p/specs-should-be-equations-not-essays</a></strong></p><div><hr></div><h1>Notion: Balancing cost and reliability for Spark on Kubernetes</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8jTD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8jTD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 424w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 848w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1272w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic" width="616" height="316" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:616,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8jTD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 424w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 848w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1272w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Spark clusters on Kubernetes face a fundamental tension between aggressive cost optimization through spot instances and job reliability during capacity interruptions. Notion reduced compute costs by 60&#8211;90% through EKS migration with Karpenter bin-packing, then open-sourced Spot Balancer&#8212;a Kubernetes webhook that enforces stable spot-to-on-demand ratios per job, preventing cascade failures during AWS termination windows. 
Spot Balancer abstracts infrastructure trade-offs into developer-friendly stability tiers, enabling teams to optimize costs without sacrificing job completion rates.</p><p><strong><a href="https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes">https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes</a></strong></p><div><hr></div><h1>Sponsored: Building a Cross-Workspace Control Plane for Databricks</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uy7d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24982,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uy7d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!uy7d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As Databricks deployments scale, a familiar pattern emerges: multiple workspaces, multiple teams, and no reliable way to manage 
the dependencies between them.<br>In this hands-on deep dive, we'll show you how to build a cross-workspace control plane using Dagster on top of your existing Databricks environment. Demo-heavy and practitioner-focused, you'll leave with working patterns you can apply to your own platform the same day.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Apache Iceberg: Introducing the Apache Iceberg File Format API</h1><p>It is indeed an exciting development in Iceberg to support a pluggable file format API spec. As we increasingly handle unstructured data, this will significantly enhance data management practices through unified governance and compliance. Interestingly, Apache Hudi&#8217;s <strong><a href="https://github.com/apache/hudi/issues/14127">RFC-100</a></strong> is, in fact, the feature request to support the Lance File Format. </p><p><strong><a href="https://iceberg.apache.org/blog/apache-iceberg-file-format-api/">https://iceberg.apache.org/blog/apache-iceberg-file-format-api/</a></strong></p><div><hr></div><h1>Delta Lake: The next evolution of Delta - Catalog-Managed Tables</h1><blockquote><p><em>We went through the full cycle, from exposing the files directly through Hadoop to Snowflake-style cloud data warehouses, to Iceberg-style direct file access, back to catalog-managed tables. </em></p></blockquote><p>Nonetheless, it will be interesting to watch DuckLake-style catalog-managed tables vs object-store-style managed tables. 
</p><p><strong><a href="https://delta.io/blog/2026-02-02-delta-catalog-managed-tables/">https://delta.io/blog/2026-02-02-delta-catalog-managed-tables/</a></strong></p><div><hr></div><h1>Microsoft Fabric: Under the hood: an introduction to the Native Execution Engine for Microsoft Fabric</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ah5O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ah5O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 424w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 848w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1272w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic" width="496" height="465" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:496,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ah5O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 424w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 848w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1272w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Apache Gluten project continues to make an impact on the Spark ecosystem, bringing native-execution optimizations and efficiency gains. Microsoft Fabric shares an under-the-hood look at how it adopted Apache Gluten in its Native Execution Engine. 
</p><p><strong><a href="https://blog.fabric.microsoft.com/en-us/blog/under-the-hood-an-introduction-to-the-native-execution-engine-for-microsoft-fabric/">https://blog.fabric.microsoft.com/en-us/blog/under-the-hood-an-introduction-to-the-native-execution-engine-for-microsoft-fabric/</a></strong></p><div><hr></div><h1>Pinterest: Piqama - Pinterest Quota Management Ecosystem</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WV0P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WV0P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 424w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 848w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1272w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic" width="1400" height="701" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WV0P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 424w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 848w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1272w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As companies scale, manual and static quota systems become bottlenecks, forcing engineers to choose between over-provisioning resources and managing brittle enforcement logic. Pinterest developed Piqama, a unified quota platform that dynamically right-sizes limits using historical data stored in Apache Iceberg, then applies custom enforcement strategies across batch schedulers and online services. 
Piqama centralizes resource governance across hardware and service metrics, enabling teams to optimize capacity allocation while linking consumption directly to financial costs.</p><p><strong><a href="https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5">https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5</a></strong></p><div><hr></div><h1>LinkedIn: Engineering LinkedIn&#8217;s job ingestion system at scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ee5n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 424w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 848w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic" width="1456" height="601" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 424w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 848w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Ingestion systems struggle to scale source onboarding&#8212;hard-coded extraction logic creates engineering bottlenecks that slow integration of new data partners. LinkedIn shifted extraction logic from code to configuration files called Sitemaps, enabling AI tools and browser plugins to onboard sources without engineering deployments. At the same time, a transactional state machine enforces precise failure boundaries across parallel mining tasks. 
The configuration-driven approach reduces onboarding time from weeks to hours, allowing LinkedIn to ingest 20TB daily across thousands of global sources. </p><p><strong><a href="https://www.linkedin.com/blog/engineering/infrastructure/engineering-linkedins-job-ingestion-system-at-scale">https://www.linkedin.com/blog/engineering/infrastructure/engineering-linkedins-job-ingestion-system-at-scale</a></strong></p><div><hr></div><h1>Shopify: The generative recommender behind Shopify&#8217;s commerce engine</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jh6b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jh6b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 424w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 848w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1272w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic" width="1456" height="886" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13795,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jh6b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 424w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 848w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Recommendation systems traditionally treat purchases as isolated events, missing the temporal and causal structure that shapes buyer journeys across millions of products. 
Shopify transitioned to an autoregressive sequence model that treats commerce journeys as token sequences, implementing RoPE-inspired rotary encoding combined with a relative attention bias to capture temporal gaps and seasonality across its catalog. The time-aware attention mechanism drove +0.94% order growth and +0.71% conversion lift while achieving a 7.3x training speedup through optimized CUDA kernels, enabling Shopify to integrate richer context into a unified generative framework.</p><p><strong><a href="https://shopify.engineering/generative-recommendations">https://shopify.engineering/generative-recommendations</a></strong></p><div><hr></div><h1>Alibaba: PostgreSQL Blink-tree Implementation</h1><p>As we increasingly use AI to code, understanding database internals is more critical than ever. Alibaba Cloud engineers break down how PostgreSQL uses the <strong><a href="https://pages.cs.wisc.edu/~yxy/cs764-f22/slides/L15.pdf">Blink-tree</a></strong> architecture to achieve high concurrency. By adding link pointers to sibling nodes and high keys to mark boundaries, PostgreSQL allows searches to proceed without lock-coupling. This lets the system handle concurrent page splits gracefully, with searches following the sibling link when a key falls beyond a page&#8217;s old boundary, and it significantly outperforms the more rigid <strong><a href="https://kernelmaker.github.io/MySQL-Lock-1">lock-subtree approach</a></strong> used in MySQL&#8217;s InnoDB.</p><p><strong><a href="https://www.alibabacloud.com/blog/postgresql-blink-tree-implementation_602913">https://www.alibabacloud.com/blog/postgresql-blink-tree-implementation_602913</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering After AI]]></title><description><![CDATA[Moving Data Was Never the Point. Meaning It Is.]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-after-ai</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-after-ai</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Tue, 24 Feb 2026 03:03:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2da46ceb-78fd-4718-9ccb-7afb113096ec_1154x486.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days back, I ran a LinkedIn poll asking what stays core to software engineering as AI increasingly writes the code. 53% said architecture and trade-offs. 20% said quality and ownership, and 25% said product and problem discovery.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uwq8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uwq8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 424w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 848w, 
https://substackcdn.com/image/fetch/$s_!uwq8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1272w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png" width="948" height="610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:948,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89112,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uwq8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 424w, 
https://substackcdn.com/image/fetch/$s_!uwq8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 848w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1272w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The poll wasn&#8217;t specifically about data engineering, but the answer it yielded applies directly to us. When AI can generate a pipeline as fluently as a senior engineer, the question isn&#8217;t whether our toolbox is changing &#8212; it clearly is. The question is: what kind of thinking has always been too important to automate, and why we let it get buried under the more mechanical work in the first place.</p><p>My answer is that the irreducible work was never about moving data. It was always about meaning. And the framework we&#8217;ve been using &#8212; ETL &#8212; was never really designed to capture meaning.</p><div><hr></div><h1>The ETL Era and Why It&#8217;s Ending</h1><p>Extract, Transform, Load made sense as a job description for a specific historical moment. Source systems were siloed, formats were inconsistent, and somebody had to write the code that moved data from where it lived to where it could be used. 
The data engineer was that somebody.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwLr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwLr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png" width="1228" height="346" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1228,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:497725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwLr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>But if we&#8217;re honest, the transformation step was always the most brittle part. Teams encoded business rules as SQL logic or Python functions, buried them in pipeline code, version-controlled them alongside infrastructure, but rarely treated them with the same rigor as application code. When the definition of &#8220;active user&#8221; changed &#8212; and it always changed &#8212; someone had to find every place that definition lived and update it, hoping they caught them all.</p><p>AI is now competent at generating this kind of code. Not perfect, but competent enough that the mechanical work of pipeline construction is no longer a meaningful differentiator. If your professional identity is built around being good at writing transformation logic, that identity is under pressure.</p><p>But this isn&#8217;t a story about loss. 
It&#8217;s a story about clarity. The mechanical work was always obscuring the more important work underneath it. AI forcing that reckoning is, in a strange way, a gift.</p><div><hr></div><h1>Introducing ECL &#8212; Extract, Contextualize, Link</h1><p>The framework emerging as a replacement isn&#8217;t a technical architecture so much as a reorientation of purpose. Instead of Extract, Transform, Load, think Extract, Contextualize, Link.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gXAy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gXAy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 424w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 848w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1272w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png" width="1280" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:972606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gXAy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 424w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 848w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1272w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Extract remains. Data still needs to move from source systems to analytical environments, and that work still requires engineering judgment &#8212; about reliability, latency, volume, and failure modes. AI will increasingly handle the mechanical parts, but the architectural decisions about what to extract, when, and how belong to people who understand both the source systems and the downstream consequences.</p><p>Contextualize is where the real shift happens. 
This is the work of giving data semantic meaning &#8212; understanding that &#8220;revenue&#8221; is calculated differently by Finance and Sales, that a timestamp in a clickstream event means something different than a timestamp in a billing record, that a null value in one system represents the absence of information while in another it represents an explicit user choice. AI can draft this work at scale &#8212; inferring field definitions, classifying entities, and mapping relationships across a data landscape that no human team could manually annotate in full. What AI cannot do is be accountable for itself. The judgment of whether an inference is correct, the organizational authority to declare a definition, the decision to formalize a discovered pattern into an enforced contract &#8212; that belongs to humans. Contextualize is where AI inference and human judgment meet, structured by a pipeline built specifically for that purpose.</p><p>Link is about entity relationships across the data landscape &#8212; connecting a customer record in your CRM to a user record in your product database, linking an event in your analytics system to a session in your support tool. As AI generates more of the code that consumes data, the ability to reason about how entities relate across systems becomes more valuable, not less. Linkage is what makes context portable &#8212; what allows the meaning built in one part of the landscape to be grounded in its relationships to the rest.</p><p>The rest of this article discusses how ECL works at the architectural level, not as three abstract concepts, but as three concrete pipelines &#8212; and why you need all of them.</p><div><hr></div><h1>Early Binding &#8212; Contracts as Executable Constraints</h1><p>The first technique is early binding: capturing semantic intent at the point of data production, before the data moves.</p><p>Data contracts are the practical implementation of this idea. 
At their core, contracts are agreements between data producers and their consumers &#8212; specifying schema, data quality expectations, ownership, and the semantic meaning of each field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g3D-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g3D-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 424w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 848w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1272w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png" width="1234" height="404" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddbb651c-b309-4861-ba49-0e142c836729_1234x404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:752776,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g3D-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 424w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 848w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1272w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Data Engineering Weekly identified this gap precisely in their piece <em><strong><a href="https://www.dataengineeringweekly.com/p/data-contracts-a-missed-opportunity">Data Contracts: A Missed Opportunity</a></strong></em>. While the data industry was debating what contracts were and drafting governance frameworks to describe them, software engineering had quietly converged on a different organizing principle: treating specifications as executable constraints with real failure semantics. The data industry treated contracts as documentation. Software engineers treated them as interfaces &#8212; things that could break, that had versioning implications, that enforced behavior rather than merely describing it.</p><p>A data contract that lives in a wiki and gets updated when someone remembers is documentation. 
A data contract that is enforced at the point of production &#8212; that fails a pipeline when a schema changes without notice, that alerts a consumer when quality thresholds are violated, that an AI agent can reason about deterministically &#8212; that is architecture.</p><p>This matters more in an AI-heavy world, not less. When AI agents generate transformation code, bad contracts are amplified at scale. The agent will faithfully implement whatever logic it&#8217;s given; if the contract governing its inputs is ambiguous or unenforced, the errors it produces will be systematic rather than isolated. Early binding is the mechanism by which human intent gets formalized into something AI can actually work with.</p><p>But early binding alone has a fundamental limitation. And understanding that limitation is what makes the Contextualize pipeline necessary.</p><div><hr></div><h1>The Problem Early Binding Alone Can&#8217;t Solve</h1><p>Consider what happens to a well-contracted dataset as it moves through a modern Medallion architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vwwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vwwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 424w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 848w, 
https://substackcdn.com/image/fetch/$s_!vwwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1272w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png" width="1237" height="321" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:321,&quot;width&quot;:1237,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:448249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vwwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 424w, 
https://substackcdn.com/image/fetch/$s_!vwwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 848w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1272w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At the Bronze layer, data lands close to its source &#8212; raw, minimally transformed, the contract&#8217;s guarantees still largely intact. Silver applies conformance rules: deduplication, type casting, and light standardization. By the time data reaches Gold, the pipeline has made a series of editorial decisions on the data&#8217;s behalf. Aggregations collapse granular events into metrics. Engineers bake business logic into the shape of the table. The Gold layer is an artifact optimized for a specific set of questions &#8212; the ones that seemed important when the pipeline was built.</p><p>Early binding contracts help at the source, but they can&#8217;t prevent this erosion at every subsequent hop &#8212; especially when those contracts are treated as descriptive rather than executable. If there&#8217;s no enforcement mechanism preventing meaning from drifting across transformations, the telephone game plays out silently in your pipeline. By the time a consumer queries the Gold layer, they&#8217;re working with an artifact whose original intent may be several editorial decisions removed from the contract.</p><p>This is the problem that early binding alone cannot solve. Each transformation layer progressively collapses the context captured at the source. You need a complementary approach: one that preserves the ability to recover context when it&#8217;s actually needed.</p><div><hr></div><h1>Late Binding &#8212; The Agentic Contextualized Pipeline</h1><p>Traditional late binding deferred the <em>application</em> of business rules to query time. What it didn&#8217;t defer was the <em>definition</em> of those rules &#8212; domain experts still had to specify them upfront, just applied through a semantic layer rather than baked into a physical table. 
In complex domains, that knowledge engineering process was its own bottleneck.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C5NB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5NB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 424w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 848w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1272w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png" width="1300" height="378" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:378,&quot;width&quot;:1300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:733215,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C5NB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 424w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 848w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1272w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The more forward-looking approach is to defer definition itself &#8212; and hand that work to a dedicated pipeline.</p><p>The Contextualize pipeline is a separate, agentic pipeline that runs alongside your data infrastructure. Its job is singular: build and maintain a living, validated store of semantic context. It isn&#8217;t part of the Extract pipeline. It isn&#8217;t a query-time process. It&#8217;s a first-class engineering artifact with its own triggering model, validation layer, and storage.</p><p>The trigger is event-driven, not scheduled. Every new dataset that lands automatically kicks off the pipeline. For existing datasets, continuous profiling monitors for meaningful changes &#8212; a new column appears, a column is dropped, a data distribution shifts in ways that suggest something changed upstream. 
Any of these events re-triggers the pipeline for the affected entities. Semantic context isn&#8217;t a one-time annotation exercise. It tracks the data as it evolves.</p><p>The pipeline itself is agentic. An AI agent analyzes the incoming data &#8212; schema, sample values, statistical profiles, lineage &#8212; and infers semantic meaning. What does this field represent? What business entity does it belong to? What relationships exist between it and other data in the landscape? It produces structured, versioned context artifacts: inferences about meaning that didn&#8217;t require a domain expert to pre-specify every scenario.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2K_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2K_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 424w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 848w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1272w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png" width="1129" height="464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1129,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:646894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2K_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 424w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 848w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1272w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Those inferences don&#8217;t automatically commit. They route to a validation layer that works like a labeling workflow &#8212; because structurally, it is one. An LLM-as-Judge validates high-confidence inferences before any human review is triggered. Medium-confidence ones surface to domain experts for labeling. The pipeline flags low-confidence or contested inferences for deeper investigation. The humans aren&#8217;t reviewing every artifact; they&#8217;re reviewing the uncertain ones.
Every labeling automation technique that works in ML pipelines applies here.</p><p>Validated artifacts land in a Context Store &#8212; a dedicated, versioned, queryable store of semantic definitions, entity classifications, and relationship maps. This is the new infrastructure component that ECL requires. Downstream agents don&#8217;t query raw data and infer meaning on the fly. They query the Context Store first, ground their understanding in validated context, and then query the data. The context is stable, reusable, and auditable &#8212; the opposite of ephemeral query-time inference.</p><div><hr></div><h1>Early Binding vs Late Binding &#8212; When to Choose What</h1><p>The decision criterion isn&#8217;t about semantic maturity or how well understood a domain is. It&#8217;s about where the data comes from relative to your accountability boundary.</p><p>When a dataset originates within a controlled environment &#8212; produced by a team or system within your organization&#8217;s sphere of accountability &#8212; early binding is the right tool. The producer and consumer share an organizational context. Contracts can be negotiated, enforced, and upheld. The producing team can be made accountable for the schema they declare and the semantics they commit to. Prescribed context is possible because the relationship that makes it enforceable exists.</p><p>When a dataset originates outside that boundary &#8212; third-party feeds, partner data, public datasets, marketplace sources &#8212; that relationship doesn&#8217;t exist. You cannot hold an external provider to a data contract. The schema can change without notice. The semantics are inferred, not declared. This is where the Contextualize pipeline earns its place. Discovered context is the only kind available.</p><p>But the boundary is not purely organizational.
Poorly governed internal data &#8212; produced by a team with no accountability to its consumers, with undocumented schemas and inconsistent definitions &#8212; is effectively uncontrolled even if it sits within the same organization. The real test is not position on an org chart. It is accountability. Early bind where accountability exists. Let the Contextualize pipeline discover where it doesn&#8217;t.</p><p>The feedback loop holds in both directions. Discovered context built up through repeated profiling, inference, and validation can graduate into prescribed context over time. An external dataset that your organization ingests consistently enough to profile, validate, and republish as an internal data product crosses the boundary from uncontrolled to controlled at that point. The Contextualize pipeline is what makes that transition possible &#8212; and makes the resulting contract trustworthy rather than assumed.</p><p>A data environment that treats all data as early-bindable is brittle. It can only contract what it already understands, and it has no mechanism for the uncontrolled data that makes up a growing share of the analytical landscape. A data environment that treats all data as requiring discovery never formalizes what it learns into enforceable guarantees. The architecture that works reads the accountability boundary correctly and applies the right technique on both sides.</p><div><hr></div><h1>Context Propagation &#8212; The Relay, Not the Pipeline</h1><p>With three pipelines now in play, the question becomes: how does context actually travel through the architecture without getting lost?</p><p>The conventional mental model is wrong. Context doesn&#8217;t travel <em>through</em> the data pipeline&#8212;if it did, it would be lost at every transformation step, which is precisely the Medallion erosion problem. Context travels <em>alongside</em> the pipeline, as metadata, lineage records, and contract provenance.
The transformations change the data; the metadata preserves the meaning.</p><p>The relay works like this. Early binding stamps prescribed context at the point of origin &#8212; schema, field-level semantics, producing team ownership, quality thresholds &#8212; as an executable contract living in metadata, not column values. Lineage tooling propagates this through Bronze, Silver, and Gold, maintaining a record of the transformations applied and the contract that governed the data at each stage. The Contextualize pipeline reads that lineage as part of its inference process &#8212; understanding not just what a field looks like today, but also the history of how it arrived and the commitments made about it at the source. Validated inferences land in the Context Store, which becomes the relay&#8217;s destination: a durable, queryable record of what the data means, grounded in both original contract and accumulated lineage.</p><p>The analogy that makes this concrete is git. A file can be modified heavily across dozens of commits &#8212; refactored, renamed, moved, rewritten &#8212; but the context of how it got there is never lost, because it lives in the commit history, not in the file itself. The Gold layer is the latest commit. The lineage graph is the git log. The Context Store is the understanding you build by reading that log systematically rather than hoping the current file tells the whole story.</p><p>This reframe &#8212; from pipeline to relay &#8212; changes what data engineers are actually responsible for building. The transformations are increasingly automatable. 
The metadata infrastructure, the lineage graph, the Contextualize pipeline that reads it, the Context Store that accumulates from it &#8212; that is the engineering surface that requires sustained human judgment.</p><div><hr></div><h1>The Context Store as the New Engineering Surface</h1><p>Which brings us to where the most consequential engineering work has migrated.</p><p>The Context Store is where business definitions live &#8212; not as documentation in a wiki, not as logic engineers have baked into a Gold table, but as versioned, validated artifacts that downstream systems can query and trust. This is where the validation workflow resolves the competing interpretations of &#8220;revenue&#8221; from Finance and Sales &#8212; not organizational politics, but a confidence-based process that determines which inference earns formalization. It is also where AI consumers find the grounded, stable context they need to act reliably rather than reverting to ad hoc inference.</p><p>This surface distinguishes queryable data from trustworthy data. A table can be perfectly partitioned, indexed, and replicated while being semantically wrong &#8212; built on a definition that drifted from its source contract three transformations ago and was never caught because no Contextualize pipeline was watching. The Context Store is where that failure mode gets closed.</p><p>As AI generates more transformation code and AI agents consume more data at scale, the stakes of this surface rise. An agent operating on a stale or conflicting context artifact produces systematic errors rather than recoverable ones. The engineering work that governs trustworthiness &#8212; designing the trigger model for the Contextualize pipeline, structuring the labeling workflow, deciding what validation confidence threshold earns formalization, and versioning context artifacts as definitions evolve &#8212; requires human judgment at every step.</p><p>Practitioners are still working out the patterns for doing this at scale.
The tooling is maturing. How organizations govern ownership of the Context Store, adjudicate conflicts between teams, and manage the graduation from discovered to prescribed context are genuinely open questions. This is where the frontier actually is.</p><div><hr></div><h1>The New Data Engineer &#8212; Context Architect</h1><p>Return to the poll. 53% said architecture and trade-offs are what remain irreducibly human. In the data engineering context, ECL is what that looks like in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_X4x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_X4x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 424w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 848w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1272w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png" width="1154" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1154,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:784728,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_X4x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 424w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 848w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1272w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The data engineer of the next decade owns the architecture of meaning. They design the contractual foundations at the source&#8212;executable, enforceable, versioned. They build the lineage infrastructure that carries context through every transformation layer without losing it. They design and govern the Contextualize pipeline and the Context Store &#8212; the infrastructure where inferences get built, validated, and formalized into the definitions that everything downstream depends on.
They understand when to prescribe context up front and when to let it be discovered, and they build the systems that make both possible.</p><p>But this is not only a technical role. Context erosion is as much an organizational failure as a technical one. Teams don&#8217;t share semantic definitions because no ownership model incentivizes them to do so. Nobody enforces contracts because producing teams have no accountability to the consumers they serve. In this new frame, the data engineer is the person who builds both the technical system and the organizational agreement that holds it together. They sit at the intersection of architecture and coordination &#8212; the two things the poll respondents correctly identified as irreducibly human.</p><p>The title &#8220;Data Engineer&#8221; may need an update. What we are actually describing is a Context Architect &#8212; someone whose primary material is not data movement but data meaning, not pipelines but provenance, not transformation logic but the semantic infrastructure that makes transformation logic trustworthy.</p><div><hr></div><h1>An Open Frontier</h1><p>I want to be honest about what ECL is and what it isn&#8217;t. It is a reorientation &#8212; a way of thinking about what the work actually is, now that AI is handling more of what the work used to look like. It is not a finished methodology. The tooling that links early binding contracts to the Contextualize pipeline and Context Store is still maturing. The organizational patterns for governing who owns the Context Store, how conflicts between teams get adjudicated, and how discovered context earns formalization don&#8217;t yet have established templates. Practitioners are working out, right now, the engineering patterns for building Contextualize pipelines that operate reliably at scale in production, figuring it out as they go.</p><p>That&#8217;s precisely what makes this moment worth paying close attention to. The frontier is genuinely open.
The practitioners who invest in the architectural and organizational work of context &#8212; who treat contracts as executable infrastructure, who build lineage as a first-class engineering concern, who govern the Contextualize pipeline and Context Store as seriously as they once owned the ETL pipeline &#8212; will define the discipline for the decade ahead.</p><p>The 53% who said architecture and trade-offs are irreducibly human were right. We didn&#8217;t yet know which architecture, or which trade-offs.</p><p>Now we do.</p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #258]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-257-19d</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-257-19d</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 23 Feb 2026 01:00:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pcFF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pcFF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>AI is moving fast. Is your data platform ready?</h1><p>AI is reshaping how data teams operate.
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly">Download the AI Modernization Guide</a></strong></p><div><hr></div><h1>Garry Tan: Half the AI Agent Market Is One Category. The Rest Is Wide Open</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pr4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 424w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 848w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1272w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic" width="1200" height="718" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16453,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 424w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 848w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1272w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 
1456w" sizes="100vw"></picture></div></a></figure></div><p>AI agents thrive in reinforcement learning (RL) environments with a verifiable target and quick feedback. Software manufacturing is a perfect model for such an environment, but the challenge persists in other categories. It will be an exciting decade for software engineering as we build new infrastructure that we never imagined.
</p><p><strong><a href="https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open">https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open</a></strong></p><div><hr></div><h1>LangChain: How to Use Memory in Agent Builder</h1><p>Agents fail to improve over time when they treat every conversation as stateless and discard learned preferences or workflows. The article explains how LangChain&#8217;s Agent Builder implements short-term and long-term memory as a filesystem of Markdown files, enabling persistent instructions and reusable skills. Explicit memory updates, modular skill loading, and direct file editing enable agents to reliably evolve behavior without increasing core prompt complexity.</p><p><strong><a href="https://blog.langchain.com/how-to-use-memory-in-agent-builder/">https://blog.langchain.com/how-to-use-memory-in-agent-builder/</a></strong></p><div><hr></div><h1>LinkedIn: Scaling LLM-Based ranking systems with SGLang at LinkedIn</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9eIl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9eIl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 424w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 848w, 
https://substackcdn.com/image/fetch/$s_!9eIl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic" width="1024" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12896,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9eIl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 424w, 
https://substackcdn.com/image/fetch/$s_!9eIl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 848w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LLM-based ranking systems face strict latency and concurrency constraints because they score thousands of items per query without requiring text generation. The article explains how LinkedIn optimized SGLang for prefill-only ranking through batching improvements, scoring-only execution paths, prefix KV reuse, and Python runtime parallelization.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin">https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin</a></strong></p><div><hr></div><h1>Sponsored: The Scaling Data Teams Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uaud!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25368,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uaud!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Uaud!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>More datasets. More pipelines. More AI demands. The old way of doing things doesn&#8217;t work at this scale.<br>This free eBook walks through how teams actually scale sustainably with roles, responsibilities, automation, and patterns that work.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Spotify: Our Multi-Agent Architecture for Smarter Advertising</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dXi7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dXi7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 848w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dXi7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 848w, 
https://substackcdn.com/image/fetch/$s_!dXi7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1272w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Fragmented decision logic across buying channels prevented Spotify from translating high-level campaign goals into unified 
execution plans. The article explains how Spotify built Ads AI, a multi-agent orchestration layer with intent routing, specialized resolution agents, and data-grounded media planning using real-time tool integration. The architecture reduced campaign setup time from minutes to seconds, simplified user inputs, and grounded recommendations in historical performance data.</p><p><strong><a href="https://engineering.atspotify.com/2026/2/our-multi-agent-architecture-for-smarter-advertising">https://engineering.atspotify.com/2026/2/our-multi-agent-architecture-for-smarter-advertising</a></strong></p><div><hr></div><h1>Uber: Database Federation: Decentralized and ACL-Compliant Hive&#8482; Databases</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0mK-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0mK-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 424w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 848w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic" width="1456" height="797" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:797,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0mK-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 424w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 848w, 
https://substackcdn.com/image/fetch/$s_!0mK-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1272w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Monolithic Hive warehouses create shared-fate outages, resource contention, and weak governance when thousands of datasets share a 
single database. The article explains how Uber implemented database federation by reorganizing datasets into domain-specific units, updating Hive Metastore pointers to avoid data duplication, and deploying both real-time and batch synchronizers to maintain consistency. The decentralized architecture improves ACL compliance, strengthens resource isolation, and reclaims storage while enabling zero-downtime migration at petabyte scale.</p><p><strong><a href="https://www.uber.com/en-IN/blog/database-federation/">https://www.uber.com/en-IN/blog/database-federation/</a></strong></p><div><hr></div><h1>Anton Borisov: AutoMQ: Shared Storage Architecture Deep Dive</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r-r5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r-r5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 424w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 848w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic" width="1400" height="635" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r-r5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 424w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 848w, 
https://substackcdn.com/image/fetch/$s_!r-r5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1272w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kafka&#8217;s shared-nothing architecture imposes high replication costs, slow failover, and tight coupling between storage and 
compute. The article explains how AutoMQ replaces local disk replication with S3-backed shared storage, using layered abstractions, WAL batching, metadata-driven ownership, and epoch fencing to enable stateless brokers and zero-copy failover. AutoMQ&#8217;s design eliminates the 3x replication tax and simplifies scaling to &#8220;add compute,&#8221; while accepting higher cold-read latency from object storage.</p><p><strong><a href="https://medium.com/fresha-data-engineering/automq-shared-storage-architecture-deep-dive-043c5226847e">https://medium.com/fresha-data-engineering/automq-shared-storage-architecture-deep-dive-043c5226847e</a></strong></p><div><hr></div><h1>Pinterest: Drastically Reducing Out-of-Memory Errors in Apache Spark at Pinterest</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_ZY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 424w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 848w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic" width="1400" height="515" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 424w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 848w, 
https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1272w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Out-of-memory (OOM) errors in Spark jobs are a notorious problem in data processing, creating operational overhead and inefficient cluster utilization. 
The article explains how Pinterest's Auto Memory Retries feature dynamically adjusts executor resources: it retries failed tasks with a higher CPU allocation or on larger executors, using modified Spark resource profiles. This elastic strategy reduced OOM failures by 96%, lowered infrastructure costs by avoiding over-provisioning, and improved overall pipeline reliability.</p><p><strong><a href="https://medium.com/pinterest-engineering/drastically-reducing-out-of-memory-errors-in-apache-spark-at-pinterest-c55d7dac2257">https://medium.com/pinterest-engineering/drastically-reducing-out-of-memory-errors-in-apache-spark-at-pinterest-c55d7dac2257</a></strong></p><div><hr></div><h1>StarTree: Consistent, Scalable Compaction for Real-Time Upserts in Apache Pinot</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R5Mt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 424w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 848w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic" width="1301" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1301,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14034,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 424w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 848w, 
https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1272w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Near-Real-Time upsert is my favorite subject to study, and I've worked with many OLAP engines. 
Apache Pinot always stands out for its flexible indexing and fast upsert capabilities. The article explains how StarTree&#8217;s SegmentRefreshTask compacts segments in the background, merging only valid records while ensuring atomic visibility through bitmap-based consistency controls. The approach reduces storage costs, supports sustained high ingestion rates, and maintains predictable query latency at billion-key scale.</p><p><strong><a href="https://startree.ai/resources/upserts-compaction-in-apache-pinot-startree/">https://startree.ai/resources/upserts-compaction-in-apache-pinot-startree/</a></strong></p><div><hr></div><h1>Zepto: Debezium at Scale: An Open Source CDC Story from Zepto</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yR5T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yR5T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 424w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 848w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic" width="1050" height="285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11155,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yR5T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 424w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 848w, 
https://substackcdn.com/image/fetch/$s_!yR5T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1272w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>High-velocity CDC pipelines can overwhelm downstream databases due to redundant updates and MVCC-induced write amplification. 
The article explains how Zepto optimized Debezium by introducing an in-memory reduction buffer to collapse duplicate updates and a Postgres UNNEST-based batching strategy to reduce parsing overhead. These improvements stabilized CPU and I/O usage, eliminated replication lag during peak traffic, and ensured the database processes only final record states.</p><p><strong><a href="https://blog.zeptonow.com/debezium-at-scale-an-open-source-cdc-story-from-zepto-aa4b12e32bf7">https://blog.zeptonow.com/debezium-at-scale-an-open-source-cdc-story-from-zepto-aa4b12e32bf7</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item></channel></rss>