<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Data Engineering Weekly]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com</link><image><url>https://substackcdn.com/image/fetch/$s_!AdQk!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png</url><title>Data Engineering Weekly</title><link>https://www.dataengineeringweekly.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 14 Mar 2026 11:34:11 GMT</lastBuildDate><atom:link href="https://www.dataengineeringweekly.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Ananth Packkildurai]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dataengineeringweekly@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dataengineeringweekly@substack.com]]></itunes:email><itunes:name><![CDATA[Ananth Packkildurai]]></itunes:name></itunes:owner><itunes:author><![CDATA[Ananth Packkildurai]]></itunes:author><googleplay:owner><![CDATA[dataengineeringweekly@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dataengineeringweekly@substack.com]]></googleplay:email><googleplay:author><![CDATA[Ananth Packkildurai]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[ETL is Dead]]></title><description><![CDATA[Why the shift from human-operated to agent-operated data warehouses demands a new architecture]]></description><link>https://www.dataengineeringweekly.com/p/etl-is-dead</link><guid 
isPermaLink="false">https://www.dataengineeringweekly.com/p/etl-is-dead</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Wed, 11 Mar 2026 14:42:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BI6v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>More ETL pipelines will run in 2027 than in any year in history. AI will generate more extraction jobs, more transformation logic, and more loading routines than any team of data engineers could write by hand. The volume of ETL will explode.</p><p>And ETL is still dead.</p><p>Not dead the way Latin is dead, where no one speaks it anymore. Dead the way landlines are dead: they still work, millions exist, but nobody builds a communication strategy around one. ETL is dead as the defining work of data engineering. Dead as the thing we hire for, build careers around, and organize teams to do. The pipelines keep running. The professional identity built around them does not survive.</p><h1>The Warehouse Was Always a Metaphor. 
Now the Metaphor Is Breaking.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BI6v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BI6v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 424w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 848w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1272w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic" width="1456" height="578" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:578,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26399,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BI6v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 424w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 848w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1272w, https://substackcdn.com/image/fetch/$s_!BI6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5feea529-8813-4b35-b17c-a8580d63201a_2232x886.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>We literally called it a data <em>warehouse</em>. And that wasn&#8217;t just naming &#8212; we replicated the entire physical warehouse operating model into the digital world. Racks became tables. Inventory management became catalogs. Forklifts became ETL pipelines. Floor workers became data engineers. 
Shift supervisors became analytics leads.</p><p>Every technique we built &#8212; <strong><a href="https://en.wikipedia.org/wiki/Dimensional_modeling">star schemas</a></strong>, slowly changing dimensions, <strong><a href="https://www.databricks.com/glossary/medallion-architecture">medallion architectures</a></strong>, conformed dimensions &#8212; served the same purpose as aisle markers and shelf labels in a physical warehouse: help a <em>human</em> walk in, find what they need, and carry it out.</p><p>Data modeling organized information so humans could discover it. Data catalogs provided wayfinding so humans could navigate it. The medallion architecture created a pick-pack-ship assembly line where humans inspected and validated data at each station. Naming conventions &#8212; fact_orders, dim_customers &#8212; acted as signage so humans could read the shelves at a glance.</p><p>Every design decision was optimized for human cognition. And then the operator changed.</p><h1>What Happened When Robots Entered the Physical Warehouse</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGa4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGa4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 424w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 848w, 
https://substackcdn.com/image/fetch/$s_!YGa4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic" width="1456" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24175,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGa4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 424w, 
https://substackcdn.com/image/fetch/$s_!YGa4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 848w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!YGa4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff4ee87f2-b944-4306-b222-399c99f16632_2182x842.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>When Amazon deployed <strong><a href="https://en.wikipedia.org/wiki/Amazon_Robotics">Kiva robots</a></strong>, they didn&#8217;t replace human tasks one-for-one. They <strong><a href="https://spectrum.ieee.org/amazon-ai-robotics">redesigned the entire warehouse</a></strong> around a different operator.</p><p>Physical warehouses built for humans had wide aisles because humans need space to walk. They grouped items logically because humans need to remember where things are. They placed high-demand products at eye level because humans have ergonomic constraints. They posted signage everywhere because humans need wayfinding.</p><p>Robotic warehouses <strong><a href="https://www.aboutamazon.com/news/operations/amazon-robotics-robots-fulfillment-center">threw all of that out</a></strong>. Aisles shrank because robots don&#8217;t need shoulder width. Shelving went floor-to-ceiling because robots don&#8217;t have ergonomic limits. Logical grouping became unnecessary because robots navigate by coordinates, not memory. Signage disappeared because robots don&#8217;t read signs &#8212; they read instructions.</p><p>But the biggest gains weren&#8217;t physical. They were <em>cognitive</em>. Human warehouse workers carried an enormous cognitive load &#8212; remembering locations, making routing decisions, prioritizing picks, and mentally handling exceptions. Robots eliminated that cognitive burden entirely. The warehouse didn&#8217;t just move faster. 
It became a fundamentally different system that could handle complexity no human floor operation could manage.</p><h1>The Data Warehouse Is Still Designed for Human Forklift Operators</h1><p>Now look at our data warehouse through this lens.</p><p><strong><a href="https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/books/data-warehouse-dw-toolkit/">Star schemas and dimensional modeling</a></strong> exist so a human analyst can visualize how tables relate. A human needs to see the star &#8212; the fact table at the center, dimensions radiating outward. An agent doesn&#8217;t need a star. It needs a validated semantic definition of what each entity means and how entities connect.</p><p>Data catalogs are digital signage. We built them because humans need to browse and discover what&#8217;s in the warehouse. An agent doesn&#8217;t browse a catalog the way a human walks an aisle. It queries for a validated meaning.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Ube!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Ube!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 424w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 848w, 
https://substackcdn.com/image/fetch/$s_!7Ube!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic" width="1456" height="464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Ube!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 424w, 
https://substackcdn.com/image/fetch/$s_!7Ube!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 848w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Ube!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06c6557-89ed-45d5-b20c-e36a700aa888_2078x662.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <strong><a href="https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion">medallion architecture</a></strong> &#8212; Bronze to Silver to Gold &#8212; is an assembly line designed for human inspection at each station. Raw data lands, gets progressively cleaned, and arrives ready for consumption. Each station assumes a human will inspect, validate, and pass the data forward. And at each handoff, context erodes &#8212; the original meaning collapses a little more, like a game of telephone played silently in the pipeline.</p><p>We optimized every layer of the data warehouse for human cognitive constraints. And just like the physical warehouse, those very optimizations become limitations when the operator changes.</p><h1>Where the Analogy Holds &#8212; and Where It Breaks</h1><p>I want to be precise about this, because imprecise analogies are how our industry ends up with decade-long hype cycles built on half-truths.</p><p>The analogy holds powerfully for <em>navigation and discovery</em>. Physical warehouses organized shelves for human wayfinding. Data warehouses organize tables for human querying. Robots don&#8217;t need aisle signs. Agents don&#8217;t need star schemas to find data. That part maps cleanly.</p><p>But here&#8217;s where it breaks: physical goods don&#8217;t change meaning based on how you store them. A box of shoes is a box of shoes, whether it sits on shelf A3 or shelf Z9. Data is different. How you structure data shapes what questions you can ask of it. A normalized schema enables different analytical patterns than a denormalized one. A slowly changing dimension preserves the temporal context that a snapshot table destroys.</p><p>Structure still matters for agent-operated data. It just serves a different purpose. 
Instead of organizing for human navigation &#8212; &#8220;how do I find the data?&#8221; &#8212; you organize for agent operation &#8212; &#8220;what data and context does this agent need for this task?&#8221; Think about how AI tools work with a scoped working folder. You don&#8217;t reorganize your filesystem into an agent-friendly layout. You give the agent a well-scoped boundary, and it operates within it. The structure shifts from navigational to operational &#8212; from shelf labels to access boundaries.</p><h1>The Thinking Survives. The Format May Not.</h1><p>I took the last class Ralph Kimball taught before his retirement. I remember the vivid conversation around HBase (which was popular at the time) and the notion of versioning to handle slowly changing dimensions. I&#8217;ve internalized dimensional modeling deeply enough to know which parts are permanent and which parts are artifacts of their era.</p><p>Kimball didn&#8217;t start the training with the star schema and slowly changing dimensions. Kimball&#8217;s <strong><a href="https://www.kimballgroup.com/wp-content/uploads/2013/08/2013.09-Kimball-Dimensional-Modeling-Techniques11.pdf">dimensional modeling process</a></strong> starts with two steps: <em><strong>identify the business process and select the grain</strong></em>. These steps ask the most fundamental questions in data engineering &#8212; what does the business actually do, and at what level of detail does it matter? Only after answering those do you design the dimensions, the facts, and the star schema.</p><p>Steps one and two are context architecture. They always were. Identifying the business process means understanding the semantic reality of what the organization does. Selecting the grain means choosing the level of meaning that matters. 
That thinking is more relevant today than it was in <strong><a href="https://www.wiley.com/en-us/The+Data+Warehouse+Toolkit:+The+Definitive+Guide+to+Dimensional+Modeling,+3rd+Edition-p-9781118530801">1996</a></strong>.</p><p>Steps three and four &#8212; the star schema, the dimension tables, the fact tables &#8212; were a rendering choice. They were the best output format for the consumer of that era: a human analyst writing SQL against a relational database. The star schema serialized business understanding into a structure that humans could query using the available tools.</p><p><em><strong>The consumer has changed or is changing.</strong></em> The rendering should too. When the consumer is an AI agent, the same analytical thinking about business processes and grain produces a Context Store entry &#8212; a validated, versioned, queryable semantic definition &#8212; not a fact table. The thinking survives. The format may not.</p><p>Dismissing dimensional modeling entirely would be ignorant. 
Clinging to its output format when the consumer has fundamentally changed would be equally so.</p><h1>The Pendulum</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dIWJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 424w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 848w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1272w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic" width="1456" height="836" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:836,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31547,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dIWJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 424w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 848w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1272w, https://substackcdn.com/image/fetch/$s_!dIWJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F087d9551-a494-41bd-a648-57493f9bf87f_2242x1288.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Every era of data architecture has tried to solve the same tension: semantic precision versus operational flexibility.</p><p>The relational era chose precision. ERDs, primary keys, foreign keys, referential integrity, constraints &#8212; the schema <em>was</em> the semantic contract. <strong><a href="https://en.wikipedia.org/wiki/Bill_Inmon">Bill Inmon&#8217;s Corporate Information Factory</a></strong> formalized this into an enterprise architecture. It worked. It encoded business meaning directly into the physical structure. But it was rigid. I remember interviewing at a company in the pre-Hadoop era and asking what their current priority was. The interviewer told me they were working on implementing a schema change in a day rather than a month. 
That was the state of the art &#8212; a month to add a column, because the semantic contracts were so tightly welded to the physical structure that touching one meant touching everything.</p><p><strong><a href="https://www.databricks.com/discover/data-lakes/history">Hadoop&#8217;s</a></strong> answer was brute force. Sheer machine power, schema-on-read, commodity hardware &#8212; throw everything in and figure it out later. It broke the operational rigidity overnight. And it also broke every semantic contract the relational era had built. We traded meaning for speed and went too far. The data lake became a <strong><a href="https://cacm.acm.org/blogcacm/why-the-data-lake-is-really-a-data-swamp/">data swamp</a></strong> because nobody could remember what anything meant &#8212; the constraints that encoded that meaning were gone.</p><p>The lakehouse tried to find a middle ground. <strong><a href="https://iceberg.apache.org/">Iceberg</a>, <a href="https://delta.io/">Delta</a>, <a href="https://hudi.apache.org/">Hudi</a></strong> &#8212; the flexibility of the lake with some of the structure of the warehouse. Better. But the semantic layer remained an afterthought:</p><blockquote><p><em><strong>catalogs, documentation, and governance overlays that nobody maintained because nobody&#8217;s career depended on them being right.</strong></em></p></blockquote><p>Even recent efforts like Snowflake&#8217;s <strong><a href="https://www.snowflake.com/en/blog/open-semantic-interchanges-specs-finalized/">Open Semantic Interchange</a></strong> initiative acknowledge the gap &#8212; the industry is only now trying to standardize how semantic meaning travels between tools.</p><p>Each swing of the pendulum traded one problem for another. Rigidity for meaninglessness. Meaninglessness for partial structure. What none of them achieved was <em>decoupling</em> &#8212; semantic precision that doesn&#8217;t require physical rigidity. 
Context that travels alongside the data but isn&#8217;t welded to the table structure. Change the schema in seconds. The context updates through the Contextualize pipeline. The meaning stays current without the rigidity.</p><p>That decoupling is what ECL provides. It&#8217;s the first architecture that doesn&#8217;t force you to choose between knowing what your data means and being able to change it.</p><h1>The Graveyard of Good Intentions</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A88K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A88K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 424w, https://substackcdn.com/image/fetch/$s_!A88K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 848w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1272w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic" width="1456" height="483" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:483,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19421,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A88K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 424w, https://substackcdn.com/image/fetch/$s_!A88K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 848w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 1272w, https://substackcdn.com/image/fetch/$s_!A88K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5acb56-0d07-4548-8b58-cc641124d250_2178x722.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I know what the skeptics are thinking, because I&#8217;ve thought it myself: we&#8217;ve heard this before.</p><p>Bill Inmon literally wrote the book on this in 2007 &#8212; <strong><a href="https://www.goodreads.com/en/book/show/1982171">Business Metadata: Capturing Enterprise Knowledge</a> </strong>&#8212; which covers semantics, ontologies, business rules, and the capture of tacit knowledge. He laid out a complete methodology for capturing it. The methodology was sound. 
The economics weren&#8217;t there yet.</p><p>Business glossaries in the 2000s promised to capture institutional knowledge. They became static documents that nobody updated. Semantic layers in the 2010s promised a unified layer of meaning. They became another piece of middleware to maintain. Data catalogs promised discoverability and governance, but many <strong><a href="https://www.dataengineeringweekly.com/p/data-catalog-a-broken-promise">proved to be useless</a></strong> and became expensive shelfware. Enterprise knowledge graphs <strong><a href="https://www.cutter.com/article/knowledge-graph-implementation-costs-obstacles">promised connected meaning</a></strong>. Most never made it past the proof-of-concept stage.</p><p>Every generation of data practitioners has pointed at the same north star: capture business meaning as a first-class artifact. Every generation has underestimated the organizational gravity that pulls teams back to &#8220;just get the data there, and we&#8217;ll figure out what it means later.&#8221;</p><blockquote><p><em><strong>So what makes this time structurally different? One thing: the consumer changed from forgiving to unforgiving.</strong></em></p></blockquote><p>When the consumer was a human analyst, missing context was inconvenient. The analyst would Slack a colleague, read the dbt code, ask in standup, or check the wiki. Humans are remarkably good at filling semantic gaps through social channels. Bad metadata produced frustrated analysts, not system failures.</p><p>When the consumer is an AI agent, missing context produces systematic errors at scale. The agent doesn&#8217;t Slack anyone. It has no access to tribal knowledge. It sees a column called rev_adj, makes its best inference, and acts &#8212; confidently, consistently, and potentially wrong across every downstream decision. Bad context doesn&#8217;t produce frustration. 
It produces hallucination at enterprise scale.</p><p>For the first time, the cost of missing context exceeds the cost of maintaining it. That economic inversion is what none of the previous attempts had. Business glossaries failed because humans bore the cost of maintaining them, while the benefit was diffuse. The Context Store succeeds or fails based on whether agents produce reliable results &#8212; and that feedback loop is immediate, measurable, and impossible to ignore.</p><p>The graveyard is real. But the economics changed.</p><h1>What Replaces It</h1><p>ETL asked: Did the data land? ECL asks: Can the data be trusted? I introduced the <strong><a href="https://www.dataengineeringweekly.com/p/data-engineering-after-ai">ECL framework</a></strong> in my earlier article on data engineering after AI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jOKr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jOKr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 424w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 848w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic" width="1456" height="663" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:663,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21454,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190620055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jOKr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 424w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 848w, 
https://substackcdn.com/image/fetch/$s_!jOKr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1272w, https://substackcdn.com/image/fetch/$s_!jOKr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde2f9d7-0220-4897-943c-2673213e39c2_2270x1034.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Extract remains. Data still moves from source systems to analytical environments. 
That work still requires engineering judgment about reliability, latency, and failure modes. AI handles more of the mechanical construction. Humans make the architectural decisions.</p><p>Contextualize is the new center of gravity. A dedicated, agentic pipeline that builds and maintains a living store of semantic context. It isn&#8217;t documentation. It isn&#8217;t a catalog. It&#8217;s an engineering artifact with its own trigger model, validation layer, and storage &#8212; the Context Store.</p><p>The Context Store holds two types of objects. Context objects capture long-lived semantic definitions &#8212; what &#8220;revenue&#8221; means, who validated that definition, when, and at what confidence level. These compound in value over time. Decision objects capture what agents produce when they act on context &#8212; which definitions they used, what they inferred, and what they recommended. These create the audit trail.</p><p>Link connects entities across the data landscape &#8212; and emerging standards like <strong><a href="https://www.anthropic.com/news/model-context-protocol">Model Context Protocol (MCP)</a></strong> are starting to standardize how agents access data without moving it. Not just table joins &#8212; semantic relationships between business entities across systems. A customer in your CRM is linked to a user in your product, who is linked to a session in your support tool. Whether you implement that as a graph, a mapping table, or a markdown file matters less than whether the linkage is validated and the semantic relationship is explicit.</p><p>And because data is inherently social, you don&#8217;t build this all at once. You start with one business flow. One critical table. Early bind where you control the data and can hold producers accountable for meaning. 
Late bind where data comes from outside your accountability boundary &#8212; third-party feeds, undocumented internal systems, legacy data where the person who knew what the fields meant left five years ago. Even one table, well contextualized, starts compounding as you connect it to the next one, and the next one.</p><h1>Long Live the Context Architect</h1><p>The physical warehouse workers who resisted robotics didn&#8217;t save their jobs. They delayed their own transition. Those who moved into robotics coordination, system design, and exception architecture found themselves more valued, more strategic, and more central to the operation than they were when driving forklifts.</p><p>Data engineers who built their identity around moving data from one bucket to another have felt that identity under pressure for a while now. That pressure isn&#8217;t going away. AI will write your Spark jobs. AI will generate your dbt models. <strong><a href="https://www.elitebrains.com/blog/aI-generated-code-statistics-2025">AI will build more pipelines</a></strong> in a year than your team could build in a decade.</p><p>But AI cannot decide what &#8220;revenue&#8221; means for your organization. It cannot negotiate data contracts between producing and consuming teams. It cannot design the appropriate level of context for an agent addressing a specific business problem. It cannot build the organizational agreements that make semantic definitions stick. That work requires institutional knowledge, cross-functional coordination, and architectural judgment. That work is context architecture.</p><p>The data engineer&#8217;s value migrates from pipeline reliability to semantic reliability. From &#8220;the job ran&#8221; to &#8220;the meaning is right.&#8221; From operating the warehouse floor to designing the system that makes robotic operation trustworthy.</p><p>The frontier is genuinely open. Nobody has this figured out yet. 
The practitioners who invest in the architecture of meaning &#8212; not just the mechanics of movement &#8212; will define this discipline for the next decade.</p><div class="pullquote"><p><strong>ETL is dead. Long live the Context Architect.</strong></p></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #260]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-260</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-260</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 09 Mar 2026 04:31:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!bayI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!bayI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30259,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bayI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!bayI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!bayI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!bayI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64ae4f4-c033-4325-b742-4775a4ba9a3c_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Best practices for orchestrating Databricks at scale</h1><p>As Databricks deployments scale, a familiar pattern emerges: multiple workspaces, multiple teams, and no reliable way to manage the dependencies between them.<br><br>In this hands-on deep dive, we'll show you how to build a cross-workspace control plane using 
Dagster on top of your existing Databricks environment. Demo-heavy and practitioner-focused, you'll leave with working patterns you can apply to your own platform the same day.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_08_26_data_engineering_weekly">Save your spot now</a></strong></p><div><hr></div><h1>underCurrent: <a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">A one-day conference for data engineers and architects</a></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YgTh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YgTh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 848w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic" width="1456" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16850,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YgTh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 848w, 
https://substackcdn.com/image/fetch/$s_!YgTh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1272w, https://substackcdn.com/image/fetch/$s_!YgTh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F335fca03-45f7-4633-b9bd-be08f47f71c2_2400x1254.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Confluent is hosting a free one-day conference with a catch: there&#8217;s no catch. 
It&#8217;s a single-track event with no sponsors and no product pitches&#8212;just technical talks for data engineers and architects.<br><br>&#127897;&#65039; Speakers include <strong>Joe Reis, Holden Karau, and Max Beauchemin</strong><br>&#128683; No vendors. No sales pitches<br>&#10024; 100% free to attend <br>&#128197; <strong>March 26</strong> <br>&#128205; San Francisco<br>&#127903;&#65039; <strong>Limited to 100 seats</strong> &#8212; <strong><a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">register for free here</a></strong></p><div><hr></div><h1>Vinoth Govindarajan: OpenClaw Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!P3Mc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 424w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 848w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic" width="1456" height="717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:717,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12218,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!P3Mc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 424w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 848w, 
https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1272w, https://substackcdn.com/image/fetch/$s_!P3Mc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75673d5b-29a1-476f-91ff-1b822acd8f08_1456x717.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Production AI agents fail at scale because uncontrolled state mutations corrupt execution and create 
unpredictable behavior. In &#8220;The Agent Stack,&#8221; Vinoth Govindarajan outlines OpenClaw&#8217;s architecture, in which isolated execution contexts and strict invariants prevent state leakage, while sessions enable async pause-resume semantics. The pattern standardizes how teams decouple short-term context from persistent state, ensuring agents reliably rehydrate their mental model and enforce authorization boundaries that gate tool access to user privilege levels.</p><p><strong><a href="https://theagentstack.substack.com/p/openclaw-architecture-part-1-control">Part 1</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-2-concurrency">Part 2</a>, <a href="https://openclawunboxed.com/p/openclaw-architecture-part-3-memory">Part 3.1</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-3-memory">Part 3.2</a>, <a href="https://theagentstack.substack.com/p/openclaw-architecture-part-4-security">Part 4</a></strong></p><div><hr></div><h1>Pinterest: Unified Context-Intent Embeddings for Scalable Text-to-SQL</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EtBR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EtBR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 424w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 848w, 
https://substackcdn.com/image/fetch/$s_!EtBR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1272w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic" width="1400" height="655" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:655,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EtBR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 424w, 
https://substackcdn.com/image/fetch/$s_!EtBR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 848w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1272w, https://substackcdn.com/image/fetch/$s_!EtBR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9528d97e-84de-44dc-bbd5-17be18a55cf4_1400x655.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Navigating sprawling data warehouses forces analysts to choose between slow manual exploration and unreliable keyword-based search. Pinterest Engineering built a production Analytics Agent that embeds historical SQL queries as semantic intent signatures, injecting business glossary terms and extracting structural patterns (join keys, filters, usage signals) to retrieve contextually relevant tables at scale. The system reached 40% internal adoption within two months by standardizing discovery through an asset-first pattern, converting years of institutional SQL knowledge into a searchable, governance-aware library.</p><p><strong><a href="https://medium.com/pinterest-engineering/unified-context-intent-embeddings-for-scalable-text-to-sql-793635e60aac">https://medium.com/pinterest-engineering/unified-context-intent-embeddings-for-scalable-text-to-sql-793635e60aac</a></strong></p><div><hr></div><h1>Francesca Lazzeri: AI evals platforms: A comparative guide for production AI systems</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kftj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kftj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 424w, https://substackcdn.com/image/fetch/$s_!kftj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 
848w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1272w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic" width="753" height="330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:330,&quot;width&quot;:753,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12090,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kftj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 424w, 
https://substackcdn.com/image/fetch/$s_!kftj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 848w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1272w, https://substackcdn.com/image/fetch/$s_!kftj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa98a2a07-932c-4b96-8027-479d5bbc9456_753x330.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Production AI systems fail silently in ways demos never expose, forcing teams to replace manual testing with automated evaluation as the enterprise LLM market scales toward $71.1 billion by 2034. A comparative analysis of six leading eval platforms reveals a consolidation around open standards (OpenTelemetry, OpenInference) and specialized architectures&#8212;Microsoft AI Foundry embeds red-teaming agents into Azure workflows, while Galileo replaces expensive LLM judges with smaller consensus models (Luna) to reduce eval latency. The shift standardizes safety as a structural property of development, enabling teams to catch jailbreaks and data leaks early while choosing platform fit based on stack priorities: simulation-first, research rigor, or ecosystem depth.</p><p><strong><a href="https://medium.com/data-science-at-microsoft/how-do-you-know-your-ai-actually-works-b1a380a07825">https://medium.com/data-science-at-microsoft/how-do-you-know-your-ai-actually-works-b1a380a07825</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uymw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 424w, 
https://substackcdn.com/image/fetch/$s_!uymw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18459,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uymw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!uymw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!uymw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c0d34d0-9f2d-40a3-94e1-f596c56b03df_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI is reshaping how data teams operate. But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_08_26_data_engineering_weekly">Download the free guide</a></strong></p><div><hr></div><h1>Netflix: MediaFM - The Multimodal AI Foundation for Media Understanding at Netflix</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p3o3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p3o3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 424w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 848w, 
https://substackcdn.com/image/fetch/$s_!p3o3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1272w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic" width="1400" height="1172" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1172,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16754,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p3o3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 424w, 
https://substackcdn.com/image/fetch/$s_!p3o3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 848w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1272w, https://substackcdn.com/image/fetch/$s_!p3o3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6c8b569-8c07-4d7d-924e-1cb9ccc84975_1400x1172.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Understanding content at scale requires machine-readable representations that capture narrative structure, not just visual features&#8212;a challenge intensified as streaming catalogs exceed tens of thousands of titles. Netflix built MediaFM, a tri-modal transformer that fuses video frames, audio (wav2vec2), and subtitles into shot-level embeddings using Masked Shot Modeling, with a [GLOBAL] token injecting title-level context (synopsis, genre) to ground each segment. The model powers ad placement, clip ranking, content tagging, and cold-start recommendations by contextualizing shots within narrative sequence, outperforming external benchmarks and enabling machine-readable understanding across Netflix's entire catalog.</p><p><strong><a href="https://netflixtechblog.com/mediafm-the-multimodal-ai-foundation-for-media-understanding-at-netflix-e8c28df82e2d">https://netflixtechblog.com/mediafm-the-multimodal-ai-foundation-for-media-understanding-at-netflix-e8c28df82e2d</a></strong></p><div><hr></div><h1>Nabin Debnath: Building a Least-Privilege AI Agent Gateway for Infrastructure Automation with MCP, OPA, and Ephemeral Runners</h1><p>AI agents in infrastructure automation bypass traditional guardrails by making runtime decisions without human validation, risking silent resource destruction or credential exfiltration at scale. The author describes an Agent Gateway that treats agents as untrusted requesters, layering Model Context Protocol (MCP) for tool discovery, Open Policy Agent (OPA) for intent-based authorization, and ephemeral Kubernetes runners for isolated execution. 
The pattern enforces least privilege by mediating all API calls through policy code, validates plan integrity against immutable hashes, and surfaces decision reasoning via OpenTelemetry&#8212;standardizing agent governance with SLO targets (100ms policy decisions, 5s runner startup) that prevent silent bypasses.</p><p><strong><a href="https://www.infoq.com/articles/building-ai-agent-gateway-mcp/">https://www.infoq.com/articles/building-ai-agent-gateway-mcp/</a></strong></p><div><hr></div><h1>Dropbox: Using LLMs to amplify human labeling and improve Dash search relevance</h1><p>Enterprise search ranking requires massive labeled datasets, but traditional human annotation is prohibitively slow and cannot scale to sensitive content across billions of internal documents. Dropbox Dash uses LLMs as labeling force multipliers: it calibrates them against a small human-labeled set, generates millions of relevance judgments offline, and then trains lightweight production models (XGBoost) on those synthetic labels at scale. The pattern standardizes judgment consistency by pairing contextual research tools (for acronyms and ambiguous queries) with programmatic prompt optimization (DSPy), enabling continuous ranking improvements while keeping human oversight as the ground truth rather than replacing it.</p><p><strong><a href="https://dropbox.tech/machine-learning/llm-human-labeling-improving-search-relevance-dropbox-dash">https://dropbox.tech/machine-learning/llm-human-labeling-improving-search-relevance-dropbox-dash</a></strong></p><div><hr></div><h1>Zalando: Why We Ditched Flink Table API Joins: Cutting State by 75% with DataStream Unions</h1><p>Declarative SQL joins in Flink multiply state across operators, forcing teams to choose between snapshot overhead and operational instability&#8212;a scaling bottleneck for pipelines enriching millions of real-time product records. 
Zalando replaced chained Table API joins with a custom KeyedProcessFunction that unions all streams into a single keyed DataStream, storing each product&#8217;s enriched state once in RocksDB instead of redundantly across join operators. The shift cut state size by 75% (235GB to 56GB), reduced snapshot time by 77% (11 minutes to 2.5 minutes), and lowered AWS costs by 13%&#8212;demonstrating how imperative control over stream topology recovers efficiency when declarative abstractions misalign with physical execution.</p><p><strong><a href="https://engineering.zalando.com/posts/2026/03/why-we-ditched-flink-table-api-joins-cutting-state.html">https://engineering.zalando.com/posts/2026/03/why-we-ditched-flink-table-api-joins-cutting-state.html</a></strong></p><div><hr></div><h1>Aihua Xu &amp; Andrew Lamb: Variant Type in Apache Parquet for Semi-Structured Data</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4lQG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4lQG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 424w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 848w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic" width="1024" height="633" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ea671ece-9073-414d-adc3-952731dc5248_1024x633.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:633,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53802,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/190348672?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4lQG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 424w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 848w, 
https://substackcdn.com/image/fetch/$s_!4lQG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1272w, https://substackcdn.com/image/fetch/$s_!4lQG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fea671ece-9073-414d-adc3-952731dc5248_1024x633.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Semi-structured data in columnar formats forces a choice between slow JSON parsing or rigid schemas 
that block evolution, creating friction in pipelines handling heterogeneous records. Apache Parquet&#8217;s new Variant type uses binary-encoded metadata plus value fields, enabling direct nested field access without full-document parsing while preserving native types (timestamps, integers) that JSON loses. The type standardizes schema flexibility through &#8220;shredding&#8221;&#8212;extracting hot fields into strongly typed columns for predicate pushdown and pruning&#8212;allowing heterogeneous records to coexist in one column, reducing migration overhead and accelerating adoption across DuckDB, Spark 4.0, and Snowflake.</p><p><strong><a href="https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/">https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/</a></strong></p><div><hr></div><h1>Pranav Mehta: Silent Data Loss in ClickHouse: 3 Reasons Your Distributed Queue Keeps Growing</h1><p>ClickHouse distributed inserts silently fail when coordination-service downtime, execution timeouts, or concurrency limits block the async flush pipeline, leaving data trapped in on-disk queues while clients receive no error signals. The author identifies three failure modes: <em>Keeper/ZooKeeper downtime forcing ReplicatedMergeTree tables into read-only mode, oversized insert blocks exceeding max_execution_time that cork sequential queue processing, and exhausted user concurrency slots starving background INSERT workers</em>. 
The pattern demands proactive monitoring of DistributedFilesToInsert (alert at 50+ files), debugging via system.distribution_queue.last_exception, and inode-aware filesystem choice (XFS over ext4) to prevent silent data loss and system crashes from queue explosion.</p><p><strong><a href="https://medium.com/@pranavmehta94/silent-data-loss-in-clickhouse-3-reasons-your-distributed-queue-keeps-growing-9bf6b8af88e5">https://medium.com/@pranavmehta94/silent-data-loss-in-clickhouse-3-reasons-your-distributed-queue-keeps-growing-9bf6b8af88e5</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #259]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-259</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-259</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 02 Mar 2026 03:57:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OlRC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24006,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OlRC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!OlRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06da4e96-6d18-41c1-95b5-bd22999807a9_2400x1260.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>AI is moving fast. Is your data platform ready?</h1><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br><br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=03_01_26_data_engineering_weekly">Download the AI Modernization Guide</a></strong></p><div><hr></div><h1>underCurrent: <a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">A one-day conference for data engineers and architects</a></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Ad5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 848w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic" width="1456" height="761" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16850,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Ad5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 424w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 848w, 
https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1272w, https://substackcdn.com/image/fetch/$s_!2Ad5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe2f1aec-0512-4572-8eee-f14bf7b735ee_2400x1254.heic 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Confluent is hosting a free one-day conference with a catch: there&#8217;s no catch. 
It&#8217;s a single-track event with no sponsors and no product pitches&#8212;just technical talks for data engineers and architects.<br><br>&#127897;&#65039; Speakers include <strong>Joe Reis</strong>, <strong>Holden Karau</strong>, and <strong>Max Beauchemin</strong><br>&#128683; No vendors. No sales pitches<br>&#10024; 100% free to attend <br>&#128205; San Francisco &#128197; March 26 <br>&#127903;&#65039; Limited to 100 seats &#8212; register for free <strong><a href="https://current.confluent.io/data-engineers?utm_campaign=tm.devx_cd.underCurrent&amp;utm_source=newsletter&amp;utm_medium=dew">here</a></strong></p><div><hr></div><h1>Netflix: DataJunction as Netflix&#8217;s answer to the missing piece of the modern data stack</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VaGI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VaGI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 424w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 848w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp" width="512" height="354" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VaGI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 424w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 848w, 
https://substackcdn.com/image/fetch/$s_!VaGI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1272w, https://substackcdn.com/image/fetch/$s_!VaGI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0868d85-4037-430d-95f6-47be9cb1fba8_512x354.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Metric inconsistency and definition sprawl across distributed teams create onboarding bottlenecks and 
fragment analytics workflows. Netflix built DataJunction, an open-source semantic layer that decouples metric definitions from compute through a graph-based metadata model and SQL generation engine. This standardizes metrics across the experimentation platform, reducing onboarding from weeks to hours, while enabling expansion across all business verticals and LLM integration for auditable metric lineage.</p><p><strong><a href="https://netflixtechblog.medium.com/datajunction-as-netflixs-answer-to-the-missing-piece-of-the-modern-data-stack-92af926b40a5">https://netflixtechblog.medium.com/datajunction-as-netflixs-answer-to-the-missing-piece-of-the-modern-data-stack-92af926b40a5</a></strong></p><div><hr></div><h1>Benoit Pimpaud: Specs Should Be Equations, Not Essays</h1><p>As AI automates code generation, the engineering bottleneck shifts from writing implementation to defining precise specifications. The author argues that natural language specifications create compounding ambiguity when parsed by LLMs and proposes layered specifications that combine text, diagrams, and mathematical notation as constraint definitions for AI iteration. 
Mathematical specs eliminate interpretation drift, enabling AI agents to generate correct programs by satisfying invariants rather than reconstructing intent from prose.</p><p><strong><a href="https://fromanengineersight.substack.com/p/specs-should-be-equations-not-essays">https://fromanengineersight.substack.com/p/specs-should-be-equations-not-essays</a></strong></p><div><hr></div><h1>Notion: Balancing cost and reliability for Spark on Kubernetes</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8jTD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8jTD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 424w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 848w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1272w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic" width="616" height="316" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:316,&quot;width&quot;:616,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8jTD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 424w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 848w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1272w, https://substackcdn.com/image/fetch/$s_!8jTD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F209221f3-17f2-4695-93d1-fafd90f239c4_616x316.heic 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Spark clusters on Kubernetes face a fundamental tension between aggressive cost optimization through spot instances and job reliability during capacity interruptions. Notion reduced compute costs by 60&#8211;90% through EKS migration with Karpenter bin-packing, then open-sourced Spot Balancer&#8212;a Kubernetes webhook that enforces stable spot-to-on-demand ratios per job, preventing cascade failures during AWS termination windows. 
Spot Balancer abstracts infrastructure trade-offs into developer-friendly stability tiers, enabling teams to optimize costs without sacrificing job completion rates.</p><p><strong><a href="https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes">https://www.notion.com/blog/balancing-cost-and-reliability-for-spark-on-kubernetes</a></strong></p><div><hr></div><h1>Sponsored: Building a Cross-Workspace Control Plane for Databricks</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uy7d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24982,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uy7d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!uy7d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uy7d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f39374-3d44-46e6-a142-145a1b94fd31_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As Databricks deployments scale, a familiar pattern emerges: multiple workspaces, multiple teams, 
and no reliable way to manage the dependencies between them.<br>In this hands-on deep dive, we'll show you how to build a cross-workspace control plane using Dagster on top of your existing Databricks environment. Demo-heavy and practitioner-focused, you'll leave with working patterns you can apply to your own platform the same day.</p><p><strong><a href="https://dagster.io/events/deep-dive-building-a-cross-workspace-control-plane-for-databricks?utm_campaign=39250579-26-03-WBNR_Deep_DIVE_DATABRICKS&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_databricks&amp;utm_content=03_01_26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Apache Iceberg: Introducing the Apache Iceberg File Format API</h1><p>Support for a pluggable file format API spec is an exciting development in Iceberg. As we increasingly handle unstructured data, this will significantly enhance data management practices through unified governance and compliance. Interestingly, Apache Hudi&#8217;s <strong><a href="https://github.com/apache/hudi/issues/14127">RFC-100</a></strong> is, in fact, the feature request to support the Lance File Format. </p><p><strong><a href="https://iceberg.apache.org/blog/apache-iceberg-file-format-api/">https://iceberg.apache.org/blog/apache-iceberg-file-format-api/</a></strong></p><div><hr></div><h1>Delta Lake: The next evolution of Delta - Catalog-Managed Tables</h1><blockquote><p><em>We went through the full cycle, from exposing the files directly through Hadoop to Snowflake-style cloud data warehouses, to Iceberg-style direct file access, back to catalog-managed tables. </em></p></blockquote><p>Nonetheless, it will be interesting to watch DuckLake-style catalog-managed tables vs object-store-style managed tables. 
</p><p><strong><a href="https://delta.io/blog/2026-02-02-delta-catalog-managed-tables/">https://delta.io/blog/2026-02-02-delta-catalog-managed-tables/</a></strong></p><div><hr></div><h1>Microsoft Fabric: Under the hood: an introduction to the Native Execution Engine for Microsoft Fabric</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ah5O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ah5O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 424w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 848w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1272w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic" width="496" height="465" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:465,&quot;width&quot;:496,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7782,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ah5O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 424w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 848w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1272w, https://substackcdn.com/image/fetch/$s_!ah5O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F962b9eef-68ea-4abd-914f-bf0319e0ec6e_496x465.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The Apache Gluten project continues to make an impact on the Spark ecosystem, bringing native-execution optimizations and efficiency gains. Microsoft Fabric shares the under-the-hood story of adopting Apache Gluten in its platform. 
</p><p><strong><a href="https://blog.fabric.microsoft.com/en-us/blog/under-the-hood-an-introduction-to-the-native-execution-engine-for-microsoft-fabric/">https://blog.fabric.microsoft.com/en-us/blog/under-the-hood-an-introduction-to-the-native-execution-engine-for-microsoft-fabric/</a></strong></p><div><hr></div><h1>Pinterest: Piqama - Pinterest Quota Management Ecosystem</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WV0P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WV0P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 424w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 848w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1272w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic" width="1400" height="701" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17308,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WV0P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 424w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 848w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1272w, https://substackcdn.com/image/fetch/$s_!WV0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7dbf8e98-63fe-4f9c-b006-22a2778b6ed6_1400x701.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>As companies scale, manual and static quota systems become bottlenecks, forcing engineers to choose between over-provisioning resources and managing brittle enforcement logic. Pinterest developed Piqama, a unified quota platform that dynamically right-sizes limits using historical data stored in Apache Iceberg, then applies custom enforcement strategies across batch schedulers and online services. 
Piqama centralizes resource governance across hardware and service metrics, enabling teams to optimize capacity allocation while linking consumption directly to financial costs.</p><p><strong><a href="https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5">https://medium.com/pinterest-engineering/piqama-pinterest-quota-management-ecosystem-dc7881433bf5</a></strong></p><div><hr></div><h1>LinkedIn: Engineering LinkedIn&#8217;s job ingestion system at scale</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ee5n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 424w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 848w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic" width="1456" height="601" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ee5n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 424w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 848w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ee5n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7676ea75-a146-4fd6-8136-bf59759a6822_1920x792.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Ingestion systems struggle to scale source onboarding&#8212;hard-coded extraction logic creates engineering bottlenecks that slow integration of new data partners. LinkedIn shifted extraction logic from code to configuration files called Sitemaps, enabling AI tools and browser plugins to onboard sources without engineering deployments. At the same time, a transactional state machine enforces precise failure boundaries across parallel mining tasks. 
The configuration-driven approach reduces onboarding time from weeks to hours, allowing LinkedIn to ingest 20TB daily across thousands of global sources. </p><p><strong><a href="https://www.linkedin.com/blog/engineering/infrastructure/engineering-linkedins-job-ingestion-system-at-scale">https://www.linkedin.com/blog/engineering/infrastructure/engineering-linkedins-job-ingestion-system-at-scale</a></strong></p><div><hr></div><h1>Shopify: The generative recommender behind Shopify&#8217;s commerce engine</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jh6b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jh6b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 424w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 848w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1272w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic" width="1456" height="886" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13795,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/189610818?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jh6b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 424w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 848w, https://substackcdn.com/image/fetch/$s_!jh6b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!jh6b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f4c783-688f-493d-bd0b-e232f8014d34_1965x1196.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Recommendation systems traditionally treat purchases as isolated events, missing the temporal and causal structure that shapes buyer journeys across millions of products. 
Shopify transitioned to an autoregressive sequence model that treats commerce journeys as token sequences, implementing RoPE-inspired rotary encoding combined with relative attention bias to capture temporal gaps and seasonality across its catalog. The time-aware attention mechanism drove +0.94% order growth and a +0.71% conversion lift while achieving a 7.3x training speedup through optimized CUDA kernels, enabling Shopify to integrate richer context into a unified generative framework.</p><p><strong><a href="https://shopify.engineering/generative-recommendations">https://shopify.engineering/generative-recommendations</a></strong></p><div><hr></div><h1>Alibaba: PostgreSQL Blink-tree Implementation</h1><p>As we increasingly use AI to code, understanding database internals is more critical than ever. Alibaba Cloud engineers break down how PostgreSQL uses the <strong><a href="https://pages.cs.wisc.edu/~yxy/cs764-f22/slides/L15.pdf">Blink-tree</a></strong> architecture to achieve massive concurrency. By adding link pointers to sibling nodes and high keys to mark boundaries, PostgreSQL allows searches to proceed without lock-coupling. This enables the system to gracefully handle concurrent page splits&#8212;following links when data exceeds old boundaries&#8212;and significantly outperforms the more rigid <strong><a href="https://kernelmaker.github.io/MySQL-Lock-1">lock-subtree approach</a></strong> used in MySQL&#8217;s InnoDB.</p><p><strong><a href="https://www.alibabacloud.com/blog/postgresql-blink-tree-implementation_602913">https://www.alibabacloud.com/blog/postgresql-blink-tree-implementation_602913</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering After AI]]></title><description><![CDATA[Moving Data Was Never the Point. Meaning It Is.]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-after-ai</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-after-ai</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Tue, 24 Feb 2026 03:03:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2da46ceb-78fd-4718-9ccb-7afb113096ec_1154x486.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days back, I ran a LinkedIn poll asking what stays core to software engineering as AI increasingly writes the code. 53% said architecture and trade-offs. 20% said quality and ownership, and 25% said product and problem discovery.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uwq8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uwq8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 424w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 848w, 
https://substackcdn.com/image/fetch/$s_!uwq8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1272w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png" width="948" height="610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:610,&quot;width&quot;:948,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:89112,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uwq8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 424w, 
https://substackcdn.com/image/fetch/$s_!uwq8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 848w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1272w, https://substackcdn.com/image/fetch/$s_!uwq8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e82120a-d802-4411-8a58-914f27f6ef24_948x610.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The poll wasn&#8217;t specifically about data engineering, but the answer it yielded applies directly to us. When AI can generate a pipeline as fluently as a senior engineer, the question isn&#8217;t whether our toolbox is changing &#8212; it clearly is. The question is what kind of thinking has always been too important to automate, and why we let it get buried under the more mechanical work in the first place.</p><p>My answer is that the irreducible work was never about moving data. It was always about meaning. And the framework we&#8217;ve been using &#8212; ETL &#8212; was never really designed to capture meaning.</p><div><hr></div><h1>The ETL Era and Why It&#8217;s Ending</h1><p>Extract, Transform, Load made sense as a job description for a specific historical moment. Source systems were siloed, formats were inconsistent, and somebody had to write the code that moved data from where it lived to where it could be used. 
The data engineer was that somebody.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KwLr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KwLr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png" width="1228" height="346" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1228,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:497725,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KwLr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!KwLr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffa2bc94f-81e7-4ad2-94fe-b09ee17e9668_1228x346.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>But if we&#8217;re honest, the transformation step was always the most brittle part. Teams encoded business rules as SQL logic or Python functions, buried them in pipeline code, version-controlled them alongside infrastructure, but rarely treated them with the same rigor as application code. When the definition of &#8220;active user&#8221; changed &#8212; and it always changed &#8212; someone had to find every place that definition lived and update it, hoping they caught them all.</p><p>AI is now competent at generating this kind of code. Not perfect, but competent enough that the mechanical work of pipeline construction is no longer a meaningful differentiator. 
If your professional identity is built around being good at writing transformation logic, that identity is under pressure.</p><p>But this isn&#8217;t a story about loss. It&#8217;s a story about clarity. The mechanical work was always obscuring the more important work underneath it. AI forcing that reckoning is, in a strange way, a gift.</p><div><hr></div><h1>Introducing ECL &#8212; Extract, Contextualize, Link</h1><p>The framework emerging as a replacement isn&#8217;t a technical architecture so much as a reorientation of purpose. Instead of Extract, Transform, Load, think Extract, Contextualize, Link.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gXAy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gXAy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 424w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 848w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1272w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png" width="1280" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:972606,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gXAy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 424w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 848w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1272w, https://substackcdn.com/image/fetch/$s_!gXAy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F500e7461-2805-49c0-9833-d0b54e04421e_1280x528.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Extract remains. Data still needs to move from source systems to analytical environments, and that work still requires engineering judgment &#8212; about reliability, latency, volume, and failure modes. AI will increasingly handle the mechanical parts, but the architectural decisions about what to extract, when, and how belong to people who understand both the source systems and the downstream consequences.</p><p>Contextualize is where the real shift happens. 
This is the work of giving data semantic meaning &#8212; understanding that &#8220;revenue&#8221; is calculated differently by Finance and Sales, that a timestamp in a clickstream event means something different than a timestamp in a billing record, that a null value in one system represents the absence of information while in another it represents an explicit user choice. AI can draft this work at scale &#8212; inferring field definitions, classifying entities, and mapping relationships across a data landscape that no human team could manually annotate in full. What AI cannot do is be accountable for itself. The judgment of whether an inference is correct, the organizational authority to declare a definition, the decision to formalize a discovered pattern into an enforced contract &#8212; that belongs to humans. Contextualize is where AI inference and human judgment meet, structured by a pipeline built specifically for that purpose.</p><p>Link is about entity relationships across the data landscape &#8212; connecting a customer record in your CRM to a user record in your product database, linking an event in your analytics system to a session in your support tool. As AI generates more of the code that consumes data, the ability to reason about how entities relate across systems becomes more valuable, not less. Linkage is what makes context portable &#8212; what allows the meaning built in one part of the landscape to be grounded in its relationships to the rest.</p><p>The rest of this article discusses how ECL works at the architectural level, not as three abstract concepts, but as three concrete pipelines &#8212; and why you need all of them.</p><div><hr></div><h1>Early Binding &#8212; Contracts as Executable Constraints</h1><p>The first technique is early binding: capturing semantic intent at the point of data production, before the data moves.</p><p>Data contracts are the practical implementation of this idea. 
At their core, contracts are agreements between data producers and their consumers &#8212; specifying schema, data quality expectations, ownership, and the semantic meaning of each field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!g3D-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!g3D-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 424w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 848w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1272w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png" width="1234" height="404" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddbb651c-b309-4861-ba49-0e142c836729_1234x404.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:404,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:752776,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!g3D-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 424w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 848w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1272w, https://substackcdn.com/image/fetch/$s_!g3D-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddbb651c-b309-4861-ba49-0e142c836729_1234x404.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Data Engineering Weekly identified this gap precisely in their piece <em><strong><a href="https://www.dataengineeringweekly.com/p/data-contracts-a-missed-opportunity">Data Contracts: A Missed Opportunity</a></strong></em>. While the data industry was debating what contracts were and drafting governance frameworks to describe them, software engineering had quietly converged on a different organizing principle: treating specifications as executable constraints with real failure semantics. The data industry treated contracts as documentation. 
Software engineers treated them as interfaces &#8212; things that could break, that had versioning implications, that enforced behavior rather than merely describing it.</p><p>A data contract that lives in a wiki and gets updated when someone remembers is documentation. A data contract that is enforced at the point of production &#8212; that fails a pipeline when a schema changes without notice, that alerts a consumer when quality thresholds are violated, that an AI agent can reason about deterministically &#8212; that is architecture.</p><p>This matters more in an AI-heavy world, not less. When AI agents generate transformation code, bad contracts are amplified at scale. The agent will faithfully implement whatever logic it&#8217;s given; if the contract governing its inputs is ambiguous or unenforced, the errors it produces will be systematic rather than isolated. Early binding is the mechanism by which human intent gets formalized into something AI can actually work with.</p><p>But early binding alone has a fundamental limitation. 
And understanding that limitation is what makes the Contextualize pipeline necessary.</p><div><hr></div><h1>The Problem Early Binding Alone Can&#8217;t Solve</h1><p>Consider what happens to a well-contracted dataset as it moves through a modern Medallion architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vwwM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vwwM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 424w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 848w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1272w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png" width="1237" height="321" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:321,&quot;width&quot;:1237,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:448249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vwwM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 424w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 848w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1272w, https://substackcdn.com/image/fetch/$s_!vwwM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ca03c22-9a4d-4a7d-a244-279af04bee7f_1237x321.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>At the Bronze layer, data lands close to its source &#8212; raw, minimally transformed, the contract&#8217;s guarantees still largely intact. Silver applies conformance rules: deduplication, type casting, and light standardization. By the time data reaches Gold, the pipeline has made a series of editorial decisions on the data&#8217;s behalf. Aggregations collapse granular events into metrics. Engineers bake business logic into the shape of the table. The Gold layer is an artifact optimized for a specific set of questions &#8212; the ones that seemed important when the pipeline was built.</p><p>Early binding contracts help at the source, but they can&#8217;t prevent this erosion at every subsequent hop &#8212; especially when those contracts are treated as descriptive rather than executable. 
If there&#8217;s no enforcement mechanism preventing meaning from drifting across transformations, the telephone game plays out silently in your pipeline. By the time a consumer queries the Gold layer, they&#8217;re working with an artifact whose original intent may be several editorial decisions removed from the contract.</p><p>This is the problem that early binding alone cannot solve. Each transformation layer progressively collapses the context captured at the source. You need a complementary approach&#8212;one that preserves the ability to recover context when it&#8217;s actually needed.</p><div><hr></div><h1>Late Binding &#8212; The Agentic Contextualized Pipeline</h1><p>Traditional late binding deferred the <em>application</em> of business rules to query time. What it didn&#8217;t defer was the <em>definition</em> of those rules &#8212; domain experts still had to specify them upfront, just applied through a semantic layer rather than baked into a physical table. In complex domains, that knowledge engineering process was its own bottleneck.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C5NB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C5NB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 424w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 848w, 
https://substackcdn.com/image/fetch/$s_!C5NB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1272w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png" width="1300" height="378" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:378,&quot;width&quot;:1300,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:733215,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C5NB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 424w, 
https://substackcdn.com/image/fetch/$s_!C5NB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 848w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1272w, https://substackcdn.com/image/fetch/$s_!C5NB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3e69cc6d-3401-44c7-b2d7-9be8a00d3380_1300x378.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The more forward-looking approach is to defer definition itself &#8212; and hand that work to a dedicated pipeline.</p><p>The Contextualize pipeline is a separate, agentic pipeline that runs alongside your data infrastructure. Its job is singular: build and maintain a living, validated store of semantic context. It isn&#8217;t part of the Extract pipeline. It isn&#8217;t a query-time process. It&#8217;s a first-class engineering artifact with its own triggering model, validation layer, and storage.</p><p>The trigger is event-driven, not scheduled. Every new dataset that lands automatically kicks off the pipeline. For existing datasets, continuous profiling monitors for meaningful changes &#8212; a new column appears, a column is dropped, a data distribution shifts in ways that suggest something changed upstream. Any of these events re-triggers the pipeline for the affected entities. Semantic context isn&#8217;t a one-time annotation exercise. It tracks the data as it evolves.</p><p>The pipeline itself is agentic. An AI agent analyzes the incoming data &#8212; schema, sample values, statistical profiles, lineage &#8212; and infers semantic meaning. What does this field represent? What business entity does it belong to? What relationships exist between it and other data in the landscape? 
It produces structured, versioned context artifacts: inferences about meaning that didn&#8217;t require a domain expert to pre-specify every scenario.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2K_z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2K_z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 424w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 848w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1272w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png" width="1129" height="464" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1129,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:646894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2K_z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 424w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 848w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1272w, https://substackcdn.com/image/fetch/$s_!2K_z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1dad946a-2f01-4407-ac7b-1ec7acbbf60b_1129x464.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>Those inferences don&#8217;t automatically commit. They route to a validation layer that works like a labeling workflow &#8212; because structurally, it is one. An LLM-as-Judge validates high-confidence inferences before any human review is triggered. Medium-confidence ones surface to domain experts for labeling. The pipeline flags low-confidence or contested inferences for deeper investigation. The humans aren&#8217;t reviewing every artifact; they&#8217;re reviewing the uncertain ones. Every labeling automation technique that works in ML pipelines applies here.</p><p>Validated artifacts land in a Context Store &#8212; a dedicated, versioned, queryable store of semantic definitions, entity classifications, and relationship maps. This is the new infrastructure component that ECL requires. 
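</p><p>A minimal sketch of that routing step and of a versioned store, in Python. The thresholds, the <code>Inference</code>, <code>Route</code>, and <code>ContextStore</code> names, and the in-memory storage are all assumptions made for illustration; the article does not prescribe an implementation.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class Route(Enum):
    AUTO_VALIDATE = "llm_judge"        # high confidence: LLM-as-Judge validates
    EXPERT_LABEL = "expert_labeling"   # medium confidence: domain expert labels
    INVESTIGATE = "investigation"      # low confidence or contested


@dataclass(frozen=True)
class Inference:
    entity: str          # hypothetical identifier, e.g. "orders.amt"
    meaning: str         # the inferred semantic definition
    confidence: float    # agent-reported score between 0 and 1
    contested: bool = False


# Hypothetical thresholds; the article does not give numeric values.
HIGH, MEDIUM = 0.9, 0.6


def route(inf: Inference) -> Route:
    """Pick a validation path from the confidence of an inference."""
    if inf.contested:
        return Route.INVESTIGATE
    if inf.confidence >= HIGH:
        return Route.AUTO_VALIDATE
    if inf.confidence >= MEDIUM:
        return Route.EXPERT_LABEL
    return Route.INVESTIGATE


@dataclass
class ContextStore:
    """In-memory stand-in for a versioned, queryable store of validated context."""
    _versions: dict = field(default_factory=dict)

    def commit(self, inf: Inference) -> int:
        history = self._versions.setdefault(inf.entity, [])
        history.append(inf)
        return len(history)  # version number, starting at 1

    def latest(self, entity: str):
        history = self._versions.get(entity, [])
        return history[-1] if history else None


store = ContextStore()
candidate = Inference("orders.amt", "order total in USD", confidence=0.95)
if route(candidate) is Route.AUTO_VALIDATE:
    # A real pipeline would call the LLM judge here; approval is assumed.
    store.commit(candidate)
```

<p>In this sketch each commit adds a new version rather than overwriting the previous definition, which is one simple way to keep the store auditable.</p><p>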
Downstream agents don&#8217;t query raw data and infer meaning on the fly. They query the Context Store first, ground their understanding in validated context, and then query the data. The context is stable, reusable, and auditable &#8212; the opposite of ephemeral query-time inference.</p><div><hr></div><h1>Early Binding vs Late Binding &#8212; When to Choose What</h1><p>The decision criterion isn&#8217;t about semantic maturity or how well-understood a domain is. It&#8217;s about where the data comes from relative to your accountability boundary.</p><p>When a dataset originates within a controlled environment &#8212; produced by a team or system within your organization&#8217;s sphere of accountability &#8212; early binding is the right tool. The producer and consumer share an organizational context. Contracts can be negotiated, enforced, and held to. The producing team can be made accountable for the schema they declare and the semantics they commit to. Prescribed context is possible because the relationship that makes it enforceable exists.</p><p>When a dataset originates outside that boundary &#8212; third-party feeds, partner data, public datasets, marketplace sources &#8212; that relationship doesn&#8217;t exist. You cannot hold an external provider to a data contract. The schema can change without notice. The semantics are inferred, not declared. This is where the Contextualize pipeline earns its place. Discovered context is the only kind available.</p><p>But the boundary is not purely organizational. Poorly governed internal data &#8212; produced by a team with no accountability to its consumers, with undocumented schemas and inconsistent definitions &#8212; is effectively uncontrolled even if it sits within the same organization. The real test is not position on an org chart. It is accountability. Early bind where accountability exists. Let the Contextualize pipeline discover where it doesn&#8217;t.</p><p>The feedback loop holds in both directions. 
Discovered context built up through repeated profiling, inference, and validation can graduate into prescribed context over time. An external dataset that your organization ingests consistently enough to profile, validate, and republish as an internal data product crosses the boundary from uncontrolled to controlled at that point. The Contextualize pipeline is what makes that transition possible &#8212; and makes the resulting contract trustworthy rather than assumed.</p><p>A data environment that treats all data as early-bindable is brittle. It can only contract what it already understands, and it has no mechanism for the uncontrolled data that makes up a growing share of the analytical landscape. A data environment that treats all data as requiring discovery never formalizes what it learns into enforceable guarantees. The architecture that works reads the accountability boundary correctly and applies the right technique on both sides.</p><div><hr></div><h1>Context Propagation &#8212; The Relay, Not the Pipeline</h1><p>With three pipelines now in play, the question becomes: how does context actually travel through the architecture without getting lost?</p><p>The conventional mental model is wrong. Context doesn&#8217;t travel <em>through</em> the data pipeline&#8212;if it did, it would be lost at every transformation step, which is precisely the Medallion erosion problem. Context travels <em>alongside</em> the pipeline, as metadata, lineage records, and contract provenance. The transformations change the data; the metadata preserves the meaning.</p><p>The relay works like this. Early binding stamps prescribed context at the point of origin &#8212; schema, field-level semantics, producing team ownership, quality thresholds &#8212; as an executable contract living in metadata, not in column values. 
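</p><p>As a rough illustration of that stamping step, the sketch below attaches a contract to a dataset as metadata and carries it, together with a lineage record, through two transformations. <code>Contract</code>, <code>Dataset</code>, <code>transform</code>, and the sample values are hypothetical names chosen for this example, not part of any specific lineage tool.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Prescribed context stamped at the source (all fields illustrative)."""
    owner: str
    schema: dict              # column name -> declared type
    semantics: dict           # column name -> business meaning
    max_null_fraction: float  # example quality threshold


@dataclass
class Dataset:
    rows: list
    contract: Contract                          # travels as metadata, never as a column
    lineage: list = field(default_factory=list)  # names of the steps applied so far


def transform(ds: Dataset, step: str, fn) -> Dataset:
    """Apply fn to the rows; carry the contract and extend the lineage record."""
    return Dataset(
        rows=fn(ds.rows),
        contract=ds.contract,
        lineage=ds.lineage + [step],
    )


bronze = Dataset(
    rows=[{"amt": 10.0}, {"amt": None}, {"amt": 5.5}],
    contract=Contract(
        owner="payments-team",
        schema={"amt": "float"},
        semantics={"amt": "order total in USD"},
        max_null_fraction=0.5,
    ),
)
silver = transform(
    bronze, "drop_null_amt",
    lambda rows: [r for r in rows if r["amt"] is not None],
)
gold = transform(
    silver, "sum_amt",
    lambda rows: [{"total_amt": sum(r["amt"] for r in rows)}],
)
```

<p>The Gold rows no longer contain the original column, yet the contract and the list of applied steps still describe where the value came from.</p><p>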
Lineage tooling propagates this through Bronze, Silver, and Gold, maintaining a record of the transformations applied and the contract that governed the data at each stage. The Contextualize pipeline reads that lineage as part of its inference process &#8212; understanding not just what a field looks like today, but also the history of how it arrived and the commitments made about it at the source. Validated inferences land in the Context Store, which becomes the relay&#8217;s destination: a durable, queryable record of what the data means, grounded in both original contract and accumulated lineage.</p><p>The analogy that makes this concrete is git. A file can be modified heavily across dozens of commits &#8212; refactored, renamed, moved, rewritten &#8212; but the context of how it got there is never lost, because it lives in the commit history, not in the file itself. The Gold layer is the latest commit. The lineage graph is the git log. The Context Store is the understanding you build by reading that log systematically rather than hoping the current file tells the whole story.</p><p>This reframe &#8212; from pipeline to relay &#8212; changes what data engineers are actually responsible for building. The transformations are increasingly automatable. The metadata infrastructure, the lineage graph, the Contextualize pipeline that reads it, the Context Store that accumulates from it &#8212; that is the engineering surface that requires sustained human judgment.</p><div><hr></div><h1>The Context Store as the New Engineering Surface</h1><p>Which brings us to where the most consequential engineering work has migrated.</p><p>The Context Store is where business definitions live &#8212; not as documentation in a wiki, not as logic engineers have baked into a Gold table, but as versioned, validated artifacts that downstream systems can query and trust. 
This is where the validation workflow resolves the competing interpretations of &#8220;revenue&#8221; from Finance and Sales &#8212; not through organizational politics, but through a confidence-based process that determines which inference earns formalization. It is also where AI consumers find the grounded, stable context they need to act reliably rather than reverting to ad hoc inference.</p><p>This surface distinguishes queryable data from trustworthy data. A table can be perfectly partitioned, indexed, and replicated while being semantically wrong &#8212; built on a definition that drifted from its source contract three transformations ago and was never caught because no Contextualize pipeline was watching. The Context Store is where that failure mode gets closed.</p><p>As AI generates more transformation code and AI agents consume more data at scale, the stakes of this surface rise. An agent operating on a stale or conflicting context artifact produces systematic errors rather than recoverable ones. The engineering work that governs trustworthiness &#8212; designing the trigger model for the Contextualize pipeline, structuring the labeling workflow, deciding what validation confidence threshold earns formalization, and versioning context artifacts as definitions evolve &#8212; requires human judgment at every step.</p><p>Practitioners are still working out the patterns for doing this at scale. The tooling is maturing. How organizations govern ownership of the Context Store, adjudicate conflicts between teams, and manage the graduation from discovered to prescribed context are genuinely open questions. This is where the frontier actually is.</p><div><hr></div><h1>The New Data Engineer &#8212; Context Architect</h1><p>Return to the poll. 53% said architecture and trade-offs are what remain irreducibly human. 
In the data engineering context, ECL is what that looks like in practice.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_X4x!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_X4x!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 424w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 848w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1272w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png" width="1154" height="486" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1154,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:784728,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188977018?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_X4x!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 424w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 848w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1272w, https://substackcdn.com/image/fetch/$s_!_X4x!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c872ba5-b9fe-4282-a931-32886719e1d5_1154x486.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The data engineer of the next decade owns the architecture of meaning. They design the contractual foundations at the source&#8212;executable, enforceable, versioned. They build the lineage infrastructure that carries context through every transformation layer without losing it. They design and govern the Contextualize pipeline and the Context Store &#8212; the infrastructure where inferences get built, validated, and formalized into the definitions that everything downstream depends on. They understand when to prescribe context upfront and when to let it be discovered, and they build the systems that make both possible.</p><p>But this is not only a technical role. Context erosion is as much an organizational failure as a technical one. 
Teams don&#8217;t share semantic definitions because no ownership model incentivizes them to do so. Nobody enforces contracts because producing teams have no accountability to the consumers they serve. In this new frame, the data engineer is the person who builds both the technical system and the organizational agreement that holds it together. They sit at the intersection of architecture and coordination &#8212; the two things the poll respondents correctly identified as irreducibly human.</p><p>The title &#8220;Data Engineer&#8221; may need an update. What we are actually describing is a Context Architect &#8212; someone whose primary material is not data movement but data meaning, not pipelines but provenance, not transformation logic but the semantic infrastructure that makes transformation logic trustworthy.</p><div><hr></div><h1>An Open Frontier</h1><p>I want to be honest about what ECL is and what it isn&#8217;t. It is a reorientation &#8212; a way of thinking about what the work actually is, now that AI is handling more of what the work used to look like. It is not a finished methodology. The tooling that links early binding contracts to the Contextualize pipeline and Context Store is still maturing. The organizational patterns for governing who owns the Context Store, how conflicts between teams get adjudicated, and how discovered context earns formalization don&#8217;t yet have established templates. Right now, practitioners are working out the engineering patterns for building Contextualize pipelines that operate reliably at scale in production, figuring it out as they go.</p><p>That&#8217;s precisely what makes this moment worth paying close attention to. The frontier is genuinely open. 
The practitioners who invest in the architectural and organizational work of context &#8212; who treat contracts as executable infrastructure, who build lineage as a first-class engineering concern, who govern the Contextualize pipeline and Context Store as seriously as they once owned the ETL pipeline &#8212; will define the discipline for the decade ahead.</p><p>The 53% who said architecture and trade-offs are irreducibly human were right. We didn&#8217;t yet know which architecture, or which trade-offs.</p><p>Now we do.</p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #258]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-257-19d</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-257-19d</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 23 Feb 2026 01:00:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pcFF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pcFF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!pcFF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30ec3cc0-0f2e-4483-bb91-5847d16f9c12_2400x1260.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>AI is moving fast. Is your data platform ready?</h1><p>AI is reshaping how data teams operate. 
But legacy pipelines, brittle workflows, and fragmented tooling weren&#8217;t designed for this shift.<br>Learn how leading teams are future-proofing their infrastructure before AI demands overwhelm it.</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_22_26_data_engineering_weekly">Download the AI Modernization Guide</a></strong></p><div><hr></div><h1>Garry Tan: Half the AI Agent Market Is One Category. The Rest Is Wide Open</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pr4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 424w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 848w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1272w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic" width="1200" height="718" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:718,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16453,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pr4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 424w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 848w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 1272w, https://substackcdn.com/image/fetch/$s_!Pr4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdd197d5-0d12-4a80-ab7c-50c28896cc3e_1200x718.heic 
1456w" sizes="100vw"></picture></div></a></figure></div><p>AI agents thrive in RL environments with a verifiable target and quick feedback. Software manufacturing is a perfect model for such an environment, but the challenge persists in other categories. It will be an exciting decade for software engineering as we build new infrastructure that we never imagined.
</p><p><strong><a href="https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open">https://garryslist.org/posts/half-the-ai-agent-market-is-one-category-the-rest-is-wide-open</a></strong></p><div><hr></div><h1>LangChain: How to Use Memory in Agent Builder</h1><p>Agents fail to improve over time when they treat every conversation as stateless and discard learned preferences or workflows. The article explains how LangChain&#8217;s Agent Builder implements short-term and long-term memory as a filesystem of Markdown files, enabling persistent instructions and reusable skills. Explicit memory updates, modular skill loading, and direct file editing enable agents to reliably evolve behavior without increasing core prompt complexity.</p><p><strong><a href="https://blog.langchain.com/how-to-use-memory-in-agent-builder/">https://blog.langchain.com/how-to-use-memory-in-agent-builder/</a></strong></p><div><hr></div><h1>LinkedIn: Scaling LLM-Based ranking systems with SGLang at LinkedIn</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9eIl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9eIl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 424w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 848w, 
https://substackcdn.com/image/fetch/$s_!9eIl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic" width="1024" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12896,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9eIl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 424w, 
https://substackcdn.com/image/fetch/$s_!9eIl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 848w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!9eIl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc187884f-7b73-4d47-b2a2-810f3251a56e_1024x571.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LLM-based ranking systems face strict latency and concurrency constraints because they score thousands of items per query without requiring text generation. The article explains how LinkedIn optimized SGLang for prefill-only ranking through batching improvements, scoring-only execution paths, prefix KV reuse, and Python runtime parallelization.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin">https://www.linkedin.com/blog/engineering/ai/scaling-llm-based-ranking-systems-with-sglang-at-linkedin</a></strong></p><div><hr></div><h1>Sponsored: The Scaling Data Teams Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uaud!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25368,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uaud!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Uaud!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Uaud!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf3d41c2-93f6-40dc-ad7d-a310216a1922_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>More datasets. More pipelines. More AI demands. The old way of doing things doesn&#8217;t work at this scale.<br>This free eBook walks through how teams actually scale sustainably with roles, responsibilities, automation, and patterns that work.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=02_22_26_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Spotify: Our Multi-Agent Architecture for Smarter Advertising</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dXi7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dXi7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 848w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14952,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dXi7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 424w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 848w, 
https://substackcdn.com/image/fetch/$s_!dXi7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1272w, https://substackcdn.com/image/fetch/$s_!dXi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9184364-e8f9-4d15-883a-882b12d212c4_1920x1080.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Fragmented decision logic across buying channels prevented Spotify from translating high-level 
campaign goals into unified execution plans. The article explains how Spotify built Ads AI, a multi-agent orchestration layer with intent routing, specialized resolution agents, and data-grounded media planning using real-time tool integration. The architecture reduced campaign setup time from minutes to seconds, simplified user inputs, and grounded recommendations in historical performance data.</p><p><strong><a href="https://engineering.atspotify.com/2026/2/our-multi-agent-architecture-for-smarter-advertising">https://engineering.atspotify.com/2026/2/our-multi-agent-architecture-for-smarter-advertising</a></strong></p><div><hr></div><h1>Uber: Database Federation: Decentralized and ACL-Compliant Hive&#8482; Databases</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0mK-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0mK-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 424w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 848w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic" width="1456" height="797" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:797,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22398,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0mK-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 424w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 848w, 
https://substackcdn.com/image/fetch/$s_!0mK-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1272w, https://substackcdn.com/image/fetch/$s_!0mK-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F970d155b-ca3d-413e-a7ec-74766d2e798e_1528x836.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Monolithic Hive warehouses create shared-fate outages, resource contention, and weak governance when 
thousands of datasets share a single database. The article explains how Uber implemented database federation by reorganizing datasets into domain-specific units, updating Hive Metastore pointers to avoid data duplication, and deploying both real-time and batch synchronizers to maintain consistency. The decentralized architecture improves ACL compliance, strengthens resource isolation, and reclaims storage while enabling zero-downtime migration at petabyte scale.</p><p><strong><a href="https://www.uber.com/en-IN/blog/database-federation/">https://www.uber.com/en-IN/blog/database-federation/</a></strong></p><div><hr></div><h1>Anton Borisov: AutoMQ: Shared Storage Architecture Deep Dive</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!r-r5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r-r5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 424w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 848w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic" width="1400" height="635" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:635,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r-r5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 424w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 848w, 
https://substackcdn.com/image/fetch/$s_!r-r5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1272w, https://substackcdn.com/image/fetch/$s_!r-r5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8b5b1810-6be0-427c-9836-97ade14737eb_1400x635.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kafka&#8217;s shared-nothing architecture imposes high replication costs, slow failover, and tight 
coupling between storage and compute. The article explains how AutoMQ replaces local disk replication with S3-backed shared storage, using layered abstractions, WAL batching, metadata-driven ownership, and epoch fencing to enable stateless brokers and zero-copy failover. AutoMQ&#8217;s design eliminates the 3x replication tax and simplifies scaling to &#8220;add compute,&#8221; while accepting higher cold-read latency from object storage.</p><p><strong><a href="https://medium.com/fresha-data-engineering/automq-shared-storage-architecture-deep-dive-043c5226847e">https://medium.com/fresha-data-engineering/automq-shared-storage-architecture-deep-dive-043c5226847e</a></strong></p><div><hr></div><h1>Pinterest: Drastically Reducing Out-of-Memory Errors in Apache Spark at Pinterest</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B_ZY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 424w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 848w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic" width="1400" height="515" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:515,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B_ZY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 424w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 848w, 
https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1272w, https://substackcdn.com/image/fetch/$s_!B_ZY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F608a2f5a-c8d4-42ca-8dd7-2ce2e6665897_1400x515.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Out-of-memory (OOM) errors in Spark jobs are a notorious problem in data processing, creating operational overhead and 
inefficient cluster utilization. The article explains how Auto Memory Retries dynamically adjusts executor resources by retrying failed tasks with higher CPU allocation or larger executors through modified Spark resource profiles. The elastic strategy reduced OOM failures by 96%, lowered infrastructure costs by avoiding over-provisioning, and improved overall pipeline reliability.</p><p><strong><a href="https://medium.com/pinterest-engineering/drastically-reducing-out-of-memory-errors-in-apache-spark-at-pinterest-c55d7dac2257">https://medium.com/pinterest-engineering/drastically-reducing-out-of-memory-errors-in-apache-spark-at-pinterest-c55d7dac2257</a></strong></p><div><hr></div><h1>StarTree: Consistent, Scalable Compaction for Real-Time Upserts in Apache Pinot</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R5Mt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 424w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 848w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic" width="1301" height="870" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:870,&quot;width&quot;:1301,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14034,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R5Mt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 424w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 848w, 
https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1272w, https://substackcdn.com/image/fetch/$s_!R5Mt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5861d413-e0ca-42a2-9644-4025a0ceec7b_1301x870.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Near-real-time upserts are my favorite subject to study, and I've worked with many OLAP engines. 
Apache Pinot always stands out for its flexible indexing and fast upsert capabilities. The article explains how StarTree&#8217;s SegmentRefreshTask compacts segments in the background by merging only valid records and ensuring atomic visibility with bitmap-based consistency controls. The approach reduces storage costs, supports sustained high ingestion rates, and maintains predictable query latency at a billion-key scale.</p><p><strong><a href="https://startree.ai/resources/upserts-compaction-in-apache-pinot-startree/">https://startree.ai/resources/upserts-compaction-in-apache-pinot-startree/</a></strong></p><div><hr></div><h1>Zepto: Debezium at Scale: An Open Source CDC Story from Zepto</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yR5T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yR5T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 424w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 848w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic" width="1050" height="285" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11155,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188850941?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yR5T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 424w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 848w, 
https://substackcdn.com/image/fetch/$s_!yR5T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1272w, https://substackcdn.com/image/fetch/$s_!yR5T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb625a238-f1ba-4b74-8300-0bf051db1d26_1050x285.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>High-velocity CDC pipelines can overwhelm downstream databases due to redundant updates and 
MVCC-induced write amplification. The article explains how Zepto optimized Debezium by introducing an in-memory reduction buffer to collapse duplicate updates and a Postgres UNNEST-based batching strategy to reduce parsing overhead. These improvements stabilized CPU and I/O usage, eliminated replication lag during peak traffic, and ensured the database processes only final record states.</p><p><strong><a href="https://blog.zeptonow.com/debezium-at-scale-an-open-source-cdc-story-from-zepto-aa4b12e32bf7">https://blog.zeptonow.com/debezium-at-scale-an-open-source-cdc-story-from-zepto-aa4b12e32bf7</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #257]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-257</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-257</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 16 Feb 2026 01:45:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-15-26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!r0G-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!r0G-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!r0G-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!r0G-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!r0G-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24171,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-15-26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!r0G-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!r0G-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!r0G-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!r0G-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b7d6f09-30cc-4fac-bf6d-660c295596b2_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Dagster Running Dagster</h1><p>In this upcoming session, find out how Dagster's data team has increased its capacity, along with best practices for data modeling that work well with AI assistants. 
We'll also demo a real case where our Compass Dagster+ integration identified the root cause of a Postgres-to-Snowflake pipeline that was failing 40-50% of the time.</p><p><strong><a href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-15-26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Ben Lorica: Your agents need runbooks, not bigger context windows</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ym8Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 424w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 848w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic" width="1456" height="871" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db951298-9b82-472b-a013-b293b10b62d4_1456x871.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:871,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19513,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 424w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 848w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ym8Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb951298-9b82-472b-a013-b293b10b62d4_1456x871.heic 
1456w" sizes="100vw"></picture></div></a></figure></div><p>AI agents struggle to scale in operational environments because they rely on large, transient context windows that reset after each task and incur repeated planning costs. 
The article discusses a Context File System (CFS) that separates reasoning from persistent procedural memory, enabling agents to mount task-specific runbooks, reuse indexed tools, and replay proven workflows.</p><p><strong><a href="https://gradientflow.substack.com/p/the-missing-layer-in-todays-agent">https://gradientflow.substack.com/p/the-missing-layer-in-todays-agent</a></strong></p><div><hr></div><h1>Netflix: High-Throughput Graph Abstraction at Netflix - Part I</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R51d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R51d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 424w, https://substackcdn.com/image/fetch/$s_!R51d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 848w, https://substackcdn.com/image/fetch/$s_!R51d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 1272w, https://substackcdn.com/image/fetch/$s_!R51d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!R51d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic" width="1400" height="1016" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1016,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16009,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R51d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 424w, https://substackcdn.com/image/fetch/$s_!R51d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 848w, https://substackcdn.com/image/fetch/$s_!R51d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!R51d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1537e1d0-b08a-4d57-bf20-2a67ef9bec4e_1400x1016.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>High-throughput OLTP graph workloads demand low-latency traversal, strong typing, and global consistency at scale. 
The article explains how Netflix built a Graph Abstraction service on existing KV, TimeSeries, and caching infrastructure, using a property graph model with partitioned namespaces, optimized edge indexing, and write- and read-aside caching. The architecture processes millions of operations per second with single-digit millisecond latency while maintaining strict eventual consistency across regions.</p><p><strong><a href="https://netflixtechblog.medium.com/high-throughput-graph-abstraction-at-netflix-part-i-e88063e6f6d5">https://netflixtechblog.medium.com/high-throughput-graph-abstraction-at-netflix-part-i-e88063e6f6d5</a></strong></p><div><hr></div><h1>Reliable Data Engineering: <strong>Data Contracts in Practice - What 50 Production Implementations Actually Look Like</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!beaW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!beaW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 424w, https://substackcdn.com/image/fetch/$s_!beaW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 848w, https://substackcdn.com/image/fetch/$s_!beaW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!beaW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!beaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic" width="1400" height="933" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:933,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12712,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!beaW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 424w, https://substackcdn.com/image/fetch/$s_!beaW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 848w, 
https://substackcdn.com/image/fetch/$s_!beaW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 1272w, https://substackcdn.com/image/fetch/$s_!beaW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5affcb2b-ac74-4242-9f5d-654775e34c08_1400x933.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>I recently wrote about <strong><a 
href="https://www.dataengineeringweekly.com/p/data-contracts-a-missed-opportunity">Data Contract - a missed opportunity</a></strong>. The recent <strong><a href="https://x.com/tayloramurphy/status/2022530907526107465">Twitter conversation</a></strong> and the recent <strong><a href="https://github.com/open-semantic-interchange/OSI">OSI spec from Snowflake</a></strong> both carry traces of the data contract idea. The author did a solid job of summarizing various data contract patterns and their implementations.</p><p><strong><a href="https://medium.com/@reliabledataengineering/data-contracts-in-practice-what-50-production-implementations-actually-look-like-f1c953336bf2">https://medium.com/@reliabledataengineering/data-contracts-in-practice-what-50-production-implementations-actually-look-like-f1c953336bf2</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_15_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!z5zN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!z5zN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!z5zN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!z5zN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!z5zN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_15_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!z5zN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!z5zN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!z5zN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!z5zN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54d12322-1343-4527-ba92-abc82cb3c91c_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Learn how to build a data platform that enables AI-driven development, reduces pipeline failures, and cuts complexity.<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_15_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Netflix: Scaling LLM Post-Training at Netflix</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!d20Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!d20Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 424w, https://substackcdn.com/image/fetch/$s_!d20Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 848w, 
https://substackcdn.com/image/fetch/$s_!d20Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 1272w, https://substackcdn.com/image/fetch/$s_!d20Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!d20Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic" width="1162" height="415" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:1162,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13201,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!d20Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 424w, 
https://substackcdn.com/image/fetch/$s_!d20Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 848w, https://substackcdn.com/image/fetch/$s_!d20Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 1272w, https://substackcdn.com/image/fetch/$s_!d20Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F13c3e891-d2db-4a1c-ba89-7ea1f632db14_1162x415.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Production LLM deployment requires a scalable post-training infrastructure that handles complex data pipelines, distributed GPUs, and evolving fine-tuning strategies. The article explains how Netflix built a unified post-training framework on its ML platform to support efficient SFT and RL workflows, modular data and model abstractions, and tight integration with open-source ecosystems. Custom optimizations, such as sequence packing and hybrid RL orchestration, increase token throughput and enable researchers to focus on modeling rather than infrastructure.</p><p><strong><a href="https://netflixtechblog.com/scaling-llm-post-training-at-netflix-0046f8790194">https://netflixtechblog.com/scaling-llm-post-training-at-netflix-0046f8790194</a></strong></p><div><hr></div><h1>Abhishek Goswami: From Prompts to Production: A Playbook for Agentic Development</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rdac!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rdac!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 424w, https://substackcdn.com/image/fetch/$s_!rdac!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 848w, 
https://substackcdn.com/image/fetch/$s_!rdac!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 1272w, https://substackcdn.com/image/fetch/$s_!rdac!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rdac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic" width="1456" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15094,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rdac!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 424w, 
https://substackcdn.com/image/fetch/$s_!rdac!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 848w, https://substackcdn.com/image/fetch/$s_!rdac!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 1272w, https://substackcdn.com/image/fetch/$s_!rdac!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdead1cb-0e23-4630-bdb4-df7128aa3098_2209x1017.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Enterprise AI agents fail in production when teams rely on prompt experimentation without structured lifecycle and behavioral controls. The article introduces an Agentic SDLC that separates deterministic and agentic components, applies reusable orchestration patterns, and enforces versioned prompts, tool manifests, and MCP-based integrations. Behavioral testing with golden trajectories, layered validation, and human-in-the-loop oversight enables reliable, scalable deployment of agents beyond prototypes.</p><p><strong><a href="https://www.infoq.com/articles/prompts-to-production-playbook-for-agentic-development/">https://www.infoq.com/articles/prompts-to-production-playbook-for-agentic-development/</a></strong></p><div><hr></div><h1>Zepto: How We Built High-Precision, Low-Latency Semantic Search in Production</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fh2j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fh2j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 424w, https://substackcdn.com/image/fetch/$s_!Fh2j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Fh2j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 1272w, https://substackcdn.com/image/fetch/$s_!Fh2j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fh2j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic" width="1050" height="239" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:239,&quot;width&quot;:1050,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7656,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fh2j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Fh2j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 848w, https://substackcdn.com/image/fetch/$s_!Fh2j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 1272w, https://substackcdn.com/image/fetch/$s_!Fh2j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8d2897-cc06-4efb-80fc-f64941efa245_1050x239.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Keyword-based search fails on short, misspelled, and tail queries that lack lexical overlap with product catalogs. The article explains how Zepto built a dual-encoder semantic retrieval system trained with weak supervision, synthetic data, and InfoNCE loss to learn intent-aware embeddings under strict latency constraints. The approach delivered a 35% uplift on impacted queries, improved downstream ranking quality, and enabled semantic retrieval for both search and ads use cases.</p><p><strong><a href="https://blog.zeptonow.com/how-we-built-high-precision-low-latency-semantic-search-in-production-75a6c61dee25">https://blog.zeptonow.com/how-we-built-high-precision-low-latency-semantic-search-in-production-75a6c61dee25</a></strong></p><div><hr></div><h1>Atlas9: The challenges of soft delete</h1><p>Deleting data is one of the hardest problems in data systems: data is easy to write but hard to delete. The standard approach is soft deletion, which brings several complications of its own. The article covers the nuances of soft deletes, architectural patterns, and best practices for designing deletion handling at scale. 
</p><p><strong><a href="https://atlas9.dev/blog/soft-delete.html">https://atlas9.dev/blog/soft-delete.html</a></strong></p><div><hr></div><h1>Apache Parquet: Native Geospatial Types in Apache Parquet</h1><p>It is exciting to see Apache Parquet evolve with the addition of more types and indexing support. The flexibility of a data format that allows you to select data types and the indexing patterns associated with them improves data management efficiency. I wish Apache Parquet/Lakehouse formats such as Hudi, Iceberg, and Delta Lake offered the flexibility of <strong><a href="https://docs.pinot.apache.org/basics/indexing">Apache Pinot&#8217;s types and indexing patterns. </a></strong></p><p><strong><a href="https://parquet.apache.org/blog/2026/02/13/native-geospatial-types-in-apache-parquet/">https://parquet.apache.org/blog/2026/02/13/native-geospatial-types-in-apache-parquet/</a></strong></p><div><hr></div><h1>Dalto Curvelano: Introduction to PostgreSQL Indexes</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HJNU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HJNU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 424w, https://substackcdn.com/image/fetch/$s_!HJNU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 848w, 
https://substackcdn.com/image/fetch/$s_!HJNU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 1272w, https://substackcdn.com/image/fetch/$s_!HJNU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HJNU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic" width="1456" height="746" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:746,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21043,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/188089137?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HJNU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 424w, 
https://substackcdn.com/image/fetch/$s_!HJNU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 848w, https://substackcdn.com/image/fetch/$s_!HJNU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 1272w, https://substackcdn.com/image/fetch/$s_!HJNU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff489ecc0-cbd5-4bfb-9213-05538bb8e4a6_2408x1234.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Speaking of types and indexing, it always comes back to basics and a solid foundation. The article is an excellent overview of PostgreSQL indexing support and its use. </p><p><strong><a href="https://dlt.github.io/blog/posts/introduction-to-postgresql-indexes/">https://dlt.github.io/blog/posts/introduction-to-postgresql-indexes/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #256]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-256</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-256</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 09 Feb 2026 04:54:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/what-assets-do-best?utm_campaign=37123386-26-02-DMND_Dagster_Childrens_Book&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=childrens_book&amp;utm_content=02-08-26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!Lc0r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lc0r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17350,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/what-assets-do-best?utm_campaign=37123386-26-02-DMND_Dagster_Childrens_Book&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=childrens_book&amp;utm_content=02-08-26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lc0r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!Lc0r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc0e92cf7-314c-4a76-9c5e-f60bdf37821b_1200x630.heic 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><h1>What assets do best: The Dagster Children's Book</h1><p>We&#8217;re excited to share something a little unexpected from the Dagster team: What assets do best, a children&#8217;s book about data assets!
Perfect for kids and data-loving grown-ups alike, you&#8217;ll learn how assets work together, adapt to change, and give teams a complete view of their data.<br><br>Watch the narrated story, find out where you can snag a free book IRL, and print &amp; play puzzles!</p><p><strong><a href="https://dagster.io/what-assets-do-best?utm_campaign=37123386-26-02-DMND_Dagster_Childrens_Book&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=childrens_book&amp;utm_content=02-08-26_data_engineering_weekly">Check out the book &amp; other activities</a></strong></p><div><hr></div><h1>Alexander Shereshevsky: Graph RAG in 2026 - A Practitioner&#8217;s Guide to What Actually Works</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XWy8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XWy8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!XWy8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 848w, https://substackcdn.com/image/fetch/$s_!XWy8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!XWy8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XWy8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic" width="1400" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32473,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XWy8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!XWy8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 848w, 
https://substackcdn.com/image/fetch/$s_!XWy8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 1272w, https://substackcdn.com/image/fetch/$s_!XWy8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5c0303d-8bc9-4ffb-8f0d-4c0c6ab623b1_1400x764.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Graph RAG adoption stalled after early hype because high indexing costs and unclear performance trade-offs limited
production use. The article explains when Graph RAG outperforms vector search, how teams reduce costs with selective graph construction, and why hybrid vector&#8211;graph architectures deliver the best results.</p><p><strong><a href="https://medium.com/@shereshevsky/graph-rag-in-2026-a-practitioners-guide-to-what-actually-works-dca4962e7517">https://medium.com/@shereshevsky/graph-rag-in-2026-a-practitioners-guide-to-what-actually-works-dca4962e7517</a></strong></p><div><hr></div><h1>Mark Rittman: So, Just How Relevant is Multi-Touch Attribution for Marketers in 2026?</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-U-T!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-U-T!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 424w, https://substackcdn.com/image/fetch/$s_!-U-T!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 848w, https://substackcdn.com/image/fetch/$s_!-U-T!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 1272w, https://substackcdn.com/image/fetch/$s_!-U-T!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!-U-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic" width="1456" height="693" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:693,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27323,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-U-T!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 424w, https://substackcdn.com/image/fetch/$s_!-U-T!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 848w, https://substackcdn.com/image/fetch/$s_!-U-T!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!-U-T!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59159696-3c90-4f51-8429-4c3aaccfcf19_2950x1404.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Marketing attribution struggles in 2026 as privacy controls, regulations, and cookie loss remove large portions of user-level data. The article explains how teams adapt by combining deterministic identity for logged-in users, server-side tracking, and triangulation across MTA, MMM, and incrementality testing.
Prioritizing authentication, tracking micro-conversions, and owning raw event data enables more reliable attribution in a privacy-first environment.</p><p><strong><a href="https://blog.rittmananalytics.com/how-relevant-is-multi-touch-attribution-for-marke-275a71a36d5e">https://blog.rittmananalytics.com/how-relevant-is-multi-touch-attribution-for-marke-275a71a36d5e</a></strong></p><div><hr></div><h1>Pinterest: Next Generation DB Ingestion at Pinterest</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0PtB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0PtB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 424w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 848w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 1272w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0PtB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic" width="1400" height="621" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:621,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7333,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0PtB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 424w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 848w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 1272w, https://substackcdn.com/image/fetch/$s_!0PtB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29536bca-c84e-4130-89a6-8c7abd79fe59_1400x621.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Legacy batch-based ingestion pipelines created high latency, operational complexity, and compliance gaps across Pinterest&#8217;s data ecosystem.
The article explains how Pinterest built a unified CDC-based ingestion framework using Kafka, Flink, Spark, and Iceberg to stream database changes and efficiently upsert them into analytical tables, reducing data latency from days to minutes while lowering compute costs and improving reliability at the petabyte scale.</p><p><strong><a href="https://medium.com/pinterest-engineering/next-generation-db-ingestion-at-pinterest-66844b7153b7">https://medium.com/pinterest-engineering/next-generation-db-ingestion-at-pinterest-66844b7153b7</a></strong></p><div><hr></div><h1>Sponsored: Dagster Running Dagster</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-08-26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!avAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!avAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!avAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!avAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!avAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:29444,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-08-26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!avAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 424w, 
https://substackcdn.com/image/fetch/$s_!avAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!avAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!avAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8361337-d96b-4210-8bf7-0b913055da7a_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In this upcoming session, find out how Dagster's data team has increased its capacity, along with best practices for data modeling that work well with AI assistants. We'll also demo a real case where our Compass Dagster+ integration identified the root cause of a Postgres-to-Snowflake pipeline that was failing 40-50% of the time.</p><p><strong><a href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-08-26_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1>Netflix: The Data Canary: How Netflix Validates Catalog Metadata</h1><p>Although this article is not specifically about the data warehouse, it demonstrates how data corruption can occur without a code change and still disrupt the system. The article describes how Netflix built a data canary system that validates new catalog metadata using side-by-side clusters, chaos-based testing, and customer-centric behavioral metrics.
By detecting corruption within minutes and blocking the release of unsafe data, Netflix applies code-level deployment rigor to high-velocity data pipelines.</p><p><strong><a href="https://netflixtechblog.medium.com/the-data-canary-how-netflix-validates-catalog-metadata-18b699d58e36">https://netflixtechblog.medium.com/the-data-canary-how-netflix-validates-catalog-metadata-18b699d58e36</a></strong></p><div><hr></div><h1>Uber: Introducing uForwarder - The Consumer Proxy for Kafka Async Queuing</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ul1d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ul1d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 424w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 848w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!Ul1d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic" width="1456" height="817" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:817,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14301,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ul1d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 424w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 848w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ul1d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6498c7c6-0fba-49cc-ab8a-ed96714d077d_1536x862.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Kafka consumer services struggle to scale reliably when direct protocol access introduces head-of-line blocking, inefficiency, and operational complexity. The article explains how Uber built uForwarder, a gRPC-based Kafka consumer proxy that resolves head-of-line blocking, improves hardware utilization, isolates traffic, and supports delayed processing. 
By abstracting Kafka internals behind a push-based interface, uForwarder increases reliability and efficiency across thousands of consumer workloads.</p><p><strong><a href="https://www.uber.com/en-IN/blog/introducing-ufowarder/">https://www.uber.com/en-IN/blog/introducing-ufowarder/</a></strong></p><div><hr></div><h1>Pierce Lamb: Agentic Search over Graphs of Long Documents (or LAD-RAG++)</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PUZG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PUZG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 848w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 1272w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PUZG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic" width="1400" 
height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:39712,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PUZG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 424w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 848w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 1272w, https://substackcdn.com/image/fetch/$s_!PUZG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8807a27-20c2-44fc-ac4b-6f7bc5046a16_1400x764.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Possibly one of my best reads this week. Vanilla RAG struggles with long, structured documents because static chunking loses layout, relationships, and cross-page context. The author reviews LAD-RAG++, which constructs layout-aware document graphs and employs agentic retrieval to explore structural and semantic connections dynamically. Engineering improvements in memory control, graph pruning, and cost-efficient processing make graph-based RAG practical for high-recall question answering over dense professional documents. 
</p><p><strong><a href="https://pierce-lamb.medium.com/agentic-search-over-graphs-of-long-documents-or-lad-rag-1264030158e8">https://pierce-lamb.medium.com/agentic-search-over-graphs-of-long-documents-or-lad-rag-1264030158e8</a></strong></p><div><hr></div><h1>Halodoc: Halodoc&#8217;s Layered Data Validation Strategy for Building Trust in the Lakehouse</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kwYE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kwYE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 424w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 848w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 1272w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kwYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic" width="1456" height="862" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:862,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18964,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kwYE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 424w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 848w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 1272w, https://substackcdn.com/image/fetch/$s_!kwYE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffde70d11-a62d-4983-838e-d318b0da4db3_1677x993.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Data quality issues in healthcare analytics demand stronger guarantees than generic validation frameworks can provide. The article explains how Halodoc built a custom, configuration-driven validation pipeline with four layers of checks that combine time-bound reconciliation, AI-generated structural tests, and business-rule enforcement across the Lakehouse. 
By integrating LLMs into validation and surfacing failures in real time, the system reduces incidents, increases trust in analytics, and supports reliable clinical decision-making.</p><p><strong><a href="https://blogs.halodoc.io/halodocs-layered-data-validation-strategy/amp/">https://blogs.halodoc.io/halodocs-layered-data-validation-strategy/amp/</a></strong></p><div><hr></div><h1>Booking.com: Beyond Prompt Engineering: How We Used Supervised Fine-Tuning for Travel Recommendations</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!beVr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!beVr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 424w, https://substackcdn.com/image/fetch/$s_!beVr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 848w, https://substackcdn.com/image/fetch/$s_!beVr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 1272w, https://substackcdn.com/image/fetch/$s_!beVr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!beVr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic" width="1400" height="830" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:830,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:31481,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/187353312?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!beVr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 424w, https://substackcdn.com/image/fetch/$s_!beVr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 848w, https://substackcdn.com/image/fetch/$s_!beVr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 1272w, https://substackcdn.com/image/fetch/$s_!beVr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6f334ba-7c0c-4380-8443-7d8d030fe499_1400x830.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Prompt-based LLMs struggle to deliver fast, privacy-safe, and personalized travel recommendations at production scale. 
The article explains how Booking.com used supervised fine-tuning on an open-weight model with parameter-efficient techniques, contextual inputs, and carefully designed labels to combine conversational understanding with behavioral signals.</p><p><strong><a href="https://booking.ai/beyond-prompt-engineering-how-we-used-supervised-fine-tuning-for-travel-recommendations-91e8f4711e4b">https://booking.ai/beyond-prompt-engineering-how-we-used-supervised-fine-tuning-for-travel-recommendations-91e8f4711e4b</a></strong></p><div><hr></div><h1>Pranav Mehta: Clickhouse Internals: A Deep Dive into ClickHouse Distributed Connection Pooling</h1><p>ClickHouse operators may misinterpret connection-retry warnings as leaks when distributed queries encounter transient network errors. The article explains how ClickHouse reuses pooled TCP connections for distributed tables and why idle timeouts in spiky workloads produce harmless &#8220;Broken pipe&#8221; warnings. </p><p><strong><a href="https://medium.com/@pranavmehta94/clickhouse-internals-a-deep-dive-into-clickhouse-distributed-connection-pooling-d9e956b5eb57">https://medium.com/@pranavmehta94/clickhouse-internals-a-deep-dive-into-clickhouse-distributed-connection-pooling-d9e956b5eb57</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #255]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-255</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-255</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 02 Feb 2026 02:22:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-01-26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!azkE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!azkE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!azkE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!azkE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!azkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:22021,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-01-26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!azkE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!azkE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!azkE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!azkE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87c6cd40-4b1c-480e-83bf-193e1fd0b1a9_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Dagster Running Dagster dives into AI analytics.</h1><p>In this upcoming session, Analytics Lead Anil walks through how Compass has increased the Dagster data team's capacity, shares best practices for data modeling that work well with AI assistants (hint: nested columns and wide tables are your friends), and demos a real case where our Compass Dagster+ integration identified the root cause of a Postgres-to-Snowflake pipeline that was failing 40-50% of the time.</p><p><strong><a href="https://dagster.io/events/dagster-running-dagster-how-we-use-compass-for-ai-analytics?utm_campaign=34764303-26-02-WBNR_Deep%20Dive_Dagster_Running_Dagster_Compass&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_compass&amp;utm_content=02-01-26_data_engineering_weekly">Save your spot now</a>.</strong></p><div><hr></div><h1>OpenAI: Unrolling the Codex agent loop</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pE2N!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pE2N!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 424w, 
https://substackcdn.com/image/fetch/$s_!pE2N!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 848w, https://substackcdn.com/image/fetch/$s_!pE2N!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 1272w, https://substackcdn.com/image/fetch/$s_!pE2N!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pE2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic" width="1198" height="716" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11866,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!pE2N!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 424w, https://substackcdn.com/image/fetch/$s_!pE2N!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 848w, https://substackcdn.com/image/fetch/$s_!pE2N!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 1272w, https://substackcdn.com/image/fetch/$s_!pE2N!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d81c303-3c96-476c-8c27-93c60e1f3c64_1198x716.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>Explanations of AI agents often obscure how local tools, model inference, and user interaction are orchestrated in practice. The article breaks down the Codex CLI agent loop, detailing how prompts, tool calls, iterative inference, context compaction, and prompt caching work together to execute software tasks efficiently. By combining stateless operation, automatic context management, and flexible tool integration via MCP, Codex achieves secure, performant local agent execution without server-side session retention.</p><p><strong><a href="https://openai.com/index/unrolling-the-codex-agent-loop/">https://openai.com/index/unrolling-the-codex-agent-loop/</a></strong></p><div><hr></div><h1>OpenAI: Inside OpenAI&#8217;s in-house data agent</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bzTg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bzTg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 424w, https://substackcdn.com/image/fetch/$s_!bzTg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 848w, 
https://substackcdn.com/image/fetch/$s_!bzTg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 1272w, https://substackcdn.com/image/fetch/$s_!bzTg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bzTg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic" width="1456" height="894" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:894,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:23903,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bzTg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 424w, 
https://substackcdn.com/image/fetch/$s_!bzTg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 848w, https://substackcdn.com/image/fetch/$s_!bzTg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 1272w, https://substackcdn.com/image/fetch/$s_!bzTg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0caca09a-1888-48f5-ab5a-983a45c4368f_1830x1124.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>OpenAI writes about its internal data agent, which uses a closed-loop, self-correcting process and multiple context layers to translate natural language into reliable queries across hundreds of petabytes of data. By grounding meaning in code, minimizing tool complexity, and enforcing pass-through permissions with continuous evaluation, the system delivers fast, secure, and reliable data access for employees at scale.</p><p><strong><a href="https://openai.com/index/inside-our-in-house-data-agent/">https://openai.com/index/inside-our-in-house-data-agent/</a></strong></p><div><hr></div><h1>Preset: The Semantic Layer Is Back. Here&#8217;s What We&#8217;re Doing About It.</h1><p>The article reads like a pitch for Preset, but what I liked most is the clear analogy for what a semantic layer is and why it has failed, even though legacy tools like Business Objects do support it. Overall, I&#8217;m excited about agents as an interface for insights and the renewed interest in the semantic layer. 
</p><p><strong><a href="https://preset.io/blog/semantic-layer-is-back/">https://preset.io/blog/semantic-layer-is-back/</a></strong></p><div><hr></div><h1>Sponsored: How to build a data platform that's ready for AI</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_01_26_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2i_W!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2i_W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_01_26_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2i_W!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!2i_W!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F682b92fe-6c83-4992-b0e2-9dad75a8be4f_2400x1260.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Traditional data platforms are becoming the biggest bottleneck when companies experiment with AI. 
Learn how to build a unified control plane that enables AI-driven development, reduces pipeline failures, and cuts complexity.<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=02_01_26_data_engineering_weekly">Get the free guide now</a></strong></p><div><hr></div><h1>LangChain: Context Management for Deep Agents</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!95-w!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!95-w!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 424w, https://substackcdn.com/image/fetch/$s_!95-w!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 848w, https://substackcdn.com/image/fetch/$s_!95-w!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!95-w!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!95-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic" width="1456" height="855" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:855,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17628,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!95-w!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 424w, https://substackcdn.com/image/fetch/$s_!95-w!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 848w, 
https://substackcdn.com/image/fetch/$s_!95-w!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 1272w, https://substackcdn.com/image/fetch/$s_!95-w!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3a27a000-f63e-42fd-8e78-a254b7a9c4e8_1783x1047.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Large AI agents risk context rot when long-running tasks exceed LLM memory limits, degrading 
reasoning quality. The article explains how the Deep Agents SDK actively manages context using tool input and output offloading, filesystem-backed pointers, and structured summarization to stay within token limits. Targeted evaluations ensure agents can recover critical details from compressed context and maintain task intent over extended workflows.</p><p><strong><a href="https://www.blog.langchain.com/context-management-for-deepagents/">https://www.blog.langchain.com/context-management-for-deepagents/</a></strong></p><div><hr></div><h1>Dropbox: Engineering VP Josh Clemm on how we use knowledge graphs, MCP, and DSPy in Dash</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ioGh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ioGh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 424w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 848w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 1272w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ioGh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20332,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ioGh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 424w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 848w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 1272w, https://substackcdn.com/image/fetch/$s_!ioGh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6706b1e-7ecf-4828-877c-cb94457c44c1_1600x900.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Building a universal search and agentic workspace is difficult because work data spans many tools, formats, and contexts, while LLMs face latency and context limits. 
The article explains how Dropbox Dash uses an index-based retrieval system, a context engine with multimodal processing, knowledge bundles, and MCP-based super tools, combined with LLM-as-a-judge and DSPy-driven prompt optimization.</p><p><strong><a href="https://dropbox.tech/machine-learning/vp-josh-clemm-knowledge-graphs-mcp-and-dspy-dash">https://dropbox.tech/machine-learning/vp-josh-clemm-knowledge-graphs-mcp-and-dspy-dash</a></strong></p><div><hr></div><h1>Whatnot: Lessons learned from scaling data scientists with AI</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!susK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!susK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 424w, https://substackcdn.com/image/fetch/$s_!susK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 848w, https://substackcdn.com/image/fetch/$s_!susK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 1272w, https://substackcdn.com/image/fetch/$s_!susK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!susK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic" width="1400" height="402" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14588,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!susK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 424w, https://substackcdn.com/image/fetch/$s_!susK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 848w, https://substackcdn.com/image/fetch/$s_!susK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 1272w, https://substackcdn.com/image/fetch/$s_!susK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa10b0837-4a08-415f-939c-890e10cefc15_1400x402.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI-driven analytics struggle to generate correct queries because raw tables lack explicit business meaning and consistent relationships. The article explains how semantic views encode business logic, table relationships, and approved data scope to give LLMs precise, machine-readable context for SQL generation. 
By standardizing definitions and constraining access to vetted datasets, semantic views improve query accuracy, reduce hallucinations, and make AI-assisted analytics safer and more reliable.</p><p><strong><a href="https://medium.com/whatnot-engineering/lessons-learned-from-scaling-data-scientists-with-ai-e7aa7b3235b4">https://medium.com/whatnot-engineering/lessons-learned-from-scaling-data-scientists-with-ai-e7aa7b3235b4</a></strong></p><div><hr></div><h1>Netflix: Data Bridge: How Netflix simplifies data movement</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CXjk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CXjk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 424w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 848w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 1272w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!CXjk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic" width="1275" height="582" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:582,&quot;width&quot;:1275,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CXjk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 424w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 848w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 1272w, https://substackcdn.com/image/fetch/$s_!CXjk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cd3f05-b03d-4aac-984e-ea9481358af3_1275x582.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Fragmented data movement tooling creates operational overhead, inconsistent governance, and tightly coupled implementations across large data ecosystems. 
The article describes how Netflix built Data Bridge as a unified control plane that separates user intent from execution, centralizes governance, and orchestrates existing data movement systems through standardized interfaces.</p><p><strong><a href="https://netflixtechblog.medium.com/data-bridge-how-netflix-simplifies-data-movement-36d10d91c313">https://netflixtechblog.medium.com/data-bridge-how-netflix-simplifies-data-movement-36d10d91c313</a></strong></p><div><hr></div><h1>LinkedIn: Contextual agent playbooks and tools: How LinkedIn gave AI coding agents organizational context</h1><p>AI coding agents struggle to operate effectively without access to company-specific context, tools, and workflows. The article describes LinkedIn&#8217;s CAPT framework, which uses MCP, executable playbooks, and scalable meta-tools to connect agents to internal systems while controlling context and tool discovery. By packaging CAPT as a zero-friction local service, LinkedIn enables agents to automate debugging, incident response, data analysis, and issue triage, reducing investigation time by up to 70%.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/ai/contextual-agent-playbooks-and-tools-how-linkedin-gave-ai-coding-agents-organizational-context">https://www.linkedin.com/blog/engineering/ai/contextual-agent-playbooks-and-tools-how-linkedin-gave-ai-coding-agents-organizational-context</a></strong></p><div><hr></div><h1>Netflix: The AI Evolution of Graph Search at Netflix: From Structured Queries to Natural Language</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WjZA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!WjZA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 424w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 848w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 1272w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WjZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic" width="1323" height="1560" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1560,&quot;width&quot;:1323,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26591,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WjZA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 424w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 848w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 1272w, https://substackcdn.com/image/fetch/$s_!WjZA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0332407-ffe3-4816-b42f-c3ecc1e51ecf_1323x1560.heic 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Enterprise search systems struggle when users must express complex filters through rigid, technical query languages. The article explains how Netflix evolved Graph Search by using LLMs to translate natural language into validated, schema-aware DSL queries with field-level RAG and AST-based verification. 
By visualizing AI-generated logic and supporting explicit entity selection, the platform lets users query federated data intuitively while maintaining correctness and trust.</p><p><strong><a href="https://netflixtechblog.com/the-ai-evolution-of-graph-search-at-netflix-d416ec5b1151">https://netflixtechblog.com/the-ai-evolution-of-graph-search-at-netflix-d416ec5b1151</a></strong></p><div><hr></div><h1>Modern Data 101: Modeling Semantics: How Data Models and Ontologies Connect to Build Your Semantic Foundations</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kQNr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kQNr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 424w, https://substackcdn.com/image/fetch/$s_!kQNr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 848w, https://substackcdn.com/image/fetch/$s_!kQNr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 1272w, https://substackcdn.com/image/fetch/$s_!kQNr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!kQNr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic" width="1400" height="1003" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1003,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18258,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186564828?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kQNr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 424w, https://substackcdn.com/image/fetch/$s_!kQNr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 848w, https://substackcdn.com/image/fetch/$s_!kQNr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!kQNr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d0c098a-29dc-4d5c-8b2c-f4e296970e84_1400x1003.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI-driven systems struggle without explicit semantic context to ground reasoning and reduce hallucinations. The article argues that data modeling and ontologies both capture entities and relationships and should serve as core methods for discovering and structuring organizational knowledge. 
By combining industry standards, conceptual modeling, and AI-assisted enrichment, teams can build a unified semantic foundation that improves both human understanding and AI accuracy.</p><p><strong><a href="https://medium.com/@community_md101/modeling-semantics-how-data-models-and-ontologies-connect-to-build-your-semantic-foundations-3a9a0664e3ff">https://medium.com/@community_md101/modeling-semantics-how-data-models-and-ontologies-connect-to-build-your-semantic-foundations-3a9a0664e3ff</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[The Missing Layer in Your AI Stack: Context, Not Just State]]></title><description><![CDATA[From SQL to Semantics: The Rise of the Context Graph for AI Agents]]></description><link>https://www.dataengineeringweekly.com/p/the-missing-layer-in-your-ai-stack</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/the-missing-layer-in-your-ai-stack</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Sat, 31 Jan 2026 04:13:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3e6i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ip7u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source 
type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ip7u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 424w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 848w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ip7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic" width="1314" height="812" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1314,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/186379044?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ip7u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 424w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 848w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 1272w, https://substackcdn.com/image/fetch/$s_!Ip7u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbdad6f15-bc0b-4bdc-9369-04c11d67c9a8_1314x812.heic 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><p><em><strong><a href="https://atlan.com/great-data-debate-2026/?utm_source=DEW+&amp;utm_medium=Substack&amp;utm_campaign=DEW_GDD">Join The Great Data Debate</a></strong> to get answers to questions the data &amp; AI industry is so curious about right now:</em></p><ul><li><p><em>Where does context materialize in practice?</em></p></li><li><p><em>Semantic layers, ontologies, context graphs - what should data teams build in 2026?</em></p></li><li><p><em>Who owns context as meaning evolves?</em></p></li><li><p><em>Where should that context live: in the warehouse, inside agents, or in a dedicated context layer?</em></p></li></ul><p><strong><a href="https://atlan.com/great-data-debate-2026/?utm_source=DEW+&amp;utm_medium=Substack&amp;utm_campaign=DEW_GDD">Register 
Here</a></strong></p><div><hr></div><h1>Why Data Engineers Must Think in Graphs, Not Just Tables</h1><p>If you have been following the &#8220;Systems of Record&#8221; debate on tech Twitter, you likely saw the clash between the &#8220;Agents kill SaaS&#8221; camp and the &#8220;Long live the Database&#8221; camp. But for data engineers, the reality is more nuanced&#8212;and far more interesting.</p><p>As we move from dashboards to autonomous agents, we are hitting a wall. It turns out that knowing the <em>state</em> (what happened) is not the same as knowing the <em>reasoning</em> (why it happened).</p><p>Drawing on recent insights from Foundation Capital, Jamin Ball (Altimeter), OpenAI&#8217;s internal engineering team, and the TrustGraph manifesto, this post explores the emergence of the <strong>Context Graph</strong>. This missing architectural layer will likely redefine how we build data platforms in the agentic era.</p><div><hr></div><h1>The Problem: State Machines vs. Decision Traces</h1><p>For the past decade, our role as data engineers has been to centralize data in the warehouse (or Lakehouse). We built ETL pipelines to move data from Salesforce, NetSuite, and Zendesk into a &#8220;Single Source of Truth.&#8221;</p><p>However, traditional Systems of Record (SoR) effectively act as &#8220;state machines.&#8221; They record the final output: the organization closed a deal, applied a discount, and escalated a ticket. But they fail to capture the <strong>decision traces</strong>.</p><p>As Foundation Capital notes, the <em>reasoning</em> behind a decision&#8212;the Slack threads, the cross-system synthesis, the VP&#8217;s verbal override of a policy&#8212;is rarely captured in the database. A CRM might show a &#8220;20% discount,&#8221; but it won&#8217;t tell an AI agent <em>why</em> that exception was granted (e.g., &#8220;Customer represents a strategic entry into the APAC market&#8221;).</p><p>Without these traces, agents fly blind. 
They have the rules (&#8220;Do not give discounts &gt;10%&#8221;), but they lack the historical context on when and why exceptions to those rules were granted.</p><div><hr></div><h1>The Solution: The Truth Registry and the Context Graph</h1><p>To address this, we observe a bifurcation in the modern data stack, as illustrated by the <strong>Hybrid Agentic Architecture</strong> (see Figure 1 below).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3e6i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3e6i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3e6i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3e6i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!3e6i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50511770-69e0-465d-bcba-3f5078763d59_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>This architecture consists of two distinct but integrated planes:</p><h2>1. The Warehouse as the &#8220;Truth Registry&#8221;</h2><p>Jamin Ball argues that systems of record aren&#8217;t dying; they are becoming &#8220;boring, rock-solid sources of truth&#8221;. In an agentic world, the warehouse must evolve into a <strong>&#8220;Truth Registry&#8221;</strong> that encodes semantic contracts.</p><p>Agents are fragile. If an agent hallucinates the definition of &#8220;Churn,&#8221; it can automate disastrous decisions. Therefore, we must clean and canonicalize data <em>before</em> the agent sees it. In the architecture above, data flows from <strong>Raw (Variant)</strong> to <strong>Silver (Extracted)</strong> to <strong>Gold (Canonical Model)</strong>.</p><ul><li><p><strong>Engineering Takeaway:</strong> You cannot feed agents raw JSON blobs. Extracting variant columns into typed, named columns in the Silver layer is critical. 
It transforms &#8220;available data&#8221; into &#8220;governed data,&#8221; preventing agents from guessing schemas at runtime.</p></li></ul><h2>2. The Context Graph as the &#8220;Reasoning Layer&#8221;</h2><p>While the warehouse handles facts, the <strong>Context Graph</strong> handles relationships. TrustGraph defines a context graph as a &#8220;triples-representation of data (Subject &#8594; Predicate &#8594; Object) optimized for AI&#8221;.</p><p>Why a graph? Because <strong>structure is information</strong>. When you feed an LLM graph-structured data (for example, RDF triples or Cypher patterns), the structure itself encodes meaning. This allows agents to traverse relationships that SQL joins struggle to represent, stitching together a user&#8217;s support ticket, their billing status, and their web activity into a single, queryable context.</p><div><hr></div><h1>Case Study: Inside OpenAI&#8217;s Data Agent</h1><p>OpenAI recently reported that standard metadata was insufficient for their internal data agent. They had to build a custom &#8220;Context Layer&#8221; that closely resembles the architecture above.</p><p>Their agent failed when it relied solely on table schemas. To fix this, they added:</p><ol><li><p><strong>Human Annotations:</strong> Curated descriptions of what tables <em>actually</em> mean (e.g., &#8220;This table excludes logged-out users&#8221;).</p></li><li><p><strong>Code Enrichment:</strong> Codex crawls of their own codebase, which derive data lineage not just from metadata but from reading the pipelines that produced the data.</p></li></ol><p>This reinforces a major trend: <strong>The metadata </strong><em><strong>is</strong></em><strong> the model.</strong> Providing agents with a semantic ontology (machine-readable definitions of terms) is just as important as the data itself.</p><div><hr></div><h1>The &#8220;Front Door&#8221; Is Moving</h1><p>The implications for the industry are massive. 
Historically, if you owned the System of Record (like Salesforce), you owned the &#8220;Front Door&#8221; (the UI).</p><p>But as agents take over workflows, the UI is unbundling from the data. Jamin Ball compares this to the travel industry: <strong>Global Distribution Systems</strong> (GDSs such as Sabre and Amadeus) remained the backend source of truth, but <strong>Online Travel Agencies</strong> (OTAs such as Expedia and Booking) captured the front door, and with it the value.</p><p>In our new stack, the <strong>Agents</strong> become the OTAs. They are the new interface. The Warehouse/Lakehouse becomes the GDS: the invisible, essential infrastructure layer.</p><div><hr></div><h1>What This Means for Data Engineers</h1><ol><li><p><strong>Stop Hoarding State, Start Capturing Traces:</strong> We need to instrument our systems to emit &#8220;decision traces&#8221; on every run. If an agent (or human) makes a decision, record the <em>inputs</em> and the <em>logic</em> used, not just the result.</p></li><li><p><strong>The Rise of the &#8220;Gold&#8221; Layer:</strong> Your dbt models are no longer just for dashboards. They are the safety rails for autonomous agents. Strict typing, &#8220;Gold&#8221; tables, and canonical definitions are non-negotiable.</p></li><li><p><strong>Graph Literacy:</strong> You don&#8217;t need to be a Neo4j expert, but understanding the basics of triples (Subject-Predicate-Object) and ontologies is becoming a core DE skill.</p></li><li><p><strong>Extract Your Semi-Structured/Unstructured Data:</strong> As shown in the architecture diagram, leaving data in unstructured blobs is a liability. Agents need explicit structure to reason safely.</p></li></ol><p>As agents grow more capable, the infrastructure beneath them must evolve. The Context Graph offers a powerful new foundation, not just for smarter agents, but for more transparent, explainable, and aligned systems. 
It&#8217;s time for data teams to build not just pipelines, but reasoning engines.</p><div><hr></div><h1>References</h1><p><strong><a href="https://x.com/KirkMarple/status/2003944353342149021">https://x.com/KirkMarple/status/2003944353342149021</a></strong></p><p><strong><a href="https://x.com/KirkMarple/status/2005443843848856047">https://x.com/KirkMarple/status/2005443843848856047</a></strong></p><p><strong><a href="https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/">https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/</a></strong></p><p><strong><a href="https://trustgraph.ai/news/context-graph-manifesto/">https://trustgraph.ai/news/context-graph-manifesto/</a></strong></p><p><strong><a href="https://openai.com/index/inside-our-in-house-data-agent/">https://openai.com/index/inside-our-in-house-data-agent/</a></strong></p><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:181366171,&quot;url&quot;:&quot;https://cloudedjudgement.substack.com/p/clouded-judgement-121225-long-live&quot;,&quot;publication_id&quot;:56878,&quot;publication_name&quot;:&quot;Clouded Judgement&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!UZpO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png&quot;,&quot;title&quot;:&quot;Clouded Judgement 12.12.25 - Long Live Systems of Record&quot;,&quot;truncated_body_text&quot;:&quot;Every week I&#8217;ll provide updates on the latest trends in cloud software companies. 
Follow along to stay up to date!&quot;,&quot;date&quot;:&quot;2025-12-12T14:03:40.604Z&quot;,&quot;like_count&quot;:203,&quot;comment_count&quot;:12,&quot;bylines&quot;:[{&quot;id&quot;:11803623,&quot;name&quot;:&quot;Jamin Ball&quot;,&quot;handle&quot;:&quot;cloudedjudgement&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/94fea488-4be3-4043-9fdc-62e3018a3163_297x297.jpeg&quot;,&quot;bio&quot;:&quot;Venture Capitalist investing in enterprise software businesses&quot;,&quot;profile_set_up_at&quot;:&quot;2021-09-08T18:11:34.947Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-06-22T02:56:08.332Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:210259,&quot;user_id&quot;:11803623,&quot;publication_id&quot;:56878,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:56878,&quot;name&quot;:&quot;Clouded Judgement&quot;,&quot;subdomain&quot;:&quot;cloudedjudgement&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Weekly data driven analysis of SaaS companies &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png&quot;,&quot;author_id&quot;:11803623,&quot;primary_user_id&quot;:11803623,&quot;theme_var_background_pop&quot;:&quot;#d10000&quot;,&quot;created_at&quot;:&quot;2020-06-16T19:15:55.639Z&quot;,&quot;email_from_name&quot;:&quot;Clouded Judgement by Jamin Ball&quot;,&quot;copyright&quot;:&quot;Jamin 
Ball&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:null,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://cloudedjudgement.substack.com/p/clouded-judgement-121225-long-live?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!UZpO!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png" loading="lazy"><span class="embedded-post-publication-name">Clouded Judgement</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Clouded Judgement 12.12.25 - Long Live Systems of Record</div></div><div class="embedded-post-body">Every week I&#8217;ll provide updates on the latest trends in cloud software companies. 
Follow along to stay up to date&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 months ago &#183; 203 likes &#183; 12 comments &#183; Jamin Ball</div></a></div><div class="embedded-post-wrap" data-attrs="{&quot;id&quot;:181913265,&quot;url&quot;:&quot;https://cloudedjudgement.substack.com/p/clouded-judgement-121925-the-front&quot;,&quot;publication_id&quot;:56878,&quot;publication_name&quot;:&quot;Clouded Judgement&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!UZpO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png&quot;,&quot;title&quot;:&quot;Clouded Judgement 12.19.25 - The System of Record's Front Door&quot;,&quot;truncated_body_text&quot;:&quot;Every week I&#8217;ll provide updates on the latest trends in cloud software companies. 
Follow along to stay up to date!&quot;,&quot;date&quot;:&quot;2025-12-19T14:04:27.561Z&quot;,&quot;like_count&quot;:61,&quot;comment_count&quot;:10,&quot;bylines&quot;:[{&quot;id&quot;:11803623,&quot;name&quot;:&quot;Jamin Ball&quot;,&quot;handle&quot;:&quot;cloudedjudgement&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/94fea488-4be3-4043-9fdc-62e3018a3163_297x297.jpeg&quot;,&quot;bio&quot;:&quot;Venture Capitalist investing in enterprise software businesses&quot;,&quot;profile_set_up_at&quot;:&quot;2021-09-08T18:11:34.947Z&quot;,&quot;reader_installed_at&quot;:&quot;2024-06-22T02:56:08.332Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:210259,&quot;user_id&quot;:11803623,&quot;publication_id&quot;:56878,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:true,&quot;publication&quot;:{&quot;id&quot;:56878,&quot;name&quot;:&quot;Clouded Judgement&quot;,&quot;subdomain&quot;:&quot;cloudedjudgement&quot;,&quot;custom_domain&quot;:null,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Weekly data driven analysis of SaaS companies &quot;,&quot;logo_url&quot;:&quot;https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/f3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png&quot;,&quot;author_id&quot;:11803623,&quot;primary_user_id&quot;:11803623,&quot;theme_var_background_pop&quot;:&quot;#d10000&quot;,&quot;created_at&quot;:&quot;2020-06-16T19:15:55.639Z&quot;,&quot;email_from_name&quot;:&quot;Clouded Judgement by Jamin Ball&quot;,&quot;copyright&quot;:&quot;Jamin 
Ball&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:null,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:null,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:null,&quot;paidPublicationIds&quot;:[],&quot;subscriber&quot;:null}}],&quot;utm_campaign&quot;:null,&quot;belowTheFold&quot;:true,&quot;type&quot;:&quot;newsletter&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="EmbeddedPostToDOM"><a class="embedded-post" native="true" href="https://cloudedjudgement.substack.com/p/clouded-judgement-121925-the-front?utm_source=substack&amp;utm_campaign=post_embed&amp;utm_medium=web"><div class="embedded-post-header"><img class="embedded-post-publication-logo" src="https://substackcdn.com/image/fetch/$s_!UZpO!,w_56,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3a0e019-4ab6-4db8-b07a-5fe763dd3fab_669x669.png" loading="lazy"><span class="embedded-post-publication-name">Clouded Judgement</span></div><div class="embedded-post-title-wrapper"><div class="embedded-post-title">Clouded Judgement 12.19.25 - The System of Record's Front Door</div></div><div class="embedded-post-body">Every week I&#8217;ll provide updates on the latest trends in cloud software companies. 
Follow along to stay up to date&#8230;</div><div class="embedded-post-cta-wrapper"><span class="embedded-post-cta">Read more</span></div><div class="embedded-post-meta">3 months ago &#183; 61 likes &#183; 10 comments &#183; Jamin Ball</div></a></div>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #254]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-254</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-254</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 26 Jan 2026 04:12:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_25_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lFOo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!lFOo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!lFOo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!lFOo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lFOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26077,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_25_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!lFOo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!lFOo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!lFOo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!lFOo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d9b37f3-97c3-4a24-a589-1b1e6666710e_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" 
fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>LLMs are transforming software development, but integrating them into real projects can be tricky when models don&#8217;t understand your codebase, pipelines, or conventions.<br><br>Join Dagster on Tuesday, January 27th, for a practical look at data engineering best practices, common pitfalls, and live demos of LLM developments.</p><p><strong><a href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_25_data_engineering_weekly">Register now</a></strong></p><div><hr></div><h1><em><strong>Debate Alert: Ontology vs Context Graphs vs Semantic Layers: What Will AI Need in 2026?</strong></em></h1><p><em>What are Context Graphs, and why are builders, VCs, and operators calling them the next $1T opportunity? Join The Great Data Debate to get answers to the questions the data &amp; AI industry is asking right now:</em></p><ul><li><p><em>Where does context materialize in practice? 
Who owns context as meaning evolves?</em></p></li><li><p><em>Semantic layers, ontologies, context graphs: what should data teams build in 2026?</em></p></li><li><p><em>Where should that context live: in the warehouse, inside agents, or in a dedicated context layer?</em></p></li></ul><p><em>Join <strong>Bob Muglia</strong> (former CEO, Snowflake), <strong>Karthik Ravindran</strong> (GM, Microsoft), <strong>Tony Gentilcore</strong> (Co-founder, Glean), <strong>Prukalpa Sankar</strong> (Co-founder, Atlan), and <strong>Jaya Gupta</strong> (Foundation Capital) for an open discussion on what data teams should actually build next.</em></p><p><em><strong><a href="https://atlan.com/great-data-debate-2026/?utm_source=DEW+&amp;utm_medium=Substack&amp;utm_campaign=DEW_GDD">Register: Feb 5 &#183; Virtual &#183; 11 AM ET</a></strong></em></p><div><hr></div><h1>Mark Rittman: Why We&#8217;ve Tried to Replace Data Analytics Developers Every Decade Since 1974</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!orSS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!orSS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 424w, https://substackcdn.com/image/fetch/$s_!orSS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 848w, 
https://substackcdn.com/image/fetch/$s_!orSS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!orSS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!orSS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic" width="1360" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1360,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14087,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!orSS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 424w, 
https://substackcdn.com/image/fetch/$s_!orSS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 848w, https://substackcdn.com/image/fetch/$s_!orSS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 1272w, https://substackcdn.com/image/fetch/$s_!orSS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F741b983c-5936-4259-a3b4-d7ad04eda04e_1360x768.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" 
x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p>Perhaps the recurring dream of replacing data analytics developers isn&#8217;t a mistake. Perhaps it&#8217;s a necessary optimism that drives tool creation.</p></blockquote><p>Developers are always hungry for the next big abstraction. The article is an excellent reminder that semantic layers didn&#8217;t make metric definition trivial, but they made it more maintainable. Self-service BI didn&#8217;t remove IT from the equation, but it expanded who could explore data.</p><p><strong><a href="https://blog.rittmananalytics.com/why-weve-tried-to-replace-data-analytics-developers-every-decade-since-1974-5c0de5a05088">https://blog.rittmananalytics.com/why-weve-tried-to-replace-data-analytics-developers-every-decade-since-1974-5c0de5a05088</a></strong></p><div><hr></div><h1>Alibaba: AI Trends Reshaping Data Engineering in 2026</h1><p>Alibaba writes that data engineering is evolving from data movement to building intelligent, autonomous systems in which data serves as a continuously learning capability. The field is being reshaped by unified data&#8211;AI platforms, self-healing operations, context engineering for AI agents, real-time and multimodal pipelines, and privacy-first approaches such as synthetic data and federated learning.</p><p><strong><a href="https://www.alibabacloud.com/blog/ai-trends-reshaping-data-engineering-in-2026_602816">https://www.alibabacloud.com/blog/ai-trends-reshaping-data-engineering-in-2026_602816</a></strong></p><div><hr></div><h1>Thoughtworks: The state of data mesh in 2026: From hype to hard-won maturity</h1><p>Like any scientific theory, an architectural idea starts out half-finished and widely debated, then matures over time. Data Mesh and Data Contracts are two such ideas; though many companies have adopted them internally, both still have much room to mature. 
ThoughtWorks writes an excellent article about the current state of data mesh adoption. </p><p><strong><a href="https://www.thoughtworks.com/insights/blog/data-strategy/the-state-of-data-mesh-in-2026-from-hype-to-hard-won-maturity">https://www.thoughtworks.com/insights/blog/data-strategy/the-state-of-data-mesh-in-2026-from-hype-to-hard-won-maturity</a></strong></p><div><hr></div><h1>Sponsored: The AI Modernization Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_25_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NwaO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!NwaO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!NwaO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!NwaO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!NwaO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic" width="1456" height="764" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15511,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_25_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NwaO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!NwaO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 848w, 
https://substackcdn.com/image/fetch/$s_!NwaO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!NwaO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2479aabb-0ed2-4aba-b72f-e6a77a8e19dc_2400x1260.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While 75% of enterprises experiment with AI, traditional data platforms are becoming the biggest 
bottleneck. Learn how to build a unified control plane that enables AI-driven development, reduces pipeline failures, and cuts complexity.</p><p><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Dagster Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_25_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Anthropic: Demystifying evals for AI agents</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fnpl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fnpl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 424w, https://substackcdn.com/image/fetch/$s_!fnpl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 848w, https://substackcdn.com/image/fetch/$s_!fnpl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!fnpl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fnpl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32650,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fnpl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 424w, https://substackcdn.com/image/fetch/$s_!fnpl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 848w, 
https://substackcdn.com/image/fetch/$s_!fnpl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 1272w, https://substackcdn.com/image/fetch/$s_!fnpl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8092f781-174a-4c01-be40-bae48c63c054_3840x2161.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>AI agent development faces evaluation gaps as systems evolve from single-turn prompts to
autonomous, multi-step tool use. The article defines agent evals around task outcomes rather than transcripts, using repeated trials, hybrid grading methods, and reliability metrics such as pass@k and pass^k to capture non-deterministic behavior. Starting with small, failure-driven task sets and layering automated evals with production monitoring and human review enables teams to iterate faster, adopt new models safely, and maintain consistent agent reliability.</p><p><strong><a href="https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents">https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents</a></strong></p><div><hr></div><h1>Booking.com: AI Agent Evaluation - practical tips at Booking.com</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eKVJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eKVJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 424w, https://substackcdn.com/image/fetch/$s_!eKVJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 848w, https://substackcdn.com/image/fetch/$s_!eKVJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!eKVJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eKVJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic" width="1124" height="889" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:889,&quot;width&quot;:1124,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eKVJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 424w, https://substackcdn.com/image/fetch/$s_!eKVJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 848w, 
https://substackcdn.com/image/fetch/$s_!eKVJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 1272w, https://substackcdn.com/image/fetch/$s_!eKVJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F257d9e6b-6fa3-440c-ab07-6accb2405bf7_1124x889.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Evaluating autonomous AI agents is challenging because simple prompt-based metrics fail to capture
whether user intents are reliably fulfilled. The article presents a dual approach combining black-box evaluation focused on task completion using LLM-as-a-judge with glass-box evaluation that inspects tool selection, syntax, and intermediate agent decisions. Benchmarking agents against simpler baselines and measuring consistency across repeated queries helps teams justify added complexity, control costs, and assess production readiness.</p><p><strong><a href="https://booking.ai/ai-agent-evaluation-82e781439d97">https://booking.ai/ai-agent-evaluation-82e781439d97</a></strong></p><div><hr></div><h1>LinkedIn: Reimagining LinkedIn&#8217;s search tech stack</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cDF_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cDF_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 424w, https://substackcdn.com/image/fetch/$s_!cDF_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 848w, https://substackcdn.com/image/fetch/$s_!cDF_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 1272w, https://substackcdn.com/image/fetch/$s_!cDF_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cDF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic" width="1200" height="744" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:744,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cDF_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 424w, https://substackcdn.com/image/fetch/$s_!cDF_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 848w, https://substackcdn.com/image/fetch/$s_!cDF_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!cDF_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d32c84f-b93a-4142-a357-5ffe41156bcc_1200x744.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Traditional keyword-based search struggles to understand user intent, handle natural language queries, and bridge vocabulary gaps at scale.
The article describes LinkedIn&#8217;s shift to a semantic search stack built on LLM-based query understanding, embedding-based retrieval, and small-language-model ranking, supported by LLM judges, model distillation, and continuous relevance measurement.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/search/reimagining-linkedins-search-stack">https://www.linkedin.com/blog/engineering/search/reimagining-linkedins-search-stack</a></strong></p><div><hr></div><h1>Stefan Kecskes: Kafka Dead Letter Queue Triage: Debugging 25,000 Failed Messages</h1><p>The DLQ pattern in event processing is well known and fairly widely adopted. What I&#8217;m delighted to read about is what happens after a message enters a failed state for the first time: discard it or fix it? The author provides practical tips for analyzing DLQ messages, along with cautions and resiliency practices to apply before rebroadcasting them.</p><p><strong><a href="https://skey.uk/post/kafka-dead-letter-queue-troubleshooting-guide/">https://skey.uk/post/kafka-dead-letter-queue-troubleshooting-guide/</a></strong></p><div><hr></div><h1>Teads: The End of the Dashboard as We Know It: Designing for Insight in the Age of AI</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M-3h!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M-3h!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 424w,
https://substackcdn.com/image/fetch/$s_!M-3h!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 848w, https://substackcdn.com/image/fetch/$s_!M-3h!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!M-3h!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!M-3h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic" width="1200" height="800" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:17429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!M-3h!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 424w, https://substackcdn.com/image/fetch/$s_!M-3h!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 848w, https://substackcdn.com/image/fetch/$s_!M-3h!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 1272w, https://substackcdn.com/image/fetch/$s_!M-3h!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c5c7f9a-2513-424e-bd74-51154a6e7d9c_1200x800.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Static dashboards fail to support decision-making because they require users to interpret large volumes of data manually. The article argues for AI-driven dashboards that act as proactive, conversational assistants by focusing on scenarios, personalized insights, and transparent recommendations rather than fixed screens. Measuring success by decisions enabled instead of data displayed reframes dashboards as adaptive systems that guide action and improve human&#8211;AI collaboration over time.</p><p><strong><a href="https://medium.com/teads-engineering/the-end-of-the-dashboard-as-we-know-it-designing-for-insight-in-the-age-of-ai-fec16bddf677">https://medium.com/teads-engineering/the-end-of-the-dashboard-as-we-know-it-designing-for-insight-in-the-age-of-ai-fec16bddf677</a></strong></p><div><hr></div><h1>Flipkart: High-Risk, High-Scale: Guaranteeing Ad Budget Precision at 1 Million Events/Second</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!okI5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!okI5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 424w,
https://substackcdn.com/image/fetch/$s_!okI5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 848w, https://substackcdn.com/image/fetch/$s_!okI5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 1272w, https://substackcdn.com/image/fetch/$s_!okI5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!okI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic" width="1024" height="454" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:454,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14835,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185800885?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!okI5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 424w, https://substackcdn.com/image/fetch/$s_!okI5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 848w, https://substackcdn.com/image/fetch/$s_!okI5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 1272w, https://substackcdn.com/image/fetch/$s_!okI5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F911a9812-5600-45ad-95e0-c3b88fb213d3_1024x454.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Real-time ad budget enforcement at scale is risky because latency can cause advertiser overspend and revenue loss, and delayed mobile events can be miscounted. The article describes Flipkart Ads&#8217; architecture that separates real-time enforcement from batch settlement using Apache Flink, stateful deduplication with RocksDB, and event-time processing with watermarking. Prioritizing availability over strict consistency enables the system to process nearly one million events per second while ensuring accurate budget capping and final financial reconciliation.</p><p><strong><a href="https://blog.flipkart.tech/high-risk-high-scale-guaranteeing-ad-budget-precision-at-1-million-events-second-cc23977796d7">https://blog.flipkart.tech/high-risk-high-scale-guaranteeing-ad-budget-precision-at-1-million-events-second-cc23977796d7</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement.
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Contracts: A Missed Opportunity]]></title><description><![CDATA[The Conversation We Should Have Had&#8212;Before Thought Leadership Replaced System Design]]></description><link>https://www.dataengineeringweekly.com/p/data-contracts-a-missed-opportunity</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-contracts-a-missed-opportunity</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Tue, 20 Jan 2026 18:31:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1ja9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Over the last couple of years, the data industry has been having a conversation about data contracts that never quite went where it needed to.</p><p>There was no shortage of activity around the topic. Definitions were proposed and refined. Conceptual boundaries were drawn and redrawn. Data contracts were compared to APIs, governance frameworks, data mesh primitives, and ideas teams already &#8220;sort of&#8221; implemented in practice.</p><blockquote><p><em>The discussion was energetic and well-intentioned, but it tended to stay at the level of classification rather than construction.</em></p></blockquote><p>What was largely absent was sustained engagement with the engineering consequences of taking data contracts seriously. Questions about enforcement, evolution, compatibility, and failure modes appeared only briefly before the conversation moved on.
The result was an industry consensus that data contracts were &#8220;interesting,&#8221; without a shared understanding of what it would actually mean to build platforms around them.</p><p>In hindsight, this matters&#8212;not because the debate was unproductive, but because of what happened in parallel.</p><div><hr></div><h2><strong>The Shift Happening Elsewhere</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7mSh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7mSh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 424w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 848w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7mSh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png" width="1456" height="801" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:801,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7mSh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 424w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 848w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!7mSh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dcd1ddb-f6d6-4190-a234-23cc6febd34d_2048x1126.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>While the data community was debating what data contracts <em>were</em>, the software engineering world was converging&#8212;quietly and pragmatically&#8212;on a different organizing principle: <strong>specifications as the primary unit of system design</strong>.</p><p>This wasn&#8217;t a philosophical shift so much as an operational one. As systems became more distributed, more automated, and more interdependent, informal agreements stopped scaling. Documentation drifted. Assumptions diverged. Human coordination became the bottleneck.</p><p>The response was not more process, but more precision.</p><p>APIs began with schemas rather than code. Infrastructure moved from scripts to declarative specifications. Compatibility rules were encoded and enforced automatically. In these systems, the specification was no longer an artifact produced alongside the system&#8212;it <em>was</em> the system.</p><p>More recently, AI agents have accelerated this trend.
Agents do not operate on intent, convention, or context. They operate on explicit, machine-readable, and verifiable data. Where specifications exist, agents can reason deterministically. Where they do not, agents approximate&#8212;and approximation is rarely acceptable in core infrastructure.</p><p>This is where the connection to data contracts becomes unavoidable.</p><div><hr></div><h2><strong>Data Contracts as Specifications, Not Concepts</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1ja9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1ja9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!1ja9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1ja9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!1ja9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe050308d-37ed-4cde-8b4e-ce1dfb8c784e_1536x1024.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Viewed through the lens of spec-driven development, data contracts stop looking like a data-specific innovation and become a familiar pattern applied to a different domain.</p><p>A properly implemented data contract is a specification:</p><ul><li><p>It defines structure, semantics, and invariants</p></li><li><p>It establishes compatibility guarantees over time.</p></li><li><p>It is versioned, validated, and enforced programmatically.</p></li><li><p>It creates a stable interface between independently evolving systems.</p></li></ul><p>This is exactly what spec-driven development has optimized for in software engineering.</p><p>The difference is not conceptual&#8212;it is operational. 
Software engineering treated specifications as executable constraints. The data industry often treated contracts as descriptive artifacts. As a result, contracts were discussed as governance tools or communication mechanisms, rather than as interfaces with failure semantics.</p><p>That framing limited how far the idea could go.</p><div><hr></div><h2><strong>Where the Two Worlds Diverged</strong></h2><p>Spec-driven systems force early clarity around hard problems:</p><ul><li><p>How does change propagate?</p></li><li><p>What is allowed to evolve independently?</p></li><li><p>What breaks compatibility, and how is that detected?</p></li><li><p>Where does enforcement occur, and what happens when it fails?</p></li></ul><p>In software systems, these questions are answered in code and tooling. In data systems, they were often answered socially. Producer&#8211;consumer agreements existed, but they lived in tickets, meetings, and tribal knowledge rather than in executable form.</p><p>Many teams compensated by building partial solutions: strong schemas, upstream quality checks, and informal SLAs. These patterns worked, but they relied heavily on human intervention. They were resilient, but not legible to machines.</p><p>As long as humans were the primary integrators, this was manageable. Once AI agents enter the workflow, that dependence on human intervention becomes a constraint.</p><div><hr></div><h2><strong>Why Spec-Driven Thinking Changes the Next Phase</strong></h2><p>AI agents make an implicit demand of data platforms: <strong>make your rules explicit</strong>.</p><p>Agents can generate schemas, propose transformations, reason about compatibility, and enforce policy, but only if the platform exposes contracts in a form they can execute against. Without such contracts, agents fall back on inference, which introduces uncertainty precisely where determinism is required.</p><p>This is the practical implication of spec-driven development for data engineering. It&#8217;s not about adopting a new paradigm. 
It&#8217;s about recognizing that the platform already behaves like a system of interfaces&#8212;and formalizing those interfaces accordingly.</p><p>Teams that have already internalized contract discipline will find this transition incremental. Teams that have not will experience it as friction.</p><div><hr></div><h2><strong>What We Should Do Next</strong></h2><p>At this point, the terminology matters less than the mechanics.</p><p>Whether we call them data contracts, data interfaces, or executable schemas, the path forward is the same:</p><ul><li><p>Treat schemas as specifications, not documentation.</p></li><li><p>Encode quality, semantics, and compatibility as executable rules.</p></li><li><p>Enforce contracts at clear system boundaries, preferably early.</p></li><li><p>Version data interfaces with the same rigor as APIs.</p></li><li><p>Make ownership and accountability explicit and machine-readable.</p></li></ul><p>This is not about adding process. It is about making systems legible to other systems.</p><div><hr></div><h2><strong>Closing Thought</strong></h2><p>The original data contracts conversation wasn&#8217;t wrong. It just stopped too early.</p><p>Spec-driven development has shown that explicit, enforceable interfaces are not optional in complex, automated systems. 
Data platforms are now at that same inflection point.</p><p>Data contracts were never the destination.</p><p>They were the missing layer that would have made everything else easier to build.</p><p>The opportunity is still there&#8212;but only if we&#8217;re willing to treat contracts as infrastructure, not ideas.</p><h2>References</h2><p><em><strong><a href="https://www.thoughtworks.com/insights/podcasts/technology-podcasts/data-contracts-what-why">https://www.thoughtworks.com/insights/podcasts/technology-podcasts/data-contracts-what-why</a></strong></em></p><p><strong><a href="https://airbyte.com/data-engineering-resources/data-contracts">https://airbyte.com/data-engineering-resources/data-contracts</a></strong></p><p><strong><a href="https://soda.io/blog/what-are-data-contracts">https://soda.io/blog/what-are-data-contracts</a></strong></p><p><strong><a href="https://atlan.com/data-contracts/">https://atlan.com/data-contracts/</a></strong></p><p><strong><a href="https://en.wikipedia.org/wiki/Spec-driven_development">https://en.wikipedia.org/wiki/Spec-driven_development</a></strong></p><p><strong><a href="https://medium.com/software-architecture-in-the-age-of-ai/why-interfaces-and-contracts-are-not-the-same-and-why-that-matters-with-10-examples-408524f6d17c">https://medium.com/software-architecture-in-the-age-of-ai/why-interfaces-and-contracts-are-not-the-same-and-why-that-matters-with-10-examples-408524f6d17c</a></strong></p><p><strong><a href="https://arxiv.org/abs/2507.21056">https://arxiv.org/abs/2507.21056</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #253]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-253</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-253</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 19 Jan 2026 05:20:35 
GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_18_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eded!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!eded!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!eded!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!eded!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eded!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic" width="1456" height="764" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:764,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_18_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eded!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 424w, https://substackcdn.com/image/fetch/$s_!eded!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 848w, https://substackcdn.com/image/fetch/$s_!eded!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 1272w, https://substackcdn.com/image/fetch/$s_!eded!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ce3b171-dc97-4fbf-90ec-9af339e459bd_2400x1260.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Modernize your data platform for the age of AI.</h1><p>While 75% of enterprises experiment with AI, traditional data platforms are becoming the biggest bottleneck. 
Learn how to build a unified control plane that enables AI-driven development, reduces pipeline failures, and cuts complexity.<br><br>- Transform from Big Complexity to AI-ready architecture<br>- Real metrics from organizations achieving 50% cost reductions<br>- Introduction to Dagster Components: YAML-first pipelines that AI can build</p><p><strong><a href="https://dagster.io/ai-modernization-guide?utm_campaign=34120047-26-01-DMND_AI%20Modernization&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=ai_modernization_guide&amp;utm_content=01_18_data_engineering_weekly">Get the guide</a></strong></p><div><hr></div><h1>Lance Martin: Effective Agent Design</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rHCD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rHCD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 424w, https://substackcdn.com/image/fetch/$s_!rHCD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 848w, https://substackcdn.com/image/fetch/$s_!rHCD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!rHCD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rHCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic" width="1200" height="571" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:13081,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rHCD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 424w, https://substackcdn.com/image/fetch/$s_!rHCD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 848w, 
https://substackcdn.com/image/fetch/$s_!rHCD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 1272w, https://substackcdn.com/image/fetch/$s_!rHCD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c2b5850-64d1-4b82-8c56-dea594363247_1200x571.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>An effective agent design largely boils down to context management. 
The author proposes design patterns for building agents: giving the agent filesystem and shell access, using a multi-layer action space, and offloading memory to the filesystem rather than keeping everything in the context window.</p><p><strong><a href="https://x.com/RLanceMartin/status/2009683038272401719">https://x.com/RLanceMartin/status/2009683038272401719</a></strong></p><div><hr></div><h1>M&#233;d&#233;ric Hurier: Architecting the AI Agent Platform: A Definitive Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4KVN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4KVN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 424w, https://substackcdn.com/image/fetch/$s_!4KVN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 848w, https://substackcdn.com/image/fetch/$s_!4KVN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 1272w, https://substackcdn.com/image/fetch/$s_!4KVN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!4KVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic" width="1456" height="252" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:252,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42843,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4KVN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 424w, https://substackcdn.com/image/fetch/$s_!4KVN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 848w, https://substackcdn.com/image/fetch/$s_!4KVN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!4KVN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb59d3136-ec03-40ba-acf0-995d2177c2fb_8970x1550.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The industry is shifting from simple LLMs and RAG (Retrieval-Augmented Generation) to AI agents. The author proposes a 7-layer logical container architecture for building an AI agent platform. It organizes the platform into seven logical levels (Interaction, Development, Core, Foundation, Information, Observability, and Trust) to manage the complexity of building production-grade systems. The structure enforces a separation of concerns, ensuring that user interfaces, execution engines, data management, and security governance are handled independently yet cohesively.</p><p><strong><a href="https://mlops.community/architecting-the-ai-agent-platform-a-definitive-guide/">https://mlops.community/architecting-the-ai-agent-platform-a-definitive-guide/</a></strong></p><div><hr></div><h1>Tidepool: Stop using natural language interfaces</h1><p>The user experience of chatbot-driven enterprise applications is taking center stage in product design. The author argues that pure natural language interfaces are inefficient because LLM latency is high (responses often take tens of seconds). 
Instead, the author proposes a hybrid approach in which the LLM dynamically generates structured Graphical User Interfaces (GUIs), such as popups with checkboxes, sliders, and forms, to interact with the user.</p><p><strong><a href="https://tidepool.leaflet.pub/3mcbegnuf2k2i">https://tidepool.leaflet.pub/3mcbegnuf2k2i</a></strong></p><div><hr></div><h1>Sponsored: Best practices for LLM development</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_18_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cl4f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!cl4f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!cl4f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!cl4f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cl4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28114,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_18_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cl4f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!cl4f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!cl4f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!cl4f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ed55042-919c-421e-90a6-b2f541a8f8a4_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LLMs are transforming software development, but integrating them into real projects can be tricky 
when models don&#8217;t understand your codebase, pipelines, or conventions.<br><br><strong><a href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_18_data_engineering_weekly">Join Dagster on January 27th</a></strong> for a practical look at data engineering best practices, common pitfalls, and live demos of LLM developments.</p><p><strong><a href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_18_data_engineering_weekly">Save your spot</a></strong></p><div><hr></div><h1>Microsoft: SQL Telemetry &amp; Intelligence &#8211; How we built a Petabyte-scale Data Platform with Fabric</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hbl4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hbl4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 424w, https://substackcdn.com/image/fetch/$s_!Hbl4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Hbl4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 1272w, https://substackcdn.com/image/fetch/$s_!Hbl4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hbl4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56106,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hbl4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 424w, 
https://substackcdn.com/image/fetch/$s_!Hbl4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 848w, https://substackcdn.com/image/fetch/$s_!Hbl4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 1272w, https://substackcdn.com/image/fetch/$s_!Hbl4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc81949dc-2722-4949-bc2d-5b1ebe2b3f18_4725x2658.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" 
x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Microsoft writes about how the SQL Telemetry &amp; Intelligence (T&amp;I) team built a 10+ petabyte Data Lake using Microsoft Fabric, processing real-time data from global SQL Server engines. The focus on CI/CD pipelines, testing optimization, local development, and data quality &amp; observability makes for an interesting systems read.</p><p><strong><a href="https://blog.fabric.microsoft.com/en-us/blog/sql-telemetry-intelligence-how-we-built-a-petabyte-scale-data-platform-with-fabric">https://blog.fabric.microsoft.com/en-us/blog/sql-telemetry-intelligence-how-we-built-a-petabyte-scale-data-platform-with-fabric</a></strong></p><div><hr></div><h1>Vikram Sreekanti &amp; Joseph E. Gonzalez: Data is your only moat</h1><p>The ease of adopting a tool enables data collection, which in turn creates a defensive advantage that is hard for competitors to replicate. The authors make a solid argument that, for enterprise applications, the moat isn&#8217;t just about volume but about specificity. By deeply integrating with a company&#8217;s legacy systems, a product gathers data on exactly how that specific customer works. 
This creates &#8220;stickiness&#8221;&#8212;replacing the tool becomes difficult because a new competitor wouldn&#8217;t have that accumulated knowledge of the company&#8217;s unique workflows.</p><p><strong><a href="https://frontierai.substack.com/p/data-is-your-only-moat">https://frontierai.substack.com/p/data-is-your-only-moat</a></strong></p><div><hr></div><h1>Uber: Apache Hudi&#8482; at Uber: Engineering for Trillion-Record-Scale Data Lake Operations</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LlAO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LlAO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 424w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!LlAO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12278,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LlAO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 424w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!LlAO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4fcb20a2-d87f-423d-ad39-8947486fab85_960x540.heic 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Uber writes about how critical Apache Hudi is to its overall data lake operations, where it enables the management of trillion-record ingestion. Uber highlights the addition of record indexes, which enable O(1) record lookups and allow efficient updates on tables with hundreds of billions of rows. Personally, I think this is a pretty cool feature from Apache Hudi. 
</p><p><strong><a href="https://www.uber.com/en-IN/blog/apache-hudi-at-uber/">https://www.uber.com/en-IN/blog/apache-hudi-at-uber/</a></strong></p><div><hr></div><h1>Etsy: How Etsy Uses LLMs to Improve Search Relevance</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lLNb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lLNb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 424w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 848w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 1272w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lLNb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic" width="720" height="402" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:402,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10051,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lLNb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 424w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 848w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 1272w, https://substackcdn.com/image/fetch/$s_!lLNb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c2b3256-d9b0-4bd3-bf3b-3b30f25c4500_720x402.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Etsy writes about upgrading its search capabilities by using LLMs to focus on semantic relevance, which prioritizes understanding a buyer's true intent over simple click data. Etsy uses high-quality human and LLM annotations to train a lightweight "student" model that runs in real time. 
This model actively filters and ranks search results, successfully increasing the percentage of fully relevant listings shown to shoppers.</p><p><strong><a href="https://www.etsy.com/codeascraft/how-etsy-uses-llms-to-improve-search-relevance">https://www.etsy.com/codeascraft/how-etsy-uses-llms-to-improve-search-relevance</a></strong></p><div><hr></div><h1>AWS: How Slack achieved operational excellence for Spark on Amazon EMR using generative AI</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_jjA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_jjA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 424w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 848w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 1272w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!_jjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic" width="1370" height="492" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:1370,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7476,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_jjA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 424w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 848w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 1272w, https://substackcdn.com/image/fetch/$s_!_jjA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32fefe3e-34cd-4a20-99f0-948768a21011_1370x492.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Slack writes about reaching operational excellence by replacing manual debugging with a custom monitoring framework that captures over 40 granular metrics from its EMR clusters. Slack exposed this data to generative AI models via Amazon Bedrock and a Model Context Protocol (MCP) server, enabling tools like Claude Code to analyze performance and suggest optimal configurations automatically. 
This automated system reduced compute costs by 30&#8211;50% and slashed developers' time spent tuning jobs by over 90%.</p><p><strong><a href="https://aws.amazon.com/blogs/big-data/how-slack-achieved-operational-excellence-for-spark-on-amazon-emr-using-generative-ai/">https://aws.amazon.com/blogs/big-data/how-slack-achieved-operational-excellence-for-spark-on-amazon-emr-using-generative-ai/</a></strong></p><div><hr></div><h1>Agoda: How Agoda Enhanced the Uptime and Consistency of Financial Metrics</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t0gz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t0gz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 424w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 848w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 1272w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!t0gz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic" width="1400" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14271,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/185026992?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t0gz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 424w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 848w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 1272w, https://substackcdn.com/image/fetch/$s_!t0gz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c0ed9f5-1723-4a3e-867d-54eac6ba5f82_1400x813.heic 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Agoda writes about addressing inconsistencies in its financial reporting by consolidating multiple disjointed data pipelines into a single Financial Unified Data Pipeline (FINUDP) built on Apache Spark. Agoda talks about approaches to ensure reliability and accuracy, including automated freshness monitoring, shadow testing for all code changes, and strict data contracts with upstream providers. 
</p><p><strong><a href="https://medium.com/agoda-engineering/how-agoda-enhanced-the-uptime-and-consistency-of-financial-metrics-ef7d54c4e4f0">https://medium.com/agoda-engineering/how-agoda-enhanced-the-uptime-and-consistency-of-financial-metrics-ef7d54c4e4f0</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #252]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-252</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-252</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 12 Jan 2026 02:38:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42230,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1><strong>Best practices for LLM development</strong></h1><p>LLMs are transforming software development, but integrating them into real projects can be tricky when models don&#8217;t understand your codebase, pipelines, or conventions.<br><br>Join Dagster on January 27th for a practical look at data engineering best 
practices, common pitfalls, and live demos of LLM developments.</p><p><strong><a href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly">Reserve your spot now.</a></strong></p><div><hr></div><h1>Foundation Capital: AI&#8217;s trillion-dollar opportunity: Context graphs</h1><blockquote><p>Agents are cross-system and action-oriented. The UX of work is separating from the underlying data plane. Agents become the interface, but something still has to be canonical underneath.</p></blockquote><p>This will be a core construct of the next evolution of data engineering: a scalable data infrastructure that gives a unified view of the systems of record, the analytical data, and past decision traces, along with a system of record that accepts highly concurrent modifications. The promise of agents holds, but I don&#8217;t think our underlying infrastructure is ready for it.
</p><p><strong><a href="https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/">https://foundationcapital.com/context-graphs-ais-trillion-dollar-opportunity/</a></strong></p><div><hr></div><h1>ThoughtWorks: How to build the organizational muscle needed to scale AI beyond PoCs</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e7mu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e7mu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 424w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e7mu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic" width="960" height="540" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18792,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e7mu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 424w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!e7mu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d7f9197-a6ff-445b-a679-68645500077b_960x540.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Thoughtworks argues that AI initiatives fail to scale beyond pilots because organizations hit compliance hurdles and data silos and lack stakeholder engagement, problems that require building "organizational muscle" rather than buying technology solutions.
The article recommends a "thin slice" approach that addresses five building blocks simultaneously for a single use case: starting with clear business outcomes instead of technology, building tech platforms incrementally based on concrete needs, creating repeatable MLOps paths to production through cross-functional product teams, and investing in AI literacy and human-collaborative tool design to drive sustained adoption.</p><p><strong><a href="https://www.thoughtworks.com/insights/articles/how-to-build-organizational-muscle-needed-to-scale-AI">https://www.thoughtworks.com/insights/articles/how-to-build-organizational-muscle-needed-to-scale-AI</a></strong></p><div><hr></div><h1>Sharon Campbell-Crow: Multi-Agent Systems: The Architecture Shift from Monolithic LLMs to Collaborative Intelligence</h1><p>Developers are moving away from monolithic LLM &#8220;God Prompts&#8221; toward multi-agent systems because single models suffer from context limits and lack built-in self-critique. Multi-agent systems use specialized, sometimes adversarial agents, improving factual accuracy by up to 23%. 
</p><p>The article describes four architectures&#8212;LangGraph for graph-based control and auditability, AutoGen for event-driven distributed agents, CrewAI for role-based content workflows, and OpenAI Swarm for stateless, high-scale routing&#8212;along with production patterns such as planner&#8211;executor separation, memory streams for relevance, and deferred execution to manage cost and latency.</p><p><strong><a href="https://www.comet.com/site/blog/multi-agent-systems/">https://www.comet.com/site/blog/multi-agent-systems/</a></strong></p><div><hr></div><h1><strong>Sponsored: The Scaling Data Teams Guide</strong></h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GZPK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44723,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!GZPK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!GZPK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Building and scaling a data platform has never been more important or more challenging. 
Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Ly: Building a multi-agent pipeline for NL-to-SQL analytics</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hyYM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hyYM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 424w, https://substackcdn.com/image/fetch/$s_!hyYM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 848w, https://substackcdn.com/image/fetch/$s_!hyYM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!hyYM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hyYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic" width="991" height="245" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:245,&quot;width&quot;:991,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15446,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hyYM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 424w, https://substackcdn.com/image/fetch/$s_!hyYM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 848w, 
https://substackcdn.com/image/fetch/$s_!hyYM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 1272w, https://substackcdn.com/image/fetch/$s_!hyYM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3ee44acf-f5ba-4125-9444-6ba5e25b7ce1_991x245.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>LY Corp writes about migrating from a monolithic MCP-based NL-to-SQL system to a five-agent pipeline after encountering execution coupling, single-point-of-failure debugging, and oversized prompt contexts.</p><p>The new design adopts a Swarm-style orchestration model in which specialized agents handle routing, intent parsing, validation, SQL generation, query execution, and result presentation, using strict JSON interfaces and a tightly scoped context.</p><p>Preprocessed domain-specific data marts with normalized action units further reduce hallucinations by helping agents reliably map intents to the correct tables and columns.</p><p><strong><a href="https://techblog.lycorp.co.jp/en/building-a-multi-agent-pipeline-for-nl-to-sql-analytics">https://techblog.lycorp.co.jp/en/building-a-multi-agent-pipeline-for-nl-to-sql-analytics</a></strong></p><div><hr></div><h1>Vinted: Building a Global, Event-Driven Platform: Our Ongoing Journey</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jcIX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!jcIX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 424w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 848w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 1272w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jcIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic" width="512" height="289" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:289,&quot;width&quot;:512,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7310,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jcIX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 424w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 848w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 1272w, https://substackcdn.com/image/fetch/$s_!jcIX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fede1a8dc-cef1-4dfe-8d29-5ee998867df8_512x289.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 
24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Vinted Engineering describes migrating from a monolithic system handling 150k requests per second to a global, event-driven platform processing over 300k requests per second. The redesign applies Domain-Driven Design across nearly 300 domains, uses Saga-based orchestration to coordinate multi-step workflows, centralizes writes, and globally replicates read-only projections via event streams. Separating read and write paths lets low-latency features such as feeds and search run close to users, but it requires teams to design for eventual consistency, retries, and out-of-order events rather than assuming immediate consistency.</p><p><strong><a href="https://vinted.engineering/2026/01/09/building-global-event-driven-platform-part-1/">https://vinted.engineering/2026/01/09/building-global-event-driven-platform-part-1/</a></strong></p><p><strong><a href="https://vinted.engineering/2026/01/09/building-global-event-driven-platform-part-2/">https://vinted.engineering/2026/01/09/building-global-event-driven-platform-part-2/</a></strong></p><div><hr></div><h1>Lyft: Lyft&#8217;s Feature Store: Architecture, Optimization, and Evolution</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Cjo2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!Cjo2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 424w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 848w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 1272w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Cjo2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic" width="1400" height="1054" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1054,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Cjo2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 424w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 848w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 1272w, https://substackcdn.com/image/fetch/$s_!Cjo2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c123767-fd9e-45b0-9b68-f90969cfb2fc_1400x1054.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Lyft writes about a centralized Feature Store that maintains consistency between offline training and online inference across batch, streaming, and real-time serving. 
Batch features run on Spark SQL with auto-generated Airflow DAGs and Hive storage; streaming features use Apache Flink with Kafka and Kinesis; and online serving relies on DynamoDB with a Valkey write-through cache for low-latency access.</p><p><strong><a href="https://eng.lyft.com/lyfts-feature-store-architecture-optimization-and-evolution-7835f8962b99">https://eng.lyft.com/lyfts-feature-store-architecture-optimization-and-evolution-7835f8962b99</a></strong></p><div><hr></div><h1>Zeta Global: Zeta&#8217;s Lakehouse Journey: A Composable, Scalable, and Federated Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UNdA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UNdA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 424w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 848w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 1272w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!UNdA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic" width="469" height="311" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:311,&quot;width&quot;:469,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5631,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UNdA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 424w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 848w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 1272w, https://substackcdn.com/image/fetch/$s_!UNdA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64b12aa8-f5bd-427f-b59c-b17ef3939984_469x311.heic 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Zeta Global writes about moving to a composable, federated Lakehouse architecture to integrate a highly heterogeneous data landscape that traditional warehouses could not unify. The platform standardizes on object storage with Apache Iceberg for transactional guarantees. 
It uses AWS S3 Tables with AWS Glue as the control plane, allowing Spark, Snowflake, and Trino to operate on shared datasets.</p><p><strong><a href="https://medium.com/@zeta-decoded/zetas-lakehouse-journey-a-composable-scalable-and-federated-architecture-df0ab5f19c3a">https://medium.com/@zeta-decoded/zetas-lakehouse-journey-a-composable-scalable-and-federated-architecture-df0ab5f19c3a</a></strong></p><div><hr></div><h1>Google: Developer&#8217;s guide to multi-agent patterns in ADK</h1><p>Much like the classic Enterprise Integration Patterns, multi-agent integration patterns are emerging as agent architectures become more widely adopted. Google lists eight patterns for multi-agent systems.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bkQe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bkQe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 424w, https://substackcdn.com/image/fetch/$s_!bkQe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 848w, https://substackcdn.com/image/fetch/$s_!bkQe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!bkQe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bkQe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37121,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/184268483?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bkQe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 424w, https://substackcdn.com/image/fetch/$s_!bkQe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 848w, 
https://substackcdn.com/image/fetch/$s_!bkQe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 1272w, https://substackcdn.com/image/fetch/$s_!bkQe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe27c9548-847d-4245-a9b9-9fc0d8d8d075_2752x1536.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p>Sequential Pipeline</p></li><li><p>Coordinator/Dispatcher</p></li><li><p>Parallel 
Fan-Out/Gather</p></li><li><p>Hierarchical Decomposition</p></li><li><p>Generator and Critic</p></li><li><p>Iterative Refinement</p></li><li><p>Human-in-the-Loop</p></li><li><p>Composite Patterns</p></li></ol><p><strong><a href="https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/">https://developers.googleblog.com/developers-guide-to-multi-agent-patterns-in-adk/</a></strong></p><div><hr></div><h1>Ashpreet B: Memory: How Agents Learn</h1><blockquote><p>Most AI agents do not truly learn because they reset after each session.</p></blockquote><p>The author presents a &#8220;GPU-poor continuous learning&#8221; approach in which agents store and retrieve successful patterns from databases rather than retraining models. The article demonstrates it with the agno library, using SQLite for session context, a memory manager for user data, and vector databases with human review to curate high-quality learned memories.</p><p><strong><a href="https://www.ashpreetbedi.com/articles/memory">https://www.ashpreetbedi.com/articles/memory</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[A Critique of Iceberg REST Catalog: A Classic Case of Why Semantic Spec Fails]]></title><description><![CDATA[How a Semantically Correct API Becomes Operationally Unreliable at Scale]]></description><link>https://www.dataengineeringweekly.com/p/a-critique-of-iceberg-rest-catalog</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/a-critique-of-iceberg-rest-catalog</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Fri, 09 Jan 2026 05:57:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!t5GO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p><em><strong>&#8220;Latency is not just a performance characteristic; it is a fundamental part of correctness.&#8221; </strong></em><strong>&#8212; </strong><em><strong>Designing Data-Intensive Applications</strong></em></p></blockquote><p>In <em><strong><a href="https://dataintensive.net/">Designing Data-Intensive Applications</a></strong></em>, <strong><a href="https://martin.kleppmann.com/">Martin Kleppmann</a></strong> makes a subtle but critical point: the <strong><a href="https://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a></strong> omits latency, yet in real systems, latency often determines whether a system is usable at all. <strong>A system that is </strong><em><strong>correct but slow</strong></em><strong> is, in practice, incorrect.</strong></p><p>This observation is directly applicable to the <strong><a href="https://iceberg.apache.org/rest-catalog-spec/">Apache Iceberg REST Catalog specification</a></strong>. 
While the specification achieves semantic clarity, it fails to define the operational realities that enable distributed systems to remain predictable at scale. The result is a standard that is formally correct, yet operationally fragile.</p><div><hr></div><h2><strong>Semantic Interoperability Without Predictability</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t5GO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t5GO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 424w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 848w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 1272w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t5GO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic" width="1456" height="694" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:694,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:139678,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183990563?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t5GO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 424w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 848w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 1272w, https://substackcdn.com/image/fetch/$s_!t5GO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F15ba451a-d2ee-45ad-9ada-284d6558ed60_3280x1564.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Over the past two years, the Iceberg REST Catalog specification has emerged as the de facto standard for metadata access in the Iceberg ecosystem. A <strong><a href="https://materializedview.io/p/begun-the-catalog-wars-have">catalog war</a></strong> has broken out around the REST spec. The specification promises a universal interface that allows engines such as Trino, Spark, Flink, and StarRocks to interact with Iceberg tables via a common REST abstraction, independent of the underlying catalog implementation.</p><p>At the semantic level, this promise largely holds. The specification rigorously defines metadata structures: tables, schemas, snapshots, and namespace operations. A LoadTable or CreateNamespace request looks identical across implementations. 
This semantic interoperability has been critical to Iceberg&#8217;s rapid ecosystem adoption.</p><p>However, semantic interoperability alone is insufficient. The specification defines <em>what</em> metadata operations mean, but it avoids specifying how they must behave in real-world conditions, such as concurrency, latency sensitivity, and cross-catalog synchronization.</p><p>This gap&#8212;between semantic interoperability and operational interoperability&#8212;is where systems begin to fail in production.</p><div><hr></div><h2><strong>The Core Problem: No Operational SLA, No Predictability</strong></h2><p>The Iceberg REST Catalog specification is intentionally silent on performance guarantees. There are no latency expectations, no throughput baselines, and no service-level objectives. While this flexibility lowers the barrier to implementation, it creates an ecosystem where:</p><ul><li><p>Two catalogs can both be &#8220;compliant&#8221; yet differ by orders of magnitude in response time.</p></li><li><p>Clients cannot reason about metadata latency during query planning.</p></li><li><p>Synchronization behavior across catalogs becomes unpredictable.</p></li></ul><p>In distributed data systems, <strong>predictability matters more than raw performance</strong>. Without a strict operational SLA&#8212;or at least defined behavioral constraints&#8212;clients are forced into defensive, retry-heavy designs that amplify load and increase tail latency.</p><div><hr></div><h2><strong>The &#8220;List Tables&#8221; Problem: Cross-Catalog Sync Failure</strong></h2><p>The ListTables endpoint (<strong><a href="https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L118">GET /v1/namespaces/{namespace}/tables</a></strong>) is semantically straightforward. It allows clients to enumerate tables within a namespace and supports pagination through pageSize and pageToken.</p><p>The primary issue is not pagination itself. 
The real failure emerges when <strong>the same Iceberg tables are registered in multiple catalogs</strong>, a pattern that is increasingly common in hybrid and multi-platform deployments.</p><h3><strong>A Realistic Scenario</strong></h3><ul><li><p>An Iceberg table is registered in <strong>Catalog A</strong> and <strong>Catalog B</strong>.</p></li><li><p>Both catalogs point to the same underlying metadata and object storage.</p></li><li><p>One catalog is used by ingestion and streaming workloads.</p></li><li><p>Analytics engines or BI tools use the other.</p></li></ul><h3><strong>The Sync Pathology</strong></h3><p>When a client connects to Catalog B and issues a metadata discovery operation&#8212;such as listing tables or syncing namespace state&#8212;the catalog must:</p><ol><li><p>Enumerate all tables</p></li><li><p>Resolve metadata pointers</p></li><li><p>Validate access permissions</p></li><li><p>Reconcile the state with the underlying storage</p></li></ol><p>Because the REST specification defines no operational expectations:</p><ul><li><p>There is no SLA for how long this sync should take.</p></li><li><p>There is no distinction between a &#8220;lightweight&#8221; listing and a fully validated listing.</p></li><li><p>There is no mechanism to express intent (e.g., <em>names only</em>, <em>no ACL validation</em>).</p></li></ul><p>As table counts grow into the tens of thousands, synchronization latency grows non-linearly. In practice, sync operations can take minutes&#8212;or fail&#8212;causing engines to stall, time out, or repeatedly retry.</p><p>The result is not merely slow metadata access. It is <strong>system-wide unpredictability</strong>. Query engines cannot determine whether a delay is transient, systemic, or catastrophic.</p><div><hr></div><h2><strong>Latency Is Treated as an Implementation Detail&#8212;But It Is a Contract</strong></h2><p>The REST Catalog specification implicitly treats latency as an implementation concern. 
From a standards perspective, this is understandable. But in data-intensive systems, latency is part of the correctness contract.</p><p>The specification does not define:</p><ul><li><p>Upper bounds on metadata retrieval latency</p></li><li><p>Maximum metadata payload sizes</p></li><li><p>Limits on metadata fan-out operations</p></li><li><p>The number of round trips required to plan a query</p></li></ul><p>As a result, a compliant catalog may require megabytes of JSON metadata and dozens of HTTP calls just to validate a single query plan. Engines appear slow and unstable, even though the root cause lies in an underspecified protocol.</p><p>This is precisely the class of problem Kleppmann warns about: correctness without latency guarantees is operationally meaningless.</p><div><hr></div><h2><strong>Commit Semantics Under Contention: Undefined and Unfair</strong></h2><p>Iceberg relies on optimistic concurrency control. When multiple writers attempt to commit simultaneously, conflicts are expected and resolved through retries.</p><p>The REST specification defines the 409 Conflict response but stops there. It does not define:</p><ul><li><p>Backoff expectations</p></li><li><p>Retry fairness</p></li><li><p>Starvation prevention</p></li></ul><p>In a multi-engine environment, this creates asymmetric outcomes. A high-frequency streaming writer with aggressive retries can permanently starve batch compaction jobs that follow conservative retry policies. Over time, table health degrades due to file explosion and unbounded metadata growth.</p><p>Once again, the issue is not semantic correctness. It is the absence of operational guarantees.</p><div><hr></div><h2><strong>Caching Without a Freshness Model</strong></h2><p>While HTTP caching is permitted, it is not part of the correctness model. Support for conditional requests, ETags, or freshness validation is optional.</p><p>This forces clients into a pessimistic stance: always re-fetch, always revalidate, always assume staleness. 
The REST protocol degenerates into a chatty, high-latency control plane that negates its own architectural benefits.</p><p>Without a standardized freshness contract, caching becomes a gamble rather than a reliability tool.</p><div><hr></div><h2><strong>Behavioral Conformance Is Missing</strong></h2><p>The Iceberg ecosystem has strong conformance testing for table formats. It lacks an equivalent for catalog behavior.</p><p>Today, &#8220;REST Catalog compliant&#8221; means:</p><ul><li><p>The endpoints exist.</p></li><li><p>The JSON schema is correct.</p></li><li><p>The happy path works.</p></li></ul><p>It does not mean:</p><ul><li><p>Predictable latency under load</p></li><li><p>Stable pagination during concurrent updates</p></li><li><p>Graceful overload signaling</p></li><li><p>Bounded retry amplification</p></li></ul><p>Without behavioral conformance tests, compliance guarantees syntax, not operability.</p><div><hr></div><h2><strong>Underspecification Is Still a Design Decision</strong></h2><p>The absence of operational constraints is not accidental. It reflects a deliberate choice to prioritize adoption and flexibility.</p><p>However, in distributed systems, underspecification pushes complexity downstream. It forces clients, operators, and platform teams to implement compensating logic. As Iceberg becomes core infrastructure rather than experimental tooling, this trade-off increasingly limits its reliability.</p><p>Semantic agreement without behavioral agreement leads to fragile systems.</p><div><hr></div><h2><strong>Toward Operational Interoperability</strong></h2><p>Operational interoperability does not require rigid SLAs or centralized control. 
It requires acknowledging that <strong>latency, retries, and fairness are part of the interface</strong>.</p><p>Concrete improvements could include:</p><ul><li><p>Defined operational profiles with minimum latency and concurrency expectations</p></li><li><p>Lightweight metadata views to avoid synchronization amplification</p></li><li><p>Standardized retry and backoff semantics for conflict scenarios</p></li><li><p>Explicit freshness and caching contracts</p></li></ul><p>Semantic interoperability enabled Iceberg&#8217;s success. Operational interoperability will determine whether it remains dependable at scale.</p><p>Until then, the Iceberg REST Catalog remains a textbook example of why <strong>semantic specifications alone are not enough</strong>.</p><div><hr></div><p><em>All rights reserved, Dewpeche Private Limited. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #251]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-251</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-251</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 05 Jan 2026 05:35:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42230,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!aQpJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!aQpJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf32b3a6-8440-4a59-ab3d-c9500ac447d5_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>Best practices for LLM development</h1><p>LLMs are transforming software development, but integrating them into real projects can be tricky when models don&#8217;t understand your codebase, pipelines, or conventions.<br><br>Join Dagster on January 27th for a practical look at data engineering best practices, common 
pitfalls, and live demos of LLM developments.</p><p><strong><a href="https://dagster.io/events/best-practices-for-llm-dagster-development?utm_campaign=33680422-26-01-WBNR_DEEP_Dive_LLM_Dagster&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=deep_dive_llm_best_practices&amp;utm_content=01_04_data_engineering_weekly">Reserve your spot now.</a></strong></p><div><hr></div><h1>Editor&#8217;s Note: The Edition of Predictions!! Well, Mostly</h1><p>It is always exciting to read the predictions and look back on 2025. I put a lot of effort into collecting some of these predictions and bundling them in this edition. At DEW, we also reached a few exciting milestones. We just published our 250th edition and reached 50,000 Substack followers. It is remarkable growth, considering how lazy I am on LinkedIn and how little I promote DEW. I&#8217;m looking to improve on it, and over the holidays, I tried a bit of Agent building on top of DEW, which I&#8217;m hoping to launch soon. I wish all the DEWers a prosperous 2026 and thank you for your continued support.</p><div><hr></div><h1>Ananth Packkildurai: DEW - The Year in Review 2025</h1><p>Why not start with our own year-in-review and a few predictions? Agent Engineering is undoubtedly becoming a discipline of its own in engineering, similar to the rise of data scientists. Both, funnily enough, run into data inconsistency issues, and everyone does data engineering eventually. (<strong>Hello Context Engineering</strong>)</p><p>I wrote a bit about how the catalog becomes the new database as the adoption of Apache Iceberg and Knowledge Engineering increases. I have a lot of concerns about the Iceberg REST Catalog, which I will write about in a separate blog post this week. Stay tuned. 
</p><p><strong><a href="https://www.dataengineeringweekly.com/p/dew-the-year-in-review-2025">https://www.dataengineeringweekly.com/p/dew-the-year-in-review-2025</a></strong></p><div><hr></div><h1>Sebastian Raschka: The State Of LLMs 2025: Progress, Problems, and Predictions</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tQAE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tQAE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 424w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 848w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 1272w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tQAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic" width="1082" height="580" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:580,&quot;width&quot;:1082,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6687,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tQAE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 424w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 848w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 1272w, https://substackcdn.com/image/fetch/$s_!tQAE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff76b99b8-f92c-4a65-aba8-25f80749bbd9_1082x580.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The State of LLMs 2025 examines how DeepSeek R1's breakthrough demonstrated that reasoning models can be trained for approximately $5 million using Reinforcement Learning with Verifiable Rewards (RLVR), fundamentally shifting industry cost assumptions and development approaches. 
The field has pivoted from pure pre-training scale to inference-time scaling&#8212;where models allocate more compute during generation for complex tasks&#8212;while grappling with "benchmaxxing," which undermines evaluation reliability, and recognizing that LLMs enhance rather than replace human expertise in coding and writing.</p><p><strong><a href="https://magazine.sebastianraschka.com/p/state-of-llms-2025">https://magazine.sebastianraschka.com/p/state-of-llms-2025</a></strong></p><div><hr></div><h1>Ian Cook: 10 Predictions for Data Infrastructure in 2026</h1><p>The article forecasts the maturation of open standards like Apache Arrow and ADBC replacing legacy protocols (ODBC/JDBC) while Apache Iceberg transitions from hype to mainstream adoption. Organizations are shifting from monolithic data warehouses to composable, multi-engine architectures built on open formats and components like DuckDB and DataFusion, enabling analytical systems to power operational applications and provide AI agents with fast, governed access to structured data.</p><p><strong><a href="https://columnar.tech/blog/2026-predictions/">https://columnar.tech/blog/2026-predictions/</a></strong></p><div><hr></div><h1>Sponsored: The Scaling Data Teams Guide</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GZPK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 424w, 
https://substackcdn.com/image/fetch/$s_!GZPK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&qu
ot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GZPK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!GZPK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F55247c6c-6af7-429c-b72f-35fdc63d981a_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" 
height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Building and scaling a data platform has never been more important or more challenging. Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=01_04_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Ben Lorica: Data Engineering in 2026: What Changes?</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wHur!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wHur!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 424w, 
https://substackcdn.com/image/fetch/$s_!wHur!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 848w, https://substackcdn.com/image/fetch/$s_!wHur!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 1272w, https://substackcdn.com/image/fetch/$s_!wHur!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wHur!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic" width="1456" height="486" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14190,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!wHur!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 424w, https://substackcdn.com/image/fetch/$s_!wHur!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 848w, https://substackcdn.com/image/fetch/$s_!wHur!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 1272w, https://substackcdn.com/image/fetch/$s_!wHur!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaf49aeb-7194-49c6-99d8-b799b1d43f3e_1456x486.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The article argues that agent-native platforms must move beyond tabular data to handle multimodal assets like text, images, and video, while treating documentation as versioned, queryable context stores that agents can understand. The platform shift requires pipeline-native safety guarantees, using write-audit-publish patterns in which agents write to isolated branches before merging, fundamentally changing data engineers' roles from manual ETL work to supervising agent fleets and defining governance policies.</p><p><strong><a href="https://gradientflow.substack.com/p/data-engineering-for-machine-users">https://gradientflow.substack.com/p/data-engineering-for-machine-users</a></strong></p><div><hr></div><h1>Stanford HAI: Stanford AI Experts Predict What Will Happen in 2026</h1><p>Stanford experts predict AI will shift from evangelism to evaluation in 2026, with organizations deflating the hype bubble by reporting failed projects and measuring actual Return on Investment rather than chasing AGI. 
The report highlights breakthroughs in medical AI enabled by self-supervised learning that eliminate the need for expensive expert-labeled data, alongside nations building sovereign infrastructure to reduce dependence on US providers.</p><p><strong><a href="https://hai.stanford.edu/news/stanford-ai-experts-predict-what-will-happen-in-2026">https://hai.stanford.edu/news/stanford-ai-experts-predict-what-will-happen-in-2026</a></strong></p><div><hr></div><h1>HuggingFace: Tokenization in Transformers v5: Simpler, Clearer, and More Modular</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tcG7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tcG7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 424w, https://substackcdn.com/image/fetch/$s_!tcG7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 848w, https://substackcdn.com/image/fetch/$s_!tcG7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 1272w, https://substackcdn.com/image/fetch/$s_!tcG7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!tcG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic" width="1456" height="749" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/caabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:749,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:32221,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tcG7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 424w, https://substackcdn.com/image/fetch/$s_!tcG7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 848w, https://substackcdn.com/image/fetch/$s_!tcG7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!tcG7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcaabcfc4-7e2d-4e9b-80e3-fdc7a2f000a1_5873x3023.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Hugging Face's Transformers v5 redesigns tokenizers by separating the architecture from the trained vocabulary, similar to how PyTorch separates the model structure from the weights. 
The update consolidates confusing Python and Rust implementations into a single Rust-backed version while allowing users to inspect tokenizer internals directly and train custom tokenizers from scratch on their own data.</p><p><strong><a href="https://huggingface.co/blog/tokenizers">https://huggingface.co/blog/tokenizers</a></strong></p><div><hr></div><h1>Netflix: Towards Generalizable and Efficient Large-Scale Generative Recommenders</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k-ZX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k-ZX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 424w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 848w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 1272w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!k-ZX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic" width="1400" height="776" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:776,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53377,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k-ZX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 424w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 848w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 1272w, https://substackcdn.com/image/fetch/$s_!k-ZX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce5de75f-8ecd-48d5-b386-bc2b24200df5_1400x776.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Netflix explains that generative recommendation models scale differently than standard LLMs because each task has an inherent unpredictability limit, which changes how compute should be allocated. The post also describes how Netflix reduced training costs by using sampled softmax and projected output heads to handle extremely large catalogs with millions or billions of items. 
</p><p><strong><a href="https://netflixtechblog.medium.com/towards-generalizable-and-efficient-large-scale-generative-recommenders-a7db648aa257">https://netflixtechblog.medium.com/towards-generalizable-and-efficient-large-scale-generative-recommenders-a7db648aa257</a></strong></p><div><hr></div><h1>Vinted: Orchestrating Success</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YmU8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YmU8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 424w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 848w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 1272w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YmU8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic" width="1456" height="642" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:642,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18644,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YmU8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 424w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 848w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 1272w, https://substackcdn.com/image/fetch/$s_!YmU8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F397cec3e-5e7d-48e5-a1b0-2420b327c57d_2426x1070.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The blog explains how decentralized data pipelines created coordination bottlenecks, shifting dependencies from code to endless meetings, and how running dbt layers via Airflow lacked granularity, causing entire layers to fail when a single model broke. 
The team built an automated DAG generator that reads the dbt manifest to create task-per-model setups in Airflow, paired with an Asset Registry that manages cross-domain dependencies using ExternalTaskSensor and automatically marks waiting sensors as satisfied when upstream tasks finish late, eliminating manual restarts.</p><p><strong><a href="https://vinted.engineering//2025/12/29/orchestrating-success/">https://vinted.engineering//2025/12/29/orchestrating-success/</a></strong></p><div><hr></div><h1>Tim Castillo: How I Structure My Data Pipelines</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qvvz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qvvz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 424w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 848w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 1272w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!qvvz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic" width="1456" height="797" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:797,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14490,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/183513915?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qvvz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 424w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 848w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 1272w, https://substackcdn.com/image/fetch/$s_!qvvz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61811ed0-6d79-452d-8e8c-b4b1a4ea4cd1_1456x797.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The author proposes resolving Medallion Architecture's ambiguity by strictly combining it with Kimball dimensional modeling and Semantic Layers, mapping each methodology to specific pipeline layers. 
The approach defines Bronze for mechanical normalization and type casting without business logic, Silver for Kimball Facts and Dimensions representing authoritative business logic, and Gold as a first-class Semantic Layer using governed Metric Views that serve pre-calculated metrics to business users while allowing data scientists direct SQL access to Silver's full-grain data.</p><p><strong><a href="https://loglevelinfo.substack.com/p/how-i-structure-my-data-pipelines">https://loglevelinfo.substack.com/p/how-i-structure-my-data-pipelines</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #250]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-250</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-250</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 29 Dec 2025 05:16:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-28_data_engineering_weekly" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lRXy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lRXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:25368,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-28_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182828907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lRXy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!lRXy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F221ea139-683f-41f6-822e-9c0817eb58c4_3840x2160.heic 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>The Scaling Data Teams Guide</h1><p>Building and scaling a data platform has never been more important or more challenging. 
Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-28_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Thoughtworks: The Model Context Protocol&#8217;s impact on 2025</h1><p>Thoughtworks writes about the Model Context Protocol's (MCP) transformative impact on 2025 software development, highlighting its role in accelerating agentic AI adoption by simplifying connections between AI systems and external data sources. The blog identifies emerging techniques including context engineering for systematic LLM information optimization, AI-powered UI testing via Playwright-mcp and mcp-selenium servers, and anchoring coding agents to reference applications to prevent code drift, while cautioning against security vulnerabilities (tool poisoning, cross-server shadowing) and antipatterns like naive API-to-MCP conversion.</p><p><strong><a href="https://www.thoughtworks.com/insights/blog/generative-ai/model-context-protocol-mcp-impact-2025">https://www.thoughtworks.com/insights/blog/generative-ai/model-context-protocol-mcp-impact-2025</a></strong></p><div><hr></div><h1>Uber: Powering Billion-Scale Vector Search with OpenSearch</h1><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZAyS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZAyS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 424w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 848w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 1272w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZAyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic" width="1258" height="134" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:134,&quot;width&quot;:1258,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5462,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182828907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZAyS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 424w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 848w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 1272w, https://substackcdn.com/image/fetch/$s_!ZAyS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffcced134-fa9b-40f3-8f00-7aa37e373cf9_1258x134.heic 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Uber Engineering migrated from Apache Lucene&#8217;s HNSW to Amazon OpenSearch for billion-scale vector search, addressing algorithm inflexibility and GPU support 
limitations when handling 1.5 billion items with 400-dimension embeddings for personalized recommendations and fraud detection. The implementation reduced indexing time by 79% (from 12.5 hours to 2.5 hours) through Spark batch ingestion and optimized flush/merge policies, while achieving 52% lower P99 latency (250ms to 120ms at 2K QPS) via shard-to-node ratio tuning, replica scaling, in-memory KNN graph optimization, and blue/green deployments.</p><p><strong><a href="https://www.uber.com/en-IN/blog/powering-billion-scale-vector-search-with-opensearch/">https://www.uber.com/en-IN/blog/powering-billion-scale-vector-search-with-opensearch/</a></strong></p><div><hr></div><h1>Andrew Hoblitzell: QConAI NY 2025 - Designing AI Platforms for Reliability: Tools for Certainty, Agents for Discovery</h1><p>The blog recaps the QCon AI NYC 2025 presentation on treating agentic AI as an engineering problem requiring deterministic boundaries around probabilistic components, arguing that reliability improves when models handle interpretation and classification while deterministic systems execute actions and enforce constraints. 
The speaker outlined practical patterns including reducing schema complexity to limit query generation errors, using role-specialized agents, constraining tool catalogs to prevent "paradox of choice" degradation, and anchoring agents to deterministic runbooks for repeatable operations rather than allowing runtime process invention.</p><p><strong><a href="https://www.infoq.com/news/2025/12/qcon-nvidia-platform/">https://www.infoq.com/news/2025/12/qcon-nvidia-platform/</a></strong></p><div><hr></div><h1>Sponsored: The data platform playbook everyone's using</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-28_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Whk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!7Whk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!7Whk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Whk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!7Whk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-28_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182828907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Whk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!7Whk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!7Whk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!7Whk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0484c75-2712-4917-b177-b988fce19b3a_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating 
under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-28_data_engineering_weekly">Download your free copy now</a>.</strong></p><div><hr></div><h1>Monday: The Power of Structured Data: Inside monday.com&#8217;s Data Entities</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w9L-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w9L-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 424w, https://substackcdn.com/image/fetch/$s_!w9L-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 848w, https://substackcdn.com/image/fetch/$s_!w9L-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!w9L-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w9L-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic" width="1456" height="452" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:452,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15715,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182828907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w9L-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 424w, https://substackcdn.com/image/fetch/$s_!w9L-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 848w, 
https://substackcdn.com/image/fetch/$s_!w9L-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 1272w, https://substackcdn.com/image/fetch/$s_!w9L-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffbe205a7-b8c4-4d6d-9f54-ab0fe6d748ee_1944x604.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Monday.com writes about Data Entities as a semantic layer to address fragmentation caused by boards
functioning as unstructured spreadsheets, where business objects like projects, tickets, and deals lacked standardized definitions, which hindered cross-board reporting and AI agents&#8217; understanding of context. The implementation uses field-and-policy-based models with hierarchical inheritance, Managed Columns for field standardization, and a new "data-entity" app feature in monday's open platform that enables entity-level updates to propagate across all associated boards.</p><p><strong><a href="https://engineering.monday.com/the-power-of-structured-data-inside-monday-coms-data-entities/">https://engineering.monday.com/the-power-of-structured-data-inside-monday-coms-data-entities/</a></strong></p><div><hr></div><h1>Meta: Python Typing Survey 2025: Code Quality and Flexibility As Top Reasons for Typing Adoption</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5q5K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5q5K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 424w, https://substackcdn.com/image/fetch/$s_!5q5K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 848w, https://substackcdn.com/image/fetch/$s_!5q5K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 1272w,
https://substackcdn.com/image/fetch/$s_!5q5K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5q5K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic" width="1456" height="561" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:561,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182828907?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5q5K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 424w, https://substackcdn.com/image/fetch/$s_!5q5K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 848w, 
https://substackcdn.com/image/fetch/$s_!5q5K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 1272w, https://substackcdn.com/image/fetch/$s_!5q5K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27704d24-ce24-491b-af4a-0e8014515092_2428x936.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Python has become the de facto language for data programming, and a strong typing system brings more
reliability to the codebase. The 2025 Typed Python Survey reveals that 86% of respondents regularly use type hints, with adoption highest among developers with 5-10 years of experience (93%). Key challenges include third-party library support gaps (NumPy, Pandas, Django), complexity of advanced features (generics, TypeVar, decorators), tooling fragmentation between type checkers (Mypy at 58% usage, emerging Rust-based checkers at 20%), and lack of runtime enforcement.</p><p><strong><a href="https://engineering.fb.com/2025/12/22/developer-tools/python-typing-survey-2025-code-quality-flexibility-typing-adoption/">https://engineering.fb.com/2025/12/22/developer-tools/python-typing-survey-2025-code-quality-flexibility-typing-adoption/</a></strong></p><div><hr></div><h1>Alibaba: Say Goodbye to Manual Tracking! An In-Depth Analysis of Non-Intrusive Data Collection for Android Apps</h1><p>Data tracking is a critical data engineering skill that the industry rarely talks or writes about. I&#8217;m glad to see this in-depth analysis of data collection from Alibaba. Alibaba writes about a non-intrusive Android monitoring solution using Gradle plugins, the AGP API, and ASM bytecode instrumentation to automate APM data collection without manual SDK initialization or code modifications.
The implementation uses dynamic AGP version adaptation, conflict-prevention strategies (blacklists for system packages and third-party APM tools, whitelist filtering, idempotent instrumentation), and defensive programming in injected agents (SDK initialization checks, exception isolation, silent returns when uninitialized), while supporting diverse collection scenarios, including user behavior tracking via proxy listeners and network monitoring through OkHttp/HttpURLConnection hooks.</p><p><strong><a href="https://www.alibabacloud.com/blog/say-goodbye-to-manual-tracking-an-in-depth-analysis-of-non-intrusive-data-collection-for-android-apps_602758">https://www.alibabacloud.com/blog/say-goodbye-to-manual-tracking-an-in-depth-analysis-of-non-intrusive-data-collection-for-android-apps_602758</a></strong></p><div><hr></div><h1>PyTorch: Deploying Smarter: Hardware-Software Co-design in PyTorch</h1><blockquote><p><strong>"People who are really serious about software should make their own hardware."</strong> &#8212; <strong>Alan Kay</strong></p></blockquote><p>Even if we don&#8217;t often build software and hardware together, it is vital to understand how software behaves under specific hardware configurations. Arm released a collection of Jupyter notebook tutorials demonstrating hardware-software co-design techniques in PyTorch for efficient on-device AI, addressing the limitations of traditional post-training quantization, which applies uniform precision across all neural network layers.</p><p><strong><a href="https://pytorch.org/blog/deploying-smarter-hardware-software-co-design/">https://pytorch.org/blog/deploying-smarter-hardware-software-co-design/</a></strong></p><div><hr></div><h1>DuckDB: Iceberg in the Browser</h1><p>DuckDB introduced browser-based access to Iceberg REST Catalogs through DuckDB-Wasm, enabling users to query and edit Iceberg tables directly from a browser tab without installing software or managing infrastructure.
The implementation unified DuckDB's HTTP networking layer across native and WebAssembly environments through three changes: redesigning the core HTTP interface for consistent extension access, creating a JavaScript network wrapper for DuckDB-Wasm, and routing all Iceberg networking through this common layer, with initial Amazon S3 Tables support where all computations run locally in the browser.</p><p><strong><a href="https://duckdb.org/2025/12/16/iceberg-in-the-browser">https://duckdb.org/2025/12/16/iceberg-in-the-browser</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[DEW - The Year in Review 2025]]></title><description><![CDATA[From Digital Plumbers to Architects of Intelligence: The 7 Paradigm Shifts That Defined 2025]]></description><link>https://www.dataengineeringweekly.com/p/dew-the-year-in-review-2025</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/dew-the-year-in-review-2025</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 22 Dec 2025 22:43:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uS1U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uS1U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic" data-component-name="Image2ToDOM"><div 
class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uS1U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 424w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 848w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 1272w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uS1U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50082,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182368564?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uS1U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 424w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 848w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 1272w, https://substackcdn.com/image/fetch/$s_!uS1U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1668a0fa-db0f-455b-b17a-8b8388d4dded_2816x1536.heic 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p>If 2023 was the year of &#8220;Shock&#8221; and 2024 was the year of &#8220;Hype,&#8221; 2025 will be remembered as the year of <strong>Engineering</strong>.</p><p>For the past decade, our industry has been obsessed with the mechanics of movement. We argued about &#8220;ETL vs. ELT.&#8221; We fought &#8220;Format Wars&#8221; over table specifications. We optimized commit protocols and debated the merits of various orchestrators. We were, fundamentally, digital plumbers ensuring the water reached the tap.</p><p>But in 2025, the mandate changed. The business no longer wants &#8220;data&#8221;; it demands &#8220;intelligence.&#8221; It demands systems that reason, agents that act, and infrastructure that guarantees truth in a non-deterministic world.
The &#8220;Big Data&#8221; era of managing volume formally ended, replaced by the &#8220;Context Era&#8221; of managing meaning.</p><p>We are no longer just Data Engineers. We are the architects of the cognitive layer.</p><p>Here are the seven patterns that defined Data &amp; AI Engineering in 2025.</p><div><hr></div><h1><strong>1. Agent Engineering: The Inevitable Evolution of the Pipeline</strong></h1><p>The most significant shift of 2025 was the industry&#8217;s realization that &#8220;Agents&#8221; are not just fancy chatbots&#8212;they are the new compute engine. In 2024, we treated LLMs as text generators. In 2025, we started treating them as reasoning engines that execute logic we previously wrote in Python or SQL.</p><p>This birthed a new discipline: <strong>Agent Engineering</strong>.</p><p>We moved beyond the chaotic &#8220;vibes-based&#8221; coding of early experiments into structured, rigorous engineering. We stopped asking &#8220;Can AI write code?&#8221; and started asking &#8220;How do we architect a system where AI reliably executes complex workflows?&#8221;</p><h2><strong>The Rise of Context Engineering</strong></h2><p>The bottleneck for intelligent systems shifted from <em>model capacity</em> to <em>context management</em>. We realized that an agent is only as smart as the context you feed it.</p><p>Anthropic defined the year with their masterclass on <strong><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective Context Engineering</a></strong>, framing it as a discipline focused on managing the &#8220;attention budget&#8221; of models. It wasn&#8217;t enough to dump documents into a prompt. 
Engineers at Manus demonstrated that we must curate, compress, and dynamically retrieve tokens during inference to sustain coherent behavior over long horizons in their piece on <strong><a href="https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-Manus">Context Engineering for AI Agents</a></strong>.</p><p>We learned that &#8220;Context&#8221; is an information management problem. We saw teams optimizing &#8220;KV-cache hit rates&#8221; and treating context windows like precious RAM. The winning architecture wasn&#8217;t the one with the biggest model; it was the one that engineered the most relevant context.</p><h2><strong>The USB-C of Intelligence: Model Context Protocol (MCP)</strong></h2><p>History will likely view the introduction of the <strong>Model Context Protocol (MCP)</strong> as the moment agents became viable enterprise software. Before MCP, connecting an LLM to a database or API was a bespoke, brittle integration task.</p><p>In 2025, MCP standardized this connection. It became the &#8220;USB-C for Agents,&#8221; allowing developers to build a connector once and have it work across any MCP-compliant model or application, as detailed in Alibaba&#8217;s <strong><a href="https://www.alibabacloud.com/blog/a-comprehensive-analysis-and-practical-implementation-of-the-new-features-in-the-mcp-specification_602206">comprehensive analysis of MCP features</a></strong>. However, the rollout wasn&#8217;t without caution; TigerData engineers noted that while MCP solved interoperability, it introduced new attack vectors, arguing that <strong><a href="https://www.tigerdata.com/blog/three-tigerdata-engineers-told-us-the-truth-about-mcp-security-is-its-achilles-heel">security is its Achilles heel</a></strong>.</p><h2><strong>From Chatbots to Colleagues</strong></h2><p>The proof was in the production deployments. 
Uber revealed <strong><a href="https://www.uber.com/blog/enhanced-agentic-rag/">Genie</a></strong>, an internal agent that didn&#8217;t just answer questions but also acted as a &#8220;near-human&#8221; subject-matter expert. LinkedIn unveiled its <strong><a href="https://www.linkedin.com/blog/engineering/ai/accelerating-llm-inference-with-speculative-decoding-lessons-from-linkedins-hiring-assistant">Hiring Assistant</a></strong>, an agent that handled complex recruiting workflows using speculative decoding to accelerate inference.</p><p>These weren&#8217;t toys. They were engineered systems with rigorous orchestration, state management, and error handling. The industry formalized patterns like &#8220;Prompt Chaining,&#8221; &#8220;Routing,&#8221; and &#8220;Parallelization.&#8221; We stopped treating agents as magic boxes and started treating them as software components that required a new kind of engineering.</p><div><hr></div><h1><strong>2. &#8220;Evals&#8221; Are The New Unit Tests</strong></h1><p>If Agent Engineering was the engine of 2025, <strong>Evaluation (Evals)</strong> was the brakes&#8212;and the steering wheel.</p><p>The &#8220;Vibe Coding&#8221; era&#8212;where we judged models by looking at a few outputs and saying &#8220;looks good to me&#8221;&#8212;died a hard death. In 2025, organizations realized they could not ship non-deterministic software without rigorous, deterministic measurement.</p><h2><strong>The &#8220;Judge-LLM&#8221; Pattern</strong></h2><p>How do you test a system that gives a different answer every time? You build a machine to grade the machine.</p><p>The industry standardized around the <strong>Judge-LLM</strong> framework. 
Booking.com offers <strong><a href="https://booking.ai/llm-evaluation-practical-tips-at-booking-com-1b038a0d6662">practical tips for LLM evaluation</a></strong>, using a &#8220;stronger&#8221; model (trained on a &#8220;Golden Dataset&#8221; of human-verified answers) to grade the outputs of &#8220;weaker,&#8221; cheaper production models. Pinterest followed suit with its <strong><a href="https://medium.com/pinterest-engineering/llm-powered-relevance-assessment-for-pinterest-search-b846489e358d">LLM-powered relevance assessment</a></strong>, replacing costly manual labeling with fine-tuned LLMs that achieved high agreement with human experts.</p><p>This wasn&#8217;t just about checking for &#8220;correctness.&#8221; We developed specific metrics for <strong>Hallucination Rate</strong>, <strong>Instruction Following</strong>, and <strong>Tone Consistency</strong>. Uber built <strong><a href="https://www.uber.com/en-IN/blog/requirement-adherence-boosting-data-labeling-quality-using-llms/">Requirement Adherence systems</a></strong> that extracted rules from standard operating procedures (SOPs) and enforced them in real-time, reducing post-labeling audits by 80%.</p><h2><strong>Evaluation-Driven Development (EDD)</strong></h2><p>&#8220;Test-Driven Development&#8221; (TDD) evolved into <strong>Evaluation-Driven Development (EDD)</strong>. Engineers learned that you cannot optimize what you cannot measure.</p><p>Infrastructure teams integrated these evals directly into CI/CD pipelines. Databricks shared how they &#8220;shifted left&#8221; on reliability, <strong><a href="https://www.databricks.com/blog/databricks-databricks-scaling-database-reliability">scaling database reliability</a></strong> by embedding schema scorers into their build processes to catch data quality issues before they hit production.</p><p>The takeaway for every data engineer in 2025 was clear: If you don&#8217;t have an eval for it, it doesn&#8217;t exist. 
You aren&#8217;t &#8220;prompt engineering&#8221; until you have a metric that tells you if your changes made things better or worse.</p><div><hr></div><h1><strong>3. The Streaming-Lakehouse Merger: The End of Lambda Architecture</strong></h1><p>For fifteen years, we lived with the &#8220;Lambda Architecture&#8221;&#8212;maintain a fast streaming path (Kafka/Flink) and a slow batch path (Hadoop/Spark), and pray they match. In 2025, we finally merged the lanes.</p><p>The barrier between &#8220;Stream&#8221; and &#8220;Table&#8221; dissolved. We entered the era of the <strong>Streaming Lakehouse</strong>.</p><h2><strong>Stream-Table Duality</strong></h2><p>The concept of &#8220;Stream-Table Duality&#8221;&#8212;long preached by Kafka&#8217;s creators&#8212;became a reality in storage. New engines like <strong>Apache Paimon</strong> and <strong>Apache Fluss</strong> emerged to bridge the gap.</p><p>Alibaba championed <strong><a href="https://www.alibabacloud.com/blog/apache-paimon-real-time-lake-storage-with-iceberg-compatibility-2025_602485">Apache Paimon</a></strong> as a lake format designed specifically for real-time updates, offering the high-throughput ingestion of a stream with the query capabilities of a lakehouse table. Jack Vanlightly&#8217;s deep dive into <strong><a href="https://jack-vanlightly.com/blog/2025/9/2/understanding-apache-fluss">Understanding Apache Fluss</a></strong> revealed a system that combines log tablets with KV tablets, effectively creating a database that exposes its own changelog as a first-class citizen.</p><p>We stopped debating &#8220;Stream vs. Batch&#8221; and started designing architectures that ingest data once and make it immediately available for both real-time operational lookups and historical analytical queries.</p><h2><strong>The Zero-Copy Debate</strong></h2><p>However, this merger wasn&#8217;t without conflict. 
The buzzword of the year was &#8220;Zero-Copy&#8221;&#8212;the promise that you could point your data warehouse at your Kafka topic and query it without moving bytes.</p><p>But seasoned engineers pushed back. WarpStream argued <strong><a href="https://www.warpstream.com/blog/the-case-for-an-iceberg-native-database-why-spark-jobs-and-zero-copy-kafka-wont-cut-it">the case for an Iceberg-native database</a></strong>, claiming that coupling your operational message bus directly to your analytical engine violates separation of concerns.</p><p>The consensus that emerged? &#8220;Zero-Copy&#8221; is great for ad-hoc exploration, but for production, <strong>Materialization</strong> (making a copy) is still the price of performance and isolation.</p><h2><strong>Diskless Kafka and the Cloud-Native Log</strong></h2><p>Even Kafka itself couldn&#8217;t escape the modernization wave. The community rallied around <strong><a href="https://topicpartition.io/blog/kip-1150-diskless-topics-in-apache-kafka">KIP-1150 (Diskless Topics)</a></strong>, a proposal to re-architect Kafka for the cloud era.</p><p>We realized that in a world of S3 Express and high-speed networking, storing data on local broker disks was an expensive relic. The future of streaming is &#8220;Tiered Storage by Default,&#8221; where the broker is just a caching layer on top of infinite object storage. This shift promises to slash costs and make &#8220;infinite retention&#8221; streams a standard architectural pattern.</p><div><hr></div><h1><strong>4. The Efficiency Counter-Revolution: Small Data and Rust</strong></h1><p>While the AI teams were burning cash on GPUs, the Data Infrastructure teams were leading a quiet counter-revolution. 
After years of defaulting to massive, expensive distributed clusters (Spark/Hadoop) for every problem, 2025 was the year we right-sized our compute.</p><p>We realized that &#8220;Big Data&#8221; tools are overkill for &#8220;Medium Data&#8221; problems.</p><h2><strong>The Single-Node Renaissance</strong></h2><p>&#8220;Does this really need a 50-node cluster, or just a bigger laptop?&#8221;</p><p>That question dismantled pipelines across the industry. Tools like <strong>DuckDB</strong> and <strong>Polars</strong> graduated from &#8220;analyst favorites&#8221; to &#8220;production workhorses.&#8221; Decathlon shared a viral case study about being <strong><a href="https://medium.com/decathlondigital/polars-at-decathlon-ready-to-play-6abc4328d06c">Ready to Play with Polars</a></strong>, replacing massive Spark clusters with Polars scripts for datasets under 50GB and slashing infrastructure costs to near zero.</p><p>Benchmarks from Daniel Beach confirmed this, showing that for <strong><a href="https://dataengineeringcentral.substack.com/p/650gb-of-data-delta-lake-on-s3-polars">650GB of Data (Delta Lake on S3)</a></strong>, single-node engines often beat distributed clusters simply by avoiding network overhead. We stopped being ashamed of &#8220;vertical scaling&#8221; and started embracing it as a FinOps victory.</p><h2><strong>The Rust Rewrite</strong></h2><p>When we did need performance, we turned to <strong>Rust</strong>.</p><p>Agoda explained <strong><a href="https://medium.com/agoda-engineering/why-we-bet-on-rust-to-supercharge-feature-store-at-agoda-ed4a70d2efb7">why they bet on Rust to supercharge their Feature Store</a></strong>, achieving a 5x increase in traffic capacity.</p><p>The lesson was clear: The &#8220;Java Tax&#8221; is real. For critical, low-latency infrastructure, Rust&#8217;s safety and performance are worth the learning curve. 
We are entering a new era where the foundational tools of data engineering are being rebuilt, brick by brick, in Rust.</p><h2><strong>FinOps as Architecture</strong></h2><p>Efficiency wasn&#8217;t just about code; it was about architecture. Wix documented <strong><a href="https://www.wix.engineering/post/how-wix-slashed-spark-costs-by-60-and-migrated-5-000-daily-workflows-from-emr-to-emr-on-eks">how they slashed Spark costs by 60%</a></strong> not by rewriting code, but by migrating workloads from managed services (EMR) to Kubernetes (EKS).</p><p>We learned that &#8220;Serverless&#8221; often means &#8220;Wallet-less&#8221; if you aren&#8217;t careful. The smartest teams in 2025 were those who aggressively optimized their compute substrates, moving workloads to Spot instances, ARM processors, and single-node containers whenever possible.</p><div><hr></div><h1><strong>5. Lakehouse 2.0: The Catalog is the New Database</strong></h1><p>The &#8220;Format Wars&#8221; (Iceberg vs. Delta vs. Hudi) that dominated the early 2020s largely settled into a peace treaty in 2025. With the rise of interoperability layers, we stopped caring about data folder structure.</p><p>The battleground shifted &#8220;up the stack&#8221; to the <strong>Catalog</strong>. We realized that the &#8220;Lakehouse&#8221; is just a database turned inside out, and the Catalog is its operating system.</p><h2><strong>The Catalog Wars</strong></h2><p>In 2025, the Catalog stopped being just a &#8220;list of tables&#8221; and became the active control plane for the enterprise.</p><p>New concepts like <strong><a href="https://duckdb.org/2025/05/27/ducklake.html">DuckLake</a></strong> challenged the status quo, proposing that we use DuckDB itself as a catalog and metadata layer, replacing the heavy, complex Hive Metastore with a lightweight, transactional database.</p><p>Hyperscalers (AWS, Snowflake, Databricks) all converged on Managed Iceberg services. 
As Simon Sp&#228;ti noted in <strong><a href="https://www.ssp.sh/blog/open-table-format-revolution/">The Open Table Format Revolution</a></strong>, the major players stopped fighting the open format and started trying to own the metadata layer that manages it. The value proposition shifted from &#8220;we store your data&#8221; to &#8220;we govern your transactions.&#8221;</p><div><hr></div><h1><strong>6. The &#8220;Context&#8221; Supply Chain: Unstructured Data &amp; Knowledge Graphs</strong></h1><p>For decades, Data Engineering was about rows and columns. In 2025, we had to get good at text, images, and relationships. To support the GenAI revolution, we had to build the <strong>Context Supply Chain</strong>.</p><p>We aren&#8217;t just moving data anymore; we are moving <em>meaning</em>.</p><h2><strong>Knowledge Graphs Return</strong></h2><p>The most surprising comeback of 2025 was the <strong>Knowledge Graph</strong>. As we struggled with LLM hallucinations, we realized that probabilistic models need deterministic facts to ground them.</p><p>Netflix details its <strong><a href="https://netflixtechblog.com/uda-unified-data-architecture-6a6aee261d8d">Unified Data Architecture</a></strong> and how it is <strong><a href="https://netflixtechblog.medium.com/unlocking-entertainment-intelligence-with-knowledge-graph-da4b22090141">Unlocking Entertainment Intelligence with Knowledge Graph</a></strong> to provide a &#8220;source of truth&#8221; for its AI models.</p><p>We learned that &#8220;RAG&#8221; (Retrieval-Augmented Generation) isn&#8217;t just about vector search. It&#8217;s about &#8220;GraphRAG&#8221;&#8212;using the relationships between data points to provide richer, more accurate context to the model.</p><h2><strong>The Embedding Pipeline</strong></h2><p>Data Engineers had to master a new type of transformation: the <strong>Embedding</strong>.</p><p>We moved beyond simple &#8220;Word2Vec&#8221; tutorials. 
Milvus released guides on <strong><a href="https://milvus.io/blog/how-to-choose-the-right-embedding-model-for-rag.md">how to choose the right embedding model for RAG</a></strong>, helping teams select between LLM2Vec, BGE-M3, and others for specific domains. We built pipelines that chunked documents, generated embeddings, and stored them in vector databases, treating &#8220;semantic distance&#8221; as a first-class data type.</p><h2><strong>Unstructured Data Management</strong></h2><p>We formally recognized <strong><a href="https://piethein.medium.com/unstructured-data-management-at-scale-4c612f822f70">Unstructured Data Management at Scale</a></strong> as a core competency. Piethein Strengholt argued that it wasn&#8217;t enough to dump PDFs into an S3 bucket; we needed to parse, clean, chunk, and govern that data with the same rigor we apply to our financial tables.</p><p>The &#8220;Medallion Architecture&#8221; (Bronze/Silver/Gold) was adapted for unstructured data. We started talking about &#8220;Raw Documents&#8221; (Bronze), &#8220;Parsed &amp; Chunked Text&#8221; (Silver), and &#8220;Curated Embeddings&#8221; (Gold).</p><div><hr></div><h1><strong>7. Governance 2.0: The Safety Brake for Autonomous Agents</strong></h1><p>As AI agents began taking actions&#8212;booking interviews, executing code, modifying databases&#8212;Governance stopped being a &#8220;compliance checkbox&#8221; and became a &#8220;safety brake.&#8221;</p><p>In 2024, if a dashboard was wrong, a manager made a bad decision. In 2025, if an agent is wrong, it might delete a production database or leak PII to a competitor.</p><h2><strong>Privacy-Aware Infrastructure</strong></h2><p>Meta writes about <strong><a href="https://engineering.fb.com/2025/01/22/security/how-meta-discovers-data-flows-via-lineage-at-scale/">discovering data flows via lineage at scale</a></strong>. 
They didn&#8217;t just write policies; they wrote code that tracked data lineage and enforced purpose limitations at the exabyte scale. Meta introduced <strong><a href="https://engineering.fb.com/2025/07/23/security/policy-zones-meta-purpose-limitation-batch-processing-systems/">Policy Zones</a></strong> to ensure that data collected for one purpose (e.g., safety) couldn&#8217;t be used for another (e.g., ad targeting) without explicit permission.</p><p>This level of granularity is the future. We are moving toward systems where every piece of data carries its own &#8220;passport&#8221; of permissions, and every agent must present a &#8220;visa&#8221; to access it.</p><h2><strong>Shadow AI and the New Perimeter</strong></h2><p>The rise of &#8220;Shadow AI&#8221;&#8212;engineers spinning up local LLMs or using unapproved APIs&#8212;forced data teams to harden the perimeter.</p><p>We saw the emergence of <strong>Data Contracts</strong> as a defense mechanism. Grab implemented <strong><a href="https://engineering.grab.com/real-time-data-quality-monitoring">real-time data quality monitoring</a></strong> with strict contracts for its Kafka streams, ensuring that bad data was rejected before it could poison the downstream AI models.</p><p>Governance in 2025 is about <strong>Observability</strong>. It&#8217;s about knowing exactly which agent accessed which document, why, and what it did with it.</p><div><hr></div><h1><strong>Conclusion: The Era of the Context Engineer</strong></h1><p>Looking back at 2025, it is clear that the role of the Data Engineer has fundamentally changed.</p><p>We are no longer just building pipelines to move data from Point A to Point B. 
We are building the <strong>enterprise&#8217;s Cognitive Nervous System</strong>.</p><ul><li><p>We are <strong>Agent Engineers</strong>, designing the workflows that allow AI to reason and act.</p></li><li><p>We are <strong>Eval Architects</strong>, building the metric systems that keep AI honest.</p></li><li><p>We are <strong>Context Curators</strong>, ensuring that the &#8220;meaning&#8221; of our data is preserved and accessible.</p></li><li><p>We are <strong>Efficiency Experts</strong>, maximizing the ROI of every compute cycle in a world of expensive GPUs.</p></li></ul><blockquote><p>The tools will continue to change. Spark might yield to Polars; Kafka might yield to object storage. But the discipline of <em>engineering</em>&#8212;of rigor, measurement, and architecture&#8212;is stronger than ever.</p></blockquote><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #249]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-249</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-249</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 22 Dec 2025 03:37:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-21_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uGuu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uGuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19674,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-21_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uGuu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!uGuu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9892d496-8193-4123-aae8-243a8d9b2788_3840x2160.heic 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How to scale your data team</h1><p>Building and scaling a data platform has never been more important or more challenging. 
Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-21_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Andrej Karpathy: 2025 LLM Year in Review</h1><p>One year feels like a decade in the LLM era. A year ago, Gemini was still effectively at version 1.5, image models routinely failed at basic text rendering, and credible video generation had not yet arrived. DeepSeek R1 did not exist; o1 was only beginning to introduce test-time inference. The author highlights the significance of 2025 and the paradigm shift that altered the landscape.</p><p><strong><a href="https://karpathy.bearblog.dev/year-in-review-2025/">https://karpathy.bearblog.dev/year-in-review-2025/</a></strong></p><div><hr></div><h1>LangChain: State of Agent Engineering</h1><p>LangChain has published a survey of 1,300 professionals on Agent Engineering in the industry. Key highlights for me:</p><ol><li><p>Customer service and productivity products dominate AI adoption.</p></li><li><p>Output quality is the biggest barrier to entry for AI agents.</p></li><li><p>The open-source model has an equal market share between Gemini and Claude.</p></li></ol><p><strong><a href="https://www.langchain.com/state-of-agent-engineering">https://www.langchain.com/state-of-agent-engineering</a></strong></p><div><hr></div><h1>Google: Introduction to AI Agents</h1><p>Just as Data Scientist once was, Agent Engineer is becoming the hottest job of 2026. 
If you&#8217;re looking to get started in this space, Google has published an excellent overview, Introduction to AI Agents.</p><p><strong><a href="https://drive.google.com/file/d/1C-HvqgxM7dj4G2kCQLnuMXi1fTpXRdpx/view">https://drive.google.com/file/d/1C-HvqgxM7dj4G2kCQLnuMXi1fTpXRdpx/view</a></strong></p><div><hr></div><h1>Sponsored: The data platform playbook everyone's using</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-21_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X_74!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!X_74!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!X_74!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!X_74!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!X_74!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-21_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X_74!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!X_74!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!X_74!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!X_74!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58456554-14b6-400e-aef2-300174f812ae_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating 
under a single platform. <br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-21_data_engineering_weekly">Download your free copy now.</a></strong></p><div><hr></div><h1>Ben Lorica: &#8220;World Model&#8221; is a mess. Here&#8217;s how to make sense of it.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ztLs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ztLs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 424w, https://substackcdn.com/image/fetch/$s_!ztLs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 848w, https://substackcdn.com/image/fetch/$s_!ztLs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!ztLs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ztLs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic" width="1456" height="581" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:581,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16790,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ztLs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 424w, https://substackcdn.com/image/fetch/$s_!ztLs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 848w, 
https://substackcdn.com/image/fetch/$s_!ztLs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 1272w, https://substackcdn.com/image/fetch/$s_!ztLs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2d383b09-56ab-47e3-9fde-2988603aca2c_1456x581.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The article examines how the term "world model" has become an overloaded marketing label across AI 
vendors, encompassing seven distinct interpretations. The analysis highlights that the &#8220;world model&#8221; term captures a fundamental shift from token prediction to modeling geometry, physics, and long-horizon dynamics, and recommends that teams evaluate specific implementations based on actual inputs, maintained state, and decision-making outputs rather than accepting vendor terminology at face value.</p><p><strong><a href="https://gradientflow.substack.com/p/world-model-is-a-mess-heres-how-to">https://gradientflow.substack.com/p/world-model-is-a-mess-heres-how-to</a></strong></p><div><hr></div><h1>LanceDB: From BI to AI: A Modern Lakehouse Stack with Lance and Iceberg</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hmcl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hmcl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 424w, https://substackcdn.com/image/fetch/$s_!Hmcl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 848w, https://substackcdn.com/image/fetch/$s_!Hmcl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Hmcl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hmcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic" width="1456" height="686" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:686,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hmcl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 424w, https://substackcdn.com/image/fetch/$s_!Hmcl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Hmcl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 1272w, https://substackcdn.com/image/fetch/$s_!Hmcl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbc65bf2e-f04a-495c-b71d-eadaf9e21f46_3840x1810.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Are Iceberg and LanceDB complementary formats? 
The blog highlights LanceDB's ability to add new columns without a full table rewrite, native multimodal data storage, built-in text and vector indexing, and the potential to change the BI landscape, bringing it closer to AI.</p><p><strong><a href="https://lancedb.com/blog/from-bi-to-ai-lance-and-iceberg/">https://lancedb.com/blog/from-bi-to-ai-lance-and-iceberg/</a></strong></p><div><hr></div><h1>Dropbox: Inside the feature store powering real-time AI in Dropbox Dash</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ybnX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ybnX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 424w, https://substackcdn.com/image/fetch/$s_!ybnX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 848w, https://substackcdn.com/image/fetch/$s_!ybnX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 1272w, https://substackcdn.com/image/fetch/$s_!ybnX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ybnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic" width="1456" height="930" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20979,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ybnX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 424w, https://substackcdn.com/image/fetch/$s_!ybnX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 848w, https://substackcdn.com/image/fetch/$s_!ybnX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!ybnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5677fc09-7db3-444b-98e7-d583abad07e9_2880x1840.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Dropbox describes a hybrid feature store for Dash's real-time AI ranking system that combines Feast for orchestration with a custom Go-based serving layer and Dynovault for storage to meet its sub-100ms latency requirement. 
The system processes thousands of concurrent feature lookups per search query through a three-part ingestion pipeline&#8212;batch processing with intelligent change detection (reducing writes from 100M+ to under 1M records), streaming for real-time signals, and direct writes for precomputed features&#8212;delivering p95 latencies of 25-35ms while maintaining feature freshness within minutes of user actions.</p><p><strong><a href="https://dropbox.tech/machine-learning/feature-store-powering-realtime-ai-in-dropbox-dash">https://dropbox.tech/machine-learning/feature-store-powering-realtime-ai-in-dropbox-dash</a></strong></p><div><hr></div><h1>Uber: How Uber Indexes Streaming Data with Pull-Based Ingestion in OpenSearch&#8482;</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Po4t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Po4t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 424w, https://substackcdn.com/image/fetch/$s_!Po4t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 848w, https://substackcdn.com/image/fetch/$s_!Po4t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!Po4t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Po4t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic" width="1456" height="700" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:700,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Po4t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 424w, https://substackcdn.com/image/fetch/$s_!Po4t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Po4t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 1272w, https://substackcdn.com/image/fetch/$s_!Po4t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3b0b1a3b-c557-4ebd-adf1-f1f9057df4b4_1536x738.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Uber writes about its native pull-based ingestion contribution to OpenSearch 3.0 to power their 
multi-region search platform, replacing traditional push-based indexing with Kafka-backed streaming ingestion that eliminates translog overhead and enables durable replay for shard recovery. The system maps each OpenSearch shard one-to-one with Kafka partitions, uses a no-op translog by treating Kafka as the source of truth, and supports two ingestion modes&#8212;segment replication, where only primaries ingest, and all-active, where all shards consume independently&#8212;delivering consistent global views across regions. </p><p><strong><a href="https://www.uber.com/en-IN/blog/how-uber-indexes-streaming-data-with-pull-based-ingestion-in-opensearch/">https://www.uber.com/en-IN/blog/how-uber-indexes-streaming-data-with-pull-based-ingestion-in-opensearch/</a></strong></p><div><hr></div><h1>Grab: How Grab is accelerating growth with real-time personalization using Customer Data Platform scenarios.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hR9U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hR9U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 424w, https://substackcdn.com/image/fetch/$s_!hR9U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 848w, 
https://substackcdn.com/image/fetch/$s_!hR9U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 1272w, https://substackcdn.com/image/fetch/$s_!hR9U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hR9U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic" width="1267" height="442" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:442,&quot;width&quot;:1267,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12764,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hR9U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 424w, 
https://substackcdn.com/image/fetch/$s_!hR9U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 848w, https://substackcdn.com/image/fetch/$s_!hR9U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 1272w, https://substackcdn.com/image/fetch/$s_!hR9U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3dd91786-b216-4f65-8ed7-641172b66a6c_1267x442.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" 
x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Grab writes about CDP Scenarios, a self-serve real-time personalization platform that processes user-initiated events via Apache Flink pipelines, combined with historical profile data from StarRocks and predictive models, to enable sub-15-second latency targeting. The system powers over 12 production use cases, achieving a 3% uplift in subscriber conversions for the Grab Unlimited signup abandonment scenario by delivering personalized nudges within 15 minutes of users dropping off the registration flow.</p><p><strong><a href="https://engineering.grab.com/cdp-scenarios">https://engineering.grab.com/cdp-scenarios</a></strong></p><div><hr></div><h1>Decathlon: Polars at Decathlon: Ready to Play?</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rGN0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rGN0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 424w, https://substackcdn.com/image/fetch/$s_!rGN0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 848w, https://substackcdn.com/image/fetch/$s_!rGN0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!rGN0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rGN0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic" width="875" height="623" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3777b085-c418-47ab-a560-59b4e682060f_875x623.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:623,&quot;width&quot;:875,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:14220,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/182288574?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rGN0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 424w, https://substackcdn.com/image/fetch/$s_!rGN0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 848w, 
https://substackcdn.com/image/fetch/$s_!rGN0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 1272w, https://substackcdn.com/image/fetch/$s_!rGN0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3777b085-c418-47ab-a560-59b4e682060f_875x623.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Apache Spark is an undisputed leader in batch processing, and we are starting to see native distributed 
dataframe processing engines like Polars gaining adoption. Decathlon writes about adopting Polars to process datasets under 50 GiB, replacing Apache Spark for smaller workloads and achieving near-zero infrastructure costs compared to their typical 180 GiB, 24-core Spark clusters.</p><p><strong><a href="https://medium.com/decathlondigital/polars-at-decathlon-ready-to-play-6abc4328d06c">https://medium.com/decathlondigital/polars-at-decathlon-ready-to-play-6abc4328d06c</a></strong></p><div><hr></div><h1>Zalando: Contributing to Debezium: Fixing Logical Replication at Scale</h1><p>Zalando writes about implementing a mechanism to flush LSNs via JDBC keepalives, even when no data changes occur, to prevent disk-exhausting WAL growth in low-traffic databases. By pairing this with a strategy that trusts the replication slot's position over stored Kafka offsets, the design eliminated the "offset mismatch" errors that previously triggered expensive and unnecessary full database re-snapshots.</p><p><strong><a href="https://engineering.zalando.com/posts/2025/12/contributing-to-debezium.html">https://engineering.zalando.com/posts/2025/12/contributing-to-debezium.html</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. 
All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #248]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-248</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-248</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 15 Dec 2025 02:00:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-14_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fikm!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Fikm!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!Fikm!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Fikm!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fikm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:20900,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-14_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Fikm!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!Fikm!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!Fikm!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!Fikm!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf9dec7e-02c5-44fc-a636-d05419b51e81_3840x2160.heic 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" 
fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h1>The Scaling Data Teams Guide</h1><p>The latest eBook in our popular series is now available. <br><br>Building and scaling a data platform has never been more important or more challenging. Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-14_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>Jason Gorman: The Gorman Paradox: Where Are All The AI-Generated Apps?</h1><p>I had this very conversation recently. A friend of mine claimed that it is now easy to build a CRM and that everyone will build their own, thereby making all SaaS companies obsolete in the near future. It&#8217;s cheaper to manufacture software now, but SaaS companies don&#8217;t win on manufacturing. They win on distribution, trust, and operational burden. Building a CRM is perhaps commoditized; operating one as a durable product is not. 
</p><p><strong><a href="https://codemanship.wordpress.com/2025/12/14/the-gorman-paradox-where-are-all-the-ai-generated-apps/">https://codemanship.wordpress.com/2025/12/14/the-gorman-paradox-where-are-all-the-ai-generated-apps/</a></strong></p><div><hr></div><h1>Gunnar Morling: You Gotta Push If You Wanna Pull</h1><p>Balancing pull queries over data at rest against push queries over a stream is often a challenging part of system design. The author argues that the two work together: push keeps state fresh through incremental data processing, which in turn makes pull queries more efficient. </p><p><strong><a href="https://www.morling.dev/blog/you-gotta-push-if-you-wanna-pull/">https://www.morling.dev/blog/you-gotta-push-if-you-wanna-pull/</a></strong></p><div><hr></div><h1>LangChain: Agent Engineering - A New Discipline</h1><p>Data Scientist &#8594; Analytics Engineer &#8594; Agentic Engineer. As the technology landscape evolves, we see new roles emerging that require specific skill sets. The blog lays the foundation for the emerging agentic engineering discipline in software engineering. 
</p><p><strong><a href="https://blog.langchain.com/agent-engineering-a-new-discipline/">https://blog.langchain.com/agent-engineering-a-new-discipline/</a></strong></p><div><hr></div><h1>Sponsored: The data platform playbook everyone&#8217;s using</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-14_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ruJn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!ruJn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!ruJn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!ruJn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ruJn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:21222,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-14_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ruJn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!ruJn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!ruJn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!ruJn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5d942f6-9bb3-4050-aac9-c56fa9748555_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We wrote an eBook on Data Platform Fundamentals to help you be like the happy data teams, operating under a single platform. 
<br><br>In this book, you&#8217;ll learn:<br><br>- How composable architectures allow teams to ship faster<br>- Why data quality matters and how you can catch issues before they reach users<br>- What observability means, and how it will help you solve problems more quickly</p><p><strong><a href="https://dagster.io/how-to-build-data-platforms-ebook?utm_campaign=8626009-25-02-eBook_Data-Platform-Fundamentals&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=data_fundamentals_ebook&amp;utm_content=12-14_data_engineering_weekly">Download your free copy now.</a></strong></p><div><hr></div><h1>Mark Rittman: An Homage to Oracle Warehouse Builder, 25 Years Ahead of its Time in all its Java Thick-Client Glory</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yvoQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yvoQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 424w, https://substackcdn.com/image/fetch/$s_!yvoQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 848w, https://substackcdn.com/image/fetch/$s_!yvoQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!yvoQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yvoQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic" width="834" height="356" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:834,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16538,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yvoQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 424w, https://substackcdn.com/image/fetch/$s_!yvoQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 848w, 
https://substackcdn.com/image/fetch/$s_!yvoQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 1272w, https://substackcdn.com/image/fetch/$s_!yvoQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28304704-8c4f-4c39-87ae-c6109b8f6e06_834x356.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#8220;What goes around comes around&#8221; holds especially true in data engineering. 
The blog is an excellent reminder of the quote, and, in fact, the author is making a subtle point: Oracle Warehouse Builder is still way ahead of modern data tools. </p><p><strong><a href="https://medium.com/@mark_37168/an-homage-to-oracle-warehouse-builder-25-years-ahead-of-its-time-in-all-its-java-thick-client-48d11eede8a2">https://medium.com/@mark_37168/an-homage-to-oracle-warehouse-builder-25-years-ahead-of-its-time-in-all-its-java-thick-client-48d11eede8a2</a></strong></p><div><hr></div><h1>Julien Le Dem: Column Storage for the AI Era</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Na2y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Na2y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 424w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 848w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 1272w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Na2y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic" width="1456" height="721" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:721,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:18048,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Na2y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 424w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 848w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 1272w, https://substackcdn.com/image/fetch/$s_!Na2y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8cd6ce-1e13-4008-8684-46679f191f56_1926x954.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Apache Parquet is the de facto file format in data engineering. The blog offers an excellent, pragmatic view of Parquet: its origins, its wins and adoption over the years, some of its shortcomings, such as those tied to the underlying storage and compute semantics, and what the future holds for the format. 
One of my favorite quotes in the blog is</p><blockquote><p><em>The cheapest data to transfer is still the data you skip entirely.</em></p></blockquote><p><strong><a href="https://sympathetic.ink/2025/12/11/Column-Storage-for-the-AI-era.html">https://sympathetic.ink/2025/12/11/Column-Storage-for-the-AI-era.html</a></strong></p><div><hr></div><h1>Uber: Blazing Fast OLAP on Uber&#8217;s Inventory and Catalog Data with Apache Pinot&#8482;</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3N7p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3N7p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 424w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 848w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 1272w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3N7p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic" width="1167" height="359" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:359,&quot;width&quot;:1167,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3N7p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 424w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 848w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 1272w, https://substackcdn.com/image/fetch/$s_!3N7p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb8a16a63-f34f-427a-83a4-33cc4d5f161e_1167x359.heic 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Uber writes about adopting Apache Pinot to power real-time analytics on over 10 billion catalog entities across its INCA (Inventory and Catalog) system, replacing slow batch processing with sub-second query latencies. The blog narrates how the team leveraged Pinot&#8217;s upsert capabilities, handling hundreds of thousands of updates per second, combined with optimizations such as UUID primary key compression, Java 17 runtime upgrades, and the Small Segment Merger minion task, to achieve a 75% latency reduction and 40% storage savings. 
</p><p><strong><a href="https://www.uber.com/en-IN/blog/blazing-fast-olap-on-ubers-inventory-and-catalog-data-with-apache-pinot/">https://www.uber.com/en-IN/blog/blazing-fast-olap-on-ubers-inventory-and-catalog-data-with-apache-pinot/</a></strong></p><div><hr></div><h1>Pinterest: LLM-Powered Relevance Assessment for Pinterest Search</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BKvx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BKvx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 424w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 848w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 1272w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BKvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic" width="1400" height="460" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11894,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BKvx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 424w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 848w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 1272w, https://substackcdn.com/image/fetch/$s_!BKvx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb34c07a-736c-4324-80a2-d74251939af0_1400x460.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
style="height:20px;width:20px" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Pinterest writes about fine-tuning open-source multilingual LLMs (XLM-RoBERTa-large) on human-annotated data to automate search relevance assessment, replacing costly manual labeling that previously limited measurement capabilities. 
The LLM-based system, which processes Pin text features through a cross-encoder architecture and employs stratified sampling by query interest and popularity, reduced minimum detectable effects from 1.3-1.5% to &#8804;0.25% while achieving 73.7% exact match with human labels and enabling 150,000 relevance predictions in 30 minutes on a single A10G GPU.</p><p><strong><a href="https://medium.com/pinterest-engineering/llm-powered-relevance-assessment-for-pinterest-search-b846489e358d">https://medium.com/pinterest-engineering/llm-powered-relevance-assessment-for-pinterest-search-b846489e358d</a></strong></p><div><hr></div><h1>Agoda: How Agoda Enhanced the Uptime and Consistency of Financial Metrics</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AwdQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AwdQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 424w, https://substackcdn.com/image/fetch/$s_!AwdQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 848w, https://substackcdn.com/image/fetch/$s_!AwdQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!AwdQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AwdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic" width="1400" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:11139,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AwdQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 424w, https://substackcdn.com/image/fetch/$s_!AwdQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 848w, 
https://substackcdn.com/image/fetch/$s_!AwdQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 1272w, https://substackcdn.com/image/fetch/$s_!AwdQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21979071-5b98-4880-8833-0e6dd95267ee_1400x813.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Agoda writes about consolidating multiple fragmented financial data pipelines into a centralized 
Financial Unified Data Pipeline (FINUDP) built on Apache Spark to eliminate inconsistencies caused by duplicate sources, conflicting transformation logic, and inconsistent data quality controls across teams. The unified pipeline, which processes millions of daily booking transactions, achieved 95.6% uptime through three-tier alerting (email, Slack, GoFresh monitoring), shadow testing in merge requests, automated data quality checks, data contracts with upstream teams, and ML-based anomaly detection, while reducing end-to-end runtime from five hours to 30 minutes through query optimization and infrastructure adjustments.</p><p><strong><a href="https://medium.com/agoda-engineering/how-agoda-enhanced-the-uptime-and-consistency-of-financial-metrics-ef7d54c4e4f0">https://medium.com/agoda-engineering/how-agoda-enhanced-the-uptime-and-consistency-of-financial-metrics-ef7d54c4e4f0</a></strong></p><div><hr></div><h1>Flipkart: Apache Spark Optimisations</h1><p>If you&#8217;re deep into data pipeline optimization, the blog is an excellent resource. The author identifies frequent patterns that can cause Spark performance issues and proposes an approach to address them. 
</p><p><strong><a href="https://blog.flipkart.tech/apache-spark-optimisations-c3464f71bd38">https://blog.flipkart.tech/apache-spark-optimisations-c3464f71bd38</a></strong></p><div><hr></div><h1>McDonald&#8217;s: Built to Scale: How a Config-Driven ETL Engine Is Powering Environmental, Social, and Governance Data Innovation</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!df5o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!df5o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 424w, https://substackcdn.com/image/fetch/$s_!df5o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 848w, https://substackcdn.com/image/fetch/$s_!df5o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 1272w, https://substackcdn.com/image/fetch/$s_!df5o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!df5o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic" width="961" height="490" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1058d725-d838-441e-80d5-6e466094a796_961x490.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:490,&quot;width&quot;:961,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:15192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/181631689?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!df5o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 424w, https://substackcdn.com/image/fetch/$s_!df5o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 848w, https://substackcdn.com/image/fetch/$s_!df5o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 1272w, https://substackcdn.com/image/fetch/$s_!df5o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1058d725-d838-441e-80d5-6e466094a796_961x490.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>McDonald&#8217;s writes about building a reusable, config-driven ETL/ELT engine to address scalability and performance limitations in traditional manual ETL development, which couldn&#8217;t keep pace with growing ESG data requirements. 
The Python-based framework uses YAML configuration files instead of custom code and includes a Visual YAML Configuration Generator web tool. It enables teams to build pipelines across multi-cloud environments with support for diverse transformations (column renaming, type conversion, aggregations, SQL functions), automated notifications, and comprehensive auditing, while significantly reducing development time compared to traditional ETL methods.</p><p><strong><a href="https://medium.com/mcdonalds-technical-blog/built-to-scale-how-a-config-driven-etl-engine-is-powering-environmental-social-and-governance-d0cd2383554f">https://medium.com/mcdonalds-technical-blog/built-to-scale-how-a-config-driven-etl-engine-is-powering-environmental-social-and-governance-d0cd2383554f</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item><item><title><![CDATA[Data Engineering Weekly #247]]></title><description><![CDATA[The Weekly Data Engineering Newsletter]]></description><link>https://www.dataengineeringweekly.com/p/data-engineering-weekly-247</link><guid isPermaLink="false">https://www.dataengineeringweekly.com/p/data-engineering-weekly-247</guid><dc:creator><![CDATA[Ananth Packkildurai]]></dc:creator><pubDate>Mon, 08 Dec 2025 01:25:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AdQk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fbucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com%2Fpublic%2Fimages%2F6f51c1bf-abc9-4cd3-ad69-22cc8e3f1ef2_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://dagster.io/events/building-trustworthy-ai-analytics-compass-cube-in-practice?utm_source=email&amp;utm_medium=sponsorship&amp;utm_campaign=29638450-25-12-wbnr_deep_dive_compass_cube&amp;utm_content=12-07_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KQM8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KQM8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/events/building-trustworthy-ai-analytics-compass-cube-in-practice?utm_source=email&amp;utm_medium=sponsorship&amp;utm_campaign=29638450-25-12-wbnr_deep_dive_compass_cube&amp;utm_content=12-07_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KQM8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!KQM8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F950da45f-9c01-49f2-a907-c8f2911c388a_3840x2160.heic 1456w" 
sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h1>How to build trustworthy AI analytics</h1><p>If your team is relying on AI-driven insights, this upcoming webinar will show you how to make those insights more dependable, transparent, and explainable. 
In this 12/9 Deep Dive with our friends at Cube, you&#8217;ll learn:<br><br>- Why AI analytics fails without governance (and what that actually means)<br>- How semantic layers provide the guardrails AI needs to be trustworthy<br>- Technical implementation: how Compass + Cube work together to prevent hallucinations<br>- Live demo: governed self-service analytics that data teams can actually trust</p><p><strong><a href="https://dagster.io/events/building-trustworthy-ai-analytics-compass-cube-in-practice?utm_source=email&amp;utm_medium=sponsorship&amp;utm_campaign=29638450-25-12-wbnr_deep_dive_compass_cube&amp;utm_content=12-07_data_engineering_weekly">Save your spot now</a></strong></p><div><hr></div><h1>DoorDash: Beyond Single Agents: How DoorDash is building a collaborative AI ecosystem</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bmQn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bmQn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 424w, https://substackcdn.com/image/fetch/$s_!bmQn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 848w, https://substackcdn.com/image/fetch/$s_!bmQn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!bmQn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bmQn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic" width="1011" height="568" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:568,&quot;width&quot;:1011,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:9840,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bmQn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 424w, https://substackcdn.com/image/fetch/$s_!bmQn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 848w, 
https://substackcdn.com/image/fetch/$s_!bmQn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 1272w, https://substackcdn.com/image/fetch/$s_!bmQn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3de7ba5-d896-4699-af41-de7aa6a11fa2_1011x568.heic 1456w" sizes="100vw"></picture></div></a></figure></div><p>DoorDash highlights the challenge of extracting reliable insights from fragmented knowledge systems and the 
limitations of single agents constrained by context, determinism, and long-horizon reasoning. The article details an evolutionary architecture that progresses from deterministic workflows to adaptive agents, hierarchical deep-agent systems with shared memory, and exploratory swarm-based A2A collaboration, all built on a unified platform featuring hybrid search, schema-aware SQL generation, multi-stage validation, and integrated guardrails. </p><p><strong><a href="https://careersatdoordash.com/blog/beyond-single-agents-doordash-building-collaborative-ai-ecosystem/">https://careersatdoordash.com/blog/beyond-single-agents-doordash-building-collaborative-ai-ecosystem/</a></strong></p><div><hr></div><h1>LinkedIn: The evolution of the Venice ingestion pipeline</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dz1m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dz1m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 424w, https://substackcdn.com/image/fetch/$s_!dz1m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 848w, https://substackcdn.com/image/fetch/$s_!dz1m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!dz1m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dz1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic" width="1024" height="375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f218e586-0386-4f5e-8537-b475265ff555_1024x375.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:375,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dz1m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 424w, https://substackcdn.com/image/fetch/$s_!dz1m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 848w, 
https://substackcdn.com/image/fetch/$s_!dz1m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 1272w, https://substackcdn.com/image/fetch/$s_!dz1m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff218e586-0386-4f5e-8537-b475265ff555_1024x375.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LinkedIn writes about the challenge of scaling Venice ingestion to support massive bulk loads, hybrid 
Lambda-style stores, partial updates, and active/active replication while avoiding bottlenecks in production, consumption, persistence, and compaction. The article details the end-to-end evolution of the ingestion pipeline, including partition scaling, shared consumer and writer pools, SST-based ingestion, RocksDB tuning with leveled compaction, BlobDB, Fast-Avro adoption, parallelized DCR processing, and adaptive throttling for deterministic latency.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/infrastructure/evolution-of-the-venice-ingestion-pipeline">https://www.linkedin.com/blog/engineering/infrastructure/evolution-of-the-venice-ingestion-pipeline</a></strong></p><div><hr></div><h1>Dropbox: How Dash uses context engineering for smarter AI</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!30ne!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!30ne!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 424w, https://substackcdn.com/image/fetch/$s_!30ne!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 848w, https://substackcdn.com/image/fetch/$s_!30ne!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 1272w, 
https://substackcdn.com/image/fetch/$s_!30ne!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!30ne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic" width="1456" height="1173" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1173,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:28461,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!30ne!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 424w, https://substackcdn.com/image/fetch/$s_!30ne!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 848w, 
https://substackcdn.com/image/fetch/$s_!30ne!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 1272w, https://substackcdn.com/image/fetch/$s_!30ne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77f5db44-2e99-4f14-8831-ff9cbd6b64e3_2880x2320.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Dropbox writes about the transition from a traditional RAG search system to an agentic AI that must 
reason, plan, and act without being overwhelmed by excessive context, tool proliferation, or accuracy degradation in long-running tasks. The article details three core context-engineering strategies:</p><ol><li><p>Consolidating retrieval into a single universal search tool.</p></li><li><p>Filtering context through a unified index and knowledge graph for relevance.</p></li><li><p>Delegating complex workflows, such as query construction, to specialized agents with focused prompts.</p></li></ol><p>The approach improves reasoning quality, reduces token and decision overhead, and enables faster, more accurate agentic execution across Dash&#8217;s growing AI capabilities.</p><p><strong><a href="https://dropbox.tech/machine-learning/how-dash-uses-context-engineering-for-smarter-ai">https://dropbox.tech/machine-learning/how-dash-uses-context-engineering-for-smarter-ai</a></strong></p><div><hr></div><h1>Sponsored: The guide to scaling your data team</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-07_data_engineering_weekly" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!awVq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!awVq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 848w, 
https://substackcdn.com/image/fetch/$s_!awVq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!awVq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!awVq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19674,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:&quot;https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-07_data_engineering_weekly&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!awVq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 424w, https://substackcdn.com/image/fetch/$s_!awVq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 848w, https://substackcdn.com/image/fetch/$s_!awVq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 1272w, https://substackcdn.com/image/fetch/$s_!awVq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fddef6372-2de8-4900-b62c-bc7695a60ea9_3840x2160.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Scaling Data Teams, the latest in our popular eBook series, is now available. Building and scaling a data platform has never been more important or more challenging. Whether you&#8217;re just starting to build a data platform or leading a mature data organization, this guide will help you scale your impact, accelerate your team, and prepare for the future of data-driven products.<br><br>Learn how real data teams, from solo practitioners to enterprise-scale organizations, build.</p><p><strong><a href="https://dagster.io/how-to-scale-data-teams-ebook?utm_campaign=27879954-25-11-DMND_eBook_Scaling_Data_Teams&amp;utm_source=email&amp;utm_medium=sponsorship&amp;utm_term=scaling_data_teams_ebook&amp;utm_content=12-07_data_engineering_weekly">Get the guide now</a></strong></p><div><hr></div><h1>LinkedIn: FishDB - a generic retrieval engine for scaling LinkedIn&#8217;s feed</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!soon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!soon!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 424w,
https://substackcdn.com/image/fetch/$s_!soon!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 848w, https://substackcdn.com/image/fetch/$s_!soon!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 1272w, https://substackcdn.com/image/fetch/$s_!soon!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!soon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic" width="1456" height="372" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6027,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!soon!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 424w, https://substackcdn.com/image/fetch/$s_!soon!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 848w, https://substackcdn.com/image/fetch/$s_!soon!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 1272w, https://substackcdn.com/image/fetch/$s_!soon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6262fcc-dc2b-43fb-99cc-227c56d11599_1496x382.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>LinkedIn writes about replacing its legacy Java-based feed infrastructure with FishDB, a custom Rust-based retrieval engine designed to eliminate garbage collection latency and reduce memory overhead by ~5x compared to JVM equivalents. By leveraging a lambda architecture for ingestion and graph-based document references for query execution, FishDB cut hardware usage by 50% while maintaining a strict 40ms p99 latency. The query language is also worth a look: it improves expressiveness and suits the retrieval engine well.</p><p><strong><a href="https://www.linkedin.com/blog/engineering/infrastructure/fishdb-a-generic-retrieval-engine-for-scaling-linkedins-feed">https://www.linkedin.com/blog/engineering/infrastructure/fishdb-a-generic-retrieval-engine-for-scaling-linkedins-feed</a></strong></p><div><hr></div><h1>Netflix: Integrating Netflix&#8217;s Foundation Model into Personalization applications</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FZZX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FZZX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 424w,
https://substackcdn.com/image/fetch/$s_!FZZX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!FZZX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!FZZX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FZZX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic" width="960" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5262,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!FZZX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 424w, https://substackcdn.com/image/fetch/$s_!FZZX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 848w, https://substackcdn.com/image/fetch/$s_!FZZX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 1272w, https://substackcdn.com/image/fetch/$s_!FZZX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8228ea4-5f47-4e45-a510-7f73cb89c50d_960x540.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Netflix writes about integrating the personalization Foundation Model into diverse production systems that have different latency constraints, feature pipelines, and appetites for model complexity. The article details three integration patterns: using stable daily embeddings via the Embedding Store, embedding the Foundation Model as a fine-tunable subgraph within downstream models to provide fresh representations, and fully fine-tuning the Foundation Model to power product-specific objectives directly.</p><p><strong><a href="https://netflixtechblog.medium.com/integrating-netflixs-foundation-model-into-personalization-applications-cf176b5860eb">https://netflixtechblog.medium.com/integrating-netflixs-foundation-model-into-personalization-applications-cf176b5860eb</a></strong></p><div><hr></div><h1>Lyft: LyftLearn Evolution: Rethinking ML Platform Architecture</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tIQW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tIQW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 424w,
https://substackcdn.com/image/fetch/$s_!tIQW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 848w, https://substackcdn.com/image/fetch/$s_!tIQW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 1272w, https://substackcdn.com/image/fetch/$s_!tIQW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tIQW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic" width="1400" height="523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:523,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:16277,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!tIQW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 424w, https://substackcdn.com/image/fetch/$s_!tIQW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 848w, https://substackcdn.com/image/fetch/$s_!tIQW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 1272w, https://substackcdn.com/image/fetch/$s_!tIQW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cac50b0-4217-4410-bae3-9f93607d8a51_1400x523.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Lyft writes about the challenge of scaling its Kubernetes-only LyftLearn ML platform as operational complexity, state management overhead, and cluster-level capacity tuning become bottlenecks for thousands of daily training jobs and notebooks. The article details a hybrid LyftLearn 2.0 architecture that keeps low-latency online serving on Kubernetes while moving offline compute to AWS SageMaker, enabled by cross-platform base images, a SageMaker manager service, and SQS/EventBridge&#8211;based state tracking.</p><p><strong><a href="https://eng.lyft.com/lyftlearn-evolution-rethinking-ml-platform-architecture-547de6c950e1">https://eng.lyft.com/lyftlearn-evolution-rethinking-ml-platform-architecture-547de6c950e1</a></strong></p><div><hr></div><h1>Grab: Real-time data quality monitoring: Kafka stream contracts with syntactic and semantic test.</h1><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ebFC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ebFC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 424w,
https://substackcdn.com/image/fetch/$s_!ebFC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 848w, https://substackcdn.com/image/fetch/$s_!ebFC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 1272w, https://substackcdn.com/image/fetch/$s_!ebFC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ebFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic" width="1456" height="543" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:543,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:12551,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.dataengineeringweekly.com/i/180995676?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ebFC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 424w, https://substackcdn.com/image/fetch/$s_!ebFC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 848w, https://substackcdn.com/image/fetch/$s_!ebFC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 1272w, https://substackcdn.com/image/fetch/$s_!ebFC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfa71f49-b9f4-4e26-b219-cee13de3306b_1991x743.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Grab writes about implementing a data contract in a real-time streaming engine with syntactic and semantic tests. The blog is essentially a testimonial for data contract frameworks and their role as an integral part of data pipeline engineering.</p><p><strong><a href="https://engineering.grab.com/real-time-data-quality-monitoring">https://engineering.grab.com/real-time-data-quality-monitoring</a></strong></p><div><hr></div><h1>BlaBlaCar: Why We Built &#8220;BlaBlaCar Data Copilot&#8221;: Shifting Data Analysis Left</h1><p>BlaBlaCar writes about the bottleneck created by a hard boundary between software engineers and data analysts, where engineers avoid the warehouse, and analysts drown in ad hoc questions and fragile SQL, slowing feedback on product data and undermining true data ownership. The article presents BlaBlaCar <strong><a href="https://github.com/blablacar/data-copilot">Data Copilot</a></strong>, an AI-powered &#8220;junior analyst&#8221; inside the IDE that tunnels into BigQuery, uses curated queries and table previews via a lightweight zero-infrastructure RAG pattern, generates SQL/Python plus data health cards, and turns analyses into tested scripts reviewed via pull requests and stored as long-term repo memory.
</p><p><strong><a href="https://medium.com/blablacar/why-we-built-blablacar-data-copilot-shifting-data-analysis-left-b4cc246faf52">https://medium.com/blablacar/why-we-built-blablacar-data-copilot-shifting-data-analysis-left-b4cc246faf52</a></strong></p><div><hr></div><h1>Vinted: Dense Retrieval</h1><p>Vinted writes about how low-recall keyword search on a highly visual, multilingual e-commerce catalog led to missed business opportunities, and how tight latency and consistency constraints made it hard to roll out dense retrieval safely at scale. The article details a CLIP-based two-tower dense retrieval model trained with large-scale contrastive learning, exported to ONNX, and embedded into a Vespa-based architecture, along with a long list of optimizations, including index sharding by market, tuned ANN thresholds, approximate&#8594;exact retry strategies, global-phase reciprocal rank fusion to cap NN matches, and JVM tuning with GraalVM and ZGC.</p><p><strong><a href="https://vinted.engineering//2025/11/18/dense-retrieval/">https://vinted.engineering/2025/11/18/dense-retrieval/</a></strong></p><div><hr></div><p><em>All rights reserved, Dewpeche Pvt Ltd, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers&#8217; opinions.</em></p>]]></content:encoded></item></channel></rss>