Data Engineering Weekly


Data Engineering Weekly #109

The Weekly Data Engineering Newsletter

Ananth Packkildurai
Nov 28, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Data Contracts for SaaS Developers with Benn Stancil

The Thanksgiving break gives me enough time to catch up on a few podcasts. I'm a fan of the "SaaS Developer Community" podcast and Benn's writing, and I can't miss any conversation about Data Contracts.😎

Overall, Benn captured the goal of Data Contracts and the skepticism around them very well. I have a long list of thoughts on this conversation, which might need a blog post of its own, but here I want to address one comment.

“Developer’s Job is to ship the application code, not to make your dashboard look good”

I agree that shipping the application code is the priority. But what is “application code”? Take Slack features such as “Compose a DM,” “Channel Selection,” “Invite Members,” or “Invite Reminder.” Machine Learning powers every one of the features listed above. Slack may be among the 1% of companies that use data engineering effectively to drive product features, but that is exactly the point of implementing Data Contracts and shifting left toward an efficient data creation process.

If you think data is only for unknown dashboards and back-office needs, and data is not part of your product strategy, then sure, you don’t need Data Contracts. But if you want to be among the 1% of companies that differentiate their product experience and business operations with data, you need to focus on implementing Data Contracts.
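
To make “shifting left” a little more concrete, here is a minimal sketch of a producer-side contract check in Python. The `invite_sent` event, its fields, and the `validate`/`emit` helpers are hypothetical, not taken from the podcast or from any specific Data Contract tool; the only idea shown is that the application rejects a malformed event before it ever reaches the warehouse.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FieldSpec:
    """One field in a (hypothetical) data contract."""
    name: str
    type_: type
    required: bool = True


# Hypothetical contract for an "invite_sent" event; field names are illustrative.
INVITE_SENT_CONTRACT = [
    FieldSpec("event_id", str),
    FieldSpec("workspace_id", str),
    FieldSpec("invitee_count", int),
    FieldSpec("reminder_scheduled", bool, required=False),
]


def validate(event: dict[str, Any], contract: list[FieldSpec]) -> list[str]:
    """Return contract violations; an empty list means the event conforms."""
    errors = []
    for spec in contract:
        if spec.name not in event:
            if spec.required:
                errors.append(f"missing required field: {spec.name}")
            continue
        value = event[spec.name]
        # bool is a subclass of int in Python, so reject True/False for int fields.
        wrong_type = not isinstance(value, spec.type_) or (
            spec.type_ is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"{spec.name}: expected {spec.type_.__name__}, got {type(value).__name__}"
            )
    # Fields the contract does not know about are also treated as violations.
    known = {spec.name for spec in contract}
    for extra in sorted(event.keys() - known):
        errors.append(f"unexpected field: {extra}")
    return errors


def emit(event: dict[str, Any], contract: list[FieldSpec]) -> dict[str, Any]:
    """Fail fast in the producer instead of shipping bad data downstream."""
    errors = validate(event, contract)
    if errors:
        raise ValueError("; ".join(errors))
    # A real producer would publish the event to a queue or stream here.
    return event
```

For example, calling `emit({"event_id": "e1", "invitee_count": "3"}, INVITE_SENT_CONTRACT)` raises a `ValueError` that reports the missing `workspace_id` and the string-typed `invitee_count`, so the application developer sees the breakage at ship time rather than the data team discovering it later.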


Ian Macomber: Data Systems Tend Towards Production

I've seen plenty of data predictions for the years ahead, but I'm always a fan of people looking back at what happened in the industry to shed light on future trends. This is possibly one of the best data engineering reads I've had recently. The author highlights three emerging patterns in data engineering:

  1. Systems Tend Towards Production

  2. Systems Tend Towards Blind Federation

  3. Systems Tend Towards Layerinitis

https://ian-macomber.medium.com/data-systems-tend-towards-production-be5a86f65561


Meta AI: CICERO - An AI agent that negotiates, persuades, and cooperates with people

Did Meta successfully privatize world peace? 🤔


Meta writes about CICERO – the first AI to achieve human-level performance in the popular strategy game Diplomacy. CICERO demonstrated this by playing on webDiplomacy.net, an online version of the game. CICERO achieved more than double the average score of the human players and ranked in the top 10 percent of participants who played more than one game!!!

https://ai.facebook.com/blog/cicero-ai-negotiates-persuades-and-cooperates-with-people/


Airbnb: How AI Text Generation Models Are Reshaping Customer Support at Airbnb

Airbnb writes about moving its customer support from simple response templates to AI text generation models. The section on the real-time agent assistant model is an exciting read.

https://medium.com/airbnb-engineering/how-ai-text-generation-models-are-reshaping-customer-support-at-airbnb-a851db0b4fa3


Sponsored: [Live Webinar] How JetBlue Builds Trust in Data and Improves Model Accuracy

The data team at JetBlue Airways, a leading carrier in the United States, is responsible for powering insights for the entire organization’s operational and customer service activities. Learn how JetBlue’s data engineering and data science teams leverage Monte Carlo and Snowflake together to accelerate data analysis and drive business value.



Myntra: Quicksilver - Near Real Time Platform at Myntra

Myntra writes about its near-real-time streaming platform built on top of Kafka, Flink & Spark. It is a great overview of streaming infrastructure characteristics.

https://medium.com/myntra-engineering/quicksilver-near-real-time-platform-at-myntra-9e8edf6ede91


Airbnb: Building Airbnb Categories with ML and Human-in-the-Loop

Data and Machine Learning increasingly power applications and drive the user experience. Airbnb writes about one such case: building travel categories with ML and a human in the loop.

https://medium.com/airbnb-engineering/building-airbnb-categories-with-ml-and-human-in-the-loop-e97988e70ebb


Sponsored: It’s Time for the Headless CDP

In this piece, RudderStack CEO Soumyadeb Mitra makes the case for a new approach to the customer data platform: the headless CDP. He defines the headless CDP as a tool with an open architecture, purpose-built for data and engineering teams, that makes it easy to collect customer data from every source, build your customer 360 in your own warehouse, and then make that data available to your entire stack.

https://www.rudderstack.com/blog/it-s-time-for-the-headless-cdp/


Becket Qin: Apache Flink SQL - Past, Present, and Future

Flink SQL has made significant progress in unifying batch and real-time computation. The blog captures the history of Flink SQL, its current state, and the challenges ahead. Stream-stream joins are still expensive to operate; I’m excited to see how Flink SQL progresses and how it can simplify operating streaming infrastructure.

https://www.ververica.com/blog/apache-flink-sql-past-present-and-future
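
Why is a stream-stream join expensive? A minimal Python sketch of a symmetric hash join shows the core issue: because a matching row for either input may still arrive later, the operator has to buffer rows from both sides in state. This is a simplified illustration, not Flink's implementation; the `SymmetricHashJoin` class and its `expire` method are hypothetical names, and the time-based expiry only loosely mirrors how time-bounded (interval) joins keep state from growing forever.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Inner join of two unbounded streams on a key (simplified sketch)."""

    def __init__(self) -> None:
        # side -> key -> list of (timestamp, value) rows buffered so far
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}

    def process(self, side: str, key: str, ts: int, value: str) -> list[tuple[str, str, str]]:
        """Buffer the incoming row, then join it with everything seen on the other side."""
        other = "right" if side == "left" else "left"
        self.state[side][key].append((ts, value))
        # .get() avoids creating empty entries in the other side's defaultdict.
        matches = self.state[other].get(key, [])
        if side == "left":
            return [(key, value, v) for _, v in matches]
        return [(key, v, value) for _, v in matches]

    def expire(self, watermark: int, retention: int) -> None:
        """Drop buffered rows older than watermark - retention to bound state size."""
        cutoff = watermark - retention
        for side_state in self.state.values():
            for key in list(side_state):
                side_state[key] = [(t, v) for t, v in side_state[key] if t >= cutoff]
                if not side_state[key]:
                    del side_state[key]

    def buffered_rows(self) -> int:
        return sum(len(rows) for s in self.state.values() for rows in s.values())
```

Without a call like `expire`, `buffered_rows()` only ever grows, which is the operational cost the post refers to: state size tracks the full history of both streams unless the join is bounded in time.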


LINE Engineering: A story of introducing data lineage into LINE's large-scale data platform

I thought Apache Atlas was largely forgotten at this point, but LINE writes an exciting blog about using Apache Atlas for data lineage. An overly busy lineage visualization can also confuse users, and it is great to see the LINE data team highlight that edge case and how they solved it.

https://engineering.linecorp.com/en/blog/data-lineage-on-line-big-data-platform


AutoTrader: Real-Time Personalisation of Search Results with Auto Trader's Customer Data Platform

Feature Snippets are a vital technique to elevate the search & discovery experience for the users. AutoTrader writes about the system design of its customer segmentation to drive the Feature Snippet in its search experience.

https://engineering.autotrader.co.uk/2022/11/23/enabling-real-time-personilsation-with-our-in-house-customer-data-platfom.html


Adrian Bednarz: DBT repository — to split or not to split?

Should an organization keep its dbt project in a monorepo or split it into multiple repos? Build systems like Bazel and Pants encourage monorepos, but that comes with operational and implementation costs. The author explains how dbt packages help minimize code duplication and enable a multi-repo pattern.

https://techwithadrian.medium.com/dbt-repository-to-split-or-not-to-split-909d366d0998


Nic Crane: Type inference in readr and arrow

Every engineer has their own horror stories about working with CSV files. We could write any number of blogs on why you don’t want to use CSV files, but CSV is widely used in data science as a simple, human-readable format that is widely known and understood. That simplicity is also its drawback, one example being the lack of a type system. The author walks through how readr and Apache Arrow infer types while reading a CSV file.

https://thisisnic.github.io/2022/11/21/type-inference-in-readr-and-arrow/
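
As a rough illustration of what type inference involves, the sketch below guesses each column's type by trying progressively looser parsers (bool, then int, then float, falling back to string) over a sample of rows. This is a simplified, hypothetical heuristic, not the actual algorithm used by readr or Arrow; the `guess_rows` parameter is an assumption that only captures the general idea of guessing types from a sample.

```python
import csv
import io
from itertools import islice
from typing import Callable


def _parses(parser: Callable[[str], object], value: str) -> bool:
    """Return True if the parser accepts the string without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values: list[str]) -> str:
    """Guess a column type from its raw string values (empty strings = missing)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "null"
    if all(v.lower() in {"true", "false"} for v in non_empty):
        return "bool"
    if all(_parses(int, v) for v in non_empty):
        return "int"
    if all(_parses(float, v) for v in non_empty):
        return "float"
    return "string"


def infer_schema(text: str, guess_rows: int = 1000) -> dict[str, str]:
    """Infer a type per column using only the first `guess_rows` data rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(islice(reader, guess_rows))
    return {col: infer_type([row[col] for row in rows]) for col in reader.fieldnames or []}
```

Sampling is where CSV inference bites: with `infer_schema("code\n1\n2\nA7\n", guess_rows=2)`, the `code` column is typed as `int` because the sampled rows never contain `A7`, while scanning every row types it as `string`.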


All rights reserved ProtoGrowth Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.

© 2023 Ananth Packkildurai