Data Engineering Weekly

Data Engineering Weekly #104

The Weekly Data Engineering Newsletter

Ananth Packkildurai
Oct 23, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Editor’s Note: DEW Is the Readers’ Choice & Are Data Catalogs Living Up to the Hype?

Hello Data Friends, welcome to another edition of Data Engineering Weekly. First, I’m thrilled to see this poll from Airbyte: 48.5% of the data engineers who responded say they read Data Engineering Weekly to keep up with the data engineering landscape. Thank you all for your kind support. ❤️❤️❤️

Ananth Packkildurai @ananthdurai
Super thrilled to see that 48.5% of my dear data friends say that they use @data_weekly to keep up with the data engineering landscape ❤️❤️❤️
8:04 PM ∙ Oct 18, 2022

Top of mind for me this week is the Data Catalog. I was one of the early advocates for Data Catalogs and am excited about their possibilities. Data Engineering Weekly even published a special Metadata Edition focusing on the historical development of the Data Catalog.

https://www.dataengineeringweekly.com/p/data-engineering-weekly-21-metadata

It has been almost two years since we published the Metadata Edition, and I keep coming back to one question: are Data Catalogs living up to their promise? So I published an open poll on LinkedIn to find out.

https://www.linkedin.com/posts/ananthdurai_dataengineering-datacatalog-activity-6989772780862873600-YFmA/

We will talk more about Data Catalogs in the coming weeks. Meanwhile, share your thoughts on Data Catalogs in the poll and the comments section. Oh, and a humble request to Data Catalog vendors: please abstain from the poll ❤️


Rittman Analytics: The dbt Semantic Layer, Data Orchestration, and the Modern Enterprise Data Stack

Last week was eventful for dbt with the Coalesce conference. I missed attending in person this time but caught some of the tech talks via live stream. My reaction to the conference:

Ananth Packkildurai @ananthdurai
@pdrmnvd I wish there were more dbt internals tech talks :-(, but some great case studies by practitioners compensated for it.
4:36 PM ∙ Oct 22, 2022
Ananth Packkildurai @ananthdurai
@pdrmnvd Top ones for me are, Json Schema from @aerialfly & @emilyhawkins__ data indigestion from @Spotify, @HubSpot design as a daily activity, outgrowing dbt run by @voxdotcom & is Kimball still relevant from @JayPeeDevlin
11:29 PM ∙ Oct 22, 2022

You can watch recordings of all the talks here.

The conference brought two significant announcements:

  1. Python language support in dbt Core

  2. A public preview of the dbt Semantic Layer

The author gives an in-depth view of the dbt Semantic Layer.

https://blog.rittmananalytics.com/the-dbt-semantic-layer-data-orchestration-and-the-modern-enterprise-data-stack-78d9d9ed5c18


Ben Rogojan: The Next Generation Of All-In-One Data Stacks

We have debated bundling vs. unbundling at length. Are all-in-one data stacks the future? Ben’s article is timely, as dbt unveils the Semantic Layer to serve as the hub of the analytical ecosystem. The author compares five available all-in-one data platforms and discusses their pros and cons.

https://medium.com/coriers/the-next-generation-of-all-in-one-data-stacks-f46069ad10fd


[LAST CALL] There's still time to RSVP for IMPACT 2022: The Annual Data Observability Summit on October 25-26, 2022!

Don't miss a chance to get candid with your data peers on the hottest topics in data, learn about 2023 trends, and hear from the biggest names in data and analytics about the ideas and technologies pioneering our industry. Featuring founders and data leaders from dbt Labs, Fivetran, The New York Times, GitLab, and Fox Corporation.

Data Engineering Weekly readers, get your free ticket!


Criteo: Highlights of RecSys 2022

RecSys is a leading conference focusing on industrial recommender system implementation. Criteo published its key takeaways from RecSys 2022. TIL about AI-Mediated Communication and its impact.

https://medium.com/criteo-engineering/highlights-of-recsys-2022-c136a9b6fbd0


Netflix: Orchestrating Data/ML Workflows at Scale With Netflix Maestro

Netflix writes about Maestro, its workflow orchestrator that can schedule and manage workflows at a massive scale. The post is a fantastic system design read on how to build a scalable orchestration engine. Maestro is one of the very few systems I wish Netflix would open-source soon.

https://netflixtechblog.com/orchestrating-data-ml-workflows-at-scale-with-netflix-maestro-aaa2b41b800c
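Maestro's internals go far beyond this, but the core job of any workflow orchestrator is running steps only after their upstream steps finish. A minimal sketch, assuming a workflow is a plain dict mapping each step to the set of steps it depends on, built on Python's standard `graphlib`; the step names are hypothetical and this is not how Maestro is implemented.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(deps: Dict[str, Set[str]],
                 actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each step once all of its upstream steps have finished.

    deps maps a step name to the set of steps it depends on.
    actions maps a step name to the callable that does the work
    (steps without an action are treated as no-ops).
    Returns the order in which steps were executed.
    Raises graphlib.CycleError if the workflow is not a DAG.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # validates the graph and detects cycles up front
    executed: List[str] = []
    while sorter.is_active():
        # get_ready() yields every step whose dependencies are all done;
        # a real orchestrator would dispatch these to workers in parallel.
        for step in sorter.get_ready():
            actions.get(step, lambda: None)()
            executed.append(step)
            sorter.done(step)
    return executed


# Hypothetical ML workflow: ingest -> (features, labels) -> train -> publish
workflow = {
    "features": {"ingest"},
    "labels": {"ingest"},
    "train": {"features", "labels"},
    "publish": {"train"},
}
order = run_workflow(workflow, {})
print(order)
```

The order of `features` and `labels` is not guaranteed, since neither depends on the other; that independence is exactly what lets an orchestrator fan them out in parallel.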


Sponsored: Soda - 🗣 Podcast: How To Build A Common Understanding Of Your Data Reliability Rules

Regardless of how data is used, it is critical that the information can be trusted. The practice of data reliability engineering has gained momentum recently to address that need. Soda Checks Language supports the efforts of data teams, and the corresponding Soda Core utility acts on this new DSL. In this episode of the Data Engineering Podcast with Tobias Macey, Tom Baeyens explains their reasons for creating a new syntax for expressing and validating checks for data assets and processes, as well as how to incorporate it into your own projects.


Checkout.com: Testing & Monitoring the Data Platform at Scale

Data testing and data observability are vital to maintaining data quality in complex data pipelines. Checkout.com writes about how it uses dbt tests, Monte Carlo, and Datadog to test and monitor its data pipelines.

https://medium.com/checkout-com-techblog/testing-monitoring-the-data-platform-at-scale-e22d9cf433e8
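dbt's built-in generic tests (`unique`, `not_null`, `accepted_values`) each compile to a SQL query that selects failing rows, and a test passes when that query returns nothing. A minimal Python sketch of the same semantics, assuming rows are plain dicts; the `orders` data and column names are hypothetical, and this mimics the idea rather than dbt's implementation.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def not_null(rows: List[Row], column: str) -> List[Row]:
    # Failing rows: the column is missing or None.
    return [r for r in rows if r.get(column) is None]


def unique(rows: List[Row], column: str) -> List[Any]:
    # Failing values: non-null values that appear more than once.
    counts = Counter(r.get(column) for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


def accepted_values(rows: List[Row], column: str,
                    values: Iterable[Any]) -> List[Row]:
    # Failing rows: non-null values outside the allowed set.
    allowed = set(values)
    return [r for r in rows
            if r.get(column) is not None and r.get(column) not in allowed]


# Hypothetical orders table with one null key, one duplicate key,
# and one unexpected status.
orders = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 2, "status": "pending"},
    {"order_id": 2, "status": "lost"},
    {"order_id": None, "status": "shipped"},
]

results = {
    "not_null(order_id)": not_null(orders, "order_id"),
    "unique(order_id)": unique(orders, "order_id"),
    "accepted_values(status)": accepted_values(orders, "status", ["shipped", "pending"]),
}
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} failing)"
    print(f"{name}: {status}")
```

Returning the failing rows instead of a boolean mirrors dbt's design choice: when a test fails, you can inspect exactly which records broke the contract.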


Kumu: ML Engineering at Kumu - Turning Models into Products

Kumu writes about its ML platform journey across the end-to-end machine learning lifecycle. ML platforms are overwhelmingly complex, and the Kumu team suggests focusing on three basics to scale:

  1. Code Maintainability

  2. Automated Tests and Deployment

  3. Deployment Governance

https://medium.com/@karlitodata/ml-engineering-at-kumu-turning-models-into-products-b2b4faeb2b40


Sponsored - RudderStack: How Shippit Achieved a Unified View of Customers with Snowflake and RudderStack

Join us live on October 25th for a free deep-dive webinar featuring Nitt Chuenprateep, Business Systems and Data Manager at Shippit. Learn from the experts as they share the secrets of Shippit’s success and how the team used Snowflake and RudderStack to become warehouse-first.
Don’t miss this opportunity to gain expert advice on how to build your ideal data stack and achieve a unified view of your customers.
https://www.rudderstack.com/events/how-shippit-achieved-a-unified-view-of-customers-with-snowflake-and-rudderstack/


Shopify: How to Structure Your Data Team for Maximum Influence

We often joke that the data team is the backend of the backend. Gaining visibility in an organization is the first significant challenge any data team faces in fostering a data-driven culture. Shopify’s blog post explains how to structure the data team to maximize its influence in an organization.

https://shopifyengineering.myshopify.com/blogs/engineering/how-to-structure-data-teams


Analytics @ Meta: Analytics and Product-Market Fit

When developing new products, the big question we seek to answer is, “Does this product have product-market fit?” Analytics plays a central role in addressing this question. The Meta team writes about how to approach PMF (Product-Market Fit) through analytics.

https://medium.com/@AnalyticsAtMeta/analytics-and-product-market-fit-11efaea403cd


Dan Frank: Experimentation Platform in a Day

Experimentation plays a vital role in analytics engineering. Should you buy expensive software to run experiments? Is it complex to build an in-house experimentation platform? The author describes a simple hack to start experimenting without waiting to build a platform or buy experimentation software.

https://medium.com/deliberate-data-science/experimentation-platform-in-a-day-c60646ef1a2
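Whatever tooling you pick, a minimal experiment needs two pieces: a stable way to assign users to variants and a way to test whether the difference in outcomes is real. A sketch assuming deterministic hash-based assignment and a two-proportion z-test on conversion counts; the experiment name, user IDs, and counts are hypothetical, and this is not necessarily the author's method.

```python
import hashlib
import math
from typing import Sequence, Tuple


def assign_variant(user_id: str, experiment: str,
                   variants: Sequence[str] = ("control", "treatment")) -> str:
    # Hash the experiment name together with the user ID so that assignment
    # is deterministic per user but independent across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


print(assign_variant("user-42", "new_checkout"))
z, p = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z={z:.2f}, p={p:.4f}")
```

Because assignment is a pure function of the IDs, it needs no lookup table: any service, or a SQL query using an equivalent hash, can recompute a user's variant when joining exposure to outcome data.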


Data Engineering in 2022: Exploring dbt with DuckDB

DuckDB, an in-process database management system, has gained a lot of traction recently. Selectively mixing DuckDB into Pandas workloads can significantly improve join performance. The author writes a step-by-step guide to using dbt with DuckDB.

https://rmoff.net/2022/10/20/data-engineering-in-2022-exploring-dbt-with-duckdb/


Prabhuk Karthi STB: 10 Key Takeaways From Google Cloud Next22

Bye-bye, Google Data Studio! Google is going all in on Looker as the default BI layer for BigQuery, rebranding Data Studio as Looker Studio. Google also announced support for unstructured data analytics and BigQuery ML pipeline integration with Vertex AI.

https://medium.com/google-cloud/10-key-take-aways-from-google-cloud-next22-d5def84a3cf4


Fast.ai: First Two Lessons of From Deep Learning Foundations to Stable Diffusion

Fast.ai published course content for From Deep Learning Foundations to Stable Diffusion. Since its introduction, Stable Diffusion has gained a lot of momentum, with uses extending all the way to rendering Cinema 4D scenes natively. I’m looking forward to taking this course.

https://www.fast.ai/posts/part2-2022-preview.html


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
