Data Engineering Weekly

Data Engineering Weekly #77

Weekly Data Engineering Newsletter

Ananth Packkildurai
Mar 7, 2022

Data Council - Austin 2022

Data Council has published the Austin 2022 schedule at the link below. Data Engineering Weekly readers can get a 20% discount with the promo code DataWeekly20.

https://www.datacouncil.ai/austin


Datanami: Harvard’s New Data Storage Is to Dye For, Avoids DNA Storage Pitfalls

The explosion in data collection has made it challenging to store enormous amounts of data, particularly archival data. Harvard researchers introduce a new container for long-term storage: dye!

https://www.datanami.com/2022/02/14/harvards-new-data-storage-is-to-dye-for-avoids-dna-storage-pitfalls/


LinkedIn: Near real-time features for near real-time personalization

Establishing a faster feedback loop is vital in developing a recommendation engine. LinkedIn writes about using Samza SQL and Apache Pinot to build near real-time personalization.

https://engineering.linkedin.com/blog/2022/near-real-time-features-for-near-real-time-personalization


eBay: Building a Deep Learning-Based Retrieval System for Personalized Recommendations

On a similar note to the LinkedIn post above, eBay writes about the maturity phases of its recommendation engine. The blog narrates how eBay's architecture evolved from batch-only, through batch plus near real-time, to a near-real-time (NRT) system.

https://tech.ebayinc.com/engineering/building-a-deep-learning-based-retrieval-system-for-personalized-recommendations/


Spotify: Search Journey Towards Better Experimentation Practices

Spotify writes about the adoption of experimentation in its search product. Every product goes through a technology adoption lifecycle (the chasm theory), yet we rarely talk about it. Spotify narrates the adoption curve and the importance of starting and maintaining momentum in adoption.

https://engineering.atspotify.com/2022/02/search-journey-towards-better-experimentation-practices/

Spotify’s New Experimentation Platform (Part 1)

Spotify’s New Experimentation Platform (Part 2)


Future: Why SQL Needs Software Libraries

In the last two decades, the industry has tried to invent an alternative to SQL with no success. The lack of software libraries and the limited ways to distribute SQL are two of its significant shortcomings. It is an exciting conversation on SQL, software libraries, and dbt.

https://future.a16z.com/sql-needs-software-libraries/


Aliaksei Mikhailiuk: Nine Tools I Wish I Mastered before My Ph.D. in Machine Learning

A good collection of tools to learn before starting to work on machine learning and AI at industrial scale.

Question to the readers: What are the top nine tools you wish you had learned before entering data or analytics engineering? Please tweet back to @data_weekly.

https://towardsdatascience.com/nine-tools-i-wish-i-mastered-before-my-phd-in-machine-learning-708c6dcb2fb0


Sponsored: From First-Touch to Multi-Touch Attribution With RudderStack, Dbt, and SageMaker

Here, RudderStack provides a detailed overview of the architecture, data, and modeling required to assess the contribution to conversion in multi-touch customer journeys.

https://www.rudderstack.com/blog/from-first-touch-to-multi-touch-attribution-with-rudderstack-dbt-and-sagemaker
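
To make the difference between first-touch and multi-touch attribution concrete, here is a minimal sketch. It is not taken from the RudderStack post: the linear (equal-split) model, the function names, and the sample journey are all assumptions chosen for illustration.

```python
from collections import defaultdict


def first_touch(journey, value):
    """Give all conversion credit to the first channel in the journey."""
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")
    return {journey[0]: value}


def linear_multi_touch(journey, value):
    """Split conversion credit equally across every touchpoint.

    Assumption: a linear model. Other multi-touch models weight
    touchpoints differently.
    """
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")
    share = value / len(journey)
    credit = defaultdict(float)
    for channel in journey:
        credit[channel] += share  # repeated channels accumulate credit
    return dict(credit)


# Hypothetical customer journey that ended in a 100.0 conversion.
journey = ["paid_search", "email", "webinar", "email"]
first = first_touch(journey, 100.0)
linear = linear_multi_touch(journey, 100.0)
```

With this journey, the first-touch model credits only `paid_search`, while the linear model gives `email` twice the credit of the other channels because it appears twice.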


Mihail Eric: MLOps Is a Mess, But That's to be Expected

The ML and data landscape is fragmented, with each tool trying to solve a niche problem. The author narrates the current state of MLOps and makes a few predictions. One of them is increasing consolidation around end-to-end platforms, similar to the bundling and unbundling conversation in the data landscape.

https://www.mihaileric.com/posts/mlops-is-a-mess/


James Le: What I Learned From Attending Tecton apply(meetup) 2022

The author shares notes from the Tecton conference. I didn't have a chance to go through the full notes, but there is a lot to learn from them. Thanks, James, for sharing your notes.

https://data-notes.co/what-i-learned-from-attending-tecton-apply-meetup-2022-4b7be87e2f17


Monte Carlo: Building End-to-End Field Level Lineage for Modern Data Systems

Data lineage is a critical connector to establish end-to-end observability and explainability of the analytical pipeline. Monte Carlo writes about the importance of the column-level lineage of the SQL pipeline and the design journey to establish observability.

There is a lot of talk and effort around "Explainable AI," but have we achieved "Explainable Analytics"? Is there any tool a salesperson can use to understand the business logic of an ARR computation without understanding SQL? If you have found one, please tweet @data_weekly.

https://www.infoq.com/articles/field-level-lineage-modern-data-systems/
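
As a rough illustration of what column-level lineage captures, the sketch below stores lineage as a mapping from each derived column to the columns it reads, then walks that mapping upstream. It is not Monte Carlo's design: the table and column names are hypothetical, and a real system would derive the mapping by parsing SQL rather than writing it by hand.

```python
# Hypothetical lineage: derived column -> columns it is computed from.
LINEAGE = {
    "reports.arr.arr_usd": ["finance.subscriptions.mrr_usd"],
    "finance.subscriptions.mrr_usd": [
        "billing.invoices.amount",
        "billing.invoices.fx_rate",
    ],
}


def upstream_columns(column, lineage):
    """Return every column that `column` depends on, directly or transitively."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


sources = upstream_columns("reports.arr.arr_usd", LINEAGE)
```

Here `sources` contains the intermediate MRR column and both invoice columns, which is the kind of answer needed to trace an ARR figure back to its raw inputs.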


Expedia: Handling Incompatible Schema Changes with Avro

A backward-incompatible schema change is painful, and I still remember fixing an incompatible Thrift change to make sure future backfills did not break. Expedia writes an exciting blog that narrates how to handle incompatible Avro schema changes.

https://medium.com/expedia-group-tech/handling-incompatible-schema-changes-with-avro-2bc147e26770
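
For readers new to the problem, one common source of incompatibility is a new field added without a default: a reader on the new schema cannot fill that field when it reads old data. The sketch below checks only that one rule and is not Expedia's approach. It assumes flat record schemas, ignores type changes, type promotion, aliases, and unions, and uses hypothetical schemas and function names.

```python
def missing_defaults(reader_schema, writer_schema):
    """List reader fields that data written with writer_schema cannot populate.

    A field is a problem when the writer schema lacks it and the reader
    schema declares no default for it.
    """
    writer_fields = {field["name"] for field in writer_schema["fields"]}
    return [
        field["name"]
        for field in reader_schema["fields"]
        if field["name"] not in writer_fields and "default" not in field
    ]


def is_backward_compatible(reader_schema, writer_schema):
    """Simplified check: can the new reader schema read old data?"""
    return not missing_defaults(reader_schema, writer_schema)


# Hypothetical schemas for illustration.
v1 = {"type": "record", "name": "Booking",
      "fields": [{"name": "id", "type": "string"}]}
v2_with_default = {"type": "record", "name": "Booking",
                   "fields": [{"name": "id", "type": "string"},
                              {"name": "currency", "type": "string",
                               "default": "USD"}]}
v2_without_default = {"type": "record", "name": "Booking",
                      "fields": [{"name": "id", "type": "string"},
                                 {"name": "currency", "type": "string"}]}
```

Under this simplified rule, `v2_with_default` can read `v1` data, while `v2_without_default` cannot because `currency` has no default.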


PyCaret: Multiple Time Series Forecasting with PyCaret

TIL about PyCaret, an open-source, low-code machine learning library and end-to-end model management tool built in Python for automating machine learning workflows.

https://www.kdnuggets.com/2021/04/multiple-time-series-forecasting-pycaret.html


Sponsored: RudderStack - The Data Stack Show Live: Is Reverse ETL Just Another Data Pipeline?

You’ve heard about Reverse ETL. Here’s your chance to learn all about the tooling from the folks who are creating it. Join hosts Eric and Kostas for a live recording of The Data Stack Show on March 9th to get insights from experts at Census, Hightouch, and Workato.

https://datastackshow.com/livestream-registration-reverse-etl/


Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
