Data Engineering Weekly

Data Engineering Weekly #86

The Weekly Data Engineering Newsletter

Ananth Packkildurai
May 16, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Pedram Navid: We need to talk about dbt

It's been a busy week in dbt land. Pedram Navid's blog post details the lack of clarity and transparency around the dbt roadmap. It sparked a lively conversation on data Twitter that led to the follow-up post "The response you deserve!". It's a healthy sign of robust, community-driven system design in progress.

https://pedram.substack.com/p/we-need-to-talk-about-dbt


Anaconda: Welcome to the World PyScript

I know I'm probably late to talk about it, but "Python in the browser" is an exciting development. "My First Impression Trying Python on Browser" is an excellent follow-up read on trying out PyScript.

https://engineering.anaconda.com/2022/04/welcome-pyscript.html


Sarah Krasnik: Choosing a Data Catalog

A data catalog is essential for a collaborative analytical solution, but how do you choose one? Sarah writes about the spectrum of available data catalog solutions and their pros & cons.

https://sarahsnewsletter.substack.com/p/choosing-a-data-catalog?s=r


Twitter: Understanding Twitter conversations - A Wordle case study

Twitter shares an exciting blog post on understanding Twitter conversations, using Wordle as a case study. It's a good reference article on telling stories with data analytics.

https://blog.twitter.com/engineering/en_us/topics/insights/2022/understanding-twitter-conversations--a-wordle-case-study


Sponsored: Firebolt - How Vimeo Keeps Data Intact with 85 Billion Events Per Month

Lior Solomon, VP of Data Engineering at Vimeo, shares his own experience on The Data Engineering Show: What made him recently build a new data ops team? How do you operate a data stack that supports 85 billion events per month and 2 PB of data? What does Fatal Attraction have to do with all of this?

https://www.firebolt.io/blog/how-vimeo-keeps-data-intact-with-85b-events-per-month


DoorDash: How We Applied Client-Side Caching to Improve Feature Store Performance by 70%

Strict latency requirements bring unique challenges to adopting prediction services. DoorDash writes about applying client-side caching to improve its feature store performance by 70%.

https://doordash.engineering/2022/05/03/how-we-applied-client-side-caching/
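
To make the idea concrete, here is a minimal sketch of client-side caching in front of a feature store lookup. It is not DoorDash's implementation: the `TTLCache` class, the `fetch` callable, and the time-to-live policy are all assumptions chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical client-side cache that sits in front of a remote lookup."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stand-in for the remote feature store call
        self._ttl = ttl_seconds      # how long a cached value is considered fresh
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            # Fresh entry: serve locally and skip the network round trip.
            self.hits += 1
            return entry[1]
        # Missing or expired entry: fetch remotely and remember the result.
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


def fake_feature_store(key: Hashable) -> dict:
    """Assumed placeholder for a real feature store client."""
    return {"entity": key, "avg_order_value": 23.5}


cache = TTLCache(fake_feature_store, ttl_seconds=30.0)
cache.get("consumer:42")  # miss, goes to the store
cache.get("consumer:42")  # hit, served from local memory
```

The trade-off in a sketch like this is freshness versus latency: a longer time-to-live saves more remote calls but serves staler feature values.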


Snowflake: Data Vault Techniques on Snowflake - Immutable Store, Virtual End Dates

Data modeling is often an undervalued process. It is challenging because much of it depends on individual experience and opinion, but following standard techniques like Data Vault can bridge the gap. Snowflake writes an exciting blog on Data Vault techniques in Snowflake.

https://www.snowflake.com/blog/data-vault-technique-immutable-storage/


Jesse Paquette: What Is Well-Modeled Data for Analysis?

Staying with data modeling, why should one care about it? The author narrates the various aspects of a well-defined data model.

https://towardsdatascience.com/what-is-well-modeled-data-for-analysis-28f73146bf96


Sponsored: RudderStack - A Practical Guide to The Modern Data Stack: The Data Maturity Journey

Data maturity is rapidly becoming a matter of survival, but the modern data stack can be overwhelming. Here, RudderStack provides a helpful framework that places the tools of the modern stack in the context of a 4-stage journey to help you build the right stack at every stage.

https://www.rudderstack.com/blog/a-practical-guide-to-the-modern-data-stack-the-data-maturity-journey


Intuit: Data X-ray - Automated Data Quality Analysis Tool Streamlines Feature Selection Process for Machine Learning

Intuit writes about the challenge of having too many candidate features in the feature selection process and how its Data X-ray data quality tool helps. The blog narrates Data X-ray's three automated analyses: feature attribute analysis, feature selection analysis, and feature pruning analysis.

https://medium.com/intuit-engineering/data-x-ray-automated-data-quality-analysis-tool-streamlines-feature-selection-process-for-machine-9c4a93e76cb6


Whatnot: Tuning Whatnot’s Data Platform for Speed and Scale

Whatnot writes about tuning its data platform for speed and scale, focusing on three founding principles.

  1. Build Modules, Not Monoliths

  2. Domains Own Their Data

  3. Automate Platform Processes

https://medium.com/whatnot-engineering/tuning-whatnots-data-platform-for-speed-and-scale-28d2b5993b42


Sponsored: Monte Carlo Data - The Modern Data Leader’s Playbook

Learn how today’s best data engineering and analytics leaders are staying ahead of the competition in our complete guide.

Download the modern data leader’s playbook


Pinterest: Manas HNSW Streaming Filters

Pinterest writes about HNSW (Hierarchical Navigable Small World graphs) streaming filters on top of its in-house search engine Manas. The streaming filtering abstracts away implementation details of how filtering is executed and relieves the client from the burden of over-fetch tuning.

https://medium.com/pinterest-engineering/manas-hnsw-streaming-filters-351adf9ac1c4
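
A toy sketch of why streaming filtering removes the over-fetch knob is below. It is not Manas or a real HNSW index: a sorted list stands in for the graph traversal, and the function names, the `overfetch` factor, and the sample data are assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Iterator, List, Sequence, Tuple

Item = Tuple[str, float]  # (id, distance to the query)


def nearest_stream(items: Sequence[Item]) -> Iterator[Item]:
    """Yield items from nearest to farthest (stand-in for an ANN traversal)."""
    yield from sorted(items, key=lambda item: item[1])


def overfetch_then_filter(
    items: Sequence[Item], k: int, overfetch: int, keep: Callable[[Item], bool]
) -> List[Item]:
    """Post-filtering: fetch k * overfetch candidates, then filter.

    The client has to guess `overfetch`; a restrictive filter can leave
    fewer than k results even though matching items exist.
    """
    fetched = list(islice(nearest_stream(items), k * overfetch))
    return [item for item in fetched if keep(item)][:k]


def streaming_filter(
    items: Sequence[Item], k: int, keep: Callable[[Item], bool]
) -> List[Item]:
    """Streaming filtering: keep pulling candidates until k pass the filter."""
    passing = (item for item in nearest_stream(items) if keep(item))
    return list(islice(passing, k))


# Ten hypothetical items; the filter only accepts the four farthest ones.
pins = [(f"pin{i}", float(i)) for i in range(10)]


def allowed(item: Item) -> bool:
    return item[1] >= 6.0


post_filtered = overfetch_then_filter(pins, k=3, overfetch=2, keep=allowed)
streamed = streaming_filter(pins, k=3, keep=allowed)
```

With this data the post-filtering call comes back short because its fixed fetch budget runs out before any item passes, while the streaming version keeps going until it has three results.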


Back Market Tech: Data for Product Managers

How does data analytics translate into the day-to-day work of a product manager? The article is an excellent read for product managers.

https://engineering.backmarket.com/data-for-product-managers-part-1-2-fd2967333c00


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
