Data Engineering Weekly

Data Engineering Weekly #88

www.dataengineeringweekly.com

The Weekly Data Engineering Newsletter

Ananth Packkildurai
May 29, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Ternary Data: Data Contracts & Domain Ownership w/ Ananth Packkildurai

Last week I sat down with Joe Reis & Matt Housley to talk about data contracts and domain ownership. I talked about Schemata, the first open-source “data contract” framework. I believe data contracts & data sharing are the next big wave in data engineering. You can find more details about Schemata at schemata.app.


GoCardless: Data Contracts at GoCardless — 6 Months On

Staying with data contracts: where Schemata.app addresses the semantic layer of data contracts, GoCardless writes about data transportation using the Outbox Pattern.

https://medium.com/gocardless-tech/data-contracts-at-gocardless-6-months-on-bbf24a37206e

Debezium also has an excellent blog post on implementing CDC with the Outbox pattern.

https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/
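
For readers new to the Outbox pattern, here is a minimal sketch of the core idea: the business row and the event row are written in the same database transaction, so a CDC tool or relay can later publish the event without the two ever disagreeing. This is not GoCardless's or Debezium's implementation. The `payments` and `outbox` tables, their columns, and the helper functions are hypothetical names chosen for illustration, and SQLite stands in for a production database.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical tables: one business table and one outbox table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        "id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "aggregate_id INTEGER NOT NULL, "
        "event_type TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )


def record_payment(conn, payment_id, amount_cents):
    # `with conn` commits both inserts together, or rolls both back on error,
    # so the state change and its event cannot diverge.
    with conn:
        conn.execute(
            "INSERT INTO payments (id, amount_cents) VALUES (?, ?)",
            (payment_id, amount_cents),
        )
        payload = json.dumps({"id": payment_id, "amount_cents": amount_cents})
        conn.execute(
            "INSERT INTO outbox (aggregate_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (payment_id, "payment_created", payload),
        )


def read_outbox(conn):
    # Stand-in for the relay or CDC reader: returns events in insertion order.
    rows = conn.execute(
        "SELECT event_type, payload FROM outbox ORDER BY id"
    ).fetchall()
    return [(event_type, json.loads(payload)) for event_type, payload in rows]
```

In a real deployment the `read_outbox` step would typically be replaced by a log-based CDC connector reading the outbox table, which is the setup the Debezium post describes.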


Shopify: Lessons Learned From Running Apache Airflow at Scale

Shopify writes about lessons learned from running Airflow at scale. Shopify runs 10,000 DAGs, with an average of 400 concurrent tasks at any given point and 150,000 DAG runs per day! The manifest file approach is exciting and reminds me of Slack's "SlackDAG," an implementation of the Airflow DAG that helped solve ownership problems and enforce best practices as part of the CI/CD process.

https://shopifyengineering.myshopify.com/blogs/engineering/lessons-learned-apache-airflow-scale


Sponsored: Firebolt - Embedded Analytics vs. Data Apps

Data Apps is still a loosely defined term, and there’s a lot of debate and confusion about what it really means and how it differs from traditional dashboarding and embedded analytics. Boaz Farkash shares his point of view on the subject.

https://www.firebolt.io/blog/embedded-analytics-vs-data-apps


Vimeo: The evolution of event data collection at Vimeo, part 1: the Fatal Attraction era

Defining well-regulated event fields at the source significantly reduces the burden on the ETL system. Vimeo writes about its journey of building a scalable event tracking library. The blog narrates the pros & cons of “attribute-based” tracking vs. “user-action-based” tracking.

https://medium.com/vimeo-engineering-blog/the-evolution-of-event-data-collection-at-vimeo-part-1-the-fatal-attraction-era-9eae5f67e1bc


Expedia: Software Architectural Patterns in Data Engineering

Data engineering has come a long way, from click & drop tools to robust frameworks for programmatically authoring data pipelines. That shift opens the door to adopting software engineering best practices. Expedia compares data engineering tools and frameworks with software architectural patterns.

https://medium.com/expedia-group-tech/software-architectural-patterns-in-data-engineering-5d3bf22106a0


Sponsored: Monte Carlo Data - The Modern Data Leader’s Playbook

Learn how today’s best data engineering and analytics leaders are staying ahead of the competition in our complete guide.

Download the modern data leader’s playbook


DoorDash: Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash

An efficient experimentation framework is vital for safe and fast product iteration. DoorDash writes about Dash-AB, a centralized library for statistical analysis.

https://doordash.engineering/2022/05/24/meet-dash-ab-the-statistics-engine-of-experimentation-at-doordash/


Netflix: A Survey of Causal Inference Applications at Netflix

Staying with the importance of a culture of experimentation, Netflix held an internal Causal Inference and Experimentation Summit. I thought this was an amazing idea that could apply to other parts of critical infrastructure. Netflix shares a sneak peek of the event in this blog post with a few selected talks.

https://netflixtechblog.com/a-survey-of-causal-inference-applications-at-netflix-b62d25175e6f


Grubhub: Forecasting Grubhub Order Volume At Scale

Real-time demand forecasting in a supply chain system is always challenging. Grubhub writes about its demand forecasting data infrastructure and design principles.

https://bytes.grubhub.com/forecasting-grubhub-order-volume-at-scale-a966c2f901d2


Sponsored: RudderStack - Fireside Chat: The Future of Analytics on the Modern Data Stack

Join RudderStack for a fireside chat on the future of analytics with Hex co-founder, Barry McCardel, and Transform co-founder, Nick Handel. They'll talk about bridging the gap between data and business functions, discuss the current state of analytics, and examine the next challenges to be tackled in data analytics.

https://www.rudderstack.com/video-library/future-of-analytics-on-the-modern-data-stack


MoMoTechnologies: MLOps at MoMo: Feature Store

The feature store has become an essential part of data infrastructure. MoMo writes about the need for a feature store and its evaluation of open-source feature stores. The blog narrates how MoMo developed its ML workflow, ingestion, and data quality management.

https://tech.info-momo.com/mlops-at-momo-feature-store-e38e59da272e


Erick Reyes: This is how I onboarded more than 10 Data Engineers and got excellent reviews and feedback

A well-thought-through onboarding process boosts developer productivity and establishes an inclusive engineering culture. The author shares thoughts on approaches to onboarding new data engineers. Please comment on what data engineering onboarding looks like in your team.

https://medium.com/@erickreyesr22/this-is-how-i-onboarded-more-than-10-data-engineers-and-got-excellent-reviews-and-feedback-c569bcfca8f9


Ifihanagbara Olusheye: How to Use PyScript – A Python Frontend Framework

PyScript generated a lot of excitement in the data community. I found this to be an excellent tutorial on how to use PyScript.

https://www.freecodecamp.org/news/pyscript-python-front-end-framework/


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.

© 2023 Ananth Packkildurai