Data Engineering Weekly

Data Engineering Weekly #11

Weekly Data Engineering Newsletter

Ananth Packkildurai
Oct 04, 2020

Welcome to the 11th edition of the data engineering newsletter. This week's release is a new set of articles that focus on 2020 data infrastructure trends, seven principles of data ops, data quality, ML transparency and performance tuning, and the Samza runner for Beam, from LinkedIn, Twitter, DoorDash, Airbnb, Shopify, Apache Pinot, and Dagster.


Developer productivity and the ability to iterate quickly on the correctness of a job are always challenging. The Airflow test utility took the first step toward improving developer productivity. In this post, Dagster takes it to the next level, describing how to run PySpark either on EMR or in Dagster with a mode switch.

https://dagster.io/blog/pyspark


The seven principles of reliable data pipelines is an excellent read that compares with the Google SRE principles. The author narrates the importance of adopting SLOs and SLIs, reducing toil, monitoring the pipeline, and keeping things simple.

https://medium.com/toro-data-quality/seven-principles-for-reliable-data-pipelines-e82a82810e4f
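
To make the SLI and SLO vocabulary concrete, here is a minimal sketch in Python. It is not taken from the linked article: the `PipelineRun` record, the 30-minute delay budget, and the sample runs are all hypothetical and exist only to illustrate the idea. The SLI is measured as the fraction of runs that succeeded and landed on time, and the SLO is a target that the SLI is compared against.

```python
from dataclasses import dataclass


@dataclass
class PipelineRun:
    """One run of a hypothetical pipeline (fields are illustrative)."""
    run_id: str
    succeeded: bool
    delay_minutes: float  # how late the output landed versus its schedule


def on_time_sli(runs, max_delay_minutes):
    """SLI: fraction of runs that succeeded and landed within the delay budget."""
    if not runs:
        raise ValueError("need at least one run to compute an SLI")
    good = sum(
        1 for r in runs if r.succeeded and r.delay_minutes <= max_delay_minutes
    )
    return good / len(runs)


def meets_slo(sli, target):
    """SLO check: the measured SLI must be at or above the target."""
    return sli >= target


# Made-up sample data, not real measurements.
runs = [
    PipelineRun("r1", True, 5.0),
    PipelineRun("r2", True, 45.0),  # succeeded, but later than the budget
    PipelineRun("r3", False, 0.0),  # failed outright
    PipelineRun("r4", True, 10.0),
]

sli = on_time_sli(runs, max_delay_minutes=30.0)
print(f"on-time SLI: {sli:.2f}, meets 99% SLO: {meets_slo(sli, 0.99)}")
```

Counting a late run as a bad event, and not only a failed one, is a design choice in this sketch: for a data pipeline, output that arrives after consumers needed it is often as harmful as no output at all.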


The 2020 data & AI landscape is an excellent read. The author narrates some of the recent trends in data infrastructure. The shift from Hadoop systems to cloud warehouses like Snowflake and Google BigQuery, the growing momentum of data lineage and discovery tools, the second-generation orchestration tools Prefect and Dagster, and the rise of AIOps are the exciting trends to watch.

https://mattturck.com/data2020/


LinkedIn published the benchmarking results for the Samza runner for Apache Beam. It's a good reference article on how to think about performance improvement as a continuous process.

https://engineering.linkedin.com/blog/2020/building-a-better-and-faster-beam-samza-runner


Twitter writes a short and exciting blog about the recent image cropping transparency issue. It is a good reminder that machine learning is not always the answer and that it can be better to let users choose what they want.

https://blog.twitter.com/en_us/topics/product/2020/transparency-image-cropping.html


Airbnb writes about how it built the data platform for revenue forecasting. The blog is an excellent narration of some practical challenges with the data infrastructure: supporting multiple query engines, dynamic metrics generation, late-arriving data, and maintaining SLAs.

https://medium.com/@jerry.chu/airbnbs-data-platform-of-revenue-forecasting-2e95a01122e6


DoorDash writes about its recent performance challenges with the search scoring and ranking model infrastructure. The blog narrates its migration to the internal prediction service, emphasizing the importance of a dedicated feature store.

https://doordash.engineering/2020/10/01/integrating-a-scoring-framework-into-a-prediction-service/


Critical site-facing analytical applications require high throughput and strict p99 query latency. Apache Pinot is an excellent OLAP engine for serving user-facing analytical solutions, and the article narrates the challenges of serving concurrent queries under a low-latency SLA with Apache Pinot.

https://medium.com/apache-pinot-developer-blog/achieving-99th-percentile-latency-sla-using-apache-pinot-2ba4ce1d9eff
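
For readers less familiar with the p99 metric, the sketch below shows one common way to compute it, the nearest-rank method, in plain Python. It does not use Apache Pinot or anything from the linked article: the latency samples and the 100 ms SLA threshold are made up for illustration, and production systems may use a different percentile definition (for example, interpolation or histogram buckets).

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical query latencies in milliseconds: mostly fast, two slow outliers.
latencies_ms = [10] * 98 + [50, 400]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
worst = percentile(latencies_ms, 100)

SLA_MS = 100  # assumed threshold, chosen only for this example
print(f"p50={p50}ms p99={p99}ms max={worst}ms within SLA: {p99 <= SLA_MS}")
```

The median here says nothing about the slow tail, while the maximum is dominated by a single outlier. That gap is why latency SLAs for user-facing queries are usually stated at a high percentile such as p99.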


Data quality has been a consistent focus, as it often leads to issues that can go unnoticed for a long time, bring entire pipelines to a halt, and erode stakeholders' trust in the reliability of their analytical insights. Great Expectations writes an excellent narration of how data quality is key to the success of MLOps.

https://medium.com/@expectgreatdata/why-data-quality-is-key-to-successful-ml-ops-a18d6e373ca9


Descriptive statistics and correlations are data scientists' bread and butter, but they often come with the caveat that correlation isn't causation. In this blog post, Shopify narrates different causal inference methods and how it uses them to build great products.

https://engineering.shopify.com/blogs/engineering/using-quasi-experiments-counterfactuals


Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers' opinions.


© 2025 Ananth Packkildurai