Data Engineering Weekly


Data Engineering Weekly #70

Weekly Data Engineering Newsletter

Ananth Packkildurai
Jan 17, 2022

Data Engineering Weekly - Brought to You by RudderStack - the Customer Data Platform for Developers

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools.


Let’s start this week’s edition with some excellent SQL tips.

Ergest Xheblati 🦊 @ergestx
I’ve been writing SQL for ~15 years. I’ve seen hundreds of thousands of lines of code. Over time I developed a set of patterns and best practices I always come back to when writing queries. This is my attempt to decode them 👇👇👇
1:47 PM ∙ Jan 8, 2022

Galen B: Why Google Treats SQL Like Code and You Should Too

Analytics engineering practices are becoming standard across the analytics world. The author summarizes why we should treat SQL as code and why analytics needs a version control system.

https://blog.devgenius.io/why-google-treats-sql-like-code-and-you-should-too-53f97925037e
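One practical consequence of treating SQL as code is that a query can be unit-tested like any other function. Below is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `orders` table, the `DAILY_REVENUE_SQL` query, and the helper names are hypothetical illustrations, not taken from the article.

```python
import sqlite3

# Hypothetical query under test. In a real project it would live in a
# version-controlled .sql file next to its tests.
DAILY_REVENUE_SQL = """
SELECT order_date, SUM(amount) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY order_date
ORDER BY order_date
"""


def run_daily_revenue(rows):
    """Load fixture rows into a throwaway in-memory DB and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, amount REAL, status TEXT)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(DAILY_REVENUE_SQL).fetchall()
    finally:
        conn.close()


def test_daily_revenue_excludes_cancelled_orders():
    fixture = [
        ("2022-01-01", 10.0, "completed"),
        ("2022-01-01", 5.0, "cancelled"),
        ("2022-01-02", 7.5, "completed"),
    ]
    # Cancelled orders must not count toward revenue.
    assert run_daily_revenue(fixture) == [
        ("2022-01-01", 10.0),
        ("2022-01-02", 7.5),
    ]


if __name__ == "__main__":
    test_daily_revenue_excludes_cancelled_orders()
    print("ok")
```

Running such tests in CI on every change to the `.sql` file is what turns "SQL as code" from a slogan into a review-and-test workflow.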


Vicki Boykis: Git, SQL, CLI

Continuing on the analytics engineering topic, the author writes about some of the fundamental tools you’ll need for any technical job. It's nice to see SQL at the top of the list, required not just for analytics but for all technical jobs.

https://vickiboykis.com/2022/01/09/git-sql-cli/

You can see a similar sentiment in this tweet.

Alex Watt @alexcwatt
SQL is one of those things I wish I'd learned earlier. I underestimated how useful it is in general.
11:11 PM ∙ Jan 6, 2022

Netflix: Auto-Diagnosis and Remediation in Netflix Data Platform

An auto-remediation system with efficient feedback can save many on-call hours. Netflix writes about its regex-based rule engine that diagnoses the most common errors in its batch and real-time systems.

https://netflixtechblog.com/auto-diagnosis-and-remediation-in-netflix-data-platform-5bcc52d853d1
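The core idea of a regex rule engine for diagnosis can be sketched in a few lines: match an error log against an ordered list of rules and return the first matching diagnosis with a suggested remediation. The rule names, patterns, categories, and remediation actions below are hypothetical illustrations, not Netflix's actual rules or API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: "re.Pattern[str]"
    category: str     # e.g. "user" error vs "platform" error
    remediation: str  # hypothetical action an automated system might take


# Hypothetical rules; a production engine would load these from config.
RULES = [
    Rule("oom", re.compile(r"java\.lang\.OutOfMemoryError"),
         "platform", "retry_with_more_memory"),
    Rule("missing_table", re.compile(r"Table or view not found: (\S+)"),
         "user", "notify_owner"),
    Rule("s3_throttle", re.compile(r"SlowDown|503 Service Unavailable"),
         "platform", "retry_with_backoff"),
]


@dataclass(frozen=True)
class Diagnosis:
    rule: str
    category: str
    remediation: str
    match: str


def diagnose(log_text: str) -> Optional[Diagnosis]:
    """Return the diagnosis for the first rule whose regex matches the log."""
    for rule in RULES:
        m = rule.pattern.search(log_text)
        if m:
            return Diagnosis(rule.name, rule.category, rule.remediation, m.group(0))
    # Unclassified errors fall back to a human on-call engineer.
    return None
```

Ordering the rules from most to least specific keeps a broad pattern from shadowing a more precise one.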


LinkedIn: A closer look at how LinkedIn integrates fairness into its AI products

LinkedIn writes about its algorithmic fairness and explainability design to measure and mitigate unfair bias at scale. The Fairness training toolkit and the continuous feedback loop that measures the success of the Fair model analyzer are exciting reads.

https://engineering.linkedin.com/blog/2022/a-closer-look-at-how-linkedin-integrates-fairness-into-its-ai-pr
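To make "measuring unfair bias" concrete, here is a minimal sketch of one common group-fairness metric, the demographic parity gap: the difference in positive-prediction rates between groups. This is a generic textbook metric chosen for illustration; it is not necessarily the metric or API used in LinkedIn's toolkit, and the group labels in any usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def positive_rates(records: Iterable[Tuple[Hashable, int]]) -> Dict[Hashable, float]:
    """Compute P(prediction = 1 | group) for each group.

    Each record is (group, prediction) with prediction in {0, 1}.
    """
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for group, prediction in records:
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


def demographic_parity_gap(records: Iterable[Tuple[Hashable, int]]) -> float:
    """Largest gap in positive-prediction rate across groups (0.0 means parity)."""
    rates = positive_rates(records)
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())
```

Tracking a number like this over time is one way a continuous feedback loop could tell whether a mitigation actually reduced bias.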


DoorDash: Introducing Fabricator - A Declarative Feature Engineering Framework

We're seeing a growing trend of adopting declarative DSLs for end-to-end feature engineering in both real-time and batch modes. Airbnb has written about its declarative feature engineering system in the past. DoorDash describes Fabricator, its declarative feature engineering framework, in an exciting blog post.

https://doordash.engineering/2022/01/11/introducing-fabricator-a-declarative-feature-engineering-framework/
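The appeal of a declarative framework is that a feature is described as data (source, value field, aggregation, window) and a generic engine decides how to compute it. A minimal sketch of that idea follows; the `FeatureSpec` fields, the `AGGREGATIONS` registry, and the in-memory executor are hypothetical and do not reflect Fabricator's actual DSL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical aggregation registry; specs refer to aggregations by name.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "sum": lambda xs: float(sum(xs)),
    "count": lambda xs: float(len(xs)),
    "avg": lambda xs: sum(xs) / len(xs) if xs else 0.0,
}


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    source: str        # logical event stream or table name
    value_field: str   # field to aggregate
    aggregation: str   # key into AGGREGATIONS
    window: timedelta  # look-back window ending at the evaluation time


def compute_feature(spec: FeatureSpec, events: List[dict],
                    entity_id: str, as_of: datetime) -> float:
    """Evaluate a spec for one entity over an in-memory event list."""
    if spec.aggregation not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation: {spec.aggregation}")
    start = as_of - spec.window
    values = [
        float(e[spec.value_field])
        for e in events
        if e["source"] == spec.source
        and e["entity_id"] == entity_id
        and start < e["ts"] <= as_of
    ]
    return AGGREGATIONS[spec.aggregation](values)
```

Because the spec is plain data, a real framework could compile the same definition into a batch SQL job or a streaming aggregation; this sketch only shows a simple batch-style evaluation.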


Metaphor: The Modern Metadata Platform - What, Why, and How?

Support for broad integration patterns such as push and pull, and, most importantly, analytics on top of the metadata are vital for the modern metadata platform. Answering questions like "show me all the datasets that contain PII, accessed directly or indirectly via lineage within the last three months" is key to gaining insight into the data management system. Metaphor writes about DataHub and how it supports the modern metadata platform capabilities.

https://metaphor.io/blog/the-modern-metadata-platform
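The example question above boils down to a lineage-graph traversal combined with a filter on access metadata. A minimal sketch, assuming a hypothetical in-memory model where each dataset has a PII tag, a last-accessed date, and upstream lineage edges; it reads "indirectly via lineage" as "derived from a PII dataset" and approximates three months as 90 days. Real platforms such as DataHub expose this through their own APIs, which are not shown here.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Dict, List


@dataclass
class Dataset:
    name: str
    has_pii: bool
    last_accessed: date
    # Lineage edges: names of the datasets this one is derived from.
    upstream: List[str] = field(default_factory=list)


def carries_pii(name: str, catalog: Dict[str, Dataset]) -> bool:
    """True if the dataset or any transitive upstream dataset is tagged PII."""
    stack, seen = [name], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in messy lineage data
        seen.add(current)
        ds = catalog.get(current)
        if ds is None:
            continue  # lineage points at a dataset missing from the catalog
        if ds.has_pii:
            return True
        stack.extend(ds.upstream)
    return False


def recently_accessed_pii(catalog: Dict[str, Dataset], today: date,
                          days: int = 90) -> List[str]:
    """Datasets carrying PII (directly or via lineage) accessed in the window."""
    cutoff = today - timedelta(days=days)
    return sorted(
        name for name, ds in catalog.items()
        if ds.last_accessed >= cutoff and carries_pii(name, catalog)
    )
```

Note that the raw PII table itself drops out of the answer if nobody has touched it recently, while its derived tables still show up; that is exactly the kind of insight lineage-aware metadata analytics is meant to surface.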


Sponsored: New Year, Better Event Data with Avo & RudderStack

Join RudderStack and Avo for a live webinar on January 27 @ 9am PT to learn how you can increase your event data quality and streamline your behavioral data pipelines.

https://www.avo.app/event-driven-infrastructure-webinar


Pedram Navid: Airflow, Prefect, and Dagster: An Inside Look

Airflow was a breakthrough system that brought the pipeline-as-code pattern to data engineering. Since then, orchestration engines like Prefect and Dagster have taken the concept to the next level, building on lessons learned from Airflow. The author walks through some of the pain points of running Airflow and compares it with Dagster and Prefect.

https://towardsdatascience.com/airflow-prefect-and-dagster-an-inside-look-6074781c9b77
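Underneath all three orchestrators is the same pipeline-as-code idea: tasks declare their dependencies, and the engine runs them in a valid topological order. The sketch below shows only that core scheduling step, using Python's standard-library `graphlib` (Python 3.9+); the `run_pipeline` helper and task names are hypothetical, and none of the orchestrators' actual APIs are used.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(tasks: Dict[str, Callable[[], None]],
                 upstream: Dict[str, Set[str]]) -> List[str]:
    """Run each task after all of its upstream tasks and return the run order.

    `upstream` maps a task name to the set of task names it depends on.
    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    graph = {name: upstream.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        # A real orchestrator adds retries, logging, parallelism, and state.
        tasks[name]()
    return order
```

Everything the article compares (scheduling, retries, parameterization, local testing) is layered on top of this small core, which is why the differences between the tools show up mostly in developer experience.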

I shared my thoughts on why the DAG model of data pipelines is obsolete here.

Ananth Packkildurai @ananthdurai
The future of DAG is NO DAG!!! The DAG approach for the data pipeline taking the focus off from data asset & asset lifecycle. Currently we are mitigating the asset mgmt with auxiliary systems like lineage & discovery which runs on asset model, not DAGs.
4:47 AM ∙ Dec 9, 2021

Trip.com: StarRocks efficiently supports high concurrent queries, dramatically reducing labor and hardware costs.

Trip.com writes about its experience switching from ClickHouse to StarRocks for its real-time analytical database. TIL about StarRocks, and it seems like an exciting system. I keep hearing about performance issues with ClickHouse and am curious to learn about folks’ experiences with it. Please DM @data_weekly if you’re using ClickHouse in production.

https://starrocks.medium.com/trip-com-starrocks-efficiently-supports-high-concurrent-queries-dramatically-reduces-labor-and-1e1921dd6bf8


CIDR: CIDR Conference 2022

Lastly, I enjoyed attending CIDR 2022 last week and was delighted to see the latest research on the data ecosystem. All the CIDR talks and papers are published here:

http://cidrdb.org/cidr2022/program.html


Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.

© 2023 Ananth Packkildurai