Welcome to the 50th edition of the data engineering newsletter. This week's release is a new set of articles that focus on Benn Stancil’s The Third Rail, BVP’s data infrastructure roadmap, ValidIO’s dbt and analytics engineering, Zalando’s use of knowledge graphs to accelerate master data management, RudderStack’s real-time personalization with Redis, Anna Geller’s 10 common pitfalls of data modeling, Zoba’s scaling to more cities with Airflow, and DeviantArt’s CDC journey.
Data Engineering Weekly - Brought to You by RudderStack - the Customer Data Platform for Developers
RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools.
Benn Stancil: The Third Rail
Benn Stancil offers an insightful view of the analyst role in an organization: the importance of analysts as a business decision support system, how engineering skills overshadow the analyst function, and how the lack of practical training creates a gap in the industry.
Bessemer Venture Partners: Roadmap - Data Infrastructure
Bessemer Venture Partners lays out the guidelines for its investment strategy in data infrastructure. The blog gives a comprehensive overview of data infrastructure trends, cloud adoption, and the future direction.
ValidIO: dbt and the Analytics Engineer — what’s the hype about?
The rise of the modern data stack is expanding the scope of analytics engineering. The article narrates how dbt catalyzes the change and discusses the evolution of data warehouses and data lakes.
Zalando: Knowledge Graph Technologies Accelerate and Improve the Data Model Definition for Master Data
The adoption of data lakes pushes master data management to the backseat, partly because "schema-on-write" MDM systems go against the "schema-on-read" design principle. At the same time, microservices architectures are prone to yielding inconsistent entity states. Zalando writes an exciting blog on how it tackles these MDM challenges with a graph database.
Sponsored - RudderStack: Real-Time Personalization with Redis and RudderStack
Nailing personalization can mean increasing revenue by 15%, but technical challenges keep many companies stuck using basic methods. RudderStack writes a step-by-step guide on designing and implementing a real-time personalization engine using Redis and RudderStack.
Anna Geller: 10 Common Mistakes When Building Analytical Data Models
Data modeling is one of the most challenging problems in data. The author walks through the most common pitfalls of data modeling, from treating data modeling as a one-off task to blindly following data modeling rules.
Zoba: Scaling Zoba to more cities using Airflow
Zoba, the on-demand forecasting and optimization tool for mobility, writes about using Airflow to expand into newer markets. The workflow optimization for bootstrapping syncs and periodic syncs using Airflow's TaskFlow API is a good design reference for Airflow users.
DeviantArt: Change Data Capture at DeviantArt
DeviantArt writes about its journey of adopting CDC infrastructure. CDC narratives often focus on the choice of CDC framework. The author did an excellent job of focusing instead on the overall complexity of the ecosystem and on troubleshooting.
Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers' opinions.