Data Engineering Weekly #39

Weekly Data Engineering Newsletter

Apr 25, 2021

Welcome to the 39th edition of the data engineering newsletter. This week's edition covers WePay's offline-to-online data pipeline, Benn Stancil on why self-serve is still a problem, ABN AMRO's data integration architecture, InfoQ's look at data gateways, Intuit's SuperGlue, remote work on Lyft's data science team, NuBank's distributed data team, Adobe's One-Stop Anomaly Shop, and subscribing to changes in a SQL query.

WePay: An Offline to Online Data Pipeline at WePay

To fight fraud, a fraud detection system needs the results of offline computations and of complex analysis performed by users to be integrated and propagated to an online serving system. WePay writes about its fraud detection infrastructure, built on top of the Google Cloud Platform.

https://wecode.wepay.com/posts/an-offline-to-online-data-pipeline-at-wepay

Benn Stancil: Why is self-serve still a problem?

Self-serve analytics is a north star for any data infrastructure, but what is self-serve analytics? The author makes an excellent case on the topic and emphasizes why opinionated simplicity is better than indifferent optionality.

https://benn.substack.com/p/self-serve-still-a-problem

ABN AMRO: ABN AMRO’s Data Integration Architecture

ABN AMRO writes about its digital integration and access layer architecture, built around the concept of data providers and data consumers. How do integration and interoperability work between the providers and the consumers? The blog narrates the Digital Integration and Access Layer (DIAL) and the principles behind the architecture.

https://piethein.medium.com/abn-amros-data-integration-architecture-3d266a59fbdd

InfoQ: Data Gateways in the Cloud Native Era

Modern distributed application architectures created the need for API gateways and helped popularize API management and service mesh technologies. The growing adoption of data mesh increases domain-centric ownership of data, similar to microservices. The blog narrates the current landscape of data gateways and the various technologies available to build one.

https://www.infoq.com/articles/data-gateways-cloud-native/

Intuit: Superglue — Journey of Lineage, Data Observability & Data Pipelines

Intuit open sources SuperGlue, its lineage-tracking tool built to help visualize data propagation through complex pipelines composed of tables, jobs, and reports. SQL parsing to derive lineage, dependency recommendations, anomaly detection, and personalization are some of the exciting features of SuperGlue.

https://towardsdatascience.com/superglue-journey-of-lineage-data-observability-data-pipelines-23ffb2990b30

GitHub: https://github.com/intuit/superglue
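
To make the idea of deriving lineage from SQL concrete, here is a minimal Python sketch. It is not SuperGlue's parser, whose internals are not described in this newsletter; it is a toy regex-based illustration, and the function name `table_lineage` and the example table names are hypothetical. It assumes a single INSERT ... SELECT statement and ignores subqueries, CTEs, and comments.

```python
import re
from typing import Dict, Set

# Toy patterns (assumption: plain "schema.table" identifiers, no quoting).
TARGET_RE = re.compile(r"\binsert\s+into\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> Dict[str, Set[str]]:
    """Map the statement's target table to the set of tables it reads from."""
    target = TARGET_RE.search(sql)
    if target is None:
        # Not an INSERT ... SELECT statement, so no lineage edge is produced.
        return {}
    sources = set(SOURCE_RE.findall(sql))
    # A table that is both read and written is not its own upstream here.
    sources.discard(target.group(1))
    return {target.group(1): sources}


# Hypothetical example query.
sql = (
    "INSERT INTO reports.daily_revenue "
    "SELECT o.day, SUM(o.amount) "
    "FROM sales.orders o JOIN sales.customers c ON o.customer_id = c.id "
    "GROUP BY o.day"
)
lineage = table_lineage(sql)
```

A production lineage tool would use a real SQL parser rather than regular expressions, since regexes cannot handle nested queries or dialect-specific syntax reliably.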

Lyft: How Lyft Data Scientists Have Worked Remotely During the Pandemic

Lyft's data science team shares its experience of working remotely during the pandemic. The article contains some great lessons on the tools and skills that helped the team.

https://eng.lyft.com/a-day-in-the-life-of-a-lyft-data-scientist-ffe6651f138b

NuBank: Distributing the data team to boost innovation reliably

One of the critical KPIs for a data team is its impact on the organization's data-driven culture. As it scales, a data platform needs to adapt not only its technology stack but also the organization of the people around it. NuBank writes an exciting blog sharing its lessons on building the data team.

https://building.nubank.com.br/distributing-the-data-team-to-boost-innovation-reliably/

Adobe: Introducing the “One-Stop Anomaly Shop” (OSAS)

Adobe’s security intelligence team open sources OSAS, a security intelligence toolset to detect security anomalies.

https://medium.com/adobetech/introducing-the-one-stop-anomaly-shop-osas-c27581ee1bd3

GitHub: https://github.com/adobe/OSAS

Paper: https://www.insticc.org/Primoris/Resources/PaperPdf.ashx?idPaper=7424dHp8E4k=

Ask HN: Is there a way to efficiently subscribe to an SQL query for changes?

An interesting Hacker News thread on subscribing to incrementally maintained materialized views. There are some excellent comparisons of RethinkDB, Materialize, and the proposals for incremental view maintenance in Postgres.

https://news.ycombinator.com/item?id=26901352
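
As a rough illustration of what incremental view maintenance with subscriptions means, here is a minimal Python sketch. It does not reflect how RethinkDB, Materialize, or Postgres implement it; the class `IncrementalSumView` and the sample data are hypothetical. The sketch assumes a view equivalent to `SELECT customer, SUM(amount) FROM orders GROUP BY customer` and updates it from per-row deltas instead of re-running the query.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class IncrementalSumView:
    """Hypothetical in-memory stand-in for a SUM ... GROUP BY view."""

    def __init__(self) -> None:
        self._totals: Dict[str, int] = defaultdict(int)
        self._subscribers: List[Callable[[str, int], None]] = []

    def subscribe(self, callback: Callable[[str, int], None]) -> None:
        """Register a callback invoked with (key, new_total) on each change."""
        self._subscribers.append(callback)

    def apply(self, key: str, delta: int) -> None:
        """Apply one base-table change: insert = +amount, delete = -amount."""
        if delta == 0:
            return  # The view is unchanged, so nobody is notified.
        self._totals[key] += delta
        for callback in self._subscribers:
            callback(key, self._totals[key])

    def snapshot(self) -> Dict[str, int]:
        """Return the current contents of the view."""
        return dict(self._totals)


view = IncrementalSumView()
events: List[Tuple[str, int]] = []
view.subscribe(lambda key, total: events.append((key, total)))

view.apply("alice", 30)   # a new order for alice
view.apply("bob", 5)      # a new order for bob
view.apply("alice", -10)  # one of alice's orders is deleted
```

Each change costs work proportional to the delta rather than to the size of the base table, which is the property the thread's participants are after. This simplified version does not track row counts, so it cannot tell when a group should disappear from the view.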

Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers' opinions.

Data Engineering Weekly