Data Engineering Weekly

Data Engineering Weekly #102

The Weekly Data Engineering Newsletter

Ananth Packkildurai
Sep 26, 2022

Data Engineering Weekly Is Brought to You by RudderStack

RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.


Benn Stancil: Fine, let's talk about data contracts

Data contracts have been the most discussed topic in the data world recently. Benn says he agrees that disagreement is a problem but disagrees that we need an agreement to solve it.

From Benn’s article:

Data contracts make exactly that trade. They replace a brittle technical system with a negotiating table. And the more that contracts depend on one another, the more people will want to be involved. I don’t know if that kills innovation, but it’s at least an annoying set of conversations that most people don’t want to have.

I'm afraid I have to disagree with this assessment. Change management is all we do in software engineering.

Code reviews, PRDs, RFCs, sprint planning: everything is a negotiation in software engineering. Does it kill innovation? No; in fact, it accelerates industrial-scale innovation. So why the special treatment for data engineers?

https://benn.substack.com/p/data-contracts

That raises a question: "Hey, Ananth, I’m a data engineering leader. When should I focus on data contracts?" I made a Magic Quadrant for you.

Ping me on LinkedIn. Curious to know your thoughts: https://www.linkedin.com/in/ananthdurai/


Chad Sanderson: The Production-Grade Data Pipeline

Chad talks about what it takes to build a production-grade data pipeline. The article focuses on:

  1. Collaborative design

  2. Contracts

  3. Expectations

  4. Monitoring

  5. Change Management

https://dataproducts.substack.com/p/the-production-grade-data-pipeline


Lauren Balik: How Fivetran + dbt actually fail

Is ELT far more heavily rent-seeking than ETL? Did we shift data transformation too far to the right? The author discusses Fivetran and dbt as examples of the ELT model.

https://medium.com/@laurengreerbalik/how-fivetran-dbt-actually-fail-3a20083b2506


Ben Rogojan: Onboarding For Data Teams

The onboarding process is easily the best time to learn about an organization's culture. An effective onboarding process demonstrates strong, empathetic, and inclusive engineering practices. The author shares experiences with data team onboarding processes.

https://medium.com/coriers/onboarding-for-data-teams-100e041a012c


Sponsored: Firebolt - Cloud Data Warehouse Costs: Look Before You Leap

Have you ever totally overrun your monthly budget for an analytics environment overnight? Here are a few thoughts on how we prepare ourselves for what lies ahead in the public cloud and in the economy. In this post, we look at factors to consider when building a data warehouse. Our goal is to point out the potholes you are most likely to hit from a cost perspective and what you can do to avoid them.

https://www.firebolt.io/blog/cloud-data-warehouse-costs-look-before-you-leap


Intuit: How to Drive Grassroots AI Innovation? Tap into a Diversity of Ideas

Bottom-up innovation is the best way to fuel and iterate on a company's growth. Intuit writes about six steps to drive grassroots innovation. Seeing 74% of innovation paper submissions come from individual contributor (IC) engineers is impressive.

https://medium.com/intuit-engineering/how-to-drive-grassroots-ai-innovation-tap-into-a-diversity-of-ideas-f2e1ed6258e6


Murat Demirbas: SQLite: Past, Present, and Future

SQLite is reaching the browser. I can’t wait to try analytics on the edge.

Chrome Developers @ChromiumDev
@tomayac Yes, the plan is to eventually remove Web SQL, but 🥁 our intention is to empower developers to create their own solutions for structured storage, and we're therefore working with the #SQLite team to create a SQLite implementation over Wasm. This solution will replace Web SQL 💪!
10:33 PM ∙ Aug 31, 2022

The author discusses the SQLite architecture, its transaction guarantees, and what lies ahead for SQLite in the near future.

https://muratbuffalo.blogspot.com/2022/09/sqlite-past-present-and-future.html


Sponsored: Soda - Podcast: Data Mesh in Practice

Max Schultze, Data Engineering Manager at Zalando, and Prof. Dr. Arif Wider, Professor of Software Engineering at HTW Berlin, share their experience in bringing forward the practical side of data mesh from an engineer's perspective and answer challenging questions that tackle some of the common misconceptions of putting data mesh into practice.

https://directory.libsyn.com/episode/index/id/24095136


Robin Moffatt: Data Engineering in 2022: Storage and Access

Looking back at history and comparing it with the current state is always worthwhile. It is an exciting time for data engineering, with significant investment and progress in storing and querying data. The author compares the days of Hadoop/HDFS to the Lakehouse architecture and the progress made in data infrastructure.

https://rmoff.net/2022/09/14/data-engineering-in-2022-storage-and-access/


Dr. Vijay Srinivas Agneeswaran: Efficient transformers: Survey of recent work

Transformers have become the standard for NLP tasks such as machine translation, text summarization, and question answering. Based on a survey of efficient transformers, the author categorizes the work as follows:

  1. Computational complexity

  2. Spectral complexity

  3. Robustness

  4. Privacy

  5. Approximation

  6. Model compression

https://medium.com/data-science-at-microsoft/efficient-transformers-survey-of-recent-work-75022cddc86a


Sponsored: RudderStack - Better Customer Data Integration Management For Growing Teams

In this piece, Ben Rogojan outlines your options for solving data integration challenges as your company grows: building a scalable framework or architecting a stack with the right tools. Check it out for some practical advice on which approach to take.

https://www.rudderstack.com/blog/better-customer-data-integration-management-for-growing-teams


Slack: Recommend API - Unified end-to-end machine learning infrastructure to generate recommendations

Slack writes about its unified end-to-end machine learning infrastructure to generate recommendations. The article highlights some product experiences where machine learning provides a rich experience. The article is a classic example of how to use ML to drive product features and growth.

https://slack.engineering/recommend-api/


Netflix: Machine Learning for Fraud Detection in Streaming Services

Netflix went from being okay with account sharing in 2016 to cracking down on it in 2022. I believe this article is the first glimpse of how fraud detection works behind the scenes.

https://netflixtechblog.medium.com/machine-learning-for-fraud-detection-in-streaming-services-b0b4ef3be3f6


Picnic: MLOps Principles to build Picnic’s Data Science Platform

Every piece of infrastructure is driven by an organization's basic principles for achieving its business goals. Picnic writes about its principles for building an internal data science platform.

https://blog.picnic.nl/mlops-principles-to-build-picnics-data-science-platform-851cbe2e8045


Marc Kelechava: Monitoring machine learning systems at Faire

Faire writes about its real-time ranking feature, the challenges of monitoring evaluation metrics for the real-time ranking model in near real time, and anomaly detection on critical metrics. The blog is an excellent read on how to build a reactive system to improve operational efficiency.

https://craft.faire.com/monitoring-machine-learning-systems-at-faire-6d5f8337e9e7


All rights reserved Pixel Impex Inc, India. Links are provided for informational purposes and do not imply endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.
