Data Engineering Weekly #126
The Weekly Data Engineering Newsletter
Data Engineering Weekly Is Brought to You by RudderStack
RudderStack provides data pipelines that make collecting data from every application, website, and SaaS platform easy, then activating it in your warehouse and business tools. Sign up free to test out the tool today.
Editor’s Note: An Edition About Generative AI, Plus Event Tracking and Application Change Management
As you may notice, most of the articles in this week’s edition are about AI, specifically Generative AI. It’s purely accidental, and I was amazed to see the pattern emerge while curating the articles. 😀
While writing An Engineering Guide to Data Creation - A Data Contract Perspective, I thought a bit more about the application code changes and their impact on click stream events.
Application code changes often, but the click stream tracking event structure stays the same. An application developer can accidentally remove a click stream tracking event during any code change.
I’m so curious to know your take on this problem. Please vote and share your experience.
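One way to guard against the accidental removal described above is a unit test that fails when an expected event stops being emitted. The sketch below is only an illustration: `FakeTracker`, `checkout`, the `checkout_clicked` event, and the `REQUIRED_EVENTS` contract are hypothetical names, not taken from any particular tracking SDK.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTracker:
    """In-memory stand-in for a click stream client (hypothetical)."""
    events: list = field(default_factory=list)

    def track(self, name: str, properties: dict) -> None:
        self.events.append((name, properties))


# Hypothetical contract: event name -> property names that must be present.
REQUIRED_EVENTS = {"checkout_clicked": {"user_id", "cart_total"}}


def checkout(tracker, user_id: str, cart_total: float) -> None:
    """Application code under test; the track call is what a refactor might drop."""
    tracker.track("checkout_clicked", {"user_id": user_id, "cart_total": cart_total})


def missing_events(tracker, required=REQUIRED_EVENTS) -> list:
    """Return contract violations: events never emitted or lacking properties."""
    problems = []
    emitted = {name: props for name, props in tracker.events}
    for name, required_props in required.items():
        if name not in emitted:
            problems.append(f"{name}: not emitted")
            continue
        absent = sorted(required_props - set(emitted[name]))
        if absent:
            problems.append(f"{name}: missing {absent}")
    return problems
```

In a test suite, calling `checkout` with a `FakeTracker` and asserting that `missing_events(tracker)` is empty would turn a silently dropped event into a failing build.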
Stanford HAI: 2023 AI Index Report
Stanford HAI published the 2023 AI Index Report. The key highlights outlined by the report:
Industry races ahead of academia.
Performance saturation on traditional benchmarks.
AI is both helping and harming the environment.
The world’s best new scientist … AI?
The number of incidents concerning the misuse of AI is rapidly rising.
The demand for AI-related professional skills is increasing across virtually every American industrial sector.
For the first time in the last decade, year-over-year private investment in AI decreased.
While the proportion of companies adopting AI has plateaued, the companies that have adopted AI continue to pull ahead.
Policymaker interest in AI is on the rise.
Chinese citizens are among those who feel the most positively about AI products and services. Americans...not so much.
The trustworthiness of AI and its misuse will continue to be debated in the coming years. The Google research paper "Because AI is 100% right and safe": User Attitudes and Sources of AI Authority in India describes the challenges around AI and trust.
Download the full report: https://aiindex.stanford.edu/wp-content/uploads/2023/04/HAI_AI-Index-Report_2023.pdf.
Confused bit: Simply explained: how does GPT work?
The article explains how GPT works in a simplified way, building up from a naive probabilistic model of words, to mapping words to meaning, and then meaning to relationships. The blog also discusses two burning questions everyone is debating:
Can the Model Think?
Will GPT destroy society?
I will leave readers to think about these two burning questions. Personally, I find AI spreading misinformation that amplifies human tribalism much more troubling than the Terminator scenario.
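The "naive probabilistic model of words" that the article starts from can be illustrated with a bigram counter: estimate the probability of the next word from how often it followed the current word in a corpus. This is a toy sketch with hypothetical function names, not how GPT itself is implemented.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word: str) -> dict:
    """Turn follower counts for `word` into a probability distribution."""
    followers = counts.get(word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen, or never followed by anything
    return {w: c / total for w, c in followers.items()}


counts = train_bigram("the cat sat on the mat the cat ran")
probs = next_word_probs(counts, "the")  # "cat" followed "the" twice, "mat" once
```

Such a model only knows which words tend to sit next to each other; the steps from words to meaning and from meaning to relationships are what the article goes on to explain.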
MIT Tech Review: ChatGPT is about to revolutionize the economy. We need to decide what that looks like.
The Generative AI Gold Rush is here, and it’s just the beginning. ChatGPT and other recently released generative AI models promise to automate tasks previously thought to be solely in the realm of human creativity and reasoning, from writing to creating graphics to summarizing and analyzing data. The article discusses the positive and negative sides of AI’s Gold Rush.
LinkedIn: Our Learnings from the Early Days of Generative AI
LinkedIn writes about its learnings from adopting Generative AI in its product features. I’m sure many companies are trying to explore what Generative AI means for their products. LinkedIn rightly points out how tooling and an experimental engineering culture are key to democratizing new technologies in an engineering organization.
Sponsored: [Virtual Data Panel] Measuring Data Team ROI
As data leaders, one of our top priorities is to measure ROI. From tracking the efficacy of marketing campaigns to understanding the root cause of new spikes in user engagement, we’re tasked with keeping tabs on the health of the business at all levels. But what about the ROI of our own teams? Watch a panel of data leaders as they discuss how to build strategies for measuring data team ROI.
Meta: How Meta measures the management of its AI ecosystem
AI development ecosystems are increasingly complex and challenging to maintain, and technology companies need to develop highly efficient systems to build, serve, and improve their AI models for production applications. Meta writes about the measurement process it developed internally for specific metrics about AI systems to make managing the models effective and efficient.
Mitchell Hashimoto: Growth of AI Through a Cloud Lens
The author brings an interesting perspective on the growth of AI in comparison with AWS's growth. The author highlights:
AI -- in particular, the advancements in large language models (LLMs) -- is starting to feel like the beginning of another platform shift. This isn't a shift from the cloud; it is a platform shift within a different category, but it has the same potential to change how we build and deliver software fundamentally.
Sponsored: Warehouse-first analytics and experimentation with RudderStack and Eppo
Find out how Phantom transitioned from siloed analytics to a warehouse-first stack that enables A/B experimentation directly on top of the data warehouse. You'll learn from Eppo founder Chetan Sharma, RudderStack DevRel leader Sara Mashfej, and Phantom Senior Data Engineer Ricardo Pinho.
Kostas Pardalis: MLOps is Mostly Data Engineering
The author gives an in-depth analysis of why MLOps is mostly data engineering. The author broadly splits MLOps into four categories and explains the relevance of data engineering to each.
Deployment & Serving of models, e.g., OctoML
Model Quality and Monitoring, e.g., Weights & Biases
Model training, e.g., AWS SageMaker
Feature Stores, e.g., Tecton
Lyft: The Recommendation System at Lyft
Lyft writes about the use cases for its recommendation system, which leverages a set of machine learning models to predict a rider’s propensity to convert into each mode and customizes the rankings based on it.
Conference Alert: Shape the future of real-time analytics
The Real-Time Analytics Summit is on April 25-26 in downtown San Francisco, CA. Come and hear talks from companies like StarTree, Confluent, LinkedIn, DoorDash, Imply, and Uber on how they are advancing the state-of-the-art in user-facing analytics delivered instantly.
Go to rtasummit.com and register with DEW30 for 30% off.
GoCardless: 3 Things Our Software Engineers Love About Data Contracts
There is much debate about whether data contracts add complexity to the developer workflow. The author writes about why GoCardless engineers love data contracts. The author narrates the developer experience in three categories of delight:
Freebies [the pub-sub system and DLQ]
Autonomy [decisions on the structure of the data are internal to the development team]
Golden Path [best practices enforced out of the box]
Pardis Noorzad: The state of data exchange
Okay, folks, enough talking about Generative AI and its impact. The author asks a real question. Can we get rid of SFTP? Possible?
The author writes an excellent summary of the state of inter-company data exchange and the complexity associated with it.
Paul Kinsvater: Metrics Store in Action
Looker recently announced Looker Modeler, a standalone metrics layer. It is exciting to see innovation around the metrics store accelerating, and I can see many application-driven startups building on top of it. The author demonstrates how to build a metrics store using MetricFlow, Python, DuckDB, dbt, and Streamlit.
All rights reserved ProtoGrowth Inc, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.