Foundation Capital: A System of Agents brings Service-as-Software to life
Software is no longer simply a tool for organizing work; software becomes the worker itself, capable of understanding, executing, and improving upon traditionally human-delivered services.
The author argues that multiple agents working together achieve better results than a single agent. It is certainly an exciting phase in software development.
https://foundationcapital.com/system-of-agents/
Airbnb: Automation Platform v2 - Improving Conversational AI at Airbnb
Every piece of workflow software tries to incorporate Gen AI in some form. However, it is important to know which parts of a workflow Gen AI handles well and which it does not. Airbnb discusses the trade-offs between traditional and AI-driven workflows and describes how it incorporated Gen AI to improve the conversational experience.
Shimin Zhang: Evaluating LLM-based chatbots: A comprehensive guide to performance metrics
One of the industry's obvious questions is how to evaluate chatbots given the non-deterministic nature of LLMs.
The author outlines seven key areas for measuring the performance of a chatbot:
Search Performance for RAG-based Chatbots
Response Quality
User Engagement Metrics
Latency and Performance
Error Handling and Robustness
Scalability and Resource Utilization
Security and Privacy Compliance
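As a rough illustration of the first area, search performance, here is a minimal sketch of two widely used retrieval metrics: precision@k and mean reciprocal rank. The choice of these two metrics and all function names are assumptions made for illustration, not something taken from the article.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved document ids that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)
```

In practice the relevant-document sets would come from a hand-labeled evaluation set, which is usually the expensive part.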
Event Alert: IMPACT Summit
If you haven't registered for the IMPACT Summit yet, now's the perfect time 🔈
Here’s what we’ve got in store:
- A half-day virtual event created to elevate your 2025 data strategy
- Sessions jam-packed with industry experts sharing how they're driving data and AI adoption
- Practical tips and best practices from Monte Carlo customers
- Opportunities to connect and network with other data professionals
- Giveaways and raffles for attendees, including three All-Access subscriptions to DataExpert.io!
- And more!
What are you waiting for? Register for IMPACT today!
Thumbtack: What we learned building an ML infrastructure team at Thumbtack
Thumbtack shares valuable insights from building its ML infrastructure team. The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. Strategic pauses and adaptability are crucial for delivering timely, relevant solutions, while transparent communication builds stakeholder trust. Finally, a dedicated, flexible team drives momentum and expertise, balancing efficiency with selective commitments.
CapitalOne: Serverless ML - Lessons from Capital One
CapitalOne writes about its experience building Serverless ML on top of AWS Lambda. The challenges around memory, data size, and runtime are exciting to read. Sampling is an obvious strategy for handling data size, but the layered approach and the dynamic inclusion of dependencies are key techniques I learned from the case study.
https://medium.com/capital-one-tech/serverless-ml-lessons-from-capital-one-4b262f848e25
Event Alert: MLOps World/ Gen AI World - Austin, TX - Nov 7-8
The Gen AI Summit, drawing on a wider community of 20,000 engineers, AI entrepreneurs, and scientists, will host 1,000 AI teams in Austin, TX, on November 7-8. Join for two days of sessions, socials, case studies, and workshop tutorials. Passes include app-brain-date networking, birds-of-a-feather sessions, post-event parties, and more. 60+ speakers from LinkedIn, Shopify, Amazon, Lyft, Grammarly, Mistral, and others.
Data Engineering Weekly readers get a 15% discount by registering through the following link:
Gustavo Akashi: Building data pipelines effortlessly with a DAG Builder for Apache Airflow
Every code-first data workflow eventually grows into a UI-based or YAML-based workflow.
I believe this is the hard truth, as we have seen repeatedly in the industry with internal tools from various companies. The author writes about a YAML abstraction for building Airflow DAGs that simplifies the DAG authoring experience.
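To make the idea concrete, here is a minimal sketch of the core of such a builder: a declarative spec is validated and turned into an execution order. The spec layout, field names, and functions are hypothetical, the spec is written as a Python dict shaped like what a YAML loader might return, and no Airflow APIs are used; a real builder would create Airflow operators at this point instead of returning a list.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict, List

# Hypothetical spec, shaped like the parsed contents of a YAML DAG file.
SPEC: Dict[str, Any] = {
    "dag_id": "daily_sales",
    "schedule": "@daily",
    "tasks": {
        "extract": {"command": "python extract.py", "depends_on": []},
        "transform": {"command": "python transform.py", "depends_on": ["extract"]},
        "load": {"command": "python load.py", "depends_on": ["transform"]},
    },
}


def validate_spec(spec: Dict[str, Any]) -> None:
    """Reject dependencies that point at tasks the spec does not define."""
    tasks = spec["tasks"]
    for name, task in tasks.items():
        for upstream in task.get("depends_on", []):
            if upstream not in tasks:
                raise ValueError(f"task {name!r} depends on unknown task {upstream!r}")


def execution_order(spec: Dict[str, Any]) -> List[str]:
    """Return task names so that every task comes after its upstream tasks."""
    validate_spec(spec)
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {
        name: set(task.get("depends_on", []))
        for name, task in spec["tasks"].items()
    }
    # Raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

Validating the spec before building anything is the main benefit of the declarative approach: a typo in a dependency name fails fast instead of surfacing at run time.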
Gunnar Morling: Revisiting the Outbox Pattern
The blog is an excellent summary of the ground we have covered with the outbox pattern and the challenges ahead. Though the outbox pattern provides many benefits when integrating with an event-driven architecture, the complexity it adds to the system is undeniable. It's good to know about Dapr and restate.dev.
https://www.decodable.co/blog/revisiting-the-outbox-pattern
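For readers new to the pattern, here is a minimal sketch using SQLite: the business row and the event row are written in one transaction, and a separate relay publishes pending events. Table names, function names, and the polling relay are assumptions for illustration; production setups often read the outbox table through log-based change data capture rather than polling.

```python
import json
import sqlite3
from typing import Any, Callable, Dict


def init_db(conn: sqlite3.Connection) -> None:
    with conn:
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
        conn.execute(
            "CREATE TABLE outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "event_type TEXT NOT NULL, "
            "payload TEXT NOT NULL, "
            "published INTEGER NOT NULL DEFAULT 0)"
        )


def place_order(conn: sqlite3.Connection, order_id: int, item: str) -> None:
    """Write the order and its event atomically: both rows commit or neither does."""
    payload = json.dumps({"order_id": order_id, "item": item})
    with conn:  # commits on success, rolls back if either insert raises
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", payload),
        )


def relay_once(
    conn: sqlite3.Connection, publish: Callable[[str, Dict[str, Any]], None]
) -> int:
    """Publish pending events in insertion order and mark each one as published.

    A crash between publish() and the UPDATE re-sends the event on the next
    run, so delivery is at-least-once and consumers should be idempotent.
    """
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for event_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (event_id,))
    return len(rows)
```

The `publish` callback stands in for whatever broker client is in use, for example a Kafka producer.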
Uber: Enabling Infinite Retention for Upsert Tables in Apache Pinot
Uber writes about the implementation details of Pinot’s upsert operation, the newer deletion support, and the challenges of maintaining the in-memory hashmap that maps record primary keys to record locations. In the redesign, Pinot internally maintains a map of primary key → distinct-segment count, which tracks how many segments contain a record for a given primary key. This count helps ensure data consistency when deleting and compacting segments.
For example, if the count is less than or equal to 1, Pinot allows the deletion of metadata on the record. Pinot then marks the validDocId as invalid, allowing for the compaction of the deleted record and ensuring the removal of records in other segments.
https://www.uber.com/blog/enabling-infinite-retention-for-upsert-tables/
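The bookkeeping described above can be sketched as a toy model. This is not Pinot's code: the class, its fields, and its methods are hypothetical, and the only rule taken from the summary is that metadata for a deleted key may be removed once its distinct-segment count is at most 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class UpsertMetadata:
    # primary key -> (segment name, doc id) of the latest record
    locations: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # primary key -> number of distinct segments holding a record for the key
    segment_counts: Dict[str, int] = field(default_factory=dict)
    # segment name -> primary keys present in it (stand-in for the segment data)
    segment_keys: Dict[str, Set[str]] = field(default_factory=dict)
    # primary keys whose latest record is a delete marker
    deleted: Set[str] = field(default_factory=set)

    def upsert(self, pk: str, segment: str, doc_id: int, is_delete: bool = False) -> None:
        keys = self.segment_keys.setdefault(segment, set())
        if pk not in keys:
            # First record for this key in this segment: one more distinct segment.
            keys.add(pk)
            self.segment_counts[pk] = self.segment_counts.get(pk, 0) + 1
        self.locations[pk] = (segment, doc_id)
        if is_delete:
            self.deleted.add(pk)
        else:
            self.deleted.discard(pk)

    def drop_segment(self, segment: str) -> None:
        """Simulate a segment being removed or compacted away."""
        for pk in self.segment_keys.pop(segment, set()):
            if pk in self.segment_counts:
                self.segment_counts[pk] -= 1

    def try_remove_metadata(self, pk: str) -> bool:
        """Drop metadata only for deleted keys that live in at most one segment."""
        if pk not in self.deleted or self.segment_counts.get(pk, 0) > 1:
            return False
        self.locations.pop(pk, None)
        self.segment_counts.pop(pk, None)
        self.deleted.discard(pk)
        return True
```

The count is what makes the removal safe in this model: while older segments still hold a record for the key, forgetting the delete marker could let a stale record become visible again.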
ClickHouse: How we built a new powerful JSON data type for ClickHouse
ClickHouse discusses the challenges of efficiently storing and processing semi-structured JSON data in a column-oriented database and introduces two key building blocks: the Variant and Dynamic types. The article details how these building blocks are used to implement the JSON type, which supports dynamically changing data, high-performance storage, scalability, and tuning options. It also showcases the advantages of the JSON type in terms of data compression and query performance and outlines the roadmap for future enhancements.
https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse
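As a conceptual illustration only, the sketch below flattens JSON documents into one column per path, which is the general idea behind storing semi-structured data in a columnar layout. It does not reflect ClickHouse's internal implementation; the function names are hypothetical, and plain Python lists stand in for typed columns.

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into dotted paths, e.g. {"b": {"c": 1}} -> {"b.c": 1}."""
    out: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def to_columns(docs: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Build one column per JSON path; rows that lack a path get None."""
    columns: Dict[str, List[Any]] = {}
    for row_index, doc in enumerate(docs):
        flat = flatten(doc)
        for path in flat:
            if path not in columns:
                # A path first seen at this row: backfill earlier rows with None.
                columns[path] = [None] * row_index
        for path, column in columns.items():
            column.append(flat.get(path))
    return columns
```

For example, `to_columns([{"a": 1, "b": {"c": "x"}}, {"a": "two", "d": 3.5}])` yields columns `a`, `b.c`, and `d`. Column `a` ends up holding both an integer and a string, which is the kind of mixed-type column a variant-style type is meant to represent.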
All rights reserved ProtoGrowth Inc, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.