Wix: The Emerging Economy of LLMs
The LLM economy, also known as the token economy, is emerging as agent-driven workflow tools spread across the industry. The author explains why tokens are the new currency in this economy.
I recently took a survey from a productivity tool I paid for, asking whether I’m willing to pay double the subscription cost to use the LLM feature. I was like, hell no. From a consumer perspective, I want to pay the same but expect a much better experience. I’m still on the fence about the LLM economy, but I’m optimistic the tools will be LLM-driven.
https://medium.com/wix-engineering/the-emerging-economy-of-llms-883f2ab13067
Apple: GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Can LLMs develop mathematical reasoning capabilities? The paper from Apple evaluates the current leading LLMs and concludes that they can’t. The key reasons are:
- LLMs rely on probabilistic pattern matching; instead of understanding the underlying mathematical concepts, they might simply replicate patterns observed in their training data.
- Small changes in input tokens can significantly alter model outputs, revealing token bias and fragility.
- LLM performance deteriorates as question complexity increases.
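To make the fragility point concrete, here is a minimal sketch of template-based evaluation in the spirit of the paper: the same word problem is regenerated with different names and numbers, and a model is scored across the variants. The template, names, number ranges, and the `model` callable are my own assumptions for illustration; none of them come from the GSM-Symbolic dataset.

```python
import random

# Hypothetical template: the wording, names, and number ranges are
# illustrative and are not taken from the paper's dataset.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{name} then gives away {c} apples. How many apples does {name} have left?"
)
NAMES = ["Ava", "Liam", "Noor", "Mateo"]


def make_variant(rng):
    """Fill the template with fresh values and return (question, answer)."""
    name = rng.choice(NAMES)
    a = rng.randint(5, 50)
    b = rng.randint(5, 50)
    c = rng.randint(1, a + b)  # keep the ground-truth answer non-negative
    return TEMPLATE.format(name=name, a=a, b=b, c=c), a + b - c


def accuracy(model, variants):
    """Share of variants answered exactly right.

    `model` is any callable that maps a question string to an integer.
    """
    correct = sum(1 for question, answer in variants if model(question) == answer)
    return correct / len(variants)


if __name__ == "__main__":
    rng = random.Random(0)
    for question, answer in [make_variant(rng) for _ in range(3)]:
        print(answer, "|", question)
```

If a model truly reasons about the arithmetic, its accuracy should stay roughly flat across variants; a large spread between runs is the kind of fragility the paper describes.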
https://machinelearning.apple.com/research/gsm-symbolic
Joe Reis: Field Notes, Early Fall 2024 Edition
Joe Reis provides a great overview of what is happening in the data industry. LLMs are entering the PoC phase; people are still confused but are upskilling. A key highlight 👇🏼
Data’s still a mess. Most data initiatives fail. Data teams are seen as a cost center and not getting the support they deserve. Same as it ever was.
Sponsored: IMPACT Summit
If you haven't registered for the IMPACT Summit yet, now's the perfect time 🔈
Here’s what we’ve got in store:
- A half-day virtual event created to elevate your 2025 data strategy
- Sessions jam-packed with industry experts sharing how they're driving data and AI adoption
- Practical tips and best practices from Monte Carlo customers
- Opportunities to connect and network with other data professionals
- Giveaways and raffles for attendees, including three All-Access subscriptions to DataExpert.io!
- And more!
What are you waiting for? Register for IMPACT today!
Uber: Genie - Uber’s Gen AI On-Call Copilot
Internal support is a disruptive but essential part of building a successful platform. Uber writes about Genie, a Gen AI on-call copilot. Genie eases that burden by providing quick and accurate answers to questions, retrieving relevant information from internal knowledge bases, and reducing the need for constant interaction with on-call engineers.
https://www.uber.com/blog/genie-ubers-gen-ai-on-call-copilot/
Grab: Leveraging RAG-powered LLMs for Analytical Tasks
Grab writes about Data-Arks, an internal platform that houses frequently used SQL queries and Python functions. Data-Arks serves as a vital component in integrating Large Language Models (LLMs) into the analytics workflow, streamlining processes like generating regular metric reports and conducting fraud investigations.
https://engineering.grab.com/transforming-the-analytics-landscape-with-RAG-powered-LLM
Jack Vanlightly: Table format comparisons - Change queries and CDC
Incremental data processing is vital for an efficient and cost-effective data infrastructure. The author categorizes the change queries that support it into four types: append-only, upsert, min-delta (CDC), and full-delta (CDC). The article explores how each table format handles these queries, analyzing their strengths and limitations.
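As a rough illustration of the difference between the two CDC flavors, here is a small sketch under my own reading of the terms: full-delta keeps every intermediate change, while min-delta reports only the net difference between two snapshots. The tuple-based change log and the function names are hypothetical and do not correspond to any table format's API.

```python
def apply_log(snapshot, log):
    """Replay a change log of (op, key, value) tuples onto a snapshot."""
    state = dict(snapshot)
    for op, key, value in log:
        if op == "delete":
            state.pop(key, None)
        else:  # "insert" and "update" both set the latest value
            state[key] = value
    return state


def min_delta(before, after):
    """Net changes needed to turn the `before` snapshot into `after`."""
    changes = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("insert", key, after[key]))
        elif key not in after:
            changes.append(("delete", key, before[key]))
        elif before[key] != after[key]:
            changes.append(("update", key, after[key]))
    return changes


if __name__ == "__main__":
    # The full-delta view is the log itself: every intermediate change.
    full_delta = [
        ("insert", "a", 1),
        ("update", "a", 2),
        ("insert", "b", 5),
        ("delete", "b", 5),
    ]
    before = {}
    after = apply_log(before, full_delta)
    print("full-delta entries:", len(full_delta))
    print("min-delta:", min_delta(before, after))
```

In this example the short-lived row `b` and the intermediate value of `a` appear only in the full-delta view, which is why the two query types have different storage and compute costs.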
https://jack-vanlightly.com/blog/2024/9/19/table-format-comparisons-change-queries-and-cdc
Lak Lakshmanan: What goes into bronze, silver, and gold layers of a medallion data architecture?
If I understand correctly, the gist of the article is where to position the common data model and metrics that can be used across the organization. I think these layers are a guiding principle rather than a strict framework. The common data models are considered the “core” domain, which is itself a kind of data mart. The article is a good reminder to focus on the “sharable core domain” in data modeling, regardless of whether you expand the medallion architecture.
Expedia: Enhancing Data Reliability With An SLO Platform
Expedia Group Technology designed a new SLO platform to enhance data reliability, leveraging Kafka for event streaming, PostgreSQL for data storage, and APIs for querying. The platform efficiently ingests and enriches data from multiple sources with internal metadata, providing near real-time access and seamless integration with DataDog for proactive monitoring and real-time alerting.
https://medium.com/expedia-group-tech/enhancing-data-reliability-with-an-slo-platform-de00249756f6
GumGum: Boosting Batch Scoring Efficiency with BigQuery ML and ONNX
GumGum’s data engineering team optimized batch scoring by integrating BigQuery ML with ONNX, streamlining a previously complex workflow. Moving scoring directly into BigQuery eliminated the need for external Python-based containers, reducing both time and costs. This solution leverages Scikit-Learn models in ONNX format, allowing efficient, SQL-based batch scoring directly in BigQuery, significantly improving scoring performance on large datasets.
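For a feel of what SQL-based scoring can look like, here is a minimal sketch that renders the two BigQuery ML statements involved: importing an ONNX file as a model and scoring a table with `ML.PREDICT`. The helper functions and all project, dataset, table, and bucket names are hypothetical, and this is not GumGum's actual code; it only builds SQL strings and does not call BigQuery.

```python
def create_onnx_model_sql(model_name, gcs_path):
    """Render a statement that imports an ONNX file as a BigQuery ML model."""
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (MODEL_TYPE = 'ONNX', MODEL_PATH = '{gcs_path}')"
    )


def batch_score_sql(model_name, input_table):
    """Render a statement that scores every row of a table with the model.

    The input table's column names are assumed to match the input names
    of the exported ONNX model.
    """
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`, TABLE `{input_table}`)"
    )


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    model = "my_project.my_dataset.scoring_model"
    print(create_onnx_model_sql(model, "gs://my-bucket/models/model.onnx"))
    print()
    print(batch_score_sql(model, "my_project.my_dataset.features"))
```

The appeal of this pattern is that the scoring step becomes a plain query over the warehouse table, so no separate container has to pull the data out and push predictions back.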
All rights reserved ProtoGrowth Inc, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.