dbt: 2024 State of Analytics Engineering
dbt's 2024 State of Analytics Engineering report is out. Poor data quality and unclear data ownership remain the top challenges for data teams. Data mesh continues to gain popularity among enterprises, which is a stark contrast to the Gartner report on data mesh. I guess only time will tell who wins the data mesh vs. data fabric war.
https://www.getdbt.com/resources/reports/state-of-analytics-engineering-2024
Matt Turck: Full Steam Ahead: The 2024 MAD (Machine Learning, AI & Data) Landscape
Continuing the week of insights into the data & AI landscape, the 2024 MAD landscape is out. The report points to three trends: the rise of LLMs makes unstructured data more important than ever, pressure on the Modern Data Stack will continue to intensify as the cost of integration remains high, and a “Modern AI Stack” is emerging.
https://mattturck.com/mad2024/
EvalPlus: EvalPlus Leaderboard - EvalPlus evaluates AI Coders with rigorous tests
Will AI replace coders? What will the future of software engineers be? EvalPlus builds a leaderboard to show how leading AI coder models perform under rigorous tests.
https://evalplus.github.io/leaderboard.html
Sponsored: Cloud Academy's Solution to Enhanced Embedded Analytics
Cloud Academy, a SaaS e-learning platform, needed to deliver a seamless, highly available embedded analytics experience for their enterprise customers. They knew they needed a flexible caching layer and zero-downtime deployments.
“With Cube, we’ve been able to speed up time to release a new data model to production by 5x and decrease analytics downtime by 90%.”
Dive deeper to learn how Cloud Academy sped up data model development and query performance with a semantic layer.
https://cube.dev/case-studies/cloud-academy-and-cube
Chase Roberts: Data Council 2024 - The future data stack is composable, and other hot takes
The author reflects on conversations at Data Council 2024, the most popular data conference in the USA. The emergence of the composable data stack and the open data stack is certainly an interesting trend to watch. A key highlight for me:
I spoke to multiple data people stuck in legacy systems and still inching their way to the cloud. VCs have moved on from data catalogs, yet practitioners told me they look forward to solving data discovery.
Pinterest: How we built Text-to-SQL at Pinterest
Last week, Intuit shared its key learnings from building Text-to-SQL, and now Pinterest publishes a technical deep dive on how its internal Text-to-SQL works. The highlight for me is:
There is an ongoing table standardization effort at Pinterest to add tiering for the tables. We index only top-tier tables, promoting the use of these higher-quality datasets.
I strongly believe the concept of data products will play a bigger role in data engineering. Data products will evidently become the foundation of trusted sources, which is essential to taking advantage of advances in LLMs.
https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff
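To make the "index only top-tier tables" idea concrete, here is a minimal sketch. It is not Pinterest's implementation: the `Table` class, the `tier` field (1 = highest quality), the table names, and the word-overlap scoring are all assumptions made for illustration. A real system would more likely rank candidates with embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    tier: int          # hypothetical convention: 1 = highest quality
    description: str


def build_index(tables, max_tier=1):
    """Keep only tables that meet the quality bar (lower tier number = better)."""
    return [t for t in tables if t.tier <= max_tier]


def search(index, question, top_k=2):
    """Rank indexed tables by word overlap between question and description."""
    q_words = set(question.lower().split())
    scored = []
    for t in index:
        overlap = len(q_words & set(t.description.lower().split()))
        if overlap:
            scored.append((overlap, t.name))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical catalog: the scratch table matches the question well,
# but it never reaches the index because of its tier.
TABLES = [
    Table("core.daily_active_users", 1, "daily active users by country"),
    Table("tmp.dau_backfill_test", 3, "daily active users backfill scratch table"),
    Table("core.ad_revenue", 1, "ad revenue by advertiser and day"),
]

index = build_index(TABLES)
candidates = search(index, "how many daily active users by country")
print(candidates)
```

The filter runs before retrieval, so a low-quality table cannot be suggested to the LLM no matter how well its description matches the question.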
Sponsored: Data Integration Leader Virtual Event Feat: Speakers from Doordash, LiveRamp, and Clearwater Analytics
Join us for this free data integration webinar featuring speakers Nikita (Director of Engineering at Doordash), Abhishek (Platform Architect at LiveRamp), and Darrel (Distinguished Engineer at Clearwater Analytics). RSVP now for practical advice on how to overcome complex data integration challenges such as:
Accelerating GenAI, analytics, & data product use cases/initiatives;
Accessing data from various sources and a variety of formats;
Transforming and preparing the data for downstream use cases;
Delivering data to various destinations with high quality and scale.
https://offers.nexla.com/data-integration-leader-series-webinar-04162024-1
Spotify: Data Platform Explained
In the ever-evolving landscape of data-driven decision-making, a well-structured data platform emerges as a critical asset. Spotify shares some of the critical triggers in an organization that lead to building a data platform.
https://engineering.atspotify.com/2024/04/data-platform-explained/
Replit: Building LLMs for Code Repair
Replit has developed a native AI model for code repair, leveraging the Language Server Protocol (LSP) diagnostics and operational transformations (OTs) to train a large language model (LLM) that fixes code errors directly within its IDE. This initiative aims to significantly reduce developers' time spent on debugging by improving the AI's understanding and interaction with the development environment. The model is trained using a dataset of code-diagnostic pairs and fine-tuned to predict line diffs that correct LSP-identified errors, showing promising results against larger models and existing benchmarks.
https://blog.replit.com/code-repair
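A rough sketch of what a code-diagnostic training pair with a line-diff target could look like. The `Diagnostic` class, the prompt layout, and the use of a unified diff from Python's `difflib` are my assumptions; the summary above does not specify Replit's actual diff format.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Diagnostic:
    line: int      # 1-based line number, as a language server might report it
    message: str


def make_training_pair(broken: str, fixed: str, diagnostic: Diagnostic) -> dict:
    """Build one (input, target) example: broken code plus diagnostic -> line diff."""
    diff = difflib.unified_diff(
        broken.splitlines(),
        fixed.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines: keep only the lines that change
    )
    return {
        "input": f"{broken}\n# diagnostic (line {diagnostic.line}): {diagnostic.message}",
        "target": "\n".join(diff),
    }


# Hypothetical example: a misspelled variable name flagged on line 2.
broken = "total = 1 + 2\nprint(totl)"
fixed = "total = 1 + 2\nprint(total)"
pair = make_training_pair(broken, fixed, Diagnostic(2, '"totl" is not defined'))
print(pair["target"])
```

Predicting a small diff rather than the whole corrected file keeps the target short and focused on the diagnosed line.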
Hussein Jundi: Data Engineering - Architectures & Strategies for Handling Sensitive Data
The rapid adoption of AI challenges data engineering teams to design systems that handle sensitive data. The author writes a comprehensive article on strategies for handling sensitive data, the maturity levels of organizations, and how the solution differs at each maturity level.
Picnic: YAML developers and the declarative data platforms
Forget the Modern Data Stack; have you ever wondered what a Declarative Data Stack is? The blog uses SQL as evidence of the success of a declarative language. For lack of better wording, we should further classify declarative languages as dynamic and static. SQL is a dynamic declarative language in which one can express complex constraints, whereas YAML is pretty much a static rule engine.
https://blog.picnic.nl/yaml-developers-and-the-declarative-data-platforms-4719b7a1311c
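One way to picture the static vs. dynamic distinction, as I read it: a static rule fixes its values in config, while a SQL constraint can be computed from the data itself. In this sketch, a plain Python dict stands in for a YAML file, and the `orders` table and threshold are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 40.0), (3, 100.0), (4, 500.0)],
)

# Static rule (YAML-like): the threshold is a literal written into config.
static_rule = {"table": "orders", "column": "amount", "min": 50.0}
static_sql = (
    f"SELECT id FROM {static_rule['table']} "
    f"WHERE {static_rule['column']} >= ? ORDER BY id"
)
static_ids = [row[0] for row in conn.execute(static_sql, (static_rule["min"],))]

# Dynamic rule (SQL): the threshold is derived from the data at query time.
dynamic_sql = (
    "SELECT id FROM orders "
    "WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY id"
)
dynamic_ids = [row[0] for row in conn.execute(dynamic_sql)]

print(static_ids, dynamic_ids)
```

The static rule keeps selecting everything above 50 regardless of what else is in the table, while the SQL rule moves with the average as new orders arrive.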
All rights reserved ProtoGrowth Inc, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent current, former, or future employers’ opinions.