Try Fully Managed Apache Airflow for FREE
Astro is the fully managed DataOps platform powered by Apache Airflow. With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission-critical data is delivered on time.
JetBrains: State of Developer Ecosystem Report 2024
JetBrains published its annual developer survey, and it offers plenty of insight into how developers are adopting various programming languages. TypeScript, Python, and Rust are the fastest-growing languages, while others are holding their positions.
https://www.jetbrains.com/lp/devecosystem-2024/
Christina Garcia: AI Agents Survey Results
“Agents all the way” is a popular prediction for 2025. This blog post captures the current state of agent adoption, emerging software engineering roles, and the categories of agent use cases.
https://yougot.us/news/2024-12-28-AI-Agents-Survey-Results
Chip Huyen: Agents
A comprehensive overview of what an agent is, covering the environment it operates in, the tools available to it, the types of tools and their impact, and more. If you are starting out with agent development, this is a must-read.
https://huyenchip.com//2025/01/07/agents.html
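Chip's article frames an agent by the environment it operates in and the tools it can use to act on that environment. As a rough illustration of that framing, here is a minimal plan, act, observe loop. It is a sketch under heavy assumptions: the hard-coded `plan` function stands in for an LLM planner, and both tools (`calculator`, `lookup`) are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical tools: plain functions the agent can call to act on its environment.
def calculator(expression: str) -> str:
    """Adds two integers written as 'a + b' (no eval, to keep the sketch safe)."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

def lookup(term: str) -> str:
    """Looks a term up in a tiny in-memory glossary."""
    glossary = {"airflow": "workflow orchestrator", "ray": "distributed compute framework"}
    return glossary.get(term.strip().lower(), "unknown")

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator, "lookup": lookup}

def plan(task: str, observations: list[str]) -> Optional[tuple[str, str]]:
    """Stand-in for an LLM planner: returns (tool_name, tool_input), or None when done."""
    if observations:
        return None  # these toy tasks need only a single tool call
    return ("calculator", task) if "+" in task else ("lookup", task)

def run_agent(task: str, max_steps: int = 5) -> list[str]:
    """Plan -> act -> observe loop, capped at max_steps to avoid runaway agents."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = plan(task, observations)
        if action is None:
            break
        tool_name, tool_input = action
        observations.append(TOOLS[tool_name](tool_input))
    return observations

print(run_agent("2 + 3"))  # ['5']
print(run_agent("Ray"))    # ['distributed compute framework']
```

In a real agent, the planner's choice of tool and input comes from a model rather than an `if` statement, which is exactly where the tool descriptions and their impact on reliability, discussed in the article, come into play.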
Sponsored: Apache Airflow® Best Practices: Running Airflow at Scale
The scalability of Airflow is why data teams at companies like Uber, Ford, and LinkedIn choose it to power their data ops. Learn practical strategies to optimize Airflow performance and streamline operations:
- Fine-tune configurations to enhance workflow efficiency
- Automate Airflow deployments and manage users seamlessly
- Monitor system health with advanced observability tools and alerts
Join this live session and learn how to scale Airflow efficiently.
Chirag Shah & Ryen W. White: Agents Are Not Enough
The paper arrives at an interesting time, just as agents have taken center stage. It argues that agents alone are insufficient for widespread success due to limited value generation, a lack of adaptable personalization, trustworthiness concerns, social acceptability concerns, and a lack of standardization. To address these shortcomings, the authors propose a new ecosystem that includes Sims (user representations) and Assistants (user-facing programs) alongside agents, enabling better privacy, personalization, and interaction.
https://arxiv.org/pdf/2412.16241
InfoQ: Key Takeaways from QCon & InfoQ Dev Summits with a Look ahead to 2025 Conferences
Conferences are a great way to interact with peers and explore new ideas, and I always appreciate a good summary of what a conference taught. I would love to attend QCon, but the price is too high, so I’m glad to see an excellent overview of QCon 2024.
https://www.infoq.com/news/2024/12/takeaways-qcon-infoq-dev-summits/
Netflix: A Survey of Analytics Engineering Work at Netflix
Netflix publishes the third part of its series on talks from its internal analytics summit. Part 3 covers dashboard design tips, learnings from deploying an analytics API at Netflix, and a guest talk from Benn Stancil. Parts 1 and 2 of the series are worth reading as well.
https://netflixtechblog.com/part-3-a-survey-of-analytics-engineering-work-at-netflix-e67f0aa82183
Uber: How Uber Uses Ray® to Optimize the Rides Business
Uber writes about a hybrid Spark and Ray system to optimize the budget allocation of its ride-sharing business. Facing performance bottlenecks with their existing Spark-based system, Uber leveraged Ray's Python parallel processing capabilities for significant speed improvements (up to 40x) in their optimization algorithms.
https://www.uber.com/blog/how-uber-uses-ray-to-optimize-the-rides-business/
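Uber's exact algorithm isn't reproduced here. As a hypothetical sketch of why such optimizations parallelize well, assume each city's budget allocation is an independent job: the work can then fan out across workers and fan back in as one result. The greedy diminishing-returns model and every name below are assumptions. The standard library's `ThreadPoolExecutor` stands in for Ray only to show the fan-out/fan-in shape; for CPU-bound Python you would need processes or Ray remote tasks (`@ray.remote` plus `ray.get`) to get real parallel speedups past the GIL.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def allocate_city(job: tuple[str, int, list[float]]) -> tuple[str, list[int]]:
    """Greedily spends budget units on the lever with the highest current return.

    Hypothetical model: a lever's return halves each time it receives a unit.
    """
    city, budget_units, base_returns = job
    heap = [(-r, i) for i, r in enumerate(base_returns)]  # max-heap via negation
    heapq.heapify(heap)
    allocation = [0] * len(base_returns)
    for _ in range(budget_units):
        neg_return, lever = heapq.heappop(heap)
        allocation[lever] += 1
        heapq.heappush(heap, (neg_return / 2, lever))  # diminishing returns
    return city, allocation

def allocate_all(jobs, max_workers: int = 4) -> dict[str, list[int]]:
    """Fans out independent per-city jobs, then fans the results back in."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(allocate_city, jobs))

jobs = [("sf", 3, [8.0, 5.0]), ("nyc", 2, [1.0, 3.0])]
print(allocate_all(jobs))  # {'sf': [2, 1], 'nyc': [0, 2]}
```

Because no job reads another job's state, swapping the executor for a distributed scheduler changes where the work runs without changing the per-city logic, which is the property that makes a hybrid Spark and Ray setup attractive.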
Canva: The foundations of Canva’s continuous data platform with Snowpipe Streaming
Canva writes about its migration from AWS Data Firehose to Snowpipe Streaming, driven by costs that consumed nearly 50% of its data platform budget. Despite implementation challenges with the Kinesis integration, the switch proved successful, resulting in a 45% reduction in cloud spending, better query performance, and no further need for intermediary S3 storage.
https://www.canva.dev/blog/engineering/snowpipe-streaming/
Gradient Flow: Paradigm Shifts in Data Processing for the Generative AI Era
Data processing pipelines haven't kept pace with the rapid advancement of AI models.
The article highlights the growing importance of data preprocessing pipelines and argues that current processing techniques have not kept up with demand. Generative AI requires processing vast amounts of diverse, unstructured data (e.g., meeting recordings and videos), in contrast to traditional SQL-centric systems built for structured data. This fundamental shift from SQL-centric to AI-centric data processing has further widened the efficiency gap.
https://gradientflow.substack.com/p/paradigm-shifts-in-data-processing
Jack Vanlightly: Table format interoperability, future or fantasy?
Last week, I had an interesting conversation with Vinoth Chandar, founder and CEO of Onehouse. Vinoth made excellent points about the challenges of adopting an open table format and the need for formats to evolve toward a LakeDB. Apache XTable and Delta Lake UniForm attempt to provide interoperability among table formats, but is that a reality? The author rightly questions whether cross-publishing has long-term benefits, since merely copying metadata won’t deliver equivalent performance across formats.
https://jack-vanlightly.com/blog/2024/9/26/table-format-interoperability-future-or-fantasy
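Tools like XTable and UniForm provide interoperability by generating another format's metadata over the same underlying data files rather than rewriting the data. The toy sketch below uses simplified, hypothetical metadata structures (not the real Delta Lake, Iceberg, or Hudi schemas) to show why that approach is cheap but lossy: file paths and basic statistics carry over, while format-specific features the target cannot express are left behind.

```python
from dataclasses import dataclass, field

@dataclass
class DataFile:
    path: str
    record_count: int
    column_bounds: dict[str, tuple[int, int]]  # column -> (min, max)

@dataclass
class SourceTable:
    """Hypothetical, simplified source-format metadata."""
    files: list[DataFile]
    features: set[str] = field(default_factory=set)  # format-specific features in use

# Hypothetical: the subset of source features the target format can express.
TARGET_SUPPORTED = {"column_bounds"}

def translate(src: SourceTable) -> tuple[list[dict], list[str]]:
    """Copies file-level metadata into target-style manifest entries.

    Data files are referenced by path and never rewritten; features the
    target cannot express are reported as dropped.
    """
    manifest = [
        {"file_path": f.path, "record_count": f.record_count, "bounds": f.column_bounds}
        for f in src.files
    ]
    dropped = sorted(src.features - TARGET_SUPPORTED)
    return manifest, dropped

src = SourceTable(
    files=[DataFile("s3://bucket/tbl/part-0.parquet", 100, {"ts": (1, 50)})],  # hypothetical path
    features={"column_bounds", "custom_clustering"},  # hypothetical feature names
)
manifest, dropped = translate(src)
print(manifest[0]["file_path"])  # s3://bucket/tbl/part-0.parquet
print(dropped)                   # ['custom_clustering']
```

Whatever lands in `dropped` is the gap Jack points at: a reader of the target format sees the same files but loses whatever layout or feature-specific optimizations the source format relied on.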
All rights reserved ProtoGrowth Inc, India. I have provided links for informational purposes and do not suggest endorsement. All views expressed in this newsletter are my own and do not represent the opinions of my current, former, or future employers.