Data Engineering Weekly
Knowledge, Metrics, and AI: Rethinking the Semantic Layer with David Jayatillake

From BI Lock-In to Invisible Semantics – The Future of the Semantic Layer

Semantic layers have been with us for decades—sometimes buried inside BI tools, sometimes living in analysts’ heads. But as data complexity grows and AI pushes its way into the stack, the conversation is shifting. In a recent conversation with David Jayatillake, a long-time data leader with experience at Cube, Delphi Labs, and multiple startups, we explored how semantic layers are moving from BI lock-in to invisible, AI-driven infrastructure—and why that matters for the future of metrics and knowledge management.


What Exactly Is a Semantic Layer?

Every company already has a semantic layer. Sometimes it’s software; sometimes it’s in people’s heads. When an analyst translates a stakeholder’s question into SQL, they’re acting as a human semantic layer. A software semantic layer encodes this process so SQL is generated consistently and automatically.

David’s definition is sharp: a semantic layer is a knowledge graph plus a compiler. The knowledge graph stores entities, metrics, and relationships; the compiler translates requests into SQL.


From BI Tools to Independent Layers

BI tools were the first place semantic layers showed up: Business Objects, SSAS, Looker, and Power BI. This works fine for smaller orgs, but quickly creates vendor lock-in for enterprises juggling multiple BI tools and warehouses.

Independent semantic layers emerged to solve this. By abstracting the logic outside BI, companies can ensure consistency across Tableau, Power BI, Excel, and even embedded analytics in customer-facing products. Tools like Cube and dbt metrics aim to play that role.


Why Are They Hard to Maintain?

The theory is elegant: define once, use everywhere. But two big issues keep surfacing:

  1. Constant change. Business definitions evolve. A revenue formula that works today may be obsolete tomorrow.

  2. Standardization. Each vendor proposes its own standard—dbt metrics, LookML, Malloy. History tells us one “universal” standard usually spawns another to unify the rest.

Performance complicates things further—BI vendors optimize their compilers differently, making interoperability tricky.


Culture and Team Ownership

A semantic layer is useless without cultural buy-in. Product teams must emit clean events and define success metrics. Without that buy-in, the semantic layer starves.

Ownership varies: sometimes product engineering owns it end-to-end with embedded data engineers; other times, central data teams or hybrid models step in. What matters is aligning metrics with product outcomes.


Data Models vs. Semantic Layers

Data modeling approaches—Kimball-style dimensional modeling, Data Vault—make data neat and joinable. But models alone don’t enforce consistent definitions. Without a semantic layer, organizations drift into “multiple versions of the truth.”


Beyond Metrics: Metric Trees

Semantic layers can also encode metric trees—hierarchies explaining why a metric changed. Example: revenue = ACV × deals. If revenue drops, metric trees help trace whether ACV or deal count is responsible. This goes beyond simple dimension slicing and powers real root cause analysis.
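The revenue = ACV × deals example can be sketched as a small root-cause calculation. The function and the numbers below are made up for illustration; for a multiplicative tree, log-ratios attribute the change exactly:

```python
import math

# Hedged sketch of metric-tree root cause analysis for the episode's
# multiplicative example: revenue = acv * deals. Numbers are illustrative.

def attribute_change(before: dict, after: dict) -> dict:
    """Attribute the relative change in revenue to each driver.
    Log-ratios of the drivers sum exactly to the log-ratio of revenue,
    so each share says how much of the move that driver explains."""
    total = math.log(after["acv"] * after["deals"]) - math.log(before["acv"] * before["deals"])
    return {
        driver: (math.log(after[driver]) - math.log(before[driver])) / total
        for driver in ("acv", "deals")
    }

before = {"acv": 10_000, "deals": 50}  # revenue = 500,000
after = {"acv": 10_000, "deals": 40}   # revenue = 400,000

shares = attribute_change(before, after)
# deals explains ~100% of the drop; acv is flat
```

Because the semantic layer already knows the parent-child relationships between metrics, this kind of decomposition can be run automatically whenever a metric moves, instead of an analyst slicing dimensions by hand.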


Where AI Changes the Game

Maintaining semantic layers has always been their weak point. AI changes that:

  • Dynamic extensions: AI can generate new metrics on demand.

  • Governance by design: Instead of hallucinating answers, AI can admit “I don’t know” or propose a new definition.

  • Invisible semantics: Users query in natural language, and AI maintains and optimizes the semantic layer behind the scenes.

Executives demanding “AI access to data” are accelerating this shift. Text-to-SQL alone fails without semantic context. With a semantic layer, AI can deliver governed, consistent answers instantly.
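One way to picture "governance by design" is an assistant that only answers through known metric definitions and declines otherwise. This is a minimal sketch under assumed names (`KNOWN_METRICS`, `answer`), not how any particular product works:

```python
# Sketch: instead of free-form text-to-SQL, resolve the question against
# the semantic layer's known metrics and refuse when nothing matches,
# rather than hallucinating a definition. KNOWN_METRICS is hypothetical.

KNOWN_METRICS = {"revenue", "churn", "deals"}

def answer(question: str) -> str:
    matches = [m for m in KNOWN_METRICS if m in question.lower()]
    if not matches:
        return "I don't know this metric yet. Want to propose a definition?"
    return f"Compiling governed SQL for: {', '.join(sorted(matches))}"

print(answer("What was revenue last quarter?"))
# Compiling governed SQL for: revenue
print(answer("What is our NPS?"))
# I don't know this metric yet. Want to propose a definition?
```

A real system would match on synonyms and embeddings rather than substrings, but the design choice is the same: the semantic layer, not the model, is the source of truth.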


Standardization Might Not Matter

Will the industry settle on a single semantic standard? Maybe not—and that’s okay. Standards like the Model Context Protocol (MCP) allow AI to translate across formats. SQL remains the execution layer, while semantics bridge business logic. Cube, dbt, Malloy, or Databricks metric views can all coexist if AI smooths the edges.


When Do You Need One?

Two clear signals:

  1. Inconsistency: Teams struggle to agree on fundamental metrics such as revenue or churn.

  2. Speed: Stakeholders wait weeks for analyst queries that could be answered in seconds with a semantic layer plus AI.

If either pain point resonates, it’s time to consider a semantic layer.


Looking Ahead

David sees three big shifts coming soon:

  • Iceberg as a universal storage format—true multi-engine querying across DuckDB, Databricks, and others.

  • Invisible semantics. Baked into tools, maintained by AI, no more “selling” semantic layers.

  • AI-native access. Semantic layers become the primary interface between humans, AI, and data.


Final Thoughts

Semantic layers aren’t new—they’ve quietly powered BI tools and lived in analysts’ heads for years. What’s new is the urgency: executives want AI to answer questions instantly, and that requires a consistent semantic foundation. As David Jayatillake reminds us, the journey is from BI lock-in to invisible semantics—semantic layers that are dynamic, governed, and maintained by AI. The question is no longer if your organization needs one, but when you’ll make the shift—and whether your semantic layer will keep pace with the AI-driven future of data.
