TL;DR. The semantic layer covers what a metric is. The context layer covers why anyone defined it that way. Most teams have the first. Few have the second. At the dinner we hosted with Orchestra last week, fifteen data leaders agreed the gap is real, and split twice on how to close it.
“dbt descriptions are a great way to describe what you’re doing but useless for explaining why.”
That came up at a dinner we co-hosted with Orchestra in May, at The Ivy Granary Square in King’s Cross. Fifteen people around one table, a good mix of CTOs and data leaders from European scale-ups alongside heads of data from larger corporates. No slides, no pitches, Chatham House Rule.
Two months ago, after the Omni dinner, we landed on a three-layer model and called the third one, the context layer, the one nobody had cracked. Since then we’ve gone deep on the second layer, in our piece on where the semantic layer should actually live. We thought last week’s dinner would close the loop on the third layer.
It didn’t. The room agreed on the problem and split on the answer, twice. This piece walks through both splits and the open questions underneath them.

Where context lives today
My take: nobody has built a context layer yet. Not really.
The context lives in the 500-page Notion nobody reads, the 50 Slack channels with metric definitions buried in DMs, and the data team’s heads. Every senior analyst is a context store with legs. They leave, it leaves. Sometimes it was never captured at all: the notes from an argument 18 months ago between finance and growth are missing, and there’s nothing to retrieve.
The night gave me two cleaner ways to frame this.
First, ours: treat documentation as the thing someone wrote down at the time. Context, on the other hand, is what they actually meant.
Second, Hugo Lu put the same point from the engineering side. Every team running dbt Core has thousands of model descriptions nobody reads. That’s metadata pretending to be a context layer. Piling everything into description blocks is an excuse, not a strategy.
This is why “put it in dbt descriptions” keeps coming up as the wrong answer. dbt descriptions are fine for what. They are useless for why. The why is what the AI agent will get wrong, and what the new starter will spend three months reverse-engineering. Benn Stancil makes the same point: you can’t onboard an analyst by giving them logins to a bunch of tools, and you can’t make a good AI bot by giving it the same access. Both need to be taught how the business works.
Fault line one: is tacit context a feature or a bug?
This is where Tasman and Orchestra split (only a little, we promise!).
What an engineer might say: the only context an agent can use is context expressed as code. The rest is roleplay. If your context can’t be tested, your agent can’t be trusted with it. Anything that lives in someone’s head, in a Slack thread, in a half-written Notion page is institutional folklore, not infrastructure.
Tasman’s position: a lot of context is tacit and social. It lives between people, not in systems. Trying to encode all of it as code is the wrong goal. Once you encode it, you lose what you wanted in the first place: the trade-offs, the disagreements, the parts people deliberately left ambiguous. Some context belongs in code. Some belongs in a decision log. Some belongs in a five-minute conversation that nobody bothered to minute.
Both views work in different conditions. Both have failure modes we’ve watched.
The “context-as-code or it doesn’t count” failure mode: brittle, exhaustive, and nine months out of date because nobody refactored when the business shifted. We’ve seen Atlan setups that were technically immaculate and three months stale.
The “context is social, leave it tacit” failure mode: your senior analyst leaves, half the institutional knowledge leaves with her, and the AI agent confidently fills the gap with whatever it inferred from the schema.
Where we’re hedging. Our working bet, without enough evidence to defend it yet, is that the answer is layered. A thin, tested, code-expressed layer for the metrics that matter most: LTV, CAC, gross margin, active user. A lightweight written layer (decision logs, metric changelogs) for the surrounding why. And an explicit acceptance that some context will stay social, with the cost of that recorded on the risk register rather than hidden. Atlan’s context graphs and Omni’s ai_context fields are both early attempts at the code-expressed end. Neither solves the social bit. We haven’t met anyone who has.
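To make the layered bet concrete, here is a minimal Python sketch. It is not something we run with clients, and everything in it is an assumption made for illustration: the `Decision` and `MetricContext` classes, the `check_metric` helper, and the example values. The idea is only that the what, the written why, and the context deliberately left social can sit on one record, and that a missing why can fail a test.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Decision:
    """One written-layer entry: what was decided and why."""
    summary: str
    rationale: str


@dataclass
class MetricContext:
    """A metric with all three layers attached (hypothetical structure)."""
    name: str
    definition: str                                           # code-expressed what
    decisions: list[Decision] = field(default_factory=list)   # written why
    tacit_risks: list[str] = field(default_factory=list)      # context left social


def check_metric(metric: MetricContext) -> list[str]:
    """Return the problems found; an empty list means the metric passes."""
    problems = []
    if not metric.definition.strip():
        problems.append(f"{metric.name}: no definition")
    if not metric.decisions:
        problems.append(f"{metric.name}: no recorded decision behind the definition")
    return problems


# Illustrative values only, not a real client definition.
ltv = MetricContext(
    name="ltv",
    definition="revenue per customer over lifetime, trial cohort excluded",
    decisions=[Decision(
        summary="Exclude the trial cohort from LTV",
        rationale="Placeholder: record which side won the argument and why",
    )],
    tacit_risks=["Only one senior analyst knows how refunds are treated"],
)
cac = MetricContext(name="cac", definition="acquisition spend / new customers")

print(check_metric(ltv))  # []
print(check_metric(cac))  # ['cac: no recorded decision behind the definition']
```

The `tacit_risks` list is the part we care about most. It does not encode the social context. It only records that the context exists and who holds it, which is the risk-register entry described above.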
How this breaks AI in production
The worked example that came up most often.
Someone asks an agent about LTV. It returns an answer. It is confident. It is wrong.
The agent isn’t wrong because it doesn’t understand the data. The schema is fine, the SQL is fine, the semantic layer may even be fine. It is wrong because it doesn’t know about the argument 18 months ago between finance and growth, and which side won. It doesn’t know that marketing reverted to the old definition last quarter and never updated the dashboard label. It doesn’t know that the CFO no longer trusts any LTV number that includes the trial cohort.
The semantic layer encodes the what. The why (the trade-off, the decision, the side that won) lives somewhere else, or nowhere.
A useful reframe from the room: vague metric definitions are an inconvenience to humans. They are dangerous to agents. Humans hedge, ask, double-check. Agents commit. Hugo’s version: if your context can’t be tested, your agent can’t be trusted with it. The agent resolves the ambiguity without flagging it, and you read the wrong number in a board pack a week later.
This is the failure pattern a16z wrote up and the one we’ve seen across more than sixty client engagements. The data is fine. The models are fine. The strategy is fine. The why behind each metric definition isn’t written down anywhere.
Where we’re hedging. Better models don’t fix this. More context does. Add it to the prompt, to the scaffolding, to the systems the agent can touch. Heavier-scoped agents outperform general-purpose ones. Constraints help. Freedom doesn’t. None of that gets you to a working context layer on its own. It stops the failure mode from being catastrophic while you build one.

Fault line two: what does “production” even mean?
The second topic of the evening, and the second place we didn’t agree.
Orchestra holds the rigorous engineering position: production is whatever your control plane can see and react to. Three things: it runs, it tells you when it doesn’t, and you can roll it back. Most stacks calling themselves production fail on at least two. Agentic workloads make it worse: non-deterministic outputs, no rollback, observability built for a world where the same inputs produced the same outputs. “Users haven’t complained” is how data teams get blindsided. Regulated industries are ahead on this because they get fined when they’re not.
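The three-things rule is simple enough to write down as a checklist. The sketch below is ours, not Orchestra’s tooling: `PipelineStatus`, `failed_checks` and `is_production` are hypothetical names, and the booleans would need to come from whatever your orchestrator and alerting actually report.

```python
from dataclasses import dataclass


@dataclass
class PipelineStatus:
    """Hypothetical summary of one pipeline against the three-things rule."""
    name: str
    runs_on_schedule: bool    # it runs
    alerts_on_failure: bool   # it tells you when it doesn't
    can_roll_back: bool       # you can roll it back


def failed_checks(pipeline: PipelineStatus) -> list[str]:
    """List the checks this pipeline fails, in a fixed order."""
    checks = {
        "runs": pipeline.runs_on_schedule,
        "alerts on failure": pipeline.alerts_on_failure,
        "rollback": pipeline.can_roll_back,
    }
    return [label for label, passed in checks.items() if not passed]


def is_production(pipeline: PipelineStatus) -> bool:
    """Production, in this narrow sense, means passing all three checks."""
    return not failed_checks(pipeline)


# Illustrative pipeline: it runs, but nobody is told when it breaks.
revenue = PipelineStatus(
    name="revenue_daily",
    runs_on_schedule=True,
    alerts_on_failure=False,
    can_roll_back=False,
)
print(failed_checks(revenue))  # ['alerts on failure', 'rollback']
print(is_production(revenue))  # False
```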
At Tasman, we’ve learned over the years that the engineering view is true, but that production is also a trust property. A pipeline isn’t in production. A relationship between the data team and the business is in production. The pipeline is plumbing. Going live isn’t shipping a dashboard. Going live is when someone makes a decision differently because of the dashboard, without going back to the analyst to double-check. Most companies have thirty “production” pipelines and three numbers they trust.
Both ladder up to the same conclusion. Most things people call production aren’t.
The implications diverge. The engineering frame says invest in observability, contracts, rollback, testing: the control plane. Ours says invest in the business relationship: the brief, the metric-definition workflow, the analyst on the call when the number comes up. In practice you need both. Most teams have neither. Start with whichever gap is biting hardest.
Where we’re hedging. Nobody knows how to test agentic outputs in a way that scales. Hugo’s three-things rule is hard enough for deterministic pipelines. For agents, where the same input produces a different answer next Tuesday, it gets harder still. “AI in production” is a term we’re using because we don’t have a better one, not because anyone in the room knew what it meant.
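One crude probe we can at least sketch: ask the agent the same question several times and measure how often it gives the same answer. This is an assumption-heavy illustration, not a testing method that scales. `agreement_rate` and `make_stub_agent` are hypothetical helpers, and the stub stands in for a real agent call. Agreement is not correctness: an agent that is missing the why can be wrong every single time and still score 1.0.

```python
from collections import Counter
from itertools import cycle
from typing import Callable


def agreement_rate(ask: Callable[[str], str], question: str, runs: int = 5) -> float:
    """Ask the same question `runs` times; return the share of runs
    that gave the most common answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [ask(question).strip() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


def make_stub_agent(canned: list[str]) -> Callable[[str], str]:
    """Stand-in for a real agent: cycles through canned answers in order."""
    answers = cycle(canned)
    return lambda _question: next(answers)


# The stub answers 120, 120, 95, 120, 120, 95 across six runs.
stub = make_stub_agent(["120", "120", "95"])
rate = agreement_rate(stub, "What is our LTV?", runs=6)
print(f"{rate:.2f}")  # 0.67
```

A low score tells you the agent is resolving an ambiguity differently on each run. A high score tells you nothing about whether it resolved it the right way.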
What we’re testing over the next few months
We’re trying three things with clients right now.
The first is a lightweight rules.md-style file dropped into the agent’s context window. Crude, but the Nao team’s benchmarks suggest a simple rules file outperforms more formal setups. We’ve seen the same.
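As a rough sketch of how little machinery the rules-file approach needs, assuming a plain text file and a prompt assembled by string formatting: `load_rules`, `build_prompt`, the fallback message and the example rules are all hypothetical, not taken from any particular agent framework.

```python
from pathlib import Path


def load_rules(path: Path) -> str:
    """Read the rules file; return an empty string if it does not exist."""
    return path.read_text(encoding="utf-8").strip() if path.is_file() else ""


def build_prompt(question: str, rules: str) -> str:
    """Put the business rules ahead of the question so the agent reads them first."""
    if not rules:
        # Make the gap visible instead of silently sending a context-free prompt.
        rules = "(no rules file found; flag any metric whose definition is ambiguous)"
    return f"Business rules:\n{rules}\n\nQuestion: {question}"


# Illustrative rules, written the way an analyst would explain them to a new starter.
example_rules = (
    "- LTV excludes the trial cohort.\n"
    "- If a metric definition is ambiguous, ask before answering."
)
print(build_prompt("What is our LTV?", example_rules))
```

In practice the rules would come from `load_rules(Path("rules.md"))`, with the file kept in the same repository as the models so that it changes in the same pull requests.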
The second is embedding ai_context fields directly in the semantic layer where the BI tool supports it. This is the Omni pattern. This works well when the analytics team already lives in the tool, which is the argument we made in the semantic layer location piece.
The third is treating the metric-definition workflow as the context-capture moment. PR template, decision log, metric changelog. The least sexy of the three. The most durable, because nobody has to remember to write things down separately.
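A sketch of what the PR-template check could look like, assuming the template uses `## ` headings and that a CI step can read the PR description as text. The section names in `REQUIRED_SECTIONS` and both helper functions are hypothetical. The point is only that an empty why can block a metric change the same way a failing test does.

```python
import re

# Hypothetical template sections; adapt to whatever your PR template uses.
REQUIRED_SECTIONS = ("What changed", "Why", "Who decided")


def parse_sections(pr_body: str) -> dict[str, str]:
    """Split a PR description into {heading: body} using '## Heading' lines."""
    sections: dict[str, str] = {}
    current = None
    for line in pr_body.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {heading: body.strip() for heading, body in sections.items()}


def missing_sections(pr_body: str) -> list[str]:
    """Return required sections that are absent or left empty."""
    sections = parse_sections(pr_body)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]


# Illustrative PR description with the why left blank.
example_pr = """## What changed
LTV now excludes the trial cohort.

## Why

## Who decided
Finance and growth
"""
print(missing_sections(example_pr))  # ['Why']
```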
None of these are the answer. They’re testable hypotheses, and a few will turn out to be wrong.
The bigger bet, which we won’t be able to validate this year: AI deployment costs will make the why-behind-the-metric expensive enough that organisations write it down. Hex’s State of Data Teams 2026 puts data trust as the number-one concern around AI adoption. Whether that pressure overcomes the documentation-incentive problem is the question that decides which approach scales.
Closing thought
Documentation is what someone wrote down at the time. Context is what they meant. We have good tools for the first. We have almost nothing for the second. That is the work for the next few years. The teams that build it well will be the ones whose AI analytics gets used in earnest rather than admired in pilot.
Thanks to Hugo and the Orchestra team for co-hosting, and to everyone who came. We’re running more of these dinners through 2026. If you’d like to be on the list for the next one, get in touch.

Frequently asked
What is the context layer in data analytics?
The context layer captures why a metric is defined the way it is, who defined it, what changed and when, and what the disagreements were. It sits above the semantic layer, which captures the what. Most teams have a partial semantic layer. Few have a context layer at all.
Why does AI need a context layer?
An AI agent answering a question about a metric doesn’t know about the argument 18 months ago between finance and growth, or which side won. Without that context, the agent infers a meaning, runs the SQL, and returns a confident wrong answer. Better models don’t fix this. More context does.
Who owns the context layer?
Nobody owns it by default, and that’s the problem. At Tasman we argue the analytics team should own the semantic layer because they’re on the hook when a definition drifts. The context layer is harder because the why behind a metric isn’t an analytics artefact; it’s a business artefact. Our working bet is on a workflow-based answer: PR templates, decision logs, metric changelogs. We don’t have a settled view yet.
What does AI in production mean for a data team?
It depends who you ask. Orchestra’s view: production is whatever the control plane can see and react to (it runs, it tells you when it doesn’t, you can roll it back). Tasman’s view: production is a trust property (the business uses the output without re-checking). Both views agree that most things called production aren’t.