AI · Analytics · Data Strategy

The Model Was Never the Hard Part

A frontier model alone answered real analytics questions at 21% accuracy. With context, it jumped past 95%. The number tells you where the value in AI analytics actually lives, and it isn’t the prompt.

Dashfeed Research · 7 min read · June 5, 2026

The number that matters isn’t 95%

Anthropic recently published how its own data team automated 95% of business analytics queries with Claude, at roughly 95% accuracy. It’s worth reading in full. But the most important figure in the piece isn’t 95%. It’s 21%. By Anthropic’s own measurement, the model on its own, without the surrounding context, didn’t exceed 21% accuracy on their evals. Adding context took it consistently above 95%. Same model, same warehouse, same questions. The only variable was the context the model could draw on.

The model was never the hard part

There’s a comforting myth that better models will eventually make analytics trivial: point a smart enough AI at your warehouse, ask a question in plain English, get the right answer back. Anthropic, the company building one of the most capable models on earth, just measured what that delivers on its own. Twenty-one percent.

A raw natural-language-to-SQL layer bolted onto a warehouse lives in that 21% world. It doesn’t know which of your revenue columns is the canonical one. It doesn’t know that “active customer” has a specific definition your team agreed on last quarter. It doesn’t know the table was stale this morning. So it does what models do when they lack context: it returns a confident, well-formatted, wrong answer. At scale, that is worse than no answer at all.
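To make that gap concrete, here is a minimal sketch of the kind of context a raw text-to-SQL layer lacks: one canonical definition per metric, plus a freshness check before anything is answered. Everything in it is hypothetical and for illustration only: the `METRICS` registry, the `finance.orders` table, the `net_amount_usd` column, the six-hour threshold, and the `resolve_metric` helper are assumptions, not Anthropic's setup or Dashfeed's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


class StaleDataError(RuntimeError):
    """Raised when a metric's source table is older than its allowed staleness."""


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    sql: str                  # the one agreed-upon query for this metric
    source_table: str         # table whose load time determines freshness
    max_staleness: timedelta  # how old the table may be before we refuse to answer


# Hypothetical registry: table and column names are made up for illustration.
METRICS = {
    "revenue": MetricDefinition(
        name="revenue",
        sql="SELECT SUM(net_amount_usd) FROM finance.orders",
        source_table="finance.orders",
        max_staleness=timedelta(hours=6),
    ),
}


def resolve_metric(name, last_loaded_at, now):
    """Return the canonical SQL for a metric, or fail loudly instead of guessing.

    last_loaded_at maps table name -> datetime of the most recent load.
    """
    metric = METRICS.get(name)
    if metric is None:
        raise KeyError(f"no canonical definition for metric {name!r}")

    age = now - last_loaded_at[metric.source_table]
    if age > metric.max_staleness:
        raise StaleDataError(
            f"{metric.source_table} was last loaded {age} ago; "
            f"limit is {metric.max_staleness}"
        )
    return metric.sql


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    loads = {"finance.orders": now - timedelta(hours=1)}
    print(resolve_metric("revenue", loads, now))
```

The design choice worth noticing is that the helper refuses to answer when a definition is missing or the table is stale. A model with no such layer has nothing to refuse on, so it answers anyway.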

What actually moved the number

The jump from 21% to 95% didn’t come from a bigger model. It came from the layers built around it. Five of them:

  • Cross-layer context. Modeling, metrics, and documentation living together so the AI sees the whole pipeline, not a disconnected query box.
  • A semantic layer. Compiled, canonical definitions of every metric, so “revenue” means one thing, every time.
  • Monitoring and freshness. The system knows when data is current and when it isn’t.
  • Validation. Evals, adversarial review, and provenance, so answers get checked instead of taken on faith.
  • Skills. Reusable procedural knowledge written in plain language, the exact thing that carried those evals from 21% to 95%.

If that list looks familiar, it should. It’s how we built Dashfeed.

This is the bet we made

We didn’t build a chat box on top of a warehouse. We built the stack underneath it: native ingestion and transformation, a semantic ontology that gives every metric a single definition, continuous monitoring that watches your data for you, validation so insights are checked rather than guessed, and skills that encode how your business actually reasons about its numbers.

We made that bet because we believed the context layer, not the model, was where analytics would be won. Anthropic just published the eval data that proves it. The teams that win at AI analytics won’t be the ones with the cleverest prompt. They’ll be the ones who own enough of the stack to give the AI the context it needs to be right.

Accurate, and proactive

There’s one more thing. Even at 95%, a query system still waits to be asked. You have to know the question first.

But most of the value in a business hides in the questions nobody thought to ask. That’s why Dashfeed doesn’t wait. It monitors your metrics, detects what changed, and delivers the insight to your team before anyone goes looking. The same whole-stack context that gets you to 95% on the questions you ask is what lets us surface the ones you didn’t.
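As a rough illustration of what "detects what changed" can mean in its simplest form, the sketch below flags a metric whose latest value sits far from its recent history, using a plain z-score. This is a generic textbook technique chosen for brevity, not a description of how Dashfeed detects changes; the function name `detect_change` and the default threshold of 3.0 are assumptions.

```python
from statistics import mean, stdev


def detect_change(history, latest, threshold=3.0):
    """Return True if `latest` deviates from `history` by more than
    `threshold` sample standard deviations.

    history: recent values of one metric (e.g. daily totals), oldest first.
    """
    if len(history) < 2:
        # Not enough data to estimate variability, so flag nothing.
        return False

    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation

    if sigma == 0:
        # A perfectly flat history: any movement at all is a change.
        return latest != mu

    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    daily_signups = [100, 102, 98, 101, 99]
    print(detect_change(daily_signups, 100))
    print(detect_change(daily_signups, 150))
```

A check like this only says that a number moved. Saying why it moved, and whether it matters, is where the definitions, freshness signals, and validation described earlier come in.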

The bottom line

The model was never the hard part. The context is. The frontier labs are now telling you what their own eval data shows: the accuracy lives in the semantic layer, the monitoring, the validation, and the skills, not in the prompt. The platforms that own those layers are the ones that will be right when it counts. We built for that from day one.

Context is the product

Dashfeed owns the whole stack (ingestion, semantic layer, monitoring, validation, and skills), so the AI has the context it needs to be right. Then it pushes what changed to your team before anyone asks.