ask hev: an agentic search box for Astro docs

The new site post had a bullet about the chat agent docked at the bottom of this site, grounded in my own writing. While I was building the docs site for hev layer (coming soon), I wanted the same idea in a different shape: not a chat box, but a search box that’s smarter than keyword match when the question deserves it. That turned into ask hev, an Astro integration I’m packaging as open source.

It’s still under development. But the shape is settled enough to write down, and the design is the interesting part.

Two speeds, one box

Most search is either fast and dumb or slow and smart. ask hev is both, and it lets the reader pick without thinking about it.

While you type: instant keyword search, running locally on the server. No API key, no model, no round trip to anyone. It’s the fast path and it’s the default.
When you press Enter: a bounded agent takes the query, decides its own sub-queries, and ranks the results itself.

The toggle is the keyboard. Type and hit Enter, you get the agent. Type, arrow down to a hit, hit Enter, you open that hit. The fast path stays fast; you only pay for the model when you ask a real question.

Results are anchors, not pages

A search result that drops you at the top of a 3,000-word page and makes you Ctrl-F is barely a result. So ask hev chunks every doc by heading and a result is an anchor — /docs/concepts#kubernetes-autoscaling, not /docs/concepts.

The trap here is slug drift. Astro generates heading IDs with its own slugger; if my anchors don’t match byte-for-byte, every deep link 404s to the top of the page. So I use the same github-slugger Astro does, and there’s a mechanical verifier in CI that builds the site and asserts every chunk anchor exists as an id= in the rendered HTML. It’s the one dependency I let into an otherwise zero-dep package, because correct deep links beat the zero-dep aesthetic.

A knowledge graph, built offline by Opus

The agent is only as good as what retrieval hands it, and plain keyword overlap is brittle — a reader types “k8s scaling,” the docs say “Kubernetes autoscaling,” and they miss each other.

So at build time, Opus 4.8 reads the whole corpus once and emits a small knowledge graph: a compact orientation of the product and how its pieces relate, plus a glossary of terms with aliases. That artifact gets committed to git as reviewable JSON. It’s hash-gated, so it only regenerates when the content actually changes — no model spend on a build that didn’t touch the docs. And if there’s no API key at build time, the build doesn’t fail; it ships the last committed graph.

That graph earns its keep two ways at runtime. The orientation goes into the agent’s system prompt (prompt-cached, so it’s nearly free per query). The glossary expands query terms before retrieval — “k8s” pulls in “kubernetes” so the keyword pass surfaces the right chunks for the agent to rank.

The loop

On Enter, the query goes to a small model — Haiku 4.5 by default, configurable — inside a bounded tool-use loop:

It has a search tool that runs the glossary-expanded keyword pass and returns candidate sections.
It decides how many searches to run (capped at three or four): vary the terms, try synonyms, decompose a multi-part question.
When it has enough, it calls present_results with the best sections, best first, each with a one-line snippet.

It’s a results generator, not a chat assistant — it never answers the question itself, it just ranks the sections that do. The overlay shows a faint searched: autoscaling, control loops line so you can see the queries it actually ran. Worst case is a few Haiku round trips, a couple of seconds; the instant keyword path is always there as the fast escape hatch.

What it isn’t

Retrieval is still keyword-and-glossary token overlap — no embeddings, no vector store, no external index. The agent ranks independently, but it can’t rank what retrieval never surfaced, so paraphrase recall has a ceiling. Embeddings are the deferred upgrade if that ceiling starts to hurt; I’d rather ship the dependency-free version, run it on real docs, and find out whether I actually need them. So far, on the layer corpus, I haven’t.

Where it’s going

ask hev runs on the layer docs site today, and it has a home of its own: askhev.com — the docs there search themselves with it. Next it comes home to hevmind — the writing archive you’re reading is exactly the kind of corpus it’s built for.

It’s the same instinct as the rest of this site: as much agent as the question needs, no more, grounded in real material, and tested. If you’re building something where search has to be smarter than grep but you don’t want to stand up a vector pipeline to get there, let’s talk.