
    I'm Breaking SEO Best Practices. And It Feels Good.

    I created something called the LLM Sitemap. SEO tools are throwing errors. I'm keeping it anyway.

    By Yuval @ Growtika · 5 min read · January 2026

    I created something called the LLM Sitemap. It's not a best practice. SEO tools are throwing errors at me. And I'm keeping it anyway.

    TL;DR

    • XML sitemaps, HTML sitemaps, and llms.txt don't help LLMs understand what you're an authority on
    • I built an 'LLM Sitemap' – a semantic HTML page with first-person FAQs, comparison tables, and cross-links
    • First-person FAQs match how users actually prompt AI ('My SaaS isn't in ChatGPT' vs 'What are the benefits of...')
    • Added it to robots.txt. Tools are angry. Client saw crawled pages go from ~50 to 1,000+ after implementation
    • See it live → /llm-sitemap · Implementation guide → /blog/llm-sitemap

    Here's the thing. I've been doing SEO for over a decade. I know what the spec says. I know what the tools expect. I know the "right" way to do things.

    But the spec was written for a different era. And nobody seems to be updating it.

    • • •

    The Actual Problem

    LLMs need to understand websites. Not just crawl them. Understand them. What does this company do? What are they actually an authority on? How does their content relate to itself?

    The tools we have weren't built for this. Let me be specific:

    XML sitemaps (2005): A list of URLs with lastmod timestamps. Designed for Googlebot's crawl budget optimization. LLMs don't consume these directly. Zero semantic information.

    HTML sitemaps: The kind Apple still maintains. Human-readable directory of pages organized by section. Better, but still just titles and links. No context about what each page covers or why it matters.

    llms.txt (2024): Jeremy Howard's proposal. Markdown file with curated links and brief context. Smart design, but intentionally sparse. For a 200-page site with multiple content verticals, a handful of curated links leaves most content orphaned.

    None of these help an LLM answer: "What is this company actually an authority on? When should I cite them? How does their content relate to itself?"

    So I built something to fill that gap.

    What I Built

    I call it an "LLM Sitemap." It's a semantic HTML page. Human-readable. Crawlable. But structured specifically to help LLMs understand a site well enough to cite it accurately.

    The structure:

    It's a single HTML page that contains:

    Entity context upfront. Not "We do marketing" but a structured explanation of exactly what we do, who we serve, and what makes our approach different. The kind of context that helps an LLM decide whether to cite us.

    Comparison tables with actual data. Service timelines, outcomes, pricing tiers. When someone asks an LLM "how long does X take?" it can pull real numbers instead of hallucinating.

    First-person FAQs. This is the key insight. More on this below.

    Cross-reference links. How our GEO services relate to SEO. How cybersecurity clients differ from fintech. The semantic graph of our expertise.

    A meta section explaining itself. Tells crawlers "this page is intentionally structured for LLM consumption." Whether they use that signal is an open question.
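For concreteness, here's a rough skeleton of how those five pieces might sit on a single page. Everything in it — headings, ids, data, link targets — is illustrative, not the actual page:

```html
<!-- Illustrative skeleton only; real content, headings, and ids will differ -->
<main>
  <!-- 1. Entity context upfront -->
  <section id="entity-context">
    <h1>Example Agency: GEO and technical SEO for B2B SaaS</h1>
    <p>We help SaaS and developer-tool companies get cited in AI answers.</p>
  </section>

  <!-- 2. Comparison tables with actual data -->
  <section id="service-data">
    <h2>Service timelines</h2>
    <table>
      <tr><th>Service</th><th>Typical timeline</th></tr>
      <tr><td>GEO audit</td><td>2–3 weeks</td></tr>
    </table>
  </section>

  <!-- 3. First-person FAQs -->
  <section id="faq">
    <h3>My SaaS isn't appearing in ChatGPT. Why?</h3>
    <p>Usually a mix of thin entity context and no citable data.</p>
  </section>

  <!-- 4. Cross-reference links -->
  <section id="related">
    <p>How our <a href="/geo">GEO services</a> build on
       <a href="/seo">traditional SEO</a>.</p>
  </section>

  <!-- 5. A meta section explaining itself -->
  <section id="meta">
    <p>This page is intentionally structured for LLM consumption.</p>
  </section>
</main>
```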

    Then I did the thing that made my tools angry: I added a reference to it in robots.txt.

    The file validates fine. Crawlers can still read it. The "error" is just that I used a directive the spec doesn't recognize. The tools don't know what to do with an LLM-Sitemap: line, so they flag it.
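For the curious, the file looks roughly like this (the domain is a placeholder). Spec-compliant robots.txt parsers skip lines they don't recognize, which is exactly why the file still validates:

```
# Standard, universally understood directives
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

# Non-standard. Parsers that follow the spec ignore lines they
# don't recognize, so this is inert for everything except a
# crawler that chooses to look for it.
LLM-Sitemap: https://example.com/llm-sitemap
```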

    I'm keeping it anyway.

    The First-Person FAQ Hypothesis

    Here's the non-obvious insight that drove the design.

    Traditional FAQ format: "What are the benefits of GEO for B2B companies?"

    What I wrote instead: "My SaaS isn't appearing in ChatGPT. Why?"

    The reasoning: that's how people actually prompt LLMs. They don't ask formal third-person questions. They describe their situation in first person. "I'm struggling with X." "We're seeing Y but not Z."

    If an LLM is doing retrieval-augmented generation and someone asks "My competitors show up in ChatGPT but I don't, what's wrong?" and my page has that exact question with a structured answer, it's a much closer semantic match than a generic FAQ about "AI visibility benefits."

    I'm essentially pre-writing the answers I want LLMs to give.

    No documentation told me to do this. It's a hypothesis based on how these systems actually work.
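You can see the intuition even with a deliberately crude word-overlap score. To be clear, this toy comparison is mine, not anything a retrieval system actually runs — real RAG uses dense embeddings, where the match is semantic rather than word-for-word — but even at the lexical level the gap shows up:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for real embeddings."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: |intersection| / |union|."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)

user_prompt = "My competitors show up in ChatGPT but I don't, what's wrong?"
first_person_faq = "My SaaS isn't appearing in ChatGPT. Why?"
traditional_faq = "What are the benefits of GEO for B2B companies?"

print(jaccard(user_prompt, first_person_faq))  # nonzero overlap
print(jaccard(user_prompt, traditional_faq))   # zero overlap
```

The first-person FAQ shares "my", "in", and "ChatGPT" with the prompt; the traditional phrasing shares nothing at all.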

    The Checklist Problem

    Here's the thing. I run an SEO agency. Every client we work with gets solid technical foundations: proper sitemap.xml, clean heading structure, smart internal linking, crawlability sorted. That stuff matters. It's not optional.

    But does every company need the exact same checklist? The same 47-point audit? The same playbook applied identically?

    What about finding the edge? What's unique about this specific company, this specific market, this specific moment? Where's the room to try something different?

    SEO has become extremely compliance-focused. Every strategy is a variation of what someone else published. Every tool measures how well you followed documented specs. "Innovation" usually means automating existing workflows, not questioning whether those workflows still make sense for a specific situation.

    The specs we're all following were written when Google was the only game that mattered. Now we have ChatGPT, Claude, Perplexity, Gemini, Copilot. Each with different retrieval mechanisms, different training data sources, different citation behaviors.

    And our response as an industry? Keep optimizing for the same signals we've been optimizing for since 2015.

    When's the last time you saw something genuinely novel in SEO? Not a new tool. Not a new metric. An actually new idea about how to help machines understand content?

    I can't remember either. The internet would be a lot more interesting if more people were trying things.

    What Happened to Experimentation

    Early SEO was messy. People tried things. Some worked, some didn't. The whole field evolved through experimentation.

    Now it's been professionalized into compliance checklists. Site audit scores. "Best practice implementations." The risk tolerance for trying anything non-standard has dropped to near zero.

    I get it. SEO has real business impact. Getting it wrong has consequences. But somewhere we crossed from "be careful" to "never deviate from documented patterns."

    Meanwhile, the landscape has shifted fundamentally. LLMs are becoming a primary discovery channel. Zero-click is eating organic traffic. The rules are being rewritten in real time.

    And we're all focused on whether our canonical tags are correctly implemented.

    Honest Assessment

    Will this work at scale? I genuinely don't know yet.

    But here's what I do know: we added an LLM Sitemap to a client who was struggling with indexing. Within days, search engines and LLMs started discovering pages that had been invisible for months.

    Correlation isn't causation. They were doing other things too. But the timing was hard to ignore.

    The LLM Sitemap might be completely ignored by some crawlers. First-person FAQs might not improve retrieval matching in every case. The whole thing might turn out to be situational.

    But here's my thinking:

    1. The downside is minimal. It's one HTML page. Some angry tool warnings. Not going to tank your rankings.

    2. The hypothesis is reasonable. LLMs do use semantic context. They do perform retrieval. Giving them better-structured information should help, even if I can't prove it universally yet.

    3. Nobody else is trying this. In a field where everyone copies everyone, that alone makes it worth testing.

    I'd rather run original experiments that might fail than perfectly execute last year's playbook.

    The organizations that figure out LLM optimization early will have a significant head start. The ones waiting for Google to publish official guidelines will be 2-3 years behind. That's the bet I'm making.

    Try Breaking Something

    I'm not advocating for ignoring all standards. Technical SEO fundamentals exist for good reasons. Don't break things that are working.

    But if you have a hypothesis about how to help machines understand content better, test it. Even if it's non-standard. Even if the tools complain. The spec isn't sacred. It's just the best answer we had at the time.

    The LLM Sitemap is my hypothesis. I'll report back on whether it actually affects anything measurable.

    • • •

    The LLM Sitemap is live at /llm-sitemap. The full implementation guide with code examples is at /blog/llm-sitemap.

    If you try something similar, I'd genuinely like to hear what you learn. The more people experimenting, the faster we figure out what actually works.

    Yuval Halevi


    Helping SaaS companies and developer tools get cited in AI answers since before it was called "GEO." 10+ years in B2B SEO; 50+ clients across cybersecurity and SaaS tools.