
llms.txt — The Complete Guide for 2026
I have audited 40-plus client and competitor sites for llms.txt in the last six months. Roughly one in three had a file. Roughly one in ten had a file that actually followed the spec. The rest were sitemaps dumped into markdown or H3-littered files that LLMs partially ignore. This is the complete 2026 guide, written from the audits, with the template I deploy, the mistakes I find most often, and the honest answer to whether llms.txt moves the needle.
What llms.txt actually is
llms.txt is a plain-text markdown file at the root of a domain. It gives AI crawlers a curated, prioritized map of the site’s most important content, with one-line descriptions for context. It is structured: an H1 for the site name, a blockquote summary, optional free-text context, and H2 sections containing markdown link lists.
The spec was proposed September 3, 2024 by Jeremy Howard of Answer.AI and fast.ai. The reference spec lives at llmstxt.org. The community has converged on the format quickly because it is small enough to implement in 20 minutes and structured enough that crawlers can parse it deterministically.
The intent is straightforward. LLM crawlers do not have the budget to parse every URL in a sitemap, and even when they do, they do not have editorial context to know which pages matter most. llms.txt is the curation layer. A site might publish 5,000 URLs in its sitemap.xml but list 25 in its llms.txt. The 25 are the pages the site owner wants the LLM to read first, deepest, and recall when asked about the brand.
The exact spec
The file lives at the root: https://yourdomain.com/llms.txt. Subdomains can have their own. The response must return a 200 status and be served as text/plain or text/markdown. CDN caching is fine.
The structure, in this exact order:
# Site or Project Name
> One-paragraph summary in blockquote form. Plain English. Authoritative.
Optional free-text paragraphs giving context. No headings here.
## Section Name
- [Link title](https://full-url): Optional one-line description
- [Another link](https://full-url): What this page covers
## Another Section
- [Link](https://full-url)
## Optional
- [Less critical link](url)
Five rules the spec enforces:
- One H1 only. The site or project name. Required. Everything below is sectioned by H2.
- One blockquote. The brand summary. Plain English. Authoritative. This is what the LLM uses as the canonical description of the entity behind the site.
- H2 sections. Each H2 contains a markdown link list. Do not use H3 or H4 inside link lists. The spec does not recognize them and the parser may skip the section entirely.
- Markdown links. Each item is written as “- [Title](URL): description”. The description is optional but recommended. Keep descriptions to one line each.
- Optional section. The keyword “Optional” as a section heading is reserved. Content under it can be skipped by LLMs operating in low-context mode. Use it for nice-to-have pages, not for cornerstone content.
The optional companion file is llms-full.txt at the same root path. It contains the full markdown export of all the URLs referenced in llms.txt, concatenated, so an LLM can pull the entire site corpus in one fetch. Stripe runs the cleanest implementation. Most small and mid-market sites do not need llms-full.txt and can defer it.
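For sites that do want the companion file, the concatenation step can be sketched in a few lines. The layout here (one H2 per page followed by a Source line) is an assumption of mine, not something this guide or Stripe prescribes, and build_llms_full is a hypothetical helper name.

```python
def build_llms_full(site_name: str, pages: list[tuple[str, str, str]]) -> str:
    """Concatenate (title, url, markdown_body) tuples into one markdown document.

    Assumed layout: the site name as H1, then one H2 block per page.
    """
    parts = [f"# {site_name}"]
    for title, url, body in pages:
        # Keep the source URL next to each body so a reader can trace it back.
        parts.append(f"## {title}\n\nSource: {url}\n\n{body.strip()}")
    return "\n\n".join(parts) + "\n"
```

If the page bodies carry their own H1 or H2 headings, demote them before concatenating so the combined file keeps a single H1.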
Who actually reads llms.txt
This is where the honest reporting matters because the marketing copy around llms.txt overpromises.
Confirmed readers, May 2026:
- Anthropic Claude. Publicly confirmed via Anthropic’s docs that ClaudeBot and Claude-SearchBot read llms.txt for retrieval prioritization. Internal correlation studies on my own clients show citation lift on Claude after llms.txt deployment, though not isolated from other signals.
- Perplexity. Confirmed. Perplexity uses llms.txt as a sitemap signal for retrieval. Their crawlers crawl little but cite often, so the prioritization matters more than for high-volume crawlers.
- Mintlify and docs platforms. Used as the canonical structure for hosted documentation. Anthropic, Cursor, Pinecone, Windsurf all run their docs through Mintlify’s auto-generation.
- IDE agents and MCP doc servers. This is the underrated category. Cursor, Continue, Cline, and Claude Desktop’s MCP servers increasingly use llms.txt to know which docs to fetch when a developer asks a question. For B2B SaaS and developer tooling, this is the highest-leverage reader.
Unconfirmed but probable readers:
- OpenAI. Unconfirmed officially. Observable correlation with SearchGPT citation patterns on sites that deployed llms.txt. Not enough signal to call it a confirmed reader yet.
Confirmed non-readers:
- Google. Gary Illyes confirmed in July 2025 that Google does not support llms.txt and is not planning to. John Mueller compared it publicly to the keywords meta tag. Do not expect AI Overviews or Gemini to weight your llms.txt for the foreseeable future.
The honest read: llms.txt is a high-signal investment for Claude, Perplexity, and IDE agents. It is neutral for OpenAI and zero impact for Google. If your buyer demographic skews toward Claude and Perplexity, the lift is worth the 30 minutes. If your buyer demographic is 90% Google AI Overviews, deprioritize llms.txt and focus on schema and inline citations instead.
The Sprout Sage llms.txt, as a worked example
Here is the llms.txt I deploy for my own agency, followed by notes on why each piece exists.
# Sprout Sage Solutions
> Sprout Sage Solutions is an AI-first SEO and Generative Engine Optimization (GEO) agency founded and led by Mandeep Singh. I help service businesses and ecommerce brands get cited by ChatGPT, Claude, Perplexity, and Google AI Overviews, then convert that visibility into bookings and revenue. Based in India, working globally. Specialties: technical SEO, AI accessibility audits, llms.txt implementation, schema markup, conversion-led web design.
I publish original research on AI search, GEO, llms.txt adoption, and AI accessibility. All advice is implementation-tested on live client sites, not theoretical.
## Services
- [SEO Audit](https://sproutsagesolutions.com/services/seo-audit): Full technical + content + GEO audit, $500
- [AI Accessibility Audit](https://sproutsagesolutions.com/services/ai-accessibility-audit): llms.txt + robots.txt + schema review, $300 one-time
- [GEO / Generative Engine Optimization](https://sproutsagesolutions.com/services/geo): Get cited by ChatGPT, Claude, Perplexity
- [Content Strategy](https://sproutsagesolutions.com/services/content-strategy): AI-citation-ready content pipelines
- [Web Design + CRO](https://sproutsagesolutions.com/services/web-design): Top-0.1% agency-grade design that converts
## Guides
- [llms.txt Complete Guide 2026](https://sproutsagesolutions.com/blog/llms-txt-complete-guide-2026): Spec, examples, mistakes
- [AI Bot User Agents 2026](https://sproutsagesolutions.com/blog/ai-bot-user-agents-2026): Full reference + allow/block matrix
- [Should You Block GPTBot](https://sproutsagesolutions.com/blog/should-you-block-gptbot): Decision framework
- [robots.txt for AI Bots 2026](https://sproutsagesolutions.com/blog/robots-txt-ai-bots-2026): Copy-paste templates
- [GEO vs SEO vs AEO 2026](https://sproutsagesolutions.com/blog/geo-vs-seo-vs-aeo-2026): Honest comparison
## Contact
- [Book a Call](https://sproutsagesolutions.com/free-consultation): Free 30-min strategy call
- [Email](mailto:[email protected])
- [Phone](tel:+919729712388)
## Optional
- [About](https://sproutsagesolutions.com/about)
- [Past Projects](https://sproutsagesolutions.com/work)
The blockquote summary is the single most important line in the file. It is the canonical entity description. When Claude is asked “what is Sprout Sage Solutions,” that paragraph is one of the sources the model leans on. Write it deliberately. Lead with what the entity is, who runs it, what it does, who it serves, and where it operates. Avoid filler. Avoid claims you cannot substantiate.
The Services section comes before Guides because for a service business the LLM should associate the brand with what is sold first and the educational content second. For a publisher or developer-docs site, reverse that order: Guides or Docs first, then Products.
The Optional section demotes two pages that I want indexed but do not want the LLM to load when context is tight. About and Past Projects matter for E-E-A-T but are not what I want quoted in answers about GEO methodology.
Five anti-patterns from auditing 30 files in the wild
I audited 30 in-the-wild llms.txt files between January and May 2026 across my client pipeline and competitive research. The same five mistakes show up over and over.
1. Dumping the whole sitemap
The file lists 500 URLs across every blog post, archive page, and tag. The curation is gone. The LLM either skips the file or wastes its retrieval budget on archive pages. Solution: cap at 30 URLs, prioritize ruthlessly, push the rest into sitemap.xml where they belong.
2. Using H3 or H4 inside link lists
I see files with nested headings under H2 sections, trying to group sub-topics. The spec only recognizes H2 as a section delimiter. Anything below is ignored or breaks the parser. Solution: flatten the structure. If you need finer grouping, use more H2 sections.
3. Missing the blockquote brand summary
The file goes straight from H1 to H2 sections. The LLM has no canonical entity description to lean on. Solution: write the blockquote first, deliberately, in one to three sentences. It is the most important line in the file.
4. No markdown twin for linked pages
LLMs prefer markdown over HTML because the parsing is cleaner. Stripe nailed this: every doc page has a .md twin at the same URL with .md appended. ChatGPT and Claude both prefer the markdown version. Solution: for cornerstone pages, generate a markdown version and serve it at /path.md. WordPress can do this with a custom endpoint. Mintlify does it automatically.
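To make the twin convention concrete, here is a small sketch that derives the markdown URL from a page URL. It assumes the twin sits at the page path with .md appended and any trailing slash dropped. The /index fallback for the homepage is purely my assumption, and markdown_twin is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_twin(url: str) -> str:
    """Return the assumed markdown-twin URL for a page URL."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if not path:
        # Assumption: the homepage twin is served as /index.md.
        path = "/index"
    if not path.endswith(".md"):
        path += ".md"
    # Keep the query string, drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

Whatever convention you pick, keep it uniform across the site so an agent can guess the twin for any page it already knows.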
5. Set-and-forget
The file was generated 18 months ago. Half the URLs are 404. Two services no longer exist. The blockquote still says the company has three founders when one left. The LLM ingests outdated information and surfaces it in answers. Solution: review quarterly. Rebuild when the site structure changes meaningfully. Treat it like your XML sitemap, not like a robots.txt you set once.
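The quarterly review is easy to automate in part. The sketch below pulls every linked URL out of the file and reports the ones that no longer resolve. The status lookup is injected so the logic can be tested offline, and http_status is one possible implementation using only the standard library. Both function names are my own, and some servers reject HEAD requests, so treat a flagged URL as a prompt to check by hand.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Captures the URL part of every markdown link that uses http or https.
URL_RE = re.compile(r"\]\((https?://[^)\s]+)\)")


def extract_urls(llms_txt: str) -> list[str]:
    """Return every http(s) URL that appears inside a markdown link."""
    return URL_RE.findall(llms_txt)


def find_dead_links(
    llms_txt: str, get_status: Callable[[str], int]
) -> list[tuple[str, int]]:
    """Return (url, status) for each link whose status is 0 or >= 400."""
    dead = []
    for url in extract_urls(llms_txt):
        status = get_status(url)
        if status == 0 or status >= 400:
            dead.append((url, status))
    return dead


def http_status(url: str, timeout: float = 10.0) -> int:
    """HEAD the URL and return its status code, or 0 if the request failed."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0
```

Usage is find_dead_links(text, http_status), where text is the contents of your llms.txt. It does not catch stale copy such as a retired service or an outdated founder count, so the blockquote still needs a human read.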
The adoption picture
SE Ranking studied 300,000 domains in May 2026 and found 10.13% adoption. That is one in ten sites across a broad sample. The adoption curve by vertical is uneven:
- B2B SaaS and developer tooling: 35 to 45% adoption. The Stripe, Anthropic, Cursor, Pinecone effect. Vendors are deploying because their docs sites get cited by Claude and Perplexity.
- Marketing agencies and consultants: 25 to 30%. The early movers in the SEO and GEO space have adopted; the bulk of agencies have not.
- Professional services like legal, accounting, consulting: 8 to 12%. Lagging.
- Ecommerce: under 5%. The Shopify and WooCommerce ecosystems have not caught up. AIOSEO ships llms.txt for WordPress but most stores have not enabled it.
- Healthcare and medspa: under 3%. Almost no one. This is where my early-mover edge in medspa marketing sits.
- Local services: under 2%. The category that probably benefits most because LLMs answer near-me queries directly and a curated brand description shapes the answer.
The honest read on adoption: deploying llms.txt in 2026 still puts you ahead of 90% of your category in most verticals. By 2027 the curve will tighten and the marginal advantage will shrink. The window for being among the first 10 to 20% in your vertical is open right now.
The tooling landscape
I have tested every tool listed below on real sites. These are the ones I keep using.
| Tool | Type | Strength | When I use it |
|---|---|---|---|
| Firecrawl llms.txt generator | Generator + API | Best free crawler-based generator. Produces both llms.txt and llms-full.txt. Has an API for automation. | First-pass generation for any new client site. |
| AIOSEO WordPress plugin | WordPress integration | Auto-generates llms.txt for WordPress. Respects Yoast and RankMath noindex settings. 3M-plus active users. | Every WordPress client. Easiest install on the market. |
| Mintlify | Hosted docs platform | Auto-generates llms.txt and llms-full.txt for any hosted docs. Used by Anthropic, Cursor, Pinecone, Windsurf. | Recommendation for SaaS clients running their own docs. |
| SEOmator | Browser-based generator | Quick generation from a URL. No API. | One-off audits when I do not have FTP access. |
| ai-ready-check.de | Auditor | Comparative AI crawler audit. Tests llms.txt, robots.txt, and schema together. | Audit deliverable inside my AI Accessibility Audit package. |
| Cubitrek | Auditor | robots.txt plus llms.txt audit template. | Cross-reference when the AI-Ready Check flags an issue. |
For most service businesses on WordPress, AIOSEO is the right answer. Install the plugin, enable the llms.txt module, set the brand summary, pick the URLs to include, ship. Total time: 30 to 45 minutes for a clean site.
For Shopify, the easiest path is a custom liquid template that builds the file at /llms.txt from collection and product data. I do this as part of the AI Accessibility Audit for ecommerce clients.
llms.txt for service businesses
The structure I deploy for a service business is consistent across verticals. Medspa, agency, consulting, legal, accounting. The same skeleton works.
H1: Brand name only. No tagline, no descriptor.
Blockquote: One paragraph, three to four sentences. Who the business is, who runs it, what it offers, who it serves, where it operates. Lead with the entity name and the core noun.
Optional free-text paragraph: One or two sentences of context. What makes the business distinct. The publishing cadence. The vertical specialty.
H2 Services: Three to seven service offerings as markdown links with one-line descriptions including starting price where transparent. Pricing transparency boosts citation rate because LLMs trust pages that name numbers.
H2 Guides: Three to ten cornerstone blog posts or guides. These are the pieces you want cited when someone asks an informational question in your category.
H2 Case Studies or Work: Three to five proof pieces. Anonymized if necessary.
H2 Contact: Book a call link, email, phone. Make it easy for the LLM to surface the conversion path.
H2 Optional: About, careers, press kit. Nice-to-have pages.
That is roughly 20 to 30 URLs total. Enough surface for the LLM to know what you do. Tight enough that the curation signal is preserved.
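The skeleton above maps cleanly onto a small generator. This is a sketch under my own assumptions: the Link dataclass, render_llms_txt, and the max_links guard are hypothetical names, and the guard of 30 simply encodes the upper end of the range I recommend rather than anything the spec requires.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    description: str = ""


def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[Link]],
    context: str = "",
    max_links: int = 30,
) -> str:
    """Render H1, blockquote, optional context, then one H2 per section."""
    total = sum(len(links) for links in sections.values())
    if total > max_links:
        raise ValueError(f"{total} links exceeds the cap of {max_links}")

    out = [f"# {name}", "", f"> {summary}", ""]
    if context:
        out += [context, ""]
    # Sections render in insertion order, so put Services before Guides, etc.
    for heading, links in sections.items():
        out.append(f"## {heading}")
        for link in links:
            line = f"- [{link.title}]({link.url})"
            if link.description:
                line += f": {link.description}"
            out.append(line)
        out.append("")
    return "\n".join(out).rstrip() + "\n"
```

Section order follows dictionary insertion order, so the order you build the dictionary in is the order the LLM sees. Put an Optional section last.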
llms.txt for ecommerce
Ecommerce llms.txt is structured differently because the product catalog is not the right unit of curation. You do not list 200 SKUs in llms.txt. You list the brand summary, the top collections, the bestsellers, and the policies.
H1: Brand name.
Blockquote: What the brand sells, who founded it, the product category, what makes it distinct.
H2 Collections: Top five to eight collection pages with one-line descriptions. These are the category landing pages.
H2 Bestsellers: Five to ten hero products as direct links. Not the full catalog. Just the products you want the LLM to recommend.
H2 Guides or Journal: Three to ten content pieces. Buyer guides, ingredient stories, product comparisons. The content that gets cited inside ChatGPT shopping queries.
H2 Policies: Shipping, returns, FAQ. LLMs cite these when asked about terms.
H2 About: Brand story, founder bio, sustainability or sourcing details.
H2 Contact: Customer service, support hours.
The bestsellers section is the most underused. ChatGPT and Perplexity increasingly handle shopping queries. A brand that hands the LLM a curated list of five hero products gets recommended for those products. A brand that does not gets recommended for whatever the LLM finds in the sitemap, which is usually the wrong products.
The bundle that actually wins
llms.txt alone moves citation rates modestly. The bundle that moves them meaningfully is the four-part stack:
- llms.txt at the root with a clean curated structure
- robots.txt permitting AI search crawlers like OAI-SearchBot, Claude-SearchBot, PerplexityBot, and Google-Extended, while optionally blocking high-volume zero-value scrapers like Bytespider and meta-externalagent. My GPTBot decision framework walks through which to allow and which to block.
- Schema stack of Article plus FAQPage plus BreadcrumbList plus Person plus Organization on every cornerstone page. BrightEdge data shows pages with three to four complementary schema types are cited twice as often as pages with one.
- Content shaping per the Princeton GEO playbook: inline citations, statistics density, FAQ depth, answer-first openings. This is where the real citation lift lives.
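The robots.txt part of the stack is the easiest to get subtly wrong, so I like to check it programmatically. The sketch below uses Python's standard urllib.robotparser to confirm which user agents a given robots.txt lets through. The ROBOTS_TXT content is an illustrative policy based on the bots named above, not a recommendation for every site, and crawler_access is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: allow the AI search crawlers, block two scrapers.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
User-agent: Claude-SearchBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /

User-agent: Bytespider
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Map each user agent to whether the robots.txt lets it fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

Calling crawler_access(ROBOTS_TXT, ["PerplexityBot", "Bytespider"], "https://example.com/services/") should report the first as allowed and the second as blocked. Real crawlers may interpret edge cases differently from this parser, so treat the result as a sanity check.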
I deploy all four together as my AI Accessibility Audit, a $300 one-time package that delivers in five business days. It covers the llms.txt build, the robots.txt rewrite, the schema audit and implementation pack, and the AI citation baseline report. Book a call if you want to scope it for your site.
What changes after deployment
Expect three observable changes within 30 to 60 days of shipping a clean llms.txt plus the bundle above.
1. AI crawler hits in your access logs. ClaudeBot, Claude-SearchBot, OAI-SearchBot, and PerplexityBot start showing up more frequently on the URLs listed in llms.txt. Volume varies by category. For B2B SaaS clients I see 50 to 200 hits per week per crawler. For local services, 10 to 30.
2. First new citations in Claude and Perplexity. Monitoring tools like Otterly, AthenaHQ, or Profound will start showing new citing prompts. The first citations are usually on long-tail prompts where the brand was previously absent. Share-of-voice on head category prompts takes longer.
3. Direct referral traffic from chatgpt.com, perplexity.ai, claude.ai. Small at first, often single-digit monthly. The traffic is high-intent because the user clicked through from an AI-generated answer that mentioned the brand. Conversion rates on this traffic are typically 2 to 4 times higher than blended organic.
What does not change in 30 to 60 days: Google rankings, AI Overview presence in Google, or any signal tied to the Google index. Those move on the SEO timeline, not the GEO timeline.
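The first of those changes is the one you can measure yourself. Here is a minimal sketch that counts AI crawler hits on the llms.txt URLs from an access log. It assumes the common combined log format and matches crawlers by a case-insensitive substring of the user agent. The sample log lines and their user-agent strings are made up for illustration, and count_ai_hits is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

AI_CRAWLERS = ("ClaudeBot", "Claude-SearchBot", "OAI-SearchBot", "PerplexityBot")

# Combined log format: "METHOD path PROTO" status size "referer" "user agent".
LOG_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Made-up sample lines; real user-agent strings are longer.
SAMPLE_LOG = [
    '203.0.113.5 - - [12/May/2026:10:00:00 +0000] "GET /services/geo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.6 - - [12/May/2026:10:01:00 +0000] "GET /services/geo?ref=x HTTP/1.1" 200 5120 "-" "PerplexityBot/1.0"',
    '203.0.113.5 - - [12/May/2026:10:02:00 +0000] "GET /tag/misc HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.9 - - [12/May/2026:10:03:00 +0000] "GET /services/geo HTTP/1.1" 200 5120 "-" "Mozilla/5.0 Chrome/120.0"',
]


def count_ai_hits(log_lines: Iterable[str], tracked_paths: set[str]) -> Counter:
    """Count hits per AI crawler on the paths listed in llms.txt."""
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue
        path, user_agent = match.groups()
        # Ignore query strings when comparing against the tracked paths.
        if path.split("?", 1)[0] not in tracked_paths:
            continue
        for crawler in AI_CRAWLERS:
            if crawler.lower() in user_agent.lower():
                hits[crawler] += 1
                break
    return hits
```

Run it on the same log window before and after deployment. A single week is noisy, so compare at least four weeks on each side.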
The 12-month outlook for llms.txt
Three plausible futures.
Most likely (60% probability): Adoption climbs to 25 to 30% by end of 2026 as the SEO industry standardizes the bundle. Anthropic and Perplexity expand their public confirmation of how the file is weighted. OpenAI confirms or denies. Google maintains the position that llms.txt is unsupported. The file becomes table-stakes hygiene for any site doing GEO seriously.
Plausible (25%): A standard body, probably the W3C or IETF, formalizes the spec. Versioning rules are added. Adoption accelerates to 50%-plus within 18 months. The file becomes as standard as XML sitemaps.
Tail risk (15%): Google’s public dismissal turns into broader skepticism. Anthropic and Perplexity stop investing in the signal because adoption plateaus. The file becomes a vestigial artifact like the keywords meta tag. I do not think this is the most likely outcome but it is worth naming honestly.
My read: deploy now. The downside is 30 to 45 minutes of work and a 600-byte file at your root. The upside is being a named source in Claude and Perplexity answers in your category while the field is still 90% empty. The expected value is strongly positive.
What to do this week
Three actions, in order.
- Generate a draft with Firecrawl or AIOSEO. Point the tool at your domain. Let it produce the first pass. Expect to spend 30 to 60 minutes editing the output.
- Rewrite the blockquote summary by hand. This is the most important line in the file. Three to four sentences. Entity-led. Plain English. What the business is, who runs it, what it offers, who it serves, where it operates.
- Curate the URL lists down to 20 to 30 total. Pick your cornerstone services, your top guides, your contact path, and your about page. Move nice-to-have pages into the Optional section. Skip thin content entirely.
Validate that the file at https://yourdomain.com/llms.txt resolves with a 200 status and the right content-type. Submit nothing to anyone. There is no submission endpoint. AI crawlers find the file at the predictable path automatically.
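That validation step can be scripted. The sketch below separates the rule (200 status, text/plain or text/markdown) from the network call so the rule can be tested on its own. check_response and check_llms_txt are hypothetical names, and the domain you pass in is yours to supply.

```python
import urllib.error
import urllib.request

ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def check_response(status: int, content_type: str) -> list[str]:
    """Return problems with the status code or Content-Type header."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected content type: {media_type or 'missing'}")
    return problems


def check_llms_txt(domain: str, timeout: float = 10.0) -> list[str]:
    """Fetch https://<domain>/llms.txt and check the response."""
    url = f"https://{domain}/llms.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return check_response(
                response.status, response.headers.get("Content-Type", "")
            )
    except urllib.error.HTTPError as err:
        return check_response(err.code, err.headers.get("Content-Type", ""))
```

Note that urlopen follows redirects, so a redirect that ends in a 200 will pass this check. If you want to flag redirects, inspect the final URL on the response as well.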
If you want me to do this and the rest of the AI accessibility bundle for you, the AI Accessibility Audit is $300 one-time and delivers in five business days. It includes the llms.txt build, robots.txt rewrite, schema audit and JSON-LD implementation pack for your top 5 pages, AI citation baseline across 10 target queries, and a 30-day re-check to measure the lift. Book a free 30-min call to scope it.
FAQ
What is llms.txt?
llms.txt is a community-proposed plain-text file at the root of a website that gives AI crawlers a curated, prioritized map of the site’s most important content. It was proposed by Jeremy Howard of Answer.AI and fast.ai on September 3, 2024 and the spec lives at llmstxt.org.
Where does llms.txt go?
At the domain root: https://yourdomain.com/llms.txt. There is an optional companion at /llms-full.txt that contains the full markdown dump of all important pages. Both must be served as text/plain or text/markdown with a 200 status.
Does Google use llms.txt?
No. Gary Illyes confirmed in July 2025 that Google does not support llms.txt. John Mueller compared it publicly to the discredited keywords meta tag. By contrast, Anthropic’s Claude and Perplexity are both confirmed readers that use llms.txt for retrieval prioritization.
What is the difference between llms.txt and robots.txt?
robots.txt tells bots which URLs they may or may not crawl. llms.txt tells AI bots which URLs are most important and how the site is structured for retrieval. You need both. They do different jobs.
Is llms.txt actually adopted?
Roughly 10.13% across 300,000 domains per SE Ranking’s May 2026 study. Higher in B2B SaaS and developer tooling at 35 to 45%. Lower in ecommerce at under 5%.
Does llms.txt actually help citations?
Modestly on its own. The real citation lift comes from the bundle: llms.txt plus correct robots.txt plus schema plus inline citations and statistics.
Should I include every URL in my llms.txt?
No. Curating is the point. The spec is designed for 10 to 30 high-value URLs grouped by section. Dumping the full sitemap defeats the purpose.
What is llms-full.txt?
An optional companion file at /llms-full.txt that contains the full markdown dump of all your important pages in one concatenated file. Stripe is the cleanest reference implementation.
How often should I update llms.txt?
Quarterly at minimum, more often if your site structure changes. The biggest in-the-wild mistake is treating it as set-and-forget.
Will llms.txt cause SEO problems?
No. It is text-only, sits at the root, and is ignored by Google. There is no rank impact, positive or negative.
Can I auto-generate llms.txt?
Yes. Firecrawl, AIOSEO, Mintlify, and SEOmator all generate llms.txt. AIOSEO is the easiest for WordPress. Auto-generation gets you 70% of the way; the brand summary and section editing still need human judgement.
Does llms.txt work for ecommerce?
Yes, but the structure is different. Ecommerce llms.txt should lead with brand summary, top collections, bestsellers, key policies, and the about page. Individual product pages are less useful in llms.txt.
Ready to ship yours
My AI Accessibility Audit covers llms.txt, robots.txt, schema, and the AI citation baseline as a $300 one-time package, delivered in 5 business days. If you want the full GEO retainer that handles the file plus the content refactor and monthly citation tracking, that starts at $1,500 a month on the GEO Starter tier.
Book a free 30-min call → +91 97297 12388 WhatsApp


