
Free Robots.txt Generator

Build, validate, and download a production-ready robots.txt file in seconds. Platform presets for WordPress, Shopify, Next.js, and more.

What Is a Robots.txt File and Why Every Website Needs One

Every time a search engine spider lands on your domain, the very first file it requests is robots.txt. Before Googlebot reads a single blog post, before it looks at your homepage, before it touches anything — it reads that file. That makes robots.txt one of the most consequential files on your site, and also one of the most often ignored.

A robots.txt file is a plain text file that lives at the root of your domain: always at yoursite.com/robots.txt and nowhere else. It follows the Robots Exclusion Protocol, a convention that well-behaved crawlers have respected since 1994 and that was formalised as RFC 9309 in 2022. The file tells bots which sections of your site they are allowed to visit, which sections they should skip, and where to find your sitemap.

I have audited hundreds of websites for clients, and I estimate that roughly 60% of them have a robots.txt problem. Either the file doesn't exist, it's a copy-paste from a forum post that blocks things it shouldn't, or it's the default WordPress file that hasn't been touched since installation. Any of those scenarios can leak crawl budget, expose staging content, or silently prevent important pages from ranking.

Getting it right is not difficult once you understand the syntax. This tool is designed to remove the guesswork entirely — pick a platform preset, adjust rules to match your specific setup, and download a validated file in under two minutes.

Who Reads Your Robots.txt

Googlebot is the most important, but it is far from the only crawler that checks your robots.txt. Bingbot, Yandex, DuckDuckBot, and dozens of AI training crawlers all follow the same protocol. Malicious scrapers typically ignore robots.txt entirely, so the file is not a security measure — it is a communication tool between you and legitimate bots. Think of it as a polite sign on the front door of your website, not a lock.

How This Robots.txt Generator Works

I built this tool to solve a specific problem: most robots.txt generators give you a form with two fields and call it done. That is not enough. A real robots.txt for a real site has multiple User-agent blocks, nuanced Allow/Disallow rules, sitemap references, and sometimes crawl-delay settings. Getting any of those wrong can quietly damage your SEO for months before you notice.

Here is how to use the generator:

  1. Select a platform preset. Choose WordPress, Shopify, Next.js, Medspa/Local, or start blank. Each preset loads a sensible baseline configuration used by well-optimised sites on that platform.
  2. Edit User-agent blocks. Each block targets a specific crawler or all crawlers (using the wildcard *). You can add as many blocks as you need — for example, one block for all crawlers and a separate stricter block for a specific bot.
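  3. Add Disallow and Allow rules. Use the dropdowns inside each block to set the rule type, then type the path. Google and other crawlers that follow RFC 9309 apply the most specific (longest) matching rule rather than the first one listed, so be precise with your paths.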
  4. Add your sitemap URL. Paste the full URL of your XML sitemap. This tells every crawler where to find your content map, which accelerates indexing.
  5. Set crawl-delay (optional). If your server struggles under heavy crawl load, a crawl-delay value in seconds asks bots to wait between requests. Use this sparingly — it slows down indexing.
  6. Review the live preview and validation warnings. The tool checks for common mistakes in real time and flags them before you download.
  7. Copy or download. Use the copy button to paste directly into your server, or download the file ready to upload via FTP/SFTP.
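
The steps above boil down to assembling text from a few structured inputs. Here is a minimal Python sketch of that idea. The function name `build_robots_txt` and the shape of the `blocks` dictionaries are assumptions made for illustration, not the tool's actual implementation.

```python
def build_robots_txt(blocks, sitemaps=()):
    """Render robots.txt text from rule blocks and sitemap URLs.

    Each block is a dict (hypothetical structure) such as:
      {"agents": ["*"], "rules": [("Disallow", "/private/")], "crawl_delay": 2}
    """
    lines = []
    for block in blocks:
        for agent in block["agents"]:
            lines.append(f"User-agent: {agent}")
        for directive, path in block["rules"]:
            # rstrip() keeps an empty "Disallow:" free of trailing spaces
            lines.append(f"{directive}: {path}".rstrip())
        if block.get("crawl_delay") is not None:
            lines.append(f"Crawl-delay: {block['crawl_delay']}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines).rstrip("\n") + "\n"


wordpress = build_robots_txt(
    blocks=[{
        "agents": ["*"],
        "rules": [("Allow", "/wp-admin/admin-ajax.php"),
                  ("Disallow", "/wp-admin/")],
    }],
    sitemaps=["https://yoursite.com/sitemap_index.xml"],
)
print(wordpress)
```

Sitemap lines are written after all groups because they apply to the whole file rather than to one User-agent block.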

If you run a WordPress site, you can also manage your robots.txt through RankMath or Yoast SEO under their General Settings; those plugins store the content in your database and serve it dynamically. Either approach works, but do not edit the physical file AND the plugin simultaneously. A physical robots.txt in the site root is normally served in place of the dynamically generated one, so the rules you set in the plugin would silently stop applying.

Pro tip: After uploading your new robots.txt, open Google Search Console, go to Settings, and open the robots.txt report to confirm that Google fetched the new file without errors. Then run a few important URLs through the URL Inspection tool to check that none of them are blocked. This step takes three minutes and can save you from a costly mistake.

Robots.txt Syntax Explained

The syntax is deliberately minimal, which is part of why mistakes are so easy to make. There are only a handful of directives, but the way they interact requires precise understanding.

User-agent

Every rule block must start with a User-agent line that specifies which crawler the following rules apply to. The wildcard * matches all crawlers that are not explicitly addressed in another block. You can stack multiple User-agent lines before a set of rules to apply those rules to multiple bots at once.

# Apply rules to all crawlers
User-agent: *

# Apply a rule to Googlebot only
User-agent: Googlebot
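
To see how a crawler picks its block, here is a short sketch using Python's standard-library `urllib.robotparser`. The paths `/private/` and `/drafts/` are made up for the example, and this parser only approximates how any given search engine behaves.

```python
from urllib.robotparser import RobotFileParser

ROBOTS = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

group_parser = RobotFileParser()
group_parser.parse(ROBOTS.splitlines())

# Googlebot has its own block, so the * block does not apply to it.
print(group_parser.can_fetch("Googlebot", "/private/page"))  # True
print(group_parser.can_fetch("Googlebot", "/drafts/page"))   # False

# Any crawler without a named block falls back to the * block.
print(group_parser.can_fetch("Bingbot", "/private/page"))    # False
```

The practical consequence: a named block replaces the wildcard block for that bot, so any shared rules have to be repeated inside it.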

Disallow

Disallow tells a crawler not to visit a specific path. The value is matched as a prefix — so Disallow: /admin/ blocks /admin/, /admin/settings/, and anything else starting with /admin/. A completely empty Disallow: (with no value) means "allow everything" — which is the standard way to create a fully open robots.txt.

Disallow: /wp-admin/
Disallow: /private/
Disallow: /checkout/
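
Prefix matching and the empty Disallow can be checked quickly with Python's standard-library parser. The helper name `allowed` and the bot name `ExampleBot` are placeholders chosen for this sketch.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


rules = ["User-agent: *", "Disallow: /admin/"]
print(allowed(rules, "/admin/"))           # False
print(allowed(rules, "/admin/settings/"))  # False
print(allowed(rules, "/administrator/"))   # True: does not start with /admin/

open_file = ["User-agent: *", "Disallow:"]
print(allowed(open_file, "/anything/"))    # True: empty Disallow blocks nothing
```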

Allow

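Allow is used to carve out exceptions inside a blocked path. This is most commonly needed in WordPress to allow /wp-admin/admin-ajax.php even when the rest of /wp-admin/ is blocked. Google and other crawlers that follow RFC 9309 apply the longest matching rule, so the more specific Allow wins regardless of where it appears in the block; when an Allow and a Disallow match with equal length, the Allow wins. Some older parsers stop at the first matching rule instead, so in practice, put Allow before Disallow within the same block.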

Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
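
Google and RFC 9309 resolve Allow/Disallow conflicts by choosing the longest matching path, with ties going to Allow. The sketch below is a simplified version of that rule: it handles plain prefixes only (no `*` or `$` wildcards), and the function name `is_allowed` is made up for the example.

```python
def is_allowed(rules, path):
    """Longest-match precedence over (directive, prefix) pairs.

    Simplified: plain prefix matching, no wildcard support.
    """
    best_len, best_allow = -1, True  # no matching rule means allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        allow = directive.lower() == "allow"
        longer = len(prefix) > best_len
        tie_goes_to_allow = len(prefix) == best_len and allow
        if longer or tie_goes_to_allow:
            best_len, best_allow = len(prefix), allow
    return best_allow


wp_rules = [("Disallow", "/wp-admin/"), ("Allow", "/wp-admin/admin-ajax.php")]
print(is_allowed(wp_rules, "/wp-admin/admin-ajax.php"))  # True
print(is_allowed(wp_rules, "/wp-admin/options.php"))     # False
print(is_allowed(wp_rules, "/blog/"))                    # True
```

Under this rule the result is the same whichever line comes first. Listing Allow first still helps with parsers that stop at the first match.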

Sitemap

The Sitemap directive is a non-standard addition that has been universally adopted. It takes a full absolute URL to your XML sitemap. Unlike the other directives, Sitemap is not grouped inside a User-agent block — it sits at the top level, usually at the end of the file. You can include multiple Sitemap lines.

Sitemap: https://yoursite.com/sitemap_index.xml
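
As a quick check that Sitemap lines are read independently of any User-agent block, this sketch uses `site_maps()` from Python's standard-library parser (available from Python 3.8). The second sitemap URL is a made-up example.

```python
from urllib.robotparser import RobotFileParser

sitemap_parser = RobotFileParser()
sitemap_parser.parse([
    "User-agent: *",
    "Disallow: /checkout/",
    "",
    "Sitemap: https://yoursite.com/sitemap_index.xml",
    "Sitemap: https://yoursite.com/news-sitemap.xml",  # hypothetical second sitemap
])

# Both sitemap URLs are returned, in file order.
print(sitemap_parser.site_maps())
```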

Crawl-delay

Crawl-delay asks the crawler to wait a specified number of seconds between requests. Google does not support this directive, and the Search Console crawl rate setting that used to be the alternative has been retired. Bing and some other crawlers do honour it. Include it only if your server has performance issues under crawl load.

Crawl-delay: 2
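
Crawl-delay belongs to a User-agent block, so it only applies to the bots that block targets. Here is a small sketch with Python's standard-library parser; giving Bingbot its own block is just one possible arrangement.

```python
from urllib.robotparser import RobotFileParser

delay_parser = RobotFileParser()
delay_parser.parse([
    "User-agent: Bingbot",
    "Crawl-delay: 2",
    "",
    "User-agent: *",
    "Disallow:",
])

print(delay_parser.crawl_delay("Bingbot"))    # 2
print(delay_parser.crawl_delay("Googlebot"))  # None: the * block sets no delay
```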

Comments

Any line beginning with # is a comment and is ignored by crawlers. Use comments to annotate your file — future you (or a future developer) will appreciate knowing why a rule exists.

Platform-Specific Robots.txt Examples

Below are the exact configurations this tool generates for each platform preset, with explanations for each rule.

WordPress

WordPress exposes several paths that should never be crawled. The admin panel, login page, core includes directory, XML-RPC endpoint, and internal search results do not belong in search results. The most critical exception is admin-ajax.php, which powers front-end functionality including contact forms, cart updates, and dynamic content. Blocking it can stop Googlebot from rendering those features when it loads your pages.

# WordPress: Sprout Sage Solutions recommended configuration
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php
Disallow: /?s=
Disallow: /xmlrpc.php

Sitemap: https://yoursite.com/sitemap_index.xml

Shopify

Shopify's default robots.txt blocks checkout, admin, and internal search pages. Shopify manages the actual robots.txt file differently from self-hosted platforms — since 2021, you can customise it via the robots.txt.liquid template in your theme. The output below reflects the standard Shopify configuration.

# Shopify standard configuration
User-agent: *
Disallow: /admin
Disallow: /checkout
Disallow: /orders
Disallow: /carts
Disallow: /policies/
Disallow: /256/
Disallow: /search

Sitemap: https://yourstore.myshopify.com/sitemap.xml

Next.js

Next.js applications served via Vercel or a custom Node server keep their static build assets under the internal /_next/ directory. This preset blocks that directory, but /_next/static/ holds the JavaScript and CSS that Googlebot needs to render your pages, so add an Allow rule for that path if your pages depend on it. API routes should be blocked unless you want them crawled.

# Next.js / Vercel configuration
User-agent: *
Disallow: /api/
Disallow: /_next/

Sitemap: https://yoursite.com/sitemap.xml

Medspa / Local Business

For medspa and local business websites, the goal is typically to maximise crawlability of all patient-facing content while blocking any back-office portals, patient intake forms (for privacy), and internal search. If you are on a platform like Jane App or Mindbody for booking, their embedded widgets live on your domain and may create crawlable URLs you do not want indexed.

I cover this in more detail in my guide on medspa SEO — the short version is: be generous with Allow rules and conservative with Disallow. For a local service business, nearly every page on your site is indexable and worth crawling.

# Medspa / Local business configuration
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=
Disallow: /patient-portal/
Disallow: /intake-forms/

Sitemap: https://yourmedspa.com/sitemap_index.xml

Common Robots.txt Mistakes That Kill Your SEO

In my experience reviewing client sites, these are the mistakes I see most often — and the ones this generator's validator actively checks for.

1. Blocking the Entire Site

This is the most catastrophic mistake, and it happens more than you would think. A single line — Disallow: / — tells every crawler to leave immediately. If your developer used this in a staging environment and accidentally deployed it to production, your entire site can de-index within days. The fix is one character: Disallow: with nothing after it means allow everything.
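
A check like this is easy to run in a deployment script. The sketch below is a simplified detector, not the tool's validator: it only looks for `Disallow: /` inside a group that includes `User-agent: *`, and it ignores any Allow lines that might soften the block. The function name `blocks_entire_site` is made up for the example.

```python
def blocks_entire_site(robots_txt):
    """Return True if a group containing `User-agent: *` has `Disallow: /`."""
    agents, group_has_rules = [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if group_has_rules:  # a rule was seen, so a new group starts here
                agents, group_has_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            group_has_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


print(blocks_entire_site("User-agent: *\nDisallow: /\n"))  # True
print(blocks_entire_site("User-agent: *\nDisallow:\n"))    # False
```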

2. Blocking CSS and JavaScript Files

Google renders your pages the same way a browser does. If your robots.txt blocks /wp-content/ or /assets/, Googlebot cannot load your stylesheets or scripts. The result is that Google sees a broken, unstyled version of your page and ranks it accordingly. Always allow CSS, JavaScript, and image files.

3. Blocking the Sitemap URL

I have seen this exact mistake: a developer blocks /sitemap.xml in robots.txt and then adds a Sitemap: directive pointing to the same URL. The crawler reads the Disallow rule first, skips the sitemap entirely, and your content discovery suffers. Always test that your sitemap URL is not blocked.

4. Using Robots.txt to Hide Sensitive Content

Robots.txt is public. Anyone can read it at yoursite.com/robots.txt. If you list Disallow: /secret-client-portal/, you are actively advertising that path to anyone who is curious. Use authentication for genuinely private content, not robots.txt.

5. Forgetting the Allow Exception for WordPress admin-ajax.php

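Block /wp-admin/: yes, you should. But forget to add Allow: /wp-admin/admin-ajax.php and Googlebot may be unable to render contact forms, WooCommerce cart functionality, and any plugin feature that loads through AJAX on the front end. Visitors are unaffected, because browsers never read robots.txt, but Google can end up evaluating an incomplete version of the page. The WordPress preset in this tool handles this automatically.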

6. Incorrect Path Formatting

Paths in robots.txt are case-sensitive: crawlers compare each rule against the URL exactly as written, regardless of how your server treats case. Disallow: /Blog/ does not block /blog/. Paths must also start with a forward slash. The tool enforces correct formatting as you type.
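
The case mismatch is easy to reproduce with Python's standard-library parser. `ExampleBot` is a placeholder bot name for this sketch.

```python
from urllib.robotparser import RobotFileParser

case_parser = RobotFileParser()
case_parser.parse(["User-agent: *", "Disallow: /Blog/"])

print(case_parser.can_fetch("ExampleBot", "/Blog/post"))  # False: matches the rule
print(case_parser.can_fetch("ExampleBot", "/blog/post"))  # True: different case, not blocked
```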

7. Duplicate Conflicting Rules

If you Disallow a path and also Allow the same path in the same block, crawlers that follow RFC 9309 use the most specific rule, and an exact tie goes to Allow. If you have two blocks targeting the same User-agent, compliant crawlers merge them into one group, but other parsers may read only the first, so behaviour varies between crawlers. Keep your blocks clean: one block per User-agent, ordered from most specific to least.

For a deeper look at how robots.txt fits into a broader technical SEO strategy, see my article on site speed and crawlability.

How I Configure Robots.txt for Medspa Websites

Medspa and aesthetics clinic websites have a unique set of needs. I have built and optimised dozens of these sites for clients, and the robots.txt configuration is always one of my first technical checkpoints.

Most medspa sites run on WordPress with a booking integration — Jane App, Mindbody, Vagaro, or a custom booking widget. The challenge is that some of these integrations create URL patterns on your domain that look like unique pages but are actually duplicate parameter-laden variations of the same booking interface. Left unchecked, Googlebot can waste a significant portion of its crawl budget on these non-valuable pages.

What I Block for Medspa Sites

  • /wp-admin/ (with Allow exception for admin-ajax.php)
  • Search results pages: /?s= and /search/
  • Booking parameter URLs if the booking system creates them on-domain
  • Patient portal paths if any exist
  • Tag and category archive pages (if the site is primarily a services site, not a content hub)

What I Explicitly Allow

  • All service pages — these are the money pages
  • All blog and resource content
  • Location and contact pages
  • Before/after gallery pages (huge for aesthetics clinics)
  • All CSS, JS, and image assets

The overarching principle for a local service business is: be generous. You want Googlebot to see as much of your valuable content as possible. Blocking aggressively makes sense for large e-commerce sites or SaaS platforms with thousands of parameter-driven URLs — but for a 30-page medspa site, the risk is over-blocking rather than under-blocking.

If you are running SEO for your medspa and want a full technical audit, I offer a free consultation where I will review your robots.txt, sitemap, Core Web Vitals, and schema markup in a single session.

I also build out complete structured data for medspa sites as part of my medspa SEO packages. Robots.txt is just one piece — schema markup, internal linking, and page speed all compound together to determine how well your site ranks.

Robots.txt vs Meta Robots vs X-Robots-Tag

These three directives all control how search engines interact with your content, but they operate at different levels and do different things. Confusing them is a common source of technical SEO errors.

Robots.txt — File-Level Crawl Control

Robots.txt controls whether a crawler can visit a URL at all. It is evaluated before the page is fetched. The limitation: if you block a page in robots.txt, the crawler cannot read the page — including any noindex tags you have placed on it. This means blocking a page in robots.txt does not guarantee it won't appear in search results if other sites link to it. Google may index the URL (without a description snippet) just from seeing it in a link.

Best for: blocking entire sections of a site from being crawled (admin areas, internal search, staging paths, duplicate parameter URLs).

Meta Robots — Page-Level Index Control

The meta robots tag lives in the <head> of a specific HTML page:

<meta name="robots" content="noindex, nofollow">

This tells crawlers: you can visit this page, but do not include it in search results and do not follow its links. Because the crawler must visit the page to read this tag, the page cannot be blocked in robots.txt if you want the noindex to be respected.

Best for: pages that exist and need to be crawled, but should not appear in search results (thank-you pages, internal tools, paginated archives beyond page 2).
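
When auditing a page, it helps to confirm which directives the tag actually carries. This is a minimal sketch using Python's standard-library `html.parser`; the class and function names are made up for the example.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def meta_robots(html):
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(meta_robots(page))  # ['noindex', 'nofollow']
```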

X-Robots-Tag — Server-Level Control for Any File

The X-Robots-Tag is an HTTP response header. It works like a meta robots tag but can be applied to any file type — including PDFs, images, or non-HTML resources that cannot have a meta tag. It is typically set via server configuration or programmatically.

Best for: applying noindex to PDFs, images, or resources generated by an application.
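
How you set the header depends on your stack. As one illustration, here is a sketch of a Python WSGI middleware that adds `X-Robots-Tag: noindex` to PDF responses. The middleware name, the demo app, and the `.pdf` rule are all assumptions made for the example; many sites would set this in web server configuration instead.

```python
def x_robots_middleware(app, extensions=(".pdf",)):
    """Wrap a WSGI app so matching paths are served with X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start(status, headers, exc_info=None):
            if path.lower().endswith(tuple(extensions)):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that serves the same bytes for every path."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b"file contents"]


def response_headers(app, path):
    """Call the app for `path` and return the response headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    app({"PATH_INFO": path}, start_response)
    return captured["headers"]


tagged_app = x_robots_middleware(demo_app)
print(response_headers(tagged_app, "/files/price-list.pdf"))
print(response_headers(tagged_app, "/index.html"))
```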

The Decision Matrix

  • Want to stop a crawler from visiting a path entirely? Use robots.txt Disallow.
  • Want the page crawled but not indexed? Use meta robots noindex (and ensure the page is NOT blocked in robots.txt).
  • Want to control a non-HTML file? Use X-Robots-Tag in the HTTP header.

Testing Your Robots.txt File

Writing the file is step one. Testing it before you deploy is step two, and most people skip it. Here are the tools I use.

Google Search Console Robots.txt Report

Google Search Console has a robots.txt report under Settings. It lists the robots.txt files Google found for your site, when each was last fetched, and any warnings or errors raised while parsing them. To check a specific URL, run it through the URL Inspection tool, which reports whether crawling is blocked by robots.txt. This is the most authoritative way to check, since it reflects how Googlebot actually interprets the file. The one limitation: it only covers Googlebot behaviour, not other crawlers.

Manual Testing

Visit yoursite.com/robots.txt directly after uploading. You should see a plain text file with your rules. If you see a 404, the file is not in the right location or the filename is incorrect (it must be exactly robots.txt in lowercase).
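
The same check can be scripted. The sketch below separates the diagnosis from the network call so the logic can be tested offline. The function names and message wording are made up for the example, and the content-type check is a rough sanity check rather than a documented requirement.

```python
import urllib.error
import urllib.request


def diagnose(status, content_type, body):
    """Classify a robots.txt response from its status, content type, and text."""
    if status == 404:
        return "missing: no file at /robots.txt"
    if status != 200:
        return f"unexpected status {status}"
    if not content_type.lower().startswith("text/plain"):
        return f"served as {content_type or 'unknown type'}, expected plain text"
    if "user-agent" not in body.lower():
        return "no User-agent line found"
    return "ok"


def check_live(url):
    """Fetch a live robots.txt URL and diagnose it (requires network access)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            body = response.read().decode("utf-8", "replace")
            return diagnose(response.status, response.headers.get("Content-Type", ""), body)
    except urllib.error.HTTPError as error:
        return diagnose(error.code, "", "")


print(diagnose(200, "text/plain; charset=utf-8", "User-agent: *\nDisallow:\n"))  # ok
print(diagnose(404, "", ""))  # missing: no file at /robots.txt
```

To run it against a real site, call `check_live("https://yoursite.com/robots.txt")` with your own domain.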

Screaming Frog

If you run technical SEO audits, Screaming Frog's spider can simulate a specific User-agent and will flag any URLs it encounters that are blocked by your robots.txt. This is useful for catching accidental blocks on pages you want indexed.

The Fetch and Render Test

In Google Search Console, use URL Inspection, run Test Live URL, and view the screenshot of the tested page. If the rendered page looks broken or is missing elements, it usually indicates that Googlebot cannot access the CSS/JS, which traces back to a robots.txt rule blocking your assets directories. This is one of the easiest ways to detect over-blocking.

For a comprehensive technical review, my website speed and technical test tool can flag several crawlability issues alongside performance metrics.

Crawl Budget and Why Robots.txt Matters for Large Sites

Crawl budget is the number of pages Googlebot will crawl on your site within a given time window. For most small sites (under a few hundred pages), crawl budget is not a meaningful constraint — Google will crawl everything reasonably quickly regardless. But for large sites — e-commerce stores with thousands of product variants, news sites with hundreds of thousands of articles, or SaaS platforms with user-generated URL structures — crawl budget management is a real SEO discipline.

Robots.txt is the most direct lever you have on crawl budget. Every URL you block is one fewer crawl request Googlebot makes, which frees budget for the pages you actually want indexed.

What Wastes Crawl Budget

  • URL parameters that create duplicate versions of the same page (e.g., ?ref=newsletter, ?sort=price)
  • Faceted navigation pages on e-commerce sites (every filter combination creates a new URL)
  • Paginated archives deeper than page 5 or 10
  • Internal search results pages
  • Session ID URLs
  • Duplicate print versions of pages
  • Empty category pages and thin tag archives

How to Use Robots.txt to Manage Crawl Budget

Block entire path patterns rather than individual URLs. Use precise path matching — Disallow: /?s= blocks all WordPress search results, Disallow: /tag/ blocks all tag archives. Check Google Search Console's Crawl Stats report to see which sections of your site Googlebot is spending the most time on — if it is burning requests on low-value pages, that is your signal to add Disallow rules.
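
If you have server access logs, they give a complementary view to the Crawl Stats report. The sketch below counts Googlebot requests per top-level section. It assumes the common "combined" log format, the sample lines are fabricated for illustration, and it trusts the user-agent string, which can be spoofed.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer, and user-agent fields
# of a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def section_of(path):
    """Reduce a URL path to its first segment; root URLs with a query become '/?'."""
    base, _, query = path.partition("?")
    first = base.strip("/").split("/", 1)[0]
    section = "/" + first
    return section + "?" if query and not first else section


def googlebot_hits_by_section(log_lines):
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            counts[section_of(match.group("path"))] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def sample_line(path, agent):
    """Build a fabricated combined-format log line for the demo."""
    return f'66.249.66.1 - - [10/Jan/2025:10:00:00 +0000] "GET {path} HTTP/1.1" 200 5120 "-" "{agent}"'


sample_log = [
    sample_line("/tag/botox/", GOOGLEBOT),
    sample_line("/tag/fillers/", GOOGLEBOT),
    sample_line("/?s=laser", GOOGLEBOT),
    sample_line("/services/botox/", GOOGLEBOT),
    sample_line("/services/botox/", "Mozilla/5.0 (Windows NT 10.0)"),
]
hits = googlebot_hits_by_section(sample_log)
print(hits.most_common())
```

Sections with many hits and little search value, such as `/tag` or `/?` here, are the candidates for a Disallow rule.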

Pair robots.txt with canonical tags and XML sitemaps for a complete signal. A URL blocked in robots.txt, canonicalised to another page, and excluded from your sitemap sends a strong consistent signal that the URL has no independent value.

Note: Crawl budget management is primarily relevant for sites over 10,000 pages. If your site is smaller, focus on the quality of your content and internal linking structure first — use robots.txt mainly to block admin and private paths, not as a crawl optimisation strategy.

When NOT to Use Robots.txt

Robots.txt is a powerful tool used in the wrong place more often than you would expect. Here are the situations where you should reach for a different solution.

Do Not Use It to Secure Private Content

As I mentioned earlier, robots.txt is public. Using it to hide pages is counterproductive — you are both announcing their existence and only stopping well-behaved bots. Secure private content with authentication. Password-protect it. Put it behind a login. Do not rely on robots.txt.

Do Not Use It as Your Only noindex Strategy

If you want a page out of Google's index, blocking it in robots.txt is not the right method. The page may still get indexed from external links. Use a meta robots noindex tag and ensure the page is crawlable. This is the authoritative way to remove a page from search results.

Do Not Use It to Fix Duplicate Content at Scale

If you have large-scale duplicate content issues, the canonical tag is the right tool. Blocking duplicate URLs in robots.txt prevents them from being crawled, but it does not tell Google which version is the authoritative one. Canonical tags pass that attribution explicitly and also pass link equity to the canonical URL.

Do Not Use It to Remove an Already-Indexed Page

If a page is already indexed and you add it to robots.txt, Google will stop crawling it, but the page can remain in the index, possibly indefinitely, because Google can no longer reach the noindex tag (even if you add one). To remove an already-indexed page, add a noindex tag and keep the page crawlable so Google can see it. If you need the page hidden quickly, Google Search Console's Removals tool acts fast, but only temporarily (roughly six months), so it complements the noindex tag rather than replacing it.

Do Not Forget to Remove the Staging Block

Development and staging environments typically have Disallow: / to prevent indexing. When you launch, that robots.txt must be replaced. I have seen live production sites running for months with a Disallow: / blocking every crawler, wondering why organic traffic has flatlined. Always verify your robots.txt immediately after any deployment.

If you are unsure about any of this for your specific site, I am happy to take a look during a free consultation call. I will check your robots.txt, Search Console coverage report, and overall crawl health in one session.

You might also find my headline analyzer tool useful — once your pages are indexed, your title tags and headlines are the primary driver of click-through rate from the SERP.

Download the Complete Robots.txt Cheat Sheet

Get my one-page PDF reference: every directive explained, the most common mistakes to avoid, and platform-specific rules for WordPress, Shopify, and Next.js. Free.

Frequently Asked Questions

What is a robots.txt file?
A robots.txt file is a plain text file placed at the root of your website (e.g., yoursite.com/robots.txt) that tells search engine crawlers which pages or sections they are allowed or not allowed to visit. It follows the Robots Exclusion Protocol and is the first file most bots check before crawling any other page on your site.

Where should I put my robots.txt file?
Your robots.txt file must be placed in the root directory of your domain, accessible at yoursite.com/robots.txt. It cannot be in a subdirectory. If your site is in a subfolder, a robots.txt placed there will not be recognised by search engines.

Does robots.txt stop a page from appearing in Google?
Robots.txt can prevent Googlebot from crawling a page, but it does not guarantee the page won't appear in search results. If other sites link to a blocked page, Google may still index it, showing the URL with no description. To prevent indexing entirely, use a noindex meta tag on a page that is still crawlable.

What is the difference between Disallow and noindex?
Disallow in robots.txt tells crawlers not to visit a URL. The noindex directive (in a meta tag or HTTP header) tells crawlers they can visit the page but must not include it in search results. Blocking a page with Disallow can actually prevent Google from seeing the noindex tag, so the page might still appear in results without a description snippet.

Should I block /wp-admin/ on a WordPress site?
Yes. Disallow /wp-admin/ for all bots, but add an Allow rule for /wp-admin/admin-ajax.php. This file powers front-end functionality used by many WordPress plugins, including contact forms, WooCommerce cart updates, and dynamic content. Blocking it can prevent Googlebot from rendering those features when it loads your pages.

How do I add my sitemap to robots.txt?
Add a Sitemap directive at the bottom of your robots.txt file on its own line: Sitemap: https://yoursite.com/sitemap.xml. This is not part of any User-agent block; it sits at the top level. You can include multiple Sitemap lines if you have more than one sitemap file. This helps search engines discover and prioritise your content faster.

Is this robots.txt generator free?
Yes, completely free with no account required. All presets, the visual builder, real-time validation, copy, and download are available at no cost. Sprout Sage Solutions builds these tools to help website owners and marketers get technical SEO right without needing an enterprise SEO platform.

Can a misconfigured robots.txt hurt my SEO?
Absolutely. A misconfigured robots.txt is one of the most common and damaging SEO mistakes. Blocking CSS or JavaScript prevents Google from rendering your pages correctly. Blocking your entire site with Disallow: / can de-index everything. Blocking your sitemap URL stops crawlers from finding your content map. This generator includes real-time validation that warns you about all of these issues before you download.