Robots.txt Generator

Visually build and test your robots.txt file — no coding required

1. Configure Rules

User-Agent:
Crawl-delay:

Sitemap URLs

Sitemap:

2. robots.txt Output

User-agent: *
Disallow: 

URL Test Simulator

What Is robots.txt?

A robots.txt file tells web crawlers which URLs they may and may not crawl on your website. It lives at the root of your domain (e.g. https://example.com/robots.txt) and is one of the first files a crawler requests before crawling the rest of the site. The file is advisory and cannot force compliance, but major search engines such as Google and Bing respect its directives.

How to Deploy

1. Generate & Copy

Build your rules using the form above, then click Copy to copy the generated text.

2. Create the File

Save the content as a UTF-8 plain text file named exactly robots.txt (all lowercase). Make sure your editor does not append a second extension, such as robots.txt.txt.

3. Upload to Root

Upload the file to the root directory of your website so that it is accessible at https://yourdomain.com/robots.txt. On WordPress, robots.txt is typically managed through an SEO plugin such as Yoast or Rank Math rather than uploaded by hand.

4. Verify

Open https://yourdomain.com/robots.txt in your browser to confirm it's accessible. Then open the robots.txt report in Google Search Console (Settings → Crawling → robots.txt) to confirm that Google fetched and parsed the file without errors.
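The Verify step can also be scripted. The sketch below is one possible check, not part of this tool: the helper names looks_like_robots_txt and check_deployed are made up for illustration, and the directive list is limited to the directives described on this page. It guards against a common failure where the server answers with an HTML error page instead of the file.

```python
from urllib.request import urlopen

# Directives covered on this page; other crawler-specific ones are not checked.
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def looks_like_robots_txt(body: str) -> bool:
    """Return True if at least one line is a recognised 'Directive: value' pair."""
    for line in body.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        name, sep, _ = line.partition(":")
        if sep and name.strip().lower() in KNOWN_DIRECTIVES:
            return True
    return False


def check_deployed(url: str) -> bool:
    """Fetch the live file and sanity-check it.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    so a missing file surfaces as an exception.
    """
    with urlopen(url, timeout=10) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return looks_like_robots_txt(body)
```

Calling check_deployed("https://yourdomain.com/robots.txt") with your own domain performs the live check; looks_like_robots_txt can be run on the generated text before uploading.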

Directives Explained

User-agent

Specifies which crawler the rules that follow apply to. Use * for all bots, or a specific name like Googlebot for targeted rules. A crawler that finds a group naming it follows that group and ignores the * group.
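To see how group selection plays out, here is a small sketch using Python's standard urllib.robotparser module. The rules, the bot name OtherBot, and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: one group for everyone, one specifically for Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot with no group of its own falls back to the * group.
other_private = parser.can_fetch("OtherBot", "https://example.com/private/a.html")

# Googlebot has its own group, so only /drafts/ is off limits for it.
google_private = parser.can_fetch("Googlebot", "https://example.com/private/a.html")
google_drafts = parser.can_fetch("Googlebot", "https://example.com/drafts/a.html")

print(other_private, google_private, google_drafts)  # False True False
```

Note that the Googlebot group does not inherit the /private/ rule. If a path should be blocked for every crawler, repeat it in each group.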

Disallow

Blocks crawlers from any URL whose path begins with the specified value. Disallow: /admin/ blocks everything under /admin/. An empty value means nothing is blocked.
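The three common Disallow forms behave quite differently, which the following sketch illustrates with Python's standard urllib.robotparser. The helper can_crawl, the bot name AnyBot, and the example.com URLs are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, url: str, agent: str = "AnyBot") -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"            # empty value: nothing blocked
BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/\n"  # blocks the /admin/ prefix
BLOCK_ALL = "User-agent: *\nDisallow: /\n"          # blocks every path

print(can_crawl(ALLOW_ALL, "https://example.com/admin/login"))      # True
print(can_crawl(BLOCK_ADMIN, "https://example.com/admin/login"))    # False
print(can_crawl(BLOCK_ADMIN, "https://example.com/administrator"))  # True
print(can_crawl(BLOCK_ALL, "https://example.com/blog/"))            # False
```

The third call shows that matching is by prefix: /administrator does not start with /admin/ (note the trailing slash), so it stays crawlable.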

Allow

Overrides a broader Disallow rule for a specific path. Allow: /admin/public/ re-enables access within a blocked /admin/ directory.

Sitemap

Tells crawlers where your XML sitemap is located, given as an absolute URL. It is not tied to any User-agent block, so every crawler that reads the file sees it.

Crawl-delay

Asks crawlers to wait N seconds between requests. Bing and some other crawlers honor it, but Google ignores it; Googlebot sets its own crawl rate automatically.
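Sitemap and Crawl-delay values can be read back from a file to check that they landed where intended. The sketch below uses Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later); the rules and the sitemap URL are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: a delay for bingbot only, plus one sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: bingbot
Crawl-delay: 5
Disallow: /tmp/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawl-delay belongs to the group it appears in.
bing_delay = parser.crawl_delay("bingbot")      # 5
other_delay = parser.crawl_delay("Googlebot")   # None: the * group sets no delay

# Sitemap lines are global, independent of any User-agent group.
sitemaps = parser.site_maps()                   # ['https://example.com/sitemap.xml']

print(bing_delay, other_delay, sitemaps)
```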

Wildcards

Use * to match any sequence of characters and $ to anchor a pattern to the end of the URL. Example: Disallow: /*.pdf$ blocks every URL that ends in .pdf.
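One way to reason about wildcard patterns is to translate them into regular expressions. The following is a minimal hand-rolled sketch, not how any particular crawler implements matching; pattern_to_regex and matches are hypothetical helper names.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*', a trailing '$' anchors the end of the URL,
    and everything else is matched literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if `pattern` matches `path` starting from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


print(matches("/*.pdf$", "/files/report.pdf"))             # True
print(matches("/*.pdf$", "/files/report.pdf?download=1"))  # False
print(matches("/*.pdf", "/files/report.pdf?download=1"))   # True
print(matches("/private", "/private/page.html"))           # True
```

The second call shows why $ matters: with the anchor, a PDF URL carrying a query string no longer matches, so dropping the $ gives broader coverage.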

Best Practices

  • Always include a Sitemap directive pointing to your XML sitemap for faster discovery
  • Don't use robots.txt to hide sensitive content; protect it with authentication, or keep pages out of search results with a noindex meta tag
  • Test your rules before deploying changes, then check the robots.txt report in Google Search Console once the file is live
  • Be careful with Disallow: /, which blocks crawlers from your entire site
  • Use specific User-agent directives to block AI training bots without affecting search engine crawling
  • Remember that robots.txt is publicly visible — don't put private paths in it that you want to keep hidden

Frequently Asked Questions

Does robots.txt prevent pages from appearing in Google?

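Not necessarily. Disallow stops Googlebot from crawling a page, but Google may still index the URL if other pages link to it; it just won't know the content. To keep a page out of the index, add a noindex meta tag (<meta name="robots" content="noindex">) and leave the page crawlable: if robots.txt blocks the page, Googlebot never sees the tag.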

Can robots.txt block all AI crawlers?

You can block known AI bots (GPTBot, ClaudeBot, CCBot, Google-Extended, Bytespider) by adding specific User-agent rules. However, not all AI crawlers identify themselves, so it's not a guarantee. The "Block AI Bots" preset above covers the major ones.
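A block list like this is easy to generate. The sketch below builds one group per bot named in the answer above and leaves every other crawler unrestricted. It is an illustration only and is not necessarily what the preset in this tool emits; build_robots_txt is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the answer above; extend as new bots appear.
AI_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended", "Bytespider"]


def build_robots_txt(blocked_agents: list[str]) -> str:
    """Return robots.txt text that blocks the given agents and allows the rest."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    groups.append("User-agent: *\nDisallow:")  # everyone else: nothing blocked
    return "\n\n".join(groups) + "\n"


robots_txt = build_robots_txt(AI_BOTS)
print(robots_txt)

# Sanity check with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
gptbot_allowed = parser.can_fetch("GPTBot", "https://example.com/article")
googlebot_allowed = parser.can_fetch("Googlebot", "https://example.com/article")
print(gptbot_allowed, googlebot_allowed)  # False True
```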

What happens if a rule conflicts with another?

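When multiple rules match the same URL, crawlers that follow the Robots Exclusion Protocol standard (RFC 9309), including Googlebot, apply the most specific (longest) rule. For example, if you have Disallow: /admin/ and Allow: /admin/public/, the /admin/public/ path will be allowed because it's more specific. If an Allow rule and a Disallow rule of the same length both match, the Allow rule wins.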
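The longest-match rule can be sketched in a few lines. This version handles plain prefix patterns only (no wildcards) and breaks ties in favor of Allow; is_allowed is a hypothetical helper written for this example. It is hand-rolled because some parsers, Python's urllib.robotparser among them, apply the first matching rule in file order rather than the longest one.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Apply longest-match precedence to (directive, pattern) pairs.

    Only prefix patterns are supported. With no matching rule the
    path is allowed; on a length tie, Allow beats Disallow.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns and non-matching rules are skipped
        is_allow = directive.lower() == "allow"
        if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
            best_len, allowed = len(pattern), is_allow
    return allowed


rules = [("Disallow", "/admin/"), ("Allow", "/admin/public/")]

print(is_allowed(rules, "/admin/public/page.html"))  # True: longer Allow wins
print(is_allowed(rules, "/admin/settings"))          # False: only Disallow matches
print(is_allowed(rules, "/blog/"))                   # True: no rule matches
```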

Does Crawl-delay work with Google?

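No. Google ignores the Crawl-delay directive entirely. Googlebot adjusts its crawl rate automatically based on how your server responds, and the crawl rate limiter that Search Console once offered has been retired. If Googlebot is overloading your server, Google's documentation recommends temporarily returning 500, 503, or 429 status codes. Crawl-delay is honored by Bing and some other crawlers.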

Where should I upload my robots.txt file?

It must be at the root of your domain: https://example.com/robots.txt. Placing it in a subdirectory (like /pages/robots.txt) will not work. Each subdomain needs its own robots.txt file.
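Put differently, the robots.txt location depends only on the scheme and host of a URL, never on its path. A small sketch, using a hypothetical helper name robots_url_for and example.com hosts:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain or port); drop the rest.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("https://example.com/pages/about.html?ref=1"))
# https://example.com/robots.txt
print(robots_url_for("https://blog.example.com/post/1"))
# https://blog.example.com/robots.txt
```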