SEO · May 22, 2026 · 5 min read

How to Write a robots.txt File — A Practical Guide

A robots.txt file tells search engine crawlers which pages to access and which to skip. Learn how to write one correctly and avoid common mistakes.

The file that controls how search engines crawl your site

Every website should have a robots.txt file. It sits at the root of your domain (https://example.com/robots.txt) and provides directives to web crawlers about which parts of your site they’re allowed to access.

Robots.txt is not a security mechanism — it’s a polite request, not a technical barrier. Well-behaved crawlers (Googlebot, Bingbot) respect it. Bad actors ignore it. But for search engine optimization, getting robots.txt right helps search engines crawl your site efficiently and focus on the pages that matter.

Generate your robots.txt file with Robots.txt Generator.

The basic structure

User-agent: Googlebot
Disallow: /private/

User-agent: *
Allow: /
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml

User-agent — Specifies which crawler the following rules apply to. * is a wildcard that applies to all crawlers. Googlebot targets Google’s crawler specifically.

Allow — Permits crawling of a specific path (even within a broader Disallow).

Disallow — Prevents crawling of any URL whose path starts with the given value.

Sitemap — Points crawlers to your XML sitemap. Always include this.

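Rule order within a group does not matter to crawlers that follow RFC 9309, including Googlebot: the matching rule with the longest path wins, and Allow wins a tie. A crawler also obeys only the group that names it most specifically, falling back to User-agent: * when no group does. In the example above, Googlebot follows only its own group, so Disallow: /admin/ does not apply to it unless you repeat that line under User-agent: Googlebot.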
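To make the precedence behavior concrete, here is a minimal Python sketch of longest-match evaluation as RFC 9309 describes it. The function name is_allowed and the (directive, prefix) tuple format are illustrative assumptions, not part of any library, and wildcard handling is deliberately left out.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, path_prefix) pairs taken from ONE
    user-agent group, e.g. [("allow", "/"), ("disallow", "/admin/")].
    This format is an assumption made for the sketch.
    """
    best_len = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are ignored
        is_allow = directive.lower() == "allow"
        # The longest matching prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("allow", "/"), ("disallow", "/admin/")]
print(is_allowed(rules, "/admin/users"))                  # False
print(is_allowed(rules, "/blog/post"))                    # True
print(is_allowed(list(reversed(rules)), "/admin/users"))  # False
```

Reversing the rule list gives the same answer, which is the point: under longest-match evaluation the result depends on rule length, not on position in the file.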

What to disallow

Disallow staging or duplicate content — If your site has staging URLs, parameter-based duplicates, or print versions accessible to crawlers, disallow them. Duplicate content wastes crawl budget and dilutes ranking signals.

Common paths worth disallowing:

Disallow: /admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
Disallow: /search?
Disallow: /*?sort=
Disallow: /*?filter=

Disallow internal search results — Search result pages (/search?q=anything) create infinite URL combinations with near-identical content. Disallowing them prevents crawl budget waste.

Disallow parameter-based duplicates — If your URLs use query parameters for sorting, filtering, or pagination that create effectively duplicate pages (/products?sort=price, /products?sort=name), disallow the parameterized versions.
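The wildcard patterns in the list above are easier to reason about when tested against sample URLs. The following Python sketch translates a robots.txt path pattern into a regular expression, assuming the common semantics where * matches any run of characters and a trailing $ anchors the end of the URL. The helper names pattern_to_regex and matches are hypothetical, and real crawlers may differ in edge cases such as percent-encoding.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    `*` matches any run of characters; a trailing `$` anchors the end.
    Everything else is matched literally, from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern, url_path):
    """Return True if the pattern matches from the start of url_path."""
    return pattern_to_regex(pattern).match(url_path) is not None


print(matches("/search?", "/search?q=shoes"))              # True
print(matches("/*?sort=", "/products?sort=price"))         # True
print(matches("/*?sort=", "/products?page=2&sort=price"))  # False
print(matches("/*sort=", "/products?page=2&sort=price"))   # True
```

The third line shows a gap worth knowing about: /*?sort= only matches when sort is the first query parameter. A looser pattern such as /*sort= catches it in any position, at the cost of also matching any other parameter whose name ends in "sort".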

What NOT to disallow

Pages you want to rank — This sounds obvious, but misconfigured robots.txt files accidentally blocking important pages are one of the most common technical SEO issues. Verify that your primary landing pages, blog posts, and category pages are not blocked.

CSS and JavaScript files — Google needs to render your pages properly. Blocking CSS and JS files prevents Google from seeing your site as users do, which can hurt rankings. Many older robots.txt guides recommend blocking these — ignore that advice.

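The sitemap location itself — Crawlers have to fetch the sitemap file to read it, so keep it out of any disallowed path. The Sitemap: line only announces where the file lives. It is independent of the user-agent groups and does not exempt that URL from a Disallow rule that covers it.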

Allowing specific crawlers while blocking others

You can write different rules for different crawlers:

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Allow: /

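This configuration allows Google and Bing full access, blocks Ahrefs’ crawler, and allows all others by default. The Googlebot and Bingbot groups are optional here, since the * group already allows everything; they only make the intent explicit. Sites use this approach when they do not want third-party SEO tools crawling their content, though it only works for crawlers that choose to honor robots.txt.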
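How a crawler decides which group to obey can be sketched in a few lines of Python. This is a simplified model that assumes exact, case-insensitive matching on the user-agent token; the select_group function and the dictionary layout are illustrative assumptions, and real crawlers may apply extra fallbacks (for example, a specialized Google crawler falling back to the Googlebot group).

```python
def select_group(groups, crawler_token):
    """Pick the rule group a crawler should obey.

    `groups` maps a user-agent token from robots.txt to its rules.
    A group that names the crawler wins; otherwise the `*` group applies.
    """
    token = crawler_token.lower()
    for agent, rules in groups.items():
        if agent != "*" and agent.lower() == token:
            return rules
    return groups.get("*", [])  # no `*` group means no restrictions


groups = {
    "Googlebot": [("allow", "/")],
    "Bingbot": [("allow", "/")],
    "AhrefsBot": [("disallow", "/")],
    "*": [("allow", "/")],
}
print(select_group(groups, "ahrefsbot"))    # [('disallow', '/')]
print(select_group(groups, "DuckDuckBot"))  # [('allow', '/')]
```

A crawler with its own group ignores the * group entirely, so any rule meant for everyone has to be repeated inside each named group.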

Testing your robots.txt

After writing your robots.txt:

Verify the file is accessible — Visit yourdomain.com/robots.txt directly. If the file’s contents load as plain text with a 200 status, crawlers can read it.
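Test in Google Search Console — Google retired its standalone robots.txt tester in 2023. Use the robots.txt report to see the version of the file Google last fetched along with any parsing problems, and the URL Inspection tool to check whether a specific URL is blocked.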
Check for accidental blocks — Run a crawl of your site (using Screaming Frog or similar) and filter for pages blocked by robots.txt. Verify every blocked URL is intentionally blocked.
Confirm your sitemap is referenced — The Sitemap: directive should point to the correct URL of your XML sitemap.
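Part of this checklist can be scripted. The sketch below uses Python's standard-library urllib.robotparser to check a list of must-rank URLs against a robots.txt body and to list the sitemaps it references. The ROBOTS_TXT contents, the URLs, and the helper names find_blocked and listed_sitemaps are made up for illustration. One caveat: the standard-library parser does not understand * or $ inside paths and applies the first matching rule rather than the longest, so treat its output as a rough check and not a prediction of Googlebot's behavior.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt, urls, agent="*"):
    """Return the URLs in `urls` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


def listed_sitemaps(robots_txt):
    """Return the Sitemap: URLs declared in the file (Python 3.8+)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


# Hypothetical file contents and URLs, used only for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart/

Sitemap: https://example.com/sitemap.xml
"""

must_rank = [
    "https://example.com/",
    "https://example.com/blog/robots-guide",
    "https://example.com/cart/",
]
print(find_blocked(ROBOTS_TXT, must_rank))  # ['https://example.com/cart/']
print(listed_sitemaps(ROBOTS_TXT))          # ['https://example.com/sitemap.xml']
```

To check a live site, RobotFileParser also offers set_url() and read() to download the file, which replaces the parse() call used here.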

Common mistakes

Blocking the entire site during development — Setting Disallow: / during development and forgetting to remove it before launch stops compliant crawlers from fetching any page, so search engines cannot index the site’s content.

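Inconsistent trailing slashes — Disallow: /admin and Disallow: /admin/ behave differently because rules are prefix matches. /admin blocks every path that starts with those characters, which includes /admin/dashboard/ but also /administrator and /admin.html. /admin/ blocks only URLs inside the directory and leaves /admin itself crawlable. Use the trailing slash when you mean a directory.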

Case sensitivity — Path matching in robots.txt is case-sensitive, regardless of how your server handles case. Disallow: /Admin/ does not block /admin/.

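Blocking without noindex — Disallow prevents crawling but does not remove a page from the index. A blocked URL that has inbound links can still appear in results, usually without a description. To keep a page out of search results, use a noindex meta tag or an X-Robots-Tag header and leave the page crawlable, because a crawler that is blocked from a page never sees its noindex.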
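The trailing-slash and case-sensitivity mistakes above both come down to the same mechanism: a Disallow value is compared against the start of the URL path, character by character. A small Python sketch makes the differences visible. The blocked_by helper is hypothetical and ignores wildcards.

```python
def blocked_by(disallow_prefix, path):
    """A Disallow value blocks every path that starts with it (case-sensitive)."""
    return path.startswith(disallow_prefix)


# Print how each Disallow value treats each sample path.
for prefix in ("/admin", "/admin/", "/Admin/"):
    for path in ("/admin", "/admin/dashboard/", "/administrator"):
        print(f"Disallow: {prefix:<8} {path:<18} blocked={blocked_by(prefix, path)}")
```

Running it shows /admin catching all three sample paths, /admin/ catching only /admin/dashboard/, and /Admin/ catching none of them.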

Use Robots.txt Generator to build your file with a visual interface that prevents syntax errors.