Create a properly configured robots.txt file with AI crawler rules, social bot access, and sitemap directives.
```
# robots.txt generated by Growtika
# 2026-03-14

User-agent: *
Disallow: /admin/
Disallow: /private/
Disallow: /staging/

# AI Crawlers
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: CCBot
Allow: /

User-agent: meta-externalagent
Allow: /

# Social Media Crawlers
User-agent: Twitterbot
Allow: /

User-agent: facebookexternalhit
Allow: /

User-agent: LinkedInBot
Allow: /
```
Writing a robots.txt file manually is error-prone. A misplaced directive or forgotten crawler can make your entire site invisible to search engines or AI platforms. A robots.txt generator creates a properly formatted file based on your preferences, ensuring correct syntax and complete coverage.
This is especially important in 2026, when your robots.txt needs to account for over a dozen AI-specific crawlers beyond traditional search bots. Each AI platform (ChatGPT, Claude, Perplexity, Gemini, Copilot) uses its own user-agent, and each requires explicit rules if you are using a restrictive default policy.
Our generator creates a robots.txt file optimized for both AI search visibility and traditional SEO, with toggles for AI crawlers, social media bots, crawl delays, and sitemap declarations.
The file must live at the root of your domain (yourdomain.com/robots.txt). Here is what we configure for every client, and we recommend you do the same.
The safest default is Allow: / under User-agent: *. From there, add specific Disallow rules only for paths that genuinely need protection (admin panels, staging, internal APIs). The opposite approach, blocking everything and whitelisting, is fragile and guaranteed to break when someone adds a new page or directory.
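A minimal sketch of that allow-first baseline (the directory names are illustrative; substitute the paths you actually need to protect):

```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /staging/
Disallow: /internal-api/
```

Everything not explicitly disallowed stays crawlable, so a new blog section or landing page is visible the day it ships.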
GPTBot, ChatGPT-User, ClaudeBot, anthropic-ai, PerplexityBot, Google-Extended. Each one needs its own User-agent section with Allow: /. This is not optional in 2026. If you use a restrictive wildcard policy and forget to add an AI bot, that platform cannot see your content. Our generator handles this automatically.
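You can verify this failure mode with Python's standard-library `urllib.robotparser`. The policy below is a hypothetical restrictive default that remembers GPTBot but forgets PerplexityBot:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical restrictive policy: block everyone, whitelist one AI bot.
rules = """
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("GPTBot", "/pricing"))        # True: explicitly whitelisted
print(rp.can_fetch("PerplexityBot", "/pricing")) # False: falls back to the * block
```

The forgotten bot silently inherits the wildcard `Disallow: /`, which is exactly how a platform "cannot see" a site nobody intended to hide.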
Twitterbot, facebookexternalhit, and LinkedInBot are not search crawlers. They render Open Graph previews when someone shares your link. If these are blocked, your shared links appear as blank cards with no title, image, or description. That kills click-through rates on social channels.
A Sitemap: directive at the bottom of robots.txt tells every crawler where to find your XML sitemap. It is the simplest line you can add and one of the most impactful. Do not assume crawlers will find it through other means.
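The directive is a single absolute URL on its own line, outside any User-agent group (the domain below is a placeholder):

```
Sitemap: https://yourdomain.com/sitemap.xml
```

Multiple Sitemap: lines are valid if you split your sitemap or use a sitemap index.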
Crawl-delay is ignored by Google entirely. Bing respects it but you rarely need it. Setting a high delay slows indexing and does not meaningfully reduce server load on modern infrastructure. If your server struggles under crawler traffic, the problem is your server, not your robots.txt.
Treat robots.txt like code. Put it in Git. Review changes in pull requests. A one-line accidental edit can make your entire site invisible. If someone changes this file, you want a diff, a reviewer, and a deployment log.
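A lightweight guardrail to pair with that review process: a CI assertion, sketched here with Python's standard-library parser, that fails the build if the wildcard policy ever blocks a critical page (the paths and inline examples are illustrative; in CI you would read the repo's robots.txt file):

```python
from urllib.robotparser import RobotFileParser

CRITICAL_PATHS = ["/", "/pricing", "/blog/"]  # pages that must stay crawlable

def robots_is_sane(robots_lines):
    """Return True if the wildcard (User-agent: *) policy leaves
    every critical page crawlable."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return all(rp.can_fetch("*", path) for path in CRITICAL_PATHS)

# Inlined examples for the sketch; in CI, read the repo's robots.txt instead.
good = ["User-agent: *", "Disallow: /admin/"]
bad = ["User-agent: *", "Disallow: /"]   # the classic one-line disaster

assert robots_is_sane(good)
assert not robots_is_sane(bad)
```

Wire this into the pipeline that deploys the file and the "accidental one-line edit" scenario becomes a failed build instead of a delisted site.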
We have seen every one of these mistakes in production. Here is what went wrong in each case and how to fix it.
A marketing team generates a perfect robots.txt. It sits in a Google Doc. The live site still serves the old file.
Setting User-agent: * Disallow: / as the default, intending to add Allow rules for specific bots. Then shipping it without the bot-specific blocks.
Blocking /api/ seems reasonable. But if your marketing site uses /api-integrations/ or /api-security-guide/ as blog paths, those get blocked too.
Writing Disallow: /admin instead of Disallow: /admin/. Without the trailing slash the rule matches by prefix, so paths like /admin-guide/ or /administration/ get blocked along with the admin panel.
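The /api/ and trailing-slash mistakes share one underlying rule: Disallow matches by path prefix. A quick sketch with Python's standard-library parser shows the difference (paths are illustrative):

```python
from urllib.robotparser import RobotFileParser

def blocked(rule, path):
    """True if `path` is disallowed for all agents under a single Disallow rule."""
    rp = RobotFileParser()
    rp.parse(["User-agent: *", f"Disallow: {rule}"])
    return not rp.can_fetch("*", path)

print(blocked("/admin", "/admin/panel"))         # True: the intended target
print(blocked("/admin", "/admin-guide/"))        # True: collateral damage
print(blocked("/admin/", "/admin-guide/"))       # False: trailing slash scopes the rule
print(blocked("/api/", "/api-security-guide/"))  # False: blog path stays crawlable
```

The trailing slash turns "everything starting with /admin" into "everything inside the /admin/ directory", which is almost always what you meant.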
Setting a 10-second crawl delay because someone read it was best practice.
A correct robots.txt is the foundation. We help B2B companies become the recommended answer in ChatGPT, Perplexity, and Google AI Overviews.
Get a Free AI Visibility Audit →