
E-commerce robots.txt: rules that prevent disasters

The robots.txt file tells search engine crawlers which pages they can and cannot access. On an e-commerce site, it protects crawl budget by keeping Googlebot away from admin pages, internal search results, and other internal URLs with no search value. A misconfiguration, however, can accidentally block product pages or even an entire catalog from Google.

4 min read · April 17, 2026


Standard robots.txt for an e-commerce site

User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search

# Allow product and collection pages
Allow: /products/
Allow: /collections/

Sitemap: https://example.com/sitemap.xml

On Shopify, the default robots.txt is well configured. Avoid modifying it unless you have a specific reason — Shopify already blocks admin, checkout, and account pages.

What to block

Admin areas, checkout, cart, account pages, internal search results, API endpoints, staging/preview URLs. These pages have no SEO value and consume crawl budget.
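Depending on your platform, these often map to patterns like the following (the paths here are hypothetical; adjust them to your own URL structure):

```
User-agent: *
Disallow: /api/       # internal API endpoints
Disallow: /preview/   # staging/preview URLs
Disallow: /search     # internal search results
Disallow: /account    # customer account pages
```

Remember that Disallow rules are prefix matches: Disallow: /search also blocks /search-results if such a path exists.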

What NOT to block

Products, categories, blog posts, the homepage, brand pages. Never block a URL whose canonical tag points elsewhere: Google cannot read the canonical if it cannot crawl the page, so the duplicate handling you intended is lost.

Sitemap declaration

Always declare your sitemap URL in robots.txt: Sitemap: https://example.com/sitemap.xml. This ensures every crawler can find your sitemap, not only Googlebot, which can also discover it through Search Console.
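As a quick sanity check that the Sitemap line is well-formed, Python's standard-library robotparser (3.8+) exposes any sitemap URLs it finds while parsing; a minimal sketch with inlined rules:

```python
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
])

# site_maps() returns the declared sitemap URLs, or None if absent
print(parser.site_maps())
```

In production you would point the parser at your live file with set_url() and read() instead of inlining the rules.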
URL type                  Block in robots.txt?  Alternative
Admin/checkout/cart       Yes                   None needed
Filter URLs (?color=red)  No                    Use canonical tags
Noindex pages             No                    Use noindex meta tag only
Product pages             Never                 They must be crawlable
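The two alternatives in the table are page-level tags, not robots.txt rules. For illustration, with hypothetical URLs:

```
<!-- On a filtered URL such as /collections/shoes?color=red -->
<link rel="canonical" href="https://example.com/collections/shoes">

<!-- On a page to keep out of the index (it must stay crawlable) -->
<meta name="robots" content="noindex">
```

Both tags only work if Googlebot can crawl the page, which is exactly why these URLs must not be blocked in robots.txt.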
The most dangerous robots.txt error is accidentally blocking the whole site with Disallow: /. This immediately cuts off all organic traffic. Always test your rules before publishing, for example with the robots.txt report in Google Search Console.
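You can also automate this check in CI with Python's standard-library robotparser; a minimal sketch, using rules inlined from the example above (URLs are hypothetical):

```python
from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /search
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Product pages must stay crawlable; admin and cart must be blocked
checks = {
    "/products/blue-shirt": parser.can_fetch("*", "https://example.com/products/blue-shirt"),
    "/admin/orders": parser.can_fetch("*", "https://example.com/admin/orders"),
    "/cart": parser.can_fetch("*", "https://example.com/cart"),
}
for path, allowed in checks.items():
    print(path, "->", "crawlable" if allowed else "blocked")
```

A failing check on a product path is the automated equivalent of catching a stray Disallow: / before it reaches production.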

Audit your robots.txt

30 min call · no commitment


FAQ

What is a robots.txt file?

Robots.txt is a plain text file at the root of your domain (example.com/robots.txt) containing crawling instructions for bots. It is the first file Googlebot checks before crawling your site. Note that it cannot prevent indexation, only crawling: a blocked URL can still be indexed if other sites link to it.
