
E-commerce robots.txt: rules that prevent disasters

The robots.txt file tells search engine crawlers which pages they can and cannot access. On an e-commerce site, it protects crawl budget by keeping Googlebot away from admin pages, internal search results, and other internal URLs with no search value. A misconfiguration, however, can accidentally block product pages or even an entire catalog from Google.

4 min read · April 17, 2026


Standard robots.txt for an e-commerce site

User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /search

# Allow product and collection pages
Allow: /products/
Allow: /collections/

Sitemap: https://example.com/sitemap.xml

On Shopify, the default robots.txt is well configured. Avoid modifying it unless you have a specific reason — Shopify already blocks admin, checkout, and account pages.

What to block

Admin areas, checkout, cart, account pages, internal search results, API endpoints, staging/preview URLs. These pages have no SEO value and consume crawl budget.
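Depending on your platform, these often map to patterns like the following (the paths here are hypothetical; adjust them to your own URL structure):

```
User-agent: *
Disallow: /api/       # internal API endpoints
Disallow: /preview/   # staging/preview URLs
Disallow: /search     # internal search results
Disallow: /account    # customer account pages
```

Remember that Disallow rules are prefix matches: Disallow: /search also blocks /search-results if such a path exists.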

What NOT to block

Products, categories, blog posts, the homepage, brand pages. Never block a URL whose canonical tag points elsewhere: Google cannot read the canonical if it cannot crawl the page, so the duplicate handling you intended is lost.

Sitemap declaration

Always declare your sitemap URL in robots.txt: Sitemap: https://example.com/sitemap.xml. This ensures every crawler can find your sitemap, not only Googlebot, which can also discover it through Search Console.
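As a quick sanity check that the Sitemap line is well-formed, Python's standard-library robotparser (3.8+) exposes any sitemap URLs it finds while parsing; a minimal sketch with inlined rules:

```python
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://example.com/sitemap.xml",
])

# site_maps() returns the declared sitemap URLs, or None if absent
print(parser.site_maps())
```

In production you would point the parser at your live file with set_url() and read() instead of inlining the rules.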
URL type                  Block in robots.txt?  Alternative
Admin/checkout/cart       Yes                   None needed
Filter URLs (?color=red)  No                    Use canonical tags
Noindex pages             No                    Use noindex meta tag only
Product pages             Never                 They must be crawlable
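The two alternatives in the table are page-level tags, not robots.txt rules. For illustration, with hypothetical URLs:

```
<!-- On a filtered URL such as /collections/shoes?color=red -->
<link rel="canonical" href="https://example.com/collections/shoes">

<!-- On a page to keep out of the index (it must stay crawlable) -->
<meta name="robots" content="noindex">
```

Both tags only work if Googlebot can crawl the page, which is exactly why these URLs must not be blocked in robots.txt.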
The most dangerous robots.txt error is accidentally blocking the whole site with Disallow: /. This immediately cuts off all organic traffic. Always test your rules before publishing, for example with the robots.txt report in Google Search Console.
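You can also automate this check in CI with Python's standard-library robotparser; a minimal sketch, using rules inlined from the example above (URLs are hypothetical):

```python
from urllib import robotparser

RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
Disallow: /checkout
Disallow: /search
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Product pages must stay crawlable; admin and cart must be blocked
checks = {
    "/products/blue-shirt": parser.can_fetch("*", "https://example.com/products/blue-shirt"),
    "/admin/orders": parser.can_fetch("*", "https://example.com/admin/orders"),
    "/cart": parser.can_fetch("*", "https://example.com/cart"),
}
for path, allowed in checks.items():
    print(path, "->", "crawlable" if allowed else "blocked")
```

A failing check on a product path is the automated equivalent of catching a stray Disallow: / before it reaches production.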

Audit your robots.txt

30 min call · no commitment


FAQ

What is a robots.txt file?

Robots.txt is a plain text file at the root of your domain (example.com/robots.txt) containing crawling instructions for bots. It is the first file Googlebot checks before crawling your site. Note that it cannot prevent indexation, only crawling: a blocked URL can still be indexed if other sites link to it.
