What is the robots.txt used for?
Editing the robots.txt is an essential part of an onsite SEO strategy, particularly with Magento.
The robots.txt is a file that lets you tell Google and other search engines which parts of your website they should not crawl. By default, every Magento store has a robots.txt that can be found at: yourdomain.com/robots.txt
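For example, this two-line sketch of a robots.txt tells every search engine not to crawl Magento’s internal search results pages (a fuller Magento block appears further down this post):
User-agent: *
Disallow: /catalogsearch/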
You’d probably expect to locate and edit the robots.txt under Marketing, then SEO & Search; however, Magento being Magento, this is not the case (hence this post).
So…
How do you edit and configure the robots.txt in Magento 2?
- Log in to the Magento back end
- Click Content, then under Design choose Configuration
- Select Edit on the ‘Global’ main website row
- Expand “Search Engine Robots”
- The “Edit custom instruction of robots.txt File” field is where Magento 2 manages what lives at yourdomain.com/robots.txt
- Don’t forget to test, then click Save in the top right of the screen once you’re happy
Note: This approach works for Magento version 2.3.0 and onwards.
What to include in a Magento robots.txt file?
The robots.txt can block your entire website from search engines if implemented incorrectly. You should always get a custom robots.txt written by an SEO.
To completely block the default Magento URLs from search engines, you can add the following to the robots.txt:
# Rules must sit under a User-agent line; * applies them to all crawlers
User-agent: *
# Clean
Disallow: /catalogsearch/
Disallow: /*?SID=
Disallow: /*?cat=
Disallow: /*?cat=*&color=*
Disallow: /*?color=*
Disallow: /*?product_list_order=
Disallow: /*?product_list_limit=
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /*?dir*
Disallow: /*?dir=desc
Disallow: /*?dir=asc
Disallow: /*?q=*
Disallow: /*?___store=default
Disallow: /*?___SID=
Disallow: /*?___
Note: we’d still recommend testing this setup. If your site’s URL structure is not SEO friendly, the rules above may block pages that are currently sending you traffic (even though they shouldn’t be).
How to write the robots.txt file?
The robots.txt rules are as follows:
- “Disallow:” = do not crawl any URL whose path starts with the string added after it (combine it with * to match the string anywhere in the URL)
- # = a comment marker that Google ignores; be careful not to write # Disallow by accident, as Google will ignore the whole rule
- Sitemap: = add your XML sitemap URL to the robots.txt to encourage search engines to find and crawl it
- * = this is a wildcard and basically means ‘anything’, so be careful when using it (all four rules are combined in the example below)
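Putting those four rules together, here’s a minimal sketch of a valid robots.txt (the Disallow lines are taken from the Magento block above; yourdomain.com is a placeholder for your real domain):
# Block internal search results for all crawlers
User-agent: *
Disallow: /catalogsearch/
# Wildcard rule: block any URL containing ?q=
Disallow: /*?q=*
# Point search engines at the XML sitemap
Sitemap: https://yourdomain.com/sitemap.xml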
Why edit the robots.txt file in Magento 2?
You may want to edit this to stop Google or other search engines from crawling certain parts of your site. Use cases vary drastically per website, but common examples are:
- To block individual checkout URLs that have ID parameters
- If you have landing pages that only a certain marketing channel should see, e.g. paid search or email, you can put these into a folder called /landing-pages/ and block everything in that folder in the robots.txt (see the sketch after this list)
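A minimal sketch covering both examples above, assuming the checkout parameter is literally called id and the landing pages live under /landing-pages/ (adjust both to match your own URLs):
User-agent: *
# Block checkout URLs that carry an id parameter
Disallow: /checkout/*?id=
# Block everything inside the landing pages folder
Disallow: /landing-pages/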
Conclusion
The robots.txt is an essential part of an onsite SEO strategy for a Magento eCommerce site. If you’re not using it, it’s likely Google is crawling parts of your site it shouldn’t. Restricting Google’s access helps it focus on the pages that add the most commercial value to your store.
As always, remember to test your setup! Google offers a free tool to check whether your robots.txt file has any errors: https://support.google.com/webmasters/answer/6062598?hl=en
If you have any concerns or queries feel free to email us.