Tired of Googlebot crawling your layered navigation?

NOTICE! You use this .htaccess snippet at your own risk. Powerhosting ApS cannot and will not be held responsible for your ranking on Google or anywhere else, whether you use this snippet or not. However, used the right way it will definitely do more good than harm.

As a company hosting thousands of e-commerce sites, we regularly see sites being crawled extensively by Googlebot. Normally a shop owner wants Google to drop by every once in a while, but when it starts to crawl your filters or layered navigation you will soon realize it can cause a lot of harm. It hurts your SEO through duplicate content, and it can also put a strain on your server resources.

With most layered navigation modules the filter is implemented via query strings. When you select e.g. Size L, the URL changes to what-ever-url-you-are-on?size=l. When you then add price filtering, it looks like what-ever-url-you-are-on?size=l&price=100-200. Playing around with any given layered navigation, you will soon realize that the combinations are endless. And if you do not believe me, then wait for Googlebot to come by.
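To put a rough number on "endless", here is a minimal sketch of the arithmetic. The filter names match the examples in this post, but the number of values per filter is made up for illustration; plug in the counts from your own shop.

```python
from math import prod

# Hypothetical filters and how many values each one offers (made-up numbers).
filter_options = {"size": 5, "manufacturer": 20, "price": 10, "limit": 4}


def count_filter_urls(options):
    """Count distinct filter combinations for a single category page.

    Each filter is either unset or set to exactly one of its values,
    so it contributes (values + 1) choices. Subtract 1 for the
    unfiltered page itself.
    """
    return prod(n + 1 for n in options.values()) - 1


print(count_filter_urls(filter_options))
```

That figure is per category page, and it does not yet count multi-select filters or the same parameters appearing in a different order.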

Over the years, many of our customers experiencing this issue have tried to handle it in robots.txt, or by setting noindex, nofollow in X-Robots-Tag headers or meta tags, with mixed outcomes. Most of the time it seems that Googlebot simply does not care. In our experience, the only thing that always works is sending the right HTTP response code to the bots when they crawl URLs with these query strings.

The following is an example that must be modified to fit your situation. The query strings are often custom, so looking through your access log will help you find the right ones to use. The snippet needs to be at the top of the .htaccess file in your document root.
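For the access-log step, before you adapt the snippet further down, a small script can count which query-string parameters the bots actually request. This is only a sketch: it assumes the Apache "combined" log format, the function name count_bot_params is our own, and the two sample log lines are made up.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

# Assumes the Apache "combined" log format; adjust if your LogFormat differs.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<url>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
# Same user-agent fragments as in the .htaccess snippet.
BOTS = re.compile(r"googlebot|bingbot|yahoo", re.IGNORECASE)


def count_bot_params(lines):
    """Count query-string parameter names in requests made by the listed bots."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or not BOTS.search(match.group("agent")):
            continue  # unparsable line, or not one of the bots
        query = urlsplit(match.group("url")).query
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name.lower()] += 1
    return counts


# Made-up sample lines: one bot request and one ordinary visitor.
sample = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0200] "GET /shoes?size=l&price=100-200 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [10/Oct/2023:13:55:40 +0200] "GET /shoes?size=m HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:118.0) Gecko/20100101 Firefox/118.0"',
]
print(count_bot_params(sample).most_common())
```

To run it on a real log, pass an open file instead of the sample list, for example `with open("access.log", errors="replace") as f: count_bot_params(f)`. The parameters at the top of the list are the candidates for the query string line in the snippet.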

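# Sending 410 Gone to bots on filters / layered navigation - Here be dragons

<IfModule mod_rewrite.c>
        # Match requests whose query string contains one of these filter parameters
        RewriteCond %{QUERY_STRING} (size=|manufacturer=|price=|limit=) [NC]
        # ... and whose User-Agent looks like one of these bots (conditions are ANDed by default)
        RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yahoo) [NC]
        # Leave the URL untouched (-) and answer with 410 Gone (G)
        RewriteRule .* - [G,NC]
</IfModule>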

Most of the time you will only want to edit the line containing the query strings. Remember that this is a regular expression, which is why we use parentheses in combination with pipes to define multiple query strings on the same line. NC means nocase, i.e. the match is case insensitive, and G makes Apache answer with 410 Gone.
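Since the pattern is unanchored, it matches anywhere in the query string, so size= also hits a parameter such as fontsize=. If you want to try out your edited pattern before it goes live, a quick check like the following can help. It is a sketch using Python's re module, which behaves the same as Apache's regex engine for simple patterns like these. The anchored variant and the test query strings are our own illustration, not part of the snippet above.

```python
import re

# Same alternation as the RewriteCond above; re.IGNORECASE plays the role of [NC].
UNANCHORED = re.compile(r"(size=|manufacturer=|price=|limit=)", re.IGNORECASE)
# Hypothetical stricter variant: the name must start the query string or follow "&".
ANCHORED = re.compile(r"(^|&)(size|manufacturer|price|limit)=", re.IGNORECASE)


def is_blocked(pattern, query_string):
    """Return True if the pattern matches anywhere in the query string,
    which is how an unanchored RewriteCond pattern is applied."""
    return pattern.search(query_string) is not None


# Made-up query strings to try the patterns against.
for qs in ["size=l", "SIZE=L&price=100-200", "page=2", "fontsize=12"]:
    print(f"{qs!r}: unanchored={is_blocked(UNANCHORED, qs)} "
          f"anchored={is_blocked(ANCHORED, qs)}")
```

If the unanchored pattern catches parameters you did not intend, the anchored form should carry over to the RewriteCond line as-is, but test it on your own site before relying on it.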