Redirect Google crawler to different robots.txt via .htaccess

Problem :


I have been googling all day and still can't find an answer.



I have a virtual subdomain, www.static.example.com, which is a mirror of www.example.com. That means the subdomain and the main domain share a single root folder.



I want to serve crawlers a different robots file, robots_static.txt, whenever the host name contains .static, and in that file I will block crawlers with a Disallow directive. I want to do this because I have duplicate content in Google's search results: the subdomain shows exactly the same content as the main domain.



Does anyone know how I can make crawlers see robots_static.txt instead of robots.txt?



What I have managed to find so far is this:



RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots.txt.*\ HTTP/ [NC]
RewriteRule ^robots.txt /robots_static.txt [NC,L]


but when I check in Webmaster Tools, it still sees robots.txt as my robots file instead of robots_static.txt, so Google crawls and indexes everything twice.



What did I do wrong?
Thanks



EDIT:
This is my .htaccess file



##
# @package Joomla
# @copyright Copyright (C) 2005 - 2013 Open Source Matters. All rights reserved.
# @license GNU General Public License version 2 or later; see LICENSE.txt
##

##
# READ THIS COMPLETELY IF YOU CHOOSE TO USE THIS FILE!
#
# The line just below this section: 'Options +FollowSymLinks' may cause problems
# with some server configurations. It is required for use of mod_rewrite, but may already
# be set by your server administrator in a way that dissallows changing it in
# your .htaccess file. If using it causes your server to error out, comment it out (add # to
# beginning of line), reload your site in your browser and test your sef url's. If they work,
# it has been set by your server administrator and you do not need it set here.
##

## Can be commented out if causes errors, see notes above.
Options +FollowSymLinks

## Mod_rewrite in use.

RewriteEngine On

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]




RewriteCond %{HTTP_HOST} ^www.static.*$ [NC]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*robots.txt.*\ HTTP/ [NC]
RewriteRule ^robots.txt /robots_static.txt [NC,L]


## Begin - Rewrite rules to block out some common exploits.
# If you experience problems on your site block out the operations listed below
# This attempts to block the most common type of exploit `attempts` to Joomla!
#
# Block out any script trying to base64_encode data within the URL.
RewriteCond %{QUERY_STRING} base64_encode[^(]*\([^)]*\) [OR]
# Block out any script that includes a <script> tag in URL.
RewriteCond %{QUERY_STRING} (<|%3C)([^s]*s)+cript.*(>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL.
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL.
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Return 403 Forbidden header and show the content of the root homepage
RewriteRule .* index.php [F]
#
## End - Rewrite rules to block out some common exploits.

## Begin - Custom redirects
#
# If you need to redirect some pages, or set a canonical non-www to
# www redirect (or vice versa), place that code here. Ensure those
# redirects use the correct RewriteRule syntax and the [R=301,L] flags.
#
## End - Custom redirects

##
# Uncomment following line if your webserver's URL
# is not directly related to physical file paths.
# Update Your Joomla! Directory (just / for root).
##

# RewriteBase /

RewriteCond %{THE_REQUEST} ^GET.*index.php [NC]
RewriteCond %{THE_REQUEST} !/system/.*
RewriteRule (.*?)index.php/*(.*) /$1$2 [R=301,L]
RewriteCond %{THE_REQUEST} ^GET

## Begin - Joomla! core SEF Section.
#
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
#
# If the requested path and file is not /index.php and the request
# has not already been internally rewritten to the index.php script
RewriteCond %{REQUEST_URI} !^/index\.php
# and the request is for something within the component folder,
# or for the site root, or for an extensionless URL, or the
# requested URL ends with one of the listed extensions
RewriteCond %{REQUEST_URI} /component/|(/[^.]*|\.(php|html?|feed|pdf|vcf|raw))$ [NC]
# and the requested path and file doesn't directly match a physical file
RewriteCond %{REQUEST_FILENAME} !-f
# and the requested path and file doesn't directly match a physical folder
RewriteCond %{REQUEST_FILENAME} !-d
# internally rewrite the request to the index.php script
RewriteRule .* index.php [L]
#
## End - Joomla! core SEF Section.

<FilesMatch "\.(ico|pdf|flv|jpg|ttf|jpg|jpeg|png|gif|js|css|swf)$">
Header set Expires "Wed, 15 Apr 2020 20:00:00 GMT"
Header set Cache-Control "public"
</FilesMatch>

<ifModule mod_headers.c>
Header set Connection keep-alive
</ifModule>

########## Begin - Remove Etags
#
FileETag none
#
########## End - Remove Etags

Solution :

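Google's crawlers will always request /robots.txt from your subdomain. They will never ask for /robots_static.txt, a name that means nothing to them. So rather than pointing them at a different URL, rewrite the request for /robots.txt internally when it arrives on the static host: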



# In .htaccess the RewriteRule pattern is matched without the leading slash
RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
RewriteRule ^robots\.txt$ /robots_static.txt [L]


When /robots.txt is requested on the www.static host, the contents of /robots_static.txt are served in its place. The rewrite is internal, so the URL the crawler sees is still /robots.txt.
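
The question does not show what robots_static.txt would contain. As a rough sketch, assuming it holds a blanket "Disallow: /" rule and the main robots.txt allows everything (both file bodies below are hypothetical), Python's standard urllib.robotparser can be used to check how a crawler would interpret each file:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of robots_static.txt: keep every crawler off every path.
ROBOTS_STATIC = """\
User-agent: *
Disallow: /
"""

# Hypothetical contents of the main site's robots.txt: nothing is disallowed.
ROBOTS_MAIN = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_body: str, user_agent: str, url: str) -> bool:
    """Parse a robots file body and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Expected under these assumptions: blocked on the mirror, allowed on the main site.
    print(is_allowed(ROBOTS_STATIC, "Googlebot", "http://www.static.example.com/page.html"))
    print(is_allowed(ROBOTS_MAIN, "Googlebot", "http://www.example.com/page.html"))
```

Note that a Disallow rule stops compliant crawlers from fetching pages; URLs that are already indexed may take some time to drop out of the results.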


Additionally, if you would like to do some further testing, give the htaccess tester tool a try. It allows you to specify a certain URL as well as the rules you would like to include and then shows which rules were tested, which ones met the criteria, and which ones were executed.
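
For a quick offline check of the matching logic, the sketch below models a host condition plus a robots.txt rule in Python. It is a simplified illustration, not how mod_rewrite is actually implemented: the function name served_path is made up for this example, and it assumes the .htaccess file sits in the document root, where Apache removes the leading slash from the path before testing the RewriteRule pattern.

```python
import re

# Simplified stand-ins for the two directives (assumed patterns):
#   RewriteCond %{HTTP_HOST} ^www\.static\. [NC]
#   RewriteRule ^robots\.txt$ /robots_static.txt [L]
HOST_COND = re.compile(r"^www\.static\.", re.IGNORECASE)  # [NC] = case-insensitive
RULE_PATTERN = re.compile(r"^robots\.txt$")


def served_path(host: str, request_path: str) -> str:
    """Return the path that would be served for a request, under this model."""
    # In per-directory (.htaccess) context the pattern is tested against the
    # path relative to the directory, i.e. without the leading slash.
    relative = request_path.lstrip("/")
    if RULE_PATTERN.search(relative) and HOST_COND.search(host):
        return "/robots_static.txt"
    return request_path


if __name__ == "__main__":
    for host in ("www.static.example.com", "www.example.com"):
        print(host, "->", served_path(host, "/robots.txt"))
```

The same model shows why a pattern written with a leading slash, such as ^/robots\.txt$, never fires from an .htaccess file: the string it is tested against is robots.txt, not /robots.txt.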
