htaccess for redirects and gone 410

Problem :


I am facing a problem with spam links in Google Search. The indexed spam URLs use HTTP and point to pages that do not actually exist, so I would like to return 410 for them. But with my 410 rule, people who type mywebsite.com in the address bar (without https:// at the beginning) also receive a 410 status. If I instead modify .htaccess to redirect all HTTP requests to HTTPS, then all of those spam links get redirected to HTTPS and then respond with 404. There are tons of them, all with different patterns, such as /spstyaaliti4csf6ne.desiringly, /rd9ot2iob.twiceuttered, /pi6odntdezo17dia.deranger, /disanidasi34 and so on. When I check one on httpstatus.io as Googlebot, it responds 301, 301, 200, which should not be the correct behavior.


My .htaccess file has these lines for HTTPS forwarding:


RewriteEngine On
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^(www\.)?example\.lt$ [NC]
RewriteRule ^(.*)$ https://www.example.lt/$1 [L,R=301]

and these were for the 410:


RewriteEngine on
RewriteCond %{HTTPS} !=on [NC]
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=410,L]

How can I deal with those redirects and the 410? I want normal people to be able to type example.lt in the browser (without having to write https:// first) and reach my website, while those spam URLs return 410.


Solution :

Ideally you would fix your site to not return a 200 status after the redirect. Pages that don't exist on your site should return a "404 Not Found" status. However, even if you return 200 with a "not found" message, Google is smart enough to treat that like a 404. Google calls that a "soft 404" error and treats it just like a real 404 error. Of course, having your site return the correct error code makes the result unambiguous for search engine crawlers and for testing tools like the one you tried to use.
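
If the 200 comes from how the error page is wired up, one place to look is the ErrorDocument directive. The lines below are only a sketch: they assume a static error page at the hypothetical path /404.html, which is not something mentioned in the question, and they will not help if an application or CMS generates the "not found" page itself.

```apache
# Hypothetical local error page; the path /404.html is an assumption.
# A local path keeps the 404 status on the response.
ErrorDocument 404 /404.html

# By contrast, a full URL makes Apache redirect the client to that page,
# so the client no longer sees the original 404 status:
# ErrorDocument 404 https://www.example.lt/404.html
```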


It's fine if you redirect to HTTPS for spam links and then serve a 404 error or a soft 404. It is not required to configure your server to serve a 410 Gone status before redirecting. Google treats "410 Gone" status and "404 Not Found" status pages nearly identically: it removes them from the search results. The only difference is that 410 pages get removed immediately while 404 pages have a 24-hour grace period. Googlebot is perfectly able to process a redirect to a 404 page, a 410 page, or a soft 404, and remove the URL from search results.




A better 410 Gone rule


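mod_rewrite has its own flag for "gone": [G]. The redirect flag ([R=410]) is meant for redirects; with a status code outside the 3xx range the substitution is dropped, so the https:// target in your rule is never used. Your rule also matches every HTTP request (^(.*)$ with only the HTTPS condition), which is why visitors who type example.lt get a 410. When you write a rule for "410 Gone" you should use the gone flag with - as the substitution, and match only the URLs that are actually gone.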
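
As a small illustration of the flag, here is a sketch that marks a single path as gone. The path old-page is made up for the example; the point is the - substitution (nothing is rewritten) combined with [G].

```apache
RewriteEngine On
# "-" means "do not substitute anything"; [G] sends 410 Gone.
# "old-page" is a hypothetical path used only for illustration.
RewriteRule ^old-page$ - [G]
```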


One approach to showing 410 for the spam URLs would be to use a regular expression that tries to match those URLs specifically. Based on the examples you gave, you could use something like this:


# If the request is not for a valid directory
RewriteCond %{REQUEST_FILENAME} !-d
# If the request is not for a valid file
RewriteCond %{REQUEST_FILENAME} !-f
# If the request is not for a valid link
RewriteCond %{REQUEST_FILENAME} !-l
# 410 gone for a long string of lowercase letters
# and/or numbers followed by an optional long extension
# to handle spam URLs like /spstyaaliti4csf6ne.desiringly
RewriteRule ^/?[a-z0-9]{12,30}(\.[a-z0-9]{8,30})?$ - [G,L]

This rule set would have to go at the top of your .htaccess file before any of your other rules.
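
Putting the pieces together, the top of the .htaccess file might look roughly like this. It is a sketch, not a drop-in file: it assumes the host name example.lt from the question and that the spam paths never correspond to real files, directories, or links on disk, and the length bounds may need tuning for your URLs.

```apache
RewriteEngine On

# 1. Spam URLs: answer 410 Gone first, on both HTTP and HTTPS,
#    so they are never redirected.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-l
RewriteRule ^/?[a-z0-9]{12,30}(\.[a-z0-9]{8,30})?$ - [G,L]

# 2. Everything else on plain HTTP: redirect to HTTPS.
RewriteCond %{HTTPS} off
RewriteCond %{HTTP_HOST} ^(www\.)?example\.lt$ [NC]
RewriteRule ^(.*)$ https://www.example.lt/$1 [L,R=301]
```

With this order, a request for http://example.lt/ falls through to the redirect, while /disanidasi34 gets a 410 without any redirect. Note that with a lower bound of 12, a shorter name such as /rd9ot2iob.twiceuttered (9 characters before the dot) would not match. Lowering the bound catches it, but also raises the risk of matching real URLs that are not files on disk.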


Additionally, if you would like to do some further testing, give the htaccess tester tool a try. It allows you to specify a certain URL as well as the rules you would like to include and then shows which rules were tested, which ones met the criteria, and which ones were executed.
