Is it a good practice to block spam bots by IP ranges?

Problem:


I'm getting too many hits on my website from Yandex. It doesn't obey robots.txt, and somehow it bypasses my .htaccess rules. So I'm thinking of blocking all Yandex IP ranges in my system firewall.



Is that a good approach, or will it block legitimate users/traffic as well?



What are the downsides for such action?


Solution :


I'm getting too many hits on my website from Yandex.




Take a look at http://en.wikipedia.org/wiki/Yandex. Yandex is a popular Russian search engine. Don't lock out search engines: they make your site more discoverable to the public, which means a higher chance of revenue if you run ads and/or an online business.




It doesn't obey robots.txt...




As closetnoc states, legitimate search engines are generally good at obeying robots.txt.
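If the complaint is crawl volume rather than crawling itself, robots.txt is the first place to look. As a sketch (paths are illustrative; note that Yandex historically honored the non-standard Crawl-delay directive, though its current documentation favors crawl-rate settings in Yandex.Webmaster):

```text
# robots.txt — served from the site root, e.g. https://example.com/robots.txt
# Yandex's crawlers identify themselves with user agents containing "Yandex"
User-agent: Yandex
Crawl-delay: 10        # ask the bot to wait 10 seconds between requests
Disallow: /private/    # keep it out of specific paths entirely
```

If Yandex keeps hammering the site after this has been in place for a few days, either the file isn't reachable at the root URL or the traffic isn't really Yandex.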




...and by some way it can bypass .htaccess rules.




I agree with w3d here. Your rules are likely wrong. Perhaps they are being applied to the wrong user agents or IP addresses.
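A common failure mode is matching the wrong user agent string, or having .htaccess silently ignored because the server's `AllowOverride` setting doesn't permit it. As a minimal sketch for Apache 2.4 with mod_rewrite (assuming `AllowOverride All` is in effect for the directory):

```apache
# .htaccess — return 403 Forbidden to any request whose User-Agent
# contains "Yandex" (case-insensitive). Requires mod_rewrite and an
# AllowOverride setting that actually lets .htaccess take effect;
# otherwise these lines do nothing, which can look like "bypassing".
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule ^ - [F,L]
```

Testing with `curl -A "Mozilla/5.0 (compatible; YandexBot/3.0)" https://example.com/` should then return 403; if it returns 200, the rules aren't being applied at all.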




So I'm thinking to block all Yandex IP ranges in my system firewall. Is that a good way that won't block legit users/traffic as well?




This is definitely not a good idea unless you plan on blocking a portion of the world from discovering your website.




What are the downsides for such action?




If you decide to block a fixed set of IP addresses, then unless those addresses are well known and you know for a fact that they belong to the same individual or business every time, you could be blocking legitimate users.



Try other methods of blocking spam, such as adding CAPTCHA features to the web forms on your site or limiting the connection rate for bots.
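Rate limiting is usually sketched as a token bucket: each client earns request "tokens" at a steady rate up to a burst cap, and requests are refused when the bucket is empty. A minimal, in-memory illustration (names and limits are hypothetical, not from any particular framework):

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per second, with short bursts up to
    `burst`. The clock is injectable so the logic can be tested."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per client IP (illustrative; real deployments would expire
# idle entries and share state across worker processes).
buckets: dict[str, TokenBucket] = {}

def should_serve(client_ip: str) -> bool:
    bucket = buckets.setdefault(client_ip, TokenBucket(rate=1.0, burst=5.0))
    return bucket.allow()
```

This throttles an over-eager crawler without blocking it outright, which is usually the better trade-off for a search engine you still want indexing your site.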



Also, make excellent use of whois. There's an online version at whois.com; enter any IP address in question in the top-right box to see who actually owns the address you're trying to block. Chances are it's one from China.
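Whois lookups can also be automated. Yandex, like Google, documents a reverse-then-forward DNS check for verifying its crawlers: the IP's reverse DNS name should end in yandex.ru, yandex.net, or yandex.com, and that hostname should resolve back to the same IP. A sketch (the hostname check is split out as a pure function; the DNS calls need network access and in real code should be cached and given timeouts):

```python
import socket

YANDEX_SUFFIXES = (".yandex.ru", ".yandex.net", ".yandex.com")

def hostname_is_yandex(host: str) -> bool:
    """True if the reverse-DNS hostname belongs to a Yandex domain."""
    return host.endswith(YANDEX_SUFFIXES)

def is_real_yandex_bot(ip: str) -> bool:
    """Verify a claimed Yandex crawler by reverse + forward DNS lookup."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)            # reverse lookup
    except socket.herror:
        return False
    if not hostname_is_yandex(host):
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]   # forward lookup
    except socket.gaierror:
        return False
    return ip in forward_ips                             # must round-trip
```

Anything claiming to be YandexBot in its User-Agent but failing this check is an impostor and safe to block; this matters here because the "Yandex" traffic that ignores robots.txt may not be Yandex at all.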



Also, look into honeypots. A bad bot can be discovered by placing an invisible link that only robots follow, never real users; you can then block further access from the IP addresses that hit it. This can all be done in PHP, and I think in ASP as well.
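The honeypot idea can be sketched in a few lines (the answer mentions PHP; here is the same logic in Python, with an illustrative trap path and an in-memory blocklist):

```python
# Illustrative honeypot: a trap URL that is linked invisibly from pages
# (e.g. <a href="/do-not-follow/" style="display:none">) and also listed
# under Disallow: in robots.txt. Real users never see the link, and
# well-behaved bots never request it — so anything that does is hostile.
TRAP_PATH = "/do-not-follow/"
blocked_ips: set[str] = set()

def handle_request(client_ip: str, path: str) -> int:
    """Return an HTTP status code for the request (sketch, not a real server)."""
    if client_ip in blocked_ips:
        return 403                  # previously trapped: refuse service
    if path.startswith(TRAP_PATH):
        blocked_ips.add(client_ip)  # bot followed the invisible link
        return 403
    return 200
```

The key design point is that the trap is disallowed in robots.txt, so only crawlers that ignore robots.txt, exactly the behavior complained about in the question, ever end up on the blocklist.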



See: http://en.wikipedia.org/wiki/Honeypot_(computing)


Additionally, if you would like to do some further testing, give the htaccess tester tool a try. It allows you to specify a certain URL as well as the rules you would like to include and then shows which rules were tested, which ones met the criteria, and which ones were executed.
