Is it a good practice to block spam bots by IP ranges?
Problem:

I'm getting too many hits on my website from Yandex. It doesn't obey robots.txt, and it somehow bypasses my .htaccess rules, so I'm thinking of blocking all Yandex IP ranges in my system firewall.
Is that a good way that won't block legit users/traffic as well?
What are the downsides for such action?
I'm getting too many hits on my website from Yandex.
Take a look at http://en.wikipedia.org/wiki/Yandex. Yandex is a popular Russian search engine. Don't be too quick to shut out search engines: they make your site more discoverable to the public, which means a higher chance of revenue if you run ads or an online business.
It doesn't obey robots.txt...
As closetnoc states, major search engines are generally good at obeying robots.txt. Double-check your robots.txt syntax before assuming Yandex is ignoring it.
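If the goal is to slow Yandex down rather than block it outright, a robots.txt along these lines is worth trying first. Yandex has historically honored the non-standard Crawl-delay directive; the path below is just an illustration:

```
# robots.txt at the site root.
# Crawl-delay is the minimum number of seconds between fetches
# (Yandex has historically supported it; Google ignores it).
User-agent: Yandex
Crawl-delay: 10
Disallow: /private/
```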
...and by some way it can bypass .htaccess rules.
I agree with w3d here. Your rules are likely wrong. Perhaps they are being applied to the wrong user agents or IP addresses.
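For comparison, a typical user-agent based block in .htaccess looks something like the sketch below. It assumes mod_rewrite is enabled and that AllowOverride permits .htaccess files; "YandexBot" is the token Yandex documents for its main crawler. Note that rules like this match the User-Agent header, which any client can spoof:

```
# Sketch: return 403 Forbidden to requests whose User-Agent
# contains "YandexBot" (case-insensitive).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} YandexBot [NC]
RewriteRule .* - [F,L]
```

If your existing rules resemble this but aren't taking effect, the usual culprits are AllowOverride being set to None, or the rules matching a different user-agent string than the bot actually sends.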
So I'm thinking to block all Yandex IP ranges in my system firewall. Is that a good way that won't block legit users/traffic as well?
This is definitely not a good idea unless you plan on blocking a portion of the world from discovering your website.
What are the downsides for such action?
Blocking a fixed set of IP addresses is risky: unless the ranges are well documented and you know for a fact they belong to the same individual or business every time, you could be blocking legitimate users, since IP allocations change hands and addresses get reassigned.
Try other methods instead: add a CAPTCHA to the web forms on your site, or limit the connection rate for the bots.
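As one way to rate-limit at the firewall level, iptables' connlimit match can cap simultaneous HTTP connections per source address. This is a hedged sketch (requires root; the threshold of 20 is illustrative, not a recommendation):

```
# Reject new TCP connections to port 80 from any single source IP
# that already has more than 20 open connections.
iptables -A INPUT -p tcp --syn --dport 80 \
    -m connlimit --connlimit-above 20 \
    -j REJECT --reject-with tcp-reset
```

Unlike a blanket IP-range ban, this only throttles clients that are actually hammering the server.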
Also, make good use of whois. There's an online version at whois.com: enter any IP address in question in the box at the top right to see who actually owns the address you're about to block. Chances are it's one from China.
Also, look into honeypots. The idea is to plant an invisible link that only robots follow, never real users, and then block further access from any IP address that requests it. This can be done in PHP, and I think in ASP as well.
See: http://en.wikipedia.org/wiki/Honeypot_(computing)