Duplicate title tags and meta descriptions after removing .html extension from files
Problem: Google Webmaster Tools/Search Console is giving me errors regarding duplicate title tags and meta descriptions.
The website in question is a static HTML website. All documents do have a .html extension. In order to remove the .html from all documents I am using the code below in my .htaccess file:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^.]+)$ $1.html [NC,L]
So, for example, http://example.com/about.html becomes http://example.com/about. Now Google thinks that there are two separate about pages,
even though there is only one. Can someone explain to me how to resolve this?
Let's assume www.example.com/about is your main URL, the one you want Google to index.
And www.example.com/about.html is the duplicate URL, the one you don't want Google to index.
There are two solutions, and you can use either one or both.
1) Use a 301 redirect from example.com/about.html to example.com/about, so that Google indexes only the final (redirected) version of the URL.
2) Use a canonical link tag in the <head> section.
Your pages are duplicates, so the canonical link tag will be the same on all of these URLs:
www.example.com/about/
www.example.com/about
www.example.com/about.html
www.example.com/about/index.html
When you place the canonical link tag below in the page, all of the above URLs will carry that same canonical tag, just as the page title and description are the same for all of them:
<link rel="canonical" href="https://www.example.com/about" />
Google will then index only the canonical URL and treat the other URLs as duplicates, which it will generally avoid indexing.
If your .html URLs were already indexed at the time you changed your URLs (and removed the .html extension), then the only way to preserve your SEO and avoid duplicate content from the get-go is to implement 301 redirects from the .html URLs to your desired URLs.
(This assumes you have changed all the URLs in your application to your desired "extensionless" URLs.)
Something like the following at the top of your .htaccess file:
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule (.+)\.html$ /$1 [R=301,L]
The check against REDIRECT_STATUS prevents a redirect loop: it ensures that the internally rewritten request (to the .html file) is not itself redirected. REDIRECT_STATUS is empty on the client's initial request and is set to 200 once the internal rewrite has been triggered.
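Putting the redirect together with the rewrite from the question, a complete .htaccess might look something like the sketch below. It assumes the file sits in the document root, that mod_rewrite is enabled and permitted by AllowOverride, and that all internal links already use the extensionless URLs. The `RewriteEngine On` line, the ordering comments and the request trace are illustrative additions, not part of the original answer.

```apache
# Sketch only: assumes this .htaccess is in the document root and that
# mod_rewrite is enabled and allowed by AllowOverride.
RewriteEngine On

# 1) External redirect: /about.html -> /about (301).
#    REDIRECT_STATUS is empty only on the client's original request,
#    so this fires for URLs the browser or Googlebot actually asked for.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteRule (.+)\.html$ /$1 [R=301,L]

# 2) Internal rewrite (the rule from the question): /about -> /about.html.
#    The address bar keeps showing /about; this rule re-appends .html.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^.]+)$ $1.html [NC,L]

# Trace:
#   GET /about.html -> rule 1 matches -> 301 to /about
#   GET /about      -> rule 1 no match, rule 2 rewrites to /about.html
#   internal pass   -> REDIRECT_STATUS=200, rule 1 skipped, file served
```

Because browsers cache 301 responses, it may be easier to test with `R=302` first and switch to `R=301` once the behaviour is confirmed.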
You wrote: "In order to remove the .html from all documents I am using the code below in my .htaccess file."
Aside: I guess this is probably just how you are describing it, but that isn't actually what that snippet of code does. You "remove the .html" from the URL by physically changing the URLs in your application (not with .htaccess). You then use .htaccess to internally rewrite the URL back to the actual filesystem path (with the .html extension), and that is what your snippet does: it re-appends the .html extension rather than removing it.