One of the many SEO obstacles facing bloggers is duplicate content. It isn’t just about what you see (although that matters too), but about what you don’t see.
Bloggers want to show their content in more than one place: the homepage, categories, the post page, etc. The problem is, search engines couldn’t give a flying rat-monkey. All they care about is that they are seeing the same content repeated in multiple places. This can result in either a penalty or your pages being thrown into the supplemental index.
Photo by Bob Jagendorf
Step 1: Remove duplication on archives, tags, and the homepage
To stop this from happening, you have several choices.
- You can use the_excerpt() instead of the_content() on your index, category, and tag pages. (You can customize the excerpt by writing in the “Excerpt” box under “Advanced Options” on the post edit page.)
- You can use <!--more--> to show only a teaser of each post on your homepage. This can be very time-consuming, though, if you have many posts, since the tag has to be added to each one by hand.
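To see why the <!--more--> approach helps, here is a simplified model in Python (not WordPress’s actual PHP) of what the_content() shows with and without the tag. The function name render_post and the “[Read more]” link text are made up for illustration; real WordPress renders its own “more” link.

```python
MORE_TAG = "<!--more-->"  # the marker typed into the post content


def render_post(content, is_single):
    """Simplified model of the_content() around <!--more-->.

    On the single-post page the whole post is shown (this model simply
    drops the marker). On the homepage/archives only the teaser before
    the marker is shown, followed by a link to the full post.
    """
    teaser, found, rest = content.partition(MORE_TAG)
    if is_single:
        return teaser + rest
    if not found:
        # No marker: the full post is repeated on the homepage.
        return content
    return teaser.rstrip() + " [Read more]"  # hypothetical link text
```

With the tag, the homepage shows only the teaser while the post page shows everything; without it, the whole post appears in both places.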
If you wish to show full posts on other pages, such as tags or categories, you will have to do something more drastic: noindex those pages or block bots from them with robots.txt. I do not recommend this approach, at least not for both categories and tags.
Whichever you choose, it will help you on your way to being free of duplicate content.
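If you do go the robots.txt route, Python’s standard urllib.robotparser module is a handy way to check which URLs a rule actually blocks before you upload it. The sketch below assumes WordPress’s default /tag/ URL base and blocks only tags, in line with the advice above; ROBOTS_TXT and make_parser are hypothetical names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of tag archives only
# (assumes the default /tag/ base). Categories stay crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
"""


def make_parser(robots_txt):
    """Build a RobotFileParser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Record a fetch time: can_fetch() refuses every URL for a parser it
    # considers unread. (Harmless if parse() has already done this.)
    parser.modified()
    return parser
```

For example, make_parser(ROBOTS_TXT).can_fetch("Googlebot", "http://example.com/tag/seo/") returns False, while a category or post URL returns True.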
Step 2: Remove duplicate content due to canonicalization
The most common source of duplicate content for bloggers has nothing to do with posts; it comes from the way their server handles page requests.
Can you reach your website at both www.example.com and example.com? What about at both www.example.com/index.html and www.example.com/? If so, you may already have a problem with search engines.
Every time a search engine sees a copy of a page (even the www and non-www versions of the same page), it counts as duplicate content. Links pointing to one version also do nothing for the other. For example, if 10 links point to www.example.com and 5 point to example.com, the link juice is split between the two instead of all 15 links counting toward one page, and search engines are told these are two completely different websites that happen to have exactly the same content.
So how do we fix this? By telling your server to send every duplicate URL to a single version with a 301 (permanent) redirect.
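Here is the example above as a small Python sketch. It assumes non-www is the preferred host and only handles the www and index.html/index.php variants discussed here; canonicalize is a hypothetical helper, not part of any SEO tool. Counting links before and after collapsing the variants shows the 10 + 5 split becoming a single URL with 15 links.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit

# Assumption for illustration: the non-www host is the preferred one.
CANONICAL_HOST = "example.com"
INDEX_FILES = ("index.html", "index.php")


def canonicalize(url):
    """Collapse www/non-www and index-file variants onto one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www." + CANONICAL_HOST:
        host = CANONICAL_HOST
    path = parts.path or "/"
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # keep the trailing slash
            break
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


if __name__ == "__main__":
    # The example from the text: 10 links to www, 5 to non-www.
    links = ["http://www.example.com/"] * 10 + ["http://example.com/"] * 5
    print(Counter(links))  # two URLs, holding 10 and 5 links
    print(Counter(canonicalize(u) for u in links))  # one URL holding all 15
```

The 301 redirects do the same job on the server, so search engines and visitors only ever land on the canonical URL.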
To redirect your index pages, place this in your .htaccess file, above the # BEGIN WordPress block (be sure to change example.com to your own domain):
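<IfModule mod_rewrite.c>
RewriteEngine On
# THE_REQUEST is the raw request line from the visitor (e.g. "GET /index.php HTTP/1.1"),
# so WordPress's own internal rewrite to index.php does not cause a redirect loop
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?\ ]*/)?index\.html
RewriteRule ^(.*/)?index\.html$ http://example.com/$1 [R=301,L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?\ ]*/)?index\.php
RewriteRule ^(.*/)?index\.php$ http://example.com/$1 [R=301,L]
</IfModule>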
To redirect www to non-www, use this:
<IfModule mod_rewrite.c>
RewriteEngine On
# Send any request for www.example.com to the same path on example.com
RewriteCond %{HTTP_HOST} ^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
</IfModule>
Or use this to redirect non-www to www:
<IfModule mod_rewrite.c>
RewriteEngine On
# Send any request for example.com to the same path on www.example.com
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
</IfModule>
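Again, be sure to change example.com to your own domain name. If you choose the www version, also change the index-page redirects above to point to http://www.example.com/ so that the two sets of rules agree.
If you are wondering whether to use www or non-www, it doesn’t matter which you pick, as long as you pick one and redirect the other. Personally, I have used both on different websites, but I tend to favor non-www.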
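Once the rules are in place, it is worth checking that every variant really answers with a 301. This Python sketch (standard library only) sends a HEAD request to each www/non-www and index-file spelling of your homepage without following redirects. It assumes plain http, as in the rules above; variant_urls, check, and report are hypothetical names.

```python
import http.client
from urllib.parse import urlsplit


def variant_urls(domain):
    """All www/non-www and index-file spellings of the homepage (plain http)."""
    hosts = [domain, "www." + domain]
    paths = ["/", "/index.html", "/index.php"]
    return ["http://" + host + path for host in hosts for path in paths]


def check(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def report(domain):
    """Print the status and redirect target for every variant of domain."""
    for url in variant_urls(domain):
        status, location = check(url)
        print(url, status, location or "")
```

Run report("example.com") with your own domain. With the non-www setup, you would expect 200 for http://example.com/ and 301 for the other five, each with a Location header on example.com.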