Canonical Link Tag Examples - A Way to Avoid Duplicate Content
Does Duplicate Content Make Sense? Use the Canonical Link Tag to Show Robots the Way.
As you probably know, most of the major search engines use robots to index websites. However, it is difficult for many of these robots to decide which information to index when they come across session IDs, pagination links, sort menus, gallery view links, results-per-page links and so on. By providing the canonical link tag, we tell the robots exactly which URL to index.
While there are many articles online explaining what canonical link tags do, I couldn't find any posts that gave real world examples on working with page links, view option links (example: list view or gallery view) and results per page links. In this post I will provide information on which link variables to remove and which to keep.
Please understand that these decisions should be made carefully and you should not attempt this if you do not fully understand the concept. You are doing this at your own risk and the slightest mistake could do more harm than good.
First I will start with the basics:
Duplicate content warnings, or duplicate title and description meta tag suggestions in Google Webmaster Tools, usually appear when a page on your website can be reached through more than one unique URL. For example, on many websites the homepage can be reached through multiple URLs, such as the three below:
http://www.yourdomain.com
http://www.yourdomain.com/
http://www.yourdomain.com/index.html
All of the above URLs look different yet the landing pages remain the same. By implementing the canonical link tag, you are telling the robots which URL to use as your homepage.
To specify 'http://www.yourdomain.com/' as your homepage via a canonical link tag, add the tag below between the head tags '<head></head>' of your page:
<link rel="canonical" href="http://www.yourdomain.com/" />
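If your pages are generated by a script, the homepage rule above can be sketched in a few lines of Python. This is only a minimal illustration: the names HOME_PATHS, canonical_home and canonical_tag are made up for this example, and the set of paths that count as "the homepage" is an assumption you would adjust for your own site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these paths all serve the same homepage on this site.
HOME_PATHS = {"", "/", "/index.html"}


def canonical_home(url: str) -> str:
    """Map any homepage variant to the single preferred URL ending in '/'."""
    parts = urlsplit(url)
    path = "/" if parts.path in HOME_PATHS else parts.path
    # Keep the query untouched here; the fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))


def canonical_tag(url: str) -> str:
    """Render the canonical link tag for an already canonical URL."""
    return f'<link rel="canonical" href="{url}" />'


for variant in (
    "http://www.yourdomain.com",
    "http://www.yourdomain.com/",
    "http://www.yourdomain.com/index.html",
):
    # All three variants print the same tag.
    print(canonical_tag(canonical_home(variant)))
```

Pages that are not homepage variants pass through unchanged, so the same helper can be called on every page.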
Working with session IDs:
Say your website generates a session ID for each visitor, as shown with the 'sid' variable in the URL below:
http://www.yourdomain.com/about_us.html?sid=123456789
In this case your canonical link tag would look like the example below:
<link rel="canonical" href="http://www.yourdomain.com/about_us.html" />
The above example removes the sid variable from your URL because the about us page remains the same and the sid variable is only being used to pass session IDs. By removing the sid from your URL, you are not only preventing URLs with session IDs from being indexed, you are also preventing robots from seeing these URLs as duplicate content.
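One way to strip the session variable automatically is a small "denylist" helper like the sketch below. The parameter name 'sid' comes from the example above; the names SESSION_PARAMS and strip_session_params are hypothetical, and your site may use a different session variable.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: 'sid' is the only variable that carries a session ID.
SESSION_PARAMS = {"sid"}


def strip_session_params(url: str) -> str:
    """Return the URL without any session variables in its query string."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SESSION_PARAMS
    ]
    # An empty query yields a URL with no trailing '?'.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(strip_session_params("http://www.yourdomain.com/about_us.html?sid=123456789"))
```

Every other variable is kept as-is, which is why this approach only works when you know every variable that should be removed.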
A more advanced case is a page whose URL variables change what is being shown. Take the URL 'http://www.yourdomain.com/category.html?cat=1'. Here our 'category.html' page may display a list of our store's categories, while 'category.html?cat=1' may display the products within one category. For example, cat=1 may lead to a category about shoes and cat=2 may lead to a category about shirts. Because each value produces different content, we want to preserve this variable in our canonical link tag.
For the category list page 'http://www.yourdomain.com/category.html' the URL within the canonical link tag would be as shown below:
<link rel="canonical" href="http://www.yourdomain.com/category.html" />
On our shoe category page 'http://www.yourdomain.com/category.html?cat=1' the canonical link tag would look like the below example:
<link rel="canonical" href="http://www.yourdomain.com/category.html?cat=1" />
And our shirt category page http://www.yourdomain.com/category.html?cat=2 would have a canonical link tag as shown below:
<link rel="canonical" href="http://www.yourdomain.com/category.html?cat=2" />
The three examples above are just a few of the tricky pages that we have to deal with when implementing the canonical link tag. Your goal is to preserve the variables that generate unique content.
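The "preserve what generates unique content" rule can also be written as an allowlist: for each page, list the variables that matter and drop everything else. The sketch below assumes a hypothetical table named CONTENT_PARAMS and a helper named canonical_url; only the path 'category.html' and the variable 'cat' come from the examples above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: per page, the variables that change which content is shown.
CONTENT_PARAMS = {
    "/category.html": ("cat",),
}


def canonical_url(url: str) -> str:
    """Keep only the variables allowlisted for this page's path."""
    parts = urlsplit(url)
    allowed = CONTENT_PARAMS.get(parts.path, ())
    query = dict(parse_qsl(parts.query))
    # Emit the kept variables in the allowlist's order.
    kept = [(name, query[name]) for name in allowed if name in query]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


for url in (
    "http://www.yourdomain.com/category.html",
    "http://www.yourdomain.com/category.html?cat=1",
    "http://www.yourdomain.com/category.html?cat=2&sid=123456789",
):
    print(canonical_url(url))
```

Compared with removing known-bad variables, an allowlist also drops variables you have never seen before, such as tracking codes added by other sites. The trade-off is that a new content variable must be added to the table before it appears in the canonical URL.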
Let's look into pagination. In this example we have the 'page' variable, which works together with the 'cat' variable. We want to keep both the 'cat' variable and the 'page' variable because both of these variables generate unique content. Therefore 'http://www.yourdomain.com/category.php?cat=1&page=2' would be added to the canonical link tag as shown below:
<link rel="canonical" href="http://www.yourdomain.com/category.php?cat=1&page=2" />
The same goes for page 3, 4 and so on, as shown in the below example:
<link rel="canonical" href="http://www.yourdomain.com/category.php?cat=1&page=3" />
<link rel="canonical" href="http://www.yourdomain.com/category.php?cat=1&page=4" />
Now let's go a little deeper into pagination and add a session ID and a sort variable to one of our URLs, which would look like the example below:
http://www.yourdomain.com/category.php?cat=1&page=2&sid=123456789&sort=by_price
We automatically remove the sid variable from our URL because, as previously mentioned, the sid variable only passes session IDs.
Now what about the sort variable? The sort variable only rearranges the content that is being shown on the page, therefore we remove the sort variable as well.
The canonical link tag for 'http://www.yourdomain.com/category.php?cat=1&page=2&sid=123456789&sort=by_price' should look like the example below:
<link rel="canonical" href="http://www.yourdomain.com/category.php?cat=1&page=2" />
The same goes for option menus that let people choose between different views, for example 'gallery view' and 'list view'. Since the view option only changes the way a page looks, every view contains duplicate content. The view variable should therefore be left out, so that the canonical link tag points to the default view.
Another example is the 'results per page' option. If you give visitors the option of choosing how many results are displayed per page, use the URL that displays the default number of results per page in your canonical link tag.
The canonical link tag for 'http://www.yourdomain.com/category.php?cat=1&page=2&view=gallery&per_page=50' would look like the example below:
<link rel="canonical" href="http://www.yourdomain.com/category.php?cat=1&page=2" />
As you can see, we removed the 'view' variable and the 'per_page' variable from the above canonical link tag. Notice that we kept the 'cat' and 'page' variables in our URL because these 2 variables produce unique content.
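Putting the last two examples together, a page template could build the whole tag from the requested URL. The sketch below is one possible way to do it, not the only one: PRESENTATION_PARAMS and canonical_link_tag are hypothetical names, and the list of variables to drop is taken from the examples in this article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these variables never change which items a page lists
# when the site's default view and default results per page are used.
PRESENTATION_PARAMS = {"sid", "sort", "view", "per_page"}


def canonical_link_tag(url: str) -> str:
    """Drop session and presentation variables, then render the tag."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name not in PRESENTATION_PARAMS
    ]
    href = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
    return f'<link rel="canonical" href="{href}" />'


sorted_url = "http://www.yourdomain.com/category.php?cat=1&page=2&sid=123456789&sort=by_price"
gallery_url = "http://www.yourdomain.com/category.php?cat=1&page=2&view=gallery&per_page=50"

# Both variants end up with the same canonical link tag.
print(canonical_link_tag(sorted_url))
print(canonical_link_tag(gallery_url))
```

The sketch writes the ampersand as-is, as the tags in this article do. A stricter generator might HTML-escape the href value before placing it in the tag.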
I hope that this article gives people a better idea of how the canonical link tag works. By implementing this tag, you will make a robot's decision on which pages to index a lot easier. Just look at all the duplicate content that we were able to eliminate in our examples by adding the canonical link tag. Also keep in mind that your PageRank will be shifted away from duplicate pages, therefore increasing the strength of the URLs within the canonical link tag.
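To get a feel for how much duplicate content a site exposes, you can group a list of URLs by their canonical form. The sketch below assumes that 'cat' and 'page' are the only content variables, as in the examples of this article; KEEP, canonical_url and group_duplicates are hypothetical names, and the URL list is made up from the earlier examples.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: only these variables produce unique content.
KEEP = ("cat", "page")


def canonical_url(url: str) -> str:
    """Reduce a URL to its path plus the content variables."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    kept = [(name, query[name]) for name in KEEP if name in query]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to the list of URL variants that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    return dict(groups)


crawled = [
    "http://www.yourdomain.com/category.php?cat=1&page=2",
    "http://www.yourdomain.com/category.php?cat=1&page=2&sid=123456789&sort=by_price",
    "http://www.yourdomain.com/category.php?cat=1&page=2&view=gallery&per_page=50",
    "http://www.yourdomain.com/category.php?cat=1&page=3",
]

groups = group_duplicates(crawled)
for canonical, variants in groups.items():
    # Prints the number of variants that collapse into each canonical URL.
    print(len(variants), canonical)
```

Here four URLs collapse into two canonical pages: three variants of page 2 and one URL for page 3.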
This article was written because we were looking into crawling a couple of sites ourselves. After creating the crawler and sending it out to visit a few sites, we learned how much of a hassle it is to deal with duplicate content. As a result, and in an effort to prevent duplicate content on our own site, we added the canonical link tag.