There has been a lot of discussion about canonicalization and canonical URLs lately. I’ve discussed URLs and URL structure a few times in the past, so we thought we would help illustrate the idea of canonical URLs. From an SEO point of view, here is the definition of a canonical URL:
Canonical URL: the search engine friendly URL that you want the search engines to treat as authoritative. In other words, a canonical URL is the URL that you want visitors to see.
Quite often the term canonical URL comes up in reference to the homepage. The typical example is that most people treat the following URLs as the same:
The fact is that these are all different URLs. From a search engine perspective, this can cause a bit of an issue. Hence the idea of canonicalization. Canonicalization is the process of picking the best URL (to present to the search engines) when there are multiple choices available. Typically a search engine such as Google will attempt to pick the URL that it feels is the authority for that page. However, sometimes it may in fact select the wrong one. Now suppose that you have product pages that return a different URL depending on how the user navigated to the page. It is the same page under a different URL, and now we have a duplicate content issue. Not to mention the nightmare for interlinking and external link inventories.
The easiest way to avoid this is to let the search engines and the users know which is your “preferred URL”, a.k.a. your canonical URL. One suggestion is to redirect all of the variations to your canonical URL (the URL that you want to be the authority). In February, the major search engines announced another solution: the canonical tag. This tag gives you control over which version of the content you want the engines (and users) to see.
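To make the redirect suggestion concrete, here is a minimal Python sketch of the decision a server might make before issuing a 301 redirect. The host name www.example.com, the https scheme, the default-document names, and the function name redirect_target are all assumptions chosen for illustration, not something any particular server or search engine prescribes.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferred host; substitute the one you want treated as authoritative.
CANONICAL_HOST = "www.example.com"

# Assumed default-document names that serve the same page as the bare directory.
DEFAULT_DOCUMENTS = ("index.html", "index.php")


def redirect_target(requested_url):
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    parts = urlsplit(requested_url)
    path = parts.path or "/"
    # Collapse ".../index.html" style paths to the directory itself.
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Force the preferred scheme and host, keep the query string, drop any fragment.
    canonical = urlunsplit(("https", CANONICAL_HOST, path, parts.query, ""))
    return None if canonical == requested_url else canonical
```

Under these assumptions, a request for http://example.com or https://www.example.com/index.html would be sent to https://www.example.com/, while a request that is already canonical returns None and is served normally.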
Matt Cutts of Google fame has discussed duplicate content and canonical tags a number of times. One of the questions that he was asked was:
Q: So when you say www vs. non-www, you’re talking about a type of canonicalization. Are there other ways that urls get canonicalized?
A: Yes, there can be a lot, but most people never notice (or need to notice) them. Search engines can do things like keeping or removing trailing slashes, trying to convert urls with upper case to lower case, or removing session IDs from bulletin board or other software (many bulletin board software packages will work fine if you omit the session ID).
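The kinds of normalization mentioned in that answer can be sketched in a few lines of Python. This is only an illustration of the idea, not how any search engine actually does it. The session parameter names are hypothetical, and lowercasing the path assumes a site that treats paths case-insensitively, which is not true of every server.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; real forum software uses its own.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def normalize(url):
    """Lowercase the URL, drop the trailing slash, and strip session IDs."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the site serves the same page regardless of path case.
    path = parts.path.lower().rstrip("/") or "/"
    # Keep every query parameter except the ones that carry a session ID.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme.lower(), host, path, urlencode(kept), ""))
```

With these rules, HTTP://Example.com/Forum/Topic/?sid=abc123&t=42 and http://example.com/forum/topic?t=42 reduce to the same string, which is the point: many spellings, one URL.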
We have seen sites that have upwards of 15 versions of the same page, each with a different URL. The simplest solution is to have one final destination URL. An easy way to do this is through the canonical tag or by redirecting all of these pages to one authoritative page. The canonical tag is simple to use: all you need to do is add a link element with rel="canonical" that points to your preferred version of the URL inside the <head> section of each duplicate content URL.
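For sites that build their pages from templates, the tag can be generated rather than typed by hand. Here is a small Python sketch. The function name canonical_link_tag and the example URL are assumptions for illustration.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the link element that goes inside the page's <head> section."""
    # Escape &, <, > and quotes so the URL is safe inside an HTML attribute.
    return '<link rel="canonical" href="{}" />'.format(escape(canonical_url, quote=True))


print(canonical_link_tag("https://www.example.com/product?id=3&color=red"))
# prints: <link rel="canonical" href="https://www.example.com/product?id=3&amp;color=red" />
```

Every duplicate version of the page would carry the same tag, all pointing at the one URL you want treated as the authority.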
Canonical simply means relating to or belonging to a canon, that is, an accepted standard. It also means reduced to the simplest and most significant form. Just remember that a canonical URL is the simplest and most significant (authoritative) version of the URL that you want to be seen.