The Ultimate Guide to Hiding Webpages from Indexation (Part 2): Mastering Duplicate Content Issues

September 9th, 2013 | Posted in: Copyright, Duplicate Content, Google, Indexation, Search Engine Optimization

In the first part of the article we learned how to instruct search engines not to index certain webpages. In this post we are going to figure out how to deal with duplicate content issues.

 

What is duplicate content?

 

The term 'duplicate' can be applied to any content located in more than one place on the Web.

Basically, there are two types of duplicate content: off-site and on-site. Off-site duplication is when the same content is found on two or more different sites. On-site duplication is when identical content appears on two or more pages of the same site.

 

Examples of duplicate content  

Off-site:

  • Scraped content (content that was copied from one site to another without permission).
  • Syndicated content (republished blog posts, RSS feeds, podcasts, email newsletters, etc.).

On-site:

  • Pages that have duplicate titles and meta descriptions.
  • Adobe PDF and printer-friendly versions of a page.
  • Pages that are served over both HTTP and HTTPS (for example, http before login and https after).
  • A home page that has multiple URLs serving up totally identical content (see the example URLs after this list).
  • Pages that get duplicated as a result of session IDs and other URL parameters being appended to their URLs.
  • Pages with sorting options (by time, date, size, color and the like) that generate separate URLs.
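
For instance, all of the following URLs (example.com here is just a placeholder domain, and 'sessionid' an example parameter name) may serve up exactly the same home page and thus count as duplicates:

http://example.com
http://www.example.com
https://www.example.com
http://www.example.com/index.html
http://www.example.com/?sessionid=12345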

What issues can content duplication trigger?

 

Although there's no such thing as a duplicate content penalty, content scraping is likely to get you into trouble with content owners, who will have every right to accuse you of copyright infringement. After all, you wouldn't be happy either if you found out that someone had stolen your content, and you would do everything to protect it.

Also, off-site content duplication may lower your website's overall value in the eyes of search engines, hurt its rankings or relegate it to the supplementary index.

 

 

As for on-site content duplication, it may result in:

  • a decreased crawl rate by search engine bots - the more identical pages a website has, the more time crawlers need to spend examining them all;
  • delayed inclusion in SERPs;
  • a sub-optimal page ranking instead of the main one - search engines may decide that a duplicated page (one that misses out on some content, or is weaker in terms of link equity, social engagement, conversions, etc.) is the canonical one and include it in SERPs instead of the main page;
  • wasted 'crawl allowance' on extraneous pages - Google has sophisticated algorithms that determine how much to crawl on each site, so making Google crawl through lots of duplicate pages may slow down the indexation of your site.

Summing up, content duplication can significantly affect your site's quality score and hold back its rankings progress. So checking whether your website is guilty of producing duplicate pages, and addressing the problem if it is, is crucially important!

 

Detecting duplicate content

There are a number of manual and automatic ways to detect off-site and on-site content duplication.  Here are some of them:

Search engine operators  

site:www.example.com [a part of the content copied from your site here] 

Just go to Google and enter the operator site:www.example.com followed by a part of the content copied from anywhere on your site, then click 'Search'. If you don't see your site anywhere in the SERP, and at the bottom of the search results you get Google's notice about omitted results, it probably means that your site has got into Google's supplementary index.

There are different reasons why sites get into the supplementary index but the most common one is content duplication.

site:example.com intitle:"text goes here"

This operator will help you find all your site pages that have an identical title.

site:example.com inurl:[text goes here]

This operator lets you search for specific text in the indexed URLs, thus letting you see all URLs that fall under the category of duplicate content.
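
For example, for a hypothetical site www.example.com, the three checks described above might look like this:

site:www.example.com "an exact sentence copied from one of your pages"
site:www.example.com intitle:"10 Best Running Shoes"
site:www.example.com inurl:sessionid

The first query shows whether copies of your text are ranked (or omitted), the second surfaces pages sharing the same title, and the third lists indexed URLs that contain a given piece of text (here, a session ID parameter).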

Using search engine operators is a simple way to look for content duplicates. Its main disadvantage, however, is that you can't save the results you find, which makes searching for duplicates a time-consuming and challenging task.

Special SEO tools

SEO site audit and content management software can be of great help when searching for duplicate content. There's a wide range of tools capable of detecting both on-site and off-site duplicates. These tools save all the results they find and provide you with instructions on how to deal with the duplicates.

Copyscape.com is a free online tool designed to detect external content duplicates and identify the websites that are stealing your content.

The tool is pretty easy to use. You just enter your page URL and it will find all sites that have scraped content from it.

Another great tool that you can use to find on-site duplicate content is Website Auditor. The software searches for duplicate URLs on your site and checks whether each page has a unique title and meta description.

Mastering duplicate content issues

 

 

When external duplicates are detected, these are the ways to solve the problem:

  • 301 redirect – if content duplicates appeared as a result of moving your site to another domain, you just need to set up permanent redirects to the new site (see the sketch after this list);
  • cross-domain canonical tags – if you have multiple domains with identical content, you need to choose which URL is going to be the main source and point to it with a cross-domain canonical tag;
  • contacting webmasters – if your content was scraped, contact the webmasters of the sites your content was found on and ask them to remove it. If they refuse to do that, you are free to take legal action.
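
For illustration, here is a rough sketch of the first two options, assuming an Apache server and a hypothetical move from old-example.com to www.example.com (adjust the domains, paths and server setup to your own case):

# .htaccess on old-example.com: permanently redirect every URL
# to the same path on the new domain (requires mod_rewrite)
RewriteEngine On
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]

<!-- cross-domain canonical tag, placed in the <head> of a page on the
secondary domain, pointing search engines to the main source -->
<link rel="canonical" href="https://www.example.com/original-page/" />

The redirect physically sends visitors and crawlers to the new domain, while the canonical tag leaves both pages accessible but tells search engines which version to index.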

As for on-site (internal) duplicates, you can hide them from search engine crawlers with robots.txt files or robots meta tags (the ways to do that are described in the first part of this article).
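
As a quick reminder, both options boil down to a couple of lines; the /print/ folder below is just a hypothetical example of a directory with printer-friendly duplicates:

# robots.txt: keep all crawlers out of the printer-friendly copies
User-agent: *
Disallow: /print/

<!-- robots meta tag, placed in the <head> of an individual duplicate page -->
<meta name="robots" content="noindex, follow">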

You can also solve on-site duplication issues with the help of:

  • 301 redirects – you may use them to redirect a second version of a page to the main one;
  • rel=canonical tags – this tag tells search engines that the current page is actually a copy of the URL specified in the tag (see the sketch after this list);
  • rewriting URLs – this is an advanced solution that requires some understanding of server configuration. By modifying the .htaccess file you can remove session IDs from URLs and even modify the domain name;
  • and other means described here.
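
Here is a rough sketch of the rel=canonical and URL rewriting options, assuming an Apache server, a hypothetical www.example.com domain and a 'sessionid' URL parameter (your CMS may use a different parameter name):

<!-- rel=canonical tag, placed in the <head> of the duplicate page
(e.g. a sorted or session-ID version), pointing to the main one -->
<link rel="canonical" href="https://www.example.com/main-page/" />

# .htaccess (requires mod_rewrite): strip the sessionid parameter
RewriteEngine On
# if the query string contains a sessionid parameter...
RewriteCond %{QUERY_STRING} ^(.*?)&?sessionid=[^&]+(.*)$ [NC]
# ...301-redirect to the same URL with that parameter removed
# (a simplified pattern - edge cases like a leading '&' may need extra care)
RewriteRule ^(.*)$ /$1?%1%2 [R=301,L]

Whichever approach you pick, test it on a staging copy of the site first - a wrong rewrite pattern can easily take live pages offline.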

As for titles and meta descriptions, you'd better make them unique for each page.

 

Bottom line:

As you can see, having duplicate content can negatively affect your site's credibility score and result in lower search engine rankings.

Luckily, there's a lot you can do about this. As the article suggests, you can eliminate duplicate content by replacing it with unique and valuable content, or hide some duplicates from indexation. And it's absolutely worth doing, because it will make your site more search engine friendly and improve its rankings performance.

Image Credits:   oscarmell productions (via Flickr.Com), plagspotter (via Flickr.Com), weboptimist (via Flickr.Com).

 





  • Kundan Bhardwaj (http://www.geeknoob.com/)

    I already knew about Copyscape but never knew about the omitted results in Google and what they mean.

    Link-Assistant.Com

    We are glad that you have found this info useful. :)

  • Matthew Powers

    Thanks for the great post! I had a follow-up question though. What do you think is the best way to handle duplicate content caused by different session IDs/URL parameters like you mentioned above?

    Link-Assistant.Com

    That depends on what kind of CMS your site is based on, what type of
    duplicate URLs you have and some other factors.

    Basically, as the article suggests, you can hide duplicate content caused by different
    session IDs/URL parameters using robots.txt files, robots meta tags or
    rel=canonical tags, or you can rewrite these URLs by modifying the .htaccess file.

    Also you can instruct Google how to deal with such pages using Google Webmaster Tools. Here you may find step-by-step instructions on how to do that: https://support.google.com/webmasters/answer/1235687?hl=en