The foundation of a good website is its content. Since real people make up your site’s organic traffic, it is essential to maintain quality content on your website. User experience, site design and readability also play an important role. These are the factors that draw people to your site and make them want to subscribe to your newsletter and follow you on Twitter.

What is Content Scraping?

In the simplest of terms, content scraping is the act of stealing your content. To elaborate: when a website publishes an article, that article is the website’s property, and it is often protected by copyright. When another website copies that content (on its own pages or in an email) without permission, that’s stealing. It’s simply taking things that do not belong to you.

Why is it bad?

Well, for starters, you don’t get credit for it. A writer always wants their work to be known. (That’s why some websites pay you a lot of money to contribute anonymously, and that’s also why very few of these websites have engaging content.)

Also, and this is probably what you’ve been wondering about, it wreaks havoc on your SEO scores. Consider the following situation:

Imagine you’ve started a new blog where you put in a lot of effort and produce detailed, well-researched articles. Since your site is new, it takes time for it to show up in Google. Sites that have been around for a long time usually get preference (in terms of SEO scores) over a new one. Now suppose an older site steals your content and publishes it. Search engines then see the same article on both sites, which affects both sites’ SEO scores, and the older site may even be treated as the original. Of course, the site that’s copying content is probably using some black-hat technique, which is terrible for it in the long run. Until then, you’re the one on the losing end.

Why do people do it?

There’s no shortcut to success. A smart man knows it. An honest man believes it. Yet the world is full of people who think that Mr. Gates built Microsoft in a day, so they decide to give the shortcut a shot. At the end of the day, they’re bound to fail. That’s my personal take, if you will.

Which brings us to our question – what’s in it for the thieves?

  • Lead generation: Producing quality content is hard. A scraper who fills a site or newsletter with your useful content can attract visitors and subscribers without doing the work.
  • Affiliate marketing: The scraper replaces the product links in the stolen article with their own affiliate links and earns a commission on every sale.
  • Ads: More content means more space for ads, and more visitors means more ad clicks to earn from.

Catching the culprits

The first rule of war goes thusly:

Know thine enemy

How do we catch them? Primarily, there are four ways.

Google your post titles

Probably the oldest trick in the book. All you have to do is search Google for your exact post title, wrapped in double quotes. Best case scenario: your article is the first result, followed by your site’s tag pages, social media profiles, etc. Worst case: your site shows up as the 5th or 6th result, below four other sites with your exact title and content.
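
If you check titles often, the search can be scripted. Here is a minimal Python sketch that only builds the search URL. The function name and the `example.com` domain are made up for illustration, and the `-site:` operator is added on the assumption that you want to hide results from your own domain.

```python
from urllib.parse import urlencode


def exact_title_search_url(title: str, own_domain: str) -> str:
    """Build a Google search URL for an exact-phrase match on a post title.

    The title is wrapped in double quotes to request an exact match, and
    "-site:own_domain" filters out results from your own site.
    """
    query = f'"{title}" -site:{own_domain}'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical post title and domain.
    print(exact_title_search_url("How to Stop Content Scraping", "example.com"))
```

Open the printed URL in a browser. Any result that carries your exact title on someone else's domain deserves a closer look.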

Pingbacks and trackbacks

One of the golden rules of blogging is internal linking. Google strongly emphasizes this, and I don’t see any reason not to follow the leader. I know it’s a bit tedious, but hey, the fruits are sweeter! One of the perks of interlinking is that your site’s links end up on the rogue site (the one which copies your content). When people read the stolen article, they’ll naturally follow the links and land on your website. You can use this to your advantage. If the rogue site runs pingback-enabled software such as WordPress, your site also receives a pingback or trackback whenever a copied post links back to you, so closely monitor these notifications and your site’s stats. An unfamiliar domain which sends you trackbacks and pingbacks regularly is most likely the culprit.
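
If you have access to your server's raw access log, you can also tally referring domains yourself. The sketch below assumes the log uses the common "combined" format, where the referrer is the quoted field after the status code and response size. The function name and sample domains are illustrative only.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# In a combined-format log line, the quoted request is followed by the
# status code, the response size and then the quoted referrer.
REFERRER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)"')


def count_external_referrers(log_lines, own_domain):
    """Count referring hosts other than own_domain and its subdomains."""
    counts = Counter()
    for line in log_lines:
        match = REFERRER_RE.search(line)
        if not match:
            continue  # not a combined-format line
        host = urlsplit(match.group(1)).hostname
        if not host:
            continue  # "-" or empty referrer
        if host == own_domain or host.endswith("." + own_domain):
            continue  # internal navigation
        counts[host] += 1
    return counts
```

Calling `count_external_referrers(open("access.log"), "example.com").most_common(10)` lists the ten busiest outside referrers. A domain you do not recognise near the top is worth a visit.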

Google Webmaster Tools

Registering your site with Google and Bing Webmaster Tools is of utmost importance for monitoring your site’s stats. In Google Webmaster Tools, head over to “Traffic > Links to Your Site”. This lists every inbound link to your site. Content scrapers are most likely to be at the top of this list.

Use a paid service

Copyscape is a premium service which costs about 5 cents per page searched. One of its products, Copysentry, monitors the web 24×7 and tries to find copies of your articles. Once a red flag is raised, you immediately receive an email with the details of the culprit. The next step is up to you. That’s right: Copyscape does not take action. It merely points out the content scrapers. If you’d like to read a bit more on their service and pricing, I suggest you take a look at their brochure.

How to prevent content scraping

Using WordPress plugins

Anti Feed-Scraper Message and Copyright Proof are two of the best WordPress plugins to prevent/combat content scraping.

Anti Feed-Scraper Message adds your site’s data to your RSS feed’s footer. You can enter the relevant data in the plugin’s settings page and the rest will be taken care of. Here’s a quick look at the final result:

[Image: adding site data to your RSS feed’s footer]

Copyright Proof is a much more versatile tool which lets you add a time-stamped certificate along with a licensing and attribution notice below each blog post. You can also kick it up a notch by disabling right-click on your post area, which stops casual visitors from copying your content (though not anyone who is skilled in HTML/CSS).

Google Plus Authorship

Google’s patented technology, Content Author Badges, which appear below links shown in the SERP, took the SEO world by storm. Nathan recently wrote a complete guide to setting up Google Plus Authorship on your WordPress site.

Free Pinging Services

One of the best ways to guard against content scraping is to ping a lot of search engines with your site’s URL as soon as you publish, so that your copy has the best chance of being indexed first. WordPress internally pings certain services using Ping-o-Matic, but if you want to take matters into your own hands, try Google Ping, Pingler and BulkPing to manually ping a plethora of search engines. Pingler also has a WordPress plugin (compatible up to v3.6) which will ping 5 new posts every 24 hours, whereas premium members enjoy unlimited pings.
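
For the curious, a ping is just a small XML-RPC call. The sketch below uses Python's standard `xmlrpc.client` module to build and send a `weblogUpdates.ping` request, the method name used by the classic weblog ping interface. The function names and the `example.com` blog are placeholders, and which endpoint to use is an assumption you should check against the service's own documentation.

```python
import xmlrpc.client


def build_ping_request(site_name: str, site_url: str) -> str:
    """Return the XML body of a weblogUpdates.ping call (no network access)."""
    return xmlrpc.client.dumps((site_name, site_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint: str, site_name: str, site_url: str):
    """Send the ping to an XML-RPC endpoint. This performs a network request."""
    proxy = xmlrpc.client.ServerProxy(endpoint)
    return proxy.weblogUpdates.ping(site_name, site_url)


if __name__ == "__main__":
    # Inspect the payload first; nothing is sent here.
    print(build_ping_request("My Blog", "https://example.com/"))
```

WordPress lists the ping endpoints it notifies under Settings > Writing > Update Services, so that list is a sensible place to find a value for `endpoint`.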

Internal Linking

As explained earlier, internally linking your blog articles is not just a good practice but a necessary one. It helps build a strongly linked website, which is to Google’s liking. Also, in the event that your content is scraped by another site, you have less to worry about: strong internal linking will only get you useful backlinks!

Add a link to all your post titles automatically

Since most of the content scrapers use automated scripts to steal your content, chances are that your post title will be copied as well. You can use this to your advantage. Open your theme’s “single.php” file, locate the title and replace it with the following code:

<h1>
<a href="<?php the_permalink(); ?>"><?php the_title(); ?></a>
</h1>

For example, to enable this for the default Twenty Eleven theme, we would open content-single.php and replace the code as shown in the images below:

[Image: Editing Twenty Eleven’s content-single.php, before]

[Image: Editing Twenty Eleven’s content-single.php, after]


Prevent Image Hotlinking

This is another really cool way to shoo away scrapers, and it saves you a lot of bandwidth too! Here’s how you do it:

  • Create an image informing the reader that the article was stolen from your site. For this tutorial, let’s assume the site is wplift.com and the image is named prevent.png.
  • Upload that image to your WordPress installation directory (you must place it in the root/installation directory).
  • Append the following code to your site’s .htaccess file (also found in the root directory). Make sure you back up your .htaccess file first, just to be safe. If things go wrong you can always revert.

RewriteEngine On
# Let through requests referred by wplift.com or any of its subdomains
RewriteCond %{HTTP_REFERER} !^https?://(.+\.)?wplift\.com/ [NC]
# Let through empty referrers (direct visits, some proxies and feed readers)
RewriteCond %{HTTP_REFERER} !^$
# Never rewrite the replacement image itself
RewriteCond %{REQUEST_URI} !^/prevent\.png$
RewriteRule \.(jpe?g|gif|bmp|png)$ /prevent.png [L]

Of course when you’re applying this tutorial in your site, you must replace the site name and image name accordingly. If you have uploaded the image in a different directory, then you must replace “/prevent.png” with “/path/to/your/image.jpg”.
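
Before touching a live site, it can help to reason about which requests the rules would catch. The Python sketch below imitates the logic of the rewrite conditions with ordinary regular expressions. It is an approximation, not how Apache evaluates them: the dots in the patterns are escaped and https referrers are also accepted, which is an assumption about the intent of the rules. The function name and test URLs are made up.

```python
import re

# Assumed intent of the referrer condition: wplift.com or any subdomain,
# over http or https, matched case-insensitively (the [NC] flag).
ALLOWED_REFERER = re.compile(r"^https?://(.+\.)?wplift\.com/", re.IGNORECASE)
# The rewrite rule itself only applies to these image extensions.
IMAGE_PATH = re.compile(r"\.(jpe?g|gif|bmp|png)$")


def serves_replacement(path: str, referer: str) -> bool:
    """Return True when a request would receive the replacement image."""
    if not IMAGE_PATH.search(path):
        return False  # the rule only applies to image files
    if referer == "":
        return False  # empty referrers are let through
    return ALLOWED_REFERER.search(referer) is None


if __name__ == "__main__":
    print(serves_replacement("/wp-content/uploads/chart.png",
                             "http://scraper.example.net/stolen-post/"))
```

Note that a look-alike domain such as notwplift.com does not slip through, because the pattern anchors the domain right after the scheme or after a dot.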

How to fight Content scraping

Good Cop – File a DMCA Takedown

Dealing with previously scraped content can be a bummer. You have two choices. The first is to file a DMCA complaint and wait for the other side to take the right course of action. That’s called being the good cop. Both of us know that these content scrapers won’t stop, and filing a DMCA complaint also costs you money! If you decide to buy the product, then you could also enjoy their WordPress plugin!

If the rogue site is hosted with a reputable hosting company like HostGator, you can simply use the host’s DMCA form and file a complaint.

Bad Cop – Deny IP using .htaccess

However, if you want to take action right away, here’s what you do: directly deny the scraping site’s IP address. Suppose the IP address is 203.0.113.42. First back up your .htaccess file (found in the root WordPress installation directory) and add the following code. Note that Deny is Apache 2.2 syntax; on Apache 2.4 it works only when mod_access_compat is enabled.

Deny from 203.0.113.42
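
A mistyped address in .htaccess is easy to miss, so it is worth validating the IP before you paste it in. This small Python sketch uses the standard `ipaddress` module for that. The helper name is made up for illustration, and it only builds the line; it does not edit any file.

```python
import ipaddress


def deny_line(address: str) -> str:
    """Return a "Deny from" line for a well-formed IP address.

    Raises ValueError when the address is malformed, for example when an
    IPv4 octet is larger than 255.
    """
    ip = ipaddress.ip_address(address.strip())
    return f"Deny from {ip}"


if __name__ == "__main__":
    print(deny_line("203.0.113.42"))
    try:
        deny_line("112.338.23.102")  # 338 is not a valid octet
    except ValueError as error:
        print("Rejected:", error)
```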

Conclusion

We’ve all heard of the saying:

Prevention is better than cure.

This could not be truer in the fight against content scraping. If your blog hasn’t been scraped yet, that’s a good thing. But you never know when you’ll fall prey to a scraper. If you’re running a tutorial-based site, then I’m sure each article takes a lot of time and effort to craft. It would really be a bummer if some dolt stole it, hence the preventive techniques. I would highly recommend enabling Google Plus Authorship, image hotlink protection and heavy internal linking in your blog posts. That’s the absolute best way to prevent content scraping.


Author:

Sourav is a WordPress enthusiast, an avid gamer and a sitcom collector. His playlists include heavy metal, electronic, and new-age tracks. When he's not online, he's spending quality time with his friends and family.

7 Comments

  1. Thanks for that great post! Very useful information!

  2. Hey Sourav,

    Cool article and will definitely check out those solutions. When you get a ‘spare’ minute you might wish to also add http://www.tynt.com/ to your arsenal.

    Seems a worthy tool if the copy/paste stats are right..

    Cheers, Stuart

  3. Good post and thanks for sharing. I always add internal links to my post and had some ridiculous theft of my articles recently! Jonny

  4. Nice article, you go through more options than any of the other articles I’ve read online, touching this subject.

    Yoast’s SEO plugin has a pretty neat feature that allows you to add a tag line with a hyperlink to the bottom (or top) of your RSS feed.

  5. Content scraping is a cause of concern to most bloggers and so long as your work is free on the internet, you must fight this war at one point in time. This article is really handy and forms a good guide as I have never imagined adding a link to my post title before now.

  6. WordPress Content Copy Protection is another great plugin I would recommend. This plugin has FULL text AND image protection for your WordPress sites that makes it near impossible for a user to copy your text and images: http://www.securiilock.com/. They also have a free version available here: http://wordpress.org/plugins/wp-content-copy-protection/
