Lessons Learned from Manually Assessing 8000 Blog Comments & Deleting Half of Them
You may have noticed that comments were removed from WPLift for the last week. The reason is that I have been cleaning up the site because of Panda 4.0. WPLift has been hit a little by it and our search engine traffic has dipped, so I have been auditing on-site factors to try to repair this before the next refresh comes around. Panda places more emphasis on on-site factors and authority, so if you have been hit by it, taking a look at your site and cleaning it up should help. I plan to expand on this when the refresh happens and I can see the results of my work, so I will share what worked (or what didn’t!).
One thing I thought could be a factor is the number of spammy comments we had on WPLift. Foolishly, I had just left our comments open and was letting Akismet handle it, which, while very effective, isn’t one hundred percent perfect.
The result?
Over 8000 comments, many of which were very spammy in nature: links to all sorts of dodgy websites, keyword-stuffed comments and many other sorts of comments which should never be on your blog. I didn’t spend much time checking comments as I reasoned that the links were no-followed and they didn’t seem to be harming my site. Looking back now, that was stupid of me. When trying to rank on Google, linking out to dodgy sites and having a load of irrelevant content in the form of comments will wreck your keyword density, and I think maybe it’s one of the reasons we got hit. Even if it’s not, it doesn’t look good to actual visitors; it makes the site look uncared for and neglected.
I did think about removing comments altogether like CopyBlogger has recently done, but decided against it. A lot of our comments are very useful and add to the blog post, whether that’s with corrections to my posts (which I’m always happy to receive), additional links and information, or just the general feeling of a community around a blog.
So this week, I set about cleaning up the comments on WPLift. Here is what I did.
How I Removed Thousands of Spam Comments
When you have over 8000 comments to go through and pick which ones are spam, it’s quite a daunting thought. To ease the burden I decided first of all to remove all pings and trackbacks, as I no longer wanted to accept them. I don’t think they are really useful anymore, and a lot of them come from spam sites, scrapers, aggregators and so on. You can filter the comments list by pings, so I did that and then bulk deleted them all.
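If you would rather do this step on an export of your comments than in the admin screen, the idea can be sketched in a few lines of Python. This is only an illustration, not what I ran: it assumes each comment is a dictionary with a `comment_type` field holding "pingback" or "trackback" for pings, and the `split_pings` helper is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Assumption: pings are marked with one of these comment_type values.
PING_TYPES = {"pingback", "trackback"}


def split_pings(comments: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Return (pings, regular_comments) without modifying the input list."""
    pings = [c for c in comments if c.get("comment_type") in PING_TYPES]
    regular = [c for c in comments if c.get("comment_type") not in PING_TYPES]
    return pings, regular


if __name__ == "__main__":
    sample = [
        {"id": 1, "comment_type": "pingback"},
        {"id": 2, "comment_type": "comment"},
        {"id": 3, "comment_type": "trackback"},
        {"id": 4, "comment_type": ""},
    ]
    pings, regular = split_pings(sample)
    print(len(pings), "pings,", len(regular), "regular comments")
```

Keeping the two lists separate means you can look over the pings once more before deleting anything.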
Removing pings brought the total down to around 6000 comments, which was still quite daunting but cheered me up a little.
To start removing the spam comments I tried setting the screen options for the comments page to show more comments. The maximum is 999, so I tried that, which gave me 6 pages of comments to go through.
I then went down the list ticking all the spam comments and tried to bulk delete them. That didn’t work: I received a WordPress error stating that the request was too large. I tried smaller numbers of comments on screen at once but kept getting the error, so in the end I showed 100 comments at a time and went down the list clicking “Spam” next to any comment I wanted to remove. This was an arduous process, so I spread it out over 4 days. Whenever I needed a break from working I cleaned out a few pages and kept a record of where I was up to.
Unfortunately, I decided this was the best way. I could have tried a more automated process, but I wanted full control over which comments stayed, as I had a few criteria for comments that needed removing. After my manual removal I ended up with 4800 comments, so I had removed around 1200 by hand.
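The "request too large" problem is a general one: a single bulk action carrying hundreds of IDs can exceed what the server will accept. A common workaround is to split the IDs into smaller batches. Here is a minimal Python sketch of that idea. The `batches` helper and the batch size of 100 are illustrative assumptions, and the deletion itself is deliberately left out because it depends on your setup.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` comment IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    spam_ids = list(range(1, 251))  # 250 made-up comment IDs
    for number, chunk in enumerate(batches(spam_ids), start=1):
        # Each chunk would be handled as its own small request.
        print(f"batch {number}: {len(chunk)} comments")
```

Working through fixed-size batches also makes it easy to keep a record of where you are up to if you spread the job over several days.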
Let’s move on to what I removed.
Types of Comments I Removed
Here is a run-down of the types of comments that I removed and what you should be on the look-out for on your site.
I deleted any comment posted under a name like “Affordable WordPress Development” or even more spammy things like “Auto Car Loan”. You know the sort: they are only made to try to rank for that search term, even though the Penguin update slapped this sort of link-building. Ironically I probably helped some of these sites by removing the links :)
There were a lot of comments with fake praise about the site which you could tell were automated, things like: “Excellent site you have here.. It’s hard to find good quality writing like yours these days. I really appreciate individuals like you! Take care!!” These were easy to spot, and the username linked to some irrelevant website. All deleted.
Reporting Non-Existent Errors
Similar to the fake praise comments, these reported issues with the site such as the RSS feed not working or browser errors, but I could tell they were also auto-posted.
Comments with Links
There were quite a few comments with links to various sites which had got through Akismet. Links to payday loans, pills and porn were all present, the worst kind of neighborhoods to link to. There were also a lot with signature links like you would see on a forum, and I removed these too.
A lot of people with their own products, plugins and so on had dropped their link into the comments of reviews or roundups of other products. I assessed these on a case-by-case basis: if I thought they were a good resource which added to the post, I left them. Other people were more sneaky, adding comments to reviews saying the product mentioned was rubbish, a scam, or had bad support, but looking at the user’s email or website I could see these were from competitors. Deleted!
Short Comments & Bad Grammar
This was another one I had to think about. Short comments like “Nice Post” or “Great Roundup” may have come from genuine readers, but I thought they didn’t add anything to the site, and they could have been used to get a first comment approved so the author could then spam in further comments. I decided to go ahead and remove these, as well as ones with terrible grammar. The bad grammar comments were not malicious but, again, didn’t add anything to the site, as a lot of the time they were gibberish.
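I did all of this by eye, but a few of the criteria above are simple enough to pre-flag with a script before a manual pass. The Python sketch below is only an illustration under several assumptions: each comment is a dictionary with hypothetical `author` and `content` fields, the word list and thresholds are made up, and a flag means "look at this one", not "delete it".

```python
import re
from typing import Dict, List

# Assumption: a tiny, made-up list of commercial words seen in spammy names.
SPAMMY_NAME_WORDS = {"loan", "loans", "affordable", "cheap", "development", "seo"}
LINK_RE = re.compile(r"https?://", re.IGNORECASE)


def review_flags(comment: Dict[str, str], min_words: int = 4, max_links: int = 1) -> List[str]:
    """Return the reasons a comment deserves a closer manual look."""
    flags = []
    name_words = set(re.findall(r"[a-z]+", comment.get("author", "").lower()))
    if name_words & SPAMMY_NAME_WORDS:
        flags.append("keyword_name")  # e.g. a name like "Auto Car Loan"
    body = comment.get("content", "")
    if len(body.split()) < min_words:
        flags.append("short")  # e.g. "Nice Post"
    if len(LINK_RE.findall(body)) > max_links:
        flags.append("links")  # more links in the body than allowed
    return flags


if __name__ == "__main__":
    print(review_flags({"author": "Auto Car Loan", "content": "Nice Post"}))
```

Fake praise, invented error reports and competitor attacks are much harder to catch with rules like these, which is one reason I kept the final decision manual.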
One thing I noticed which was interesting was that when I used Disqus on the site for a period on our old design, the number of spam comments dropped by a huge amount. I enjoyed that period, breezing through pages of comments with only a few spam ones was a nice break!
As I mentioned earlier, pings have been disabled now. They used to be a way for bloggers to post follow-up articles on their own site and let the original site and its commenters know about that content. The blogging world has changed a lot since then, and I think nowadays they are pretty much irrelevant for most sites.
So now that I have the comments cleaned up, my thoughts are turning to how to prevent this from happening in future. I don’t want to implement a Captcha as I personally hate them and know that it will deter people from using the comments. I did see an interesting piece of code posted the other day, “How I Stopped WordPress Comment Spam”, which I’m going to look into. It looks like it will stop automated spam but won’t deter manual spammers. I could use the Jetpack plugin to handle comments but I’m not keen on how it looks, and there is also the option to move back to Disqus or Livefyre, but again, I’m not a huge fan of those either.
Requiring people to log in with Facebook or Twitter is another option, but not everyone uses those either, so it could deter comments.
Whether or not this will have any effect on Google’s view of the site remains to be seen but I can only see it being a positive in their eyes.
What do you think? What steps do you use to prevent spam in your comments?