Pushing Bad Data: Google’s Latest Black Eye

October 20, 2021

Google stopped counting, or at least publicly displaying, the number of pages it indexed in September 2005, after a schoolyard “measuring contest” with rival Yahoo. That count topped out around 8 billion pages before it was removed from the homepage. News broke recently through various SEO forums that Google had suddenly, over the past few weeks, added another few billion pages to the index. This might sound like a reason for celebration, but this “accomplishment” does not reflect well on the search engine that achieved it.

What had the SEO community buzzing was the nature of those few billion new pages. They were blatant spam, filled with Pay-Per-Click (PPC) ads and scraped content, and in many cases they were showing up well in the search results, pushing out far older, more established sites in the process. A Google representative responded to the issue via the forums by calling it a “bad data push,” an explanation that was met with groans throughout the SEO community.

How did someone manage to dupe Google into indexing so many pages of spam in such a short period of time? I’ll provide a high-level overview of the process, but don’t get too excited. Just as a diagram of a nuclear explosive isn’t going to teach you how to make the real thing, you’re not going to be able to run off and do it yourself after reading this article. Yet it makes for an interesting tale, one that illustrates the ugly problems cropping up with ever-increasing frequency in the world’s most popular search engine.

If you write content, whether you write articles, blogs, website content – or content in various formats as I do – you’re probably the victim of content scraping, also known as splogging, or spam blogging. Basically, people scrape parts of articles and blogs, or entire ones, and use them as content on their own instantly generated websites.

The reason I am writing this article is to alert those who are using this method and perhaps didn’t realize quite what they were doing. I am finding that some people are being sold programs to do this without fully understanding what the programs actually do, or the possible implications for them.

Whatever you call this kind of tactic, it is theft. When you create anything, whether it’s art or textual content, it is protected by copyright, and nobody has the right to use it except within the terms and conditions of use stated on the article websites or in the terms of use on your own website.

EzineArticles (and other article sites) clearly state that in order to publish an article from EzineArticles on your website, you must agree to publish the entire article, including the resource box with the backlinks, and with no changes at all to the content. So, whether you’re scraping content from blog sites, or scraping parts of articles without the resource box – it’s theft, and you’re not likely to get away with it for long.
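The republishing rule described above (the whole article, unchanged, plus the resource box) can be checked mechanically. The following is only a minimal sketch in Python, not anything EzineArticles provides: the function names are hypothetical, and it assumes you already have the plain text of the page in question, of your article, and of your resource box.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so that line wrapping does not matter."""
    return re.sub(r"\s+", " ", text).strip()


def is_faithful_republication(page_text: str, article_body: str, resource_box: str) -> bool:
    """Return True only if the page contains the complete, unchanged article
    and the complete resource box (hypothetical helper, for illustration)."""
    page = normalize(page_text)
    return normalize(article_body) in page and normalize(resource_box) in page
```

A page that carries only part of the article, or drops the resource box, fails the check. Real pages would first need their HTML stripped to text, which this sketch leaves out.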

Hosts and ISPs are concerned about this and will take action if you pursue a complaint with them. With most good hosts, it does not take long at all for them to enforce their AUP (Acceptable Use Policy) and close down a website.

If you steal content that people have worked long and hard to create, they are going to be very upset. Most will not even bother to contact you first to give you the option of making things right. They’ll go straight to your host, and they’ll also try to get you banned from any of the partners you’re using to monetize the site, such as Google AdSense, Clickbank, or any other affiliate partner.

Personally, I spend time searching for websites that scrape my content. I email them first. If I get no response after a week, I do a Whois lookup and contact the host, and I’ll also contact Google AdSense, PayPal, Clickbank, and any other partner of the website to alert them to what is happening.
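For anyone who wants to do this kind of checking with a little help from a script, here is a rough sketch of one way to estimate how much of an article shows up on a suspect page. It is not the method I use, just an illustration: it compares overlapping five-word sequences, and the function names and the sequence length of five are assumptions chosen for the example.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def copied_fraction(original: str, suspect: str, size: int = 5) -> float:
    """Fraction of the original's word sequences that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    return len(original_shingles & shingles(suspect, size)) / len(original_shingles)
```

A result close to 1.0 suggests the whole article was lifted, a middling value suggests that parts of it were, and 0.0 means no five-word run matched. A high score is only a reason to look at the page yourself, not proof of anything.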

I did this the first time I searched for people scraping my content, which was quite recently, and I was surprised to get what seemed like a very real apology from the website owner, who had bought into a piece of software with no idea that he was stealing content. That afternoon I went to his site again, and he had taken down every single page, as every page had been generated using content scraping.

It seems a lot of people do not mean to steal content. They are buying into programs without fully realizing that what they’re doing is wrong, and when they find out they are often horrified and take action straight away. Not all of them, of course. There are some people who will be completely comfortable with what they’re doing, and will just do it for as long as they can before they get shut down.

So, if you are currently using content scraping and you didn’t realize what you were doing, stop now! If you were considering doing this, hopefully this article has helped to change your mind. And finally, if you’re a writer and you find people who are scraping your content, send them an email first and give them the opportunity to realize they’re making a mistake, and to correct it, before taking further action.
