Google Reclassifies Certain Types of Thin Content Pages

Topics: Content Marketing, Experiments, SERPs

Post Penguin Update

We have new data which shows that the 25th October spike wasn’t part of the gradual Penguin 3 roll-out. It appears the update had a lot to do with on-page factors and is (among other things) related to how Google treats thin content pages and soft 404s.

This was first spotted by Martin Reed, who noticed that Google changed the status of our purposely blank untitled.html page used to test Google’s title rewriting behaviour.

The page has always been treated as thin content and flagged in Google Webmaster Tools as having a non-informative title tag:

non-informative

After the 25th October update, we no longer see anything in this category, which now says: “Non-informative title tags: 0”

In our Crawl errors, however, a new page just popped up. You guessed it: it’s untitled.html:

Errors Page

Google’s algorithm update seems to have reclassified certain types of thin content pages, now treated as soft 404s:

Soft 404

Specific Cases

We checked a number of other web properties to make sure this wasn’t a coincidental date match. There were several instances of an increase in soft 404 pages, all on or around the 25th October:

Soft 404 Example 1

Soft 404 Example 2

Soft 404 Example 3

Soft 404 Example 4

After inspecting each case of newly flagged soft 404s, we found several instances of correctly classified pages which returned 200 when they should have returned 404. There were also a number of interesting borderline cases, most with the detection date correlating to the recent update.
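
To make the idea of a soft 404 concrete, here is a minimal Python sketch of the kind of check we could run over our own pages. It is not Google's method, which isn't public. The trigger phrases are loosely based on the examples in this post, and the function names and the 20-word threshold are arbitrary assumptions made for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical trigger phrases, loosely based on the examples in this post.
TRIGGER_PHRASES = (
    "no posts",
    "no results",
    "we don't have any topics",
)


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


def soft_404_reasons(status_code, html, min_words=20):
    """Return reasons why a response might look like a soft 404.

    min_words is an arbitrary assumption, not a known threshold.
    """
    if status_code != 200:
        return []  # a real 404/410 is not a *soft* 404
    # Normalise curly apostrophes so the phrases above can match.
    text = visible_text(html).lower().replace("\u2019", "'")
    reasons = []
    for phrase in TRIGGER_PHRASES:
        if phrase in text:
            reasons.append("trigger phrase: " + phrase)
    if len(text.split()) < min_words:
        reasons.append("very little visible text")
    return reasons
```

Feeding it a status code and the HTML of a page returns a list of reasons, which is empty when nothing looks suspicious. A real 404 always returns an empty list, since only pages answering 200 can be soft 404s.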

Example 1: Thin Content Page

A thin content page with the following in the content section:

Customer Happiness Page (h1)
http://linktoahappinesspage.com/page (a href)
– We don’t have any topics in this community at this point. (p)
[icon] Promo line for the survey tool (p)

This page was first flagged on 28 October 2014. The URL is noindexed.

Example 2: Low Value Page

Another unusual soft 404 we noticed is just an HTML sitemap:

Sitemap

Note: This page was first detected on 8 August 2014, so it’s unlikely to be related to the recent update.

Example 3: Tag Pages

In this example I can share full details as the site in question is an old pet project. Essentially, we’re talking about a classic case: tag pages containing a “trigger” type of statement in the content area:

No Posts

Here’s the full list of newly detected URLs:

Example 4: Zero Result Pages

This one is a case of a zero-result search, more specifically an indexable zero-post author page in WordPress. The page was first detected on the 25th October alongside a couple of tag pages similar to our previous case with analogik.com.

URL format: website.com/blog/author/username/ and website.com/blog/tag/

Author

We found the same sort of notification for another domain (URL format: /blog/search/keyword/) with the following in its content section:

No Results

Note: The soft 404 was first detected on the 27th October 2014.

Summary of Findings

Data examined so far suggests a change in treatment of certain types of thin content pages (some of which were noindexed). Page types treated as soft 404s include:

  • Tag Pages
    • Blog
    • e-Commerce
    • Catalogues
  • Zero Search Result Pages
    • Blog Author
    • Internal Website Search
  • Ultra Thin Content
    • Uninformative Documents
    • Blank Pages
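
If you export the soft 404 URLs from your own crawl error report, a rough way to see which of these page types dominate is to group the URLs by path pattern. The Python sketch below assumes URL formats like the ones in our examples (/blog/tag/, /blog/author/username/, /blog/search/keyword/). The patterns, labels and function names are illustrative assumptions and will need adjusting for your own site.

```python
import re
from urllib.parse import urlsplit

# Patterns mirror the URL formats mentioned in this post; real sites will differ.
PAGE_TYPE_PATTERNS = [
    ("tag page", re.compile(r"/tag/")),
    ("author page", re.compile(r"/author/[^/]+/?$")),
    ("internal search page", re.compile(r"/search/[^/]+/?$")),
]


def page_type(url):
    """Return a rough page-type label based on the URL path alone."""
    path = urlsplit(url).path
    for label, pattern in PAGE_TYPE_PATTERNS:
        if pattern.search(path):
            return label
    return "other"


def group_by_page_type(urls):
    """Group URLs into a dict keyed by page-type label."""
    groups = {}
    for url in urls:
        groups.setdefault(page_type(url), []).append(url)
    return groups
```

This only looks at the URL, so it says nothing about whether a page is actually thin. It simply helps sort a long list of flagged URLs into the buckets above for manual review.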

Community Reactions

So far the consensus is that this was a Panda update. Glenn Gabe wrote about it here and tweeted:

Others agree, including Aleyda Solis:
