The official Google search blog announced a record number of changes for February 2012. One of them concerns the way links are interpreted: Google has turned off a link-based signal it had used for years.
Link evaluation. We often use characteristics of links to help us figure out the topic of a linked page. We have changed the way in which we evaluate links; in particular, we are turning off a method of link analysis that we used for several years. We often rearchitect or turn off parts of our scoring in order to keep our system maintainable, clean and understandable.
Here is a list of link signals which Google may use in its ranking algorithm (a sketch of how some of these could be extracted from markup follows the list):
- PageRank
- Anchor Text (Exact match, partial match, URL links, non-descriptive)
- Link Location
- Link Repetition
- Context: Surrounding text and tags
- Link age
- Title attribute (within link)
- Link changes over time
- Link accumulation on page
- Nofollow
- Number of outgoing links
- Number of internal links
- Link reciprocation
- JavaScript links
- robots.txt / meta directives
- Image links
- ALT attribute on image links
- Target (e.g. _blank)
- Font size
- Bold / Italic
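To make the list above a little more concrete, here is a minimal sketch of how several of the markup-level characteristics could be pulled out of a page. It is purely illustrative (the sample HTML and the use of BeautifulSoup are my assumptions), not a description of how Google extracts them:

```python
# Illustrative only: extract some of the link characteristics listed above
# from a page's HTML using BeautifulSoup (pip install beautifulsoup4).
from bs4 import BeautifulSoup

html = """<p>Read the <a href="https://example.com/guide" title="SEO guide"
          rel="nofollow" target="_blank"><b>full guide</b></a> for details.</p>"""

soup = BeautifulSoup(html, "html.parser")
for position, a in enumerate(soup.find_all("a", href=True)):
    img = a.find("img")
    features = {
        "href": a["href"],
        "anchor_text": a.get_text(strip=True),            # anchor text
        "title_attr": a.get("title"),                      # title attribute
        "nofollow": "nofollow" in (a.get("rel") or []),    # rel="nofollow"
        "target": a.get("target"),                         # e.g. _blank
        "bold_or_italic": a.find(["b", "strong", "i", "em"]) is not None,
        "is_image_link": img is not None,                  # image links
        "alt_text": img.get("alt") if img else None,       # ALT attribute
        "position_on_page": position,                      # rough link location
        "context": a.parent.get_text(" ", strip=True),     # surrounding text
    }
    print(features)
```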
Which one do you think it was? Let us know on Google+
Discussion Highlights:
Bill Slawski
Actually, I was intrigued by the many references to changes in how Google handles freshness, and I can think of some of the features they might have removed, and some of the new signals they might be looking at to do things like determine burstiness.
Regarding links, while that section starts by mentioning “link characteristics,” they tell us that “we are turning off a method of link analysis that we used for several years.” So is this something as simple as ignoring whether or not links have underlines, or does it involve a larger process or method of link analysis?
I think there’s still some value in the use of anchor text and in PageRank, but there are many different methods of link analysis that Google could potentially turn off.
For example, the local interconnectivity patent approach (http://www.google.com/patents/US6725259) that was inferred as being turned on in 2003 in the book In the Plex might be a candidate. That involved looking at the top-n (10, 100, 1,000) results for a query and reranking them based upon how frequently they link to each other. There’s still some value to looking at interlinking when it comes to determining if one or more results might be ideal navigational results for a query, but is it helping to send better results to the tops of those results? It’s something I would test on a regular basis to see if it does.
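To illustrate the approach Bill is describing, here is a minimal sketch of local-interconnectivity reranking under my reading of that patent. The function name, boost weight and data shapes are assumptions for illustration, not Google's implementation:

```python
# Sketch: rerank the top-n results for a query by how often the other
# top results link to each of them (my reading of US6725259, illustrative only).
def rerank_by_local_interconnectivity(results, outlinks, boost=0.3):
    """results: list of (url, base_score) in the original ranking order.
    outlinks: dict mapping url -> set of urls that page links to."""
    reranked = []
    for url, base_score in results:
        # Count how many of the *other* top results link to this result.
        votes = sum(
            1 for other, _ in results
            if other != url and url in outlinks.get(other, set())
        )
        local_score = votes / max(len(results) - 1, 1)   # normalise to [0, 1]
        reranked.append((url, base_score + boost * local_score))
    return sorted(reranked, key=lambda pair: pair[1], reverse=True)
```

Run over the top 10, 100 or 1,000 results, pages that the rest of the set links to most would move up, which is roughly the reranking behaviour the patent describes.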
(Continued from question about user-centric temporal blog activity analysis)
+Deyan P The Google poster that you wrote about is a few years old, and I was hoping to see a follow-up research paper associated with it, but I can’t say that I’ve seen one come out. I’ve suspected that Google has looked at a number of the heuristics described within it, and likely implemented a few of them.
I recently wrote a blog post about a recently granted Google patent that was originally filed in 2006, which described how they might filter some blogs out of blog search. The description included some really broad, outdated and not very good rules for deciding whether or not to include blog posts in blog search: the number of links within a post (with too many being bad), the distance of links from the start of a post (posts with links too far from the start were excluded), and the presence of links pointing back to the post or to other pages on the same domain.
I followed up that post with another one that (1) had 98 external links, (2) had links throughout the post instead of just a short distance from the start of the post, and (3) had 36 named anchor links towards the start of the post that linked back to different sections of the post. All three of those would potentially keep the post from being included in Google Blog search because that post broke three of the rules from the description of that patent. The post was showing in Google’s blog search sometime shortly after I posted it.
While I suspect that Google did come up with filters to keep some blog posts from appearing in Google Blog search, I don’t think many of the rules described within that patent were implemented as described in the patent.
But they could have been. All three were link-analysis heuristics, and if any of them were still in use by Google, they should be retired: they were too broad and didn’t do things like consider the target of the outgoing links (in my example, 97 of the 98 links pointed to pages at the USPTO) or even the internal ones, which were named anchor links that made the blog post more usable by delivering readers to the sections they might find most interesting.
Another of the rules from that particular patent would potentially filter some blog posts out of Google Blog search results if they linked to videos. The patent was originally filed a number of months before Google acquired YouTube. The intent was to avoid blog posts that might link to “undesirable” content, but it didn’t distinguish between the kinds of content that those videos might contain. Again, a rule that was likely too broad when described in the patent, but which probably didn’t get implemented as written.
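For illustration, filter rules as broad as the ones Bill describes might look something like the sketch below. The thresholds, host list and data shapes are hypothetical; they are not values taken from the patent and this is not Google's code:

```python
# Hypothetical, overly broad blog-search filter in the spirit of the rules
# described above. Thresholds and the video-host list are made up for illustration.
MAX_LINK_COUNT = 20      # rule: too many links in a post
MAX_LINK_OFFSET = 1000   # rule: links appearing too far from the start (chars)
VIDEO_HOSTS = {"youtube.com", "video.google.com"}   # rule: links to video content

def include_in_blog_search(links):
    """links: list of dicts with 'offset' (character position in the post),
    'internal' (points back to the post or its domain) and 'host'."""
    if len(links) > MAX_LINK_COUNT:
        return False   # too many links
    if any(link["offset"] > MAX_LINK_OFFSET for link in links):
        return False   # links too far from the start of the post
    if any(link["internal"] for link in links):
        return False   # links back to the post or to the same domain
    if any(link["host"] in VIDEO_HOSTS for link in links):
        return False   # links to video content
    return True
```

Checks this blunt would reject Bill's test post on several counts, which is exactly his point: they ignore where the links go and why they are there.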
I suspect that there are other “link analysis” methods that Google may have actually implemented that may have been based upon assumptions that didn’t carry out as providing the value they were intended to give, or might have been based upon circumstances that have changed.
Wiep Knol
For me, “several years” isn’t the same as “almost since the beginning”, so I don’t think it’s anchor text or PageRank.
Anchor text has been abused a lot, but I’d love to see what the SERPs would look like without it; I think many users would complain. As for PageRank, there’s a difference between PR and toolbar PR (TBPR) – Google could switch off TBPR easily and still use PR behind the scenes. I don’t think this will happen very soon, though.
I agree with Tad that it will probably be something smaller, and it really could be anything. Raw link numbers (sitewides), on-page link relevance, no longer placing extra weight on old links, how to deal with link spikes, or adjusting first-link counts – just to name a few.
Off to do some testing & hoping someone will be able to squeeze something out of Matt Cutts at SMX West 🙂
AJ Kohn
I’m with the majority on this one. I can’t see it being anchor text or PageRank. These are still fairly strong indicators for topical relevance and authority. They’ve been abused but I sense that Google may be getting better at normalizing the abuse. It’s likely something more subtle.
I’m thinking it’s something to do with link position or number of links. I recall that Google changed their guidelines on number of links on a page from <100 to the more vague ‘reasonable number of links’ per page. (I still like less than 100 BTW.)
That change was made, in part, because Google could now crawl and index more of each page. Their bandwidth problems had been solved by Caffeine.
So when I think about this change I think about what link problems Google was trying to address pre-Caffeine that were obviated by that launch.
I wish I had more time to cogitate on it but I have to practice my presentation and get on the road to San Jose.
Tadeusz Szewczyk
I can only guess, but all three mentioned above (anchor text, nofollow and PageRank) have been abused so much that Google could turn them off. I think, though, that it’s rather something smaller: probably something like the number of links from one site pointing to another site. In the recent past 10,000 links were still better than just one, but by now I guess they could disregard the count completely. So whether a site links to you once or 10,000 times probably won’t matter anymore.
Potential Link to 2005 Research Paper
In today’s collaborative Google Hangout with Lyndon NA & Sasch Mayer I discovered two link analysis methods described in a 2005 research paper published by Monika Henzinger from Google:
1) Query-Independent Connectivity-Based Ranking
2) Query-Dependent Connectivity-Based Ranking
Both fit the criteria: they have been around for a few years and could have been phased out in favor of a superior link analysis method.
References:
Monika Henzinger, 2005: http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/9019.pdf
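As a rough illustration of the difference between those two families, here is a simplified sketch: query-independent ranking reduced to in-degree counting (PageRank is the better-known variant), and query-dependent ranking reduced to a short HITS-style hubs-and-authorities loop over a query-specific neighborhood. This is my simplification, not the paper's exact formulation or anything Google is confirmed to run:

```python
# Simplified illustration of the two connectivity-based ranking families.
def query_independent_rank(graph):
    """Rank every page by in-degree, regardless of any query.
    graph: dict mapping url -> set of urls it links to."""
    indegree = {url: 0 for url in graph}
    for src, targets in graph.items():
        for t in targets:
            indegree[t] = indegree.get(t, 0) + 1
    return sorted(indegree, key=indegree.get, reverse=True)

def query_dependent_rank(graph, result_set, iterations=20):
    """HITS-style hubs/authorities computed only on a query-specific
    neighborhood: the matching pages plus pages linking to or from them."""
    nodes = set(result_set)
    for src, targets in graph.items():
        if src in result_set:
            nodes.update(targets)        # pages the results link to
        if targets & result_set:
            nodes.add(src)               # pages linking to the results
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: sum(hub[s] for s in nodes if n in graph.get(s, set()))
                for n in nodes}
        hub = {n: sum(auth[t] for t in graph.get(n, set()) if t in nodes)
               for n in nodes}
        norm_a = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        norm_h = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {n: v / norm_a for n, v in auth.items()}
        hub = {n: v / norm_h for n, v in hub.items()}
    return sorted(nodes, key=auth.get, reverse=True)   # best "authorities" first
```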
Wow, you opened up a whole drawer full of possibilities now..
I was just thinking Google may have dropped the importance of anchor text, but after reading this, maybe it’s the “nofollow” attribute.
Certainly more likely than anchor text. But probably not because of spam.
Smells like PageRank to me.
They say “that we used for several years”. PageRank has been used since day one.
Hmmm…I’d like to think they will value nofollow.
That’s Google, always changing.
Dan, I reckon it’s context. They’ve probably got more sophisticated bots now.
“we are turning off a method of link analysis” – sounds more like a change in the algorithm than an actual change in signals used, like a part of the algorithm has been superseded by more recent work. I would think/hope a larger change would deserve a posting of its own (especially something like nofollow which would be a fairly major change in robots policy).
You make a very good point.
From the list of link signals posted by +Deyan SEO, I think Google may remove one of the following:
1. Anchor text: offsite links built with descriptive anchor text have usually been placed by SEOs (people who link for SEO purposes). Naturally, when people share a link, they usually just copy and paste the URL without anchor text. And for determining the topic of the linked page, I think Google can probably analyze the content of the page itself.
2. The same goes for the title attribute.
3. Target: this one has no evaluation value as far as I can tell.
PS: Sorry for my English.
Regardless of which link signal Google has decided to demote or remove from its algorithm, it seems to me that the growing impact of social shares, the rendering of one’s rel=author and rel=publisher markup, and the psychological “social proof” value this metric holds for triggering a CTR will likely minimize the impact of “turning off this method of link analysis” in the long term.
I think they are turning off PageRank.
I agree: turning off anchor text and using surrounding text instead would solve some link problems. It would be a very BIG change, and I think it has likely already happened. Go over to most SEO forums and you’ll see people complaining of big ranking drops, and many say Google is even notifying them with a message in Webmaster Tools specifically mentioning backlink schemes. Turning off the anchor text signal would reduce any remaining effectiveness of forum profile spamming, and blog comment spam could be mitigated as well.
I don’t think it could be something like nofollow, PageRank or even anchor text… these are larger factors that could literally change the whole shape of the web if altered even a bit.
You could have summarized the post at the end! BTW, is there any word from Matt Cutts on this?
Yes, I think Google might remove it, or at least they should.
Looks to me like Google have done something that makes unnatural linking more detectable – or at least means they are more bothered about it. It could be that the thing they turned off counter-balanced the effect of blog linking schemes, and now that it’s been turned off they’ve had to hit the blog link schemes to rebalance the algo. Something like that?
Here’s my pick after seeing Bill’s list of potential changes:
http://www.seobythesea.com/2012/03/12-google-link-analysis-methods/#comment-424410