
Google “Ranking Factors” Correlation Study Explained

Posted in categories: Competition Research, Google, Internet Marketing, Search Engine Optimization

Studies on search engine ranking factors have always attracted intense interest from digital marketers and website owners. The latest correlation studies by MOZ and Searchmetrics are by far the best examples of such research.

What makes them outstanding is that these painstaking studies were carried out by true professionals on impressively large amounts of data, and yet the results are presented in a simple, easy-to-interpret way.

Perhaps the only obvious flaw is the title, which in both cases includes the phrase "ranking factors". It confuses many marketers and leads them to the wrong conclusions.

Sure enough, putting "ranking factors 2013" into a title instead of the less attractive and somewhat boring "correlation study" makes many more readers open the article. There's a good chance you are reading my post thanks to this little trick.

Anyway, these studies are great. What makes MOZ's research even better is that they explained the methodology thoroughly and published a link to the full data archive (a 420 MB table zipped into a 160 MB archive), which allowed some crazy digital marketers, including me, to do their own research. So let's play with the data and see what hidden gems we can find there.

The original research data can be downloaded here:
http://moz.com/search-ranking-factors/methodology. This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.

What does the correlation coefficient say?

If you look at the top row of the MOZ Search Engine Correlation Data table, you will see Page Authority as the absolute winner of the metrics battle.

First of all, what is PA (Page Authority)? It is an SEO metric that estimates a page's ability to rank well. You can see that the correlation coefficient for PA is 0.39, which is undoubtedly very good for a link-based metric.

Jumping ahead a little, I should say that the relationship between search engine rankings and these metrics is not linear, but I'll use a linear function as a simple example of what scatterplots of data with different correlation coefficients look like.

[Figure: correlation examples]

The first scatterplot shows a pure linear function, Y = 10 + 2X. Every Y value can be calculated from X, so the correlation coefficient is 1. The other three plots each show a dataset with a correlation of 0.75, 0.39 and 0.23, respectively.

You can see that the 0.75 and 0.39 correlations can be eyeballed: even without the blue regression line on those plots, you could draw it in your mind. But if we remove the blue regression line from the last plot, the correlation between Y3 and X is hard to detect.
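To get a feel for these coefficients without the plots, here is a small Python sketch (standard library only) that draws synthetic data with a chosen population correlation and measures the sample Pearson coefficient. The data are random, not taken from the MOZ archive, and the construction y = r*x + sqrt(1 - r^2)*noise is simply one standard way to generate correlated normal variables; the sample size of 200 is an arbitrary choice.

```python
import math
import random

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlated_sample(r, n, rng):
    """Draw n (x, y) pairs whose population correlation is r.

    x and the noise term are independent standard normals, so
    y = r*x + sqrt(1 - r^2)*noise also has unit variance and corr(x, y) = r.
    """
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [r * x + math.sqrt(1 - r * r) * rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(42)

# The perfectly linear case from the first plot: Y = 10 + 2X gives r = 1.
xs = [rng.uniform(0, 10) for _ in range(200)]
print("Y = 10 + 2X:", round(pearson(xs, [10 + 2 * x for x in xs]), 2))

# The three noisier cases; sample values will wobble around the targets.
for target in (0.75, 0.39, 0.23):
    xs, ys = correlated_sample(target, 200, rng)
    print(f"target {target}: sample r = {pearson(xs, ys):.2f}")
```

Plotting `xs` against `ys` for the 0.23 case gives the same kind of shapeless cloud as the last example plot.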

So when you see that the correlation between, for example, the number of Facebook likes and rankings is 0.23, remember the last example plot: in many cases, rankings can't be "predicted" from the number of likes.
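One handy way to read these numbers is to square them: for a linear (Pearson) correlation, r squared is the share of the variance in one variable that a straight-line fit on the other accounts for. A correlation of 0.23 therefore explains only about 5% of the variance, versus about 56% for 0.75. If the study's coefficients are rank-based rather than Pearson, treat this only as a rough intuition, not an exact reading of the published numbers.

```python
def variance_explained(r):
    """Fraction of the variance in Y accounted for by a straight-line fit on X (r squared)."""
    return r * r

for r in (1.0, 0.75, 0.39, 0.23):
    print(f"r = {r:.2f}: about {variance_explained(r):.0%} of the variance explained")
```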

Now let's look at the relationship between search engine rankings and the arithmetic mean of PA and DA (MOZ's Domain Authority) at each ranking position.

[Figure: Page Authority and Domain Authority vs. Google rankings]
According to the plot, the higher the ranking position, the bigger the gap in PA and DA between pages that neighbor each other on the same SERP. The MOZ research covers a relatively large amount of data: the top 50 results for more than 14K keywords.
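The per-position averages behind a plot like this boil down to a simple group-by. Below is a minimal Python sketch; the row layout (keyword, position, PA, DA), the keyword names and all values are hypothetical and do not reflect the actual columns or contents of the MOZ archive.

```python
from collections import defaultdict

# Hypothetical rows: (keyword, position, PA, DA). Made-up values for illustration only.
rows = [
    ("keyword a", 1, 58, 80),
    ("keyword a", 2, 49, 71),
    ("keyword b", 1, 72, 94),
    ("keyword b", 2, 63, 88),
]

def mean_by_position(rows):
    """Return {position: (mean PA, mean DA)} across all keywords, sorted by position."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # position -> [PA sum, DA sum, count]
    for _keyword, position, pa, da in rows:
        acc = sums[position]
        acc[0] += pa
        acc[1] += da
        acc[2] += 1
    return {pos: (pa / n, da / n) for pos, (pa, da, n) in sorted(sums.items())}

for pos, (pa, da) in mean_by_position(rows).items():
    print(f"position {pos}: mean PA = {pa:.1f}, mean DA = {da:.1f}")
```

With the real archive you would feed in one row per result for all 50 positions; the averaging logic stays the same.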

Still, keep in mind that the scale of metric values depends heavily on the niche, search volume, query type and so on. In general, the more competitive the niche, the higher the metric values of the pages on its SERPs.

For example, let's take two SERPs with a PA correlation of around 0.39 (search results for the queries "honeymoon packages" and "stock photo") and two other SERPs with a DA correlation of 0.27 ("dressup games" and "cranberry sauce recipe").

[Figure: PA and DA of websites presented on different SERPs]
I think the graphs speak for themselves and confirm that the scale of metric values differs from SERP to SERP. Also note the fairly large standard deviation of 0.18 for both PA and DA given in the full results table at moz.com. And don't forget that, for example, just-another-crap-site.blogspot.com and high-quality-blog.blogspot.com have the same DA, since both are subdomains of blogspot.com.
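SERPs with very different metric scales can share the same correlation because a rank correlation only looks at ordering. The Python sketch below computes a Spearman coefficient for a single SERP, assuming a per-SERP rank correlation is the quantity being compared (MOZ's methodology page describes the exact procedure they used). Both SERPs are made up: the same ordering, with the second one shifted down by 60 points.

```python
import math

def average_ranks(values):
    """Rank values 1..n in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def serp_spearman(metric_by_position):
    """Spearman correlation between a metric and ranking on one SERP.

    metric_by_position[0] belongs to position 1. Positions are negated so that
    "higher metric goes with better ranking" gives a positive coefficient.
    """
    positions = [-(i + 1) for i in range(len(metric_by_position))]
    return pearson(average_ranks(metric_by_position), average_ranks(positions))

# Hypothetical SERPs: identical ordering, very different metric scales.
competitive = [90, 85, 88, 70, 60]
low_competition = [30, 25, 28, 10, 0]
print(serp_spearman(competitive), serp_spearman(low_competition))
```

Both calls return 0.9 (up to floating-point rounding): moving the whole SERP down by 60 points changes nothing, because only the order of the values matters.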

Also, many SERPs contain Wikipedia, Amazon and other highly popular pages, which have a big influence on average values. This holds true not only for MOZ PA and DA but for every SEO metric mentioned in the research.
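A quick illustration of how one very strong page drags an average up. The DA values below are hypothetical, with the 100 standing in for a Wikipedia or Amazon result; the median is shown only as one common way to dampen such outliers, not as something the original study used.

```python
from statistics import mean, median

# Hypothetical DA values at one SERP position across five keywords.
# The single 100 plays the role of a Wikipedia or Amazon page.
da_values = [35, 40, 42, 38, 100]

print("mean:", mean(da_values))
print("median:", median(da_values))
```

The mean comes out at 51 while the median is 40: a single outlier moves the average by more than ten points.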

How SEO metrics are calculated

Old-school SEOs still use toolbar PageRank as the main indicator of page quality, despite the fact that it is rarely updated (the last Google PR update was at the end of 2013, and the one before that in February) and uses a coarse, unrepresentative scale.

As long as this metric stays popular with inbound marketers, every SEO software company with its own backlink database will try to come up with an in-house PageRank alternative.

MOZ's alternative is called MozRank. It uses a logarithmic scale from 0 to 10 (with two decimal places) and is updated once a month. The correlation coefficient between MozRank and rankings is 0.26.

The algorithm is very similar to the PageRank calculation, so MozRank can be treated as a raw estimate of link popularity.
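For readers who have never seen how a PageRank-style score is computed, here is the textbook power-iteration version in Python. It is not MOZ's MozRank code, which is not public, and it omits the logarithmic 0 to 10 scaling; the damping factor of 0.85 is the value suggested in the original PageRank paper, and the four-page link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links (dangling pages) spread their score evenly over all pages,
    so the scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump share that every page receives.
        new = {p: (1.0 - damping) / n for p in pages}
        # Redistribute the score of dangling pages evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new[p] += damping * dangling / n
        # Each page passes its score equally along its outgoing links.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# Hypothetical link graph: "c" is linked from three of the four pages.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

The page with the most (and strongest) incoming links ends up on top, which is exactly the "raw link popularity" idea described above.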

Domain Authority is a more popular metric and is positioned as an estimate of a domain's ability to rank. So let's look at how closely MozRank and DA are related.

[Figure: relationship between DA and MozRank]

It will probably come as no surprise to most of you that Domain Authority depends strongly and linearly on MozRank.
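A strong correlation still leaves room for error: with r = 0.95, r squared is about 0.90, so roughly 10% of the variation in DA is not captured by a straight line on MozRank alone. The Python sketch below fits such a line by ordinary least squares; the (MozRank, DA) pairs are hypothetical and chosen only to show the mechanics.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx               # slope
    a = my - b * mx             # intercept
    r_squared = sxy * sxy / (sxx * syy)
    return a, b, r_squared

# Hypothetical (MozRank, DA) pairs, made up for illustration.
mozrank = [2.1, 3.4, 4.0, 4.8, 5.5, 6.2, 7.1]
da = [18, 29, 33, 45, 47, 60, 71]

a, b, r2 = fit_line(mozrank, da)
print(f"DA ~ {a:.1f} + {b:.1f} * MozRank, r^2 = {r2:.2f}")

# Residuals: how far each domain sits from the fitted line.
for x, y in zip(mozrank, da):
    print(f"MozRank {x}: residual {y - (a + b * x):+.1f}")
```

The residuals are the part of DA that the other inputs to the formula would have to explain.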

However, as you can see, even a 0.95 correlation is not enough to calculate DA from MozRank alone. So what are the remaining 39 variables involved in the Domain Authority calculation? There is no official information on that, but we can make some assumptions:

  • Robots instructions – is the page open for indexing?
  • Website age – although it cannot be determined in every case
  • Top-level domain – for example, links from .gov and .edu domains carry more value because they are harder to get
  • Website IP address – is it reliable and dedicated?
  • Number of unlinked mentions of the website URL in text
  • Backlink profile growth rate and the quality of newly acquired backlinks
  • Social signals – may be used to some extent, though not necessarily, since share counts are too easy to manipulate
  • Domain name length – the fewer characters a domain name has, the more likely it is to have a recognizable brand behind it
  • 31 other metrics (according to MOZ marketing materials)

Don't get me wrong, I'm not trying to downplay the complexity and quality of MOZ metrics; I've been using them myself since they were released several years ago. However, it is very unlikely that things like website traffic, search engine penalties and third-party metrics (Alexa Rank, SEMrush Keywords, etc.) are used in the formula.

And there is one more thing... Each backlink provider relies on its own backlink index when calculating metrics. Have a look at the bar plot below, which shows the number of unique URLs (in billions) in the databases of four major backlink providers (Majestic SEO, WebMeUp, MOZ and Ahrefs).

[Figure: backlink tools index sizes]
Neck and neck, right? Well, I'm going to spoil the party by adding Google:

[Figure: backlink tools and Google index sizes]
No, I didn't delete the bars; they are just too small compared with the 60 trillion URLs in Google's index. Even Majestic SEO's impressive 186 billion is roughly 320 times smaller than what Google has.
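The arithmetic behind that comparison, using only the two figures quoted in the text: 60 trillion divided by 186 billion is about 323, so even the largest backlink index covers only about 0.31% of the URLs Google knows about.

```python
google_urls = 60e12     # 60 trillion URLs, the figure quoted for Google's index
majestic_urls = 186e9   # 186 billion URLs, Majestic SEO's index from the bar plot

ratio = google_urls / majestic_urls
print(f"Google's index is about {ratio:.0f} times larger")
print(f"Majestic SEO covers about {1 / ratio:.2%} of it")
```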

Summing up

Correlation studies can give some food for thought, but their results are easy to misinterpret. Researchers are always limited by computing power, so they have to work with samples that are relatively small compared with the real web. Worse still, there are more than 200 ranking factors, and most of them are hard or impossible to take into account.

As for SEO metrics, remember that they are keyword-agnostic and are calculated with quite a long delay compared with Google, which can evaluate a page two hours after publication, rank it and immediately start sending organic traffic to it. Still, correlation studies show that some of these metrics make sense for a rough estimate of a website's quality.

Photo credit: alizinha/CrossFitNYC via Flickr cc.


