The Future of Search: 5 High-Tech Trends That Will Change SEO For Good
By: Masha Maksimava
September 20th, 2016
Ah, the buzzwords. We've been hearing about AI, machine learning, natural language processing, and the like for a while now. Often, not in a very scientific context. Sometimes, even referred to as the same thing.
But really, what are those things? How do they affect Google's search results? And why does any of this even matter?
In this article, I've put together 5 trends that are revolutionizing search, with a detailed explanation of the mechanisms behind each one, its role in Google's ranking algorithm, and the impact it's likely to have on SEO.
But before we get down to the five, here's an important notice: all these five concepts, or "trends", do not exist in isolation and are deeply interconnected in Google's algo. Often, I will be calling a trend something that is in fact only one side of a phenomenon. I'm doing this because that side has its distinct traits and impact on SEO.
Let's start with the buzzword king.
Google's fascination with machine learning has been around for a while; but it wasn't until 2014 that they decided to try and incorporate it into the company's main product — search — and see how it works out.
The experiment proved to be incredibly effective, and in October 2015 Google announced that a machine learning artificial intelligence system, termed RankBrain, was now an integral part of their ranking algorithm (they even called it the third most important ranking signal).
To fully understand the impact — and the future potential — of machine learning systems like RankBrain, let's look at how technology in general and AI in particular have been developing over the years.
It's no news that as time goes on, human progress takes place at a faster and faster rate. Google's Ray Kurzweil calls this the Law of Accelerating Returns. The principle behind it is simple: more advanced societies — because they're more advanced and already have more tools at hand — have the ability to progress at a faster rate.
"Fundamental measures of information technology follow predictable and exponential trajectories."
Ray Kurzweil, Director of Engineering, Google
In theory, this is understandable. But when the human brain tries to predict what technology will look like a few years from now, it tends to massively downplay it. The trick is that we, humans, are linear by nature — and our perception of technology is too. So when we try to imagine how much technology will change in 20 years, we look 20 years back and imagine a change of comparable magnitude.
For example, here's how computing has been developing from the early 20th century up till now.
The slightly curved line on the graph isn't exactly daunting. Over more than a hundred years of development, computers have (only?) been able to reach the capacity of the brain of a mouse. You might well be looking at the graph above with a kind (but a tad condescending) smile. Well done computers, you've beaten the mouse!
But in reality, what looks like a slightly curved line is an exponential function.
Here's another visualization of the exponentially growing rate at which technology is developing that, perhaps, conveys the message even better.
Machine learning in search
RankBrain, Google's machine learning artificial intelligence system, helps the search engine better interpret search queries and provide more relevant search results in response to them. Google is characteristically fuzzy on exactly how RankBrain works; but from the little bits of information here and there, we can conclude that it does some of the following:
1. It figures out the meaning of ambiguous queries. It looks like RankBrain is able to figure out a query's meaning (as opposed to breaking it down into words and looking for these words in web pages). Google's Hummingbird was the first step towards this, and apparently, RankBrain received much of its training data from there. The difference though is that RankBrain uses machine learning to constantly improve its ability to understand ambiguous search terms. Say, when a certain query is typed into the search bar for the first time, RankBrain may be able to draw analogies to queries of a similar nature, and produce relevant search results based on those analogies.
2. It helps rank search results using a unique, situation-specific combination of factors. RankBrain looks at the search results that tend to do exceptionally well with searchers (e.g., the ones that get plenty of clicks and very few bounces) for a given query, group of queries, or niche, and then reverse-engineers those "best" search results to figure out what common traits they have. Using these insights, it may then conclude that for a given query, pages with video content and little text do best, and rank pages that fit this description higher. For another query, it may find that the best pages hardly ever use the actual keyword from the query in their content, but do tend to use the co-occurring terms a, b, and c a lot. In other words, it identifies common features of the pages that it knows are good responses to the query, and then looks for those features in other pages.
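To make the second point a bit more tangible, here's a toy sketch in Python. It is in no way how RankBrain actually works (Google hasn't published that); the trait names, the sample pages, and the 75% cut-off are all made up for illustration. It simply finds the traits most well-performing pages share, and then checks how many of those traits another page has.

```python
from collections import Counter

# Hypothetical, hand-labelled traits of pages that do well for one query.
top_pages = [
    {"has_video", "short_text", "keyword_in_title"},
    {"has_video", "short_text"},
    {"has_video", "short_text", "many_images"},
    {"has_video", "long_text", "keyword_in_title"},
]


def common_traits(pages, min_share=0.75):
    """Return the traits present in at least `min_share` of the pages."""
    counts = Counter(trait for page in pages for trait in page)
    return {t for t, n in counts.items() if n / len(pages) >= min_share}


def trait_score(page, traits):
    """Share of the common traits that a candidate page also has."""
    return len(page & traits) / len(traits) if traits else 0.0


traits = common_traits(top_pages)
candidate = {"has_video", "long_text"}

print(sorted(traits))                  # traits shared by most top pages
print(trait_score(candidate, traits))  # how well the candidate matches them
```

For a different query you'd feed in a different set of top pages, and the "winning" traits could come out completely different, which is the whole point.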
How does it affect SEO?
With previous Google updates, be it Panda, Penguin, or Mobile(geddon), the impact of each one was clear and somewhat universal (your content had to be unique, your links had to pass certain quality criteria, and your site had to be optimized for mobile devices if you wanted to rank in Google Mobile).
The situation with RankBrain — and systems of the same kind that will follow — is the opposite. There is no universal way of getting it "right"; instead, the ranking criteria for two distinct queries can be completely different, making the system incredibly hard to game.
My prediction is that RankBrain is going to gain even more momentum and play an increasingly big part in Google's ranking algorithm. For SEOs, this means that the era of generic, search-bot-oriented tactics (keyword density, backlink count, content length) is finally, irreversibly over. Instead, competitive research will likely play by far the most important part in every SEO campaign. Marketers will try to "emulate" RankBrain: they'll look at the top performing web pages in a given industry (even for individual queries) and look for common traits that these pages share, so that they can create content with the same features (I'll dwell more on this in the final part of the post).
Virtual assistants and voice search
You're hardly surprised anymore when you hear someone okay-ing Google in your local grocery store. The virtual assistant trend — Apple's Siri, Microsoft's Cortana, Google Now — may have been perceived as a fad when it first started out. Mostly, we started using it not because we believed this kind of technology was useful, but because we were willing to see just how well it'd work. ("Can it really understand me?" — Yes, with an 8% error rate, which is going down steadily. "What if I try swear words?" — You'll be told to mind your language. "What if I speak in a Scottish accent?" — Totally fine, but to be on the safe side, you might want to avoid saying "eleven".)
In 2014, the trend literally took a whole new shape when Amazon developed Amazon Echo — a device whose exclusive function is to be a virtual assistant to its owner. Echo's capabilities go way beyond chit-chat and answering questions; it can play you music and audiobooks, create to-do lists, set alarms, and control smart devices in your home.
On the list of virtual assistants, Google's has been seemingly lagging behind. Unlike Siri or Cortana, Google Now feels somewhat impersonal — and its major function remains searching for things on Google, not giving you direct answers and solutions.
But as things go with Google, just because they've been quiet on something doesn't mean they haven't been working on it. This May at Google I/O, the company introduced the Google assistant — a conversational lil' helper that "understands your world and helps you get things done".
The Google assistant is incorporated into Google's two new products — Google Home, an Echo-like voice-activated device for your home, and Allo, the new messaging app that lets you interact with the Google assistant directly in your chats "either one-on-one or with friends".
I might have to say this again. Either one-on-one, or with friends. Google's created a messaging app which you may want to use to chat with a helpful robot tete-a-tete. Yup, kind of like in "Her".
The impact on SEO
Google reports that currently, 55% of teens and 40% of adults use voice search every day. And that's a fast-growing market, with voice search growing faster than typed search. From the SEO standpoint, queries made by voice are different from typed queries. People don't search the same way with their keyboards as they do with their vocal cords. Voice search is conversational search, where people use more natural sentences instead of the odd-sounding query language.
For SEOs, that means we've got to make adjustments not only to how we do keyword research, but also to the language we use when we create content. The first step towards this is understanding the customer's conversational speech. More than before, we'll need to do research to discover phrases customers use to describe problems, and interview customers to get a grasp of the language they use when they talk about the topic our content is focused on.
Pretty soon, you'll likely find consumers making queries like "Okay Google! I'm looking for — what d'ya call it — an app that will help me get things done with reminders and stuff". That won't make a great tagline for your to-do app, but little bits of it can help you create the kind of Web copy that resonates with your customers' language.
With virtual assistants, things are a little more complicated. Siri, Cortana, and now the Google assistant can be used to search for things online; but their more important function is getting things done for people instead of presenting them with a list of possible solutions.
You might tell Google, "What's playing tonight?". Today, because of our voice recognition and natural language processing, we understand that you're probably talking about movies. You can imagine going a step further, over time. If I'm asking it on a Friday, to have the context that maybe I want to watch with my family, and give you three movies you might like. I might then say, "Is Jungle Book any good?". Then I might ask it to pick up tickets. The next day, I might pick up the phone and Google says, "It's a few hours before the movies, and your tickets are here".
So far, it looks like the things that virtual assistants will be able to do (finding and booking a restaurant, buying tickets to the movies, etc.) are still going to be largely powered by search, opening up a new market for SEOs: one where we'll be competing to become the service that virtual assistants are able to easily use to complete tasks.
Natural language processing
Give a computer a piece of text, and chances are it'll make very little sense of it. Say, you feed this article to a language processing system. It looks at words and phrases within it and counts how frequently each is used, calculating something called "term frequency", or TF for short.
It might reasonably conclude that I use the term "AI" 7 times, and the word "part" 10 times. Which one is more important to the article? How does a computer figure out what the article is really about?
That's where machine learning comes in (yes, again). Language understanding systems are trained on large corpuses of data, such as Google's New York Times Annotated Corpus — a set of 1.8 million articles spanning 20 years. By studying a large dataset of text on a wide variety of topics, these systems can learn all kinds of things about text, and eventually figure out what any given piece of content is about.
There are a number of factors that help search engines process and understand text.
1. TF*IDF
TF*IDF stands for "term frequency–inverse document frequency". It's long been used to index web pages. The metric measures the importance of a given keyword or key phrase in a given piece of content.
The term frequency is exactly what the name implies — the number of times a given word appears in a document, divided by the number of words in the document. For example, in this article, the TF for "AI" and "part" would be calculated like this:
TF("AI") = 7 / 4096 ≈ 0.00170898
TF("part") = 10 / 4096 ≈ 0.00244141
The inverse document frequency is a measure of how common (or rare) a term is across all documents, used to diminish the weight of terms that occur very frequently in the dataset and increase the weight of terms that occur rarely. For Google, this dataset would be the Web; for the sake of this experiment, I'll look up the two terms I'm analyzing in Google Ngram Viewer.
Based on that data, let's assume we've got a collection of 1,000,000 documents, about 3 of which contain the term "AI", and 400 of which contain the term "part".
To calculate the inverse document frequency, we'll need to divide the total number of documents in our dataset by the number of documents containing the term we're analyzing, and then take the natural logarithm of that number.
IDF("AI") = ln (1,000,000 / 3) ≈ 12.71689827
IDF("part") = ln (1,000,000 / 400) ≈ 7.82404601
Finally, let's go on and find the TF*IDF for both terms.
(TF*IDF)("AI") = 0.00170898 * 12.71689827 ≈ 0.02173292
(TF*IDF)("part") = 0.00244141 * 7.82404601 ≈ 0.0191017
As you can see, although the first term is used less often than the second, its TF*IDF is higher because it is rarer in other documents. This way, we can tell that the article is more about "AI" than it is about "part".
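If you'd like to replay the arithmetic above, here's a minimal Python sketch. It uses the same assumed numbers as the example (a 4,096-word article and a collection of 1,000,000 documents) and the plain natural-log IDF described here; the function names are mine, and real search engines almost certainly use more elaborate variants.

```python
import math


def tf(term_count, total_words):
    """Term frequency: occurrences of the term divided by document length."""
    return term_count / total_words


def idf(total_docs, docs_with_term):
    """Inverse document frequency: natural log of (all docs / docs with the term)."""
    return math.log(total_docs / docs_with_term)


def tf_idf(term_count, total_words, total_docs, docs_with_term):
    """TF*IDF as the simple product of the two measures above."""
    return tf(term_count, total_words) * idf(total_docs, docs_with_term)


TOTAL_WORDS = 4096       # words in the article (figure from the example)
TOTAL_DOCS = 1_000_000   # assumed size of the document collection

ai_score = tf_idf(7, TOTAL_WORDS, TOTAL_DOCS, 3)
part_score = tf_idf(10, TOTAL_WORDS, TOTAL_DOCS, 400)

print(round(ai_score, 4), round(part_score, 4))
```

The last digits may differ slightly from the hand calculation, because the code doesn't round the intermediate TF and IDF values.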
TF*IDF is a great way of finding how prominent a term is on your page; and since we know for a fact that Google is using it, it can be incredibly useful in helping marketers understand how relevant a certain piece of content might be considered for a given query.
But if you've been following the calculations attentively, you may have spotted a problem in the formula. True, you can artificially inflate your content's TF by using a keyword obsessively over and over again; this would, in turn, result in a greater TF*IDF. That's exactly why this formula is only used as a base for determining relevance; next, other factors come in.
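Here's a quick Python illustration of that weakness. The sample copy and the `term_frequency` helper are invented for the example; the point is only that repeating a keyword pushes its TF up, and since the term's IDF stays the same, its TF*IDF goes up with it.

```python
def term_frequency(term, words):
    """Share of `words` that are exactly `term`."""
    return words.count(term) / len(words)


# Hypothetical copy for a to-do app, before and after keyword stuffing.
honest = "our app helps you manage reminders and tasks every day".split()
stuffed = honest + ["reminders"] * 10

print(term_frequency("reminders", honest))   # 1 of 10 words
print(term_frequency("reminders", stuffed))  # 11 of 20 words
```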
2. Synonyms & entity salience
Google reports that synonyms play a role in 70% of searches. The search engine has a complex synonyms system, accumulated over 10 years of research. Initially, the system was mainly based on data from dictionaries; now, with machine learning taking over the somewhat static traditional approach, it's increasingly being based on the information Google learns (both about searchers and web pages) on the go, as it processes queries. Interestingly, the search engine also draws synonyms from the anchor text of a document's backlinks.
The synonyms system allows Google to match documents to queries even if searchers use different words than a given web page. This works especially well for "entities", where Google is aware of several alternative names for the same concept. Try running a search for "big blue" on Google.com and you'll see what I mean.
Co-occurrence often has a lot to do with synonyms and related terms, and helps Google narrow down a page's broad topic to a more specific focus. It also helps solve the problem of disambiguation for queries where the same keyword can refer to more than one concept.
Say, if a page is about "Java" and also mentions terms like "island", "Indonesia", "Jakarta", "population", Google will reasonably decide that the page is about Java the island. On the other hand, if it includes concepts like "programming", "JVM", "software", "computer", etc., that would signal that it's probably about the programming language.
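The Java example can be sketched in a few lines of Python. This is a deliberately naive illustration, not Google's method: the two term lists come straight from the paragraph above, and the `guess_sense` function (my own name) just counts how many of them show up on the page.

```python
# Context terms for each meaning of "Java", taken from the example above.
SENSES = {
    "Java (island)": {"island", "indonesia", "jakarta", "population"},
    "Java (programming language)": {"programming", "jvm", "software", "computer"},
}


def guess_sense(page_text, senses=SENSES):
    """Pick the sense whose context terms co-occur most often in the text."""
    words = set(page_text.lower().split())
    scores = {sense: len(words & terms) for sense, terms in senses.items()}
    best = max(scores, key=scores.get)
    # With no co-occurring terms at all, the page stays ambiguous.
    return best if scores[best] > 0 else None


print(guess_sense("Java is an island in Indonesia and Jakarta is its largest city"))
print(guess_sense("Java programs run on the JVM and power a lot of software"))
print(guess_sense("I love Java"))
```

A real system would obviously need stemming, phrase matching, and term lists learned from data rather than typed in by hand.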
To a degree, co-occurrence also helps Google figure out the quality of content in terms of how well it explains a topic or answers a question. Say, if you search for "Scarlett Johansson", Google may look for a page that also mentions related terms such as the names of the movies starring the actress, her date and town of birth, awards she won, etc. If a page does not mention these co-occurring terms and instead contains names of other actors and actresses, it probably doesn't have the in-depth information the searcher is looking for, and may hence rank lower.
Physical distance between certain words in a document can also indicate how related they are, but it doesn't work in a universal way for all page elements. There's also a concept of semantic distance — a way of figuring out the relationship between a pair of words. For example, terms in page titles and headers can be physically very distant from a certain word at the end of the article; but semantically, they are considered to be as close to it as they are to any other word in that article. Similarly, concepts discussed in lists are all considered to be equally close to each other semantically, regardless of the order they appear in.
In isolation, almost every search query is confusing. To understand what you are really looking for when you search for something, Google may use different aspects of what it knows about you — as a searcher, Internet user, and as a person.
Personalization isn't exactly a new concept in SEO. We're used to the realization that our search and browsing history play a part in the results we get. And chances are that the ways Google uses "context" to modify search results will get increasingly complex.
This has started to happen already. For example, if you search for "Zara," you could get some general information about the brand and an infinite list of stores from around the world. But if you search for "London" first and for "Zara" next, the search engine will remember that you were just looking at London, and will automatically narrow down the results to that area.
Here's another example. If you're looking at a Facebook post on an Android phone and, while you're at it, hold down the home button to activate voice search, Google will scan the content you are looking at — so that it can find relevant information without you having to copy and paste things around.
Then, of course, there's location. It's no news that Google will often adjust the search results for you based on where you are; but it's also learned to use the searcher's location in more complex ways. Now, if you're outdoors with your smartphone, you can ask Google, "What's that building?" or "What's that restaurant?" — and it will look it up for you, using your location as a kind of context.
Over time, location awareness will likely grow even more powerful. Behshad Behzadi, Google's director of search innovation at the Zurich lab, mentioned that the company is looking to make the use of searchers' location more proactive — which would, for instance, let you get alerts on things to see or events to attend nearby when you are out for a walk.
Measuring user satisfaction
Google measures the effectiveness of every ranking algorithm change they test in terms of what they call "metrics". Those metrics are various user satisfaction signals; simple examples of those are SERP click-through rates and pogo sticking (which occurs when a searcher clicks on a search result and then bounces back quickly). But there are also more complex, sci-fi-ey kind of metrics Google may soon start looking at. Such as the searchers' facial expressions.
One of Google's recent patents describes a technology that would modify rankings of search results using "biometric indicators of user satisfaction" (or dissatisfaction) with a certain result. These indicators would be captured with the camera of a searcher's phone, as shown in this illustration:
So let's say you search for "San Francisco restaurants" on your mobile phone. The top result is a restaurant you've been to before and didn't like (so you might instinctively frown as you see the listing). This is captured by your phone's camera and seen as a negative signal by the search engine — which can potentially result in pushing the restaurant down in the search results.
On the contrary, if your reaction to a search result implies that you're "lovin' it", Google might count that as a positive signal, and subsequently up-rank the result in question.
But wait, there's more. In addition to your facial expression, Google may also start to use searchers' phones to measure things like body temperature and heart rate, which can also communicate an emotion (or so the thinking goes):
Determining that one or more biometric parameters indicate likely negative engagement by the user with the first search result comprises detecting an increased body temperature, detecting pupil dilation, detecting eye twitching, detecting facial flushing, detecting a decreased blink rate, or detecting an increased heart rate.
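The biometric stuff is hard to demo, but the simpler signals mentioned earlier in this section, SERP click-through rate and pogo-sticking, are easy to sketch in Python. Everything here is an assumption made for illustration: the log format, the field names, and the 10-second cut-off for a "quick" bounce. Google doesn't disclose how it actually computes these metrics.

```python
# Hypothetical log of one search result being shown to five searchers.
# "dwell" is the number of seconds before the searcher came back to the SERP,
# or None if they never came back.
log = [
    {"clicked": True, "dwell": 4.0},
    {"clicked": True, "dwell": 95.0},
    {"clicked": False, "dwell": None},
    {"clicked": True, "dwell": None},
    {"clicked": False, "dwell": None},
]

POGO_THRESHOLD = 10.0  # assumed cut-off, in seconds, for a "quick" bounce


def click_through_rate(impressions):
    """Share of impressions that resulted in a click."""
    return sum(1 for i in impressions if i["clicked"]) / len(impressions)


def pogo_rate(impressions, threshold=POGO_THRESHOLD):
    """Share of clicks followed by a quick return to the search results."""
    clicks = [i for i in impressions if i["clicked"]]
    if not clicks:
        return 0.0
    quick = [i for i in clicks if i["dwell"] is not None and i["dwell"] < threshold]
    return len(quick) / len(clicks)


print(click_through_rate(log))
print(round(pogo_rate(log), 2))
```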
The SEO tactics these trends will inspire
Google has been the pioneer in many technological advancements we've seen recently. With search being Google's main "product", it's only fair that SEO is evolving just as quickly.
Increasingly, SEO is being referred to more as a complex science than as a marketing strategy. There are more and more data points to take into account when performing SEO analysis, many of which, as we've figured out, can vary dramatically from query to query.
This makes most of the old-school "quick SEO fixes" useless at best and harmful at worst. So if there are no universal benchmarks to compare your metrics against, what do you do?
My bet is that SEOs will increasingly turn to the following two tactics.
1. Competitive research
We've been hearing it for a while: you should create content for humans, not bots. There's truth to that. Search engines are getting more and more like humans — so yup, you've got to make your content as appealing to living creatures as you possibly can. You've got to make it the perfect piece searchers are looking for.
But there's still an important question you've got to answer: What are searchers really looking for?
To find this out, you turn to — you guessed it — the search engines. It's a loop of sorts. Search engines' machine learning technologies like RankBrain already know what searchers are looking for, and they use this data to rank pages in SERPs. So your job is to reverse-engineer your top ranking competitors (emulate RankBrain, if you will) and look for common traits in their content — so that you can create content that meets all the criteria.
That's what you need SERP analytics for. In SEO PowerSuite's Rank Tracker, for example, we call it SERP History — an archive for the top 30 search results for every ranking check you run (to save SERP history, you'll need a Rank Tracker license key).
This kind of analysis lets you see how the SERP leaderboard has been changing over time. Examining those competitors in detail and looking for common features in their content is how you identify the factors that work for a specific keyword or industry.
Similarly, SEO PowerSuite's SEO SpyGlass lets you compare your link profile to competitors' (1 competitor in the free version, 5 in Professional, and 10 in Enterprise) and analyze every competitor's backlink profile in detail to see what kind of link profiles (in terms of anchor text diversity, for example) are likely to work best in your industry.
2. Data visualization
With the technology behind search growing more advanced, the metrics you're analyzing for SEO are getting more and more plentiful and complicated. To make sense of all this elaborate data, SEOs will inevitably need visualization tools (especially if you think about presenting the increasingly complex data to clients).
Here's an example. Let's say you use Rank Tracker's SERP History archives to figure out the factors that matter in your industry. Let's say you've analyzed the leaderboard and incorporated those insights into your content.
But of course, you can't stop there. Machine learning systems are getting training data non-stop, which means the weight of certain ranking factors for a given query may change as soon as user behavior metrics dictate so.
Let's say you have a post about 5 innovative SEO trends that will change the future of SEO for good, and you want this post to rank for a number of keywords on Google. You've analyzed competitors and figured that long content with visuals and, say, your main keyword in the title tag, tends to rank best; so you go on and incorporate all these things into your page.
A tad later, someone suddenly comes up with a way to explain all those futuristic SEO trends in a very simple way and in a piece of content that's considerably shorter. And searchers really like this shorter explanation a lot. Google notes that — and suddenly, the length of your content, which used to give you an advantage in search, is working against you.
Clearly, there's no way for you to monitor this manually and spot these changes quickly. That's where visualization can be of massive help. If you look at the SERP fluctuation graph in Rank Tracker, you'll be able to instantly spot important changes in the SERPs which call for your immediate attention. The graph measures the difference between the top 30 search results for every keyword during each of your ranking checks. The red spikes on the graph will instantly let you know there's been an important change in the SERP that you need to look into.
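To give you a feel for what a fluctuation measure like this could look like, here's a rough Python sketch. It is not Rank Tracker's actual formula (that isn't described here); it simply assumes that "difference" means the share of positions whose URL changed between two ranking checks, uses 4 results instead of 30 to keep things short, and flags a "spike" when that share goes above a made-up threshold.

```python
def serp_change(previous, current):
    """Share of positions (0.0 to 1.0) whose URL differs between two checks."""
    length = max(len(previous), len(current))
    if length == 0:
        return 0.0
    changed = sum(
        1
        for i in range(length)
        if i >= len(previous) or i >= len(current) or previous[i] != current[i]
    )
    return changed / length


def spikes(checks, threshold=0.5):
    """Indexes of checks that changed by more than `threshold` vs the previous one."""
    return [
        i
        for i in range(1, len(checks))
        if serp_change(checks[i - 1], checks[i]) > threshold
    ]


# Hypothetical top results for one keyword across three ranking checks.
checks = [
    ["a.com", "b.com", "c.com", "d.com"],
    ["a.com", "b.com", "d.com", "c.com"],  # two positions swapped
    ["e.com", "a.com", "f.com", "b.com"],  # every position changed
]

print([serp_change(checks[i - 1], checks[i]) for i in range(1, len(checks))])
print(spikes(checks))
```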
Visualization will work equally well in literally any aspect of SEO — especially if you combine it with deep competitive analysis. That's the winning combo that will help you identify search trends, pick the most important aspects to focus on, and, if you do SEO for clients, explain the advanced concepts to them in a way that is intuitively simple to understand.
As always, I'm looking forward to your thoughts and questions in the comments. What are your predictions for the future of search? Which factors do you believe will continue to play an important part in SEO, and which will gradually fade away? Please share your ideas below.
P.S.: How about a chat in person? The SEO PowerSuite team and I will be attending SMX East in New York City on September 27-29, 2016. This year's agenda consists of 50+ new sessions covering innovative tactics in search engine optimization (SEO), paid search (PPC), mobile and more. If you're coming (and you really, really should be), please stop by the SEO PowerSuite booth to ask us anything about the tools or SEO in general, get a quick SEO PowerSuite demo, or even just to say hi :)