Entities are a bit of a mystery. They've been around for years now and have influenced many aspects of search, but they are rarely talked about. I guess part of the reason is that there is not much solid information on entities and notoriously ambiguous Google patents are not much help. But the other part is that, even if you understand entities, it is unclear if they can be used for SEO.
In this article, I have gathered what little information we have about entities and done my best to translate it from patent language into human language. And I've managed to discover a few outdated SEO tactics along the way.
Let's turn to one of Google's patents for the official definition of entities:
An entity is a thing or concept that is singular, unique, well-defined, and distinguishable. For example, an entity may be a person, place, item, idea, abstract concept, concrete element, other suitable thing, or any combination thereof. Generally, entities include things or concepts represented linguistically by nouns. For example, the color "Blue," the city "San Francisco," and the imaginary animal "Unicorn" may each be entities.
In fewer words, an entity is anything notable enough for users to search for it by name. For example, I'm not well-known enough (yet) to be an entity — I'm just one of many writers on the topic of SEO and no one is searching for me by name. But take Bill Slawski — he has earned his entity status by being a widely recognized expert on Google patents, by being linked to, mentioned, and interviewed all over the web.
Google is building its entity database using two distinct processes: copying existing entities and discovering new ones.
Right now, Google is mostly copying its entities from existing knowledge bases, like Wikipedia and IMDb. This lets Google grow its own database quite quickly while keeping it reliable, because it draws only from a few trusted sources. The drawback is that those knowledge bases can be slow to include new entities and to update old ones, so Google is at risk of not serving the most relevant content.
To combat this issue, Google has patented a few methods for discovering new entities from unstructured data available on the web. One method suggests using known entities to see if they are connected to any unknown entities, either through syntax or by frequently appearing together within the same document. For example, if many documents say "Andrei Prakharevich is a writer at Link Assistant", which is a known entity, then Google might eventually wonder whether Andrei Prakharevich could be an entity.
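To make the co-occurrence idea a bit more tangible, here is a toy sketch in Python. It is only my illustration of the general principle, not Google's actual method: the function name, the `min_docs` threshold, and the sample documents are all made up, and the matching is plain substring search.

```python
from collections import Counter


def candidate_entities(documents, known_entities, candidates, min_docs=2):
    """Count how many documents mention each candidate alongside a known entity.

    Hypothetical helper: a candidate is kept only if it co-occurs with at
    least one known entity in `min_docs` or more documents.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Skip documents that mention no known entity at all.
        if not any(known.lower() in lowered for known in known_entities):
            continue
        for name in candidates:
            if name.lower() in lowered:
                counts[name] += 1
    return {name: n for name, n in counts.items() if n >= min_docs}


# Made-up documents for illustration only.
documents = [
    "Andrei Prakharevich is a writer at Link Assistant.",
    "Link Assistant published a new guide by Andrei Prakharevich.",
    "Jane Doe wrote a post about gardening.",
]
discovered = candidate_entities(
    documents,
    known_entities={"Link Assistant"},
    candidates=["Andrei Prakharevich", "Jane Doe"],
)
print(discovered)
```

In this sketch, only the name that keeps showing up next to a known entity survives the threshold, which is the intuition behind the patent's description.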
Another method suggests measuring entity value against the size of its field, i.e. it should be easier to become a notable entity in a narrow field than in a broad one. For example, it would be quite difficult for a writer to become an entity within the whole field of SEO, but much easier to become an entity within a subcategory of SEO. For instance, Bill Slawski is an entity in patents, Marie Haynes in Quality Rater Guidelines, and Brian Dean in backlinks.
Google maintains an ever-growing database of about 5 billion entities and over 500 billion entity properties. The table below summarizes the types of information Google collects about each entity:
Data type | Example | Comment |
---|---|---|
Name | Californication | Using Google's own words, the name is the linguistic representation of the entity. But the entity itself is actually stored as a unique ID ↓ |
Entity ID | XXXXXX01 | Unique IDs help differentiate between entities with the same name. For example, Californication the TV series would be IDXXXXXX01, while Californication the song would be IDXXXXXX02. |
Class | TV series | An entity may belong to any number of classes and subclasses. For example, Californication is a TV series, but it is also a comedy and a drama. Classes are often entities in their own right. |
Attribute | August 13, 2007 | An entity may have any number of attributes. For example, Californication was released on August 13, 2007, ran for seven seasons, starred David Duchovny, was filmed in California, and is rated a measly 57% on Rotten Tomatoes. Attributes are often entities in their own right. |
Relationship | Released on | A relationship is a way in which an entity is connected to other entities. For example, Californication was released on August 13, 2007, ran for seven seasons, and starred David Duchovny. |
Relevance | 0.84 | Relevance score measures the strength and/or importance of the relationship between entities. For example, Google may be 0.99 confident that Californication is a TV series, 0.74 confident that it is a comedy, and 0.36 confident that it is a drama. |
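
If it helps to think in code, the rows of the table could be modeled roughly like this. This is a minimal sketch under my own assumptions: the class and field names are hypothetical, the IDs are the placeholders from the table, and the relevance scores not mentioned in the table are invented. It says nothing about how Google actually stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Relationship:
    predicate: str    # e.g. "released on"
    target: str       # an attribute value or another entity
    relevance: float  # strength of the connection, 0 to 1


@dataclass
class Entity:
    entity_id: str  # unique ID, so same-name entities stay separate
    name: str       # the linguistic representation of the entity
    classes: Dict[str, float] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def top_class(self) -> str:
        """Return the class with the highest relevance score."""
        return max(self.classes, key=self.classes.get)


series = Entity(
    entity_id="XXXXXX01",
    name="Californication",
    classes={"TV series": 0.99, "comedy": 0.74, "drama": 0.36},
    relationships=[
        Relationship("released on", "August 13, 2007", 0.84),
        Relationship("starred", "David Duchovny", 0.90),  # invented score
    ],
)
song = Entity(
    entity_id="XXXXXX02",
    name="Californication",
    classes={"song": 0.99},  # invented score
)

# Indexing by ID keeps the two "Californication" entities apart.
entities_by_id = {entity.entity_id: entity for entity in (series, song)}
print(series.top_class(), len(entities_by_id))
```

The point of the sketch is the same as the table's: the name is just a label, while the ID is what keeps the series and the song from being confused.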
In case you are a visual learner like myself, an entity database may look similar to the scheme below, except much much more complicated. In this scheme, oval shapes represent entities, rectangular shapes are relationships, and the numbers are relevance scores:
The absolute easiest way to check whether something is an entity is to google it and see if it has a dedicated knowledge panel:
A more sophisticated way is to use an entity mining tool to get a list of all entities that Google considers to be a match for your query. If we are using Mick Jagger as an example, then Google considers 40 entities to be somewhat relevant to the query, but Mick Jagger himself wins with the highest relevance score of 9,747, while the closest contender is Bianca Jagger with a relevance score of 3,055. The tool also tells us the type of our entity, like thing, person, organization, and so on.
And the most advanced way to check if something is an entity is to go to the source and use Google's Knowledge Graph Search API to view the code behind your entity. There is no advantage to this method as it's less user-friendly and the only additional bit of information it provides is the entity ID. But, if the tool above ever stops working, you can always fall back on this:
{
  "result": {
    "description": "Singer",
    "image": {
      "url": "https://pt.m.wikipedia.org/wiki/Ficheiro:Mick_Jagger_Deauville_2014.jpg",
      "contentUrl": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcQWKDS8YTwb0wu7sRIN4P_IblmoYNW1NVbnIxWgGQ-rhhlylU7H"
    },
    "@type": [
      "Thing",
      "Person",
      "Organization"
    ],
    "@id": "kg:/m/01kx_81",
    "detailedDescription": {
      "articleBody": "Sir Michael Philip Jagger is an English singer, songwriter, actor, and film producer who has gained worldwide fame as the lead singer and one of the founder members of the Rolling Stones. ",
      "url": "https://en.wikipedia.org/wiki/Mick_Jagger",
      "license": "https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License"
    },
    "name": "Mick Jagger",
    "url": "http://www.mickjagger.com/"
  },
  "resultScore": 9747.802734375,
  "@type": "EntitySearchResult"
}
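
If you would rather query the API from a script, a request could be sketched roughly as follows. The endpoint and the `query`, `key`, `limit`, and `indent` parameters are the ones the Knowledge Graph Search API documents, as far as I know. The helper names are my own, `YOUR_API_KEY` is a placeholder, and the sample response is trimmed down from the output above, with Bianca Jagger's score rounded as quoted earlier.

```python
from urllib.parse import urlencode

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_search_url(query, api_key, limit=10):
    """Build the request URL for an entity search (hypothetical helper)."""
    params = {"query": query, "key": api_key, "limit": limit, "indent": "true"}
    return f"{KG_ENDPOINT}?{urlencode(params)}"


def summarize(response):
    """Pull name, ID, types, and score out of each result, best match first."""
    rows = []
    for item in response.get("itemListElement", []):
        result = item.get("result", {})
        rows.append({
            "name": result.get("name"),
            "id": result.get("@id"),
            "types": result.get("@type", []),
            "score": item.get("resultScore", 0.0),
        })
    return sorted(rows, key=lambda row: row["score"], reverse=True)


# Trimmed sample response; only the fields used above are included.
sample_response = {
    "itemListElement": [
        {"result": {"name": "Bianca Jagger"}, "resultScore": 3055.0},
        {
            "result": {
                "name": "Mick Jagger",
                "@id": "kg:/m/01kx_81",
                "@type": ["Thing", "Person", "Organization"],
            },
            "resultScore": 9747.802734375,
        },
    ]
}

url = build_search_url("Mick Jagger", "YOUR_API_KEY")
top = summarize(sample_response)[0]
print(url)
print(top["name"], top["id"])
```

To run it against the live API, you would fetch the URL (for example with `urllib.request.urlopen`) and pass the decoded JSON to `summarize` instead of the sample.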
Whenever a search query includes an entity, Google uses its entity database to enhance the search results. Most notably, Google uses entities to add knowledge panels, offer search suggestions, and up the relevance of served pages.
A knowledge panel is like a minimalist citation placed right on the search results page. Apart from the name, image, and description of the entity, the panel usually includes a few of the most important entity attributes, which vary depending on the type of entity:
Since Google knows the class of the entity, it can make suggestions regarding other entities from the same class. For example, if I'm searching for Arc'teryx, Google will identify it as an outerwear brand and ask me if I want to check out other outerwear brands, like Patagonia and The North Face:
Using the same logic, I can now search for entire classes of entities and see some suggestions served right in search. For example, if I search for the best action movies, Google offers a whole range of entities classified as action movies to choose from:
Perhaps most importantly, entities allow Google to lower the influence of keywords and links as ranking signals, and instead look to the meaning of the content. What it can do is map the entities within a piece of content and see whether the map makes sense, whether all expected entities are present and connected to each other.
For example, if I'm writing an article about Google patents, then Google would probably expect me to mention Bill Slawski, who is an expert on patents, as well as the United States Patent and Trademark Office, where the patents are filed. Theoretically, pages that make use of all the right entities, in the right context, would rank above pages that do not.
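As a very rough approximation of that check, you could compare a draft against a list of entities you expect to see in it. The sketch below is my own simplification: the function name and the sample draft are made up, and the matching is a naive substring search with no understanding of context.

```python
def entity_coverage(text, expected_entities):
    """Report which expected entities a text mentions (naive substring match)."""
    lowered = text.lower()
    present = [e for e in expected_entities if e.lower() in lowered]
    missing = [e for e in expected_entities if e.lower() not in lowered]
    share = len(present) / len(expected_entities) if expected_entities else 0.0
    return present, missing, share


# Made-up draft sentence for illustration.
draft = "Bill Slawski has analyzed many Google patents over the years."
expected = ["Bill Slawski", "United States Patent and Trademark Office"]

present, missing, share = entity_coverage(draft, expected)
print(present, missing, share)
```

A real entity map would also look at how the entities are connected, so treat this as a checklist helper rather than a model of what Google does.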
There is an expectation that entities will soon do the job of backlinks and keywords. Perhaps not all of it, but probably most of it. And the ranking system will become much harder to game because Google will be able to analyze much more context than it used to — no kind of unnatural placement will fly.
To this end, if you want to future-proof your SEO strategy, you have to start building your own entities and strengthening the connections to other entities within your field. Here are a few things that you can start doing today:
Adding your own entity to Google Knowledge Graph would be an incredible asset for your SEO strategy. And of all the things that you have in your business, your brand is probably the easiest one to turn into an entity (unless you have some famous people on your team).
Make sure to grow your brand awareness through both SEO and marketing techniques. Come up with a unique brand name, have a well-defined positioning (e.g. Arc'teryx is an apparel company specializing in outerwear), be consistent with your brand attributes (location, date founded, founded by, etc.), create and maintain social profiles, create a few listings in key business directories serving your market, promote your brand and secure mentions (unlinked are fine too!) from other entities in your field.
If you don't want to wait for Google to find entities on your website, you can speed things up by using structured data. Vocabularies such as Schema.org offer a huge library of tags, which can be used to tell Google which bits of your content are entities and which are entity attributes. Specifically, local business schema can be used to tie your business to nearby geographic entities and increase your prominence in local search. Organization, person, and author markup can be further used to create connections between entities on your website and their profiles on other websites.
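For illustration, here is a small sketch that generates Organization markup as JSON-LD. The property names (`name`, `url`, `foundingDate`, `founder`, `sameAs`) come from the Schema.org Organization type. The helper function, the company, the founder, and the URLs are all placeholders I made up.

```python
import json


def organization_jsonld(name, url, founding_date, founder, profiles):
    """Return a JSON-LD script tag describing an organization (hypothetical helper)."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "foundingDate": founding_date,
        "founder": {"@type": "Person", "name": founder},
        # sameAs links the entity on your site to its profiles elsewhere.
        "sameAs": list(profiles),
    }
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values for illustration only.
snippet = organization_jsonld(
    name="Example Outerwear Co.",
    url="https://www.example.com/",
    founding_date="2010-05-01",
    founder="Jane Doe",
    profiles=[
        "https://twitter.com/example",
        "https://www.linkedin.com/company/example",
    ],
)
print(snippet)
```

The resulting snippet would go into the HTML of the relevant page, and the `sameAs` list is where the connections to your other profiles are declared.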
Claiming, optimizing, and maintaining your Google My Business (GMB) listing is the single most important part of any local SEO strategy. It does not necessarily turn your business into an entity, but it gets you most of the way there.
Google will use your listing to learn credible information about your business and to create connections between your business and other local entities, which will do wonders for your local rankings. So the effect is very similar to becoming an entity. It's just that, technically, the local business database is separate from the Knowledge Graph.
Whenever you are planning a new piece of content, make sure that it includes other entities that Google considers relevant to your subject. Here is how you can research these entities:
First, we know that Google uses entity associations to enhance search results. So one thing we can do is google the main subject of our content and see what kind of entity suggestions come up in search.
For example, if I were planning an article about best down jackets for men, I would google my subject and check the People Also Ask section for related questions:
Then scroll to the bottom of the SERP and check out related searches:
Then switch to the image search and scroll through suggested search modifiers:
We know that Google uses Wikipedia as one of the sources for its entity database, so it stands to reason that you too can use Wikipedia to look up entity properties and find other associated entities. For example, if I were to write an article about Mick Jagger, Wikipedia would tell me that I can't do it without mentioning Keith Richards:
And the final stop is the demo of Google's Natural Language API (NLP API) — the technology it uses to process text and single out entities. The technology itself is paid, but the demo is free.
What you can do is copy the top-ranking content of your competitors, run it through the NLP API, and discover the most prominent entities mentioned within the content, like so:
The tool will likely discover hundreds of entities for every text, but only a few of them will have a significant salience score. Grab a few of the most prominent entities from a couple of your competitors, add them to what you've got from Wikipedia and Google search, and you've got yourself a solid list of entities that you might want to include in your content.
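If you end up with entity lists from several competitor pages, the filtering and merging can be scripted. The sketch below assumes each analysis is shaped like the API's entity analysis response, i.e. an `entities` list whose items carry `name` and `salience`. The function name, the 0.05 threshold, and the sample values are my own inventions.

```python
from collections import defaultdict


def prominent_entities(responses, min_salience=0.05, top_n=5):
    """Merge entity analyses from several pages and keep the salient entities.

    Entities are ranked by how many pages mention them, then by their
    highest salience score. The threshold is an arbitrary assumption.
    """
    best = defaultdict(float)  # highest salience seen per entity
    pages = defaultdict(int)   # number of pages mentioning the entity
    for response in responses:
        seen = set()
        for entity in response.get("entities", []):
            name = entity.get("name", "").lower()
            salience = entity.get("salience", 0.0)
            if not name or salience < min_salience:
                continue
            best[name] = max(best[name], salience)
            if name not in seen:
                pages[name] += 1
                seen.add(name)
    ranked = sorted(best, key=lambda n: (pages[n], best[n]), reverse=True)
    return ranked[:top_n]


# Invented analyses of two competitor pages about down jackets.
page_a = {"entities": [
    {"name": "down jacket", "salience": 0.42},
    {"name": "Patagonia", "salience": 0.12},
    {"name": "zipper", "salience": 0.01},
]}
page_b = {"entities": [
    {"name": "Down jacket", "salience": 0.35},
    {"name": "The North Face", "salience": 0.09},
    {"name": "Patagonia", "salience": 0.07},
]}

shortlist = prominent_entities([page_a, page_b])
print(shortlist)
```

Entities that barely register, like the zipper here, drop out, and what remains is a shortlist you can combine with your Wikipedia and Google search findings.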
I like thinking of entities as the digital model of the real world and I'm curious to see which direction it will take in terms of SEO. Perhaps the model will get so good that we won't need SEO to explain our content to search engines. Or perhaps it will get incredibly complicated and we will have to use tons of structured data to help Google understand what's what. But whichever direction it takes, it is certain that the importance of entities is only going to grow and those who start using them today are set to win in the long run.