Latent Semantic Indexing is an information-retrieval technique invented long before the Internet came into use. At some point, when Google started to improve its ranking algorithms, controversy arose over LSI keywords and whether they are beneficial to website SEO.
In this article, I’ll cover the origins of latent semantic indexing and the concept of LSI keywords, and show a few LSI keyword generator tools to use in copywriting. Enjoy!
- What is Latent Semantic Indexing?
- What are LSI keywords in digital marketing?
- Do search engines actually use LSI keywords?
- The role of LSI keywords in SEO
- How do I find LSI keywords to include in content?
- How do I incorporate LSI Keywords into my content?
- Final tips on using LSI keywords in digital marketing
What is Latent Semantic Indexing?
Latent Semantic Indexing, LSI for short, is a mathematical technique that finds relationships between words in a collection of documents. Using LSI, we can compare a dozen texts and conclude that some of them are similar by topic. The algorithm detects the similarity even if some of the texts never use the main topic keyword directly.
In other words,
- ‘latent’ means ‘hidden’
- ‘semantic’ is related to ‘word meaning’
- and ‘indexing’ is done for ‘information retrieval’
The need for latent semantic analysis arose as computing capacity grew and programmers sought to improve users’ access to information. Text processing demanded more efficient semantic analysis. The LSI technique was meant to tackle two issues of text analysis: synonymy and polysemy.
What are synonyms?
Synonymy is a linguistic term describing the existence of different words for the same thing or concept. For example, the way you travel along can be called a route, a road, a drive, or a passage.
What is polysemy?
Polysemy is a linguistic term for one word having more than one meaning. Polysemes have different but related meanings. Take the word drive: you can drive a vehicle, you can drive your friend home from a pub, or you can simply drive for a long time. You can also drive somebody mad. As a noun, the word can mean determination, a journey, a road for vehicles, a computer part, etc.
What is homonymy?
A slightly different phenomenon is homonymy, when words are spelled the same (homographs) or sound the same (homophones) but denote different concepts, unrelated by origin. For example, there is be as a verb (to be or not to be), and there is bee as an insect.
These linguistic phenomena are the driving force of all puns and humor in art and literature.
Yet, synonymy and polysemy are the main reasons why exact keyword matching won’t do for search engines.
LSI reveals underlying semantic structures that can be hidden or obscured due to the variability of wording. This technique allows finding similarities between several documents in a collection of texts and retrieving the most relevant of them to a searcher’s inquiry.
LSI was patented in 1988, and the patent expired in 2008.
LSI uses a term-document matrix and Singular Value Decomposition (SVD), a common linear algebra technique, to learn conceptual correlations in a body of texts. Unless you are familiar with matrix operations and eigenvectors, it will take time to grasp how it works, but here is a short summary.
- LSI begins by constructing a term-document matrix that records the occurrences of unique terms within a collection of documents. Rows correspond to terms and columns correspond to documents; each cell indicates how many times the term occurs in the document.
- Once the term-document matrix is constructed, it is cleared of stop words (pronouns, articles, function words), and some word forms are truncated (so-called stemming, which may not be necessary for every language). The terms are now represented in a bag-of-words model.
- The entries in the term-document matrix are often reweighted by their estimated importance (for example, with the TF-IDF method described later in this article).
- Then, SVD is performed on the matrix to decompose it into three other matrices. Two of them are orthogonal and give each term and each document a vector representation; the third is a diagonal matrix of singular values arranged in descending order. Only the k largest values are kept, and the rest are set to zero. The choice of k is empirical and depends on the size of the collection. Thus, SVD reduces the dimensionality of the data while preserving the main semantic structures.
- Finally, the data are compared by taking the cosine of the angle between two vectors, for example the vectors of any two documents (there are also other ways of comparing, for example, by Euclidean distance).
The calculations identify co-occurrences in the body of texts, helping reveal concepts common to several documents in the collection. The benefit of LSI is that it helps eliminate noise and transforms a very sparse term-document matrix into a low-rank approximation that reveals common structures. The main shortcoming of LSI is its computational cost.
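The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how any search engine or SEO tool works: the five toy documents are made up, NumPy is assumed to be installed, and stop-word removal, stemming, and TF-IDF weighting are skipped to keep it short.

```python
import numpy as np

# Toy collection (made up for illustration): three texts about driving, two about climate.
docs = [
    "car road drive",
    "road route drive",
    "route passage",
    "climate warming ocean",
    "ocean warming ice",
]

# Step 1: term-document matrix (rows = terms, columns = documents, cells = raw counts).
vocab = sorted({word for doc in docs for word in doc.split()})
tdm = np.array([[doc.split().count(term) for doc in docs] for term in vocab], dtype=float)

# Step 2: SVD splits the matrix into U (terms), s (singular values), Vt (documents).
U, s, Vt = np.linalg.svd(tdm, full_matrices=False)

# Step 3: keep only the k largest singular values (k is an empirical choice).
k = 2
doc_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Step 4: compare documents before and after the reduction.
raw_sim = cosine(tdm[:, 0], tdm[:, 2])             # documents 1 and 3, raw word counts
lsi_sim = cosine(doc_vectors[0], doc_vectors[2])   # the same pair in the reduced space
cross_sim = cosine(doc_vectors[0], doc_vectors[3]) # a driving text vs. a climate text
```

The first and third documents share no words, so their raw cosine similarity is zero. After the reduction to two dimensions, they point in nearly the same direction, because both are linked to the second document through shared terms. The driving and climate documents stay unrelated.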
[Animation: an LSA example from an introduction to topic modeling tutorial]
LSI can be used to compare terms to terms, documents to documents, and terms to documents. As a more specific case, it serves to find neighboring terms (the terms closest to a given term in the reduced vector space), yielding a cluster of words closely related to one concept. These can be not only synonyms but also opposites, or simply words that often go together with the major topic. Thanks to this word clustering, LSI is effective for document search and categorization.
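To see how neighboring terms fall out of the same decomposition, here is a small sketch that compares terms instead of documents. The toy documents, the `neighbors` helper, and the 0.9 threshold are all assumptions made for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

# Toy collection (made up for illustration).
docs = [
    "car road drive",
    "road route drive",
    "route passage",
    "climate warming ocean",
    "ocean warming ice",
]

vocab = sorted({word for doc in docs for word in doc.split()})
tdm = np.array([[doc.split().count(term) for doc in docs] for term in vocab], dtype=float)

# Decompose and keep the k largest singular values.
U, s, _ = np.linalg.svd(tdm, full_matrices=False)
k = 2
term_vectors = U[:, :k] * s[:k]  # one k-dimensional row per term


def neighbors(term, threshold=0.9):
    """Return the terms whose cosine similarity to `term` exceeds the threshold."""
    v = term_vectors[vocab.index(term)]
    norms = np.linalg.norm(term_vectors, axis=1) * np.linalg.norm(v)
    sims = term_vectors @ v / norms
    return sorted(t for t, sim in zip(vocab, sims) if sim > threshold and t != term)


related = neighbors("car")
```

In this toy setup, the neighbors of "car" include "passage", although the two words never occur in the same document. They are connected only through the other driving-related texts, which is the kind of hidden relation LSI is meant to surface.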
What are LSI keywords in digital marketing?
LSI keywords are words that are semantically related to the main topic keyword of the page and can be found in a variety of similar texts.
To get a simple idea of what LSI keywords are, let’s look at a random query, for example ‘climate change’. First, think of the associations you have with the phrase.
If you type it in the search bar, you will get a bunch of pages of various kinds. Google pulls out the definition of the term from Wikipedia in a featured snippet, highlighting with bold type the most important terms associated with climate change: ‘ice melt’, ‘ocean warming’, ‘sea-level rise’, and ‘ocean acidification’.
Down the search results page, we’ll find a couple more relevant terms, such as ‘global warming’, ‘greenhouse gas emissions’, etc. These are words and phrases that appear next to our major keyword term in most of the texts.
The tricky question about LSI is...
Do search engines actually use LSI keywords?
For all those asking whether Google uses LSI keywords, Google representative John Mueller has given a short reply once and for all: there is no such thing as LSI keywords.
So, why is Google associated with latent semantic analysis? We know for sure that Google’s search engine distinguishes polysemes and synonyms. For popular queries, at least several results on the SERP cover approximately the same aspect of the topic, since Google identifies the keyword, picks the right meaning of a polyseme (when you specify it, but also based on your search history), and even interprets the intent of the query to surface the most relevant texts.
What is more, 15% of the searches Google gets every day are ones it has never encountered before. How does it handle them?
The truth is, one can hardly name any Google research paper on LSI keywords that shows at what stage LSI might have been implemented in its algorithms. Today Google certainly uses more advanced natural language processing algorithms to scan the ever-expanding web. Bill Slawski explains clearly why Google hardly uses LSI for search, quoting patents as of 2017 and noting, as an example, that the newer Google algorithm RankBrain is based on a word vector approach.
Among its more recent algorithmic updates, Google uses BERT to improve the relevancy of search results to user queries. This neural network for natural language processing is used for passage ranking and for understanding the deep semantics in videos, which is much more complicated than LSI.
LSI was invented at the very dawn of the Internet era. For a web as big as it is today, LSI is not practical, let alone sufficient.
One thing to keep in mind is that LSI is only one of the many techniques of semantic analysis, alongside Probabilistic Latent Semantic Analysis, Principal Component Analysis, Latent Dirichlet Allocation, Word2Vec, etc.
The role of LSI keywords in SEO
While the LSI technique is dismissed as being too old and simple for modern-day search needs, the term ‘LSI keywords’ is used by content marketers to describe the scope of optimization work done on a page. So, what is the value of LSI keywords for SEO?
The main benefit of LSI keywords is that you can use them to improve on-page SEO. LSI tools are not meant to game Google’s algorithms. They focus on text analysis to find words and phrases that naturally occur side by side, based on texts already available on the SERP.
LSI lets you enrich context with semantically related keywords. Using LSI keywords should help you create natural context for the query and cover the topic in more depth. You can treat it as a kind of helper in content writing.
The term ‘LSI copywriting’ is used in content marketing to denote the process of adding related terms to your content. Roughly speaking, SEO copywriting is moving away from outdated and unnatural keyword stuffing techniques. Instead, it focuses on creating user-friendly content: copywriters should write texts that read naturally and bring added value to users (the same thing that search engineers strive for).
So, when we talk about LSI keywords, we mean finding relevant related keywords that can be added to improve content. In that case, we speak of it as a marketing concept used by content creators.
How do I find LSI keywords to include in content?
First, think. If you’re an expert, you will have plenty of ideas to develop in your article. What if you are out of ideas? Use keyword tools.
1. Free tools from Google
The first method that comes to mind when you set out to find LSI keywords is to use Google’s keyword suggestions. However, we cannot really call Google’s regular keyword tools LSI keyword generators, since Google’s algorithms are not about latent semantic indexing.
While Google autocomplete is undoubtedly a great source for keyword discovery, its suggestions are not always what we mean by LSI keywords. Besides, take notice of the difference between long-tail keywords and semantic LSI keywords. Long-tail keywords already incorporate your main keyword; most likely they will fit into your content, and you will probably want to track them as your target keyword phrases. LSI keywords, by contrast, may not include your target keyword at all.
People also ask
Down the SERP, you will often see the universal search result known as the People Also Ask (PAA) box. This place is likely to give you a couple of great semantically related topics.
You will see more questions and answers as you expand the box: the more questions you open, the more questions Google suggests. However, the suggested topics will grow more and more distant from your core theme.
The clues found in the PAA box are a great choice of LSI keywords to optimize for mobile voice search and FAQ boxes.
Google Related Search
Here is another free keyword generator tool from Google. At the bottom of the search results page, inspect the Related searches: these are the topics that most frequently appear next to your major search term. Among them, you’ll find a couple of good subtopics to add to your major content and make your article more in-depth. Synonyms and related terms are a nice way to enrich your content.
Google Images
Google Images is another easy method of finding keywords with the help of labels. The tool suggests the most popular short-tail keywords, closely connected by semantics to the target keyword, and represented in abundance in image results.
2. XLSTAT
A simple LSI keyword generator tool for academic research is XLSTAT, an add-on for Excel. XLSTAT offers a two-week free trial to give it a go and a demo spreadsheet showing how to apply LSI to your document-term matrix.
First, you will need to create your DTM with binary values for word occurrences in your texts. Then, with XLSTAT activated in Excel, go to Advanced features (pressing the + button) and select Text mining > Latent Semantic Analysis. Choose the settings you want for your data, and click OK to apply.
The tool will provide you with the list of topics that it extracts from your data. To help you quickly assess the quality of the results, the tool generates a scree plot, measuring the importance of the topics via eigenvalues and cumulative variability percentage. There are also visualizations of relations between terms and between documents.
3. LSI Graph
LSI Graph is a nice semantic keyword tool whose name speaks for itself. It allows performing 10 searches per day for free. Just go to the website, paste your seed keyword, and you will get a list of LSI keywords, accompanied by SEO stats that help you pick the most promising keyword phrases. The results will bring a bunch of ideas to enrich your content with more topics or features.
In LSI Graph, you can see the search volume for the keyword, the cost per click, and trends over a time span. LSI Graph performs an LSI keyword search using its own proprietary measurement known as Latent Semantic Value (LSV). In the right-hand workspace, you will see top-performing content with active links to check it out quickly.
LSI Graph also offers premium features, including bulk keyword management and the Semantic Writer tool. The tool lets you optimize content in-app, generate LSI keywords and see them next to your content, and measure word count, keyword density, etc. In fact, the Semantic Writer offers a helping hand to SEO copywriters, with a special focus on researching LSI keywords.
4. Keysearch
Keysearch is another free tool to discover LSI keywords for your content. The keyword-finding algorithm behind the tool goes through the first page of Google search results for your main keyword and analyzes all the ranking pages to find the words and phrases most frequently used in them.
Again, you will get all your keyword research stats such as search trends, CPC, and even the strength of the domains ranking on the SERP for the keyword, together with their links, organic traffic, and social media popularity.
Keysearch offers a Content Assistant tool that uses a deep analysis feature. It adds another level to finding LSI keywords. The tool includes related searches from Google plus the top-ranking keywords for the first result in Google. This way you find the most profitable key terms of the best page, the ones that draw the most organic traffic to the website.
Thus, Keysearch combines features of a keyword research tool with a content writing tool that helps create content based on SERP analysis. This is a simple and easy way to generate LSI keywords for your content, pulled by automatic analysis from top-ranking results, Google’s related searches, and question boxes.
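The general idea of mining top-ranking pages for frequently used terms can be sketched as follows. This is not Keysearch’s actual algorithm, which is not public: the page texts, the seed keyword, the stop-word list, and the "at least two pages" rule are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical page texts standing in for the top-ranking results of a query.
pages = [
    "climate change is driven by greenhouse gas emissions",
    "greenhouse gas emissions cause global warming and climate change",
    "global warming leads to sea level rise",
]
seed = {"climate", "change"}                      # words of the target keyword
stop_words = {"is", "by", "and", "to", "the", "of"}

# Count in how many pages each remaining word occurs (set() avoids double counting).
page_counts = Counter(
    word
    for page in pages
    for word in set(page.split())
    if word not in seed and word not in stop_words
)

# Keep the words shared by at least two pages as candidate related keywords.
candidates = sorted(term for term, n in page_counts.items() if n >= 2)
```

A real tool would also look at multi-word phrases and weigh the terms, but even this simple count picks out words like "greenhouse" and "warming" that keep appearing next to the seed keyword.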
5. Content Editor
Content Editor is part of WebSite Auditor, a tool from the SEO PowerSuite software that combines the functions of a site crawler and a content optimization app in one. For content creation, WebSite Auditor has a separate module to audit individual pages, and a smart writing assistant to optimize pages in-app.
To find LSI keywords, launch WebSite Auditor and go to the Page Audit > Content Editor module. Hit the + button to add the URL of the page you will be optimizing (an existing page or a new one), then add your target keyword for the page.
The Content Editor tool will analyze the SERP for top-ranking pages and provide on-page optimization tips.
In the main window, you will have the editing space where you can create your content and watch the optimization score improve on the right, without leaving the app.
Alternatively, for content creators, there is an option to export recommendations in a PDF file and hand them over to use in some other writing tool.
The keyword count field is editable. You can see the existing keyword frequency on the page and how to improve it by using more or fewer keywords. You can edit this field manually (and you can add more of your LSI keywords manually as well).
WebSite Auditor has a special TF-IDF tool. TF-IDF stands for ‘Term Frequency-Inverse Document Frequency’: it measures the importance of a keyword phrase on a page by comparing its frequency on that page with how common the term is across a large set of documents. Basically, this content analysis technique follows the same steps as LSI before SVD is applied. Whereas LSI finds out which topics are common to which documents in a collection of texts, TF-IDF simply weighs the terms in them.
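For a feel of how TF-IDF weighs terms, here is a minimal sketch. It uses one common variant of the formula (term frequency times the logarithm of inverse document frequency) and is not necessarily what WebSite Auditor computes; the three mini-pages and the `tf_idf` helper are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical mini-collection; a real tool would analyze the pages ranking for a query.
docs = {
    "page_a": "climate change drives ocean warming and ocean acidification",
    "page_b": "climate change policy and emissions targets",
    "page_c": "climate models and emissions scenarios",
}

tokenized = {name: text.split() for name, text in docs.items()}
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears at least once.
doc_freq = Counter(term for words in tokenized.values() for term in set(words))


def tf_idf(term, words):
    """Weight of a term in one document; the term must occur in the collection."""
    tf = words.count(term) / len(words)      # share of the page taken by the term
    idf = math.log(n_docs / doc_freq[term])  # 0 when every document has the term
    return tf * idf


page = tokenized["page_a"]
scores = {term: tf_idf(term, page) for term in set(page)}
top_terms = sorted(scores, key=scores.get, reverse=True)
```

On the first page, "ocean" gets the highest weight because it is frequent there and absent elsewhere, while "climate" scores zero because every page in the collection uses it.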
The beauty of the TF-IDF tool in Content Editor is that it shows the word usage in clear visualized graphs. It shows the average keyword count on competitors’ pages and calculates the keyword count you should use on your page. The quick suggestion tool recommends adding a new keyword or using less of some keywords to avoid keyword stuffing.
The Content Editor provides the recommended number of keywords to use in your content, taken from your best competitors' content and filtered by the TF-IDF parameter. You can unwrap the list of competitors and see the URLs, together with the traffic each page has earned from organic search for the target keyword. You can preview the plain-text version of a page right in the tool or proceed to the site by a quick link.
Once the content analysis is done, the tool suggests topics and questions to cover in your piece of content, pulled right from the Google SERP ('People Also Ask' section). This helps you come up with more topic ideas and cover your major theme in more depth.
As you keep adding new content, the weight of each keyword against the overall word count changes. A special Word cloud widget illustrates the weight of your keywords in the content.
How do I incorporate LSI Keywords into my content?
Can mentioning related words and phrases boost rankings? Not exactly; the effect is not guaranteed. When you add relevant keywords to your content and expand the topic, you cover it in more depth. Meanwhile, you get more keywords on your page, and your target keywords are supported by richer context. Search algorithms may pick up additional queries that your pages are relevant to. This can drive more relevant organic traffic to your site and contribute to your online visibility overall. But what additional keywords are best for on-page optimization?
- Start with thorough keyword research: examine what LSI keywords are present on competitor pages. You may find some keyword gaps between your landing pages and competitor pages.
- Choose the best LSI keywords: look at the total monthly searches and the traffic that the words bring to competitors’ pages, and explore the search intent of the keywords.
- Avoid keyword stuffing: unlike with your target keywords, keyword frequency for LSI keywords does not matter. Besides, if you overdo it, the paragraph will look keyword-stuffed. So, just make sure you have included the topic and developed it sufficiently for the users who read it.
- Focus on user experience: LSI keywords should help reduce the bounce rate, since the context around the target keyword is more explicit, and there will be fewer irrelevant impressions and clicks.
- Think of better internal linking: use LSI keywords close to internal link anchors (at least in the same paragraph of your article). This tip dates back to the early years when SEOs assumed that Google might be using LSI to weigh keywords around anchor text, yet the most important anchors on a page may still influence search result features, in particular, sitelinks.
If you consider using LSI keywords in digital marketing...
Whether or not search engines use LSI today, the concept of LSI keywords is used by SEOs to assist in content creation. Once you understand the role of LSI keywords, you can make them an effective part of your keyword strategy. Just keep in mind that Google’s algorithms use hundreds of ranking factors, and content is king.
Whatever keyword finder tool or technique you apply, just focus on creating high-quality content. Don’t doubt the value of long-reads because great content gets a user vote and search engines see it.