Google doesn't read webpages the same way you and I do.
Not a very controversial statement, sure. But with the constant insistence on creating pages for "humans" (since at least 2006), it's getting easier and easier to overlook some technical fundamentals.
This is why in this article I'll talk about the sometimes neglected technical SEO, and in particular, how to perfectly optimize your images to boost rankings.
I'll go over what image optimization is and why it matters; the detailed steps to integrate images into your webpages; and which tools we can use to ensure perfect optimization (feat. Google Vision API).
By image optimization, I mean following the best SEO practices when adding images to our webpages to draw the biggest ranking benefits possible.
In more practical terms, this means implementing the best coding practices, adding schema markup for image objects, using the highest resolution available, and auditing your pages with WebSite Auditor.
There are millions of SERPs right now filled with many perfectly readable copies covering all the right keyword clusters, each backed up by plenty of backlinks.
A human being could not rank these pages from 1 to 10; to our eye they are all quite excellent. Yet Google has zero problems figuring out the top-3 results, which get about 55% of the clicks. And it's all about the way Google, not humans, sees our pages.
Look at these two pages and try to find the crucial differences that make one more rankable than the other:
Did you spot the differences? Because there aren't any, they are copies, and for a human, there's no way to find one more rankable or relevant than the other.
Now let's take another look at those two pages, maybe you'll see the differences this way:
Left page:
<style>
.image { background: url(1.jpg) top left no-repeat; }
.description::before { content: attr(data-shortname); }
</style>
<h1>Duis aute irure dolor</h1>
<p class="image"> </p>
<p class="description" data-shortname="In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur."></p>
Right page:
<h1>Duis aute irure dolor</h1>
<p class="image"><img src="1.jpg" alt=""/></p>
<p class="description">In reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.</p>
Though the on-page result is exactly the same, they couldn't be more different in terms of code — to the point where the left one will likely not get ranked at all. And code is what the search engine actually reads, not the page as-is.
This is exactly why following the best practices when adding images to your pages is so important. How you add those images and place them around the page, which tags you use, what you write in descriptor tags, which images you make visible — all of these things matter to search engines that read that code.
Even though it seems like any and every website could benefit from better image optimization, the first ones on the list are definitely e-commerce websites.
A combination of high competitiveness and a heavy reliance on visual content makes this the #1 industry in need of optimizing images.
Consider this: with hundreds of thousands of pages, each sporting at least one or two pictures of the product in question, e-commerce websites have a lot of opportunities to screw up. Add to this the fact that correct optimization provides more room for implementing the right keywords, and the need for image optimization becomes obvious.
Now that I've established what image optimization is and why it's so important, I'll go over how to actually optimize your images.
Even though I'll be covering some principles that have been proven to work in practice, as with any SEO advice, you need to take everything you hear and read with a grain of salt, as ranking might behave differently for different topics, industries, and competitive verticals.
The king of image optimization advice is to use the <img> tag along with the alt attribute to get something along these lines:
<img src="img_girl.jpg" alt="Girl in a jacket">
Here, the <img> tag adds the image, and the alt attribute describes it.
And in a very general way, this is a good practice, but things, as usual, are much more complicated than they seem.
Let's start at the beginning.
When Google crawls your page, it first downloads all of the text content found there as a big heap of code.
Then, it starts breaking it up into groups. This is where HTML5 comes into play, as semantic tags help Google orient itself around your text and figure out what's what.
At this point, an <img> tag with the alt attribute is read in exactly the same way as your other text content, e.g. text grouped by the <p> tag.
Once Googlebot gets to your image tag, one of the following things will happen, depending on how you implemented the alt attribute:
Before the crawler renders the page's non-textual contents, your <img> tag is effectively not read, while your alt attribute is read in the same way as any other phrasal content that's part of the webpage.
This is exactly why using images for better optimization requires you to, first, always add the alt attribute, even if you don't have the resources to fill out each one of them. And second, it's why you need to know the right way to add text to the alt attribute to enhance your ranking potential.
The <img> tag also accepts a title attribute, which specifies the image's title and is separate from the image's actual name. The title attribute can be important, but it's only ever used to evaluate an image if there is no alt attribute whatsoever. That is, even if you have an empty alt attribute, the title will still not matter.
Therefore, unless you are knowingly disregarding my advice and not putting any alt text at all, changing your image file name won't influence your rankings one way or the other.
Now a word on captions. Even our colleagues over at Ahrefs advise using captions, but, again, things aren't as simple as that.
In the <img> tag, which is what you should be using to implement your images, there is no such thing as a caption attribute. Instead, captions are written with <figcaption>, an element that goes inside a <figure> tag, which is also sometimes used to add images to a webpage.
Why aren't we using <figure> for our images then, when it even comes with its own <figcaption> element? The problem here is that <figure>, unlike <img>, has a very specific purpose: it's used to tell the search engines that the image within it is self-contained.
<figure> is specifically telling the search engine that it should interpret the image independently from the rest of your page, that your page doesn't need it, works perfectly without it, etc. In other words, it completely neutralizes any ranking boost that your image might've provided to your page.
If, however, you end up in a situation where you obviously need a large amount of text that wouldn't work as alt text, what you need here are simple HTML5 text tags such as <p>. Within those, you add the text you need. This text, in turn, is read by the search engines as any text would be and doesn't carry any big influence relating specifically to the image.
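To make the distinction concrete, here is a minimal sketch of the two patterns. In a <figure>, the caption is written as a nested <figcaption> element. The file name, alt text, and copy are placeholders, and the plain-paragraph variant is one reasonable reading of the advice above, not the only way to mark up a longer description.

```html
<!-- Pattern 1: a self-contained figure. The caption lives in a
     <figcaption> element nested inside <figure>. -->
<figure>
  <img src="castle.jpg" alt="Stone castle on a hill at sunset.">
  <figcaption>Caption text for the figure goes here.</figcaption>
</figure>

<!-- Pattern 2: a plain <img> with a short alt text, followed by an
     ordinary paragraph that carries the longer description. -->
<img src="castle.jpg" alt="Stone castle on a hill at sunset.">
<p>The longer description goes here as regular page copy.</p>
```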
Two best practices to follow to get the most out of your images are:
What you need to remember here is that the alt attribute of an image is not its filename, nor is it a duplicate of its title.
Unlike these two, alt text should be able to substitute for your image, and work well in the context of the webpage surrounding it.
The two most common mistakes I see webmasters make are a) copying the title of the image or the webpage itself and using it as the alt text, and b) using the exact same alt text for a few different images.
Avoid this as much as you can, since duplicate alt text is often simply ignored by the crawler as any completely identical text would be.
Filling out the alt text just to have it filled out definitely doesn't boost your optimization, and weakens the effectiveness of images that could've helped you rank higher, had you filled out their alt text properly.
Instead, what you should do is follow the same best practices as for filling out text tags such as <p>: make your alt text comprehensible, start it with a capital letter as you would a paragraph, and end the sentence with a period before closing the attribute's quotes.
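Here is how the two mistakes and the recommended style might look side by side. This is only an illustration: the file names, the store name, and the descriptions are made up.

```html
<!-- Mistake a): alt text copied from the page title -->
<img src="jacket-red.jpg" alt="Buy jackets online | Example Store">

<!-- Mistake b): the same alt text reused for different images -->
<img src="jacket-red.jpg" alt="jacket">
<img src="jacket-blue.jpg" alt="jacket">

<!-- Closer to the advice above: each alt text is unique, reads as a
     sentence, starts with a capital letter, and ends with a period. -->
<img src="jacket-red.jpg" alt="Girl in a red waterproof jacket with the hood up.">
<img src="jacket-blue.jpg" alt="Back view of a blue jacket with a reflective stripe.">
```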
Managing image optimization for a website with hundreds of thousands of images is no small feat, and without a dedicated tool, it's plainly impossible.
In WebSite Auditor there are a few ways you can track how you filled out your alt attributes: you can get a bird's-eye view or go granular and analyze your image optimization on a page-by-page basis.
First off, if what you want is a general picture, so to speak, of your image optimization, you can use the Site Audit module, and scroll down to the Images column.
That way, you get an overview of all the broken images and empty alt attributes WSA found on your website.
If, however, you want to check a particular group of images, you can head over to the Pages module, where all of your crawled images are kept.
There, you can add the image filters, and check a particular number of pages where image optimization is most needed.
If what you find is that most of your pages don't have alt text filled out, it makes sense to start with the pages that need a ranking boost most: your homepage, landing, and product pages.
Since integrating a properly optimized image is a time-consuming task, structure it to bring the biggest boost to the pages that need it most.
Once you've zeroed in on the particular page you want better optimized, you right-click, choose Analyze Page Content, and work on it in the Page Audit module, where you'll see not just whether the alt text has been filled out, but also whether you put the relevant keywords in there.
Within the <img> tag, the two very important, but sometimes overlooked attributes for SEO are src and srcset. Here's what these two look like in code:
<img alt="useful alt text" src="/img/tags/1/image.jpg" srcset="/img/tags/1/370/image.jpg 370w, /img/tags/1/600/image.jpg 600w" sizes="370px">
The src attribute is an oldie but a goodie: it's been here since the creation of the <img> tag and specifies the source URL for the image you want to implement.
The srcset attribute, in turn, only turned up with HTML5 about six years ago, and its job is to define a certain set (hence src + set) of images with different sizes, resolutions, etc.
This is done in order to let a web browser only show the images that perfectly suit the device accessing the page.
The logical question an SEO expert will ask here is: but which image within this set gets indexed by Google?
It's simple: for an image to be indexed — use the src attribute. For an image to be shown — use the srcset attribute.
This means that unless you need to support older web browsers, you can make Google index your best, highest-resolution image, while the user accessing your page will get the image that best suits their device.
This can help enormously, considering that image quality is one of Google's best practices and overall important ranking factors.
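As a rough sketch of that setup, the highest-resolution file goes into src, while srcset and sizes let the browser pick a smaller file that fits the device. The paths and widths are placeholders, and which file Google actually indexes is the claim made above, not something the markup itself guarantees.

```html
<img
  alt="Girl in a jacket."
  src="/img/jacket-2400.jpg"
  srcset="/img/jacket-370.jpg 370w,
          /img/jacket-600.jpg 600w,
          /img/jacket-1200.jpg 1200w"
  sizes="(max-width: 600px) 100vw, 600px">
```

Browsers that understand srcset choose among the listed candidates based on the sizes hint and the screen density; browsers that don't simply load the file in src.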
Image resolution matters when you're trying to rank with visual content. Of course, Googlebot can figure out that you've used software to stretch an image and will be able to tell that the picture is blurry, hard-to-read, etc.
But even considering that, it's still worth having the highest-resolution image indexed, since, even if it's counted as blurry, it will still be indexed and treated as a very high-resolution image.
A functional and free app that actually works is ImageMagick — since, in it, you don't just stretch your image while diminishing quality, but add layers and layers of filters. The more you decide to "stretch", the blurrier the image in question will get, but it's still one of the best free solutions out today.
On this blog, we're fans of structured data, since it's a great way to get some rich SERP features, which in turn leads to higher CTRs.
That said, there are hundreds of markups available that Google is able to read. They don't influence your snippets at all, so they get overlooked pretty much all the time, unfortunately.
The most important markup for image SEO is ImageObject schema. It's used to signal where the image is, and give information about it. Here's a basic version of it:
<div itemscope itemtype="https://schema.org/ImageObject">
<img itemprop="contentUrl" alt="Useful image description" src="image.url.jpg"/>
<meta itemprop="thumbnailUrl" content="imagethumbnail.url.jpg"/>
</div>
The main reason we need to be using this is the thumbnail property (written as thumbnailUrl above, since the value is a plain URL).
"For best results, provide multiple high-resolution images [...] with the following aspect ratios: 16x9, 4x3, and 1x1."
What it means is that for every image you upload, you also need to have three additional images that would have these aspect ratios. Of course, the trick here is that the user will never get to see these three additional pictures, nor do we want them to.
Through the thumbnail property of ImageObject, you'll be able to add three additional images for every picture on your page. Those thumbnails, in turn, will get crawled and indexed, and they will also be the images appearing in search for the queries.
In other words, the webpage contains a markup, which in turn contains links to images that the user will never see thanks to the meta tags whose entire job is to tell the search engine that this particular image has these preview thumbnails.
The search engine, in turn, uses these thumbnails in their search results.
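A sketch of what this could look like with microdata follows. It assumes the schema.org thumbnailUrl property, because the values are plain URLs (nesting full ImageObject items under thumbnail would be an alternative), and all paths are placeholders.

```html
<div itemscope itemtype="https://schema.org/ImageObject">
  <!-- The image the visitor actually sees -->
  <img itemprop="contentUrl" src="/img/castle.jpg"
       alt="Stone castle on a hill at sunset.">

  <!-- Thumbnails in the three recommended aspect ratios.
       <meta> elements are not rendered, so visitors never see them. -->
  <meta itemprop="thumbnailUrl" content="/img/thumbs/castle-16x9.jpg">
  <meta itemprop="thumbnailUrl" content="/img/thumbs/castle-4x3.jpg">
  <meta itemprop="thumbnailUrl" content="/img/thumbs/castle-1x1.jpg">
</div>
```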
A very important question in image SEO is the question of uniqueness.
It goes without saying that you should never steal another website's visual content, as that's easily the grounds for a DMCA notice, after which you'll be liable for some serious damages.
While we all acknowledge that image uniqueness is an important factor for optimization, some of us still turn to stock images (myself included), simply because it's easier and faster.
In terms of strict optimization efforts, though, it's always preferable to create your own images, and not use stock pictures, even if you buy them. Consider this: even if you've bought a stock image, Google doesn't have that information. What it has instead is an image that it's indexed a long time ago.
Google knows where it comes from, its creator, date of publishing, etc. That means that even though you've posted it to your website, the only way to squeeze some benefits from it is to make sure that you got the highest quality possible, preferably — higher quality than what the stock had indexed in the first place.
In that case, you really have nothing to worry about. Unfortunately, it's a known practice for stock image providers to sell an image of either an equal or even a significantly reduced resolution than what they have up on the website.
In that case, that image will not help your optimization efforts even one bit, and you should definitely ask for your money back.
Image theft happens a lot online. Since Google loves pages with pictures on them and creating your own takes time and money, you can often find yourself in a situation where another website steals your images with your code entirely and is trying to use them to rank.
In order to avoid that, you always have to ask your coding dept to add specific attributes pointing out who the owner of the image is, either through the IPTC standard or through ImageObject markup.
The three most important attributes here are:
If your competitors just go ahead and copy the code you used for image integration, Google still considers you the actual owner and gives you preference because of these tags. If, however, the thieving competition thinks to delete those tags, Google will still give you preference as it encourages us to use the tags.
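For the ImageObject route, ownership markup could look something like the sketch below. The choice of properties here (creator, creditText, copyrightNotice, license) is an illustrative assumption drawn from schema.org, and every name and URL is a placeholder.

```html
<div itemscope itemtype="https://schema.org/ImageObject">
  <img itemprop="contentUrl" src="/img/castle.jpg"
       alt="Stone castle on a hill at sunset.">

  <!-- Who made the image -->
  <span itemprop="creator" itemscope itemtype="https://schema.org/Person">
    <meta itemprop="name" content="Jane Doe">
  </span>

  <!-- Who should be credited, and who holds the copyright -->
  <meta itemprop="creditText" content="Example Studio">
  <meta itemprop="copyrightNotice" content="© Example Studio">

  <!-- Where the license terms live -->
  <link itemprop="license" href="https://example.com/image-license">
</div>
```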
Once upon a time, there were many possible image formats to choose from: JPEG for some cases, PNG for others, GIF for yet others.
At this point, the issue of the "best" extension is settled much more easily: there are no real competitors to the WebP format today.
The short version is that almost every single web browser popular today supports WebP natively, conversion takes next to no time at all, and images in WebP are about 25% smaller than the same images in JPEG and PNG, whether we're talking about lossless images or not.
Sure, there are extensions that can be even more effectively compressed, but they simply aren't supported by the same number of web browsers.
Ultimately, it comes down to business costs: WebP is supported by the browsers of over 80% of global Internet users. This means that it's simply smarter to go ahead and invest in this 80% of users and use WebP. For the other 20%, though, we fall back to good old JPEG.
I highly recommend this short introduction for handling WebP: it covers why we need to move, and how to go about converting your current images to this new extension.
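One common way to serve WebP with a JPEG fallback is the <picture> element. This is a minimal sketch with placeholder file names, not the only way to set it up.

```html
<picture>
  <!-- Browsers that support WebP pick this source -->
  <source type="image/webp" srcset="/img/jacket.webp">
  <!-- Everyone else falls back to the JPEG in the <img> tag,
       which also carries the alt text for both cases -->
  <img src="/img/jacket.jpg" alt="Girl in a jacket.">
</picture>
```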
An extremely popular piece of advice you can find about image SEO is: use lazy loading and a CDN for your images. They are considered a must-have if you don't want to demolish your page speed.
Page speed is important, of course. But when should we sacrifice other aspects of optimization for marginal improvements in page speed?
Let's dive deeper into this and see if lazy loading and content delivery networks are really all they're cracked up to be and when (if ever) you should use them.
What it is: Lazy loading is a coding technique where you don't actually load all of the objects on your page immediately, but instead trick the browser into loading the images as they come up, thereby gaining page speed.
Short answer: If you don't need any optimization boost from the image itself and want to concentrate on speed — use it. Lazy load, despite the name, is great for speed.
If, however, you want that ranking boost, and you don't have literally hundreds of images per page, then I'd avoid lazy load.
Thing is, it's very difficult to make sure Googlebot interprets the JS logic on our webpages 100% of the time. Because of that, lazy loading can knock out all of the ranking benefits our images might have gotten us. The page speed trade-offs, in turn, often aren't worth it and are sometimes negligible.
This made a lot of people understandably angry and confused, since, even though Google's been recommending lazy loading for years now, throughout that time Googlebot was not able to read the lazy loading instructions since it was crawling our code only as a huge heap of HTML text.
That meant that by following the advice to try lazy loading, you were shooting yourself in the foot. There was a way out of it, of course, and that was the <noscript> tag, but that still wasn't a perfect solution, since the crawler was not able to interpret JS logic correctly and pick up what we wanted it to.
Of course, things are somewhat better now since Googlebot started using Chrome 76 to crawl our pages.
However, the problem still persists: the crawlers simply can't always efficiently follow the instruction not to index what you put into src, but to pick up what you put into a data attribute (such as data-src) instead.
Added to which, actually adding lazy loading to a webpage with, for example, 10 images on it, is not very functional — the majority of your audience will be able to access your images perfectly fine. In that case, lazy loading won't even do much for your speed, since the speed was great in the first place.
Overall, I'd say that if you don't really care about organic traffic from the images and you have a page with dozens and dozens of pictures, then absolutely feel free to use lazy loading. And if not, not.
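For reference, a JS-based lazy loading setup of the kind discussed here might look like the sketch below. The class name, the data-src attribute, and the file paths are assumptions for illustration; real lazy loading libraries differ in the details.

```html
<!-- src holds a lightweight placeholder; the real image URL sits in data-src -->
<img class="lazy" src="/img/placeholder.gif" data-src="/img/castle.jpg"
     alt="Stone castle on a hill at sunset.">

<!-- Fallback for clients that don't run JavaScript -->
<noscript>
  <img src="/img/castle.jpg" alt="Stone castle on a hill at sunset.">
</noscript>

<script>
  // Swap data-src into src once the image scrolls into view.
  const lazyImages = document.querySelectorAll("img.lazy");
  const observer = new IntersectionObserver((entries, obs) => {
    entries.forEach((entry) => {
      if (entry.isIntersecting) {
        entry.target.src = entry.target.dataset.src;
        obs.unobserve(entry.target); // each image only needs loading once
      }
    });
  });
  lazyImages.forEach((img) => observer.observe(img));
</script>
```

Note how the real image URL never appears in src in the initial HTML, which is exactly why a crawler that doesn't run the script may only ever see the placeholder.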
What it is: CDN (content delivery network) is a system of servers that are rented out to different websites to increase their page speed and handle excessive traffic.
Essentially, your users access your webpages faster, because the servers they are actually accessing are closer to them.
Short answer: if you're dealing with a huge global audience that has a ping to your servers of over 80 ms, then sure! But if not, a CDN can end up doing more harm than good, since we're making web browsers work much longer for more or less the same result.
As far as actual results are concerned, CDN is helpful only in very specific circumstances e.g.:
Other than that, CDNs are not super useful, and that is due to the very nature of a CDN.
CDN is functionally another server that you're renting to provide a faster experience to your audience since the server is "closer" to them.
But what actually happens with a functional CDN is that, no matter how you set it up, you're forcing the browser to create two connections: one to you, and one to your CDN.
It takes a browser, at the fastest, about 30ms to create this connection to your CDN. And that's not considering things like TLS/SSL handshakes, congestion, crappy routing, and so on, and so forth. This means that, at the end of the day, the ping won't be the advertised 30ms, but much higher.
The actual time needed to create a connection between the server and the browser could go up to hundreds of milliseconds. And those lost milliseconds are exactly the benefit that a webmaster is trying to get by using CDN in the first place.
As a result, CDN can surely be helpful, but only if your audience's ping to your server exceeds 80-100ms. Consider, though, that this is the ping (not counting the technical issues) of a browser accessing a server half the world away. And I can't readily imagine a website with a sizeable audience so far away that wouldn't simply have their servers closer to their audience.
Additionally, we've been told for the last decade that the host location affects our rankings, and it's better to have the server closer to your audience.
And in a situation where your ping is lower than 80ms, your CDN framework ends up, at best, not providing much of any benefit, and, at worst, providing worse performance than you'd have otherwise.
One of the most useful tools in the SEO expert's arsenal for optimizing images is Google's Cloud Vision API. Luckily for all of us, there is a working free trial, and using it after you're done with your trial is very easy as well.
What you can do there is upload the image you're planning to add to your webpage and see exactly how Google will interpret it. Here's what it looks like:
It's a very useful tool, not just because it shows that Google can recognize a castle when it sees one. Through it, you can see how Google reads text on your pages, how your image will work with Google's Safe Search and more.
In the Labels column of Google Vision API, you're basically going to see which search queries the media analysis will point to after your image has been crawled, indexed, and analyzed.
The industry that would benefit from this most is medical services. Unfortunately, a huge number of pictures uploaded to medical service websites end up labeled as 'Adult content'. With that, these images get all of the corresponding penalties and end up not shown as a Safe search result.
And since Safe search is the default setting for most people searching, and we ourselves don't usually turn it off when we search for medical services, those websites end up suffering a loss in rankings and CTR.
If you've read everything in this article up to this point — congratulations! You now know almost everything you need to know in order to start getting some serious ranking boosts from your images.
If you picked certain chapters and only read them — that's cool too, glad you found at least something useful.
There's a reason image optimization is overlooked — it's tough, and time-consuming, and difficult. But if you take all of the advice here to heart, and go that extra mile to implement it, the ranking dividends will follow soon.