
Meta robots tag: how to use it?

Meta directives for robots (sometimes called "meta robots tags") are pieces of code that provide instructions on how robots should crawl or index the content of a website's pages.

While the directives of the robots.txt file give robots suggestions on how to crawl a website's pages, the tags in question provide firmer instructions on how to crawl and index the content of a given page.

 

What are these meta robots tags for?

Meta robots tags give crawlers instructions on how to crawl and index the information they find on a specific web page. When these directives are discovered by crawlers, their parameters serve as strong suggestions about the crawlers' indexing behavior.

Unfortunately, as is the case for instructions placed in the robots.txt file, crawlers are not obliged to follow your directions: it is therefore very likely that some malicious robots will ignore your directives and "crawl" the content of your pages without any scruples.

Indeed, it is worth remembering that robots tags are not a security mechanism: if you have private information that you do not want to be publicly accessible, choose a safer approach, such as password protection, to prevent visitors and robots from accessing these confidential pages.

The two types of meta robots tags

There are two types of tags:

  • those that are part of the HTML page ("robots")
  • those that the web server sends as HTTP headers ("x-robots-tag")

The same parameters (like "noindex" and "nofollow") can be used both by the meta robots tag and the x-robots tag: the only difference is how these parameters are communicated to robots.
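For instance, the same noindex instruction can be expressed both ways (illustrative example). In the HTML page:

```html
<!-- Inside the <head> of the HTML page -->
<meta name="robots" content="noindex">
```

Or sent by the server as an HTTP response header:

```
HTTP/1.1 200 OK
X-Robots-Tag: noindex
```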


Meta robots tag

The meta robots tag is part of a web page's HTML code. It appears as code elements in the <head> section of a web page, for example:

<meta name="robots" content="[PARAMETER]">

While the tag <meta name="robots" content="[PARAMETER]"> is the standard form, you can also provide directives to specific crawlers by replacing "robots" with the name of a specific user agent.

For example, if you want to give a directive specifically intended for Googlebot, you can use the following code:

<meta name="googlebot" content="[PARAMETER]">

Do you want to use more than one directive on a page? As long as they target the same robot, multiple directives can be included in the same meta tag — you just need to separate them with commas.

Here is an example:

<meta name="robots" content="noimageindex, nofollow, nosnippet">

As we will see below, this piece of code tells robots not to index the page's images, not to follow any of the links, and not to display a snippet of the page when it appears in search results.

If you want to give different instructions to different search robots, you will need to use separate tags that address each robot.
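For example (the pairing of robots and directives below is purely illustrative):

```html
<meta name="googlebot" content="noindex">
<meta name="bingbot" content="nofollow">
```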

X-robots tag

While the meta robots tag lets you control indexing behavior at the page level, the x-robots tag is sent as an HTTP response header and can control indexing of the page as a whole, as well as of very specific elements of a page.

Although you can use the x-robots tag to execute the same indexing directives as the meta robots tag, the x-robots directive offers more flexibility and functionality than the latter.

Indeed, the x-robots directive allows the use of regular expressions, the execution of indexing directives on non-HTML files, and the application of settings at a global level.

To use the x-robots tag, you need access to your site's server configuration: the .htaccess file (on Apache), the main server configuration file, or a server-side file such as header.php that lets you set response headers. From one of these files, add the x-robots-tag header with its parameters. Here are some examples of what the x-robots tag allows:

  • Control the indexing of content not written in HTML (like a video)
  • Block the indexing of a specific element on a page (like an image or a video), but not the page itself.
  • Control indexing if you don't have access to the HTML code of a page (in particular the <head> section) or if your site uses a global header that can't be changed.
  • Add rules to determine whether a page should be indexed or not (for example, if a user has commented more than 20 times, index their profile page).
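As an illustration, here is a minimal Apache sketch (assuming the mod_headers module is enabled) that adds an x-robots-tag header to every PDF file served, keeping those non-HTML documents out of the index:

```apache
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

On other servers (nginx, for example) the equivalent directive is set in the server configuration rather than in .htaccess.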

 

What are the parameters of the robots tags?

Below are the parameters that search engine robots understand and follow (or not) when they are used in meta robots tags.

Note
These parameters are not case-sensitive; however, note that some search engines may only follow a subset of these parameters or handle certain directives slightly differently.

All

This is the default parameter; you do not need to include it. It tells search engines that there are no restrictions: the page can be indexed and its links followed.

Follow

Even if the page is not indexed, the crawler should follow all the links on it and pass link equity to the linked pages. This parameter does not need to be specified either: it is also a default value.

Noindex

Tells a search engine not to index a page. With this parameter, the page will not appear in search results, but the links it contains will be followed by the crawler.

“Subscriber-only” pages are a typical use of this tag. You don't want search engines to index paid content... but the pages it links to can still benefit from its authority.

You can use the meta robots noindex tag to avoid being penalized for duplicate content. Likewise, if two or more URLs on your website serve the same content, this setting prevents crawlers from deciding for themselves which URL to favor (or penalize). You stay in control of your SEO.
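A subscriber-only article could therefore carry the following tag (the explicit follow is optional, since it is the default):

```html
<meta name="robots" content="noindex, follow">
```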

Nofollow

Tells the crawler not to follow the links on a page and therefore not to pass link equity. These links can be contained in navigation buttons, images, or other resources.

You can also add a nofollow tag to links in your blog comments to prevent spammers from benefiting from your content. Also consider using it on paid links in banners and ads, as well as on your clients' and partners' logos.
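Note that, besides the page-level meta tag, an individual link can carry a rel="nofollow" attribute, which is the usual approach for the comment and paid-link cases above (the URL below is hypothetical):

```html
<!-- Page level: no link on this page passes link equity -->
<meta name="robots" content="nofollow">

<!-- Link level: only this specific link is not followed -->
<a href="https://example.com/sponsor" rel="nofollow">Our sponsor</a>
```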

None

Equivalent to using noindex and nofollow at the same time. You tell the crawler to completely ignore the page. It will not be indexed and the links it contains will not be followed.

This tag is useful for outdated pages you want to update or for pages under construction, for example.

Noimageindex

As we saw earlier, this setting tells the indexing crawler not to index the images on a page. This helps protect your images from being used without your prior permission. You can also set the meta name to googlebot-image to specifically prevent Google's crawlers from scanning your site for visuals.

However, note that images can still be indexed if links from other pages point crawlers to them.

Noarchive

Search engines should not display a cached link to this page in search results. This prevents users and crawlers from accessing sensitive content you want to protect.

You can use this tag for paid landing pages or internal documents. It is useful for news sites that want to reserve some content for subscribers or to implement a paywall.

Nocache

Identical to noarchive, but used only by Internet Explorer and Firefox.

Nosnippet

Tells a search engine not to display a snippet of this page (the short text excerpt, often drawn from the meta description) in a search result. When you use this tag, the data also cannot appear in rich snippets in the SERP.

Instead of showing the metadata you selected, search engines may choose to display a different snippet, which is not always relevant to your SEO strategy.

It is also possible to mark specific parts of text you don’t want used as a snippet with the data-nosnippet attribute.
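For example (illustrative snippet):

```html
<p>This sentence may appear in a search snippet.
  <span data-nosnippet>This confidential detail may not.</span>
</p>
```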

Max-snippet: [number]

With this setting, you tell the crawler the maximum number of characters to display in SERP snippets; it does not prevent your content from being indexed.

Replace the variable [number] with the maximum number of characters you want to apply to a text snippet for this search result.

Setting [0] is equivalent to the nosnippet directive. Search engines will not display any part of your content as a snippet in the SERPs.

Setting [-1] will give Google the responsibility of determining the snippet length itself. No limit will be applied in that case.

Note that this directive no longer applies when Google uses structured data from your page. It limits textual snippets, not image or video previews, and it applies to all other types of results: Google Images, Google Assistant, Discover, etc.
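For example, to cap snippets at 50 characters (an arbitrary value chosen for illustration):

```html
<meta name="robots" content="max-snippet:50">
```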

Unavailable_after [DATE and TIME]

Search engines should no longer index this page after a particular date.

Because the crawler may still visit the page occasionally, it can remain in the index, but with a lower chance of ranking well.

Be careful: just because the content of a page doesn’t change doesn’t mean it shouldn’t be crawled regularly. Use this tag for event pages or time-limited job listings.

Example: <meta name="robots" content="unavailable_after: 2020-09-21">

Notranslate

When this instruction is not specified, Google may display a link next to the result to help users view translated content on your page.

If you do not want to offer a translation for this page in search results, use this directive.
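For example:

```html
<meta name="robots" content="notranslate">
```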

Max-image-preview: [PARAMETER]

This instruction is used to define the maximum size of an image preview for this page in search results. If you do not specify the max-image-preview instruction, Google will display an image preview at the default size.

Three values are accepted:

  • "none": no image preview should be displayed
  • "standard": a default image preview may be displayed
  • "large": a large image preview may appear

The standard and none values allow you to prevent the use of large thumbnails when AMP pages are displayed in search results.
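For example, to allow large image previews:

```html
<meta name="robots" content="max-image-preview:large">
```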

 

3 mistakes to avoid when using meta robots tags

To make the most of meta robots tags and improve your SEO, here are 3 mistakes to avoid.


1. Sloppy formatting

Crawlers recognize attributes, values, and parameters in both uppercase and lowercase. However, it is recommended to use lowercase for your tags to improve readability, especially in code.

Also get into the habit of including commas and spaces to make your parameters easier to read.

2. Using conflicting tags

Using conflicting tags leads to indexing errors.

For example, if you have multiple meta tags like: <meta name="robots" content="follow"> and <meta name="robots" content="nofollow">, only "nofollow" will be taken into account.

Why? Because crawlers favor restrictive values.
Since "follow" is a default value, avoid using it. Simply use "nofollow" for links you want to prevent from being followed.
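In other words, instead of two conflicting tags, declare your directives once in a single tag:

```html
<!-- Avoid: two tags with contradictory values -->
<meta name="robots" content="follow">
<meta name="robots" content="nofollow">

<!-- Prefer: one unambiguous tag -->
<meta name="robots" content="nofollow">
```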

3. Confusing noindex and disallow

Noindex prevents robots from indexing a page, not from crawling it! To stop your page from being crawled, you must apply the disallow directive in robots.txt.

And to deindex a page that is already in the index, add noindex first and let crawlers access the page so they can see the directive; only apply disallow in robots.txt afterwards, once the page has dropped out of the index. If you disallow the page first, crawlers can never read its noindex tag.
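For reference, a disallow directive lives in robots.txt, not in the page itself (the path below is hypothetical):

```
User-agent: *
Disallow: /private/
```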

Our tip

And there you have it: you now know everything about using meta robots tags. If you are not sure how to apply them to improve your SEO, feel free to hire a professional!