
How to prevent Google from indexing a web page?

Yes, you read that right. How to prevent Google from indexing a web page? This question may unsettle you, especially since it’s us asking it: Redacteur.com, a content writing platform whose mission is to deliver quality, SEO-optimized texts for... good indexing of your web pages!

SEO is the key for a website to appear high in search results. But sometimes it may be necessary to block the indexing of certain content.

In which cases? And how do you do it? Here is everything you need to know to prevent Google from indexing a web page, along with the mistakes to avoid. Our primary goal: that you have a high-performing, effective website.

How to block Google from accessing pages on your site?

There are several methods to block access to pages for Googlebot and other crawlers. What are they and what are their limits?

Use the robots.txt file

A website’s robots.txt file is used to guide search engine robots. The instructions in this file tell them whether a site’s pages should be crawled or not. To ask crawlers not to access a page, you must use the "Disallow" directive.
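For example, a minimal robots.txt file placed at the root of your site might look like this (the "/private/" directory and the page path are purely illustrative):

User-agent: *
Disallow: /private/
Disallow: /page-to-block.html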

However, the robots.txt file has limits. It is mainly useful for avoiding overloading your server with search engine requests. But it does not 100% prevent a page from being indexed: robots interpret the instructions in the robots.txt file as guidelines rather than mandatory commands, and some bots may not follow them.

Moreover, your URL may be referenced elsewhere on the internet. If search engines discover it there, they can still index it. If you want your page to no longer appear in search results, or if you want to protect sensitive information, you should favor other, more effective methods.

Use the meta noindex tag

Meta tags are HTML elements that provide details about web pages. They are included in the HTML page's head section.

The meta "noindex" tag tells Google not to index the page. This solution does not require any particular technical skills: you simply add the following line of code:

<meta name="robots" content="noindex">

For this instruction to be understood, the page must not be blocked by a robots.txt file: if Google cannot crawl the page, it will never see the tag. You can speed up the deindexing process from Search Console by submitting a crawl request to Google.

But beware, this method also has limits: the meta "noindex" tag does not prevent robots from crawling the page. It only forbids them from including it in search results.

Other meta tags can be added to strengthen the effectiveness of deindexing: "noimageindex" tells robots not to index the page’s images, "noarchive" asks them not to store a cached copy of the page, and "nosnippet" prevents the display of a snippet (such as the meta description) in results.
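These directives can be combined in a single tag. For example:

<meta name="robots" content="noindex, noimageindex, noarchive, nosnippet">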

WordPress users who don’t have access to their pages’ code must use a plugin, such as the Yoast plugin, to deindex their content. Simply answer “No” to the question “Allow search engines to show this content in search results?”.

Protect access to certain pages with a .htaccess

To prevent Google from accessing and indexing pages, restricting access to the URLs with a password is the most effective method.

One way to do this is to modify the .htaccess configuration file. Used by Apache servers, this file applies rules to directories: it allows you, for example, to protect content with a password. It is also essential for redirecting quality backlinks from an old page to a new one. However, it is a sensitive file to handle: a single error can make the entire website inaccessible.
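As an illustration, here is a minimal password protection using HTTP Basic authentication in an .htaccess file. It assumes you have already created a .htpasswd file containing your usernames and passwords; the file path is a placeholder to adapt to your server:

AuthType Basic
AuthName "Restricted area"
AuthUserFile /path/to/.htpasswd
Require valid-user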

To make access and navigation easier for your users and reduce the risk of accidental changes to the .htaccess file, you can also create a private area, accessible by username and password.

Is it possible to make pages completely inaccessible?

The most radical solution to make your web pages inaccessible on Google is to… delete them. This process can be long. It is not simply a matter of deleting the page from your website, but of removing its URL from search engines’ indexes. If you only delete the page on your site, you will create a 404 error, which Googlebot and other crawlers dislike. When you delete a page, you should therefore redirect its URL with a 301 redirect. Warning: even if the page disappears from Google, it may still be accessible in web archives.
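On an Apache server, such a redirect can be declared in a single line of the .htaccess file (both URLs are placeholders):

Redirect 301 /old-page.html https://www.example.com/new-page.html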

Google offers a URL removal tool to help de-index pages. However, the block is only temporary, limited to 180 days.

To permanently remove a URL, you must use the outdated content removal tool. Before filling in the request form, you must ensure that you have already performed one of the following actions:

  • Removed the page from your website
  • Blocked access to its content with a password or added the meta "noindex" tag

But above all, the page must not have been blocked using the robots.txt file.

If the request succeeds, your page will be permanently removed from Google.

This approach does not prevent other search engines from crawling and indexing your web page!

Mistakes to avoid when you want to prevent Google from indexing pages

You followed the rules to deindex your web page. Yet it continues to appear in Google’s search results. For many content creators, that’s a dream. For you, it’s a nightmare. Remember, no method is 100% effective. But perhaps you made a mistake?

Forgetting to remove links pointing to the deindexed page

“Link juice”, also known as “link equity”, may be the cause of the continued indexing of the page you want to deindex. In digital marketing, the more relevant and trustworthy a page is, the more “juice” it has. And the more “juice” it has, the more it appeals to search engines.

But a well-ranked page shares its “juice” with the other pages it links to via hyperlinks. Google’s algorithm considers that a page recommended by a quality page is likely to be relevant.

And this is where all your deindexing work can be ruined if you forgot to remove the internal links pointing to the page you want to deindex. Because that page will continue to benefit from the quality “juice” of the pages it’s linked with and therefore… be indexed by search engines.

Do you want to prevent Googlebot and other bots from indexing your page? You should try to remove from your website all internal links that point to the page to be deindexed.

Ideally, you would also identify backlinks pointing to your page and request their removal. But that can be more difficult.

If, for navigation reasons, you want to deindex a page but keep the links pointing to it (as with a legal notice page, for example), you can mark the link with the "nofollow" attribute to limit the transfer of SEO juice.
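For example (the URL and anchor text are illustrative):

<a href="/legal-notice" rel="nofollow">Legal notice</a>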

Forgetting to remove the outgoing links from the deindexed page

Does your deindexed page contain links to other internal and external content? The principle of “link juice” works the same way here: the source page benefits from the “juice” of the pages it links to. And you usually didn’t choose your links at random: they point to quality pages or authoritative sites with high “juice”. So if you forgot to remove the links from your deindexed content, your page may continue to be indexed by Google through its association with the landing pages of those links.

For your deindexed page to remain as discreet as possible on the web, you must remember to remove all links it contains that point to other content.

If you do not want to remove the links the page contains, you can add the “nofollow” attribute to them, as in the example above, to tell search engines you do not want to pass “SEO juice” to the target pages.

Targeting the wrong page

Don't choose the wrong page when you act on the robots.txt file or via meta tags: elementary, you might say. Yet, when a website's content is large, it can be easy to get lost in the directories. And inserting a 'noindex' meta tag in the head of your homepage would be harmful to your site's SEO. A little common-sense reminder helps avoid mistakes.

There are several methods to prevent Google from indexing a page, but none is necessarily 100% effective: blocking or completely removing a URL is never guaranteed. Each method has its limits, and mistakes and oversights happen easily, so take your time when you want to deindex pages from Google.

Why would you want to prevent Google from indexing a page?

Blocking low-quality pages

Old pages on your website may be obsolete, may have lost their relevance, or may offer content similar to that of other pages.

These pages no longer match the image you want to convey of yourself and your business. They harm the user experience and are penalized by Google. By blocking certain pages, you direct your visitors and search engines to your high value-added content.

Ensure data confidentiality

You may want to limit access to certain content.

This is the case if you offer privileged content to certain customers who have subscribed to a premium plan.

It is also the case if you use your website to communicate with your partners in private areas.

Finally, it is the case for form submission pages through which users send you personal information.

Manage crawling traffic (crawl budget)

Preventing Google from crawling and indexing certain web pages helps you manage crawl traffic. This keeps your server from being overwhelmed by the many requests from search engine bots, and it keeps bots from wasting your crawl budget on content with no search value.

This is even more important since the Google Helpful Content update, which penalizes so-called "zombie" pages: pages that do not provide added value to users.

Our tip to avoid having to deindex your web pages

There is another option to address obsolete, low-value content: update the text, improve its quality, and optimize its SEO.

Reworking or replacing existing content improves your website's performance and quality while avoiding the tedious work of deindexing web pages.

To optimize your content (blog posts, product pages, corporate content...), you can call on our experienced writers available on the Redacteur.com writing platform. With our professional writers, you are guaranteed quality, up-to-date, and optimized content that is attractive to your visitors and to search engines.
