Do you have low-quality or useless web pages that waste your crawl budget and hurt your organic SEO? Are you looking for a clean, effective method to deindex a web page? Here are several tips to remove your web pages from search engine indexes!
Why deindex a web page?
If low-quality web pages harm the user experience, they also send a negative signal to search engine algorithms. To address this, some pages on your website should no longer be indexed. The problem is that on the web you can't remove a page's URL with a snap of your fingers. You must follow a specific process, starting with deindexing the page on search engines.
There are several situations that may lead you to need to deindex a web page:
- You have pages that contain internal duplicate content.
- Some pages were indexed by mistake.
- You have unnecessary pages that harm your SEO.
- Some pages raise legal issues.
What is an indexed page?
For a web page to appear in search engine results, the crawlers that visit and analyze websites must add that page to their index.
A page can only be indexed by crawlers if:
- It is crawlable by bots, meaning it is not blocked by the robots.txt file.
- It is indexable — that is, it meets all the technical indexing criteria.
But be careful: an indexable page is not necessarily indexed by Google — the search engine can perfectly well decide not to add it to its index for various reasons (insufficient page quality, overall site quality…).
Furthermore, Google assigns a crawl budget to each site. This budget corresponds to the number of pages the indexing robot (Googlebot for Google) will crawl. It is defined according to several crawling criteria such as page load speed, page depth, content quality…
Thus, to make the most of your crawl budget and keep unwanted pages out of search results, deindexing is sometimes necessary…

Robots.txt: to prevent indexing
The robots.txt file, as its name indicates, is intended only to provide information about your site to crawlers. It tells them which web pages to crawl or not. Thus, the robots.txt file will not help you deindex a page that is already indexed; it can only keep a page from entering the index in the first place.
If you want to prevent new web pages or a new site from being indexed, here is the procedure to follow:
- Check for the presence of the robots.txt file at the root of your server by entering your domain followed by /robots.txt in your browser (it is normally generated automatically by your CMS when your site is created): https://yoursite.com/robots.txt
- If the file does not appear, you will have to create it. Warning: be sure to follow Google's guidelines when creating your robots.txt file.
Then, three commands are available:
- User-agent to specify which crawler is allowed or not allowed to crawl your website.
- Allow to permit crawling of the page.
- Disallow to forbid it.
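For example, here is a minimal robots.txt sketch, assuming you want to keep all crawlers out of a hypothetical /drafts/ directory while leaving the rest of the site crawlable:
# Applies to all crawlers
User-agent: *
# Block crawling of the drafts directory (hypothetical path)
Disallow: /drafts/
# Everything else remains crawlable
Allow: /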
Once your file is finished, upload it to your website. Uploading your file will depend on your server and your site's architecture. If you encounter any difficulty, consult your host's documentation or contact them.
Once your file is uploaded, test it to verify that it is publicly accessible and correctly written using the Search Console testing tool.
The "noindex" tag: Google's preferred method
The meta "noindex" tag is the best way to prevent a web page from appearing in Google's search results. When Googlebot next crawls your site, the "noindex" tag will tell it not to index or to deindex a web page.
Not only is it the preferred method of Google, webmasters, and developers, but it's also the easiest to implement, because it does not require any special technical knowledge.
- On the page or group of HTML pages you want excluded from the SERP, add the following code in your site's <head>: <meta name="robots" content="noindex">.
- If you don't want crawlers to index your page or follow its links, add this directive: <meta name="robots" content="noindex,nofollow">.
- To request that robots do not index an image: <meta name="robots" content="noimageindex">.
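For reference, here is a minimal sketch of where the tag sits in a page's markup (the page itself is hypothetical):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Page to exclude from the SERP</title>
<!-- Tells crawlers not to index this page -->
<meta name="robots" content="noindex">
</head>
<body>
…
</body>
</html>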
Make sure your robots.txt file does not block access to the pages in question!
For your deindexing request to take effect, you must wait for the crawlers to revisit your pages. However, you can speed up the process from Search Console by sending a crawl request to Google.
The X-Robots-Tag HTTP header to deindex files without source code

Some files (PDFs, images, Word documents…) do not contain any HTML source code in which you could place a meta tag. Only the X-Robots-Tag HTTP header set to noindex can give deindexing instructions to crawlers for these resources.
Be careful: this technique requires some expertise, and if you do not have the necessary knowledge, it is preferable to call on a developer. A wrong manipulation can cause serious malfunctions on your site!
Start by modifying your .htaccess file by adding the following code:
To deindex all your PDFs:
<Files ~ ".pdf$">
Header set X-Robots-Tag "noindex"
</Files>
To deindex all your image files:
<Files ~ ".(png|jpe?g|gif)$">
Header set X-Robots-Tag "noindex"
</Files>
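To verify that the header is actually being sent, you can inspect the response headers of one of the files, for example with curl (the URL below is a placeholder):
# -I sends a HEAD request and prints the response headers
curl -I https://yoursite.com/documents/example.pdf
# The response should include a line such as:
# X-Robots-Tag: noindex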
HTTP 404 and HTTP 410 status codes to deindex deleted pages
When you delete web pages, their deindexing is not immediate. If nothing replaces your pages, return one of the following two status codes to confirm their removal to Google:
- 404 status code (not found): the resource does not exist.
- 410 status code (gone): the resource does not exist and will not be replaced.
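If your site runs on Apache, a minimal sketch for returning a 410 for a deleted page is the Redirect directive with the gone status in your .htaccess file (the path below is a placeholder):
# Returns HTTP 410 (gone) for the deleted page
Redirect gone /old-directory/deleted-page.html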
The canonical tag to deindex similar content
If you have identical or similar content, you risk being penalized for duplicate content.
By adding a canonical tag, you tell Google that only the specified page should be taken into account.
Add this code in the <head> section of all duplicate content pages on your website by inserting the URL of your main page: <link rel="canonical" href="https://yoursite.com/example-page/" />.
Deindex a page without losing its backlinks

Backlinks are essential for your SEO, and it would be a shame to lose quality incoming links!
To avoid losing your backlinks, set up a 301 redirect from the old page to the new one. Note: the permanent redirect is implemented in your server's .htaccess file!
For a page:
RedirectPermanent /directory/page-to-redirect.html http://www.example.net/directory/destination-page.html
For a directory:
RedirectPermanent /directory http://www.domain-name.com/destination-directory
For a domain:
RedirectPermanent / http://www.domain-name.com/
Deindex a web page with WordPress
Because WordPress users generally don't have direct access to the <head> section of their pages, the simplest solution is to install the Yoast SEO plugin.
Once installation is complete, go to the page to deindex, then in the Yoast tab click "Advanced." Select "No" for the question "Allow search engines to show Article content in search results?" Save, and that's it!
Our tip for deindexing a page
You now know the techniques to deindex a web page. If you have many pages to deindex and want to speed up their removal, create a sitemap file listing all the URLs to deindex and submit that file in Search Console.
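Such a file is simply a standard XML sitemap whose entries are the URLs you want removed. Here is a minimal sketch with a placeholder URL:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://yoursite.com/page-to-deindex/</loc>
</url>
</urlset>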
Our tip to avoid having to deindex your pages
To avoid having to deindex web pages because your content is poor, optimize your texts for SEO and quality by calling on our professional web copywriters.