Why is the number of indexed pages decreasing and how to fix it?

Getting your web pages indexed and ranked is an important part of SEO. Ideally, you want all of your web pages indexed and you want to make sure they do not get removed from the Google index.

Over time, the number of indexed pages can drop dramatically, whether due to a Google algorithm update or changes on your website. You need to notice such drops and fix the related issues.

In this tutorial you will learn how to detect when the number of indexed pages is decreasing and how to fix it.

I also share a case study where the number of indexed pages dropped and, after some fixes, started to recover. Most images in this tutorial are taken from reports related to that case study.

Indexed pages decreasing – learn how to recover.

How to know if indexed pages have dropped?

There are two ways to check whether the number of indexed pages has dropped. The first is the automated page indexing report in Google Search Console; the second is a manual check using the “site:” command in regular Google search.

Now let’s look at how each method can be used to detect when indexed pages drop.

Check the page indexing report in Google Search Console

Register your website with Google Search Console to view detailed page indexing reports. In Search Console you can see historical data for the last 3 months.

Detect when indexed pages are decreasing

Generally, for a growing website you should see a continuously growing indexed pages graph. If there is a drop in the number of indexed pages, the report will show it clearly.

Clicking the full report will reveal more information about why indexed pages on your website have decreased.

Check the number of indexed pages with the site: command in Google search

Check the indexed page count for any website with the site: command

As a basic alternative to Google Search Console you can use the site:example.com command to get the total number of indexed pages. You can use it to learn the number of indexed pages for any website, which makes it handy for checking competitors, for example.

Subdomains and subdirectories can also be used with this command. For example, to view all indexed category pages of a WordPress website you can use the site:example.com/category/ command.
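If you check several scopes frequently, a small script can build these search URLs for you. Below is a minimal Python sketch; the example.com scopes are placeholders:

```python
# Build Google "site:" search URLs for quick manual index checks.
from urllib.parse import quote_plus

def site_query_url(scope: str) -> str:
    """Return a Google search URL for a site: query over the given scope."""
    return "https://www.google.com/search?q=" + quote_plus("site:" + scope)

print(site_query_url("example.com"))            # whole site
print(site_query_url("blog.example.com"))       # a single subdomain
print(site_query_url("example.com/category/"))  # a single subdirectory
```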

This command shows only the current number of indexed pages. For this reason it is not the best tool for tracking the progress of your indexed pages: you will not learn whether they are increasing or decreasing.

Reasons for indexed pages being removed (with solutions)

There are many reasons for pages being removed from the Google index. Some of them are intentional removals that we actually want: for example, a change of URL permalink, removal of old content, a noindex tag on unimportant pages, blocking access to protected content with robots.txt, etc.

Increase in the number of crawled but not indexed pages matching the decrease in indexed pages

Google also shows the source of removal. Currently there are two sources:

  • Website – the removal of indexed content is controlled by the website. You can fix this if removal from the Google index happens due to some error on your website.
  • Google systems – the Google algorithm decides not to index pages or to remove already indexed ones. These cannot be fixed directly, but you can address them by improving your on-page content: add more content and make it unique and helpful to users.

In the above image we can see that the majority of web pages were deindexed with the description “Crawled – currently not indexed”, a decision made by Google systems. 22k out of the 40k pages on this website are not indexed, meaning 55% of pages carry this formulation.

Let’s look at each reason for pages being removed from the index, and at ways to fix it when the deindexation is not desired.

Website: Page with redirect

Indexed pages will be removed when they are redirected to a new page. Redirection is a common task when the permalink of old content changes or when content is merged into another page.

Solution:

If the redirection is our decision, then it is normal behavior that those pages are removed from the Google index. If it is a mistake, check those pages by clicking the “Page with redirect” row and remove any redirects from your website that should not be there.
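If you want to audit redirects outside of Search Console, a short script can reveal where each URL actually ends up. Below is a minimal sketch using the third-party requests library (pip install requests); the URLs are placeholders for your own pages:

```python
# Fetch each URL and print any redirect hops it goes through.
import requests

urls = [
    "https://example.com/old-post/",  # placeholder URLs - replace with your own
    "https://example.com/new-post/",
]

for url in urls:
    resp = requests.get(url, timeout=10)  # redirects are followed by default
    if resp.history:  # non-empty when one or more redirects happened
        hops = " -> ".join(f"{r.url} ({r.status_code})" for r in resp.history)
        print(f"{url}: {hops} -> {resp.url} ({resp.status_code})")
    else:
        print(f"{url}: no redirect ({resp.status_code})")
```

Any URL that prints redirect hops you did not intend is a candidate for fixing.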

Website: Blocked by robots.txt

Robots.txt can be used to hide some parts of your website from search engines. These are generally pages that should not be publicly available: for example, some images or PDF files, admin pages, etc.

Solution:

Check the removed pages and delete the corresponding rules from your robots.txt file if those pages should be indexed.
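You can verify a fix with Python’s standard library, which ships a robots.txt parser. A minimal sketch (example.com and the paths are placeholders):

```python
# Check whether Googlebot may fetch given URLs according to robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # placeholder - use your own site
rp.read()

for url in [
    "https://example.com/important-page/",
    "https://example.com/wp-admin/",
]:
    allowed = rp.can_fetch("Googlebot", url)
    print(url, "->", "allowed" if allowed else "blocked by robots.txt")
```

If a page you want indexed prints “blocked by robots.txt”, adjust the Disallow rules accordingly.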

Website: Not found (404)

Over time some pages on your website will be removed because they are very old and no longer match current demand. This can be old information that is not needed any more.

Solution:

Check whether any old pages were deleted by mistake, or whether any currently valid pages are inaccessible to web crawlers and return a 404 status code. Fix those pages by resolving the 404 status, or recreate them if they are important.

You can use the 404 report to find important old pages and redirect them to related new URLs. This can boost your rankings by repairing broken link juice flow so it reaches relevant pages on your website.
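A quick status sweep can surface accidental 404s before Google reports them. Below is a standard-library sketch; the URLs are placeholders for pages you expect to be live:

```python
# Send a HEAD request to each URL and flag the ones returning 404.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

urls = [
    "https://example.com/guide/",         # placeholder URLs - replace with
    "https://example.com/deleted-page/",  # pages you expect to be live
]

for url in urls:
    try:
        status = urlopen(Request(url, method="HEAD"), timeout=10).status
    except HTTPError as e:
        status = e.code
    print(("NOT FOUND: " if status == 404 else f"{status}: ") + url)
```

Note that a few servers reject HEAD requests; switching to a GET request gives the same answer at the cost of more bandwidth.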

Website: Excluded by ‘noindex’ tag

The noindex tag is used to hide some pages from the Google index on purpose. For example, on category pages you can index the first page and noindex the other paginated pages, because the second page of a category is not very important to show in the Google index. Site owners prefer showing only the first category page, as there is little value in showing the 37th page of a category in search results.

The same applies to tags. When both tags and categories are used on a website, you may prefer indexing only categories and avoid indexing tag pages because they may duplicate content. For example, say you list products with the tags smartphone, apple, ios, android, samsung, xiaomi, etc. Most items under those tags will be repeated.

Solution:

Check the detailed report and remove the noindex tag if you think those pages should be indexed.
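To confirm that a fix worked, or to audit pages in bulk, you can fetch a page and look for the robots meta tag. A minimal standard-library sketch; the URL is a placeholder:

```python
# Fetch a page and report whether a <meta name="robots" content="...noindex...">
# tag is present.
from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "robots":
            if "noindex" in (a.get("content") or "").lower():
                self.noindex = True

url = "https://example.com/category/page/2/"  # placeholder URL
html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
finder = RobotsMetaFinder()
finder.feed(html)
print(url, "-> noindex" if finder.noindex else "-> indexable")
```

Remember that noindex can also be sent in the X-Robots-Tag HTTP response header, so check the headers as well if the meta tag is absent.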

Website: Alternative page with proper canonical tag

Canonical tags are used to point from differently formatted pages to the main page. For example, items in a category may be sorted by date, price, or name and displayed as a grid or a list. All of these are variations of the same category and should point to the category page with the default sorting.

Solution:

Check detailed reports and fix canonical tags if needed.
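The canonical tag can be checked the same way as the noindex tag above. A sketch (the variant URL is a placeholder):

```python
# Extract the canonical URL from a page to verify that variant pages
# (sorting, grid/list views) point to the main page.
from html.parser import HTMLParser
from urllib.request import urlopen

class CanonicalFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

url = "https://example.com/category/?orderby=price"  # placeholder variant URL
html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
finder = CanonicalFinder()
finder.feed(html)
print(url, "-> canonical:", finder.canonical)
```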

Website: Blocked due to access forbidden (403)

403 pages are generally used to block access to content for some users and bots.

Solution:

Check the detailed report and remove Googlebot from the block list in your web server.
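User-agent based blocking is a common cause. The sketch below requests the same page with a browser-like User-Agent and a Googlebot-like one, so you can see whether your server answers 403 only to crawlers. (Real Googlebot is verified by IP range, so this only detects naive user-agent blocking; the URL is a placeholder.)

```python
# Compare response codes for a browser-like and a Googlebot-like User-Agent.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

url = "https://example.com/some-page/"  # placeholder URL

for ua in ("Mozilla/5.0", "Googlebot/2.1 (+http://www.google.com/bot.html)"):
    try:
        status = urlopen(Request(url, headers={"User-Agent": ua}), timeout=10).status
    except HTTPError as e:
        status = e.code
    print(f"{ua}: {status}")
```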

Website: Soft 404

The pages do not exist, but they do not return a proper 404 status code.

Solution:

Make sure that all missing pages return a proper 404 status code.
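A simple probe is to request a URL that cannot exist and check the status code. A standard-library sketch (the probe path is an arbitrary placeholder):

```python
# Request a URL that should not exist and confirm the server answers 404.
from urllib.request import Request, urlopen
from urllib.error import HTTPError

probe = "https://example.com/this-page-should-not-exist-xyz123"  # placeholder

try:
    status = urlopen(Request(probe, method="HEAD"), timeout=10).status
except HTTPError as e:
    status = e.code

if status == 200:
    print("Soft 404 suspected: a missing page returned 200")
else:
    print(f"OK: missing page returned {status}")
```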

Google systems: Crawled – currently not indexed

Pages are crawled but not indexed after evaluation by the Google algorithm. This is the most common reason for pages being deindexed or never indexed.

The exact reason is not disclosed, but here are the most likely causes:

Thin content

When there is not much content on a page, Google does not consider it valuable and does not index it.

Solution:

Add more content to the page. Usually there should be 1000–2000 words of content on the page. When there are only a few words, a search engine cannot understand the page and does not know how to classify and rank it.

Write more content and add some images and videos. Also stay on the same topic for the whole page and cover every aspect of the main topic in detail.

You can detect thin content by viewing the detailed reports. Click on reported items to view the affected pages on your website. If those pages have little content, write more.

You can also combine thin content pages into one, as long as they are related to a single topic. After combining, redirect the old pages to the new combined page.
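To find thin pages across a site without opening each one, a rough word counter helps. The sketch below strips tags with a deliberately crude regex; the 300-word threshold is an arbitrary example, not a Google rule, and the URLs are placeholders:

```python
# Count visible words on each page and flag the ones below a chosen threshold.
import re
from urllib.request import urlopen

urls = [
    "https://example.com/page-a/",  # placeholder URLs
    "https://example.com/page-b/",
]

THRESHOLD = 300  # arbitrary example threshold

for url in urls:
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
                  flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)  # drop remaining tags
    words = len(text.split())
    print(("THIN" if words < THRESHOLD else "OK  ") + f" ({words} words): {url}")
```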

Duplicate content

Duplicate content is another common reason for pages being crawled but not indexed.

Solution:

There is no tool or report in Search Console to detect duplicate content. Instead, use external services to check for duplicate content within your website.

Siteliner SEO tool for checking duplicate content

Check duplicate content using the Siteliner web service. It is one of my favorite free SEO tools.

Siteliner will check the given URL and follow all links it finds on your website. Siteliner is free for scanning up to 250 pages, which is enough to detect duplicate content issues for most websites, even those with many thousands of pages.

After you enter your website address, Siteliner will scan pages and generate reports. There you can see whether the duplicate content level is too high for your website.

High percentage of duplicate content for the website

By clicking on the report you will see which pages have duplicate content.

Pages with duplicate content listed in detailed report

The duplicate content report lists pages with their percentage of duplicate content. Check the pages in the report and think of ways to fix the duplication.

If a page has 99% duplicate content and the duplicates are variations of the same page, use the canonical meta tag to point the duplicates to the main page.

For pages with 30–80% duplicate content, check the pages and remove the content that appears as duplicate.

On this particular website, the duplicate pages are game pages with related-games sections. Most games have similar descriptions, playing instructions, and related games. Fix this by writing unique descriptions and instructions.

There is not much you can do about the related-games lists, because related games are selected by the tags used to classify them and will naturally overlap.

On category pages, this site was showing excerpts generated from the game descriptions. Removing those excerpts reduced the duplicate content percentage on most category pages.

Generally, to fix duplicate content you need to write long-form unique content for each page. Long-form content reduces the duplicate percentage caused by repeated elements such as related-content blocks.

For example, a page with 100 words of unique text plus a related-games block (about 100 words in total for 20 game titles, repeated across pages) will be about 50% duplicate.

The same page with 1000 words of unique text and the same 20 related games will show only about 10% duplicate content.
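To make the arithmetic explicit, here is the calculation behind those two figures:

```python
# Duplicate share of a page with a fixed ~100-word repeated related-games block.
repeated = 100  # words in the block that repeats across pages

for unique in (100, 1000):
    total = unique + repeated
    print(f"{unique} unique words -> {repeated / total:.0%} duplicate")

# Output:
# 100 unique words -> 50% duplicate
# 1000 unique words -> 9% duplicate
```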

Low quality content

In order to use its storage effectively, Google prefers to index pages with good main content. If a page is not unique and does not add anything new to existing web results, it will not be indexed: there is no point in adding one more page to the 10k pages with similar content that already exist on the internet.

Google determines the quality of the content before indexing and will not index low quality content. Such pages are reported as “Crawled – currently not indexed” in Search Console.

You can see this in the indexing section of the How Google Search Works documentation.

Already indexed pages can also be dropped from the Google index because of poor quality.

Solution:

Create unique content that is more helpful than existing web results. Add information, images, screenshots, videos, and personal experience to your page’s main content.

Page content should answer the user’s query, be original, add value beyond being one of many existing results, be complete so users are satisfied, be engaging, and be reliable so users do not leave your page.

Good quality content should be helpful, reliable, and people-first. That is what Google and users expect from your pages. Improve non-indexed pages to match these quality criteria.

If the page content cannot be improved, consider removing the page or merging it with other pages on your website.

Google penalty

If your website has been using black hat SEO techniques, either on-site or in off-site backlink building, it can get deindexed.

Solution:

Do not use any black hat SEO techniques at all. Cheaters will eventually be detected and penalized.

Google systems: Discovered – currently not indexed

The Google crawler regularly discovers new URLs on your website. This status means a URL is known to Google but has not been crawled yet.

Solution:

All discovered URLs will sooner or later be crawled. You do not have to do anything here.

Google systems: Duplicate, Google chose different canonical than user

Sometimes Google systems will choose a different canonical than the one you defined. This usually affects only a couple of pages.

Solution:

In most cases you do not have to do anything. If you see really valuable pages here that are indexed under different URLs, change the canonical for those pages to the value Google has chosen.

Learn more about indexed pages from the Google help page.

Case study: Recovery of indexed pages for 40k page website

A game website with 40k games had 26k pages indexed. My aim was to increase the number of indexed pages. Then, a couple of months ago, the number of indexed pages started to decrease dramatically.

Case study: drop and recovery of indexed pages.

Reason for the decrease in indexed pages

To find the reason for the decreasing number of indexed pages, we checked the Search Console report. One category increased dramatically in inverse proportion to indexed pages: “Crawled – currently not indexed”.

However, the exact reason for crawling but not indexing is not displayed. To find the reason we used an additional SEO tool, Siteliner.

High percentage of duplicate content for the website
Additional screenshots from Siteliner and Google Search Console are shown above.

In the detailed report, the category, author, and top-games pages had too much duplicate content. This is because on those pages games were listed with excerpts truncated from the default description text, which was repeated across all pages.

Solution for recovering the number of indexed pages

Quick fix: We solved it by removing game excerpts from the game listing pages (category, author, top games, etc.). Those pages now contain only images and game titles.

Complete solution: Write unique descriptions for category and game pages.

Games are aggregated from game distribution websites, so all sites using those games get the same game name, description, categories, and images. To make this website unique it is essential to write unique game descriptions, add more unique screenshots, make gameplay videos, etc.

When there is too much content on a website (in this case, 40k games), you can start optimizing with the best performing pages: the most visited, highest rated, and most commented.

Applying the complete solution requires more time and resources, so it will be rolled out slowly over time.

Result after fixing content issues

Initially, 26k pages were indexed. Then the count decreased to 6k pages. After applying the quick fix mentioned above, indexed pages started to increase, and currently (August 2023) the count is 12k.

Update March 2024: 14k indexed pages.

Update May 2024: Indexed pages decreased again to 6k. This is because the main content of the pages is not unique: game descriptions and instructions are provided by the developers and used on many gaming websites without alteration. We need to write a unique description and instructions, and describe the playing experience, for each game.

FAQ

What happens if a page is not indexed?

If a web page is not indexed, it will not show up in search results. Indexing and then ranking are required for a web page to appear in search results.

How do I fix page indexing issues?

First, learn why your pages are not indexed. Depending on your findings, fix the corresponding issues so search engines will index your web pages. Fix technical issues and increase the quality of your pages. Make sure they contain unique, long-form content that is helpful to people.

Why are my pages crawled but not indexed?

Duplicate content, thin content, or a Google penalty can all be reasons for pages being crawled but not indexed. In some cases a Google algorithm change can also cause an increase in the number of non-indexed pages.

Google wants to index unique content. If your pages have tens of similar duplicates without much distinction, do not expect Google to index all of them.

When should you worry about a decrease in indexed page count?

You should worry if your indexation ratio drops from more than 100% to below 50%. This means that half of your main content is not indexed.

Do not worry if your indexed pages drop from 500% to 150% of your page count. This means that Google previously indexed too much content and is now refining and cleaning up its index.

You can get more than 100% indexation when there are tags, categories, and various filters for listing your content: for example, custom price ranges, colors, sizes, or other custom properties. All these variations can create multiple listing variations of your content.

Think about indexing your main content, such as products and posts. Do not worry much about categories and faceted navigation listings.

Conclusion

Page indexing is an important part of search engine optimization. In an ideal situation we want the number of indexed pages to equal the number of pages on our website. Website content can be blog posts, games, classified ads, e-commerce products, videos, recipes, etc.

To keep our indexed page count close to the number of pages on our website, we need to keep an eye on indexed pages. If the indexed page count drops dramatically, check for emerging issues in the detailed page indexing report. Additionally, use external tools to check for duplicate content on your website.

Fix issues by updating your link and content structure where possible. Increase the quality of your content by adding more unique content to your pages.

When content updates on your website are done right, you will see your indexed page count slowly start to recover.

Do not expect 100% of your pages to be indexed, especially on big sites with more than 10k pages.

For small sites you may get more than 100% indexed pages because of categories, tags, author pages, etc. Do not get excited about it. Your main goal should be the indexing of unique content like blog posts, because they are the main content of your blog.

Learn more SEO optimization tips for your WordPress blog in our beginner-friendly guide.
