Getting your web pages indexed and ranked is an important part of SEO. You want all of your web pages to be indexed, and you want to make sure they are not removed from the Google index.
Over time, a Google algorithm update or a change on your website can cause the number of indexed pages to drop dramatically. You need to notice such drops and fix the underlying issues.
In this tutorial you will learn how to detect when the number of indexed pages is decreasing and how to fix it.
I also share a case study where the number of indexed pages dropped and, after some fixes, started to recover. Most images in this tutorial come from reports related to that case study.
How to know if indexed pages have dropped?
There are two ways to check whether the number of indexed pages has dropped. The first is the automated page indexing report in Google Search Console. The second is a manual check with the “site:” command in regular Google search.
Now let’s look at how each method can be used to spot a drop in indexed pages.
Check the page indexing report in Google Search Console
Register your website with Google Search Console to view detailed page indexing reports. In Search Console you can see historical data for the last 3 months.
For a growing website you should generally see a steadily rising graph of indexed pages. If the number of indexed pages drops, the report will show it clearly.
Clicking the full report reveals more information about why the number of indexed pages on your website has decreased.
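If you write down the indexed page count from the report at regular intervals, a few lines of code can flag a drop before it becomes dramatic. The sketch below is only an illustration: the weekly counts are made-up numbers, and the function name `find_drops` and the 20% threshold are assumptions of this example, not something Search Console provides.

```python
from typing import List, Tuple


def find_drops(counts: List[int], threshold: float = 0.2) -> List[Tuple[int, float]]:
    """Return (position, fractional drop) for every count that fell by more
    than `threshold` compared with the count just before it."""
    drops = []
    for i in range(1, len(counts)):
        previous, current = counts[i - 1], counts[i]
        if previous <= 0:
            continue  # nothing to compare against
        drop = (previous - current) / previous
        if drop > threshold:
            drops.append((i, round(drop, 3)))
    return drops


# Hypothetical weekly indexed-page counts, noted by hand from the report.
weekly_counts = [24000, 25000, 26000, 18000, 9000, 6000]

for position, drop in find_drops(weekly_counts):
    print(f"Week {position}: indexed pages fell by {drop:.0%}")
```

A lower threshold catches smaller drops but also produces more false alarms on sites whose counts fluctuate normally.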
Check the number of indexed pages with the site: command in Google search
As a basic alternative to Google Search Console, you can use the site:example.com command to get a rough total of indexed pages. It works for any website, so it is also a good way to check your competitors.
Subdomains and subdirectories can also be used with this command. For example, you can restrict the command to the category URLs of a WordPress website to view all indexed categories.
This command shows only the current number of indexed pages. For this reason it is not the best tool for tracking progress: it will not tell you whether your indexed pages are increasing or decreasing.
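If you check several sites or sections regularly, building the queries in code saves some typing. This is a small sketch under stated assumptions: `site_query` and `search_url` are hypothetical helper names, and the `category` path is only a placeholder for whatever subdirectory your own site uses.

```python
from urllib.parse import quote_plus


def site_query(domain: str, path: str = "") -> str:
    """Build a site: query for a domain, subdomain or subdirectory."""
    target = domain.strip().rstrip("/")
    if path:
        target += "/" + path.strip("/")
    return f"site:{target}"


def search_url(query: str) -> str:
    """Turn a query into a regular Google search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Whole site, a subdomain, and a placeholder subdirectory.
for query in (
    site_query("example.com"),
    site_query("blog.example.com"),
    site_query("example.com", "category"),
):
    print(query, "->", search_url(query))
```

The script only prints the queries. The result count still has to be read in the browser, and it remains a snapshot of the current state.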
Reasons for indexed pages to be removed (with solutions)
There are many reasons for indexed pages being removed from the Google index. Some removals are intentional: a changed URL permalink, removal of old content, a noindex tag on unimportant pages, access to protected content blocked with robots.txt, etc.
Google also shows the source of removal. Currently there are 2 sources.
- Website – the reason for removal is controlled by the website. You can fix this if removal from the Google index is caused by an error on your website.
- Google systems – Google's algorithm decides not to index pages or to remove already indexed pages. These cannot be fixed directly, but you can address them by improving your on-page content: add more content and make it unique and helpful to users.
In the above image we can see that the majority of pages were deindexed with the description “Crawled – currently not indexed”, a decision made by “Google systems”. On this 40k-page website, 22k pages (55%) are not indexed for this reason.
Let’s look at each reason for pages being removed from the index and how to fix it when the deindexation is not desired.
Website: Page with redirect
Indexed pages will be removed when they are redirected to a new page. Redirection is common when the permalink of old content changes or when the content is merged into another page.
If the redirect is intentional, it is normal for those pages to be removed from the Google index. If it is a mistake, review the affected pages by clicking the “Page with redirect” row and remove any redirects that should not be there.
Website: Blocked by robots.txt
Robots.txt can be used to hide some parts of your website from search engine crawlers. These are generally pages that should not be publicly available, for example some images or PDF files, admin pages etc.
Check the blocked pages in the report and remove the matching rules from your robots.txt file if those pages should be indexed.
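To see which rule is responsible, you can test the reported URLs against your robots.txt locally. The sketch below uses Python's standard `urllib.robotparser`. The rules and URLs are invented for illustration, and `blocked_urls` is a hypothetical helper name. Note that this parser does not implement Google's wildcard matching, so treat the result as a first check rather than a verdict.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: Iterable[str],
                 user_agent: str = "Googlebot") -> List[str]:
    """Return the URLs that the given robots.txt rules disallow for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Made-up rules: the /games/ line is the kind of mistake to look for.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/
Disallow: /games/
"""

SAMPLE_URLS = [
    "https://example.com/admin/login",
    "https://example.com/games/chess",
    "https://example.com/about",
]

for url in blocked_urls(SAMPLE_ROBOTS, SAMPLE_URLS):
    print("Blocked:", url)
```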
Website: Not found (404)
Over time some pages on your website will be removed because they are outdated and no longer in demand. This can be old information that is not needed any more.
Check whether any old pages were deleted by mistake, or whether any currently valid pages are inaccessible to web crawlers and return a 404 status code. Fix those pages so they no longer return 404, or recreate them if they are important.
Website: Excluded by ‘noindex’ tag
The noindex tag is used to keep some pages out of the Google index on purpose. For example, on category pages you can index the first page and noindex the other paginated pages, because the second page of a category is not very important to show in the Google index. Site owners often prefer showing only the first category page, as there is not much value in showing the 37th page of a category in search results.
The same applies to tags. When both tags and categories are used on a website, you may prefer to index only the categories and avoid indexing tag pages, because they may duplicate content. For example, let’s say you list products with the tags smartphone, apple, ios, android, samsung, xiaomi etc. Most items in those tags will be repeated.
Check the detailed report and remove the noindex tag if you think those pages should be indexed.
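When many pages are reported, it helps to confirm which ones really carry the tag. The following is a minimal sketch using only the standard library. It assumes you already have the page HTML as a string, and it only looks at robots meta tags, not at an `X-Robots-Tag` HTTP header, which can also set noindex. `RobotsMetaParser` and `has_noindex` are names made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html: str) -> bool:
    """True if the page HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```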
Website: Alternative page with proper canonical tag
Canonical tags are used to point to the main page from differently formatted versions of it. For example, items in a category can be sorted by date, price or name and displayed as a grid or a list. All of these are variations of the same category and should point to the category page with default sorting.
Check detailed reports and fix canonical tags if needed.
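A quick way to review a reported page is to read its canonical tag and compare it with the URL that was crawled. The sketch below assumes the HTML is already available as a string. The URLs are placeholders, and `canonical_of` and `is_variation` are hypothetical helper names.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        if "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")


def canonical_of(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    parser.close()
    return parser.canonical


def is_variation(page_url: str, html: str) -> bool:
    """True if the page declares a canonical URL different from its own URL."""
    canonical = canonical_of(html)
    return canonical is not None and canonical != page_url


html = '<head><link rel="canonical" href="https://example.com/category/puzzle/"></head>'
print(is_variation("https://example.com/category/puzzle/?sort=price", html))
```

A sorted or filtered listing that reports itself as a variation is expected. A main content page that does so is worth a closer look.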
Website: Blocked due to access forbidden (403)
403 responses are generally used to block access to content for some users and bots.
Check the detailed report and remove Googlebot from the block list on your web server.
Website: Soft 404
The page tells visitors that the content was not found, but the server does not return a proper 404 status code.
Make sure that all "not found" pages return a proper 404 status code.
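You can look for such pages yourself with a rough heuristic: a response that claims success but whose text reads like an error page, or is nearly empty. This is only a sketch. The phrase list, the 20-word minimum and the name `looks_like_soft_404` are assumptions of this example and say nothing about how Google itself classifies pages.

```python
# Phrases that commonly appear on "not found" pages (an assumed, incomplete list).
NOT_FOUND_PHRASES = ("page not found", "no longer available", "nothing found")


def looks_like_soft_404(status_code: int, body_text: str, min_words: int = 20) -> bool:
    """Heuristic: status 200 combined with error-like or almost empty text."""
    if status_code != 200:
        return False  # a real 404 or 410 is not a soft 404
    text = body_text.lower()
    if any(phrase in text for phrase in NOT_FOUND_PHRASES):
        return True
    return len(text.split()) < min_words


print(looks_like_soft_404(200, "Sorry, page not found"))
print(looks_like_soft_404(404, "Sorry, page not found"))
```

Pages flagged this way should be checked by hand, since a valid article can mention one of these phrases.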
Google systems: Crawled – currently not indexed
Pages are crawled but not indexed after evaluation by Google's algorithm. This is the most common reason for pages not being indexed or being removed from the index.
The exact reason is not shown, but here are the most likely causes:
When there is not much content on a page, Google does not consider it valuable and does not index it.
Add more content to the page. As a rule of thumb, aim for 1000-2000 words per page. When there are only a few words on a page, a search engine cannot understand it well and does not know how to classify and rank it.
Write more content and add some images and videos. Stay on the same topic for the whole page, and cover every aspect of the main topic in detail.
You can detect thin content by viewing the detailed reports. Click on the reported items to open the pages on your website. If those pages do not have much content, write more.
You can also combine thin content pages into one, as long as they cover the same topic. After combining, redirect the old pages to the new combined page.
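For a long list of reported URLs, a word count gives a quick first filter for thin pages. The sketch below counts visible words in an HTML string with the standard library and uses the lower end of the 1000-2000 word guideline above as its default. `TextExtractor`, `word_count` and `is_thin` are names invented for this example, and a raw word count also includes navigation and footer text, so it overestimates the real content.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def word_count(html: str) -> int:
    """Count words in the visible text of an HTML document."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return len(re.findall(r"\w+", " ".join(extractor.parts)))


def is_thin(html: str, min_words: int = 1000) -> bool:
    """True if the page has fewer visible words than min_words."""
    return word_count(html) < min_words


sample = "<p>Play this <b>fun</b> game</p><script>var x = 1;</script>"
print(word_count(sample), is_thin(sample))
```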
Duplicate content is another reason for pages being crawled but not indexed.
There is no tool or report in Search Console to detect duplicate content. Instead, use external services to check for duplicate content within your website.
Siteliner will check the given URL and follow all links it finds on your website. Siteliner is free for scanning up to 250 pages. That is enough to detect duplicate content issues for most websites, even ones with many thousands of pages.
After you enter your website address, Siteliner scans the pages and generates a report. There you can see whether the share of duplicate content is too high for your website.
Click on the report to see which pages have duplicate content.
The duplicate content report lists pages with their percentage of duplicate content. Check the pages in the report and think of ways to fix the duplication.
If a page has 99% duplicate content and it is a variation of another page, use a canonical link tag (rel="canonical") to point the duplicates to the main page.
For pages with 30-80% duplicate content, review the pages and remove or rewrite the content that is reported as duplicate.
On this particular website the duplicate pages are game pages with related games. Most games have similar descriptions, playing instructions and related games. Fix this by writing unique descriptions and instructions.
There is not much you can do about related games, because they may be similar based on the tags used to classify those games.
On category pages this site was showing excerpts generated from the game descriptions. Removing the excerpts from category pages reduced the duplicate content percentage on most category pages.
Generally, to fix duplicate content you need to write long-form unique content for each page. Longer unique content reduces the duplicate percentage of a page.
For example, a page with 100 words of its own and a related games block (about 100 repeated words for 20 game titles) will have about 50% of its content duplicated.
The same page with 1000 words and 20 related games will show about 10% duplicate content.
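The arithmetic behind these two numbers is simply the repeated words divided by all words on the page. The sketch below reproduces it. It is a simplified model for illustration only, not the formula Siteliner uses, and `duplicate_share` is a made-up name.

```python
def duplicate_share(unique_words: int, repeated_words: int) -> float:
    """Percentage of a page's words that are repeated elsewhere on the site."""
    total = unique_words + repeated_words
    if total == 0:
        return 0.0
    return 100.0 * repeated_words / total


# 100 unique words plus about 100 repeated words in the related games block.
print(duplicate_share(100, 100))

# 1000 unique words with the same related games block.
print(round(duplicate_share(1000, 100), 1))
```

The first call gives 50%. The second gives roughly 9%, which is the "about 10%" quoted above.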
If your website used black hat SEO techniques, either on site or in off-site backlink building, it can get deindexed.
Do not use any black hat SEO techniques at all. Cheating tends to be detected and penalized sooner or later.
Google systems: Discovered – currently not indexed
Google's crawler regularly discovers new URLs on your website.
Discovered URLs are usually crawled sooner or later. You do not have to do anything about them.
Google systems: Duplicate, Google chose different canonical than user
Sometimes Google's systems will choose a different canonical than the one you defined. This usually affects only a couple of pages.
In most cases you do not have to do anything. If you see really valuable pages here that are indexed under different URLs, change the canonical for those pages to the value that Google has chosen.
Learn more about indexed pages on the Google help page.
Case study: Recovery of indexed pages for a 40k-page website
A game website with 40k games had 26k pages indexed. My aim was to increase the number of indexed pages. Then, a couple of months ago, the number of indexed pages started to decrease dramatically.
Reason for the decrease in indexed pages
To find the reason for the decreasing number of indexed pages we checked the Search Console report. One report grew dramatically as the number of indexed pages fell: “Crawled – currently not indexed”.
However, the exact reason for crawling but not indexing is not displayed. To find it we used an additional SEO tool, Siteliner.
Additional screenshots from Siteliner and Google Search Console are shown above.
The detailed report showed that category, author and top games pages had too much duplicate content. On those pages games were listed with excerpts truncated from the default description text, which was repeated across all of those pages.
Solution for recovering the number of indexed pages
Quick fix: We solved it by removing game excerpts from game listing pages (category, author, top games etc.). Those pages now have only images and game titles.
Complete solution: Write unique descriptions for category and game pages.
Games are aggregated from game distribution websites. All sites using those games get the same game name, description, categories and images. To make this website unique it is essential to write unique game descriptions, add more unique screenshots, record gameplay videos etc.
When a website has a lot of content, in this case 40k games, you can start optimizing with the best performing pages, such as the most visited, most highly rated and most commented pages.
Applying the complete solution requires more time and resources. It will be applied gradually over time.
Result after fixing content issues
Initially 26k pages were indexed. Then the number decreased to 6k. After applying the quick fix mentioned above, indexed pages started to increase, and the count is currently 12k.
What happens if a page is not indexed?
If a web page is not indexed, it will not show up in search results. A page has to be indexed and then ranked before it can appear in search results.
How do I fix page indexing issues?
First learn why your pages are not indexed. Depending on your findings, fix the corresponding issues so that search engines will index your web pages. Fix technical issues and increase the quality of your pages. Make sure each page has unique, long content that is helpful for people.
Why are my pages crawled but not indexed?
Duplicate content, thin content or a Google penalty can be reasons for pages being crawled but not indexed. In some cases a Google algorithm change can also increase the number of pages that are not indexed.
Google wants to index unique content. If your pages have tens of similar duplicates without much distinction, do not expect Google to index all of them.
When should you worry about a decrease in indexed page count?
You should worry if indexed pages drop from more than 100% to below 50%. This means that half of your main content is not indexed.
Do not worry if your indexed pages drop from 500% to 150%, because this means that Google previously indexed too much content and is now refining and cleaning up its index.
You can get more than 100% indexation when tags, categories and various filters are used for listing your content, for example a custom price range, custom colors, size or other custom properties. All of these can create multiple listing variations of your content.
Focus on indexing your main content, such as products and posts. Do not worry much about categories and faceted navigation listings.
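The percentages above are the indexed page count divided by the number of main content pages. The sketch below computes that ratio and applies the rule of thumb in a simplified form: it flags any drop that ends below 50%, whatever the starting point. That simplification, the 50% default and the names `indexation_pct` and `should_worry` are assumptions of this example.

```python
def indexation_pct(indexed_pages: int, main_content_pages: int) -> float:
    """Indexed pages as a percentage of main content pages (posts, products, games)."""
    if main_content_pages <= 0:
        raise ValueError("main_content_pages must be positive")
    return 100.0 * indexed_pages / main_content_pages


def should_worry(before_pct: float, after_pct: float, floor_pct: float = 50.0) -> bool:
    """True if indexation dropped and now sits below floor_pct."""
    return after_pct < before_pct and after_pct < floor_pct


# Numbers from the case study: 40k games, 26k indexed pages falling to 6k.
before = indexation_pct(26000, 40000)
after = indexation_pct(6000, 40000)
print(before, after, should_worry(before, after))

# A drop from 500% to 150% is index cleanup, not a problem.
print(should_worry(500.0, 150.0))
```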
Page indexing is an important part of search engine optimization. In an ideal situation we want the number of indexed pages to equal the number of content pages on our website. Website content can be blog posts, games, classified ads, e-commerce products, videos, recipes etc.
To keep the indexed page count close to the number of pages on our website, we need to keep an eye on indexed pages. If the indexed page count drops dramatically, check for growing issues in the detailed page indexing report. Additionally, use external tools to check for duplicate content on your website.
Fix the issues by updating your link and content structure where possible. Increase the quality of your content by adding more unique content to your pages.
When content updates on your website are done right, you will see your indexed page count slowly start to recover.
Do not expect 100% of your pages to be indexed, especially on big sites with more than 10k pages.
On small sites you may get more than 100% indexed pages because of categories, tags, author pages etc. Do not get excited about it. Your main goal should be indexing your unique content, such as blog posts, because they are the main content of your blog.