When carrying out a website SEO audit, the number one go-to tool that I will always use to get insights into organic and technical performance is Google Search Console (GSC). Whilst many third-party tools will review your on-page SEO and make subjective suggestions on how you can improve your site’s organic and user performance, GSC gives you insights into how Google is crawling and indexing your site: what is working, what isn’t, and what is broken.
Whilst it doesn’t tell you how to fix these issues, the information it provides, when assessed and resolved in the right way, will undoubtedly help you take your website to the next level.
Key Sections In Google Search Console
The areas it provides insights on are as follows:
Indexing
Reports which of your site’s pages are included in Google’s search results and flags issues such as crawl budget, noindex tags, and duplicate content.
Experience
Evaluates user experience, focusing on metrics like load time, mobile usability, and interactivity. Pages with poor user experience may rank lower.
Shopping
Assesses product-related pages for proper indexing and display in shopping search results. It ensures product data is correctly formatted and accessible.
Enhancements
Reviews structured data to provide rich results in search, such as snippets, cards, and other interactive elements. Proper implementation can improve visibility.
Security & Manual Actions
Checks for security issues, such as malware or hacking, and for violations of Google’s guidelines. Penalties or warnings reported here can affect site visibility and ranking.
The area that this blog focuses on is the Indexing section – this is where most questions are raised, the most popular being “why isn’t Google indexing my site?”, one that I’ve looked to answer in a previous blog post here:
How To: Find Out If Your Sites URLs Are Being Crawled & Indexed by Google
As you can see from the search impression data for this post, lots of folk continually want to know whether Google is crawling and indexing their site:
The “New Reason Preventing Your Pages From Being Indexed” Email
Further to this, in my role as a freelance SEO consultant I work with a number of website and business owners who are set up on GSC. If you are too, you will know that you regularly receive emails titled “New reason preventing your pages from being indexed”.
I get such emails forwarded on a regular basis. Most of the time they are nothing to worry about, but the tone of the emails always seems to worry folk that they have big issues with their site:
So, the aim of this blog is to explain the different reasons Google reports for not indexing your pages, and what you can do about them.
Google Search Console Page Indexing Report
When you log into GSC and navigate to the page indexing report, you will see two sections: pages which are indexed, and those which aren’t, with the reasons and counts listed beneath:
You can then click into each section to get examples of the URLs that fall into each category. The most common categories and what to do about them are as follows:
Not found (404)
The page returns a 404 error when Google tries to access it. This happens if the page has been deleted or the URL is incorrect.
Example:
A deleted blog post or product page.
Solution:
Use a 301 redirect to guide users to a relevant page if the content has moved or been replaced.
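How you implement the redirect will depend on your platform or CMS, but as a rough illustration, here is a minimal sketch of a 301 redirect written as a Python Flask route – the /old-blog-post and /new-blog-post paths are hypothetical examples, not anything from a real site:

# Minimal sketch of a 301 redirect, assuming a Flask app.
# The /old-blog-post and /new-blog-post paths are hypothetical examples.
from flask import Flask, redirect

app = Flask(__name__)

@app.route("/old-blog-post")
def old_blog_post():
    # 301 tells both users and Googlebot that the content has moved permanently
    return redirect("/new-blog-post", code=301)

if __name__ == "__main__":
    app.run()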
Soft 404
The page returns a user-friendly “not found” message but not a 404 HTTP response code, so Google treats it as a soft 404. SEOs have generally not considered these a bad thing in the past, but here are some insights from Google’s Gary Illyes as to why you should resolve them.
Example:
A custom error page that does not use a 404 status code.
Solution:
Ensure that truly “not found” pages return a 404 response code and provide more information on the page to clarify its status.
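A quick way to sanity-check this is to request a URL you know doesn’t exist and look at the HTTP status code it returns. Here is a small sketch using Python’s requests library – the URL is a made-up example:

# Sketch: check what status code your "not found" page actually returns.
import requests

resp = requests.get(
    "https://www.example.com/this-page-does-not-exist",  # hypothetical URL
    allow_redirects=False,
)
# A genuine error page should return 404 (or 410), not 200
print(resp.status_code)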
Duplicate, Google chose different canonical than user
Google has indexed a different URL as the canonical version instead of the one specified by the user.
Example:
Multiple URLs with similar content where Google prefers a different canonical URL. This often occurs on ecommerce sites where the same product page exists at multiple URLs, e.g.
https://www.example.com/product-page
https://www.example.com/category-1/product-page
https://www.example.com/category-2/product-page
Solution:
Verify the chosen canonical URL and adjust it to align with the preferred content.
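If you want to see which canonical a given URL currently declares, GSC’s URL Inspection Tool will show you both the declared and the Google-selected canonical, but you can also pull it out of the HTML yourself. Here is a rough sketch using Python’s requests and BeautifulSoup libraries, with one of the hypothetical duplicate URLs from above:

# Sketch: read the canonical tag a page declares.
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

url = "https://www.example.com/category-1/product-page"  # hypothetical duplicate URL
soup = BeautifulSoup(requests.get(url).text, "html.parser")
tag = soup.find("link", rel="canonical")
print(tag["href"] if tag else "No canonical tag found")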
Duplicate without user-selected canonical
The page is a duplicate of another page but lacks a canonical tag. Google has chosen a different page as the canonical.
Example:
Several product pages with identical descriptions and no canonical tag. Search filter pages are often culprits of this kind of thing, e.g.
https://www.example.com/products?filter=color-red
https://www.example.com/products?filter=size-medium
https://www.example.com/products?filter=color-red&size-medium
https://www.example.com/products?sort=price-asc
https://www.example.com/products?filter=color-red&sort=price-asc
Solution:
Add canonical tags to all duplicate pages to specify the preferred version. Alternatively, add a noindex tag to any page with URL filters if they are indexed, and then, once they have been de-indexed, block them using robots.txt.
Excluded by ‘noindex’ tag
The page has a noindex directive, which prevents it from being indexed by Google.
Example:
Admin pages or thank-you pages with a noindex tag, or internal search results URLs such as:
https://www.example.com/search?q=sneakers
https://www.example.com/search?q=running+shoes
https://www.example.com/search?q=athletic+shoes
https://www.example.com/search?q=footwear+shoes
https://www.example.com/search?q=sport+shoes
Solution:
Remove the noindex tag if the page should be indexed and visible in search results. If you have a high number of such pages that don’t need to be crawled, block them using robots.txt.
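Bear in mind that noindex can be set either as a meta tag in the page HTML or as an X-Robots-Tag HTTP response header, so it is worth checking both. Here is a rough sketch using requests and BeautifulSoup, with a hypothetical internal search URL:

# Sketch: check whether a URL is noindexed via meta tag or HTTP header.
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4

url = "https://www.example.com/search?q=sneakers"  # hypothetical search URL
resp = requests.get(url)

header = resp.headers.get("X-Robots-Tag", "")
meta = BeautifulSoup(resp.text, "html.parser").find("meta", attrs={"name": "robots"})
content = meta.get("content", "") if meta else ""

# True if the page is noindexed via either method
print("noindex" in header.lower() or "noindex" in content.lower())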
Alternate page with proper canonical tag
The page is an alternate version (e.g., AMP, mobile) and correctly points to the canonical page.
Example:
AMP pages pointing to their desktop equivalents.
Solution:
No action required if the canonical tag is correctly implemented.
Page with redirect
The page has a redirect in place, leading users and crawlers to a different URL.
Example:
A page that has permanently moved to a new URL.
Solution:
Ensure the redirect is appropriate and leads to the correct destination.
Redirect Error
There is an issue with the redirect, such as loops or excessive length.
Example:
A redirect chain that is too long or loops back to itself.
Solution:
Simplify and correct the redirection paths to avoid errors.
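You can see the full chain of hops that users and Googlebot are being sent through with a quick check like the one below (the starting URL is a made-up example). If there are more than a couple of hops, or requests raises a TooManyRedirects error, you have found your problem:

# Sketch: follow a redirect chain and print every hop.
import requests

url = "https://www.example.com/old-page"  # hypothetical redirecting URL
resp = requests.get(url)  # requests follows redirects by default

for hop in resp.history:
    print(hop.status_code, hop.url)
print("Final:", resp.status_code, resp.url)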
Blocked by Robots.txt
The page is blocked from being crawled by directives in the robots.txt file.
Example:
Pages like admin sections or staging environments are blocked.
Solution:
Update the robots.txt file to allow crawling if these pages should be indexed.
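Python’s standard library includes a robots.txt parser, so a quick sketch like this will tell you whether your current rules allow Googlebot to crawl a given URL – both URLs are hypothetical examples:

# Sketch: check whether robots.txt allows Googlebot to crawl a URL.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://www.example.com/robots.txt")  # hypothetical site
rp.read()
print(rp.can_fetch("Googlebot", "https://www.example.com/some-page"))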
Blocked Due to Access Forbidden (403)
The page returns a 403 error, indicating that access is forbidden.
Example:
Restricted access areas requiring authentication.
Solution:
Modify server settings to allow Googlebot access or remove access restrictions.
Blocked Due to Unauthorized Request (401)
The page returns a 401 error, requiring authorization to access.
Example:
Login pages or restricted content areas.
Solution:
Allow Googlebot to bypass authorization or adjust settings to permit access.
Blocked Due to Other 4xx Issue
The page returns a different 4xx error, such as 410 (Gone) or 451 (Unavailable for legal reasons).
Example:
Pages removed due to legal reasons or permanently deleted content.
Solution:
Investigate and resolve the specific 4xx issue.
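If you export the list of affected URLs from GSC, a quick bulk check of their status codes will show exactly which 4xx codes are being returned. A rough sketch, with made-up URLs:

# Sketch: bulk-check the status codes of URLs flagged in GSC.
import requests

urls = [
    "https://www.example.com/removed-product",  # hypothetical examples
    "https://www.example.com/legacy-page",
]

for url in urls:
    resp = requests.get(url, allow_redirects=False)
    print(resp.status_code, url)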
Server Error (5xx)
The server returns a 500-level error, indicating server-side issues.
Example:
Server overloads, misconfigurations, or temporary outages.
Solution:
Fix server errors to ensure the page can be accessed and indexed.
Crawled – Currently Not Indexed
Google has crawled the page but decided not to index it yet.
Example:
Pages with low-quality content or thin pages.
Solution:
Improve page content and quality, then request indexing again.
Discovered – Currently Not Indexed
Google has discovered the page but has not yet crawled it, often due to crawl budget limitations.
Example:
New pages or recently updated content.
Solution:
Wait for Google to crawl the page or use the URL Inspection Tool to request indexing.
So To Sum It All Up
By understanding and addressing these common problems, you can ensure that your website’s pages are properly indexed and that your crawl rate is optimal, which in turn improves visibility and performance in search results.
If you encounter any of these issues and need further assistance, feel free to contact me. I’m here to help you optimise your website and improve its visibility in search results. It’s what I do.
About The Author - Dave Ashworth
I would describe myself as an SEO Expert and a specialist in technical optimisation with a professional approach to making websites better for people and better for search engines.
When I'm not blogging, I deliver website optimisation consultancy and organic SEO solutions by addressing a website's technical issues and identifying opportunities for growth.