I’ve just had someone contact me on LinkedIn with a technical SEO question, and it has resulted in this blog post being written, as there must be loads of people out there who run into this problem. When you work on big sites, in particular e-commerce websites, they grow arms and legs: new products result in thousands of pages and URLs being created over time, all of which are clearly part of the ongoing growth of the business.
As an SEO, the volume of pages is all well and good when they are all working and ranking well, but realistically, when there are tons of URLs being indexed over time, many of them will, for one reason or another, no longer actually be useful.
Products end up being discontinued, things change, and before you know it you may have 5,000 URLs that no longer need to be indexed by Google. You simply can’t afford to rip them down, as serving 404 errors all over your site will potentially drain the power of your website. So an easy fix would be to simply noindex the pages, but in a lot of cases this will not work out well for you.
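For reference, the noindex approach being talked about here usually means a robots meta tag in each page’s head (the tag below is the standard form; where exactly you add it depends on your CMS or templates):

```html
<!-- Placed in the <head> of each page you want removed from the index -->
<meta name="robots" content="noindex">
```

The same instruction can also be sent as an X-Robots-Tag HTTP header for non-HTML files like PDFs.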
The problem is that when Google has been able to crawl and index a page for a prolonged period of time and you then ask it to noindex the page, in most cases it will simply keep crawling the page and the page will remain in Google’s index. I’m not Google and have no idea why they won’t react to the tag; all I know is that on the occasions I’ve tried this, it simply didn’t work. I can only presume this is some kind of flaw in their indexing system, and I’m sure one day it will be rectified, but even in 2017 we are still running into these problems on a regular basis.
Now, thinking about this logically: if Google sees a 404 error page, it flags up as a warning. They won’t take the page off the index right away, but they will come back a few more times, and if the same error keeps appearing, that URL eventually gets taken off the index.
Google can respond to error codes no problem and take your page off their index very quickly. However, having loads of 404 errors can have a severe impact on your SEO, so it isn’t wise to use this approach: you could have been building links to the page, and the URL may have some authority that you simply don’t want to throw away.
Now, after trying all of the usual methods, I read somewhere that configuring your server to serve a 410 status code is the best thing to do in this situation, and being stuck with Google continuing to index pages, it was time to test a few methods that others believed to be true. By serving this code (410) for the URLs you want de-indexed, you are basically telling Google the page is permanently gone and won’t be coming back, which is why this status code is referred to as 410 Gone.
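If your site runs on Apache, this can be set up in a few lines; the sketch below assumes mod_alias (and mod_rewrite for the second variant) is enabled, and the paths are purely illustrative:

```apache
# Serve 410 Gone for an individual retired product URL (mod_alias):
Redirect gone /products/old-widget

# Or, with mod_rewrite, mark a whole retired section as gone:
RewriteEngine On
RewriteRule ^discontinued/ - [G]
```

Nginx and other servers have equivalent options for returning a fixed status code for matching URLs.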
This is a fairly easy thing to configure. Technically a 404 and a 410 are closely related, but a 404 is treated as a possibly temporary error, whereas a 410 says the page is completely and permanently gone, and Google (via Matt Cutts) have confirmed that a 410 can be treated differently to a 404 via this blog post.
I’ve had this issue several times over the past few years, and the 410 status code worked a treat for me and got the pages de-indexed. I was stuck in limbo for months on end trying to work out a solution, and when I found out how simple this method was, I was kicking myself for not trying it earlier.
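Before waiting on Google to recrawl, it’s worth confirming your server really is returning a 410 for the retired URLs. The self-contained sketch below (Python standard library only, all URLs and paths hypothetical) spins up a tiny local server that answers 410 Gone for a list of retired paths, plus a helper you can point at any URL to see what status code it actually returns:

```python
import threading
import urllib.request
import urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical list of discontinued product URLs we want de-indexed.
GONE_PATHS = {"/products/old-widget", "/products/retired-gadget"}

class GoneAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path in GONE_PATHS:
            # 410 tells crawlers the page is permanently gone.
            self.send_response(410)
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

def fetch_status(url):
    """Return the HTTP status code a URL responds with."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the code is what we want to inspect.
        return err.code

# Start the demo server on an OS-assigned free port.
server = HTTPServer(("127.0.0.1", 0), GoneAwareHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

live = fetch_status(f"http://127.0.0.1:{port}/products/current-widget")
gone = fetch_status(f"http://127.0.0.1:{port}/products/old-widget")
print(live, gone)  # 200 410
```

In practice you would call `fetch_status` against your own production URLs; anything meant to be de-indexed should come back 410, not 404 or 200.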
So hopefully this will help anyone who is having problems with Google not de-indexing their pages after trying the noindex tag and the other methods that are out there.
If you do have any questions do get in touch.