Every blogger knows how important it is to fix the broken links that Google reports. Still, opening Webmaster Tools to find 12,000 crawl errors staring back at you can be overwhelming. The key is knowing which errors you can safely ignore and which ones are critical and need your immediate attention.
It is important to keep a close eye on these errors because of the impact they can have on your users as well as on Google’s crawlers.
Crawl errors usually arise from server issues or from links that still point to a post you have already deleted. When a search engine crawler follows such a link, it receives a 404 error. If these errors are not addressed in time, a growing number of 404s gives your users a poor experience, and if a user hits a 404 error several times during one session, your site’s credibility takes a serious hit.
You also do not want to lose the link juice your site would get when other sites link to one of your dead URLs. Fixing the crawl error by redirecting the dead URL to a good page lets you capture that link and can help restore your rankings.
Also remember that Google allots a crawl budget to every site, including yours. If Googlebot spends most of that budget crawling your broken pages, it may overlook the valuable pages on your site that actually work.
Let’s now look at the main categories of crawl errors that show up in Google Webmaster Tools error reports.
1. HTTP
This section usually reports pages that return 403 (Forbidden) errors. These are generally not the biggest problem in Webmaster Tools.
2. In Sitemaps
Sitemap errors come from URLs listed in your current sitemap that return a 404 error. Google also has an irritating habit of repeatedly crawling old sitemaps that you deleted long ago, to check whether those sitemaps and their URLs are really dead.
Fix: Make sure the old sitemaps you have deleted return a 404. If you do not want Google to keep crawling an old sitemap that you removed from Webmaster Tools, also make sure it does not redirect to your current sitemap.
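To track down the first kind of sitemap error, you can compare the URLs listed in your current sitemap with the status codes a crawl returned for them. The sketch below is a minimal example using only Python’s standard library; the sitemap text and the `statuses` dictionary (URL to last known status code, for example exported from a crawler) are hypothetical inputs.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def dead_entries(xml_text, statuses):
    """Return sitemap URLs whose last known status (hypothetical crawl data) is 404 or 410."""
    return [url for url in sitemap_urls(xml_text) if statuses.get(url) in (404, 410)]
```

Any URL this returns is a candidate to fix or to drop from the current sitemap.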
3. Not Followed
Not followed errors are often caused by redirect problems. You can fix them by minimizing redirect chains, shortening redirect timers, and avoiding meta refreshes in your page heads.
Fix: Things to watch out for after implementing redirects:
• When you redirect a page permanently, make sure it returns the proper HTTP status code: 301 Moved Permanently
• Always avoid redirect loops that point back to themselves
• Never redirect pages to 404, 503 or 403 pages; point them to valid pages instead
• Redirects must point to pages that actually exist, not to empty pages
Always use absolute rather than relative links; this makes it harder for content scrapers to scrape your images or content.
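Most of the redirect checklist above can be checked mechanically once you have a crawl. The sketch below is a minimal Python example that follows a redirect chain through a hypothetical dictionary of recorded responses (URL to status code and `Location` header) and flags temporary redirects, loops, long chains and chains that end on an error page. It assumes `Location` values are already absolute URLs and only looks at status codes, so it cannot tell whether a target page is empty.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical crawl data: URL -> (HTTP status, Location header or None).
Responses = Dict[str, Tuple[int, Optional[str]]]

TEMPORARY = {302, 303, 307}
REDIRECTS = {301, 308} | TEMPORARY


def audit_redirect(start: str, responses: Responses, max_hops: int = 5) -> List[str]:
    """Follow a redirect chain through recorded responses and list any problems."""
    problems: List[str] = []
    chain = [start]
    url = start
    status, location = responses.get(url, (404, None))  # unknown URLs count as 404
    while status in REDIRECTS:
        if status in TEMPORARY:
            problems.append(f"{url} returns {status}; use 301 if the move is permanent")
        if location is None:
            problems.append(f"{url} returns {status} without a Location header")
            return problems
        if location in chain:  # the chain points back to a URL it already visited
            problems.append("redirect loop: " + " -> ".join(chain + [location]))
            return problems
        chain.append(location)
        if len(chain) - 1 > max_hops:
            problems.append(f"more than {max_hops} hops starting at {start}")
            return problems
        url = location
        status, location = responses.get(url, (404, None))
    hops = len(chain) - 1
    if hops > 1:
        problems.append(f"{hops}-hop chain; point {start} straight at {url}")
    if hops and status >= 400:  # redirected, but onto a 403/404/5xx page
        problems.append(f"redirect ends at {url} with status {status}")
    return problems
```

An empty list means the chain passed every check the sketch knows about.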
4. Not Found
Not found errors are mostly 404 errors generated by your site. They can occur in the following ways:
• A page on your site is deleted without a 301 redirect
• A page on your site is renamed without a 301 redirect
• An internal link on your site contains a typo and points to a non-existent page
• Another site links to your site using a mistyped URL
• Your subfolders no longer match after you migrate your site to a new domain
Fix: 301 redirect the good links that 404 to the page they were supposed to reach or, if that page has been removed permanently, to a similar page or its parent page. It is fine to let a page return a 404 if it is an old page or part of a large set of pages you no longer need. Google recommends this as a way of letting Googlebot know which pages you no longer want.
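The fix above boils down to a small decision: send a renamed page to its new URL, send a removed page to its closest surviving parent section, or otherwise let it 404. The sketch below is a minimal Python version of that rule; the `moved` mapping (old path to new path) and the `live_paths` set are hypothetical inputs you would build from your own site, and it deliberately never falls back to the home page.

```python
from typing import Dict, Optional, Set
from urllib.parse import urlsplit


def redirect_target(broken_url: str, moved: Dict[str, str], live_paths: Set[str]) -> Optional[str]:
    """Pick a 301 target for a 404'd URL, or return None to let it 404."""
    path = urlsplit(broken_url).path
    if path in moved:  # renamed or moved page: send it to its new home
        return moved[path]
    segments = [s for s in path.split("/") if s]
    if not segments:
        return None
    segments.pop()  # drop the missing page itself
    while segments:  # walk up to the nearest existing parent section
        parent = "/" + "/".join(segments) + "/"
        if parent in live_paths:
            return parent
        segments.pop()
    return None  # no sensible target: letting it 404 is fine
```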
5. Restricted by robots.txt
These are informational errors showing that your robots.txt file has blocked some of your URLs.
Fix: Check your robots.txt file and make sure that only the URLs you want blocked are on the list. Sometimes URLs that are not explicitly blocked show up on the list anyway; investigate these individually to find out why they are there. Running the affected URLs through URI Valet to see their response codes is a good way to investigate the problem.
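Before checking URLs one at a time, you can test a list of them against your robots.txt rules. The sketch below uses Python’s standard `urllib.robotparser`; the robots.txt contents and URLs are hypothetical. Note that this parser implements basic prefix matching and may not treat wildcard rules exactly the way Googlebot does, so treat its answers as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /search
"""


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt rules block for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]
```

Any URL in the result that you did not mean to block points to a rule that needs fixing.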
6. Soft 404s
If your pages look like landing pages or have very little content, Google may classify them as soft 404s. A soft 404 is a page that tells users it does not exist, or has almost nothing on it, but does not return the 404 Not Found HTTP response code (typically it returns 200 OK instead). Being classified as a soft 404 is not ideal: a page that really does not exist should be fixed so that it returns a hard 404.
Fix: Google recommends always returning a 404 or 410 response code for any request for a page that does not exist.
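The sketch below shows both halves of this advice in Python. `status_for` picks the response code for a path (200 for live pages, 410 for pages removed on purpose, 404 for everything else), and `looks_like_soft_404` is a rough heuristic for spotting pages that return 200 but read like error pages. The page sets, the phrase list and the 50-word threshold are all hypothetical and would need tuning for a real site; the heuristic also assumes `body` is the page’s visible text rather than raw HTML.

```python
from http import HTTPStatus

# Hypothetical phrases that suggest a page is really an error page.
ERROR_PHRASES = ("page not found", "no longer available", "nothing was found")


def status_for(path, live_pages, removed_pages):
    """Return the HTTP status a request for `path` should get."""
    if path in live_pages:
        return HTTPStatus.OK
    if path in removed_pages:
        return HTTPStatus.GONE  # 410: removed on purpose
    return HTTPStatus.NOT_FOUND  # 404: never existed or unknown


def looks_like_soft_404(status, body, min_words=50):
    """Flag 200 responses whose visible text looks like an error page or is very thin."""
    if status != HTTPStatus.OK:
        return False
    text = body.lower()
    return any(phrase in text for phrase in ERROR_PHRASES) or len(text.split()) < min_words
```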
7. Timed Out
Googlebot stops trying to fetch a page if it takes too long to load.
Types of timed out errors:
• DNS lookup timeout errors occur when Googlebot’s requests cannot reach your domain’s server. Check your DNS settings; if everything looks fine on your end, the problem may be on Google’s end
• URL timeout errors come from a specific page rather than the entire domain
• Robots.txt timeout errors occur when a robots.txt file exists but your server times out when Google tries to crawl it
Fix: Check your server logs for issues and look at the load times of the pages that are timing out.
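If your server logs record how long each request took, a short script can list the pages that are slow for Googlebot. The sketch below assumes a hypothetical log format: the Combined Log Format with the request time in seconds appended at the end of each line (nginx, for example, can be configured to log `$request_time`). It identifies Googlebot by user agent only, which can be spoofed, and the 2-second threshold is an arbitrary example.

```python
import re
from collections import defaultdict

# Hypothetical format: Combined Log Format followed by the request time in seconds.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) HTTP/[\d.]+" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)" (?P<secs>\d+(?:\.\d+)?)$'
)


def slow_googlebot_pages(lines, threshold=2.0):
    """Return {path: slowest time in seconds} for Googlebot requests at or above threshold."""
    times = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and "Googlebot" in match.group("agent"):
            times[match.group("path")].append(float(match.group("secs")))
    return {path: max(ts) for path, ts in times.items() if max(ts) >= threshold}
```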
8. Unreachable
Internal server errors or DNS issues can cause unreachable errors. These errors can also appear if the crawler is blocked by the robots.txt file while visiting a page. Responses that fall under unreachable errors include no response, 500 errors and DNS errors.
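To see which kind of unreachable error a URL produces from your own machine, you can fetch it and sort the failure into the categories above. The sketch below uses Python’s standard `urllib.request`; the category names are our own labels, not Google’s, and a result from your network does not guarantee what Googlebot sees.

```python
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map a fetch exception to a rough unreachable-error category."""
    if isinstance(exc, urllib.error.HTTPError):  # the server answered with an error code
        return f"server error ({exc.code})" if exc.code >= 500 else f"http error ({exc.code})"
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, socket.gaierror):  # hostname could not be resolved
        return "dns error"
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return "timeout"
    return "no response"  # refused or reset connections, etc.


def check_reachability(url, timeout=10.0):
    """Fetch `url` and return "ok (status)" or a failure category."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok ({resp.status})"
    except OSError as exc:  # URLError, HTTPError and socket errors all derive from OSError
        return classify_failure(exc)
```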
There are a few tools that can help you find and fix these errors:
• URI Valet or the Check Server tool to check your redirects
• Screaming Frog, an ideal tool for finding which pages on a site return 301, 404 or 500 responses
• The Siteopsys search engine indexing checker, which you can run against your list of URLs to check redirects
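For a quick check without any of these tools, a short script can request each URL in a list and tally the responses, similar in spirit to the status-code reports a crawler like Screaming Frog gives you. The sketch below is a minimal Python version that sends `HEAD` requests with `http.client` (which does not follow redirects, so 301s show up as 301s); it is not a replacement for a real crawler, and some servers answer `HEAD` differently from `GET`.

```python
import http.client
from collections import Counter
from urllib.parse import urlsplit


def status_of(url, timeout=10.0):
    """Return the HTTP status for a HEAD request to `url`, or None if it is unreachable."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        return conn.getresponse().status
    except OSError:  # DNS failure, refused connection, timeout
        return None
    finally:
        conn.close()


def summarize(statuses):
    """Count results per status class, given a mapping of URL -> status code or None."""
    counts = Counter("unreachable" if code is None else f"{code // 100}xx" for code in statuses.values())
    return dict(counts)
```

For example, `summarize({url: status_of(url) for url in urls})` over your own URL list gives a quick overview of how many pages redirect, 404 or fail outright.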