Registered User
Join Date: Sep 2005
Location: Yorkshire, England
Posts: 34
Hi,
One to watch out for if you want your site ranked well by Google (don't we all!).

When you update your web site, particularly when you are restructuring, some pages inevitably change address, leaving the old addresses no longer in use. If you do nothing about this, a visitor following an old address sees a standard error message along the lines of "404 Not Found", with a few hints about how to get to the home page or to check again later. If you are using ASP.NET, the default message is less useful as it does not give that link, just a warning that the resource was not found.

Either way, most web site creators like to swap this page for something a little more interesting, usually styled with the graphical layout used for the rest of the web site. You can do this by simply adding your new error page and, with Catalyst2, setting up custom error pages in the Helm control panel.

This all seems wonderfully easy, except for a gotcha with Google when using this method. (NB: This does appear to be very Google-specific; Yahoo and MSN don't seem to do this.)

When IIS (sorry, I don't know the impact if you are Linux-based, but hopefully someone will be able to add to this thread) sends out the standard 404 error page, the response header, which the human visitor never sees, includes the details of the error, that is "404 Not Found". When you change the standard error page to a custom error page using Helm, the page is displayed correctly, but its status is no longer 404; it is "200 OK". The human visitor does not care about this as it is not generally visible in the web browser.

However, Googlebot (Google's site indexing robot) remembers the addresses of all of the pages that it has indexed and revisits them. When a page has been moved, Googlebot requests the old address, sees a nice new page (your custom error page) with the status 200 OK, assumes that this is an updated web page within the site, and indexes it.
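Since the status is not visible in the browser, it can help to ask the server for it directly. Here is a minimal sketch in Python (not part of the IIS/Helm setup described here; the function names `fetch_status` and `looks_like_soft_404` are made up for illustration). It requests an address and reports the status code the server actually sends:

```python
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code the server sends for `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def looks_like_soft_404(url_of_missing_page: str) -> bool:
    """True if a page known to be missing is answered with 200 OK."""
    return fetch_status(url_of_missing_page) == 200
```

For example, calling `fetch_status("http://www.example.com/a-page-you-removed.html")` (a placeholder address) for a page you have deleted should come back as 404. If it comes back as 200, the custom error page is hiding the error from robots.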
Problem number 1 (impact minor): you generally don't want your error pages indexed.

Googlebot continues indexing your site and may see many of these pages after a site update, all with the same content. This becomes a problem because, as Google's guidelines state, duplicated content is frowned upon and causes your page ranks to be lowered. In the worst cases, your site can be removed from Google altogether for posting duplicate content. (Problem number 2, impact serious.)

So how do you resolve this issue? If you are using simple HTML custom error pages then you cannot fix this. The only solution is to change your error pages to a technology that can set the response status itself. For my site, I use ASP.NET, so this solution is geared around that technology.

What you need to achieve is to make sure that when Googlebot visits an error page, it knows that an error has occurred and does not try to index the page. Using ASP.NET, your '404notfound.aspx' page needs to display a nicely formatted error to your web site visitor and return the "404 Not Found" status for Googlebot to process.

This is actually quite simple to do. In the Page_Load event handler for the error page, simply add the following line of code:

    Response.Status = "404 Not Found";

This changes the page's status to the error message. You should, of course, have different statuses for different errors. You would not want to pass a 404 and tell Google that the page does not exist when actually there has been an internal server error.

Richard
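For anyone not on ASP.NET, the same principle can be sketched in another language. The rough example below uses Python's standard `http.server` module and is only an illustration, not how IIS or Helm work internally. The names `PAGES`, `NOT_FOUND_PAGE`, `SERVER_ERROR_PAGE`, `build_response` and `SiteHandler` are all assumptions invented for the sketch. It shows a friendly page body being sent together with an honest status: 404 for a missing page, 500 when building the page fails.

```python
from http.server import BaseHTTPRequestHandler
from typing import Callable, Dict, Tuple

# Hypothetical site content: request path -> function that renders the page.
PAGES: Dict[str, Callable[[], str]] = {
    "/": lambda: "<h1>Home</h1>",
    "/about.html": lambda: "<h1>About</h1>",
}

# Friendly, styled bodies shown to the human visitor.
NOT_FOUND_PAGE = "<h1>Sorry, that page has moved</h1><p><a href='/'>Home page</a></p>"
SERVER_ERROR_PAGE = "<h1>Something went wrong on our side</h1>"


def build_response(path: str, pages: Dict[str, Callable[[], str]] = PAGES) -> Tuple[int, str]:
    """Choose the status code and HTML body for a request path."""
    render = pages.get(path)
    if render is None:
        # Nice page for the visitor, real 404 status for the robot.
        return 404, NOT_FOUND_PAGE
    try:
        return 200, render()
    except Exception:
        # A failure while building the page is a server error, not a missing page.
        return 500, SERVER_ERROR_PAGE


class SiteHandler(BaseHTTPRequestHandler):
    """Sends whatever build_response decides, status line included."""

    def do_GET(self):
        status, html = build_response(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()` (with `HTTPServer` imported from `http.server`) would serve the pages. Keeping the status decision in one small function makes it easy to check that each kind of error gets its own code.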