by MaxPowers » August 20th, 2007, 2:18 pm
AutoMapIt offers the Link Mapper to help you track down where URLs come from on your site. If you enter a URL that doesn't exist (but still shows up on your Tweak Page), you will see which of your pages link to that URL, as well as the URLs that page links out to.
AutoMapIt will 'cleanse' your list of URLs each time the system visits your site, deleting every URL that returned a 4xx code the last time it ran. If it receives a 404 HTTP status code for a URL, it will mark that page as non-existent and leave the URL out of your sitemap, and the next visit will begin by removing any URLs recorded with a 404 HTTP status code.
You can check whether your server and scripts are properly reporting 404s by using the <a href="http://www.automapit.com/headerchecker.html">Header Checker</a> here at AutoMapIt. If you enter one of the URLs that has been removed from your site but still shows on the Tweak Page, it should return a 404 (Not Found). If it returns a 3xx or a 2xx code instead, your server is not reporting missing pages correctly.
If this is the case, I would suggest getting that fixed so that search engines aren't crawling old URLs only to be told that the pages were found (or are 'over there', in the case of a redirect). You'd be amazed at how fast your site-wide keyword analysis begins to return terms like 'page not found' when the spiders can't distinguish between pages that are there (200) and pages that aren't (404).
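If you'd rather run the same kind of check from your own machine, here is a rough Python sketch. It is not how the Header Checker itself works; the names fetch_status and diagnose_removed_page are made up for this example, and it assumes your server answers HEAD requests the same way it answers GET. Redirects are deliberately not followed, so a 3xx shows up as a 3xx.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None tells urllib not to follow the redirect, so a 3xx
    # response surfaces as an HTTPError carrying the original status code.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Return the raw HTTP status code for url (hypothetical helper)."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 3xx, 4xx and 5xx responses all arrive here with the code attached.
        return err.code


def diagnose_removed_page(status):
    """Classify the status a *removed* page returned.

    Assumption for this sketch: any 4xx counts as correctly reported
    missing, while 2xx or 3xx means the server is misreporting it.
    """
    if 400 <= status < 500:
        return "missing"
    if 200 <= status < 400:
        return "misreported"
    return "other"
```

You would call diagnose_removed_page(fetch_status(url)) with one of the removed URLs from your Tweak Page. Anything other than "missing" is worth a look at your server or script configuration.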
In Short (The Rules):
There is no way to 'remove' a URL from the Tweak Page, but by checking the 'Block' box, you can block my system from spidering or otherwise analyzing these URLs.
Any URLs that are blocked on the Tweak Page will not be used on your sitemaps.
Any URLs that return a 4xx status code are considered 'bogus' and will not be used on your sitemaps.
Any URLs with a 4xx status code will be deleted from the database the next time the spider runs. (If the spider finds a link to the broken page, it will check that link again and set it to 4xx if it wasn't found.)
All of these 'rules' will survive the next set of changes I'm working on adding to the system, which should be out within a week or so.
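For anyone who finds rules easier to read as code, here is a small Python sketch of how they fit together. This is only an illustration, not the actual AutoMapIt code: UrlRecord, is_bogus, cleanse and sitemap_urls are invented names, and treating a URL with no recorded status as usable is my assumption for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlRecord:
    # Hypothetical stand-in for one row on the Tweak Page.
    url: str
    blocked: bool = False               # the 'Block' checkbox
    last_status: Optional[int] = None   # status code from the last spider run


def is_bogus(record: UrlRecord) -> bool:
    """A URL is 'bogus' if its last recorded status was any 4xx code."""
    return record.last_status is not None and 400 <= record.last_status < 500


def cleanse(records: List[UrlRecord]) -> List[UrlRecord]:
    """Start of a spider run: delete every URL that was 4xx last time."""
    return [r for r in records if not is_bogus(r)]


def sitemap_urls(records: List[UrlRecord]) -> List[str]:
    """Blocked URLs and bogus URLs are both left out of the sitemap."""
    return [r.url for r in records if not r.blocked and not is_bogus(r)]
```

Note that blocking only keeps a URL out of the sitemap and away from the spider; in this sketch, as in the rules, a blocked URL stays in the list, and only a 4xx status gets it deleted.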