Hello again

The Tweak Page is available to all members and allows you to micro-manage every URL found by our crawlers. You can block pages from being in your sitemaps, set priority and change-frequency for your Google sitemap, and view other stats listed per URL.

Postby rjcx » August 20th, 2007, 1:12 pm

Despite reactivating my account, I have still not managed to get a useful spider out of your software. I admit the first time was because of the settings and the second because I had my FTP turned off (silly me). Thank you for your help on my first post.

This time it is because, for some reason, I have URLs in my sitemaps that don't exist anymore. I checked my code for the URL but can't find it. I went to the Tweak Page (the first I knew about it was just now, when I saw it on the forum) and the URLs are in there. My question is:

How do I remove URLs from the Tweak Page?

thanks.
rjcroasdale
www.01systems.co.uk
rjcx
 
Posts: 11
Joined: July 8th, 2007, 12:08 am

Postby MaxPowers » August 20th, 2007, 2:18 pm

AutoMapIt offers the Link Mapper to help you track down where URLs come from on your site. If you enter the URL that doesn't exist (but shows up on your Tweak Page), you will be able to see your pages that link to that URL as well as the URLs that page links out to.
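
To illustrate the kind of lookup described above, here is a minimal Python sketch. It is not AutoMapIt's code: the `crawl` data, the page paths, and the function names are all hypothetical, and it only assumes that a crawler has recorded which links each page contains.

```python
from collections import defaultdict


def build_inbound_index(outbound):
    """Invert a page -> links mapping into a link -> linking-pages mapping."""
    inbound = defaultdict(set)
    for page, links in outbound.items():
        for link in links:
            inbound[link].add(page)
    return inbound


def link_report(outbound, url):
    """Return (pages that link to url, URLs that url links out to), both sorted."""
    inbound = build_inbound_index(outbound)
    return sorted(inbound.get(url, ())), sorted(outbound.get(url, ()))


# Hypothetical crawl data: each page mapped to the links found on it.
crawl = {
    "/index.html": ["/about.html", "/old-page.html"],
    "/about.html": ["/index.html"],
    "/news.html": ["/old-page.html"],
}

linked_from, links_to = link_report(crawl, "/old-page.html")
print("Linked from:", linked_from)
print("Links out to:", links_to)
```

In this made-up data the stale URL is still linked from two pages, which is the sort of leftover link that keeps a dead URL showing up in a crawl.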

AutoMapIt 'cleanses' your list of URLs each time the system visits your site: it begins by deleting every URL that returned a 4xx code on the previous run. If a URL returns a 404 HTTP status code during the current run, it is marked as non-existent and left out of your sitemap, and the next visit will begin by removing it along with any other URLs that returned a 404.

You can check whether your server and scripts are properly reporting 404s by using the Header Checker (http://www.automapit.com/headerchecker.html) here at AutoMapIt. If you enter one of the URLs that has been removed from your site but still shows on the Tweak Page, it should return a 404 (Not Found). If it returns a 3xx or a 2xx code, then your server is not reporting missing pages correctly.
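
If you would rather run the same check from your own machine, the following is a rough Python sketch using only the standard library. It is not the Header Checker's implementation; the function names and the example URL are hypothetical. It sends a HEAD request without following redirects, so a 3xx shows up as a 3xx. Some servers answer HEAD differently from GET, so treat the result as a hint rather than proof.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request and return the raw status code (redirects are not followed)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def classify_status(status):
    """Interpret the status of a URL that is known to have been removed."""
    if status == 404:
        return "ok: reported as Not Found"
    if 400 <= status <= 499:
        return "ok: another 4xx code, still treated as a bad URL"
    if 200 <= status <= 399:
        return "problem: a removed page is reported as found or redirected"
    return "unexpected: check the server"


# Example usage (hypothetical URL; uncomment to make a real request):
# print(classify_status(fetch_status("http://www.example.com/removed-page.html")))
```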

If this is the case, I would suggest getting that fixed so that other search engines aren't crawling old URLs only to be told that the pages were Found (or moved 'over there'). You'd be amazed at how fast your site-wide keyword analysis begins to return terms like 'page not found' when the spiders can't distinguish between pages that are there (200) and pages that aren't (404).

In Short (The Rules):
There is no way to 'remove' a URL from the Tweak Page, but by checking the 'Block' box, you can block my system from spidering or otherwise analyzing these URLs.

Any URLs that are blocked on the Tweak Page will not be used on your sitemaps.

Any URLs that return a 4xx status code are considered 'bogus' and will not be used on your sitemaps.

Any URLs with a 4xx status code will be deleted from the database when the spider runs next time. (If the spider finds a link to the broken page, it will check that link again and mark it as 4xx if it still isn't found.)

All of these 'rules' will survive the next set of changes I'm working on adding to the system, which should be out within a week or so.
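
For readers who find the rules easier to follow as code, here is a small Python sketch of them. It is only an illustration of the rules as stated in this post, not the system's actual code: the `UrlRecord` class, its field names, and the example URLs are hypothetical, and it assumes that a URL with no recorded status yet is still eligible for the sitemap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UrlRecord:
    url: str
    blocked: bool = False              # the 'Block' box on the Tweak Page
    last_status: Optional[int] = None  # status from the last run; None = not crawled yet


def is_bogus(record):
    """A URL whose last recorded status was 4xx is considered bogus."""
    return record.last_status is not None and 400 <= record.last_status <= 499


def cleanse(records):
    """Start of a spider run: delete every URL that returned a 4xx last time."""
    return [r for r in records if not is_bogus(r)]


def sitemap_urls(records):
    """Blocked URLs and 4xx URLs are never used in the sitemap."""
    return [r.url for r in records if not r.blocked and not is_bogus(r)]


# Hypothetical Tweak Page contents.
records = [
    UrlRecord("/index.html", last_status=200),
    UrlRecord("/private.html", blocked=True, last_status=200),
    UrlRecord("/old-page.html", last_status=404),
]

print(sitemap_urls(records))
print([r.url for r in cleanse(records)])
```

Note that a blocked URL stays in the list after cleansing (there is no way to remove it), while the 404 URL is gone on the next run.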
MaxPowers
Site Admin
 
Posts: 233
Joined: May 27th, 2006, 4:40 pm