AutoMapIt Sitemap Creation Service

Keep your website healthy

Link Checkers of all types

February 14th, 2007

The broken link checker has been around on AutoMapIt for a while now, but I figured there had to be a good way to check the inbound links coming in to your site as well. We are in the middle stages of creating a backlink checker. It finds your backlinks (1,000 or more, where they exist) and checks each one for a variety of stats, including the linking page’s rank, the number of links on that page, its IP address, and any link text associated with the link to your site.

The goal of the inbound link checker is to let you see who is pointing at you and what their link text says about you. That information helps you guide your link building campaigns toward links that come from multiple class ‘C’ IP ranges and use varied text relevant to your site.
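To make the class ‘C’ idea concrete, here is a minimal Python sketch that groups backlinks by the first three octets of the linking server’s IP address. It is an illustration only, not the AutoMapIt checker itself; the function names and the sample URLs and addresses are made up for the example.

```python
from collections import defaultdict
from ipaddress import IPv4Address


def class_c_block(ip: str) -> str:
    """Return the first three octets of an IPv4 address, e.g. '192.0.2'."""
    octets = str(IPv4Address(ip)).split(".")  # raises ValueError on a bad address
    return ".".join(octets[:3])


def group_by_class_c(backlinks):
    """Group (linking_url, ip) pairs by the class C block of the IP."""
    groups = defaultdict(list)
    for url, ip in backlinks:
        groups[class_c_block(ip)].append(url)
    return dict(groups)


# Hypothetical sample data, for illustration only.
sample = [
    ("http://blog-a.example/post", "192.0.2.10"),
    ("http://blog-b.example/links", "192.0.2.77"),
    ("http://news.example/story", "198.51.100.4"),
]
groups = group_by_class_c(sample)
for block, urls in groups.items():
    print(block, len(urls))
```

If most of your backlinks land in one or two blocks, that is a hint that they come from the same host or network rather than from independent sites.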

Stay tuned for updates on this feature, which will only be available to upgraded members. Upgraded members receive access to all of our tools, increased update frequency for their sitemaps, and a whole lot of extra goodies!

Page Not Found… or was it?

January 25th, 2007

Recently I was helping a client of mine with her website and noticed some strange things happening as AutoMapIt spidered her site (details excluded to protect the innocent). Many pages were coming up as ‘OK’ while the text on them read something like “Page not found, please return to the index page and try again”. The automated check that AutoMapIt provides for this showed that her SERVER was returning the correct 404 for ridiculous files such as quijibo869.htmlqrst. However, if the script that listed the articles was given a bum article, the page said “error” to a human but returned “OK” to the search engines… after all, an error page WAS being found and returned instead of a themed article.

After some URLs changed, the old ones in my system began to take on a new life of their own. All of a sudden the site-wide theme tool on AutoMapIt was showing that her site was about “page not found”. When I checked the HTTP headers coming from some of her pages, it showed a status code of 200 (OK, Page Found).

What can you expect after fixing this sort of problem? You will likely see the count of your pages in the search engines fall dramatically, but your ranking for ‘real’ terms may rise once the engines no longer see your site as having a theme related to “page not found” (or whatever page text you use). The pages that drop out of the search engines will not be real pages, but an endless string of pages that tell humans “Error” but tell the search engines “OK”.

If you are after “quality above quantity”, please use the HTTP Header Checker on this site to test your error pages. If you don’t see a 404, but instead see a 200… you’re in trouble. I know that in PHP, this can be fixed by calling the header() function to send a proper 404 status code on your error pages before any text is output to the browser. The key point here is “before any text is output to the browser”.
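If you would rather script the check yourself, the idea can be sketched in a few lines of Python. This is only a rough illustration, not the HTTP Header Checker on this site: the phrase list and function names are my own assumptions, and you would swap in whatever wording your own error page uses.

```python
import urllib.error
import urllib.request

# Hypothetical phrases that suggest a page is really an error page.
ERROR_PHRASES = ("page not found", "error 404", "no longer available")


def is_soft_404(status: int, body: str) -> bool:
    """True when the page reads like an error but the server said 200."""
    text = body.lower()
    return status == 200 and any(phrase in text for phrase in ERROR_PHRASES)


def fetch_status_and_body(url: str):
    """Fetch a URL and return (status, body) without raising on 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the code and body are still available.
        return err.code, err.read().decode("utf-8", "replace")
```

To use it, request a URL that should not exist on your site, pass the result of fetch_status_and_body() to is_soft_404(), and treat a True result as the “error to humans, OK to search engines” problem described above.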

Google Sitemaps Do Not Hurt Your Ranking

August 2nd, 2006

I don’t think that it is the use of Google sitemaps that hurts sites so badly. Many of the sites that I’ve seen reporting a drop in indexed pages are in the mega-numbers… ok, more than 100 pages and usually MUCH more.

One site that I worked on a few months back had nearly 300,000 pages indexed in Google until I made some changes to the scripting. When the site: command returned 3,000 pages, the site owner flipped!

What I was doing to the site was setting up ‘proper’ 404 responses instead of the ‘hosting Control Panel’ variety that redirects to the homepage and returns an HTTP status code of 200 (OK).

What had happened was that the site lost from the index all of the OLD URLs that weren’t being used anymore. Once Google had crawled the site and picked up the bad URLs (some were REAL badly broken), it kept returning to them over time. Two years later and 300,000 pages strong, this website finally ‘confessed’ to Google that most of the indexed pages were actually no longer on the server.

Enough history… When you set up an account at Google Sitemaps, they require you to ‘fix’ your 404s that report as a 200 (OK). Once they spider their old cached URLs, they realize that many of your pages are no longer around and drop them from their index.

In the case of my client, their ranks for current pages soared after this. It seems they may have also been hit with dupe content penalties for having 297,000 copies of their homepage!

This may not be the case for all pages dropped from the index, but it explains why the drop happens shortly after people sign up (and fix their 404s).

10 Steps to a Spider Friendly Website

July 20th, 2006

The 10 steps below will help GoogleBot and other search engine spiders crawl your site with little effort when they visit you. A search engine’s job is to index as many pages as possible from the web, and spiders do this by favoring pages that are efficient to process so that they can move on to the next one. You can earn their favor by helping them index your site efficiently and effectively.

Proper Nesting

Many spiders are built to parse the HTML on your page. One of the things that may confuse a parser or cause it to work harder is improper nesting of HTML elements on a page. These are tags that are opened and closed out of order, such as <a><b><c>…</c></a></b> versus <a><b><c>…</c></b></a>. In order to have the spiders crawl through your page flawlessly, visit validator.w3.org and use their HTML validator, looking specifically for error messages that say “end tag for ‘####’ omitted, but its declaration does not permit this” or “end tag for element ‘####’ which is not open”. These errors signify improper nesting. While they may still allow some spiders to visit your website, your site will lose valuable SEO points because the spiders cannot process it as efficiently as they can process other sites. Remember, search engines need to spider many pages quickly to stay in their game.
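For a quick local check before you reach for the validator, the nesting rule can be sketched with Python’s built-in html.parser module. This is a rough illustration and not a replacement for validator.w3.org: it is stricter than HTML in that it expects every non-void element to be closed explicitly, and the class and function names are my own.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Track open tags on a stack and record tags closed out of order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            return
        self.errors.append(f"unexpected </{tag}>")
        if tag in self.stack:
            # Drop everything up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass


def nesting_errors(html: str):
    """Return a list of nesting problems found in the markup."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]
```

An empty list means every tag was closed in the reverse of the order it was opened; anything else points at the spot a parser would have to work around.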

Use of Frames

This is a spider killer if there ever was one. You simply cannot link to a specific page in a framed site with any effectiveness. If you link directly to a page within the frame, the rest of the framed page disappears. If you link to the main frames page, there is no way to link to the subpages and only your homepage(s) will show. I-Frames offer a slight advantage as they are contained within a page that has its own URL, but many times the content within an I-Frame is inaccessible. Don’t place your navigation or main content into an I-Frame.

SessionID’s in Your URLs

These lead to a seemingly unending list of variations on each URL that have no real effect on the page. Every time a spider visits your pages, it gets more and more versions of the same ‘other’ pages, but with different URLs each time because of the sessionID. If you want the spiders to feel at home on your site, then the barbed-wire fields of sessionIDs are not going to work. One way around this is to use a sitemap creation service like AutoMapIt.com that allows you to ‘ignore’ certain URL keys like the sessionID or keys used for sorting lists. This way, the spiders have at least one link to each page that doesn’t include a sessionID. Keep in mind that even if the IDs are left out of your sitemap, the spiders will find them again when they crawl the site itself.

SessionIDs may show up in otherwise friendly URLs. If your users have to log in on your site or you use some types of tracking software, sessionIDs are what help your site keep visitors logged in. They are usually stored in cookies through your browser and don’t normally show up in your URL. If cookies are not available (as on most spiders), the sessionID then gets passed through the URL. You may want to turn cookies off in your browser and surf your site… do the sessionIDs show up?
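The ‘ignore certain URL keys’ idea is easy to picture in code. Here is a minimal Python sketch using the standard urllib.parse module; the list of keys is a guess on my part (PHPSESSID, sid, and so on), so adjust it to whatever your own site puts in its URLs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical list of keys to ignore; adjust to whatever your site uses.
IGNORED_KEYS = {"phpsessid", "sid", "sessionid", "sort"}


def strip_ignored_keys(url: str) -> str:
    """Return the URL with session and sorting keys removed from the query."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_KEYS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_ignored_keys(
    "http://www.example.com/list.php?cat=5&PHPSESSID=abc123&sort=asc"
))
```

Every variation of a page that differs only by an ignored key collapses to the same clean URL, which is the one you want in your sitemap.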

Search-Friendly URLs

This goes beyond the sessionIDs in the URL and applies to the actual URLs being used. According to Google Webmaster Guidelines, “It helps to keep the parameters short and the number of them few”. Apache makes this easy through mod_rewrite, and Windows servers offer Isapi-Rewrite to accomplish turning ‘ugly’ URLs into ‘static’ URLs. You should NEVER use id= in your URLs, as Google does not spider those pages. Change it to cid= or lmnopid=, but if you use id=, your pages are already doomed from the start.

Proper Meta Tags

Despite their value decreasing, and a lot of talk about them being of no use now, meta tags are anything but dead. Google uses your meta description as the description on their SERPs. Yahoo uses description tags and prefers sites that have them. Many other search engines use the meta keywords as well. Even if they only verify that the tag is present and correct, many search engines won’t touch your site without one. The meta robots tag is unnecessary unless it’s restricting access to a page. It is a spider’s natural state to index a page and follow all links from it. Using the tag to ‘allow’ spiders only increases your code:text ratio needlessly and may hurt your potential rank by a slight amount.

Javascript/Images/Flash

Use these sparingly to accentuate your site for users. You should never rely on these for critical website functions like navigation, content, or other vital page info. Even though 98% of the internet uses JavaScript, turning it off along with images and multimedia in your browser (or using the Lynx browser) will allow you to visit your site as the search engines do. This 20-minute task will alert you to potential problems that your site will have when the search engines visit.

Links

Most everyone knows that links into your website bring spiders to visit and help your rank, but the links within your site to itself and to other websites are also very important. Google limits you to 100 links per page, and most other engines are rather close (but perhaps more generous) and allow you 150 or so links per page. That total covers both links within your site and links to other sites.
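Counting the links on a page by hand gets old fast, so here is a small Python sketch that does it with the built-in html.parser module. It is a rough illustration: it counts every a tag that carries an href, internal or external, and the LINK_LIMIT constant simply reuses the 100-link figure from the paragraph above.

```python
from html.parser import HTMLParser

LINK_LIMIT = 100  # the per-page figure quoted above


class LinkCounter(HTMLParser):
    """Collect the href of every anchor tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # named anchors without an href are not links
                self.links.append(href)


def count_links(html: str) -> int:
    """Return the number of links (internal and external) in the markup."""
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return len(counter.links)


def over_limit(html: str, limit: int = LINK_LIMIT) -> bool:
    """True when the page carries more links than the limit."""
    return count_links(html) > limit
```

Run it over your busiest pages first (sitemaps, category listings, footers full of links), since those are the ones most likely to go over.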

Keyword stats and code:text ratio

Your goal should be to have the most content possible (human readable) while limiting the amount of code used to deliver this. Keywords should be limited to a few select phrases and possibly some variations on those phrases. If you are at all considering any meaningful SEO work on your site, you should really invest your first few dollars into finding the right keywords. There are many free keyword tools such as Wordtracker.com to at least give you a start, but the few extra dollars for the upgrades are usually well worth it if you want to target the correct markets and find terms that you weren’t expecting.
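One rough way to put a number on the code:text ratio is to compare the length of the visible text with the length of the whole page source. The Python sketch below does that with the built-in html.parser module. The exact formula is my own assumption (different tools measure this differently), and it ignores anything inside script and style blocks.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect human-readable text, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def text_ratio(html: str) -> float:
    """Fraction of the page's characters that are visible text."""
    if not html:
        return 0.0
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    visible = " ".join("".join(parser.chunks).split())  # collapse whitespace
    return len(visible) / len(html)
```

The closer the result is to 1.0, the more of your page is content rather than markup. Comparing the figure before and after a cleanup is more useful than chasing any particular target.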

Standards Compliant/Accessible

This builds on the W3C validator tool used earlier. By making your website completely standards compliant, you remove any possible errors in the code that may hang up a spider. While complete compliance is not as critical as fixing parsing errors from unclosed tags and truly broken code, it certainly helps to ease the process of being spidered.

Branding

The links within your site should all use the same www. or non-www. branding. Many search engines see those variants as two separate sites, and Google will even apply different PR to a page based on www or non-www. This can be further enforced by changing your .htaccess file to automatically switch versions if the other is visited. If you visit the non-www version of my sites, you will automatically be forwarded to the www version. This is not a critical step, but it is a major enhancement for not a lot of work. Why split your value in half when you can focus it like a laser? Just be sure to request links, add your link to profiles, and link to your own site from within your site using one version or the other, but not both.
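The rule behind that forwarding is simple enough to sketch in Python. This shows only the decision logic, under my own assumptions, and is not the actual rewrite rule from my sites: given a requested URL, it works out which URL the visitor should be sent to, and on a live server that answer would normally go out as a permanent (301) redirect.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, prefer_www: bool = True):
    """Return the URL on the preferred host, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host:
        return None  # nothing to canonicalize
    has_www = host.startswith("www.")
    if has_www == prefer_www:
        return None  # already on the preferred version
    new_host = "www." + host if prefer_www else host[len("www."):]
    if parts.port:
        new_host += f":{parts.port}"  # keep an explicit port if one was given
    return urlunsplit(parts._replace(netloc=new_host))


print(canonical_url("http://example.com/page?x=1"))
```

The same function is handy for cleaning up your own internal links: run each link through it and fix any that come back with a redirect target.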

Following these ten steps will help to ensure that your site is well received by ALL of the search engine spiders and will not limit you to the bare essentials for only one spider (or bad ranking with all of the spiders equally). Let’s face it, if you have ever tried to get better rank on a search engine, you know that any break you can get will help. These steps should be performed at the creation of your website and continue throughout its life cycle to ensure that the spiders are always able to visit your site with as little effort on their part as possible.

Your website Searchbox is on AutoMapIt

July 18th, 2006

AutoMapIt announces the release of its searchbox technology. This searchbox installs on your website as a simple HTML form that helps people who come to your site find the content they are looking for. Code integration into your website is flexible enough to allow either hard-coded ‘search’ URLs, as links to searches that the webmaster feels are important, or the more traditional HTML form that lets the user choose the keywords.

This service is free, ad-supported, and available to any member of the AutoMapIt Sitemap Creation Service. There is an upgrade available for those who wish to use this as an ad-free service. In order to keep the search results current with your website, the contents of your site are updated automatically by our spider on a daily, weekly, or monthly basis depending on your membership level.

Sign Up for AutoMapIt service or visit our Search Box forums.
