
Search Engine Optimization -- FREE!

Page 3 — Step 1: Get Crawled

There are quite a few things you can do to grab the attention of search engines and directories:

Clean Up Your URLs

Frames used to be the biggest roadblock to getting crawled, but no more: Both Google and Inktomi now crawl them (the section of Inktomi's support FAQ that claims this isn't so is out of date, according to the company). Instead, the problem with most e-commerce sites today is that their product pages are dynamically generated. While Google will crawl any URL that a browser can read, most of the other search engines balk at links with "?" and "&" characters that separate CGI variables (such as "artloop.com/store?sku=123&uid=456"). As a result, many individual product pages don't show up outside of Google.

One way to circumvent this difficulty is to create static versions of your site's dynamic pages for search engines to crawl. Unfortunately, duplicating your pages is a huge amount of extra work and a constant maintenance chore, plus the resulting pages are never quite up-to-date — all the headaches dynamic pages were designed to eliminate.

A far better strategy is to follow the lead of Amazon and rewrite your dynamic URLs in a syntax that search engines will gladly crawl. So URLs that look like this ...

amazon.com/store?shop=cd&sku=B00004WFIZ&ref=pd_ir_m&sessionID=107-6571839-6268523

... become ...

amazon.com/exec/obidos/ASIN/B00004WFIZ/ref=pd_ir_m/107-6571839-6268523

Amazon's application server knows the fields in the URL are actually CGI parameters in a certain order, and processes them accordingly.
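The trick can be sketched in a few lines of Python. The field names below are hypothetical, chosen only to show the idea of mapping positional URL segments back onto named parameters; Amazon's real parameter order isn't public.

```python
def parse_path_params(path, field_order):
    """Map slash-separated URL segments back onto named CGI-style
    parameters, in the fixed order the application server expects."""
    segments = [s for s in path.strip("/").split("/") if s]
    return dict(zip(field_order, segments))

# Hypothetical field order for an Amazon-style URL.
params = parse_path_params(
    "/exec/obidos/ASIN/B00004WFIZ/ref=pd_ir_m/107-6571839-6268523",
    ["handler", "app", "type", "sku", "ref", "session_id"],
)
print(params["sku"])  # B00004WFIZ
```

Because the mapping is purely positional, the application server never sees a "?" or "&", yet gets the same parameters it would from a conventional query string.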

J.K. Bowman's Spider Food site explains how to fix URLs for most popular e-commerce servers. One of Artloop's Web programmers wrote Apache rewrite rules that translate slash-separated URLs into the format used by their Netzyme application server. On the back end, Netzyme is passed something like this:

artloop.com/cgi-bin/CssP.exe?CsspApp=ArtLoopClient1&CssServer=localhost%3A32401&CsspFn=@/details/ArtistDetail.html:@:getForm&ObjectLocation=ART&ArtistID=3918

But users and search engines see the tidier, Apache-served URLs, which look something like this:

artloop.com/artists/profiles/3918.html

Not only are the rewritten URLs crawlable by all search engines, they're also more human-friendly, making them easier to pass around the Net.
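A mod_rewrite rule of the kind described above might look like the following sketch. The target's parameter names are copied from the Netzyme URL shown earlier, but the rule itself is an illustration, not Artloop's actual configuration.

```apache
# Rewrite clean artist-profile URLs into the back-end CGI call.
# The literal % in the query string must be escaped as \% so
# mod_rewrite doesn't treat it as a backreference.
RewriteEngine On
RewriteRule ^/artists/profiles/([0-9]+)\.html$ /cgi-bin/CssP.exe?CsspApp=ArtLoopClient1&CssServer=localhost\%3A32401&CsspFn=@/details/ArtistDetail.html:@:getForm&ObjectLocation=ART&ArtistID=$1 [L]
```

The `$1` backreference carries the artist ID from the clean URL into the ArtistID parameter, so only Apache ever sees the ugly version.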

Many readers have written in to ask if the search engines will begin crawling and indexing Flash content soon. The answer, as you might guess, is no. Unlike PDF files, Flash files rarely contain information in text format. Search developers don't want to clutter up their indexes with a million "Skip Intro" pages.

Submit your Site

There are a lot of automated submission services you can use to submit your site to as many search engines as possible. The one most recommended by the people I talked to is Submit It, an early player that did so well that Microsoft bought it. Submit It is now part of MSN bCentral, and it charges a minimum fee of US$59 to keep a few URLs submitted for a year.

You can avoid the fees by simply submitting to individual search engines on your own. Start with UseIt's list of top referrers — that's where most of the traffic you can get will come from. And while you'd think submitting your site to one Inktomi-powered site would work for all of them, optimization experts have told us it works better if you hit them all.

Don't Forget the Directories

Submit It does submit your site to the busiest directory sites, except for the biggies: Yahoo, LookSmart (which MSN serves under its logo), and the Open Directory Project (which powers Lycos, Hotbot, and Netcenter categories). Some of these directories charge for submission, but $400-500 total will get your most important pages into the most trafficked places.

Yahoo still offers free submissions, except for business categories, which cost $199. But even the fee doesn't guarantee they'll accept your site, just that they'll decide on it within a week — with free submissions, you don't even get the promise that they'll ever get around to evaluating it, given the incredible volume of submissions.

Once you've submitted your pages, be ready to wait a month, two, or three before they're crawled and indexed. It's frustrating, but processing a billion Web pages takes time — at a nonstop rate of one hundred per second, it would still take almost four months.
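The arithmetic behind that estimate, as a quick sanity check:

```python
pages = 1_000_000_000            # one billion pages in the index
rate = 100                       # pages processed per second, nonstop
days = pages / rate / 86_400     # 86,400 seconds in a day
print(round(days))               # prints 116: just under four months
```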

Make a Crawler Page

It isn't necessary to submit every page on your site to the search engines. Just make sure they can find all the pages that matter by hopping links from your front door. To do that, make a "crawler page" that contains nothing but a link to every page you want search engines to crawl. Use each linked page's TITLE as the link text; this helps improve your site's score.

Basically, the crawler page is a site map that lists all the pages on your site. It may be a bit too big for humans to read through, but that's no problem for a search engine. Add an obscure link to the crawler page on one of your site's top-level pages, using a small amount of text. MSN used to use 1x1 images for this trick, but the Google geeks warned us to avoid such obviously invisible tags. "Why not just label it 'site map'?" one asked. Search engine spiders will find the crawler page as soon as they get to your site, and suck down all the pages listed on it.

Don't worry, the crawler page won't show up in search results. It does get pulled into the search engine's index, but because it has no text or tags to match a query, it isn't listed as a result. The pages it links to, however, will appear because the search engine's spider found them right after it visited the crawler page. Wired News, for example, uses hierarchical sets of crawler pages to make sure every story ever published is crawlable from the top of the site.

For Artloop, we decided to break the crawler pages down into chunks of 100KB or smaller, just to be careful: we wanted to prevent search spiders from timing out or deciding the pages were too big to crawl.
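Generating chunked crawler pages like this is easy to script. Here's a minimal sketch, assuming you already have a list of (title, URL) pairs for the pages you want crawled; the Artloop-style URLs are hypothetical, for illustration only.

```python
def build_crawler_pages(pages, max_bytes=100_000):
    """Split a long list of (title, url) pairs into crawler pages of
    roughly max_bytes each, using each page's TITLE as the link text."""
    chunks, current, size = [], [], 0
    for title, url in pages:
        link = '<a href="%s">%s</a>' % (url, title)
        if current and size + len(link) > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(link)
        size += len(link) + 1          # +1 for the newline between links
    if current:
        chunks.append(current)
    return ["<html><body>\n%s\n</body></html>" % "\n".join(links)
            for links in chunks]

# Hypothetical Artloop-style artist pages, just for illustration.
pages = [("Artist %d" % i, "/artists/profiles/%d.html" % i)
         for i in range(1, 5001)]
chunks = build_crawler_pages(pages)
print("%d crawler pages" % len(chunks))
```

Each resulting file stays under the 100KB cap, so a spider that limits page size or download time can still swallow every chunk whole.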

Pay to Play?

Not too long ago, in response to years of complaints from commercial site owners who demanded their pages be indexed and up to date, Inktomi announced a new service that lets site owners pay to have individual URLs crawled and indexed quickly. If you're wondering whether paid listings are worth it, I suggest trying just a couple of your URLs first — pick the ones you feel are poised to make the most money — to see if the return on investment meets your needs.

Remember that Inktomi will rank search results largely on the links to your page from other domains. And if no one is linking to you, expect to see your page appear at the end of the results list, not at the top.

There are ways, however, to get your site moving up through the ranks.


