Robots In the Sandbox

For a search engine to be of any use, it needs to know what's out
there. To find out, a search engine uses a crawler program. A crawler
downloads web pages just as a browser does, but instead of displaying
each page, it uses the page to update the search engine's database of
pages. The crawler records the words on the page along with the page's
URL; the engine's search page matches your search words against this
database. The crawler also remembers the links on the page so it can
visit those pages later and record their words too.
Programs that automatically surf the web are known as robots; a search
engine crawler is one variety of robot. Crawlers are also sometimes
called spiders. (What else would you find crawling around a web?)
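
To make the loop concrete, here is a minimal crawler sketch in Python.
It is an illustration under simplifying assumptions, not production
code: a real crawler would also obey each site's robots.txt rules,
throttle its requests, and keep its index on disk. The starting URL
and every name below are made up for the example.

    # Minimal crawler sketch: download a page, index its words,
    # remember its links, and visit those links later.
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class PageParser(HTMLParser):
        """Collects the words and the links found in one page."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.words = []
            self.links = []

        def handle_data(self, data):
            self.words.extend(data.split())

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(urljoin(self.base_url, value))

    def crawl(start_url, limit=10):
        """Visit pages breadth-first, building a word -> URLs index."""
        index = {}              # the search engine's database of pages
        to_visit = [start_url]  # links remembered for later visits
        seen = set()
        while to_visit and len(seen) < limit:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                page = urllib.request.urlopen(url)
                html = page.read().decode("utf-8", "replace")
            except OSError:
                continue        # unreachable page: skip it
            parser = PageParser(url)
            parser.feed(html)
            for word in parser.words:
                index.setdefault(word.lower(), set()).add(url)
            to_visit.extend(parser.links)  # visit these pages later
        return index

The two data structures carry the whole idea: the index answers
searches, and the to_visit list is how the links remembered from one
page lead the crawler to the next.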
Here is a list of recent visits by the crawlers of some well-known
search engines to the Computer Science Department's web server. The
page is generated from recent entries in the web server's logs: each
crawler's downloads are counted, and several of its most recent
downloads are listed.
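
As a rough sketch of how such a page might be generated, the following
Python assumes Apache-style "combined" access logs, whose last quoted
field is the visitor's user-agent string. The crawler names and the
exact field layout are assumptions, not details taken from this
server.

    # Count downloads per crawler and keep each crawler's most recent
    # downloads, reading lines from an assumed "combined"-format log.
    import re
    from collections import defaultdict, deque

    CRAWLERS = ["Googlebot", "bingbot", "DuckDuckBot"]  # illustrative
    LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" '
                      r'\d+ \S+ "[^"]*" "([^"]*)"')

    def crawler_visits(log_lines, keep=5):
        counts = defaultdict(int)   # total downloads per crawler
        recent = defaultdict(lambda: deque(maxlen=keep))
        for line in log_lines:
            m = LINE.match(line)
            if not m:
                continue
            _host, when, path, agent = m.groups()
            for name in CRAWLERS:
                if name in agent:
                    counts[name] += 1
                    recent[name].append((when, path))
        return counts, recent

Matching on a substring of the user-agent string is how crawlers are
usually recognized, since each major crawler announces itself there.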