Robots In the Sandbox

For a search engine to be of any use, it needs to know what's out
there. To find out, a search engine uses a crawler program. A crawler
downloads web pages just as a browser does, but instead of displaying
each page, it uses the page to update the search engine's database of
pages. The crawler records the words on the page along with the page's
URL; the engine's search page matches your search words against this
database. The crawler also remembers the links on the page so it can
visit those pages later and record their words too.
Programs that automatically surf the web are known as robots; a search
engine crawler is one variety of robot. Crawlers are also sometimes
called spiders. (What else would you find crawling around a web?)
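
To make the loop concrete, here is a minimal crawler sketch in Python.
It is an illustration under simplifying assumptions, not production
code: a real crawler would also obey each site's robots.txt rules,
throttle its requests, and keep its index on disk. The starting URL
and every name below are made up for the example.

    # Minimal crawler sketch: download a page, index its words,
    # remember its links, and visit those links later.
    import urllib.request
    from html.parser import HTMLParser
    from urllib.parse import urljoin

    class PageParser(HTMLParser):
        """Collects the words and the links found in one page."""
        def __init__(self, base_url):
            super().__init__()
            self.base_url = base_url
            self.words = []
            self.links = []

        def handle_data(self, data):
            self.words.extend(data.split())

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(urljoin(self.base_url, value))

    def crawl(start_url, limit=10):
        """Visit pages breadth-first, building a word -> URLs index."""
        index = {}              # the search engine's database of pages
        to_visit = [start_url]  # links remembered for later visits
        seen = set()
        while to_visit and len(seen) < limit:
            url = to_visit.pop(0)
            if url in seen:
                continue
            seen.add(url)
            try:
                page = urllib.request.urlopen(url)
                html = page.read().decode("utf-8", "replace")
            except OSError:
                continue        # unreachable page: skip it
            parser = PageParser(url)
            parser.feed(html)
            for word in parser.words:
                index.setdefault(word.lower(), set()).add(url)
            to_visit.extend(parser.links)  # visit these pages later
        return index

The two data structures carry the whole idea: the index answers
searches, and the to_visit list is how the links remembered from one
page lead the crawler to the next.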
Here is a list of recent visits by the crawlers of some well-known
search engines to the Computer Science Department's web server. The
page is generated from recent entries in the web server's logs: each
crawler's downloads are counted, and several of its most recent
downloads are listed.
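
As a rough sketch of how such a page might be generated, the following
Python assumes Apache-style "combined" access logs, whose last quoted
field is the visitor's user-agent string. The crawler names and the
exact field layout are assumptions, not details taken from this
server.

    # Count downloads per crawler and keep each crawler's most recent
    # downloads, reading lines from an assumed "combined"-format log.
    import re
    from collections import defaultdict, deque

    CRAWLERS = ["Googlebot", "bingbot", "DuckDuckBot"]  # illustrative
    LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" '
                      r'\d+ \S+ "[^"]*" "([^"]*)"')

    def crawler_visits(log_lines, keep=5):
        counts = defaultdict(int)   # total downloads per crawler
        recent = defaultdict(lambda: deque(maxlen=keep))
        for line in log_lines:
            m = LINE.match(line)
            if not m:
                continue
            _host, when, path, agent = m.groups()
            for name in CRAWLERS:
                if name in agent:
                    counts[name] += 1
                    recent[name].append((when, path))
        return counts, recent

Matching on a substring of the user-agent string is how crawlers are
usually recognized, since each major crawler announces itself there.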