
Ch. 5: Web Searching
- Don't forget the library.
- Librarian's expertise.
- Not everything's online.
- Library web site.
- Information is usually organized in a hierarchy.
- Examples: MC's site, C-SPAN.
- Various depths of tree.
- Various depths within tree.
- The same leaf may appear in multiple places.
- Searching.
- Search engines.
- Google, Yahoo, MSN, Ask.com, Alta Vista, etc.
- Crawler and index.
- Downloads pages, files them under each word.
- Words in the page.
- Words in links to the page.
- Adds links in pages to its “to do” list.
- Periodically re-scans to see changes.
- Recent spider visits to the CS web site.
- Web changes quickly: There's always something missing.
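The crawler-and-index steps above can be sketched in a few lines. This is a minimal illustration, not how any real search engine is built: `TOY_WEB` is a made-up in-memory "web" (so nothing is actually downloaded), and the names `words` and `crawl` are assumptions chosen for the example. It files each page under the words in its text and under the words in links pointing to it, and adds newly found links to a "to do" list.

```python
from collections import defaultdict, deque
import re

# Hypothetical miniature "web": URL -> (page text, [(link text, target URL), ...]).
TOY_WEB = {
    "a.example": ("library catalog and hours", [("reference desk", "b.example")]),
    "b.example": ("ask a librarian", [("catalog", "a.example"),
                                      ("search tips", "c.example")]),
    "c.example": ("search engine tips", []),
}


def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crawl(web, start):
    """Visit pages reachable from start; return word -> set of URLs."""
    index = defaultdict(set)
    todo = deque([start])      # the crawler's "to do" list
    seen = {start}
    while todo:
        url = todo.popleft()
        text, links = web[url]
        # File the page under each word in the page.
        for w in words(text):
            index[w].add(url)
        for link_text, target in links:
            # File the target under each word in the link pointing to it.
            for w in words(link_text):
                index[w].add(target)
            # Add unseen links to the "to do" list.
            if target in web and target not in seen:
                seen.add(target)
                todo.append(target)
    return index


index = crawl(TOY_WEB, "a.example")
print(sorted(index["reference"]))  # b.example is found only through link text
```

A real crawler would also re-scan pages periodically; here the toy web never changes, so one pass is enough.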
- Query processor.
- Finds entries for words matching the query.
- Lists those pages.
- PageRank: Order results by the number of links pointing to each page.
- Google's innovation.
- Greatly tweaked since, as folks try to game it.
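A rough sketch of the query processor, under heavy simplifying assumptions: `INDEX` and `LINKS` are invented data, and ranking by a raw count of inbound links is only the simple idea stated in these notes. Google's actual PageRank also weights each link by the rank of the page it comes from, which this sketch does not attempt.

```python
# Hypothetical data: an index (word -> pages) and a link graph
# (page -> pages it links to).
INDEX = {
    "library": {"a.example", "b.example", "c.example"},
    "hours": {"a.example", "c.example"},
}
LINKS = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": [],
    "d.example": ["c.example"],
}


def inbound_counts(links):
    """Count how many distinct other pages link to each page."""
    counts = {}
    for source, targets in links.items():
        for target in set(targets):
            if target != source:          # ignore self-links
                counts[target] = counts.get(target, 0) + 1
    return counts


def search(query, index, links):
    """Find pages matching any query word, most-linked-to first."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    counts = inbound_counts(links)
    # Sort by inbound links (descending), then by name to break ties.
    return sorted(hits, key=lambda page: (-counts.get(page, 0), page))


print(search("library hours", INDEX, LINKS))
# ['c.example', 'a.example', 'b.example']
```

Counting raw links is easy to manipulate (anyone can create pages that link to their own site), which is one reason the real ranking has been tweaked so much.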
- Building searches.
- List of words: Find any.
- Conjunctions: AND, OR. Use with parens.
- Use -word to eliminate that word.
- Suggestions from text:
- What kind of page: company home, index page, summary page.
- What type of organization will publish this?
- List terms likely to appear on the page.
- Locations.
- Times — businesses will give their hours.
- Terms specific to the type of business or its product.
- See if search is of correct scope.
- Too few hits. Remove restrictions, add words.
- Too many hits.
- Use fewer words.
- Use a NOT (-word). Try to eliminate hits you don't want.
- Search within previous results.
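To make the search-building rules concrete, here is a small sketch over three invented pages. The function `find` and its parameters `any_of`, `all_of`, and `exclude` are hypothetical names standing in for a plain word list (OR), AND, and -word; real search engines have their own query syntax.

```python
import re

# Hypothetical pages: URL -> page text.
PAGES = {
    "bakery.example": "fresh bread bakery open 7am to 5pm downtown",
    "recipes.example": "bread recipes and baking tips",
    "band.example": "bread the band tour dates",
}


def words(text):
    """Return the set of lowercase words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def find(pages, any_of=(), all_of=(), exclude=()):
    """Return URLs whose text satisfies the three kinds of condition."""
    results = []
    for url, text in pages.items():
        page_words = words(text)
        if any_of and not any(t in page_words for t in any_of):
            continue                      # OR: needs at least one term
        if not all(t in page_words for t in all_of):
            continue                      # AND: needs every term
        if any(t in page_words for t in exclude):
            continue                      # -word: must not contain term
        results.append(url)
    return sorted(results)


# Too many hits: every page mentions "bread".
print(find(PAGES, any_of=["bread"]))
# Narrow it: exclude the hits we don't want.
print(find(PAGES, all_of=["bread"], exclude=["band", "recipes"]))
# Too few hits: "hours" is not on the page, so remove that restriction.
print(find(PAGES, all_of=["bread", "bakery", "hours"]))
```

The last query shows why guessing terms likely to appear on the page matters: the bakery lists its times as "7am to 5pm" without ever using the word "hours".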
- Reliability.
- Web v. print.
- Less editorial control.
- Harder to trace the true source/owner.
- Hoaxes, jokes, errors. Example: DHMO (the dihydrogen monoxide hoax).
- Checking.
- Check for true owner. (Maybe use the whois database.)
- Check for a sponsor that exists in the real world.
- Check for author's credentials.
- See if the site is well-organized and looks respectable.
- Maintained and up-to-date.
- Check other sources.
- Snopes.
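For the whois check mentioned above, a minimal sketch: the WHOIS protocol is just one line of text sent over TCP port 43, with the reply read until the server closes the connection. The function names here are made up for illustration, the default server `whois.iana.org` is an assumption about where to start (it typically refers you on to the registry for the domain's top level), and the field names in a reply differ from registry to registry, so `parse_fields` is only a generic "key: value" splitter.

```python
import socket


def whois_query(domain, server="whois.iana.org", port=43, timeout=10):
    """Send one WHOIS query and return the server's raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:                  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(response):
    """Collect 'key: value' lines into a dict of lists, skipping comments."""
    fields = {}
    for line in response.splitlines():
        if line.startswith(("%", "#")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields.setdefault(key.strip().lower(), []).append(value.strip())
    return fields


# Usage (needs a network connection, so it is left commented out):
# reply = whois_query("example.com")
# print(parse_fields(reply))
```

Many registrations hide the owner behind a privacy service, so an unhelpful whois record is not proof of anything by itself; treat it as one check among the others in the list.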