------------------------------------------------------------------------------
Ch. 5: Web Searching
------------------------------------------------------------------------------
<<Ch. 4: HTML Ch. 8: Bits and Bytes>>
  1. Don't forget the library.
    1. Librarian's expertise.
    2. Not everything's online.
    3. Library web site.
  2. Information is usually organized in a hierarchy. Examples: MC's site, C-SPAN.
    1. Different trees have different depths.
    2. Branches within one tree have different depths.
    3. The same leaf may appear in multiple places.
  3. Searching.
    1. Search engines.
      1. Google, Yahoo, MSN, Ask.com, Alta Vista, etc.
      2. Crawler and index.
        1. Downloads pages, files them under each word.
          1. Words in the page.
          2. Words in links to the page.
        2. Adds links in pages to its “to do” list.
        3. Periodically re-scans to see changes.
        4. Example: recent spider visits to the CS web site.
        5. Web changes quickly: There's always something missing.
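The crawler-and-index steps above can be sketched as follows. This is a minimal illustration only, not how any real search engine is built: the in-memory TOY_WEB dictionary, its example URLs, and the function names are all hypothetical stand-ins for real page downloads.

```python
from collections import defaultdict, deque
import re

# Hypothetical in-memory "web": URL -> (page text, [(link text, target URL), ...]).
TOY_WEB = {
    "a.example/": ("welcome to the math department",
                   [("computer science courses", "b.example/")]),
    "b.example/": ("course list and schedule",
                   [("math home", "a.example/"), ("library hours", "c.example/")]),
    "c.example/": ("library hours and catalog", []),
}

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def crawl(web, start):
    """Visit pages reachable from start and build a word -> set-of-URLs index."""
    index = defaultdict(set)
    todo = deque([start])      # the crawler's "to do" list
    seen = {start}
    while todo:
        url = todo.popleft()
        if url not in web:
            continue           # page could not be "downloaded"
        text, links = web[url]
        # File the page under each word in the page.
        for w in words(text):
            index[w].add(url)
        for link_text, target in links:
            # File the target page under the words in links pointing to it.
            for w in words(link_text):
                index[w].add(target)
            # Add newly discovered links to the "to do" list.
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return index

index = crawl(TOY_WEB, "a.example/")
print(sorted(index["computer"]))
```

Note that b.example/ is filed under "computer" even though that word appears only in a link to it, not in the page itself.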
      3. Query processor.
        1. Finds entries for words matching the query.
        2. Lists those pages.
        3. PageRank: Order results by the number of links pointing to each page.
          1. Google's innovation.
          2. Heavily tweaked since then, as people try to game it.
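The query processor steps above can be sketched as follows. This sketch simply counts inbound links, as the outline describes; Google's actual PageRank also weighs how important each linking page is, which is not modeled here. The INDEX and LINKS data and the function names are hypothetical.

```python
# Hypothetical toy data: an index (word -> pages) and each page's outgoing links.
INDEX = {
    "library": {"c.example/", "d.example/"},
    "hours": {"b.example/", "c.example/", "d.example/"},
    "math": {"a.example/"},
}
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "d.example/": ["c.example/", "b.example/"],
}

def inbound_counts(links):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = {}
    for source, targets in links.items():
        counts.setdefault(source, 0)
        for target in set(targets):
            if target != source:
                counts[target] = counts.get(target, 0) + 1
    return counts

def search(query, index, links):
    """Find pages filed under any query word, most linked-to first."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    counts = inbound_counts(links)
    # Sort by descending inbound-link count; break ties by URL for stable output.
    return sorted(hits, key=lambda url: (-counts.get(url, 0), url))

print(search("library hours", INDEX, LINKS))
```

Here c.example/ comes first because three pages link to it, while d.example/ comes last because nothing links to it.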
    2. Building searches.
      1. List of words: Find any.
      2. Conjunctions: AND, OR. Group them with parentheses.
      3. Use -word to eliminate that word.
      4. Suggestions from text:
        1. What kind of page: company home, index page, summary page.
        2. What type of organization will publish this?
        3. List terms likely to appear on the page.
          1. Locations.
          2. Times — businesses will give their hours.
          3. Terms specific to the type of business or its product.
        4. See if search is of correct scope.
          1. Too few hits. Remove restrictions, add words.
          2. Too many hits.
            1. Use fewer words.
            2. Use a NOT (-word) to eliminate hits you don't want.
        5. Search within previous results.
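The way AND, OR, parentheses, and -word combine can be sketched as follows. This is an illustrative evaluator under assumptions, not any search engine's real query syntax: queries are written as nested tuples (the nesting plays the role of parentheses), and the PAGES data and function names are hypothetical.

```python
import re

def page_words(text):
    """Return the set of lowercase words on a page."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(expr, words):
    """Evaluate a query against a page's word set.

    expr is a word (str) or a tuple:
    ("AND", e1, e2, ...), ("OR", e1, e2, ...), or ("NOT", e).
    """
    if isinstance(expr, str):
        return expr.lower() in words
    op, *args = expr
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical pages to search.
PAGES = {
    "bakery": "Main Street Bakery, open 7am to 3pm, fresh bread and pastry",
    "recipe": "Bread recipe: flour, water, yeast",
    "cafe": "Main Street Cafe, coffee and pastry, open late",
}

def run(expr):
    """Return the names of pages matching the query, in sorted order."""
    return sorted(name for name, text in PAGES.items()
                  if matches(expr, page_words(text)))

print(run(("OR", "bread", "pastry")))                  # list of words: find any
print(run(("AND", ("OR", "bread", "pastry"), "open"))) # (bread OR pastry) AND open
print(run(("AND", "bread", ("NOT", "recipe"))))        # bread -recipe
```

The three queries show the scope narrowing: the plain word list matches every page, adding AND open drops the recipe, and excluding "recipe" leaves only the bakery.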
    3. Reliability.
      1. Web v. print.
        1. Less editorial control.
        2. Harder to determine the true source/owner.
      2. Hoaxes, jokes, errors. Example: DHMO (dihydrogen monoxide).
      3. Checking.
        1. Check for true owner. (Maybe use the whois database.)
        2. Check for sponsor that exists in realityspace.
        3. Check for author's credentials.
        4. See if the site is well-organized and looks respectable.
        5. See if the site is maintained and up to date.
        6. Check other sources.
        7. Check Snopes for known hoaxes.