Web Search Comparison
For this lab, please download these questions, and submit the document with your answers.

Search me

For optimal search experiences, the goal is to deliver what librarians call precision (how much of what comes back is relevant) and recall (how much of the relevant material comes back) while still having the most relevant results show up at the top. Search engines employ extraordinarily complex algorithms to deliver those optimized results, and as the scope of information changes, so must the algorithms. Similarly, look-and-feel changes highlight or hide functionality as the search provider sees fit. Why does this matter? Because you may develop a routine around searching and not realize that your access to certain parts of the search tool has changed, or that the tool works differently than you are used to. Keeping a critical eye on any tool you use is a good idea in this environment of change.
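To make precision and recall concrete, here is a minimal Python sketch. The function name and the sample page labels are made up for illustration. It also assumes you already know which pages are relevant, which is rarely true on the open web.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant items that exist
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: the tool returned 4 pages, 3 of them relevant,
# while 6 relevant pages exist in total.
retrieved = ["a", "b", "c", "x"]
relevant = ["a", "b", "c", "d", "e", "f"]
print(precision_recall(retrieved, relevant))  # (0.75, 0.5)
```

A tool can raise recall by returning more results, but that usually lowers precision, which is why the ordering of the results matters so much.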

Putting search tools to work: compare and contrast

In this lab, you'll enter the same search terms across a variety of web search tools, including bing, Google, and Wolfram Alpha. We'll also take a closer look at other search tools, like finding materials in your library catalog.

Learning more

  1. Fingerprinting refers to a technique where a web site collects information from your browser to distinguish it from other browsers. This allows web site operators to track your browser as you visit various web sites. Suppose you were watching pedestrians walking around a downtown area or a shopping mall. You could fingerprint an individual by recording the color of the person's clothing, type of shoes, hair color and style, and any other observable attributes which are unlikely to change in a short time. Store owners could observe their customers, build such fingerprints, then combine their observations to create a profile of shopping preferences for each individual, without ever knowing anyone's name. With enough data points, it would be very unlikely that two shoppers would have the same fingerprint.

    A similar technique can be used for browsers. Browsers send various pieces of information to web sites, and that data can be collected into a fingerprint. The more information the browser sends, the more distinctive it is. Visit the Electronic Frontier Foundation's Panopticlick website and click on TEST ME to find out how distinct your browser is.

    • Click on the text that says “Show full results for fingerprinting.”
    • Notice if the site declares that your browser is “unique,” or how often browsers match yours.
    • Look closely at the column labeled “one in x browsers have this value”. Where is that number the highest (which means fewer people share that same information)?
    • Did you know you were sharing the information you see on the site?
    • Why do you think you are sharing it by default?
    • Finally, make a note of the numbers this site shares with you. What would happen if you went to the site more than once? Try it and see if your guess was correct.
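The "one in x browsers have this value" numbers can be combined to see why a handful of ordinary attributes is enough to single out a browser. The sketch below is illustrative only: the attribute names and values are invented, not taken from the site, and it assumes the attributes are independent of one another. Real attributes are often correlated, so this tends to overstate how distinctive a browser is.

```python
import math


def bits_of_information(one_in_x):
    """Convert 'one in x browsers have this value' into bits of identifying information."""
    return math.log2(one_in_x)


def combined_fingerprint(one_in_values):
    """Combine several attributes, ASSUMING they are independent.

    Returns (total_bits, combined_one_in_x).
    """
    total_bits = sum(bits_of_information(x) for x in one_in_values)
    return total_bits, 2 ** total_bits


# Hypothetical readings for three attributes (not real measurements).
readings = {"user agent": 16, "time zone": 8, "screen size": 4}
total_bits, one_in = combined_fingerprint(readings.values())
print(total_bits, one_in)  # 9.0 512.0
```

In this made-up case, three attributes that are each fairly common together describe only one browser in 512.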

  2. Search engines like Google, bing, and Yahoo!, as you may already know, collect information about where you are, and what you search for. Try searching for the word waffles in Google. Try it again in bing. Are any of the results close to you - in your city or state? Try the same search in the version of Google targeted at people who live in Spain. When results are customized to you, you might miss out on some results. This effect is sometimes called a "filter bubble".

    • What was your experience? Aside from any language differences, which results are the same? Why do you think those sites appear in the results list?
    • If you got results that were physically close to you, like a restaurant that serves waffles in your neighborhood or a map of places to get waffles in your city, how do you think the search engine decides to show those results?
    • When is knowing what city you are in when you search helpful? When could it be harmful?
    • If you tried these searches from your computer, try a search from someone else's computer, or a computer in a library, or from your phone. Do you get different results? Why do you think the results are the same (or different)?
    • Try doing the same search from the same computer in the same place a week from now, two weeks from now, and a month from now. What changed? What could be some reasons for any changes you see?

  3. Try comparing some web-based search tools to each other. Try the following searches in bing, Google, Google France, and Wolfram Alpha. Type in the searches exactly as they are shown below:
          apollo 13
          convert 100 euros to us dollars 
          how do I change the battery in my laptop?
          translate poisson
          a modest proposal
          the prince

    • What did you notice about the "type ahead" or "suggested search" features (if present) that try to complete your search for you? Were they always helpful? Do you know where they come from? Could you turn that feature off if you wanted to?
    • How do the search engines differ in their results? Why do you think they do? If you were a programmer working for one of these search companies, why would you choose certain search results as more relevant than others? How could you get those results to come up earlier in the "hit list" or results set?
    • Many times the first page of results will look similar from search tool to search tool. Take a look at the second pages of the search results for the search "java". Do the search tools' results sets begin to diverge (get less alike), or converge (get more alike)? How many of the results on the first and second pages are related to commercial products? Is the ratio of commercial sites different on the second page compared to the first? Is the ratio different from one search tool to another?
    • Were there any results you were surprised by? Which ones, if any?
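If you want to put numbers on the "diverge or converge" and "ratio of commercial sites" questions, one possible approach is sketched below. It compares two results lists by the sites they point to, using Jaccard overlap, and counts the share of results from sites you have judged to be commercial. The URLs, the function names, and the choice of Jaccard overlap are all assumptions made for illustration. Deciding which sites count as commercial is still up to you.

```python
from urllib.parse import urlparse


def domain(url):
    """Return the host name of a URL, lowercased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def jaccard_overlap(results_a, results_b):
    """Share of distinct sites that appear in BOTH lists (1.0 = identical, 0.0 = disjoint)."""
    a = {domain(u) for u in results_a}
    b = {domain(u) for u in results_b}
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def commercial_ratio(results, commercial_domains):
    """Fraction of results whose site is in a hand-made set of commercial domains."""
    if not results:
        return 0.0
    return sum(domain(u) in commercial_domains for u in results) / len(results)


# Hypothetical results for the search "java" from two tools.
tool_a = [
    "https://www.example.com/java",
    "https://docs.example.org/java",
    "https://shop.example.net/coffee",
]
tool_b = [
    "https://example.com/java",
    "https://news.example.edu/java-island",
    "https://shop.example.net/coffee",
]

print(jaccard_overlap(tool_a, tool_b))                  # 0.5
print(commercial_ratio(tool_a, {"shop.example.net"}))
```

Running the same comparison on the first and second pages of results gives two overlap figures; a lower figure on the second page would suggest the tools diverge.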

    According to a 2012 Wall Street Journal article (paywall), the Orbitz travel site would start out showing more expensive hotel offers if you were browsing from a Mac. Mac owners have higher incomes on average, and are more likely to choose a more expensive room. Is that sort of customization beneficial to you?

  4. Now take the same search terms, and enter them into a few library search tools. You can use our own institution's library, or you can search the New York Public Library, Arizona State University's library catalog, or ASU Library's discovery layer service, Library One Search.

    • What were the similarities and the differences between the web-based search tools and the library search tools?
    • Were the differences in the results what you expected? Why or why not?
    • Which tool was the best at each search? Were some tools just as good as the others, depending on the search?
    • Find the advanced search functions for each search tool. Could any of these options have helped you get better results? Which ones?
    • Which search tools had ads at the top of the results list? Was it easy to tell they were ads?
    • What did the search tools assume about your search for "java"? Which "java" did you think you were searching for? Was it one of the listings on Wikipedia's disambiguation page for "java"? Do you think any of the results were affected by your own "filter bubble"? Why or why not?
    • Were any commercial results in the results sets you got from the library? How do you know? Were there any resources you were not allowed to get to because you were not a library patron? Could you get to those resources legally and for free anywhere else? Where?

  5. Your textbook describes and analyzes the joke page supporting the Pacific Northwest Tree Octopus. Another joke page is the Dihydrogen Monoxide Research Division home page. Have a look at the DMRD page. (If this is not the first time you have seen it, try to pretend it is.)

    • When you first see the page, does it look serious and official?
    • How long would it take before you start getting suspicious that this is not for real? How long before you are sure?
    • What features make you most suspicious? Do you have to follow links to get there, or are they on the first page?
    • Everyone looks at the world with certain presuppositions. What sort of leanings would cause a person to take this page seriously? What sort would make you dismissive immediately? Given your own way of looking at life, what sort of joke would you take the longest to “get”?

  6. The barriers to publishing a website are not very high. As you may gather from our earlier labs, you can do it yourself for free. What does that mean for the accuracy of the information you find on the web? What are some things to keep in mind as you search the web, not just for papers you need to write, but for other things you might need to do? How do you know the person tweeting as Lady Gaga is really her? If you don't know for sure, how would you find out? (Instructor's comment: There's no good reason to care.)