For this lab, please download
these questions, and submit the document
with your answers.
Search me
For optimal search experiences, the goal is to deliver what librarians call
precision (most of the results returned are relevant) and recall (most of the
relevant material is returned) while still having the most relevant results show up
at the top. Search engines employ extraordinarily complex algorithms to deliver those
optimized results, and as the scope of information changes, so must the algorithms.
Similarly, look-and-feel changes help highlight or hide functionality based on changes
made by the search provider. Why does this matter? Because you may develop a routine
around searching and not realize that your access to certain parts of the search tool
has changed, or that the tool works in a different way than you have been used to.
Keeping a critical eye on any tool you use is a good idea in this environment of change.
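To make precision and recall concrete, here is a minimal sketch in Python. It assumes you already know which pages are truly relevant, which is rarely the case for a real web search. The function name and the sample data are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: items the search tool returned
    relevant:  items that actually answer the question
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    # Precision: what fraction of the returned items are relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant items were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: 4 results returned, 5 relevant pages exist, 3 overlap.
returned = ["a", "b", "c", "x"]
relevant = ["a", "b", "c", "d", "e"]
print(precision_recall(returned, relevant))  # (0.75, 0.6)
```

A tool can score well on one measure and poorly on the other: returning everything gives perfect recall and poor precision, while returning a single correct page gives perfect precision and poor recall.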
Putting search tools to work: compare and contrast
In this lab, you'll enter the same search terms across a variety of web search
tools, including bing, Google, and Wolfram Alpha. We'll also take a closer
look at other search tools, like finding materials in your library catalog.
Learning more
- Fingerprinting refers to a technique where a web site collects
information from your browser to distinguish it from other browsers.
This allows web site operators to track your browser as you visit various
web sites.
Suppose you were watching pedestrians walking around a downtown
area or a shopping mall. You could fingerprint an individual
by recording the color of the person's clothing, type of shoes, hair color and
style, and any other observable attributes which are unlikely to
change in a short time. Store owners could
observe their customers, build such fingerprints, then combine their
observations to create
a profile of shopping preferences for each individual,
without ever knowing anyone's name. With enough data points, it
would be very unlikely that two shoppers would have the same fingerprint.
A similar technique can be used for browsers.
Browsers send various information to web sites, and that data can be
collected into a fingerprint.
The more information the browser sends, the more distinctive it is.
Visit the Electronic Frontier Foundation's Panopticlick
website and click on TEST ME to find out how distinct your browser is.
- Click on the text that says “Show full results
for fingerprinting.”
- Notice if the site declares that your browser is
“unique,” or how often browsers match yours.
- Look closely at the
column labeled “one in x browsers have this value”.
Where is that number the highest (which means fewer people share that same information)?
- Did you know you were sharing the information you see on the site?
- Why do you think you are sharing it by default?
- Finally, make a note of the numbers this site shares with you. What would happen
if you went to the site more than once? Try it and see if your guess was correct.
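The fingerprinting idea described above can be sketched in a few lines of Python. This is only an illustration, not how Panopticlick or any real tracker is implemented. The attribute names, the sample values, and the "one in x" figures are all hypothetical, and the combined estimate assumes the attributes are independent of each other, which real browser attributes usually are not.

```python
import hashlib
import math


def fingerprint(attributes):
    """Combine browser attributes into one short identifier."""
    # Sort the keys so the same attributes always give the same fingerprint.
    joined = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def bits_of_information(one_in_x):
    """Convert a 'one in x browsers' figure into bits: log2(x)."""
    return math.log2(one_in_x)


def combined_one_in_x(values):
    """Estimate joint rarity, ASSUMING the attributes are independent."""
    return math.prod(values)


# Hypothetical attribute values a browser might send.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "time_zone": "UTC-7",
    "screen": "1920x1080x24",
}
# Hypothetical "one in x browsers have this value" figures.
one_in_x = {"user_agent": 1000, "time_zone": 20, "screen": 8}

print(fingerprint(browser))                  # a 16-character hex string
print(bits_of_information(1024))             # 10.0
print(combined_one_in_x(one_in_x.values()))  # 160000
```

Each attribute on its own may be shared with many other browsers, but the rarity multiplies as attributes are combined, which is why a handful of ordinary-looking values can be enough to single out one browser.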
- Search engines like Google, bing, and Yahoo!, as you may already know,
collect information
about where you are, and what you search for. Try searching for the word waffles
in Google. Try it again in
bing. Are any of the results close to you -
in your city or state? Try the same search in the version of Google targeted at people who
live in Spain. When results are customized to you, you might miss out on
some results. This effect is sometimes called a "filter bubble".
- What was your experience? Aside from any language differences,
which results are the same? Why do you think those sites appear in the results list?
- If you got results that were physically close to you, like a restaurant that
serves waffles in your neighborhood or a map of places to get waffles in your city,
how do you think the search engine decides to show those results?
- When is knowing what city you are in when you search helpful? When could it
be harmful?
- If you tried these searches from your computer, try a search from someone
else's computer, or a computer in a library, or from your phone. Do you get
different results? Why do you think the results are the same (or different)?
- Try doing the same search from the same computer in the same place a week
from now, two weeks from now, and a month from now. What changed? What could be
some reasons for any changes you see?
Try comparing some web-based search tools to each other. Try
the following searches in bing,
Google,
Google France, and
Wolfram Alpha. Type in the
searches exactly as they are shown below:
apollo 13
123852/238.6
convert 100 euros to us dollars
how do I change the battery in my laptop?
poisson
translate poisson
a modest proposal
the prince
java
art
- What did you notice about the "type ahead" or "suggested
search" features (if present) that try to complete your search for you?
Were they always helpful? Do you know where they come from? Could you turn
that feature off if you wanted to?
- How do the search engines differ in their results? Why do you think
they do? If you were a programmer working for one of these search companies,
why would you choose certain search results as more relevant than
others? How could you get those results to come up earlier in the "hit list"
or results set?
- Many times the first page of results will look similar from search tool
to search tool. Take a look at the second pages of the search results for the
search "java". Do the search tools' results sets begin to diverge
(get less alike), or converge (get more alike)? How many of the results on the first
and second pages are related to commercial products? Is the ratio of commercial
sites different on the second page compared to the first? Is the ratio different
from one search tool to another?
- Were there any results you were surprised by? Which ones, if any?
According to a 2012
Wall Street Journal article (paywall),
the Orbitz travel site would start out
showing more expensive hotel offers if you were browsing from a Mac. Mac owners
have higher incomes on average, and are more likely to
choose a more expensive room. Is that sort of customization beneficial to you?
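If you would like to put numbers on the questions above about result sets diverging or converging, one possible approach is to write down the sites on each results page and compare the lists. The sketch below uses the Jaccard index (shared items divided by all distinct items) as one way to measure overlap. The site names are placeholders, and deciding which results count as commercial is a judgment you would make by hand.

```python
def jaccard(results_a, results_b):
    """Overlap between two result lists: 1.0 means identical, 0.0 means disjoint."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(a | b)


def commercial_ratio(results, commercial_sites):
    """Fraction of results that you have labeled as commercial."""
    if not results:
        return 0.0
    return sum(1 for site in results if site in commercial_sites) / len(results)


# Placeholder first-page results for the search "java" from two tools.
tool_a = ["language.example", "vendor.example", "encyclopedia.example", "tutorial.example"]
tool_b = ["language.example", "vendor.example", "encyclopedia.example", "island.example"]
# Sites you judged to be commercial (hypothetical labels).
commercial = {"language.example", "vendor.example"}

print(jaccard(tool_a, tool_b))               # 0.6
print(commercial_ratio(tool_a, commercial))  # 0.5
```

Running the same comparison on the second page of results and comparing the two overlap figures shows whether the tools drift apart or stay alike as you go deeper.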
- Now take the same search terms, and enter them into a few library search tools.
You can use our own institution's library, or you can search the
New York Public Library,
Arizona State University's
library catalog, or ASU Library's discovery layer service,
Library One Search.
- What were the similarities and the differences between the web-based
search tools and the library search tools?
- Were the differences in the results what you expected? Why or why not?
- Which tool was the best at each search? Were some tools just as good
as the others, depending on the search?
- Find the advanced search functions for each search tool. Could any
of these options have helped you get better results? Which ones?
- Which search tools had ads at the top of the results list? Was it
easy to tell they were ads?
- What did the search tools assume about your search for "java"? Which
"java" did you think you were searching for? Was it one of the
listings on
Wikipedia's disambiguation page for "java"? Do you think
any of the results were affected by your own "filter bubble"? Why or why not?
- Were there any commercial results in the result sets you got from the library?
How do you know? Were there any resources you were not allowed to get to because you
were not a library patron? Could you get to those resources legally and
for free anywhere else? Where?
- Your textbook describes and analyzes the joke page supporting the
Pacific Northwest Tree Octopus.
Another joke page is the
Dihydrogen Monoxide Research Division home page.
Have a look at the DMRD page. (If this is not the first time you have seen it,
try to pretend it is.)
- When you first see the page, does it look serious and official?
- How long would it take before you start getting suspicious that this is
not for real? How long before you are sure?
- What features make you most suspicious? Do you have to follow links to
get there, or are they on the first page?
- Everyone looks at the world with certain presuppositions. What sort
of leanings would cause a person to take this page seriously?
What sort would make you dismissive immediately?
Given your own way of looking at life, what sort of joke would you take
the longest to “get”?
The barriers to publishing a website are not very high. As you may gather from
our earlier labs,
you can do it yourself for free. What does that mean for the accuracy of the
information you find on the web? What are some things to keep in mind as you
search the web, not just for papers you need to write, but for other things you
might need to do? How do you know the person tweeting as Lady Gaga is
really her? If you don't know for sure, how would you find out?
(Instructor's comment: There's no good reason to care.)