Internet Basics

From Fluency, 1st ed. supplement, with local modifications.

Part 1 Domains, Hostnames and IP Addresses

Hostnames instead of IPs. In Chapter 3, we saw how every computer on the Internet has a unique numeric address called an IP address. An IP address is actually composed of four numbers, each in the range 0 to 255, separated by periods, which we read as “dot.” IP addresses are hard for humans to remember, so we usually identify networked computers using hostnames instead. (Networked computers are often called hosts, hence “hostnames.”) Hostnames are automatically translated into IP addresses by a special system called DNS (short for Domain Name System).
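The four-number structure of an IP address can be checked with a few lines of Python. This is only an illustrative sketch: the function name is_dotted_quad is made up for this example, and it covers only the four-number form described here.

```python
def is_dotted_quad(text):
    """Return True if text looks like four numbers in 0..255 joined by dots."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be plain ASCII digits and no larger than 255.
        if not (part.isascii() and part.isdigit()) or int(part) > 255:
            return False
    return True


print(is_dotted_quad("198.137.240.91"))   # four numbers, all in range
print(is_dotted_quad("198.137.240.256"))  # 256 is outside 0..255
print(is_dotted_quad("198.137.240"))      # only three numbers
```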

The fact that hostnames are composed of words rather than numbers is only part of the reason why they are easier for us to remember. Hostnames are organized hierarchically, the same way folders on disks can be organized. Let’s look at an example hostname and an example file location to compare these two hierarchical organization schemes:

example hostname: www.cs.washington.edu
example filename: c:\personal\finances\taxes\1040.pdf

Hostnames and domains. Let’s begin by taking apart the hostname. Notice that the hostname is split into parts separated by periods, which, like with IP addresses, are read as “dot.” The parts of a hostname are ordered from more specific to less specific, which is the opposite of how parts of the filename are ordered. Note that with the filename, directory names are separated by backslashes, and the path begins with the most general categorization (the drive letter, C) and gets more specific as you read on. On the other hand, the most general categorization in the hostname is the .edu part. All hosts in educational institutions have hostnames ending with .edu, and we call this category of hostnames the edu domain.

With over seven million hosts in the edu domain (as of early 2001), subcategorization is necessary. The next rightmost part is sometimes called the second-level domain and specifies which educational institution within the edu domain this host belongs to—in this case, washington corresponds to the University of Washington (UW). Every college and university with computers on the Internet has a special name that identifies its group of hosts within the edu domain (e.g., umich for the University of Michigan, ucla for the University of California at Los Angeles).

A typical educational institution has so many hosts that yet another level of subcategorization is often used, usually based on academic department. The next part of the hostname, cs, indicates that the host is in the Computer Science department. Other departments at UW and their corresponding third-level domain names are Music (music), Physics (phys), Astronomy (astro) and School of Nursing (son).

Finally, the www part of the hostname specifies a particular computer within the domain cs.washington.edu. In this case, www is the name of the computer that serves web pages about the Department of Computer Science and Engineering at UW. It is typical for computers that serve web pages to be named www within their domains. We will learn more about what a web server does later in this lab.
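To see the hierarchy in action, the following Python sketch splits the example hostname at the dots and lists the domains it belongs to, starting from the most general. The function name domain_levels is an invention for this illustration, not part of any standard tool.

```python
def domain_levels(hostname):
    """List the domains a hostname belongs to, most general first."""
    labels = hostname.split(".")
    levels = []
    # Walk from the rightmost label (most general) to the leftmost.
    for start in range(len(labels) - 1, -1, -1):
        levels.append(".".join(labels[start:]))
    return levels


for level in domain_levels("www.cs.washington.edu"):
    print(level)
```

The last line printed is the full hostname; each line before it is a larger category that contains it.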

All of these levels of subcategorization should make it easier to remember www.cs.washington.edu than to remember the corresponding IP address, 34.215.139.216. We will start with a few small experiments with hostnames and IP addresses, to learn more about how they correspond to each other.

DNS to the rescue! DNS makes life on the Internet more than just convenient. Because DNS translates hostnames to IP addresses, the IP address corresponding to a particular hostname can be changed and updated in DNS, allowing users to continue accessing the host by hostname without even being aware of the address change. In late July 2001, network security experts did exactly this to head off a massive, carefully coordinated virus attack on the White House web site by a virus called “Code Red.” A virus is a program that secretly copies itself onto a computer (usually via files transferred over a network or floppy disk) and performs unintended, often malicious, actions. Code Red was designed to rapidly spread across the Internet and wait until 5:00 pm Pacific on 19 July, at which time, every infected host (est. over 225,000 worldwide) would deluge the web server at IP address 198.137.240.91 (www.whitehouse.gov's IP address at the time) with data, effectively preventing anyone else from accessing it (a “denial of service” attack). White House network administrators acted fast, though, and before the coordinated attack began, they switched the web server's address to 198.137.240.92, outsmarting the virus by “moving the target.”
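The idea of “moving the target” can be pictured with a toy model. In the Python sketch below, a dictionary stands in for DNS; real DNS is a distributed system and works quite differently inside, so treat this only as an analogy. The names dns_table, resolve and hard_coded_target are made up for the illustration.

```python
# A toy stand-in for DNS: a table from hostnames to IP addresses.
dns_table = {"www.whitehouse.gov": "198.137.240.91"}


def resolve(hostname):
    """Look up the current address for a hostname in the toy table."""
    return dns_table[hostname]


# The attack was aimed at a fixed number, not at the name.
hard_coded_target = "198.137.240.91"

# The administrators move the server and update the table.
dns_table["www.whitehouse.gov"] = "198.137.240.92"

# Visitors who use the name are sent to the new address.
print(resolve("www.whitehouse.gov"))
# The fixed number no longer matches the server's address.
print(resolve("www.whitehouse.gov") == hard_coded_target)
```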

Open a web page by hostname. Start a web browser and open the URL http://www.imageafter.com (copy and paste the url).
Find out the IP address corresponding to a hostname. Start a new browser window and visit the page http://sandbox.mc.edu/lookup.php. Use this form to look up the IP address of www.imageafter.com and note it down somewhere.

What do you expect will happen if you point your browser at http://x.x.x.x, replacing x.x.x.x with the IP address you found above?
Check your guess by opening a third browser tab or window and going to the URL for www.imageafter.com, typing the number in place of the name. Does it look the same?

Normally, when you are using the Internet, you only have to remember hostnames and can forget the IP addresses they correspond to. It is still important, however, to understand that IP addresses are being used “behind the scenes,” so that you can interpret DNS-related error messages you might encounter as you use network software.
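The same kind of lookup can be done from a program. The Python sketch below asks the computer's own resolver, which in turn uses DNS, for the address of a hostname. The function name lookup is chosen for this example; socket.gethostbyname is part of Python's standard library.

```python
import socket


def lookup(hostname):
    """Ask the system resolver for the IPv4 address of a hostname."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror as error:
        # Raised when the name cannot be resolved, e.g. a mistyped hostname.
        return f"lookup failed: {error}"


# A numeric address is returned unchanged, with no DNS query needed.
print(lookup("127.0.0.1"))
```

Calling lookup("www.imageafter.com") on a networked computer should give an address for that site, though it may differ from one lookup to the next if the site uses several servers.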

The above exercise is fading away. Using the IP address instead of the hostname works for imageafter (or it did when I was writing this). It used to work on pretty much any web site, but that is one of the many things changing online. The browser still uses DNS to convert the name to an address and contacts the server, but some newer arrangements keep the numeric URL from working. There are two main reasons:

Smaller web sites often share the same physical server, so each host name converts to the same IP address. If you give the browser only the address, it can contact the server just fine, but it can't tell the server which web site it came for. Since the server won't know what page to send, it reports an error to the browser.
Most major web sites these days use secure connections. These used to be limited to sites that accept passwords or payment credentials, requiring extra security. Though your humble author can see very little point in it, the trend today is to secure all connections. Among other things, a secure web site must prove to the browser that it really has the name requested. But when no name is given, no such check can succeed, so no secure connection can be made.
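The first reason is easier to see by looking at what a browser actually sends. The Python sketch below builds the text of a minimal web request; the Host line is how a shared server learns which site is wanted. The function name build_request is made up for this example, and real browsers send many more lines than this.

```python
def build_request(hostname, path="/"):
    """Build the text of a minimal HTTP/1.1 request for one page."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",   # tells a shared server which site is wanted
        "Connection: close",
        "",                    # a blank line ends the request
        "",
    ]
    return "\r\n".join(lines)


# Request made by name: the Host line names the web site.
print(build_request("www.imageafter.com"))

# Request made by number (x.x.x.x stands for the address you looked up):
# the Host line carries only the number, which many sites share.
print(build_request("x.x.x.x"))
```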

Part 2 Servers and Clients on the Web

You have probably already heard the terms “web server” and “e-mail client,” but you might not realize that “server” and “client” are general terms describing roles that computers can play on a network, depending on what software they are running. In general, a server is a computer that provides some kind of data (e.g., web pages, database entries) or service (e.g., e-mail, printing), and a client is a computer that requests and receives the data or service. To illustrate this difference, we will begin by discussing an example with web pages.

When you view a web page on your computer, your computer is acting as a client to a web server somewhere on the Internet, the computer where the web pages are stored. Each time you click on a link, your web browser sends a web page request through the network to the web server, and the server responds by sending a copy of the requested page back to your computer, where the browser displays it for you.
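The request-and-response exchange can be tried out on a single computer. The Python sketch below starts a very small web server on the local machine, then plays the client by asking it for a page. The names EchoHandler and fetch_once are inventions for this example, and the server only reports which path was requested rather than sending a real file.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Server side: answer every request with a short text reply."""

    def do_GET(self):
        body = f"You asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path):
    """Start a local server, request one path as a client, then stop."""
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        connection = HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", path)         # client sends the request
        reply = connection.getresponse()        # server sends the response
        text = reply.read().decode()
        connection.close()
        return text
    finally:
        server.shutdown()
        server.server_close()


print(fetch_once("/index.html"))
```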

Where to find a web resource. At the heart of each of these requests is a URL (short for Uniform Resource Locator), which is a standard way of specifying a file or resource on a particular computer on the network. Although URLs can be used to specify many kinds of network requests, since they are most commonly used for web pages, URLs are also called “web addresses” or “links.” URLs seem to appear everywhere now, from advertisements to local television news broadcasts and even boxes of breakfast cereal, as companies and other organizations set up web sites to accompany traditional media materials.

Every URL includes three important pieces of information: what kind of request it is (usually for a web page), the hostname of the server the request is going to, and the location and name of the file being requested. Suppose you are trying to view the web page at this URL on your computer:

http://www.webopedia.com/TERM/S/server.html

The first part of the URL is before the :// and, in this case, http indicates that this is a request for a web page. HTTP stands for Hypertext Transfer Protocol, where “hypertext” is a technical term for text that includes links to other documents, and HTTP is the standard method of transferring web files through the Internet.

One useful way of thinking about the last two parts of the URL is to interpret them together as a pathname (full file name) for a web page file. One important difference between URLs and pathnames is that a URL must specify which computer the file is on, something which is just assumed in the case of a pathname. The hostname is specified between the :// and the next /, and the remainder of the URL is just like a pathname—it specifies the location of the file on the given host, with slashes separating folder names. In some cases, the filename at the end of the URL can be omitted (e.g., http://nature.org), and the web server assumes that a file with a standard name like index.html or home.html is being requested.
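Python's standard library can take a URL apart into the same three pieces. The sketch below uses urllib.parse.urlsplit; the wrapper function url_parts is made up for this example, and reporting an omitted file as "/" is a choice of this sketch (the server decides what file that actually means).

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Return (kind of request, hostname, location of the file) for a URL."""
    parts = urlsplit(url)
    # An empty path means no file was named; the request is for "/".
    path = parts.path or "/"
    return parts.scheme, parts.hostname, path


print(url_parts("http://www.webopedia.com/TERM/S/server.html"))
print(url_parts("http://nature.org"))
```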

To view the page at the URL given above, your web browser sends a web page (HTTP) request to the host www.webopedia.com. If this computer is properly set up as a web server, it is running software that listens on the network for these requests and will respond by sending back the appropriate web page—in this case, server.html in the folder TERM/S/.

A single computer can act as more than one kind of server by running more than one kind of server software at once. In fact, it is common for most hosts to be playing the role of at least a few different servers, e.g., web, mail, printing, and file storage. In the next part of the lab, we will see how another kind of server can be used to copy files between computers over the network.

Is the server the computer or the software? The term “web server” can be confusing, because it is often used in two different ways. The term commonly refers to a computer which is running software that enables it to send web pages on request, as in, “My home page is on the web server students.washington.edu.” However, some people also use the term “web server” to refer to this software, rather than the computer, as in, “If you’re running a Microsoft web server on your computer, you should regularly check for security problems.” Context usually disambiguates, but not always. In this lab, we use the term in the sense of a computer or a role that it plays, rather than the software, which we call “server software.”

Part 3 Copying Files Across the Network with FTP

FTP (short for File Transfer Protocol) is a standard method of sending files between computers through a network. The primary difference between FTP and HTTP is that using FTP requires that you identify yourself with a user name (also known as a login) and a password. (Imagine if HTTP were like this. You would have to type in a user name and password every time you clicked a link!) In this part of the lab, you will use FTP to put your first file on the web, i.e., use FTP to make a file publicly available via HTTP.

An FTP server is a lot like a bank for files, i.e., a computer on the network with lots of hard disk space that offers individual users access to private storage. Just like with a hard disk on your own computer, you can copy, delete and rename files on an FTP server, as well as maintain folders to keep your files organized. You use FTP client software to connect with an FTP server, just as you use a web browser (web client software) to connect with an HTTP server (web server).

We learned that web pages are stored and delivered from web servers, but how do web pages get to a web server in the first place? Sometimes, web authors create and edit them on the web server directly, but this is not the usual case. Typically, a web author edits a page on their own computer, makes sure the page looks right, then transfers a copy to the web server. As soon as a file is transferred to a web server, it generally becomes publicly available, so working on their own computer helps ensure that there are no mistakes in the pages that actually go on the web, or “go live,” as some people say, borrowing the expression from broadcast media. In this part of the lab, you will follow the same process to put your first file on the web. Writing even a simple web page is worth an entire lab in itself (the next one, in fact), so you will start by posting a plain text file in this lab.

Create a text file to be posted on the web. Use Notepad to create a text file with whatever content you wish to post to the web. You might consider writing a list of your favorite bands or a short autobiography. Save your file wherever you like, but make sure you can find it again; the desktop is a good place. Since your file is a text file, it will have the extension .txt.

You will upload this file to your web page space on the FTP server on sandbox.mc.edu. Instructions follow.

To copy your file to the web server, we will use an FTP client. There are many possibilities, but we'll use a program called WinSCP which is installed in the labs. Here's how you can use it to send your file to the web server at sandbox.mc.edu:
Open WinSCP from the start menu. If it's not immediately visible, click on the start menu and find it, or do a search.
When WinSCP runs, you will get the form pictured to the right (without the red numbers). Set each selection as shown. For #1, select the FTP transfer protocol. The drop-down for #2 will not appear until you select FTP in the first one. Choose the option shown: TLS Explicit encryption. For #3, sandbox.mc.edu is the name of the server where your web space will be. For #4, enter your MC login name, the same you use to read mail or use other campus on-line services. (Don't type “username,” and don't include the @mc.edu.) The password is also your regular MC password. You may enter it here, or the system will ask for it later.
Press the Login button. You may get a warning window indicating that the server's certificate is unknown. Click Yes to continue connecting. (Though if you ever see a notice like this at a commercial web site, it might be something to worry about.)
You should now see a window looking something like the one below. The left panel shows files on your computer's hard drive, and the right panel shows your folder on the server (which won't contain much initially).
You can now drag files from anywhere on your computer to your folder on the server (or back the other way). Drag your text file from step one to the server folder (right frame of the WinSCP window).
Now go to the URL http://sandbox.mc.edu/users. You will see a list of login names. Click on yours. You should then see the file you uploaded. Click on it.
Notice the URL in the browser's address bar. This is the URL for your file, and it can be used to view your file from any web browser anywhere in the world.
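For the curious, the same kind of transfer can be scripted. The Python sketch below uses the standard ftplib module with explicit TLS, roughly matching the settings chosen in WinSCP above; it is not what WinSCP itself runs. The function names connect and upload_file, the file name notes.txt, and the login placeholders are all made up for this example, and whether the server accepts such a connection from outside the lab is an assumption.

```python
from ftplib import FTP_TLS
from pathlib import Path


def connect(host, user, password):
    """Open an FTP connection that uses explicit TLS encryption."""
    ftp = FTP_TLS(host)          # connects to the server
    ftp.login(user, password)    # secures the connection, then logs in
    ftp.prot_p()                 # also encrypt the file data itself
    return ftp


def upload_file(ftp, local_path):
    """Send one local file to the current folder on the server."""
    path = Path(local_path)
    with path.open("rb") as handle:
        # STOR is the FTP command that stores a file on the server.
        ftp.storbinary(f"STOR {path.name}", handle)
    return path.name


# Example use (not run here, since it needs a real account):
# ftp = connect("sandbox.mc.edu", "your_login", "your_password")
# upload_file(ftp, "notes.txt")
# ftp.quit()
```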

If you would like to use a different FTP client program, you might try FileZilla at http://filezilla-project.org/, which includes a Mac version. This program is also installed in the lab.
Troubleshooting: If you are not able to access your text file using your web browser, do not panic. Start by identifying what kind of problem the server is having. Check the title bar of your browser window for an error code number and a brief description of the error. Here are some tips for dealing with the more common errors:
404 Not Found Your URL specifies a file that does not exist on the server. Double-check all parts of the URL, especially the hostname, your user name, and the filename itself.

Permission Denied The server has located the file, so your URL is correct, but the file's properties on the server are set such that it is not publicly readable. This shouldn't happen in our setup. (But you never know.)

Cannot Find Server or DNS Error The hostname part of the URL is probably mistyped. Recall that DNS is the system by which hostnames are converted into IP addresses. If you request a host whose name is not registered with DNS, you will get this error.
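The three kinds of failure above can also be told apart in a program. The Python sketch below is an illustration only: the function names explain_status and check_url are made up, and it assumes the permission problem is reported with HTTP status 403 (“Forbidden”), which is the usual code for it.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def explain_status(code):
    """Turn an HTTP error code into a hint like the tips above."""
    if code == 404:
        return "404 Not Found: double-check the URL, user name and filename"
    if code == 403:
        return "403 Forbidden: the file exists but is not publicly readable"
    return f"HTTP error {code}"


def check_url(url):
    """Try to fetch a URL and describe what happened."""
    try:
        with urlopen(url, timeout=10) as reply:
            return f"OK, status {reply.status}"
    except HTTPError as error:
        # The server was reached but refused or could not find the file.
        return explain_status(error.code)
    except URLError as error:
        # The server was never reached, e.g. the hostname is not in DNS.
        return f"cannot reach server (check the hostname): {error.reason}"


print(explain_status(404))
print(explain_status(403))
```

HTTPError is handled before URLError because it is a more specific kind of URLError.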

Part 4 Uploading An Image

Don't miss these last two steps!

Use an image tool to create a simple image. MS Paint will work fine. Save it with a file type of GIF, PNG or JPG. (I believe the current version of Paint defaults to PNG, but check the type in the save-as menu.)
Copy this file to the server also. You should be able to see your picture in the browser. Find it the same way you found your text file.