------------------------------------------------------------------------------
CSc 220 Assignment 4
------------------------------------------------------------------------------

55 pts

A Tangled Web

Due: Nov. 27

Note: The deadline has been extended. There is an 8-pt bonus for any fully working submission by the original date, Nov. 22.
In this assignment, you are to create a facility for creating URLs. You must create two files: a header (.h) file which defines an interface, and an implementation file (.cpp) which implements the interface. A client program can then use your facility by calling the interface functions. The interface is made of plain function calls, even though it's in C++.

For our purposes, a URL is a collection of several parts (which are strings) that can be represented as a larger string with proper delimiters and encoding. The parts are:

  1. The protocol. For this program, the protocol can only be http or ftp.
  2. The user name, which is usually the empty string.
  3. The password, which is also usually empty.
  4. The host name.
  5. The port, which is usually empty.
  6. The path part.
The interface has methods for setting each of these parts, for extracting the URL as a string, and for making some other changes. Implement the following interface:
typedef url_t

Provide some definition for the url_t type name. You should define url_t to be a struct.
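
As a starting point, here is one possible layout for the struct: a minimal sketch with one field per URL part listed above. The field names, and the choice to store the port as a string (empty meaning "no port"), are assumptions for illustration; any layout that supports the interface will do.

```cpp
#include <string>

// One possible layout for url_t. All field names are illustrative
// assumptions; the assignment only requires that url_t be a struct.
struct url_t {
    std::string protocol;   // "http" or "ftp"
    std::string user;       // usually empty
    std::string password;   // usually empty
    std::string host;
    std::string port;       // assumption: stored as text, empty = no port
    std::string path;
};
```

In C++, the struct name is already a type name, so no separate typedef line is needed.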

void init_url(url_t &u, string host, string path = "")

This initializes the URL variable u, much like a constructor for a class. It sets the protocol to HTTP, the host and path as indicated, and the other parts empty. The path defaults to empty. The host name may contain only certain characters (see the form of a URL below). If it contains anything else, write an error message and exit.

void ftp_url(url_t &u)
void http_url(url_t &u)

This sets the protocol as indicated.

void url_creds(url_t &u, string userid, string password)

This sets the userid and password parts of the URL. Either one may be empty.

void url_port(url_t &u, unsigned int port)

Set the port number of the URL.

void url_up(url_t &u)

This removes the last component of the path, if there is one. If not, the URL is unchanged.

void url_extend(url_t &u, string morepath)

This adds to the existing path. If morepath does not start with a slash, add one to the front, then append morepath to the existing path.

void move_url(url_t &u, string dest)

This moves the URL to a new place nearby given by dest. If dest begins with a /, it represents a new path, and should replace the existing path. If it does not, it replaces the last component. In that case, it behaves the same as a url_up followed by a url_extend.
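
The three path operations (url_up, url_extend, and move_url) can be sketched on a plain path string. This is only a sketch: the helper names path_up, path_extend, and path_move are hypothetical, and the code assumes the incoming path already starts with a slash and has no trailing slash. Slash and dot clean-up of the result is a separate step.

```cpp
#include <string>

// Hypothetical helper: drop the last component of a path.
// "/a/b" becomes "/a", "/a" becomes "/", and "/" is unchanged.
std::string path_up(const std::string &path) {
    std::string::size_type pos = path.rfind('/');
    if (pos == std::string::npos || path.size() <= 1) return path;
    if (pos == 0) return "/";            // removed the only component
    return path.substr(0, pos);
}

// Hypothetical helper: append more to path, adding a slash if more lacks one.
std::string path_extend(const std::string &path, const std::string &more) {
    std::string result = (path == "/") ? "" : path;   // avoid a doubled slash
    if (more.empty() || more[0] != '/') result += '/';
    result += more;
    return result;                        // still needs slash/dot clean-up
}

// Hypothetical helper: the path change behind move_url.
std::string path_move(const std::string &path, const std::string &dest) {
    if (!dest.empty() && dest[0] == '/') return dest;  // dest is a new path
    return path_extend(path_up(path), dest);           // replace last component
}
```

Your interface functions would apply something like these to the path stored in the url_t, then clean up the result.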

string the_url(const url_t &u)

This returns a string representing the URL. See below for exactly what this should look like.

Dots and Slashes

For our assignment, the path part should always start with a slash, and must not end with one unless the entire path is just a slash. Also, the path must not contain two slashes in a row. If the user gives you a path that does not start with a slash, or any operation produces such a path, add one. If it ends with a slash that shouldn't be there, remove it. Likewise, if you get a path which has multiple slashes together, change each run of multiple slashes to a single slash.
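
The slash rules above can be applied in a single pass over the string. The following is a minimal sketch; the helper name fix_slashes is hypothetical, and it handles only the slash rules, not the dot components.

```cpp
#include <string>

// Hypothetical helper: enforce the slash rules on a raw path.
std::string fix_slashes(const std::string &raw) {
    std::string out = "/";                    // result always starts with a slash
    for (char c : raw) {
        if (c == '/' && out.back() == '/') continue;   // collapse runs of slashes
        out += c;
    }
    // Drop a trailing slash unless the whole path is just "/".
    if (out.size() > 1 && out.back() == '/') out.pop_back();
    return out;
}
```

For instance, fix_slashes("a//b/") gives "/a/b", and an empty path gives "/".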

The path component consisting of a single dot (.) is special, as is the double-dot (..). Any component consisting of a single dot should simply be removed. The double-dot component means "up," and essentially destroys the component in front of it, if there is one. (If there is not, the component should just be removed.) As with slashes, whenever any operation produces a new path, make sure to resolve the dot components. For instance, the path

/fred/./barney/joe/../alex/bill/./frank/../../sally
should convert to
/fred/barney/alex/sally
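
One way to resolve the dot components is to split the path at slashes and keep the surviving components on a stack. The sketch below takes that approach; the helper name resolve_dots is hypothetical, and this is only one of several reasonable ways to do it.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: remove "." components and apply ".." components.
std::string resolve_dots(const std::string &path) {
    std::vector<std::string> kept;            // components that survive so far
    std::string::size_type i = 0;
    while (i < path.size()) {
        std::string::size_type j = path.find('/', i);
        if (j == std::string::npos) j = path.size();
        std::string comp = path.substr(i, j - i);
        if (comp == "..") {
            if (!kept.empty()) kept.pop_back();   // destroy the component in front
        } else if (!comp.empty() && comp != ".") {
            kept.push_back(comp);                 // ordinary component
        }
        i = j + 1;                                // skip past the slash
    }
    std::string out;
    for (const std::string &c : kept) out += "/" + c;
    return out.empty() ? "/" : out;
}
```

Applied to the example path above, this produces /fred/barney/alex/sally. Because empty components are skipped, it also happens to collapse repeated slashes.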

Encoding Characters

Characters which are used as part of the URL syntax, or which are used specially by other bits of the network, may not appear directly in the user name, password, or path parts of a URL. These must be specially encoded. For the purposes of this assignment, you must encode any character other than letters, digits, or any of $-_.+!*'(),/ when they appear in the user name, password, or path. The slash (/) must also be encoded in the user name or password.

These other characters are encoded as a percent sign followed by two hexadecimal digits (using digits and upper-case letters) giving the ASCII code for the character. For instance,

http://www.mc.edu/~some stuff/after/&stuff+.html
becomes
http://www.mc.edu/%7Esome%20stuff/after/%26stuff+.html
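
The encoding rule can be sketched as a loop over the characters of one part. The helper name encode_part and its keep_slash flag are hypothetical; the flag is true for the path and false for the user name and password, where the slash must be encoded as well.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: percent-encode one part of the URL.
std::string encode_part(const std::string &text, bool keep_slash) {
    static const char hex[] = "0123456789ABCDEF";
    const std::string safe = "$-_.+!*'(),";    // punctuation that needs no encoding
    std::string out;
    for (unsigned char c : text) {
        bool ok = std::isalnum(c) != 0
               || safe.find(static_cast<char>(c)) != std::string::npos
               || (c == '/' && keep_slash);
        if (ok) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];                // high four bits
            out += hex[c & 0x0F];              // low four bits
        }
    }
    return out;
}
```

With keep_slash true, the path /~some stuff/after/&stuff+.html from the example encodes to /%7Esome%20stuff/after/%26stuff+.html.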

URL Format

The URL has the following format. Your the_url method should produce a result by these rules. (Note that there are no spaces in a URL, whether you think you see one here or not.)
protocol://username:password@hostname:port/path
Where:

Note that you will need a test driver to test your program.
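
Putting the parts together in the format shown above might look like the sketch below. The helper name assemble_url is hypothetical. The rules for leaving out empty parts are not reproduced here, so the two marked conditions are assumptions, not requirements: the sketch drops the username:password@ piece when both are empty and the :port piece when the port is empty. Check them against the actual rules.

```cpp
#include <string>

// Hypothetical helper; every part is assumed to be encoded and normalized already.
std::string assemble_url(const std::string &protocol, const std::string &user,
                         const std::string &password, const std::string &host,
                         const std::string &port, const std::string &path) {
    std::string out = protocol + "://";
    if (!user.empty() || !password.empty())    // ASSUMPTION: omit when both empty
        out += user + ":" + password + "@";
    out += host;
    if (!port.empty()) out += ":" + port;      // ASSUMPTION: omit when empty
    out += path;                               // path already starts with '/'
    return out;
}
```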

Some Hints

One thing you will find useful is the ability to convert numeric data to string data. C++ provides this through a variation on streams which allows you to "print" to a string, rather than a real output stream. These are (perhaps obviously) called string streams, and there are some examples here and here. (If you want to work in plain C, here is that one.) You may also want to review the histogram example which uses I/O manipulators to change the appearance of output. They work just fine with string streams.
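
As a quick illustration of the hint, here is a minimal sketch of two conversions done with std::ostringstream. The function names port_to_string and hex_escape are hypothetical; the second shows the I/O manipulators producing a two-digit upper-case hexadecimal code.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: "print" a number into a string.
std::string port_to_string(unsigned int port) {
    std::ostringstream out;
    out << port;
    return out.str();
}

// Hypothetical helper: a percent sign followed by two upper-case hex digits.
std::string hex_escape(unsigned char c) {
    std::ostringstream out;
    out << '%' << std::uppercase << std::hex
        << std::setw(2) << std::setfill('0')
        << static_cast<unsigned int>(c);   // cast so it prints as a number
    return out.str();
}
```

Note the cast in hex_escape: without it, the stream would print the character itself rather than its numeric code.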

Submission

When your program works, submit over the web here.