55 pts
A Tangled Web
Due: Nov. 27
For our purposes, a URL is a collection of several parts (which are strings) that can be represented as a larger string with proper delimiters and encoding. The parts are:
Provide some definition for the url_t type name. You should define url_t to be a struct.
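As a starting point, one possible shape for the struct is sketched below. It assumes C++ with std::string, one field per part mentioned in the operations that follow. The field names, and the choice to keep the port as text, are assumptions made for illustration and are not required by the assignment.

```cpp
#include <string>

// A possible layout for url_t. All field names here are hypothetical.
// Every part is a string; an empty string means "this part is not set".
struct url_t {
    std::string protocol;   // e.g. "http"
    std::string userid;     // may be empty
    std::string password;   // may be empty
    std::string host;
    std::string port;       // assumption: kept as text, empty = no port given
    std::string path;       // empty, or a series of "/component" pieces
};
```

Because std::string members default to empty, a freshly declared url_t already has every part empty, which leaves the initializing function only the protocol, host, and path to fill in.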
This initializes the URL variable u, much like a constructor for a class. It sets the protocol to HTTP, sets the host and path as indicated, and leaves the other parts empty. The path defaults to empty. The host name may contain only certain characters (see the form of a URL below). If it contains anything else, write an error message and exit.
This sets the protocol as indicated.
This sets the userid and password parts of the URL. Either one may be empty.
This sets the port number of the URL.
This removes the last component of the path, if there is one. If not, the URL is unchanged.
This adds to the existing path. If morepath does not start with a slash, add one to the front, then append morepath to the existing path.
This moves the URL to a nearby location given by dest. If dest begins with a /, it represents a new path and should replace the existing path. If it does not, dest replaces the last component of the existing path. In that case, it behaves the same as a url_up followed by a url_extend.
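The three path operations above can be sketched on the path string alone. The helper names path_up, path_extend, and path_go are hypothetical. The sketch assumes the path is a std::string that is either empty or a series of "/component" pieces, and it leaves out the dot-component and slash cleanup described elsewhere in this handout.

```cpp
#include <string>

// Remove the last "/component", if there is one.
// "/a/b" -> "/a", "/a" -> "", "" stays "".
void path_up(std::string &path)
{
    std::string::size_type slash = path.rfind('/');
    if (slash != std::string::npos)
        path.erase(slash);          // drop the last slash and what follows it
}

// Append morepath, supplying the leading slash if morepath lacks one.
void path_extend(std::string &path, const std::string &morepath)
{
    if (morepath.empty() || morepath[0] != '/')
        path += '/';
    path += morepath;
}

// Move to dest: a leading '/' replaces the whole path; otherwise dest
// replaces the last component (an "up" followed by an "extend").
void path_go(std::string &path, const std::string &dest)
{
    if (!dest.empty() && dest[0] == '/') {
        path = dest;
    } else {
        path_up(path);
        path_extend(path, dest);
    }
}
```

With helpers like these, each url_ function would only need to call the matching helper on the path part of its url_t argument.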
This returns a string representing the URL. See below for exactly what this should look like.
The path component consisting of a single dot (.) is special, as is the double-dot (..). Any component consisting of a single dot should simply be removed. The double-dot component means "up," and essentially destroys the component in front of it, if there is one. (If there is not, the component should just be removed.) As with slashes, whenever any operation produces a new path, make sure to resolve the dot components. For instance, the path
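One way to carry out this resolution is to walk the components from left to right and keep a list of the survivors. The sketch below takes that approach. The function name resolve_dots is hypothetical, and the code assumes the path is empty or starts with a slash.

```cpp
#include <string>
#include <vector>

// Return path with "." and ".." components resolved.
// Assumption: path is empty or begins with '/'.
std::string resolve_dots(const std::string &path)
{
    std::vector<std::string> kept;              // components that survive
    std::string::size_type pos = 0;

    while (pos < path.size() && path[pos] == '/') {
        // The component runs from just after this slash to the next one.
        std::string::size_type end = path.find('/', pos + 1);
        if (end == std::string::npos)
            end = path.size();
        std::string comp = path.substr(pos + 1, end - pos - 1);
        pos = end;

        if (comp == ".")
            continue;                           // single dot: just remove it
        if (comp == "..") {
            if (!kept.empty())
                kept.pop_back();                // destroy the component in front
            continue;                           // the ".." itself is removed too
        }
        kept.push_back(comp);
    }

    // Rebuild the path from the surviving components.
    std::string result;
    for (const std::string &c : kept) {
        result += '/';
        result += c;
    }
    return result;
}
```

Under these assumptions, resolve_dots("/a/./b") gives "/a/b", resolve_dots("/a/b/../c") gives "/a/c", and resolve_dots("/../x") gives "/x".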
These other characters are encoded as a percent sign followed by two hexadecimal digits (using digits and upper-case letters) giving the ASCII code for the character. For instance,
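The encoding step itself can be sketched as follows. Which characters pass through unencoded is defined by the form of a URL in this handout; the set used in is_plain below (letters, digits, and - . _ ~) is only a stand-in assumption, and both function names are hypothetical.

```cpp
#include <cctype>
#include <string>

// Assumption: these characters are copied as-is. Replace this test with
// the set of allowed characters the assignment actually specifies.
bool is_plain(unsigned char c)
{
    return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

// Encode every other character as '%' plus two upper-case hex digits
// giving its character code.
std::string percent_encode(const std::string &text)
{
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        if (is_plain(c)) {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];      // high four bits
            out += hex[c & 0x0F];    // low four bits
        }
    }
    return out;
}
```

With this stand-in set, a space (ASCII code 32, hexadecimal 20) is written as %20, so percent_encode("a b") yields "a%20b".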
CSc 220 Assignment 5