In this project, you
will create a class URL to create objects representing Internet URLs.
A URL object will be initialized with a URL string, which the
object will parse to extract its parts. Our URLs will have four:
A protocol, a host name, a port number, and a path.
This is a simplification,
as full URLs may have several other parts, and we
are limiting the protocols to
http and
https.
For instance, this code
URL url("http://www.somesite.com/this/place.html");
cout << url.to_string() << endl;
cout << url.get_host() << " " << url.get_path() << endl;
url.set_proto("https");
cout << url.to_string() << endl;
will create the object
url containing the given location. It then
prints it out again using the
to_string method, then
prints separately the host,
www.somesite.com, and the
path
/this/place.html. Finally, it changes the protocol to
https, and prints
https://www.somesite.com/this/place.html.
Once the url is created, there are methods for getting and setting
its parts, and for other sorts of changes.
Your class should implement the interface following, and satisfy the
other requirements in this section. The assignment is to implement
the class, not to build code that uses it. There is a
test driver attached to
exercise your class.
URL u(urlstring)
Your URL class should have a single constructor which takes a
string representing a URL. It initializes the
object u with the protocol, host name, port number and path given
therein. Details of the argument string format are given below.
If its format is not correct, throw the exception
std::invalid_argument.
u.get_proto()
u.get_host()
u.get_port()
u.get_path()
For any URL object
u, there is a getter
for each of the parts. Each method returns a string, except for
get_port(), which returns an integer.
Each of these methods
should be declared const, meaning it does not change u.
For instance,
get_proto() should be declared something like this:
string get_proto() const { ... }
u.set_proto(protostring)
Set the protocol in URL object u to protostring.
The argument is type string, which must be either
"http" or "https". If you send any other string, the
method should throw std::invalid_argument.
u.set_host(hostname)
Set the host name to the given string.
u.set_port(portno)
Set the port number to the given integer portno. The value of the port
number must be in the range 1 to 65535, or the program
should throw the std::invalid_argument exception.
u.set_path(path)
Set the path. Argument is a string. The path should start with a slash;
go ahead and add one if there is none. If a non-empty path ends with
a slash, you may discard that.
u.to_string()
Return a string representation of the URL.
This will generally look like the constructor argument, after any changes
made by the setters or other manipulating methods.
The port number should appear only when it differs from the default
port corresponding to the protocol.
This method should also be declared const.
u.up()
Remove the last component of the path, if there is one. If the path is
empty, does nothing.
u.up(n)
Runs the first up n times. If n is negative, does nothing.
u.rel(newurl)
This takes a string
newurl and applies it to
u. This
is similar to what happens when you click on a link in a browser.
The
newurl is a link you click on, and
u is the location
of the page you are leaving. This method changes
u to the new
location. There are three cases.
URL Syntax
The constructor accepts a string representing a URL having the
following syntax:
〈protocol〉://〈hostname〉[:〈port〉][〈path〉]
where
〈protocol〉 | The protocol name. For this project, we consider only
http or https.
|
〈hostname〉 | The server host name.
|
〈port〉 | The port number, which
is optional and usually omitted
along with the colon marker. (The brackets in the syntax indicate that
the colon and port number are optional, they are not part of the string.)
The port is a number used to identify the
service. If the port is omitted, the default depends on the protocol.
For http, it is 80; for For https, 443. If the port is
given, it must be in the range 1 through 65535.
|
〈path〉 | The url path. This may also be omitted, in which case
a single slash
is assumed. If the path
is not empty, it must start with a slash. If it is
non-empty and ends with a slash, you may ignore the trailing slash.
|
When building the return value of to_string, construct a
string of the same format. Omit the port number
unless it differs from the default value for the protocol.
The path should always start with a slash; if it is empty, represent it as
a single slash.
About URL Paths
The path of the URL is series of components,
separated by slashes in the input. When processing a string
representing a URL,
if it
has empty components, they are discarded. (Another way to say that
is that a group of slashes together is equivalent to a single slash.)
A component of a single dot (period) is also discarded.
A component of two dots is special. It is not only discarded
itself, but it removes the component before it (if there is one).
The following all denote the same URL:
http://host.com/this/path/here
http://host.com/this//path/./here
http://host.com/../this/other/../path/here/also/..
http://host.com/this//path/and/././the/stuff/../../../here
You must resolve empty, . and .. components on input. Do not generate
such components on output.
The strangeness of dots in URLs is a design feature from Unix file
system paths which was adopted into the URL format.
So that whom you blame.
Additional Requirements and Details
- You should place your class in a file with a .h extension.
- Your .h should use the #ifdef/#define/#endif
construct to prevent multiple definition errors.
- You should create a .cpp and place at least one method
body in that file.
- As noted in the interface, the getters and to_string should be
declared as const methods to declare that they do not change
the object they are run on.
- Your constructor should throw an exception when it is unable to parse
the string. This mainly means it cannot find a valid protocol name or other
markers, or if
the port number is out of range.
It is not required to make a careful check of the
syntax to see if it conforms to the official standards.
That's just more detail than we need to get into.
So your program will be able to construct objects with
technically incorrect URLs. We don't sweat the details.
Some Advice
You might want to start by looking over the
test driver. It may
clarify how some of the methods operate. You might
want to start by creating a URL class with stubs so you can
get a running program. Stubs are just versions of the methods which
have the correct signature, but have empty bodies or just return
zero or the empty string.
You will need to create four private data fields to store the
four parts (protocol, host name, port and path). It's quite
possible to keep the path as a string and scan for slashes whenever it
is needed to exact an individual component. I found it easier to store
the path as a std::list of strings, where each string is a component.
I split the path up when it is created, then don't have to worry about
scanning for slashes again. I can also resolve dot
components and discard empty ones when I make the list, and not
worry about them again.
Your methods will never need to print anything. They provide
service to some client, which will do any required printing. You may
well find it useful use temporary debugging prints, but if you
you think you need prints to implement any of the methods, you've missed
something.
I wrote several private helper methods. One thing you should consider is
creating a private method that does all the work of the constructor. The
actual constructor body then becomes a single-statement call to this private
method. The reason for this is the first case of the rel method.
It is a work-around for the fact that C++ will not allow you call
the constructor as a regular method.
Since I used a list of strings to store my path, I found it useful to have
a method that splits a string by its slashes and puts them into a list. I use
this in the constructor implementation, set_path, and rel,
and it's very nice not to have to write it three times.
You may find other code repeated between the constructor and setters.
Some small private functions can make this easier, and improve the quality
of the code.
Feel free to combine the two versions of up using a default
parameter value.
What We Are Ignoring
This assignment simplifies URLs in a number of ways. For instance, most
any protocol can be used, rather than the two allowed. URLs also allow for
embedded credentials, form data, and certainly other
things. We're not doing that, either.
URLs
use a fairly
limited character set, with a coding scheme to represent
characters not otherwise
allowed.
Host names are have even more limited character sets (unless
internationalized names are used, with its own separate coding scheme).
We are just sort of assuming that the caller is not violating
any character set rules.
A more sophisticated URL class would check these things and/or
perform appropriate recoding.
Another detail is that we discard the slash at
the end of a non-empty path. This is usually safe, but
ignores some subtleties.
Submission
When your program works and is properly formatted and commented,
submit it online
here.