CSc 423 Assignment 1

HEAD of the Line

Assigned: Jan 27
Due: Feb 11
60 pts
Use the Cleansocks interface to create a simple web client that takes a single URL on the command line. It should report the server's IP address, the response code, the server type, the value of the Location header (if present), and any cookies set by the server. If the server forwards the request, the client should follow. You might want to start with the posted URL downloader example. The output looks like this:
bennet@desktop$ headinfo http://sandbox.mc.edu/~bennet/cs423v2/syl.html
Server at 167.160.210.32 responds 200 OK
Server type: type Apache/2.4.62 (Fedora Linux) OpenSSL/3.2.2 mod_wsgi/5.0.0 Python/3.12.
No cookies were set.

bennet@desktop$ headinfo http://www.google.com
Server at 142.251.116.99 responds 200 OK
Server type: type gws.
Cookies:
AEC: AZ6Zc-XMPARk9m2Oi9MFVvy8JdNHL7bJgv56vXM7o47_7mboryybzLyxCA
NID: 521=Al9zIrHmK69jvULF7gesDbvu50_9OCtYBUMUhU2KEFS73TohyyRluGPPy9yHr0ZaZ17QbwIknPZvtE5ksKvugBxY3zdT3ZYY7GHmKMhlFP7vxFksR7vS940DDyvCJkAnmY_onzA9bIpGDKmeyhaGR4jRujWd-OMJKasqi4b8gdckV85xXu8tglsU2AY0-ITiv9m_GDcjpibZww

bennet@desktop$ headinfo http://www.parliament.uk/visiting/visiting-and-tours/tours-of-parliament/guided-tours-of-parliament/
Server at 104.17.177.119 responds 403 Forbidden
Server type: type cloudflare.
Cookies:
__cf_bm: og7HHo4mjKnxGU5bo4Go3DVI_6YFpHYoDZqUFSC1HV4-1737864139-1.0.1.1-yHKSJsfBs3l5Kx3JMnwBY_G8xxEr5suRkQlznsVrITOU459l4VtFrSL40zvxqQ5__vxiXloK_NH54FFl8Cs59w

Your application must extract the host name and path from the URL and send an HTTP HEAD request to the indicated machine. Parse the response line and headers to get the information you need to report. The relevant header names are Server, Location and Set-Cookie. The Server header will appear once or not at all. If it does not appear, just say that the server type is “Unknown”. The Location header may appear once, but usually does not appear at all. If it is not present, simply don't mention it in your output. A server may set no cookies, or it may set more than one, so the Set-Cookie header may appear any number of times. If it does not appear, state that no cookies were set; otherwise, list them all. You should list the name and value for each one (see below).
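The request and parsing steps described above might be sketched as follows. This is a minimal illustration using only the C++ standard library, not the required design: the names ResponseHead, build_head_request and parse_head are made up for this example, and it assumes the whole response head has already been read from the socket into a string (the cleansocks calls that do the reading are not shown). Header names are compared in lower case because HTTP header names are case-insensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the parts of the response the client reports.
struct ResponseHead {
    int code = 0;                          // numeric response code, e.g. 200
    std::string message;                   // response message, e.g. "OK"
    std::string server = "Unknown";        // Server header, if present
    std::optional<std::string> location;   // Location header, if present
    std::vector<std::string> set_cookies;  // raw Set-Cookie values, in order
};

// Strip leading and trailing whitespace, including the '\r' left by getline.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Text of a HEAD request for the given host and path.
std::string build_head_request(const std::string& host, const std::string& path) {
    return "HEAD " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n\r\n";
}

// Parse the response line and headers. Returns nothing if the text does not
// start with something that looks like an HTTP response line.
std::optional<ResponseHead> parse_head(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    if (!std::getline(in, line)) return std::nullopt;

    // Response line, e.g. "HTTP/1.1 200 OK".
    ResponseHead head;
    std::istringstream status(trim(line));
    std::string version;
    if (!(status >> version >> head.code) || version.rfind("HTTP/", 0) != 0)
        return std::nullopt;
    std::getline(status, head.message);
    head.message = trim(head.message);

    // Header lines, up to the blank line that ends the head.
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) break;
        auto colon = line.find(':');
        if (colon == std::string::npos) continue;  // skip malformed lines
        std::string name = line.substr(0, colon);
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        std::string value = trim(line.substr(colon + 1));
        if (name == "server") head.server = value;
        else if (name == "location") head.location = value;
        else if (name == "set-cookie") head.set_cookies.push_back(value);
    }
    return head;
}
```

Keeping the Set-Cookie values in a vector preserves every occurrence of the header, while Server and Location each hold at most one value.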

If some networking error prevents the reception of any response from the server, or the server's response cannot be parsed as an HTTP response, print an appropriate error message. Otherwise, give the numeric response code and message from the response (even if it is an error), then print the type of server, and list any cookies set by the server. Network errors are thrown as exceptions by cleansocks, so you will need to catch them and print the value of the exception's .what() method.

bennet@desktop$ headinfo http://www.forgetit.calm
Error: [IPaddress::lookup(www.forgetit.calm)] Name or service not known

You need only consider very simple URLs. Accept only http or https URLs, and don't look for port numbers or passwords. If the URL is not simple or can't be parsed, just report an error and exit.
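One way to do this kind of restricted URL parsing is sketched below. It is only an illustration under stated assumptions: SimpleUrl and parse_simple_url are hypothetical names, the scheme is matched in lower case only, a URL with no path is given the path "/", and anything that looks like a port, a user name or password, or a query attached directly to the host is rejected rather than handled.

```cpp
#include <optional>
#include <string>

// Hypothetical result type for a very simple URL.
struct SimpleUrl {
    std::string scheme;  // "http" or "https"
    std::string host;    // host name only
    std::string path;    // starts with '/', includes any query string
};

// Returns the parts of the URL, or nothing if the URL is not simple enough.
std::optional<SimpleUrl> parse_simple_url(const std::string& url) {
    const auto npos = std::string::npos;

    auto sep = url.find("://");
    if (sep == npos) return std::nullopt;

    SimpleUrl out;
    out.scheme = url.substr(0, sep);
    if (out.scheme != "http" && out.scheme != "https") return std::nullopt;

    // The host runs from just past "://" to the first '/', or to the end.
    auto host_start = sep + 3;
    auto path_start = url.find('/', host_start);
    if (path_start == npos) {
        out.host = url.substr(host_start);
        out.path = "/";
    } else {
        out.host = url.substr(host_start, path_start - host_start);
        out.path = url.substr(path_start);
    }

    // Reject an empty host and the forms this client does not handle.
    if (out.host.empty() || out.host.find_first_of(":@?#") != npos)
        return std::nullopt;
    return out;
}
```

A caller would report an error and exit when the function returns nothing, and otherwise use the host for the connection and the path in the request line.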

The value of the Set-Cookie header is a string which gives the name and value of the cookie. The form is something like this:
Set-Cookie: name=value; other stuff
Where the ; other stuff may or may not be present. (The standard calls this part “unparsed attributes,” even though the client must parse it. We'll discard it.) If you find a ; in the cookie string, discard the (first) ; and everything after it. If you don't find an =, then the cookie is invalid, and you should discard the whole thing. Otherwise, the name of the cookie is the portion of the string up to the first =, and the value is the portion after the first =. “First” is important here, because the value may contain additional equal signs. (Google seems to love these.) The standard says that either the name or value is allowed to be empty, but I don't know that I've seen this actually happen.
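The cookie rules above translate fairly directly into code. The sketch below is one possible reading of them, not a required implementation: parse_cookie is a hypothetical name, and the function assumes it is handed only the header value (the text after "Set-Cookie:"), already trimmed of surrounding whitespace.

```cpp
#include <optional>
#include <string>
#include <utility>

// Returns {name, value} for one Set-Cookie header value, or nothing if the
// cookie is invalid because it contains no '='.
std::optional<std::pair<std::string, std::string>>
parse_cookie(std::string header_value) {
    // Discard the first ';' and everything after it.
    auto semi = header_value.find(';');
    if (semi != std::string::npos) header_value.erase(semi);

    // No '=' in what remains: discard the whole cookie.
    auto eq = header_value.find('=');
    if (eq == std::string::npos) return std::nullopt;

    // Split at the FIRST '='; later '=' characters belong to the value.
    return std::make_pair(header_value.substr(0, eq),
                          header_value.substr(eq + 1));
}
```

Cutting at the ; before looking for the = matters: a string such as "x; a=b" has no = in its name=value part, so it is discarded rather than misread.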

If the response code is in the 300s, and the Location header is set, the server is directing the client to another location. In this case, repeat the operation using the Location URL. Keep following forwards, but limit the chain to five fetches: a forwarding loop is a server error, but it can happen. The output looks like this:

bennet@desktop$ headinfo http://sandbox.mc.edu/~bennet
Server at 167.160.210.32 responds 301 Moved Permanently
Server type: type Apache/2.4.62 (Fedora Linux) OpenSSL/3.2.2 mod_wsgi/5.0.0 Python/3.12.
Location: http://sandbox.mc.edu/~bennet/.
No cookies were set.
Following to http://sandbox.mc.edu/~bennet/
Server at 167.160.210.32 responds 200 OK
Server type: type Apache/2.4.62 (Fedora Linux) OpenSSL/3.2.2 mod_wsgi/5.0.0 Python/3.12.
No cookies were set.

bennet@desktop$ headinfo http://news.google.com
Server at 142.250.113.100 responds 301 Moved Permanently
Server type: type ESF.
Location: https://news.google.com/.
Cookies:
NID: 521=AQnANbf6C58EMIaGN_SyJubHPvWiVMsJ-iyuJ70hiEuYv9TflR9k5KYu6HvELqY_2PMyiidRWCBtM0dG-2mNKPoLIezQqRW6aN9-P8g66YXQV8HSNbN4yTcaOIDEyyoNDoXa8_-t90e8rhcbSutOM-RBlBbhY_iqxXpWOBmK7pQ0Pt48j-3cYLTzeXi7oQsG
Following to https://news.google.com/
Server at 142.250.113.138 responds 302 Found
Server type: type ESF.
Location: https://news.google.com/home?hl=en-US&gl=US&ceid=US:en.
Cookies:
GN_PREF: W251bGwsIkNBSVNDd2laNnRhOEJoRFFfb2xFIl0_
NID: 521=aCu2vM01ZCxnnsHhI_ySzYyswTEoBJFPMqZ8U3t-fZuE-OxHt19jjXQbQTTSQkbxVdJyU-Zy2igQRuguPEjBk5Am0PZhcO7T8u_gRcGQ0Bs8A9Pt956MbelRqZzsVlSYVOLs0ZScZdnCbyzdveIfI0xg9IYli2und2ki2qUfmBrMIotdMjjVbG-RLMIA7ppJEg
Following to https://news.google.com/home?hl=en-US&gl=US&ceid=US:en
Server at 142.250.113.101 responds 200 OK
Server type: type ESF.
Cookies:
GN_PREF: W251bGwsIkNBSVNEQWlaNnRhOEJoQ2d5N1dlQVEiXQ__
NID: 521=00IPu9HXOqV3eO4fk8oPlv7wavE1FVW8jXRs4nUKF78H5Jipstgf4Tlrzof6_Hf9yn-DftyMQbmuZ8Ov19u9WECYAIMz_NrTNzeyXx3OMyV8IMJUsa9HTj4tapjcMpBtWcgTFef5oDbGGG-JdXuDQ2OtEHJDo_VCN2cGH26xxVq0gNNnsB5HLcP_61HIwnHg
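The forwarding rule above could be organized as a bounded loop, roughly as sketched here. The names FetchResult and follow_forwards are hypothetical, and the fetch parameter stands in for whatever code sends the HEAD request and prints the report for one URL. The sketch relies on the five-fetch limit alone to stop a loop, and it assumes the Location value is a complete URL, as in the examples.

```cpp
#include <functional>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical summary of one fetch: the response code and the Location
// header, if the server sent one.
struct FetchResult {
    int code = 0;
    std::optional<std::string> location;
};

// Fetch `url`, then keep following forwards until a response is not a
// forward or `max_fetches` fetches have been made. Returns true if the
// chain ended normally, false if the limit was reached while the server
// was still forwarding.
bool follow_forwards(std::string url,
                     const std::function<FetchResult(const std::string&)>& fetch,
                     int max_fetches = 5) {
    for (int fetches = 0; fetches < max_fetches; ++fetches) {
        if (fetches > 0) std::cout << "Following to " << url << "\n";

        FetchResult r = fetch(url);

        // A forward needs both a 3xx code and a Location header.
        bool forwarded = r.code >= 300 && r.code < 400 && r.location.has_value();
        if (!forwarded) return true;

        url = *r.location;
    }
    return false;  // still being forwarded after max_fetches fetches
}
```

Passing the fetch step in as a function keeps the loop free of any socket code, so the limit can be checked with a stand-in that always forwards.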

Submission

When your program is working, nicely commented and properly indented, submit it using the form here.