Traditional Text-Based Application Protocols, Sec 4.1–4.16
  1. Application-Layer Protocols.
    1. Must define:
      1. The syntax and semantics of exchanged messages.
      2. Whether the client or server starts first.
      3. How to handle errors.
      4. How to know when you're done.
    2. Standard or private.
      1. Standard protocols are published by an authoritative organization, such as the Internet Engineering Task Force (IETF) or the World Wide Web Consortium (W3C).
      2. Anyone may make a private protocol and use it themselves or within an organization.
  2. Text-Based Protocols
    1. “Traditional” Internet application protocols use messages which are lines of plain text.
    2. Each line is terminated with the two-character sequence, \r\n.
    3. Built atop TCP streams.
    4. The line convention essentially breaks the stream into messages.
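    A minimal sketch of how the line convention turns a TCP byte stream into messages. The function name split_lines and the sample chunks are illustrative assumptions; the point is only that a read may end mid-line, so the unfinished tail has to be carried over to the next read.

```python
def split_lines(buffer: bytes) -> tuple[list[str], bytes]:
    """Split buffered stream bytes into complete CRLF-terminated lines.

    Returns the decoded lines (terminator removed) and the incomplete
    tail, which must be prepended to the next chunk that arrives.
    """
    *complete, rest = buffer.split(b"\r\n")
    return [line.decode("ascii") for line in complete], rest


# TCP may deliver the bytes in arbitrary chunks, so keep the remainder.
pending = b""
messages = []
for chunk in (b"USER smi", b"th\r\nPASS Some", b" Password\r\n"):
    lines, pending = split_lines(pending + chunk)
    messages.extend(lines)

print(messages)
```

    Each element of messages is one protocol message, regardless of how the stream was chopped up in transit.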
  3. Hypertext Transfer Protocol (HTTP).
    1. RFC 2616 (since superseded by RFC 7230 through 7235, and later RFC 9110), plus several other relevant ones.
    2. A simple file-transfer protocol.
    3. Traditional text protocol.
    4. Usually transfers HTML files, but may transport any type.
    5. Client-server protocol.
    6. Request
      1. Parts
        1. Request line starting with the request type.
        2. Zero or more headers, form name: value
        3. A blank line.
        4. A body, possibly empty. (Often empty on requests.)
        GET /index.html HTTP/1.0\r\n
        User-Agent: FredView/0.03\r\n
        Host: sandbox.mc.edu\r\n
        Accept: */*\r\n
        Connection: Keep-Alive\r\n
        \r\n
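        The four parts of a request can be assembled mechanically. This is a sketch only: build_request is a hypothetical helper, not part of any library, and it does no validation.

```python
def build_request(method: str, path: str, headers: dict[str, str],
                  body: bytes = b"") -> bytes:
    """Assemble request line, headers, blank line, and body into one message."""
    lines = [f"{method} {path} HTTP/1.0"]                  # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                  # last CRLF pair is the blank line
    return head.encode("ascii") + body


request = build_request("GET", "/index.html", {
    "User-Agent": "FredView/0.03",
    "Host": "sandbox.mc.edu",
    "Accept": "*/*",
})
print(request.decode("ascii"))
```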
    7. Response
      1. Parts
        1. Response line including success code.
        2. Zero or more headers, form name: value
        3. A blank line.
        4. A body, possibly empty.
        HTTP/1.1 200 OK\r\n
        Date: Tue, 22 Jan 2019 17:57:45 GMT\r\n
        Server: Apache/2.4.34 (Fedora)\r\n
        Last-Modified: Fri, 05 Oct 2012 22:55:37 GMT\r\n
        ETag: "81b-4cb57c5327434"\r\n
        Accept-Ranges: bytes\r\n
        Content-Length: 2075\r\n
        Keep-Alive: timeout=5, max=100\r\n
        Connection: Keep-Alive\r\n
        Content-Type: text/html; charset=UTF-8\r\n
        \r\n
        page contents
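        Going the other way, a receiver can take a message apart at the blank line. The sketch below assumes the whole message is already in memory; parse_http_message is a made-up name, and the sample response is shortened so that Content-Length matches the tiny body.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")        # blank line ends the headers
    start_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return start_line, headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html; charset=UTF-8\r\n"
       b"\r\n"
       b"page contents")
start, headers, body = parse_http_message(raw)
version, code, reason = start.split(" ", 2)
print(code, headers["content-type"], body)
```

        A real client reads from the socket until it has the blank line, then uses Content-Length to know how many body bytes to expect.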
    8. The standard describes a few header names.
      1. These should be used as described.
      2. Other headers can be created at will.
      3. Clients ignore headers they don't understand.
    9. Request types.
        GET: Request a document. The body of the response will contain the document.
        HEAD: Request the headers for the document. Like a GET, but the response body will be empty. The main use is to acquire the Last-Modified header to see if a local copy of the document should be refreshed.
        POST: Send data to the server. The request body will contain the data. This is usually used to send form data.
        PUT: Send data to the server, and store it in the indicated file. It is not often used.
      1. A server may create new types, but not redefine these.
      2. A server is required to implement only GET and HEAD.
    10. Response codes
      1. Three-digit codes, first giving the category.
        1xx: Information; continuing.
        2xx: Success.
        3xx: Redirection.
        4xx: Client error.
        5xx: Server error.
      2. For instance,
        200 OK
        206 Partial Content
        301 Moved Permanently
        400 Bad Request
        403 Forbidden
        404 Not Found
        500 Internal Server Error
        501 Not Implemented
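      Because the first digit gives the category, a client can react sensibly even to a code it has never seen. A small sketch (the CATEGORIES table and category function are illustrative names; http.HTTPStatus is from the Python standard library):

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Information",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def category(code: int) -> str:
    """Classify a three-digit status code by its first digit."""
    return CATEGORIES.get(code // 100, "Unknown")


for code in (200, 206, 301, 404, 501):
    print(code, HTTPStatus(code).phrase, "->", category(code))
```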
    11. Caching.
      1. Clients retain pages to avoid fetching them unnecessarily.
      2. Can use HEAD to tell if a page has changed without fetching it.
      3. Newer technique adds this header to a GET to save a round trip:
        If-Modified-Since: Wed, 21 Oct 2015 07:28:00 GMT
        Server returns 304 if document has not changed.
      4. Expires response header advises client how long to retain.
        Expires: Wed, 21 Oct 2015 07:28:00 GMT
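      On the server side, the If-Modified-Since decision is a date comparison. The sketch below is an assumption about how a server might do it, not how any particular server does; status_for_conditional_get is a hypothetical name, and the modification time is taken from the Last-Modified header in the sample response above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for_conditional_get(last_modified: datetime, if_modified_since: str) -> int:
    """Return 304 if the document is unchanged since the client's copy, else 200."""
    since = parsedate_to_datetime(if_modified_since)   # parses the HTTP date format
    return 304 if last_modified <= since else 200


# Document last changed Fri, 05 Oct 2012 22:55:37 GMT.
last_modified = datetime(2012, 10, 5, 22, 55, 37, tzinfo=timezone.utc)

print(status_for_conditional_get(last_modified, "Wed, 21 Oct 2015 07:28:00 GMT"))
print(status_for_conditional_get(last_modified, "Sun, 01 Jan 2012 00:00:00 GMT"))
```

      With a 304 the server sends no body, so the client keeps using its cached copy.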
    12. Versions.
      1. 1.0 Original
      2. 1.1 Multiple requests on one connection, and many other incremental changes.
      3. 2.0 Substantial changes.
        1. Adds a layer which provides multiple streams over one TCP stream.
          Note: This violates the TCP/IP stack design by implementing a transport facility at the application layer.
        2. Requests share headers to reduce redundancy.
        3. Server may send likely-needed documents before being requested.
  4. File Transfer Protocol (FTP) RFC 959
    1. Very old protocol; actually predates the Internet.
    2. Used for general file transfer.
    3. Login sequence:
      Server:220 Welcome to our FTP server\r\n
      Client:USER smith\r\n
      Server:331 Please specify password.\r\n
      Client:PASS Some Password\r\n
      Server:230 Login successful\r\n
    4. Transferring a file requires some setup.
      1. Binary mode means to transfer the file literally. Pretty much the only mode used these days.
      2. Passive mode means the client connects to the server for the data transfer.
      Client:TYPE I\r\n
      Server:200 Switching to binary mode\r\n
      Client:PASV\r\n
      Server:227 Entering Passive Mode (10,27,0,14,75,41)\r\n
    5. The client makes a second connection to the server.
      1. The numbers in the 227 give the endpoint to connect to.
      2. IP 10.27.0.14
      3. Port 256×75+41=19241
      4. Connect to 10.27.0.14:19241
      Client:RETR somefile.txt\r\n
      Server:150 Opening BINARY connection for somefile.txt\r\n
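      The endpoint arithmetic can be written out directly. A sketch, assuming the 227 reply carries the six numbers in parentheses as in the example (parse_pasv is an illustrative name):

```python
import re


def parse_pasv(reply: str) -> tuple[str, int]:
    """Extract the data-connection endpoint from a 227 reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"no endpoint found in reply: {reply!r}")
    h1, h2, h3, h4, p_hi, p_lo = (int(n) for n in match.groups())
    # First four numbers are the IP address; the last two are the
    # high and low bytes of the port number.
    return f"{h1}.{h2}.{h3}.{h4}", 256 * p_hi + p_lo


host, port = parse_pasv("227 Entering Passive Mode (10,27,0,14,75,41)")
print(f"{host}:{port}")
```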
    6. The client downloads the file contents on the second connection, then closes.
      Server:226 File send OK\r\n
      Client:QUIT\r\n
      Server:221 Goodbye\r\n
    7. Passive mode.
      1. The PASV request asks the server to listen so that the client can connect to it.
      2. In traditional (active) mode, the client listens and the server connects to the client.
      3. The client-server idea was not part of the original Internet design.
    8. Multiple connections.
      1. Very unusual arrangement.
      2. If we sent the file on the control connection, it would be problematic if a file contains FTP commands.
    9. FTP is heck to firewall properly.
  5. Evolution of Email
    1. In the beginning (say, the 1980s)
      1. No PCs. Users make text logins to timesharing systems.
      2. Mailboxes are just files (or directories) on these machines. Users read these messages locally.
      3. Mail sent is transmitted to the recipient's machine using the Simple Mail Transfer Protocol (SMTP).
      4. Machines run SMTP servers which receive email from others and store it in a local mailbox.
      5. SMTP is a peer-to-peer protocol for sharing mail. No notion of clients and servers.
      6. SMTP servers accept mail from any connected host to any local recipient.
      7. SMTP servers may also accept mail in transit to another server. This helps mail transit to machines with poor or intermittent connections.
    2. The PC is invented.
      1. PCs aren't suitable to run SMTP servers.
        1. In the early days, simply not powerful enough.
        2. And people tend to turn them off at night.
        3. Besides, network admins get picky about outside hosts connecting to PCs.
      2. Mailboxes
        1. Stay on the time-sharing system.
        2. Users use POP or IMAP to access their mailboxes.
          1. POP to just download the mail to the PC.
          2. IMAP to manage the mailbox remotely, but read it on the PC.
        3. Mail sent from the PC goes directly to the recipient by SMTP.
        4. People quit logging in to the time-sharing system, and we start calling it a server.
    3. Spam is invented.
      1. The ability for any SMTP server to accept inbound mail from anywhere is a boon to spammers.
      2. Organizations designate certain servers as mail exchangers.
      3. Mail exchangers accept mail by SMTP
        1. From within their own organization.
        2. From designated mail exchangers of other organizations.
        3. From nowhere else (send attempts are refused by the SMTP server).
      4. When you send mail, it goes
        1. SMTP to the organization mail exchanger.
        2. SMTP to the recipient's mail exchanger.
        3. POP or IMAP to the recipient's PC.
      5. ISPs may configure blacklists so their mail exchangers won't accept messages from known spammers.
    4. People start reading mail in web browsers (thereby damaging the fabric of the universe).
      1. A web server now takes the role of the PC, and communicates to the mail exchanger by POP or IMAP.
      2. Software on the web server formats the messages into HTML and transmits to the browser via HTTP.
  6. Email Protocols
    1. Simple Mail Transfer Protocol (SMTP) (RFC 821, updated as RFC 5321). Here one SMTP server is sending an email to another. The sender initiates the connection.
      Rcver:220 mail.fred.com SMTP ready\r\n
      Sender:HELO sender.somewhere.com\r\n
      Rcver:250 OK\r\n
      Sender:MAIL FROM:<fsmith@somewhere.com>\r\n
      Rcver:250 OK\r\n
      Sender:RCPT TO:<jones@fred.com>\r\n
      Rcver:250 OK\r\n
      Sender:RCPT TO:<william@somewhere.com>\r\n
      Rcver:550 No such user here\r\n
      Sender:RCPT TO:<sally@somewhere.com>\r\n
      Rcver:250 OK\r\n
      Sender:DATA\r\n
      Rcver:354 Start mail input; end with <CRLF>.<CRLF>\r\n
      Sender:Date: Thu, 7 Jan 2016 16:18:01 -0600\r\n
      From: "(Fred Smith)" <fsmith@somwhere.com>\r\n
      To: jones@elsewhere.edu\r\n
      Subject: Backup tapes.\r\n
      \r\n
      Do you still have that backup tape from Tuesday?\r\n
      \r\n
      - Fred\r\n
      .\r\n
      Rcver:250 OK\r\n
      Sender:QUIT\r\n
      Rcver:221 Closing\r\n
      1. Message itself is a series of lines, ending with one that is just a period.
      2. Note that the sender and receiver are specified separately, and not just taken from the message headers.
        1. Spammers have long used this for spoofing.
        2. Current practice would be to check a bit better.
      3. No login password in the original protocol. May be required, but not always practical.
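      Since a line containing only a period ends the message, a sender has to protect any message line that itself begins with a period. The SMTP standard handles this by doubling the leading period ("dot-stuffing"), which the receiver undoes; that detail is from RFC 5321 rather than from the transcript above. A sketch with a hypothetical helper name:

```python
def frame_for_data(message: str) -> bytes:
    """Prepare message text for sending after the DATA command.

    Lines starting with '.' get an extra '.' so they cannot be mistaken
    for the terminator; the final '.' line marks the end of the message.
    """
    lines = message.splitlines()
    stuffed = ["." + line if line.startswith(".") else line for line in lines]
    return ("\r\n".join(stuffed) + "\r\n.\r\n").encode("ascii")


wire = frame_for_data("Subject: Test\n\nSee below:\n.hidden\n- Fred")
print(wire)
```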
    2. Post Office Protocol (POP) (updated as RFC 1939)
      Server:+OK POP3 server ready\r\n
      Client:USER jones\r\n
      Server:+OK send password\r\n
      Client:PASS the password\r\n
      Server:+OK maildrop locked and ready\r\n
      Client:LIST\r\n
      Server:+OK 2 messages (386 octets)\r\n
      1 186\r\n
      2 200\r\n
      .\r\n
      Client:RETR 1\r\n
      Server:+OK 186 octets\r\n
      Date: Thu, 7 Jan 2016 16:18:01 -0600\r\n
      From: "(Fred Smith)" <fsmith@somwhere.com>\r\n
      To: jones@elsewhere.edu\r\n
      Subject: Backup tapes.\r\n
      \r\n
      Do you still have that backup tape from Tuesday?\r\n
      \r\n
      - Fred\r\n
      .\r\n
      Client:DELE 1\r\n
      Server:+OK message 1 deleted\r\n
      Client:QUIT\r\n
      Server:+OK pop server closing\r\n
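      POP uses the same lone-period convention to end multi-line replies such as the one to LIST. A sketch of reading that reply, assuming it has already been collected into one string (parse_list is an illustrative name):

```python
def parse_list(response: str) -> dict[int, int]:
    """Map message number to size in octets from a POP3 LIST reply."""
    status, *rows = response.split("\r\n")
    if not status.startswith("+OK"):
        raise ValueError(f"server reported an error: {status}")
    sizes = {}
    for row in rows:
        if row == ".":            # a lone period ends the multi-line reply
            break
        number, size = row.split()
        sizes[int(number)] = int(size)
    return sizes


reply = "+OK 2 messages (386 octets)\r\n1 186\r\n2 200\r\n.\r\n"
sizes = parse_list(reply)
print(sizes, sum(sizes.values()))
```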
    3. Internet Message Access Protocol (IMAP) (updated as RFC 3501). Similar to POP with more commands.
  7. Binary Email Content
    1. Email messages are like HTTP messages: a series of headers, a blank line, then a plain-text body.
      Date: Thu, 7 Jan 2016 16:18:01 -0600
      From: "(Fred Smith)" <fsmith@somwhere.com>
      To: jones@elsewhere.edu
      Subject: Backup tapes.

      Do you still have that backup tape from Tuesday?

      - Fred
    2. Email protocols assume messages are ASCII text.
      1. RFC 821: all communication is in ASCII, and
      2. messages may not have lines over 1000 characters.
    3. Nowadays, we like to send binary attachments: programs, images, word-processor documents.
    4. A binary file has non-ASCII codes, and need not contain a newline every thousand bytes.
    5. Solution: Code the binary data as text.
    6. Multipurpose Internet Mail Extensions (MIME)
      1. Provides notation for dividing an email message into parts.
      2. Provides encodings for non-ASCII data, primarily base 64 for binary.
      3. MIME message.
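      One way to see both MIME features at once is to let Python's standard email package build a message with a binary attachment. This is a sketch only; the addresses and the file name tape.bin are made up for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "fsmith@somewhere.com"
msg["To"] = "jones@elsewhere.edu"
msg["Subject"] = "Backup tapes."
msg.set_content("Do you still have that backup tape from Tuesday?\n")

# Arbitrary binary data, including every non-ASCII byte value.
binary = bytes(range(256))
msg.add_attachment(binary, maintype="application",
                   subtype="octet-stream", filename="tape.bin")

text_part, attachment = msg.iter_parts()        # the message now has two parts
print(msg.get_content_type())
print(attachment["Content-Transfer-Encoding"])

wire = msg.as_bytes()                           # what actually gets mailed
print(max(len(line) for line in wire.splitlines()))
```

      The serialized message contains only ASCII bytes and short lines, so it satisfies the original mail rules even though the attachment is binary.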
    7. Base 64.
      1. Every three bytes are regrouped into four groups of six bits (3 × 8 = 4 × 6 = 24 bits).
      2. A standard table assigns an ASCII character to each group. 52 letters (both cases), 10 digits, + and /
        1. Binary: 11010101 00000110 11010001
        2. Regroup: 110101 010000 011011 010001
        3. Assign: 1 Q b R
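      The regrouping can be checked by hand-coding it and comparing with the library. A sketch for exactly three input bytes (encode_triple is an illustrative name; padding for inputs that are not a multiple of three bytes is left out):

```python
import base64
import string

# The standard table: 26 upper-case, 26 lower-case, 10 digits, then + and /.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triple(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four base 64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    value = int.from_bytes(three_bytes, "big")                    # 24 bits
    groups = [(value >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[g] for g in groups)


data = bytes([0b11010101, 0b00000110, 0b11010001])
print(encode_triple(data))                  # 1QbR
print(base64.b64encode(data).decode())      # same result from the library
```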
  8. Securing Old Protocols
    1. FTP, HTTP and the mail protocols were designed as plain text.
    2. Two ways to retrofit
      1. All TLS, all the time.
        1. After each network connect, add TLS to the channel.
        2. Need a different port number to distinguish plain from secure.
        3. Client connects, endpoints complete TLS handshake, then operate exactly as the plain version.
      2. TLS on demand.
        1. Connects in the usual (plain) way, usual port.
        2. Negotiate TLS
          1. One end sends a (plain text) request to start TLS.
          2. The other agrees or refuses. Other details may be negotiated.
          3. If the ends agree, they perform the TLS handshake and proceed with a secure connection.
          4. If desired, one endpoint may refuse sensitive operations (such as login) if not secured.
        3. More flexible. Negotiation is usually designed so that new clients can simply treat old servers as refusing.
      3. HTTP does it the first way: HTTP and HTTPS, ports 80 and 443.
      4. The others listed here can be done either way.