- Accessing Web Pages
- Initial model
- Client-server
- Client sends a request for a particular URL.
- The URL maps to a particular file on the server.
- Server responds, with the file contents or error indication.
- Pages are stored on the server machine. Server copies them out.
- Interaction is clicking on links.
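The initial model above can be sketched as a single function that maps a URL path to a file and returns either its contents or an error code. This is a minimal illustration, not how any particular server is written; the name `serve_static` and the document-root layout are assumptions made for the example.

```python
from pathlib import Path
import tempfile


def serve_static(doc_root: Path, url_path: str) -> tuple[int, bytes]:
    """Map a URL path to a file under doc_root; return (status, body)."""
    root = doc_root.resolve()
    # Strip the leading slash so the path is taken relative to the root.
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root, e.g. "/../secret".
    if not candidate.is_relative_to(root):
        return 403, b"Forbidden"
    if not candidate.is_file():
        return 404, b"Not Found"
    # The server just copies the stored page out.
    return 200, candidate.read_bytes()


# Small demonstration with a throwaway document root.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "index.html").write_text("<h1>Hello</h1>")
    print(serve_static(demo_root, "/index.html"))
    print(serve_static(demo_root, "/missing.html"))
```

The path check matters even in a sketch: without it, a request for a path containing `..` could read files outside the directory the server is meant to publish.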
- Common Gateway Interface (CGI). (Not to be confused with the
“Computer-Generated Imagery” in your favorite movie.)
- The earliest form of dynamic web page: Can respond to input
rather than deliver a pre-made file.
- The file the URL refers to is a program instead of text.
- The program is run, and the output is sent to the browser.
- Usually, the output is HTML.
- Can use any programming language supported by the server system.
- Note that, for each request, an external program must be run.
This is a considerable overhead.
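As an illustration of the CGI points above, here is a minimal sketch of a CGI program in Python. The web server passes the query string in the `QUERY_STRING` environment variable and sends whatever the program writes to standard output back to the browser. The `name` parameter and the greeting are hypothetical, chosen only for the example.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ) -> str:
    """Build the full CGI output: header lines, a blank line, then HTML."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical input field; fall back to a default.
    name = query.get("name", ["world"])[0]
    # Escape user input before placing it in the page.
    body = f"<html><body><h1>Hello, {escape(name)}!</h1></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server starts this program once per request and relays stdout.
    sys.stdout.write(cgi_response(os.environ))
```

Because the server launches a fresh process for every request, the interpreter start-up cost is paid each time, which is the overhead noted above.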
- Evolution.
- When PHP was first created, the PHP interpreter was run as a new
process for each transaction.
- Apache later moved the PHP interpreter inside its own
process, so it need not be started for every request.
- The current practice is to run a separate PHP server, to which the
web server forwards each request for a PHP page.
- The interface between the HTTP and PHP servers is an interprocess
communication (IPC) channel on the server machine, using a
specialized protocol.
- Other dynamic support
- There are similar servers for Python, which are used by Flask
and others.
- Other languages implement HTTP itself over an IPC channel, so the
web server reaches them as a standard HTTP proxy.
- CGI has been replaced by FastCGI, where the dynamic program can
be in any language, but must interact with the HTTP server
using the FastCGI protocol.
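To make the contrast with per-request CGI concrete, the sketch below shows a long-running Python application using WSGI, the standard Python interface between a web server and application code (Flask applications are WSGI applications). The notes above do not name WSGI, so treat this as one assumed example of the idea; `app`, `run`, and the request counter are illustrative names.

```python
from wsgiref.simple_server import make_server

# Module-level state survives between requests because the process
# stays alive, unlike a CGI program that is restarted every time.
request_count = 0


def app(environ, start_response):
    """A WSGI application: called once per request by the server."""
    global request_count
    request_count += 1
    path = environ.get("PATH_INFO", "/")
    body = f"Request {request_count} for {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def run(port: int = 8000) -> None:
    """Serve the app with the standard library's reference server."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

Calling `run()` starts a development server on the assumed port 8000; in production a front-end web server would normally forward requests to a dedicated application server instead.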
- HTTP Protocol
- Requests and responses each contain a header and a body.
- Request header gives the URL path requested.
- Response header gives a three-digit code indicating success
or failure (200 OK; 404 Not Found, etc.)
- Response body is the page requested, or may be empty on errors.
- Request body is usually empty, but carries the form data on a
POST.
- Headers are key/value pairs. There are many of these, and
implementers are free to invent new ones.
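The message layout described above can be sketched with two small helpers: one builds a request header with an empty body, the other splits a raw response into status code, headers, and body. This is a simplified illustration (no chunked encoding, no repeated headers); the function names are made up for the example.

```python
def build_request(path: str, host: str) -> bytes:
    """Build a GET request: request line, headers, blank line, no body."""
    lines = [
        f"GET {path} HTTP/1.1",   # the URL path being requested
        f"Host: {host}",          # headers are key/value pairs
        "Connection: close",
        "",                       # blank line ends the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()
    return int(code), reason, headers, body


sample = (b"HTTP/1.1 404 Not Found\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 0\r\n\r\n")
print(parse_response(sample))
```

Header names are lower-cased here because HTTP treats them as case-insensitive.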
- Transmitting other data.
- Other data must be transmitted besides the web page.
- Data the user enters into a form must be sent to the server.
- It can be sent by the GET method; simply append it to the URL as
a query string:
https://www.google.com/search?q=wombats
- It can be sent by the POST method, in which the browser transmits
the user data along with the request for a page that will use it.
- As we will see later, an HTML form may contain hidden fields,
which are sent to the server along with the user's entries.
- A dynamically-generated form may contain hidden values, which are
then returned to the server.
- Cookies are key-value pairs which the server will ask the browser
to store, and the browser will return them with subsequent requests.
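The three mechanisms above (GET query strings, POST bodies with hidden fields, and cookies) can be sketched with the Python standard library. The field name `form_id` and the cookie `theme` are hypothetical values chosen for illustration.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode


def get_url(base: str, fields: dict) -> str:
    """GET: the form data is appended to the URL as a query string."""
    return base + "?" + urlencode(fields)


def post_body(fields: dict) -> bytes:
    """POST: the same encoding, but carried in the request body."""
    return urlencode(fields).encode("ascii")


# User entry sent by GET.
print(get_url("https://www.google.com/search", {"q": "wombats"}))

# User entry plus a hidden field (hypothetical name) sent by POST.
print(post_body({"q": "wombats", "form_id": "search-box"}))

# Server side: ask the browser to store a cookie.
outgoing = SimpleCookie()
outgoing["theme"] = "dark"
print(outgoing.output())          # a Set-Cookie response header line

# Later request: the browser sends the pair back in a Cookie header,
# which the server parses.
incoming = SimpleCookie()
incoming.load("theme=dark")
print(incoming["theme"].value)
```

From the server's point of view a hidden field and a user-typed field arrive the same way, so neither can be trusted without checking.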
- Sessions and Login.
- HTTP is a stateless protocol: each request stands on its own, and
the protocol itself has no notion of being logged in.
- Original HTTP authentication, rarely used now.
- No need to create a page which requests credentials; browser
handles the credential request and transmission.
- Server does not need to store logins; creds are checked on
each request.
- Contemporary practice is to implement with a web page and
server code. No special part of HTTP needed.
- The user logs in using an ordinary form, and the server
creates a session record with an identifier, which it returns
to the browser (typically in a cookie).
- The identifier is usually a large random number, so it
is hard to guess.
- Knowing the random number is treated as having logged in.
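The session scheme above can be sketched in a few lines. This is an assumption-laden illustration rather than a production design: the in-memory `sessions` dictionary, the `USERS` table with a plaintext password, and the function names are all hypothetical, and a real system would store password hashes and expire old sessions.

```python
import secrets

# Hypothetical user table; plaintext only to keep the sketch short.
USERS = {"alice": "correct horse"}

# Server-side session records, keyed by the random identifier.
sessions = {}


def log_in(username: str, password: str):
    """Check the form's credentials; return a new session id or None."""
    if USERS.get(username) != password:
        return None
    # 32 random bytes (64 hex digits): far too large to guess.
    session_id = secrets.token_hex(32)
    sessions[session_id] = {"user": username}
    return session_id


def current_user(session_id):
    """Treat a known session id as proof of having logged in."""
    record = sessions.get(session_id)
    return record["user"] if record else None


sid = log_in("alice", "correct horse")
print(current_user(sid))        # the logged-in user
print(current_user("guess"))    # unknown identifier, not logged in
```

In practice the identifier would travel to the browser in a cookie and come back with each later request, so the server can look up the record without asking for credentials again.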