Hypertext Markup Language
  1. A Web Page Is
    1. A text document.
    2. Contains instructions for formatting.
  2. Where do web pages come from?
    1. Our web pages will come from a text editor. We will create the tags by hand.
    2. Other places to get HTML:
      1. Editors and content generation systems which output HTML.
      2. Export from programs such as word processors.
      3. Web sites for user content.
    3. Why do it by hand?
      1. More control.
      2. More understanding.
      3. You can fix the fancy tools when they break, or work around them to get what you need.
  3. Tags
    1. Undecorated text is displayed in the browser.
    2. Tags structure the document.
      1. Opening and closing tags: <tagname>Content of the tag.</tagname>
      2. Open and close tags must balance, and may be nested. Think parentheses.
      3. The tags and their contents together are called an element.
    3. Attributes.
      1. Attributes are key-value pairs added to the (opening) tag to modify its behaviour.
      2. The value can be enclosed in single or double quotes.
      3. Quotes may sometimes be omitted, but it's asking for trouble.
      4. <tagname attrname="value">Content displayed accordingly</tagname>
    4. Block Elements. Control the arrangement of large blocks, such as paragraphs.
      1. Paragraph: <p>
        1. The paragraph tag has many attributes, for instance align, which aligns the text: <p align="center">, or "left" or "right".
        2. Modern web pages should use CSS instead of these attributes.
      2. Headings: <h1> through <h6>
      3. List creation: <ol>, <ul>, <li>. (More on lists below.)
    5. Inline Elements. These control what text looks like, but do not change the structure of the document.
      1. The first approach is to specify intent. The browser would decide how to actually display these.
        1. <em> Emphasize.
        2. <strong> Really emphasize!
        3. <cite> The Title of Some Work.
        4. <samp> Sample program output.
      2. But designers don't like giving up that much control, so from the early days there have also been specific formatting tags:
        1. <b> Bold.
        2. <i> Italic.
        3. <u> Underline.
        4. <big> Larger text.
        5. <small> Smaller text.
        6. <strike> Strikeout.
      3. Modern pages should use CSS to control appearance, rather than the formatting tags listed here.
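The tag rules above can be sketched in a short fragment. The text and the title values are invented for illustration; only the tag names come from these notes.

```html
<!-- A fragment, not a complete page. All text and attribute values are invented. -->

<!-- Block elements: a heading and paragraphs. -->
<h1>Notes on Tags</h1>

<!-- An element is the opening tag, the content, and the closing tag. -->
<p>Undecorated text is simply displayed.</p>

<!-- Tags nest like parentheses: the inner tag closes before the outer one. -->
<p>This is <em>important, and this part is <strong>very</strong> important</em>.</p>

<!-- An attribute is a key-value pair in the opening tag; either quote style works. -->
<p title="Double-quoted value">Hover over this paragraph.</p>
<p title='Single-quoted value'>Hover over this one, too.</p>

<!-- Inline elements that state intent rather than appearance. -->
<p>See <cite>The Title of Some Work</cite>. The program printed <samp>Hello, world</samp>.</p>
```

Most browsers show a title value as a tooltip, though how an attribute is presented is up to the browser.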
  4. Correct page structure:
    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="utf-8">
      </head>
      <body>
        Document contents.
      </body>
    </html>
    1. The <!DOCTYPE>, not surprisingly, declares what type of document this is.
    2. The <html>, <head> and <body> tags structure the document.
    3. The head contains information about the document; the document itself is in the body.
    4. The <meta> tag can provide various information about a document. Here, it describes how the characters are encoded.
    5. The <!DOCTYPE> and <meta> tags are two of a very few that have no closing tag.
  5. Entities.
    1. Purposes
      1. To represent characters which would otherwise be markup.
      2. To represent characters which are not on the keyboard.
    2. Format
      1. Start with ampersand, end with semicolon.
      2. Numeric, &#60;
      3. Symbolic, &lt;
    3. Some entities:
      &lt;  ⇒  <
      &gt;  ⇒  >
      &amp;  ⇒  &
      &sum;  ⇒  ∑
      &divide;  ⇒  ÷
      &sube;  ⇒  ⊆
      &cent;  ⇒  ¢
      &pound;  ⇒  £
      &copy;  ⇒  ©
      &lambda;  ⇒  λ
      &Eacute;  ⇒  É
      &Uuml;  ⇒  Ü
      &atilde;  ⇒  ã
    4. A complete list of named entities appears in the HTML standard.
    5. Some Uses
      1. He paid €15 in München.
        He paid &euro;15 in M&uuml;nchen.
      2. A ⊆ { x | x ≥ 0 ∧ x < 2πr }
        A &sube; { x | x &ge; 0 &and; x &lt; 2&pi;r }
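As a small sketch of entities in context, the fragment below uses only entities from the table above. The sentences themselves are invented for illustration.

```html
<!-- Entities standing in for characters that would otherwise be markup. -->
<p>A paragraph starts with &lt;p&gt; and ends with &lt;/p&gt;.</p>
<p>To show an ampersand, write &amp;amp; in the source: Smith &amp; Jones.</p>

<!-- Numeric and symbolic forms of the same character (the less-than sign). -->
<p>Numeric: 3 &#60; 5. Symbolic: 3 &lt; 5.</p>

<!-- Entities for characters that are not on most keyboards. -->
<p>&copy; &Eacute;ditions &lambda;, price &pound;5 or 50&cent;.</p>
```

Viewing this in a browser and comparing it with the source is a quick way to see which characters each entity produces.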
  6. HTML comment: <!-- Not displayed -->
  7. Practical Stuff
    1. Create a file with a text editor.
    2. View it locally.
    3. Copy to the server. View with the web browser.
    4. Copy changed versions across, press the reload button.
    5. Browsers may behave differently, though any modern browser should show a correct page the same way.
    6. Browsers may be forgiving of errors.
      1. You may have errors without knowing it.
      2. You may have odd behavior because of an error.
      3. Browsers may treat the same error very differently.
      4. Most browsers have developer tools which you might want to explore.
  8. The header section contains metadata.
    1. Data about the document, rather than data which is part of the document.
    2. The browser does not display the header contents in the window.
  9. The title tag.
    1. This gives the title of the document.
    2. Generally displayed in the browser's title bar or tab.
    3. May be repeated in a heading tag (<h1> through <h6>), but they are separate things.
  10. The meta tag has several forms.
    1. <meta name="author" content="Your Name Here">
    2. <meta name="description" content="What this page is about">
    3. The similar keywords form, <meta name="keywords" ...>, was intended for search engines, but they ignore it now because of abuse.
    4. The Mozilla tutorial discusses some newer types created by certain web sites.
  11. We've already seen meta charset.
    1. Tells the browser how to interpret the bytes in your page as characters.
    2. The reasons why there is actually more than one choice are largely historical.
      1. Originally, computers pretty much only understood English. (Or maybe just American.)
      2. Various means were proposed to represent other characters; none was universally adopted.
      3. These conflicted, and browsers had to guess right or the page could look horrible.
      4. HTML finally standardized on the meta charset tag.
    3. The utf-8 encoding is a good compromise that can represent any language, but usually doesn't break under old software that hasn't heard of the rest of the world.
    4. We will be happy with utf-8 in this class, but feel free to experiment with other encodings if you like.
  12. You can declare what language the page is written in using the lang attribute of the html tag:
    1. <html lang="en-US"> (or other).
    2. Helps with indexing and screen readers.
  13. The header may contain style, link and/or script tags, relevant to style sheets and JavaScript, which we will discuss later.
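Putting the header pieces from the last several items together gives something like the page below. It is a minimal sketch: the title, author, description, and body text are placeholders, not taken from any real page.

```html
<!DOCTYPE html>
<!-- The lang attribute declares the language of the page. -->
<html lang="en-US">
  <head>
    <!-- How the bytes of the file map to characters. -->
    <meta charset="utf-8">
    <!-- Shown by the browser outside the page window, not in the page itself. -->
    <title>Placeholder Page Title</title>
    <!-- Metadata: data about the document. Values are placeholders. -->
    <meta name="author" content="Your Name Here">
    <meta name="description" content="What this page is about">
  </head>
  <body>
    <!-- The heading repeats the title, but it is a separate element. -->
    <h1>Placeholder Page Title</h1>
    <p>Document contents.</p>
  </body>
</html>
```

Nothing inside the head appears in the browser window; only the heading and paragraph in the body are displayed there.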
  14. Lists.
    1. <ul>: Unordered (bullet) list
    2. <ol>: Ordered (numbered) list.
    3. Each item in a list (either type) is a list item, <li>.
    4. Lists may be nested. The default bullet symbol of a nested unordered list usually differs from that of its parent.
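A minimal sketch of both list types, with one list nested inside the other. The item text is invented for illustration.

```html
<!-- Unordered (bullet) list containing a nested ordered (numbered) list. -->
<ul>
  <li>Block elements
    <!-- The nested list sits inside the list item it belongs to. -->
    <ol>
      <li>Paragraphs</li>
      <li>Headings</li>
    </ol>
  </li>
  <li>Inline elements</li>
</ul>
```

Note that the nested list goes inside an li element, before that item's closing tag, rather than directly inside the outer list.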
  15. Linking
    1. A web page is located at a URL.
      1. For instance, http://sandbox.mc.edu/~bennet/cs302/outl/intro.html
      2. The first part, http, is the protocol. It tells the browser which rules to use to talk to the server. Usually http or https, but there are some others.
      3. The second part, sandbox.mc.edu, is the name of the server where the page is. Names often start with www, but don't have to.
      4. The third part, /~bennet/cs302/outl/intro.html in this case, is called the path. It's the location of the page on the server. It's called a path because it tells what folders you have to go through to get there from the top of the server.
    2. The anchor tag (with the href attribute) creates a clickable link. <a href="url">Highlighted Text</a>
      1. The Highlighted Text is usually shown underlined in blue by default. (But this can be changed using CSS.)
      2. Click to go there.
    3. Various forms of URLs used as links.
      1. The destination of a link is a URL, but it may be abbreviated in various ways.
      2. Absolute: Go to another server: https://www.ibm.com/watson/services/natural-language-understanding/
      3. Within the same directory: hist.html
      4. Relative to the current directory: ../cs220c/syl.html
      5. Relative to the same server: /index.php
      6. Within the same document: #bottom
      7. Within a different document: hist.html#theend
    4. Types of URLs.
      1. Absolute: https://www.ibm.com/watson/services/natural-language-understanding/
        1. Start with a protocol and host name.
        2. Provide the complete location.
      2. Relative. Omit certain parts of an absolute URL.
        1. Starts with a slash: URL is a path on the same server: /files/goodfiles/greatfiles/wonderful.html
        2. Starts with something else: relative to the directory (folder) of the current page.
          1. Just a file name: look for it in the same folder. name.html
          2. Partial path: start from the same folder.
            1. Page location: http://www.example.com/some/folder/from.html
            2. Link URL: head/down/here.html
            3. Produces http://www.example.com/some/folder/head/down/here.html (the link URL replaces the file name from.html).
        3. The .. component goes up one level by removing a folder name.
          1. Page location: http://www.example.com/some/other/folder/from.html
          2. Link URL: ../../upfolder/fred.html
          3. Result: http://www.example.com/some/upfolder/fred.html
          4. .. is the antidirectory
      3. So, if you are looking at a page http://fromsite.xyz/some/location/there, and you must follow a link to dest, figure out the new URL like this:
        1. If dest starts with (or is entirely) http://differentsite.abc, then dest is where you go.
        2. If dest starts with '/', then go to http://fromsite.xyz followed by dest.
        3. Otherwise,
          1. Start with http://fromsite.xyz/some/location/dest
          2. Remove any appearance of component/.. until there are no more.
          3. If that leaves any .. parts right after the host name, just discard them.
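The resolution steps above can be tried out with a fragment like the following. It assumes the page containing these links is at http://www.example.com/some/other/folder/from.html (the example location used earlier). The link text is invented, and each comment gives the URL the link should resolve to under that assumption.

```html
<!-- Assumed page location: http://www.example.com/some/other/folder/from.html -->

<!-- Starts with a protocol and host name: used as is. -->
<a href="https://www.ibm.com/watson/services/natural-language-understanding/">Absolute link</a>

<!-- Starts with a slash: a path on the same server.
     Resolves to http://www.example.com/index.php -->
<a href="/index.php">Site home page</a>

<!-- Just a file name: look in the same folder as the page.
     Resolves to http://www.example.com/some/other/folder/name.html -->
<a href="name.html">Page in the same folder</a>

<!-- Partial path: start from the folder of the page.
     Resolves to http://www.example.com/some/other/folder/head/down/here.html -->
<a href="head/down/here.html">Page in a subfolder</a>

<!-- Each .. removes one folder name.
     Resolves to http://www.example.com/some/upfolder/fred.html -->
<a href="../../upfolder/fred.html">Page two levels up</a>
```

Hovering over a link in most browsers shows the resolved URL, which is a handy way to check the arithmetic.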
      4. Internal links.
        1. A #id at the end of a URL specifies a location within a document.
        2. The location is marked by the id attribute on any tag:
          <anytag id="id" ...>
        3. If the URL starts with a #, it jumps to a label in the same document.
        4. In the earlier example, this internal link: #bottom moves to a tag <p id="bottom">.
        5. In the original HTML, the location was marked by an href-less a tag: <a name="id"></a>. You may still see that. If you're worried about older browsers, you may even want to throw one in alongside your actual target.
      5. The effect of linking depends on the type of the target document.
        1. Text or HTML documents are generally loaded in the same window.
        2. Documents associated with an application may start that application. The window you clicked in remains unchanged.
          1. Word processor documents start the word processor.
          2. Zip files may open an archiver.
          3. Computer source files may open a text editor or programming tool.
        3. Multimedia files may play in the browser, or open an external application, depending on the configuration.
        4. If the browser has no idea how to interpret the file, it will usually offer to save it.
        5. If linking to non-HTML content, it's a kindness to say so in the link text.
        6. Also, if the target is a large file, it is nice to say so for the benefit of users on a slow or expensive connection.
      6. Good links.
        1. Should give a clear description. Don't describe the destination in the surrounding text and then make "click here" the link.
        2. Should be as short as possible.
        3. Don't repeat the URL.
        4. Use different labels for different destinations.
        5. Use relative links whenever possible. Can be more efficient, and makes the web site easier to re-arrange.
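A short sketch of internal links and descriptive link text, using the #bottom and hist.html#theend examples from these notes. The file name syllabus.pdf and all wording are hypothetical.

```html
<!-- Link to a label in the same document. -->
<p>Jump to the <a href="#bottom">bottom of this page</a>.</p>

<!-- Link to a label in a different document in the same folder. -->
<p>Read <a href="hist.html#theend">the end of the history page</a>.</p>

<!-- Descriptive link text that also warns about non-HTML content. -->
<p>Download the <a href="syllabus.pdf">course syllabus (PDF)</a>.</p>

<!-- Weaker style, shown for contrast: the link text says nothing about the destination. -->
<p>The course syllabus is available. <a href="syllabus.pdf">Click here</a>.</p>

<!-- The target of the first link: any tag with a matching id attribute. -->
<p id="bottom">This is the bottom of the document.</p>
```

All of the links here are relative, so the folder holding these files could be moved to another server without editing them.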
      Relative Links

    This is the bottom of the document. The markup is <p id="bottom">...</p>. The link above is to #bottom, so it moves here.