Types and Type Systems

I have put links on some sections below to video lectures discussing the relevant material. The stream may be slow, so downloading might work better for you. Also, Firefox seems to have a bug where the stream won't start unless you put its window into full-screen. Once started, you can return to the smaller picture.

  1. Type. A set of objects (values) together with the operations defined on them.
    "Objects" in a general sense: they need not be instances of a class declaration.
  2. Static and Dynamic.
    1. Static types: Variables are declared with a type. Assignment converts the value to the variable's type (or is rejected if no conversion is allowed). C, C++, Java, Pascal, Ada; most compiled languages.
    2. Dynamic types: Types stay with values, and assignment copies both value and type. Lisp, Smalltalk, Perl, Python, PHP, Ruby; most interpreted languages.
    3. Type Propagation
      1. The type of each expression is determined by the types of its sub-expressions.
      2. In a statically-typed language, these rules can be evaluated at compile time.
    4. Polymorphism with class inheritance is something of a hybrid.
      1. A Java reference to a base class may refer to a derived-class object. (Likewise C++ pointers and references.)
      2. Assignment effectively copies the object's dynamic type along with the reference, but only the base class and types derived from it are allowed.
    5. Static types make execution more efficient
      1. Operations must be performed according to the types of their operands.
      2. Declarations allow each operation to be chosen once, at translation time, rather than every time it executes.
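A minimal C++17 sketch of type propagation and of the inheritance hybrid. The static_asserts check, at compile time, the type the compiler assigns to each expression from its sub-expressions; Shape, Circle and describe are hypothetical names for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Type propagation: each expression's static type follows from its operands.
static_assert(std::is_same_v<decltype(2 + 2), int>);      // int + int -> int
static_assert(std::is_same_v<decltype(2 + 1.5), double>); // int + double -> double
static_assert(std::is_same_v<decltype('a' + 1), int>);    // char is promoted to int
static_assert(std::is_same_v<decltype(2 < 1.5), bool>);   // comparison -> bool

// Hypothetical classes for the polymorphic "hybrid".
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// The static type of s is fixed (const Shape&), but it may refer to any object
// derived from Shape, and the call is dispatched on that object's dynamic type.
std::string describe(const Shape& s) { return s.name(); }
```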
  3. Type Errors.
    1. Type error: any operation which is not defined for the type of data it is applied to.
    2. Type system: precise definition of the type bindings, the values of each type, and the legal operations on them.
    3. Strong typing: All type errors are detected by the system.
      1. Java is strongly-typed.
      2. C is not.
      3. C++ attempts to strengthen C's type system.
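      4. With static types, many type errors can be detected at compile time.
        1. Compile time is cheaper: the check costs nothing when the program runs.
        2. Compile time is more reliable: it covers every path through the code, not just the paths a test run happens to execute.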
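A sketch of type errors caught at compile time, assuming C++17. The helper addable and the struct Point are hypothetical; std::is_invocable_v with the transparent functor std::plus<> simply asks the compiler whether a + b is defined for two types, so every check below happens before the program runs.

```cpp
#include <functional>
#include <string>
#include <type_traits>

// A hypothetical record type that defines no operator+.
struct Point { int x, y; };

// True exactly when a + b is well-formed for operands of types A and B.
template <typename A, typename B>
constexpr bool addable = std::is_invocable_v<std::plus<>, A, B>;

static_assert(addable<int, double>);                // defined: int is widened to double
static_assert(addable<std::string, std::string>);   // defined: concatenation
static_assert(!addable<Point, Point>);              // not defined: a type error
static_assert(!addable<std::string, Point>);        // not defined: a type error

// Writing Point{1, 2} + Point{3, 4} directly would be rejected by the compiler.
```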
  4. Checking and Conversion
    1. Determine what + means in a + b (integer add, floating add, string concatenation, ...).
    2. Check that uses of variables agree with operations.
    3. Dynamic typing: Checking must be done at run time.
    4. Static typing: Checking may be done at compile time.
    5. Static typing can be more efficient because each operation is chosen once, at compile time.
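To show what run-time checking involves, here is a hypothetical C++17 sketch of a dynamically typed value: std::variant keeps a type tag with each value, and add inspects both tags on every call to decide what + means, or reports a type error. The names Value and add are made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <variant>

// A hypothetical dynamically typed value: the type travels with the value.
using Value = std::variant<long, double, std::string>;

// Decides at run time what + means, as a dynamically typed interpreter must.
Value add(const Value& a, const Value& b) {
    return std::visit(
        [](const auto& x, const auto& y) -> Value {
            using X = std::decay_t<decltype(x)>;
            using Y = std::decay_t<decltype(y)>;
            if constexpr (std::is_same_v<X, std::string> && std::is_same_v<Y, std::string>) {
                return x + y;   // string concatenation
            } else if constexpr (std::is_arithmetic_v<X> && std::is_arithmetic_v<Y>) {
                return x + y;   // long + long stays long; any double makes it double
            } else {
                throw std::invalid_argument("type error: + not defined for these operands");
            }
        },
        a, b);
}
```

A statically typed compiler makes this choice once; here the same tag inspection is repeated on every execution, which is the run-time cost described above.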
  5. Conversions and Coercions
    1. A type mismatch
      1. May just be illegal.
      2. May require a coercion: int + float.
    2. Conversions
      1. Explicit conversions: casts.
      2. Implicit conversions
        1. Should be limited to widening conversions.
        2. Many older languages (such as C) violate this rule.
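A short C++ sketch of the conversion rules, assuming 32-bit int and IEEE double. C++ inherits C's implicit narrowing, so both narrowing functions compile; the function names are hypothetical.

```cpp
#include <cassert>

// Widening: with 32-bit int and IEEE double, every int value is exactly
// representable as a double, so the implicit conversion loses nothing.
double widen(int i) {
    double d = i;                  // implicit widening conversion
    return d;
}

// Narrowing the C way: this compiles (perhaps with a warning) and silently
// truncates toward zero.
int narrow_implicitly(double d) {
    int i = d;                     // implicit narrowing conversion
    return i;
}

// The explicit form: a cast documents that the loss is intended.
int narrow_explicitly(double d) {
    return static_cast<int>(d);
}

// C++11 brace initialization does refuse implicit narrowing:
//   int bad{3.9};                 // error: narrowing conversion
```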
  6. Basic Types.
    1. Usual hardware integers and floats.
    2. C and most languages leave sizes to the implementer; Java specifies them.
    3. Character sets. Traditionally ASCII (7 bit, usually stored in 8). Now Unicode, stored as UTF-16 (16-bit units, as in Java's char) or UTF-8 (1 to 4 bytes per character).
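A C++17 sketch of two points above: fixed-width integers from <cstdint>, and the variable length of UTF-8. utf8_length is a hypothetical helper that assumes valid UTF-8 input and 8-bit bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// C and C++ only guarantee minimum ranges for int, long, etc.;
// <cstdint> provides exact widths where the hardware supports them.
static_assert(sizeof(std::int32_t) == 4);   // assuming 8-bit bytes
static_assert(sizeof(std::int64_t) == 8);

// UTF-8 uses 1 to 4 bytes per code point. Continuation bytes have the bit
// pattern 10xxxxxx, so counting the bytes that are NOT continuation bytes
// counts code points.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```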
  7. Complex Types
    1. Some interpreted languages (Ruby, Python, Lisp) provide unbounded integers, which are not hardware types.
    2. Enumerations.
    3. Pointers.
      1. The most interesting use is linked structures, where pointers point to objects which contain pointers.
      2. Creation.
        1. Pointers to allocated objects.
        2. Some languages (C) allow creation of pointers to normal variables.
      3. Garbage collection.
        1. Traditional compiled languages (C, C++, Pascal) generally don't have it; Java is the prominent exception.
        2. Most interpreted languages do.
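A C++ sketch of an enumeration and a linked structure. Since C++ has no garbage collector, std::unique_ptr stands in for it here by owning each node; Color, Node, push_front and sum are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// An enumeration: a type whose values are a small set of names.
enum class Color { Red, Green, Blue };

std::string color_name(Color c) {
    switch (c) {
        case Color::Red:   return "red";
        case Color::Green: return "green";
        case Color::Blue:  return "blue";
    }
    return "unknown";  // unreachable for valid enumerators
}

// A linked structure: each node holds a pointer to another node.
// Each unique_ptr deletes the node it owns when it is destroyed,
// so the whole chain is freed along with the head.
struct Node {
    int value = 0;
    std::unique_ptr<Node> next;
};

// Creates a new node in front of head and returns it as the new head.
std::unique_ptr<Node> push_front(std::unique_ptr<Node> head, int value) {
    auto node = std::make_unique<Node>();
    node->value = value;
    node->next = std::move(head);
    return node;
}

// Follows the pointers until the null pointer that ends the list.
int sum(const Node* n) {
    int total = 0;
    for (; n != nullptr; n = n->next.get()) total += n->value;
    return total;
}
```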
  8. Compound Types.
    1. Compound types are built of objects of other types.
    2. Arrays.
      1. Members are laid out contiguously in memory.
      2. Access to a particular member is by offset calculation.
      3. Generalizes to as many dimensions as desired.
      4. Subscripts (selectors) are numbers.
        1. Minimum subscript.
          1. C, C++, Java, many others: always 0.
          2. FORTRAN (by default), Smalltalk: 1.
          3. Pascal, Ada, others: user-defined.
        2. Subscript values may be variables computed at run time.
          1. Typical case allocates the array as a block. Component addresses are computed from subscripts.
          2. Java makes multi-dimensional arrays by allocating an array of array references.
            1. Needs only the one-dimensional location formula.
            2. Repeat for each dimension.
        3. Dope vector (or array descriptor).
          1. A list of the data needed to describe the array.
          2. Might be the number of dimensions and constant terms from the location computation formula.
          3. Might be the base address α and a bounds pair for each dimension (allows for bounds checking).
          4. Might be both: the constants (for finding the lvalue) and the bounds (for checking).
          5. It is also possible to create “fake” dope vectors to represent portions of larger arrays.
          6. The dope vector may be allocated with the vector and stored at its front, or it may be allocated separately.
          7. Your book also includes the element type in the descriptor. That would only be needed with dynamic types, and even there it is of little use, since such arrays are usually not homogeneous.
      5. Homogeneity
        1. Most compiled languages require array members to be the same type (homogeneous), so all the slots are the same size.
        2. Most interpreted languages allow various types (heterogeneous).
          1. Usually implemented by storing references to the values in the array, rather than the value itself.
          2. The values may be of different sizes, but the references are all the same size, so the location formula still works.
      6. Slicing.
        1. Allows a portion of an array to be treated as an array.
          Python: a[3:8]
          Ruby: a[3..7]
        2. Can often be implemented by crafting a special dope vector without copying the array.
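The array points above can be sketched in C++ with a hypothetical two-dimensional dope vector. It stores the base address, a bounds pair per dimension, and the per-dimension strides used by the location formula; slice builds a new dope vector for part of the array without copying anything. The layout (row-major, inclusive bounds, slices renumbered from 0 as in Python) is an assumption, not the book's exact descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical dope vector for a 2-D array of double in one contiguous block.
struct Dope2D {
    double* base;        // address of element [lo[0]][lo[1]]
    long lo[2], hi[2];   // inclusive bounds pair per dimension (for checking)
    long stride[2];      // elements between successive subscripts (formula constants)

    // Location formula: base + (i - lo0) * stride0 + (j - lo1) * stride1
    double& at(long i, long j) const {
        if (i < lo[0] || i > hi[0] || j < lo[1] || j > hi[1])
            throw std::out_of_range("subscript out of bounds");
        return base[(i - lo[0]) * stride[0] + (j - lo[1]) * stride[1]];
    }

    // Slice rows r0..r1, columns c0..c1: a new descriptor over the same storage,
    // renumbered from 0. No elements are copied.
    Dope2D slice(long r0, long r1, long c0, long c1) const {
        (void)at(r1, c1);   // bounds-check the far corner
        return Dope2D{&at(r0, c0), {0, 0}, {r1 - r0, c1 - c0}, {stride[0], stride[1]}};
    }
};

// Row-major descriptor for rows lo0..hi0 and columns lo1..hi1, like Pascal's
// array[lo0..hi0, lo1..hi1]; storage must outlive the descriptor.
Dope2D make_row_major(std::vector<double>& storage, long lo0, long hi0, long lo1, long hi1) {
    long rows = hi0 - lo0 + 1;
    long cols = hi1 - lo1 + 1;
    storage.assign(static_cast<std::size_t>(rows * cols), 0.0);
    return Dope2D{storage.data(), {lo0, lo1}, {hi0, hi1}, {cols, 1}};
}
```

Because a slice is just another descriptor, writes through it land in the original array.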
    3. Strings.
      1. Early languages had little string support; COBOL had some.
      2. Plain C supports only arrays of char ending in a null character ('\0'). Brain damaged.
      3. Pascal/Ada family have fixed-length strings.
      4. Java has nice semi-builtin variable-length strings.
      5. C++ has a nice variable-length string class.
      6. It's interesting that it took so long to get here.
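A small C++ contrast between C strings and std::string. c_length is a hypothetical hand-written version of what strlen does; greet is likewise made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A C string is just an array of char ending in '\0'. Its length is not
// stored, so finding it means scanning for the terminator.
std::size_t c_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// std::string stores its length and grows as needed, so size() is constant
// time and the string may even contain '\0' characters.
std::string greet(const std::string& name) {
    std::string s = "hello, ";
    s += name;   // storage grows automatically
    return s;
}
```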
    4. Structs or records.
      1. Selectors are field names; not computable at run time.
      2. Like classes without methods or inheritance.
      3. Heterogeneous.
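A C++ sketch of a record, with a hypothetical Employee type, contrasting field selectors (fixed at compile time) with array subscripts (computed at run time).

```cpp
#include <cassert>
#include <string>

// A record: heterogeneous fields selected by name.
struct Employee {
    std::string name;
    int id;
    double salary;
};

// staff[i] uses a subscript computed at run time; .salary is a selector the
// compiler resolves to a fixed offset, and it cannot be chosen at run time.
double total_salary(const Employee* staff, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += staff[i].salary;
    return total;
}
```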
  9. Functional Types.
    1. Functions passed as parameters.
      1. Some languages allow functions to be passed as parameters. FORTRAN, Pascal, C.
      2. Example uses.
        1. Plotting an arbitrary function.
        2. Finding roots of an arbitrary function.
        3. Comparisons for sorting or other data structures.
        4. qsorter.c
      3. In C/C++, this is a special case of a pointer-to-function type.
      4. Java uses interfaces for this purpose.
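A C++ sketch of two of the uses above: root finding with a pointer-to-function parameter (by bisection, assuming the function changes sign on the interval) and a comparison function for std::qsort. It is not the qsorter.c example; all names are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// f has type "pointer to function taking double and returning double".
// Bisection: repeatedly halve [lo, hi], keeping the half where f changes sign.
double find_root(double (*f)(double), double lo, double hi, double tol = 1e-9) {
    while (hi - lo > tol) {
        double mid = (lo + hi) / 2;
        if ((f(lo) < 0) == (f(mid) < 0)) lo = mid;   // root is in [mid, hi]
        else hi = mid;                               // root is in [lo, mid]
    }
    return (lo + hi) / 2;
}

double square_minus_two(double x) { return x * x - 2; }

// A comparison function in the shape std::qsort expects.
int compare_ints(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);   // avoids the overflow risk of returning x - y
}
```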
    2. Anonymous functions.
      1. Lisp lambda expressions.
      2. Many scripting languages: Perl anonymous sub, Python lambda, Ruby proc object.
      3. Java lambda expression (Java 8)
        1. (String z) -> { int y = x + 1; System.out.println(z + y); }   (x is an effectively final local of the enclosing method)
        2. (int x) -> x * x
        3. Is a closure, though it may capture only local variables that are effectively final.
        4. Return type generally inferred. Parameter types may be as well.
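        5. Its type is a functional interface (an interface with one abstract method, such as Runnable or Comparator), determined by the context where the lambda appears.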
      4. C++ anonymous function (C++ 11)
        1. [](int x) { return x * x; }
        2. [z](int x) -> int { return z + x * x; }
        3. Return type can be inferred.
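        4. The square brackets list the local variables captured from the creation context, by value or by reference (&). These are the only locals of the enclosing scope it may use; globals need no capture.
        5. Is not technically a closure, since it only brings what you capture, and variables captured by reference can dangle once their scope ends.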
        6. Frighteningly, C++20 allows anonymous functions with explicit template parameter lists (C++14 already had generic lambdas with auto parameters). My head hurts.
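A C++17 sketch of capture in practice: a by-value capture that safely outlives its scope, a by-reference capture used locally, and a generic lambda. make_adder, twice and count_if_greater are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Capturing n by value copies it into the lambda, so the returned
// function remains valid after make_adder returns.
std::function<int(int)> make_adder(int n) {
    return [n](int x) { return x + n; };
}

// Returning a by-reference capture of a local would dangle:
//   auto bad() { int n = 1; return [&n](int x) { return x + n; }; }   // don't

// A generic lambda (C++14): auto parameters make it a template in disguise.
// C++20 also allows []<typename T>(T x) { ... }.
inline auto twice = [](auto f, auto x) { return f(f(x)); };

int count_if_greater(const std::vector<int>& v, int limit) {
    int count = 0;
    // count by reference (so the lambda can update it), limit by value.
    auto check = [&count, limit](int x) { if (x > limit) ++count; };
    for (int x : v) check(x);
    return count;
}
```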
    3. Dynamically-created functions.
      1. Lisp lambda expressions.
      2. Many scripting languages, usually through an eval function.
  10. Type Equivalence
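    1. Structural equivalence: two types are the same if they have the same structure.
    2. Name equivalence: two types are the same only if they have the same name (come from the same declaration).
      Name equivalence is most common. C uses structural equivalence for arrays, but only in parameter passing, since arrays cannot be assigned.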
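A C++17 sketch of name equivalence for records: structurally identical structs are still distinct types, while a using alias is just another name for an existing type. Celsius, Fahrenheit, Temperature and to_celsius are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Two structurally identical records...
struct Celsius    { double value; };
struct Fahrenheit { double value; };

// ...are distinct types under name equivalence.
static_assert(!std::is_same_v<Celsius, Fahrenheit>);

// A typedef or using-declaration only introduces another name for the same type.
using Temperature = Celsius;
static_assert(std::is_same_v<Temperature, Celsius>);

// A Fahrenheit is not accepted where a Celsius is expected:
//   Celsius c = Fahrenheit{212.0};   // error: no conversion
Celsius to_celsius(Fahrenheit f) { return Celsius{(f.value - 32.0) * 5.0 / 9.0}; }
```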
  11. Subtypes
    1. A type with constraints on its values.
      subtype Degrees_Arc is Integer range 0..360;
    2. Inheritance can be viewed as a form of this.
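C++ has no range subtypes, but a hypothetical RangedInt template can emulate the Ada declaration above by checking the constraint at construction (where Ada would raise Constraint_Error). This is a sketch of the idea, not a full subtype.

```cpp
#include <cassert>
#include <stdexcept>

// Emulates "subtype Degrees_Arc is Integer range 0..360;".
template <int Lo, int Hi>
class RangedInt {
    static_assert(Lo <= Hi, "empty range");
    int value_;
public:
    explicit RangedInt(int v) : value_(v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value violates subtype constraint");
    }
    // A subtype value can be used wherever its base type is expected.
    operator int() const { return value_; }
};

using DegreesArc = RangedInt<0, 360>;
```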
  12. Generics.
    1. Array-based Stack Examples.
      1. Ada: Generic package, Implementation, User.
      2. C++: Template class, User.
      3. Java: Generic class, User.
    2. Ada and C++ allow constant integers, not just types, as generic (template) parameters.
    3. Using generic types.
      1. Ada and Java require type parameters to be constrained. For instance,
        1. In Ada, you have to say it's an array of something if you want to subscript in the package.
        2. In Java you have to say it extends Comparable if you want to call compareTo on it.
        3. Checks uses of the type inside the package or class against the constraints.
        4. Checks that the types supplied at each use comply with the constraints.
      2. C++
        1. Allows some limited forms of constraint (C++20 concepts add more), but mostly it waits to see what concrete type you send, and complains if that fails.
        2. Compiles the generic class assuming the parameter types can do whatever you do to them.
        3. Complains when you send a type that can't do that.
    4. Private types.
      1. See above Ada examples.
      2. Don't need it when you have classes.
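An array-based stack in C++, assuming C++17, written as a sketch in the spirit of the C++ template example listed above rather than the course's own file. The capacity N is a constant integer template parameter, and the hypothetical largest shows C++'s wait-and-see checking of type parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

template <typename T, std::size_t N>
class Stack {
    T items_[N];           // fixed-size array; T must be default-constructible
    std::size_t top_ = 0;  // number of items currently stored
public:
    void push(const T& x) {
        if (top_ == N) throw std::overflow_error("stack full");
        items_[top_++] = x;
    }
    T pop() {
        if (top_ == 0) throw std::underflow_error("stack empty");
        return items_[--top_];
    }
    bool empty() const { return top_ == 0; }
};

// No constraint is stated: this compiles for any T, and only a call with a
// type lacking operator< is rejected, at instantiation.
// (C++20 could state it up front: template <std::totally_ordered T>.)
template <typename T>
T largest(const T& a, const T& b) { return (a < b) ? b : a; }
```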
  13. Type Inference.
    1. Some languages will assign static types based on use.
    2. auto
      1. A recent and simple form is the auto keyword of C++11 and later.
      2. auto x = 20;
      3. auto i = some_map.find("nimrod");
      4. auto f(int a, int b) { return 3*a + 2*b; }   (deducing a return type needs C++14)
      5. The type is just the type of the initializing expression (minus references and top-level const).
      6. Most useful when the type is complicated or difficult to find.
    3. The technique is older; it was developed in statically typed functional languages (notably ML, with Hindley-Milner inference).
    4. Functions in ML need no type declarations at all; parameter and return types are inferred from the body (annotations are optional).
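A C++17 sketch of the auto examples above, with static_asserts confirming the deduced types. The map contents and the names lookup and limit_copy are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <type_traits>

inline auto x = 20;                              // int
static_assert(std::is_same_v<decltype(x), int>);

inline const int limit = 100;
inline auto limit_copy = limit;                  // int: top-level const is dropped
static_assert(std::is_same_v<decltype(limit_copy), int>);

// Return type deduced from the return statement (C++14).
auto f(int a, int b) { return 3 * a + 2 * b; }
static_assert(std::is_same_v<decltype(f(1, 2)), int>);

// Most useful when the type is long-winded: auto spares us from spelling
// std::map<std::string, int>::iterator.
int lookup(std::map<std::string, int>& some_map, const std::string& key) {
    auto i = some_map.find(key);
    static_assert(std::is_same_v<decltype(i), std::map<std::string, int>::iterator>);
    return (i == some_map.end()) ? -1 : i->second;
}
```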