Chapter 5: Real-World Architectures
  1. Traps and interrupts.
  2. Real-World ISAs
    1. Much more arithmetic.
      1. Multiply and divide instructions.
      2. Floating point instructions.
      3. Sometimes BCD support.
    2. Logical instructions.
      1. Bit-wise and, or, xor, not.
      2. Shifts and rotates.
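      The difference between a shift and a rotate is easiest to see on a concrete 32-bit value. The Python sketch below models a 32-bit register as an int masked to 32 bits. The function names (bnot, shl, shr, sar, rol) are made up for illustration and are not any real instruction set's mnemonics.

```python
# Sketch: 32-bit logical operations modelled on Python ints.
# Python ints are unbounded, so every result is masked back to 32 bits,
# the way a 32-bit register drops bits that do not fit.
MASK32 = 0xFFFFFFFF


def bnot(x: int) -> int:
    """Bit-wise NOT; and/or/xor are simply Python's &, | and ^."""
    return ~x & MASK32


def shl(x: int, n: int) -> int:
    """Shift left: bits pushed out of the top are lost, zeros enter at the bottom."""
    return (x << n) & MASK32


def shr(x: int, n: int) -> int:
    """Logical shift right: zeros enter at the top."""
    return (x & MASK32) >> n


def sar(x: int, n: int) -> int:
    """Arithmetic shift right: copies of the sign bit (bit 31) enter at the top."""
    x &= MASK32
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> n) & MASK32


def rol(x: int, n: int) -> int:
    """Rotate left: bits pushed out of the top re-enter at the bottom."""
    n %= 32
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


print(hex(shl(0x80000001, 1)))  # 0x2: the top bit is lost
print(hex(rol(0x80000001, 1)))  # 0x3: the top bit wraps around to bit 0
```

      The arithmetic right shift is included because many ISAs offer it alongside the logical one, so that shifting a negative number right keeps it negative.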
    3. Control transfer.
      1. MARIE-style skips are not common now.
      2. Conditional branches.
        1. Branch performs the comparison, and branches if true.
        2. Compare sets flags, and branch tests flags (Pentium-style).
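      The two branch styles can be contrasted in a small Python sketch. The names beq, je, jl and jb echo real MIPS and Pentium mnemonics, but these functions are only a model on 32-bit values, not a full description of either ISA. In the Pentium-like style, the compare computes a - b only for its side effect on the flags, and a separate branch tests those flags.

```python
MASK32 = 0xFFFFFFFF


# Style 1 (MIPS-like): the branch instruction compares two registers itself.
def beq(a: int, b: int) -> bool:
    """Taken when the two 32-bit register values are equal."""
    return (a & MASK32) == (b & MASK32)


# Style 2 (Pentium-like): a compare instruction computes a - b only to set flags...
def cmp_flags(a: int, b: int) -> dict:
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32  # the difference itself is thrown away
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    return {
        "ZF": result == 0,                            # zero flag: a == b
        "SF": sign_r == 1,                            # sign flag: top bit of a - b
        "CF": a < b,                                  # carry (borrow): a < b as unsigned
        "OF": sign_a != sign_b and sign_r != sign_a,  # signed overflow in a - b
    }


# ...and a later branch instruction only tests the flags.
def je(flags: dict) -> bool:
    """Jump if equal."""
    return flags["ZF"]


def jl(flags: dict) -> bool:
    """Jump if less, treating the values as signed."""
    return flags["SF"] != flags["OF"]


def jb(flags: dict) -> bool:
    """Jump if below, treating the values as unsigned."""
    return flags["CF"]


flags = cmp_flags(-1, 1)
print(jl(flags), jb(flags))  # True False: -1 < 1 signed, but 0xFFFFFFFF is not below 1
```

      One compare can serve several different branches, which is one reason the flags style separates the two steps.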
    4. Variable memory units.
      1. Byte-addressable memory.
      2. Register size of four bytes or more.
      3. Instructions to store/fetch in 1, 2, 4, maybe 8 byte units.
      4. Longer units occupy several consecutive addresses, so the byte order matters:
        1. Little-endian: Low-order byte in lowest address.
        2. Big-endian: High-order byte in lowest address.
        3. Pentium uses little-endian.
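      A quick way to see the two byte orders is to split one 4-byte value with Python's int.to_bytes. The value 0x12345678 is an arbitrary example. The loop shows which byte would sit at each consecutive address starting from some base address.

```python
value = 0x12345678

little = value.to_bytes(4, byteorder="little")  # Pentium order
big = value.to_bytes(4, byteorder="big")

# Which byte lands at each of the four consecutive addresses.
for offset in range(4):
    print(f"base+{offset}: little-endian {little[offset]:02X}, big-endian {big[offset]:02X}")

# Reading the bytes back with the wrong byte order gives a different number.
print(hex(int.from_bytes(little, byteorder="big")))  # 0x78563412
```

      The last line is the classic symptom of mixing up byte orders, for example when binary data moves between machines of different endianness.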
    5. Multiple registers.
      Pentium (32-bit): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, plus a status word.
      MIPS: $0-$31.
      ARM: R0-R15, plus a status word.
      1. Arithmetic is often, or only, between registers.
      2. Some registers have special purpose, either in the architecture, or by convention.
    6. A system stack.
      1. One register is the stack pointer. It holds the address of the top of the stack.
      2. Push and pop work by storing or loading at the top of the stack and moving the stack pointer.
      3. Function calling generally uses the stack.
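      Push and pop can be sketched with a byte array for memory and one integer for the stack pointer. This is a minimal model, assuming (as on the Pentium) 4-byte little-endian words and a stack that grows toward lower addresses. The class name ToyMachine and the 64-byte memory size are hypothetical.

```python
class ToyMachine:
    """Byte-addressable memory plus one stack-pointer register (SP).

    Assumes, as on the Pentium, 4-byte little-endian words and a stack that
    grows toward lower addresses. The memory size is arbitrary.
    """

    def __init__(self, mem_size: int = 64):
        self.memory = bytearray(mem_size)
        self.sp = mem_size  # empty stack: SP starts just past the end of memory

    def push(self, value: int) -> None:
        if self.sp < 4:
            raise OverflowError("stack overflow")
        self.sp -= 4  # move the stack pointer down to make room...
        self.memory[self.sp:self.sp + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")  # ...then store

    def pop(self) -> int:
        if self.sp + 4 > len(self.memory):
            raise IndexError("pop from empty stack")
        value = int.from_bytes(self.memory[self.sp:self.sp + 4], "little")  # read the top...
        self.sp += 4  # ...then move the stack pointer back up
        return value


m = ToyMachine()
m.push(10)
m.push(20)
print(m.sp)              # 56: two 4-byte pushes from 64
print(m.pop(), m.pop())  # 20 10: last in, first out
```

      Only the stack pointer moves; popped values stay in memory until a later push overwrites them.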
    7. Immediate operands. Most machines have an instruction to load a constant into a register. The constant is stored inside the instruction, so it is usually of limited size.
    8. Addressing modes.
      1. Relative: address = register + offset.
        1. Widely (universally?) available.
        2. An offset of zero gives register-indirect addressing: the register holds the address itself.
        3. Often used with the stack pointer, to access data relative to the current stack top.
      2. Direct addressing (like MARIE) is rarely used.
      3. Indirect addressing through memory, like FetchI, has sometimes been provided, but current architectures usually lack it.
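      Relative addressing reduces to one addition: the register's contents plus a constant offset. The Python sketch below uses a hypothetical two-register file (sp, t0) and little-endian 4-byte words. It shows a stack-relative access, written 4($sp) in MIPS assembly, and an offset of zero acting as register-indirect addressing.

```python
memory = bytearray(32)      # tiny byte-addressable memory
regs = {"sp": 16, "t0": 0}  # hypothetical register file; values are addresses


def effective_address(base: str, offset: int) -> int:
    """Relative addressing: address = contents of a register + a constant offset."""
    return regs[base] + offset


def store_word(base: str, offset: int, value: int) -> None:
    ea = effective_address(base, offset)
    memory[ea:ea + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


def load_word(base: str, offset: int) -> int:
    ea = effective_address(base, offset)
    return int.from_bytes(memory[ea:ea + 4], "little")


store_word("sp", 4, 42)    # store into the slot 4 bytes above the stack pointer
print(load_word("sp", 4))  # 42
regs["t0"] = 20            # t0 now holds the same address, sp + 4
print(load_word("t0", 0))  # 42: offset 0 is plain register-indirect addressing
```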
    9. Instruction format.
      1. Older designs (notably Pentium) have instructions of differing sizes. (Newer designs such as MIPS typically make all instructions the same size.)
      2. Format must be able to specify at least two arguments.
      3. Op codes may be variable sized.
        1. Book describes this under “Expanding Opcodes”.
        2. For instance:
          1. First 15 op codes are four bits 0000–1110.
          2. Next 15 are eight bits: 11110000–11111110.
          3. And 8 more at eleven bits: 11111111000–11111111111.
        3. Can use the short codes for instructions which are common or have longer arguments.
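      The expanding-opcode example above can be decoded by checking for the all-ones escape prefixes. The Python sketch treats an instruction as a string of '0'/'1' characters. The function name decode_opcode and the 16-bit instruction width in the last lines are assumptions for illustration; the example in the text does not fix an instruction width.

```python
from itertools import product


def decode_opcode(bits: str) -> tuple:
    """Split off the op code using the expanding-opcode scheme above.

    `bits` is the instruction as a string of '0'/'1' characters, most
    significant bit first. Returns (op code, number of op-code bits).
    """
    if bits[:4] != "1111":
        return bits[:4], 4   # 15 four-bit codes: 0000 to 1110
    if bits[4:8] != "1111":
        return bits[:8], 8   # 15 eight-bit codes: 11110000 to 11111110
    return bits[:11], 11     # 8 eleven-bit codes: 11111111000 to 11111111111


# Enumerate every 11-bit prefix and collect the distinct op codes it decodes to.
codes = {decode_opcode("".join(p))[0] for p in product("01", repeat=11)}
print(len(codes))  # 38 = 15 + 15 + 8

instr = "0101" + "0" * 12      # hypothetical 16-bit instruction
opcode, n = decode_opcode(instr)
print(opcode, len(instr) - n)  # 0101 12: a short op code leaves 12 bits for arguments
```

      The 1111 and 11111111 patterns are never op codes themselves; they act as escapes that say "keep reading".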
  3. Stack architectures
    1. Not to be confused with a system stack.
    2. Operations are applied to the stacktop instead of registers.
    3. Not common.
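    In a stack architecture, arithmetic instructions name no registers at all. A toy interpreter makes this concrete. The instruction names push, add, sub and mul are hypothetical, and a Python list plays the role of the operand stack. This stack holds operands for every operation, unlike the system stack, which is mainly used for function calls.

```python
def run(program: list) -> list:
    """Run a toy stack-machine program and return the final stack.

    'push' carries a constant; the arithmetic ops take no operands and
    work on the top two stack entries, replacing them with the result.
    """
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op in ("add", "sub", "mul"):
            right = stack.pop()  # the top of the stack is the right-hand operand
            left = stack.pop()
            if op == "add":
                stack.append(left + right)
            elif op == "sub":
                stack.append(left - right)
            else:
                stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return stack


# (2 + 3) * 4
print(run([("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]))  # [20]
```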
  4. Pipelines
    1. Speed up the CPU by working on multiple instructions at the same time.
    2. Much like an assembly line.
    3. The assembly line is made of large registers. The instructions, along with needed partial results, pass from one to the next.
    4. Results are saved at the end; partial results may be safely discarded.
    5. Data dependence is a problem
      1. An instruction uses a value produced by a previous one.
        2. The pipeline must stall, or the value must be copied backwards (forwarded) from the stage that computed it to the stage that needs it.
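      Finding the data dependences that force a stall or a forward is a simple scan over the instruction list. The Python sketch below is illustrative only. The window parameter stands in for how many following instructions would see a value before it is written back; it is an assumed number, not a figure for any real CPU. The MIPS-style text is just a label.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str    # assembly text, for display only
    dest: str    # register written
    srcs: tuple  # registers read


def raw_hazards(program: list, window: int = 2) -> list:
    """Find (producer, consumer) pairs where an instruction reads a register
    written by one of the `window` instructions just before it."""
    hazards = []
    for i, consumer in enumerate(program):
        for src in dict.fromkeys(consumer.srcs):  # each source register once
            # Walk back to the most recent writer of `src`, but only inside the window.
            for j in range(i - 1, max(i - window, 0) - 1, -1):
                if program[j].dest == src:
                    hazards.append((program[j].text, consumer.text))
                    break
    return hazards


program = [
    Instr("add $t0, $t1, $t2", "t0", ("t1", "t2")),
    Instr("sub $t3, $t0, $t4", "t3", ("t0", "t4")),  # needs t0 from the add right away
    Instr("or  $t5, $t6, $t7", "t5", ("t6", "t7")),  # independent
    Instr("and $t8, $t0, $t3", "t8", ("t0", "t3")),  # t3 from the sub; t0 is 3 back, outside the window
]
for producer, consumer in raw_hazards(program):
    print(f"{consumer!r} must wait for {producer!r} (or get the value forwarded)")
```

      A compiler can sometimes remove such a stall by moving an independent instruction, like the or above, between the producer and the consumer.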
    6. Conditional branches are a problem
      1. Don't know what the next instruction is.
      2. Approaches.
        1. Just stall the pipeline: Don't fetch instructions past the branch until the result is known.
        2. Static branch prediction
            1. Always assume the branch is not taken.
            2. When it turns out to be taken, discard the partial results of the instructions that followed it.
        3. Dynamic branch prediction
          1. A small cache keeps track of recent branch results.
          2. One-bit: Retain last branch result and presume next will do the same.
          3. Two-bit: Retain the last two results. Make the same prediction until it is wrong twice.
        4. Fetch both paths: the instruction after the branch and the instruction at the branch target.
          1. Discard the wrong one as soon as you know.
          2. Generally requires duplicating many components so you can do both branches at once.
        5. Delayed branch
          1. The ISA semantics specify that the instruction after the branch is executed unconditionally, and any branch takes effect after that.
          2. Depends on a smart compiler to find something to do during the delay slot.
          3. If there's nothing useful to do, code a no-op.
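      The one-bit and two-bit predictors can be compared by running both over the outcomes of one branch. The two-bit predictor below is a saturating counter, one common way to build the two-bit scheme: from a "strong" state it needs two wrong guesses before it changes its prediction. Both the initial states and the loop pattern are assumptions chosen for illustration, and each object models the prediction-cache entry for a single branch.

```python
class OneBit:
    """Remembers only the last outcome and predicts it again."""

    def __init__(self) -> None:
        self.last_taken = False  # assumed initial guess: not taken

    def predict(self) -> bool:
        return self.last_taken

    def update(self, taken: bool) -> None:
        self.last_taken = taken


class TwoBit:
    """Saturating counter 0-3: 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self) -> None:
        self.counter = 1  # assumed initial state: weakly not taken

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        if taken:
            self.counter = min(self.counter + 1, 3)
        else:
            self.counter = max(self.counter - 1, 0)


def count_mispredictions(predictor, outcomes: list) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)  # learn the real outcome once it is known
    return misses


# A loop-closing branch: taken three times, then falls through, four times over.
loop_branch = [True, True, True, False] * 4
print(count_mispredictions(OneBit(), loop_branch))  # 8
print(count_mispredictions(TwoBit(), loop_branch))  # 5
```

      After warming up, the one-bit predictor misses twice per trip through the loop (at the exit and again on re-entry), while the two-bit predictor misses only at the exit.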
  5. Super-scalar: Multiple pipelines.
    1. Instructions must be dispatched in groups.
    2. Dependencies within the group are a problem.
        1. Either the hardware manages the dependencies,
        2. Or they are simply errors, and the compiler must arrange the instructions to avoid them.
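    Whether the hardware or the compiler is responsible, the same check decides what can be dispatched together. The Python sketch below greedily splits an in-order list into groups. The group width of 2 and the (text, dest, srcs) tuple format are assumptions. A group is closed early when the next instruction reads or overwrites a register written earlier in the same group.

```python
def dispatch_groups(program: list, width: int = 2) -> list:
    """Greedily split an in-order instruction list into dispatch groups.

    Each entry of `program` is (text, dest, srcs). A group is closed when it is
    full, or when the next instruction reads or overwrites a register that an
    earlier instruction in the same group writes.
    """
    groups, current, written = [], [], set()
    for text, dest, srcs in program:
        conflict = written & (set(srcs) | {dest})
        if current and (len(current) == width or conflict):
            groups.append(current)
            current, written = [], set()
        current.append(text)
        written.add(dest)
    if current:
        groups.append(current)
    return groups


program = [
    ("add $t0, $t1, $t2", "t0", ("t1", "t2")),
    ("or  $t5, $t6, $t7", "t5", ("t6", "t7")),  # independent: pairs with the add
    ("sub $t3, $t0, $t4", "t3", ("t0", "t4")),
    ("and $t8, $t3, $t5", "t8", ("t3", "t5")),  # needs t3 from the sub: own group
]
for group in dispatch_groups(program):
    print(group)
```

      Here the four instructions take three dispatch slots rather than two, because the and depends on the sub.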
  6. MIPS
    1. Each instruction is 32 bits, in one of three formats.
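      R-Format (add):          op | r1 | r2  | rdest  | shamt | func
      I-Format (load, store):  op | rb | rsd | offset
      I-Format (branch):       op | r1 | r2  | offset
      J-Format (jump):         op | target

      The op and func codes are each 6 bits, the register numbers and the shift amount (shamt) are 5, the offset is 16, and the jump target is 26. In the load/store format, rb is the base register and rsd is the register stored from or loaded into.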
    2. The R-format uses a form of what the book calls “Expanding Opcodes.” The “op” for all math operations is the same, distinguished by the “func” code.
    3. Register numbers are 0-31, but 0 is a constant “register” which always contains zero.
    4. Assembler programs can designate a register with either its number or a standard symbolic name.
    5. Symbolic names reflect conventional use, such as $sp for stack pointer.
    6. MIPS Example (runs on the SPIM simulator).
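    Separately from the SPIM example, the field layouts listed above can be checked by packing fields into a 32-bit word. The Python sketch uses the field names from this page, not the official MIPS ones. It relies on standard MIPS facts: $t0-$t2 are registers 8-10, $sp is register 29, add has op 0 and func 0x20, and lw has op 0x23. The encoder accepts only non-negative field values, so a negative offset would first need converting to 16-bit two's complement (offset & 0xFFFF).

```python
# Field layouts: (name, width in bits), most significant field first.
R_FORMAT = [("op", 6), ("r1", 5), ("r2", 5), ("rdest", 5), ("shamt", 5), ("func", 6)]
I_FORMAT = [("op", 6), ("rb", 5), ("rsd", 5), ("offset", 16)]


def encode(layout: list, fields: dict) -> int:
    """Pack named fields into one 32-bit word."""
    word = 0
    for name, width in layout:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(layout: list, word: int) -> dict:
    """Split a 32-bit word back into named fields."""
    fields, shift = {}, 32
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


# add $t0, $t1, $t2
add = encode(R_FORMAT, {"op": 0, "r1": 9, "r2": 10, "rdest": 8, "shamt": 0, "func": 0x20})
print(f"{add:08X}")  # 012A4020

# lw $t0, 4($sp)
lw = encode(I_FORMAT, {"op": 0x23, "rb": 29, "rsd": 8, "offset": 4})
print(f"{lw:08X}")  # 8FA80004
```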
  7. Pentium
    1. EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP are the main program-accessible registers.
    2. Instructions have a complex, variable-length encoding.
    3. Briefly, each instruction has up to six parts, all but the op code optional.
      1. Prefixes, 0–4 bytes. Various behavior modifiers.
      2. Op code, 1 or 2 bytes.
      3. ModR/M, one byte. Specifies whether each argument is a register or a memory reference and, for memory, which type of address computation is used. It also contains one or two argument register numbers, as needed.
      4. SIB, if the ModR/M specifies it should be present, describes an argument address computation using two registers, one shifted by a power of two. It is one byte long.
      5. Displacement, 1, 2 or 4 bytes. A constant added to the address computation specified by the ModR/M (and the SIB, if present). The ModR/M specifies if this needs to be present.
      6. Immediate value, 1, 2 or 4 bytes. Present when one of the arguments is a constant.
    4. Designed to minimize the memory space needed to store a program. Folks don't worry so much about this anymore.
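    The ModR/M and SIB bytes are each split into fixed bit fields, which the Python sketch below extracts. It ignores the special cases a full decoder must handle: in 32-bit mode, r/m = 100 with mod other than 11 means a SIB byte follows, mod = 00 with r/m = 101 means a 32-bit displacement with no base register, and an index of 100 means no index. The example bytes 01 D8 encode add eax, ebx, where op code 01 is ADD r/m32, r32.

```python
# 32-bit register numbers as encoded in ModR/M and SIB bytes. This is the
# encoding order, which differs from the EAX, EBX, ECX, EDX order used above.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_modrm(byte: int) -> tuple:
    """ModR/M = mod (2 bits) | reg (3 bits) | r/m (3 bits).
    mod == 3 means r/m names a register rather than a memory reference."""
    return byte >> 6, (byte >> 3) & 0b111, byte & 0b111


def split_sib(byte: int) -> tuple:
    """SIB = scale (2 bits) | index (3 bits) | base (3 bits).
    Address = base register + index register * scale (1, 2, 4 or 8),
    before any displacement is added."""
    return 1 << (byte >> 6), (byte >> 3) & 0b111, byte & 0b111


mod, reg, rm = split_modrm(0xD8)
print(mod, REG32[reg], REG32[rm])  # 3 EBX EAX: both operands are registers, EAX += EBX

scale, index, base = split_sib(0x98)
print(f"{REG32[base]} + {REG32[index]}*{scale}")  # EAX + EBX*4
```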
  8. Don't need to know about the JVM discussion in the text.