Chapter 4: MARIE: A Very Simple Architecture

Major Components.
1. Central Processing Unit (CPU)
  1. Data path and control.
  2. Registers.
  3. ALU.
  4. The control unit.
  5. The control unit directs data around the path.
  6. The CPU is a sequential machine that changes state whenever the clock ticks.
2. Buses.
  1. Parallel lines to send data between devices.
  2. Point-to-point or shared.
  3. Typical bus.
    1. Address lines.
    2. Data lines.
    3. Control lines.
  4. Bus types.
    1. Processor-memory.
    2. Back-plane buses. Connect CPU, memory, I/O devices.
  5. PC Buses
    1. System bus. Connect CPU, memory, and a few other internal devices.
    2. Expansion bus. Connect peripherals and expansion slots.
    3. Local bus. High-speed connection for some peripherals.
  6. Clocking.
    1. Synchronous buses change with the clock like many components.
      1. Connect components driven by the same clock.
      2. Connected components can be assumed in sync.
      3. Since clock periods are short, the bus must be short: The time for the signal to traverse the bus must be small compared to the clock period.
        
        At 133 MHz, the clock period is 7.52 ns. Light travels a bit over 7 feet in that time.
        At 1 GHz, about a foot.
    2. Asynchronous buses are not clocked.
      1. Connect components with separate clocks.
      2. Require more complicated protocols (discussed in Ch. 7), e.g.,
        
        Put data on the line.
        Raise the data ready line.
        Keep data until the receiver raises the acknowledge line.
      3. Can go longer distances.
  7. Bus control.
    1. A transfer has steps, which must be coordinated.
    2. If more than one device can initiate bus operations, they must take turns (arbitration).
      1. Daisy chain. Single, interruptible control line.
      2. Centralized controller. Separate control line to each device.
      3. Distributed self-selection. Any sort of deterministic turn-taking.
      4. Distributed, collision detection.
3. Clocks.
  1. Master clock regulates the CPU.
  2. Some buses have their own clocks.
  3. Overclocking
    1. The practice of running components, esp. the CPU, with a faster clock than the manufacturer specifies.
    2. Can improve performance.
    3. Can make components behave unreliably or overheat.
4. I/O System. Allows the CPU to communicate with other devices.
  1. Interface allows the running program to communicate over the I/O bus.
  2. Memory-mapped: certain memory addresses refer to data located in peripheral devices rather than RAM.
  3. Instruction-based: information in devices has its own set of addresses used with special input and output instructions.
5. Memory.
  1. Matrix of bits, described as length × width.
    1. Width is the size of the item addressed, usually 8 for a byte.
    2. Length is the number of items, usually bytes.
    3. A 4M × 8 memory has 4M bytes.
    4. It also needs 22 address lines (4M = 2^(22)) to specify which byte to read or write.
    5. A 64K × 8 Memory
  2. Memories are usually composed of several such chips.
    1. Multiple banks.
    2. Part of the address selects the bank; the rest selects the address within the bank.
    3. High-Order Interleaving
    4. Low-Order Interleaving
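    5. Low-order interleaving can be faster: Imagine fetching a word. Its consecutive addresses fall in different banks, so the banks can be accessed at the same time.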
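
A quick check of the memory arithmetic above, as a Python sketch. The function names and the layout of 4 banks with 8 locations each are assumptions made up for illustration; they do not come from the text.

```python
def address_lines(length):
    """Number of address lines needed to select one of `length` items."""
    return (length - 1).bit_length()

def high_order(addr, bank_size):
    """High-order interleaving: the top address bits pick the bank."""
    return addr // bank_size, addr % bank_size   # (bank, offset in bank)

def low_order(addr, banks):
    """Low-order interleaving: the bottom address bits pick the bank."""
    return addr % banks, addr // banks           # (bank, offset in bank)

print(address_lines(4 * 2**20))    # 4M x 8 memory
print(address_lines(64 * 2**10))   # 64K x 8 memory

# Hypothetical layout: 4 banks of 8 locations each.
for addr in range(4):
    print(addr, high_order(addr, 8), low_order(addr, 4))
```

With high-order interleaving, addresses 0 through 3 all land in bank 0. With low-order interleaving they land in banks 0, 1, 2 and 3, so a multi-location fetch can keep all four banks busy at once.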

Marie

Marie Architecture.
1. 4K 16-bit words of main memory.
  1. Word-addressable, not byte-addressable.
  2. 12-bit memory addresses.
2. Each instruction is 16 bits.
3. Data registers are 16 bits: AC, MBR, IR
4. Registers which hold addresses are 12 bits: MAR, PC.
5. I/O registers are 8 bits, except in the simulator where they are 16 bits.
6. The ALU is 16 bits wide.
Registers and purposes.
1. AC: The accumulator holds computation results and data in use.
2. MAR: Holds the address when accessing memory.
3. MBR: Holds a value being read from or entered into memory.
4. PC: Holds the address of the next instruction.
5. IR: Holds a copy of the currently-executing instruction.
6. InREG, OutREG: Hold data to be read by, or data written out by, the CPU.

Instruction Set Architecture.

Instruction Format:

Opcode 15-12

Address 11-0

The first four bits (15-12) are the operation code, which tells what the instruction does.

0000	JnS X	Store the PC in memory at address X, then jump to X+1.
0001	Load X	Copy the value in memory at location X into AC.
0010	Store X	Copy the value in the AC to memory at location X.
0011	Add X	Add the value in memory at address X to the AC.
0100	Subt X	Subtract the value in memory at address X from the AC.
0101	Input	Copy the value from the input register to AC.
0110	Output	Copy the value from AC to the output register.
0111	Halt	Stop the machine.
1000	Skipcond cond	Skip the next instruction if the condition is true. Conditions are 00 if the AC is negative, 01 if it is zero, or 10 if it is positive.
1001	Jump X	The next instruction to execute is the one at address X in memory.
1010	Clear	Set the AC to zero.
1011	AddI X	Treat the low 12 bits of the value at address X in memory as another address, fetch the value from that address and add it to AC.
1100	JumpI X	Treat the low 12 bits of the value at address X in memory as another address, which is the location of the next instruction to execute.
1101	LoadI X	Treat the low 12 bits of the value at location X in memory as another address, fetch the value from that address and copy it to AC.
1110	StoreI X	Treat the low 12 bits of the value at address X in memory as another address, and copy the contents of the AC there.
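
The format and table above can be sketched as a small Python decoder. The names `MNEMONICS` and `decode` are mine, and op code 1111 is treated as undefined since the table does not list it.

```python
# Mnemonics indexed by op code, in the order of the table above.
MNEMONICS = ["JnS", "Load", "Store", "Add", "Subt", "Input", "Output", "Halt",
             "Skipcond", "Jump", "Clear", "AddI", "JumpI", "LoadI", "StoreI"]

def decode(word):
    """Split a 16-bit instruction into (mnemonic, 12-bit address field)."""
    opcode = (word >> 12) & 0xF     # bits 15-12
    address = word & 0x0FFF         # bits 11-0
    name = MNEMONICS[opcode] if opcode < len(MNEMONICS) else "undefined"
    return name, address

print(decode(0x1104))   # Load from address 0x104
print(decode(0x8400))   # Skipcond; bits 11-10 of the address field are 01
```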

Seems to me a load immediate would be a better use of op code A: Copy X to AC.
The I in the last four instructions stands for indirect.
1. They all specify some address indirectly, by specifying where to find the address, rather than giving the address itself. Better: Think of them as using X as a pointer, and following it.
2. The JnS and JumpI can be used for call and return (Example 4.4).
3. LoadI, StoreI and JumpI are useful with arrays. The X value is effectively a pointer (Examples 4.1 and 4.3).
4. JumpI can also be used with some control constructs, or to implement function pointers or virtual function calls.
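
To make the pointer view concrete, here is a minimal Python sketch comparing direct and indirect operand fetch, then walking an array through a pointer. The memory image (a pointer at 0x010, a three-element array at 0x020) and the helper names are invented for illustration.

```python
def load(mem, x):
    """Load X: the operand is the value stored at address X."""
    return mem[x]

def load_indirect(mem, x):
    """LoadI X: the low 12 bits of M[X] are the address of the operand."""
    pointer = mem[x] & 0x0FFF
    return mem[pointer]

# Hypothetical memory: 0x010 holds a pointer to an array starting at 0x020.
mem = {0x010: 0x020, 0x020: 7, 0x021: 8, 0x022: 9}

total = 0
for _ in range(3):
    total += load_indirect(mem, 0x010)   # follow the pointer, as AddI 0x010 would
    mem[0x010] += 1                      # advance the pointer to the next element
print(total)
```

The loop body never names an array address directly. Only the pointer cell changes, which is what makes the indirect instructions suitable for arrays.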

Marie Data Path.
1. Major components are connected via a 16-bit-wide bus.
2. During each clock cycle, the bus can transfer one 16-bit quantity from one specified device to another.
3. Some additional connections also exist. These can make additional transfers during the same cycle as a value is transferred on the bus.
4. Each instruction is implemented by a series of computations and transfers.

RTL Definitions.

The RTL definitions describe how the datapath is used to implement the instructions.
The X refers to the instruction's parameter, which is really IR[11-0].

Transfers which can be performed simultaneously are listed on a single line.

JnS X

MBR ← PC
MAR ← X
M[MAR] ← MBR
MBR ← X
AC ← 1
AC ← AC + MBR
PC ← AC

Load X

MAR ← X
MBR ← M[MAR]
AC ← MBR

Store X

MAR ← X, MBR ← AC
M[MAR] ← MBR

Add X

MAR ← X
MBR ← M[MAR]
AC ← AC + MBR

Subt X

MAR ← X
MBR ← M[MAR]
AC ← AC - MBR

Input

AC ← InREG

Output

OutREG ← AC

Skipcond

If IR[11-10] = 00 then
  If AC < 0 then PC ← PC + 1
else if IR[11-10] = 01 then
  If AC = 0 then PC ← PC + 1
else if IR[11-10] = 10 then
  If AC > 0 then PC ← PC + 1

Jump X

PC ← IR[11-0]

Clear

AC ← 0

AddI X

MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
AC ← AC + MBR

JumpI X

MAR ← X
MBR ← M[MAR]
PC ← MBR

LoadI X

MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← M[MAR]
AC ← MBR

StoreI X

MAR ← X
MBR ← M[MAR]
MAR ← MBR
MBR ← AC
M[MAR] ← MBR
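
As an illustration of reading RTL as a sequence of register transfers, the sketch below applies the JnS X steps one at a time in Python. The dictionary representation of the registers and the starting values are my assumptions; the masks reflect the 12-bit and 16-bit register widths, and the PC is assumed to have been incremented already during the fetch.

```python
def jns(regs, mem, x):
    """Apply the JnS X register transfers, in order, to regs and mem."""
    regs["MBR"] = regs["PC"]                           # MBR <- PC
    regs["MAR"] = x & 0x0FFF                           # MAR <- X
    mem[regs["MAR"]] = regs["MBR"]                     # M[MAR] <- MBR
    regs["MBR"] = x & 0x0FFF                           # MBR <- X
    regs["AC"] = 1                                     # AC <- 1
    regs["AC"] = (regs["AC"] + regs["MBR"]) & 0xFFFF   # AC <- AC + MBR
    regs["PC"] = regs["AC"] & 0x0FFF                   # PC <- AC (12 bits)

# Hypothetical state: the IR holds JnS 0x200, and the PC already points
# at the instruction following the JnS (address 0x005).
regs = {"AC": 0, "MAR": 0, "MBR": 0, "PC": 0x005, "IR": 0x0200}
mem = {}
jns(regs, mem, regs["IR"] & 0x0FFF)
print(hex(mem[0x200]), hex(regs["PC"]), hex(regs["AC"]))
```

Stepping through it shows a side effect of this RTL: JnS uses the AC to compute X + 1, so the previous AC contents are lost.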

Marie Logisim Data Path
1. Signal Definitions
2. The above circuit implements the Marie processor using Logisim. The datapath is on the first page.
3. Make sure the input marked “Controller Select” near the top right is set to zero. This allows manual control of the datapath through the various inputs shown on the right side of the circuit.
4. I have included a dedicated increment unit for the PC. There is no indication of it on any of the diagrams, but p. 258 mentions the assumption that the PC can increment itself. This is also needed to make sense of much of the RTL and other discussion of the control unit.
5. Note that Fig. 4.15 gives a detail of the connection of the MBR to the bus. Some differences and problems:
  1. Ignore the version in the 3rd edition. It's broken. Use the 4th edition version.
  2. Fig. 4.15 uses AND gates to generate select signals from the control signal groups P5P4P3 and P2P1P0. The left AND gate generates a store enable that causes the register to save the bus value on the next clock, and the one on the right generates a bus enable which places the register's value onto the bus for some other device to receive. My implementation uses two decoders, each of which generates eight selects at once, rather than using two sets of AND gates for each register.
  3. The Logisim program provides a built-in register object, which I have used rather than entering 16 individual D flip-flops. These simulated devices have a pin for store enable to which I connect the select line.
6. Fig. 4.9 shows three inputs to the AC (Bus, ALU and MBR). Since I'm using a multiplexer to choose which input to send the register, I have to round up to a power of two. So I added a source of constant one, thinking it might be useful. So far, it hasn't been.
7. The list in Sec. 4.8.1 on p. 231 says that the input and output registers are 8 bits. The input and output registers in the Marie simulator have 16 bits, though they only display the lower byte when set to ASCII mode. My circuit has 16-bit input and output registers.
8. I've attached a 4x4 LED matrix to the output register. Each LED corresponds to one bit in the 16-bit output value, and lights if the bit is a one. It's not particularly useful, but it can be amusing.
Fetch-execute cycle.
1. Generally.
  1. Fetch the instruction as indicated by the PC.
  2. Decode the instruction (figure out what to do next).
  3. Fetch any required operands from memory.
  4. Perform the instruction, perhaps saving results to memory.
2. Marie specific.
  1. Copy the PC to the MAR.
  2. Fetch the instruction at the address given in MAR, and place it in the IR, while also incrementing the PC.
  3. Decode the high four bits of the IR to adapt to the particular instruction, while copying the lower 12 bits into MAR.
  4. If the particular instruction needs the value located at M[MAR], fetch it.
  5. Perform the instruction, including any other required memory operations.
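
The Marie-specific steps above can be sketched as a loop in Python. This is a toy covering only a handful of op codes, not the Marie simulator; the function name `run`, the step limit, and the sample program are my own assumptions.

```python
def run(mem, pc=0, max_steps=1000):
    """Run a program held in `mem` (a list of 16-bit words); return AC at Halt."""
    ac = 0
    for _ in range(max_steps):
        mar = pc                               # 1. copy the PC to the MAR
        ir = mem[mar]                          # 2. fetch the instruction into the IR...
        pc = (pc + 1) & 0x0FFF                 #    ...while incrementing the PC
        opcode, mar = ir >> 12, ir & 0x0FFF    # 3. decode; low 12 bits go to the MAR
        if opcode == 0x1:                      # Load X (steps 4 and 5)
            ac = mem[mar]
        elif opcode == 0x2:                    # Store X
            mem[mar] = ac
        elif opcode == 0x3:                    # Add X
            ac = (ac + mem[mar]) & 0xFFFF
        elif opcode == 0x4:                    # Subt X
            ac = (ac - mem[mar]) & 0xFFFF
        elif opcode == 0x7:                    # Halt
            return ac
        elif opcode == 0x8:                    # Skipcond: condition in bits 11-10
            signed = ac - 0x10000 if ac & 0x8000 else ac
            cond = mar >> 10
            if ((cond == 0 and signed < 0) or (cond == 1 and signed == 0)
                    or (cond == 2 and signed > 0)):
                pc = (pc + 1) & 0x0FFF
        elif opcode == 0x9:                    # Jump X
            pc = mar
        else:
            raise NotImplementedError(f"op code {opcode:#x} not in this sketch")
    raise RuntimeError("step limit reached without Halt")

# Hypothetical program: Load 4, Add 5, Store 6, Halt, then three data words.
program = [0x1004, 0x3005, 0x2006, 0x7000, 0x0023, 0x0012, 0x0000]
print(run(program), program[6])
```

The PC is incremented during the fetch, before the instruction executes. That is why Skipcond only needs to add one more to skip, and why Jump can simply overwrite the PC.
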
The control unit.
1. Takes certain input from the datapath.
  1. Op code and skip code from the IR.
  2. AC negative and zero signals.
2. Generates control signals to move data.
3. Marie Control State Diagram
4. Approaches.
  1. Implement the state diagram using a ROM.
    1. High-Level Design
      1. The State register holds the current state.
      2. The Signal ROM is indexed by the state. That entry contains all the datapath signals for that state, and specifies the next state.
      3. If there can be only one next state, its number is part of the entry.
      4. If there can be multiple next states, look up the correct one in the Jump ROM.
    2. See the “Simple ROM Controller.”
      1. Contents of state ROM
      2. Contents of jump ROM
  2. Build from gates.
    1. Simple approach: Replace ROMs with gate circuits that produce similar signals.
    2. Text Approach.
      1. Instead of current state, keep a counter of the instruction cycle step.
      2. The decoder.
        
        Mainly decode the op code to one line for each code.
        Might decode other fields.
        Might just be combined with main control matrix.
  3. Microcode controller
    1. Another form of ROM controller.
    2. Instead of putting signals in the ROM, it contains codes which decode to the signals.
    3. The ROM entries are viewed as instructions forming a microprogram.
    4. Instead of a state register, there is a microinstruction counter.
    5. Each instruction either branches, or the counter is incremented.
    6. Microinstruction Format
    7. Microoperation Codes
    8. Partial Microprogram
      Note: The two above figures have been modified to have MBR ← PC for code 01000. This is needed for the JnS RTL.
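
The ROM approach can be sketched in a few lines of Python. The state numbers and ROM contents here are made up for illustration and cover only Load and Clear; they are not the contents of the Simple ROM Controller. Signals are written as RTL strings rather than actual control bits.

```python
# Signal ROM, indexed by state: (datapath signals, next state).
# A next state of None means several next states are possible,
# so the Jump ROM must be consulted.
SIGNAL_ROM = {
    0: ("MAR <- PC", 1),
    1: ("IR <- M[MAR], PC <- PC + 1", 2),
    2: ("MAR <- IR[11-0]", None),   # decode: the next state depends on the op code
    3: ("MBR <- M[MAR]", 4),        # Load, first step
    4: ("AC <- MBR", 0),            # Load, second step
    5: ("AC <- 0", 0),              # Clear
}

# Jump ROM, indexed by op code: first state of that instruction.
JUMP_ROM = {0x1: 3, 0xA: 5}

def trace(opcode, state=0):
    """Return the signals asserted while running one instruction cycle."""
    signals = []
    while True:
        signal, next_state = SIGNAL_ROM[state]
        signals.append(signal)
        state = JUMP_ROM[opcode] if next_state is None else next_state
        if state == 0:              # back at the top of the fetch
            return signals

for step in trace(0x1):
    print(step)
```

Adding an instruction means adding rows to the two tables, not changing the loop. A ROM controller has the same property: the sequencing hardware stays fixed and only the ROM contents change.
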
Traps and Interrupts.
1. Marie does not implement these, but any serious CPU does.
2. Interrupts.
  1. Interrupts are initiated by an I/O device, which sends a signal to the CPU. The signal indicates that the device has completed some operation, is available for use, or has data to be collected.
  2. At the top of the instruction cycle, before fetching from M[PC] check if an interrupt has been raised. If so, fetch the next instruction from a pre-defined address instead of PC. This is an asynchronous jump.
  3. The address may be a single fixed address, one of a list depending on the device, or stored in a table located at a standard place.
  4. The code located at this address is the interrupt service routine (the address itself is often called the interrupt vector). The routine is part of the operating system.
  5. The service routine must save the CPU state, service the device, then restore the state so that the interrupted program sees no change.
  6. The interrupt is blocked until the service routine acknowledges it.
  7. Most interrupts may also be temporarily blocked by software (generally the OS). They should not be blocked for a long time, as data from the device may be lost.
3. Traps
  1. Also called software interrupts or exceptions.
  2. Generated by an instruction, not an external device.
    1. Errors, such as division by zero.
    2. Deliberate traps to invoke the operating system.
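
Since Marie has no interrupts, the following Python sketch is purely hypothetical. It shows the check made at the top of the instruction cycle, using a per-device table of service-routine addresses. The device names, the addresses, and the function name are all invented.

```python
# Hypothetical table: device -> address of its service routine.
VECTOR_TABLE = {"keyboard": 0x0F0, "disk": 0x0F8}

def next_fetch_address(pc, pending_device, interrupts_enabled=True):
    """Decide where the next instruction comes from.

    Returns (fetch address, saved PC). The saved PC is None when no
    interrupt is taken; otherwise it is where the interrupted program
    must resume once the service routine finishes.
    """
    if pending_device is not None and interrupts_enabled:
        return VECTOR_TABLE[pending_device], pc   # asynchronous jump
    return pc, None                               # ordinary fetch from M[PC]

print(next_fetch_address(0x010, None))
print(next_fetch_address(0x010, "disk"))
print(next_fetch_address(0x010, "disk", interrupts_enabled=False))
```

The third call models software blocking: the interrupt stays pending and the program continues from its own PC until interrupts are enabled again.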

Questions

Chapter questions:

4th, 5th ed: 2, 4, 6, 10, 15, 20, 21, 22, 26, 29, 37, 43, 44, 47, 54

3rd ed: 2, 4, 6, 9, 13, 17, 18, 19, 23, 26, 33, 41, 47

(For 22/19, perhaps do only some of the instructions.)