Chapter 7: I/O Systems

We will not cover the appendix, 7A. Depending on time, we may abbreviate or skip part or all of 7.6–7.9.

Speedup of a component.
1. Text refers to this as Amdahl's Law, which I've only seen stated in a more limited form.
2. If a system is modified to run faster (or slower), the speedup is just the ratio of the old running time to the new: $S = T_o / T_n$. So if the new version takes half as long to complete, the speedup is 2. The faster the new version is (the less time it takes), the greater the speedup.
3. We assume that the system speedup results from speeding up some component of interest (COI), whose running time changes from $t_o$ to $t_n$. The speedup of that component is given as $k = t_o / t_n$.
4. Since the faster system is obtained by replacing the old version of the COI with the new one, $T_n = T_o - t_o + t_n$.
5. The portion of the total (original) time used by the COI is given by $f = t_o / T_o$.
6. Put on your algebra hat:
  $$S = \frac{T_o}{T_n} = \frac{T_o}{T_o - t_o + t_n} = \frac{1}{1 - t_o/T_o + t_n/T_o} = \frac{1}{1 - f + (t_o/T_o)(t_n/t_o)} = \frac{1}{(1 - f) + f/k}$$
7. Example: if $f = 0.4$ and $k = 2$, then $S = 1/((1 - 0.4) + 0.4/2) = 1/0.8 = 1.25$.
8. If $f$ is small, a large $k$ doesn't help much: even as $k$ grows without bound, $S$ can never exceed $1/(1 - f)$.
9. Speeding up the CPU doesn't help much if your system spends most of its time doing I/O, and conversely.
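A minimal Python sketch of the formula above, assuming $f$ is a fraction between 0 and 1 and $k$ is positive. The function names `speedup` and `speedup_limit` are illustrative only, not from the text.

```python
def speedup(f: float, k: float) -> float:
    """Overall speedup S = 1 / ((1 - f) + f / k).

    f: fraction of the original running time used by the component of interest.
    k: speedup of that component alone (t_o / t_n).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / ((1.0 - f) + f / k)


def speedup_limit(f: float) -> float:
    """Bound on S as k grows without bound: 1 / (1 - f)."""
    if f >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - f)


# A component using 10% of the run time, made 100 times faster,
# gives only about 1.11 overall, and no k can push S past 1 / 0.9.
print(speedup(0.1, 100), speedup_limit(0.1))
```

The limit function makes item 8 concrete: once $f$ is fixed, the remaining $(1 - f)$ of the time caps the overall gain no matter how fast the component becomes.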
Bus Arrangements.
1. Fig. 7.1.
Addressing devices.
1. Devices have registers which can be fetched and stored.
2. Some registers may contain data being stored, fetched, or displayed.
3. Other registers contain device status. Often, individual bits mean different things: a printer status register may have bits for on-line, paper present, busy printing, etc.
4. Some stores or fetches have side effects. For instance, storing a character in the printer's data register may cause it to be printed.
5. Memory-Mapped I/O.
  1. Some memory addresses are re-directed to device registers.
  2. The CPU can communicate with devices using ordinary store and fetch instructions.
6. Port I/O.
  1. Device registers are assigned port numbers, an address space separate from memory.
  2. The CPU communicates with devices using in and out instructions, which work like fetch and store.
    (If you recall, Marie has input and output instructions, but only one device register.)
7. The book confuses these terms with control methods.
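The two addressing schemes can be contrasted with a toy Python model. This is a sketch under stated assumptions: the printer, its status-bit layout, the addresses `0xFF00`/`0xFF01` and the port numbers `0x10`/`0x11` are all hypothetical, chosen only to show status bits, a store with a side effect, and the difference between redirected memory addresses and a separate port space.

```python
# Hypothetical printer status bits; real devices define their own layouts.
ONLINE = 0x01  # bit 0: printer is on-line
PAPER = 0x02   # bit 1: paper present
BUSY = 0x04    # bit 2: busy printing


class Printer:
    """Toy printer with a status register and a data register."""

    def __init__(self) -> None:
        self.status = ONLINE | PAPER
        self.printed = []

    def write_data(self, value: int) -> None:
        # Side effect: storing to the data register prints the character.
        self.printed.append(chr(value))


class MemoryMappedBus:
    """Memory-mapped I/O: a few memory addresses are redirected to device registers."""

    STATUS_ADDR = 0xFF00  # hypothetical addresses
    DATA_ADDR = 0xFF01

    def __init__(self, printer: Printer) -> None:
        self.ram = bytearray(0x10000)  # 64 KB of ordinary memory
        self.printer = printer

    def fetch(self, addr: int) -> int:
        if addr == self.STATUS_ADDR:
            return self.printer.status
        return self.ram[addr]

    def store(self, addr: int, value: int) -> None:
        if addr == self.DATA_ADDR:
            self.printer.write_data(value)
        else:
            self.ram[addr] = value


class PortBus:
    """Port I/O: device registers live in a separate space of port numbers."""

    STATUS_PORT = 0x10  # hypothetical port numbers
    DATA_PORT = 0x11

    def __init__(self, printer: Printer) -> None:
        self.printer = printer

    def in_(self, port: int) -> int:  # trailing underscore: "in" is a Python keyword
        if port == self.STATUS_PORT:
            return self.printer.status
        raise ValueError(f"no readable register at port {port:#x}")

    def out(self, port: int, value: int) -> None:
        if port == self.DATA_PORT:
            self.printer.write_data(value)
        else:
            raise ValueError(f"no writable register at port {port:#x}")
```

With `MemoryMappedBus`, checking `fetch(STATUS_ADDR) & PAPER` and then storing to `DATA_ADDR` looks exactly like ordinary memory traffic; with `PortBus`, the same device is reached only through `in_` and `out`, and port numbers never collide with memory addresses.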
Controlling devices.
1. What is needed?
  1. Devices must be given data to display or record.
  2. Data from the device must be moved to memory or to the CPU for use.
  3. Lots of delay and waiting.
    1. Keystrokes or networking packets arrive without warning.
    2. There is a long delay between requesting a disk sector and the availability of the data.
    3. After a device accepts some data, there may be a long processing delay before it may accept more, or take some other action.
    4. Some operations, disks particularly, may involve a series of steps with waits between.
2. Programmed I/O.
  1. The CPU directly commands all steps.
  2. It must frequently poll devices so it knows what to do.
    1. Periodically ask the keyboard if there is a character in its register.
    2. After asking for a disk sector, repeatedly ask the drive if the data is available.
  3. Very inefficient: CPU wastes a lot of time with “are we there yet?”
3. Interrupt-Driven I/O.
  1. Devices notify the CPU of changes in status.
  2. The device sends an interrupt signal to the CPU, and the interrupt service routine then manipulates the device.
  3. The CPU can work on other things while the I/O works.
    1. For instance, ask the disk for a sector and to interrupt when available.
    2. Work on something else.
    3. When the interrupt occurs, retrieve the contents of the sector.
  4. Typically, the O/S handles the interrupts and lets a user program run while waiting on devices.
4. Direct Memory Access I/O.
  1. The DMA controller is a separate processor.
  2. The CPU gives it a block to transfer. It does so, and interrupts the CPU when done.
  3. From the CPU's viewpoint, it groups several interrupting operations into one.
  4. I've never seen anyone else put the registers describing the block in the CPU rather than the DMA controller itself.
  5. Shared data buses.
    1. The device that directs the operation of a bus is the bus master.
    2. In the simplest case, only one device attached to a bus is master, but here the CPU and the DMA controller must share.
    3. If the DMA controller is mastering the bus, the CPU must wait. The CPU's memory accesses can usually wait, but I/O device timing is often critical, so the DMA controller gets priority. This is called cycle stealing.
  6. Fig. 7.6.
5. Channel I/O.
  1. More advanced DMA controller.
  2. A channel runs a “channel program”, which allows it to perform more general operations than the DMA transfer-a-block.
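Programmed I/O and interrupt-driven I/O can be compared in a small single-threaded Python simulation. It is only a sketch: the `Device` class, its `delay` in ticks, and the string `"sector contents"` are hypothetical, and each loop iteration simply stands for one unit of time passing. The point is what the CPU does with those units: wasted polls versus useful work.

```python
class Device:
    """Hypothetical device that finishes an operation `delay` ticks after it is started."""

    def __init__(self, delay: int) -> None:
        if delay < 1:
            raise ValueError("delay must be at least one tick")
        self.delay = delay
        self.remaining = 0   # ticks left in the current operation (0 = idle or done)
        self.handler = None  # interrupt service routine, if one is registered
        self.data = None

    def start(self, handler=None) -> None:
        self.remaining = self.delay
        self.handler = handler
        self.data = None

    def ready(self) -> bool:
        # Status the CPU can poll.
        return self.remaining == 0 and self.data is not None

    def tick(self) -> None:
        """Advance one time step; 'interrupt' the CPU when the operation completes."""
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.data = "sector contents"
                if self.handler is not None:
                    self.handler(self)


def programmed_io(dev: Device):
    """Programmed I/O: the CPU polls until the device is ready.
    Returns (data, number of polls that found the device not ready)."""
    dev.start()
    wasted_polls = 0
    while not dev.ready():   # "are we there yet?"
        wasted_polls += 1
        dev.tick()           # time passes while the CPU spins
    return dev.data, wasted_polls


def interrupt_driven_io(dev: Device):
    """Interrupt-driven I/O: the CPU starts the device, then does other work
    until the service routine runs. Returns (data, units of other work done)."""
    received = []

    def isr(d: Device) -> None:
        received.append(d.data)  # the service routine retrieves the data

    dev.start(handler=isr)
    other_work = 0
    while not received:
        other_work += 1          # useful work instead of polling
        dev.tick()
    return received[0], other_work
```

Both versions take the same number of ticks, since the device is no faster; the difference is that every tick spent in `programmed_io` is a wasted poll, while in `interrupt_driven_io` the CPU is free until the handler fires.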
Block devices.
1. Usually disks or other storage devices.
2. Transfer a fixed-size block at a time.
  1. Typically 512 bytes for a hard drive.
  2. 2K for a CD-ROM.
3. Usually can seek a particular block before transfer.
4. Tapes are block devices that cannot seek.
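Because a block device moves whole blocks, a request for an arbitrary byte range must be mapped onto block numbers. A minimal Python sketch, assuming the block sizes given above (512 bytes for a hard drive, 2048 for a CD-ROM); the function names are illustrative only.

```python
HARD_DRIVE_BLOCK = 512  # typical sizes from the notes
CD_ROM_BLOCK = 2048


def locate(byte_offset, block_size=HARD_DRIVE_BLOCK):
    """Return (block number, offset within that block) for a byte offset."""
    return divmod(byte_offset, block_size)


def blocks_spanned(byte_offset, length, block_size=HARD_DRIVE_BLOCK):
    """Block numbers a device must transfer to supply `length` bytes
    starting at `byte_offset`; whole blocks move even for a small request."""
    if length <= 0:
        return range(0)
    first = byte_offset // block_size
    last = (byte_offset + length - 1) // block_size
    return range(first, last + 1)


# Reading 100 bytes at offset 1000 needs blocks 1 and 2: 1024 bytes transferred.
print(locate(1000), list(blocks_spanned(1000, 100)))
```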
Character devices.
1. Transfer a byte at a time. Keyboard, mouse.
2. Transfer variable blocks. Network card.
3. No seeking.
Buses.
1. Bus structure
  1. Each horizontal line in the bus diagram represents several signal lines.
  2. Address and data lines may be shared.
2. Synchronous buses.
  1. One of the bus lines is a shared central clock which synchronizes all operations.
  2. Simplifies operation.
  3. All connected devices must use the same clock (and be of similar speed).
  4. Limits either speed or length: on a long bus, clock skew and propagation delay force a slower clock.
3. Asynchronous buses.
  1. Connected devices have independent clocks.
  2. Operations must be synchronized using a handshake protocol, much like a network connection.
  3. Works better to connect devices of various speeds.
4. Timing diagrams.
  1. Timing diagrams show the handshake interaction. Often, one change must wait for another.
  2. Synchronous bus timing diagram
  3. Asynchronous bus timing diagram (printer).
  4. Other examples in here.
  5. Single signals are single lines.
  6. Bars are groups of signals which are not all the same.
  7. Transitions are drawn diagonally to indicate that changes are not instantaneous: signals take time to settle, especially on long lines (“settle time”).
  8. Shaded bars indicate that the data need not be valid. No device should use the contents of the lines during this period.
  9. Synchronous diagrams show the clock, and operation is divided into specific periods. Signals change on clock ticks.
  10. Asynchronous diagrams have no clock. Signals change in response to signals from the other end.
5. Bus Circuit
  1. Based on Fig. 7.11.
  2. The book describes this as a disk bus, but I'm storing things in a RAM, since there's no disk device in Logisim.
  3. The RAM controller has some pseudo-random delays, to make the behavior slightly more disk-like.
  4. The other controller spends an extra cycle on fetches for no good reason. I'll try to fix this eventually.
6. Serial v. Parallel.
  1. Parallel buses have multiple data lines; serial has just one.
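  2. In principle, parallel should be faster, since it moves several bits at a time.
  3. The trend is toward serial: a single line avoids skew and crosstalk between parallel lines and can be clocked much faster, so serial is often faster in practice.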
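An asynchronous handshake can be sketched in Python with two generators, one per end of the bus, each stepped on its own independent clock and communicating only through shared lines. This assumes a four-phase (return-to-zero) REQ/ACK handshake, one common scheme; the signal names, the `BusLines` class, and the two clock periods are assumptions for illustration, and the book's printer diagram may use different signals.

```python
class BusLines:
    """Signals shared by both ends of the bus; no shared clock."""

    def __init__(self) -> None:
        self.req = False   # driven by the sender
        self.ack = False   # driven by the receiver
        self.data = None   # data lines, driven by the sender


def sender(lines, words, trace):
    for word in words:
        lines.data = word          # put data on the lines first
        lines.req = True           # then announce it
        trace.append(("REQ", 1))
        while not lines.ack:       # wait until the receiver has latched it
            yield
        lines.req = False
        trace.append(("REQ", 0))
        while lines.ack:           # wait for the receiver to finish the cycle
            yield


def receiver(lines, count, received, trace):
    for _ in range(count):
        while not lines.req:       # wait for data to be offered
            yield
        received.append(lines.data)
        lines.ack = True
        trace.append(("ACK", 1))
        while lines.req:           # wait for the sender to withdraw REQ
            yield
        lines.ack = False
        trace.append(("ACK", 0))


def run_handshake(words, sender_period=1, receiver_period=3):
    """Step each side on its own clock; return (received words, signal trace)."""
    lines = BusLines()
    received, trace = [], []
    sides = [
        (sender(lines, words, trace), sender_period),
        (receiver(lines, len(words), received, trace), receiver_period),
    ]
    tick = 0
    while sides:
        tick += 1
        for side in list(sides):
            gen, period = side
            if tick % period == 0:
                try:
                    next(gen)
                except StopIteration:
                    sides.remove(side)
    return received, trace
```

Whichever side is faster, the trace is the same repeating sequence REQ↑, ACK↑, REQ↓, ACK↓: each change waits for the other end, which is what lets devices of different speeds share an asynchronous bus.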
Hard drives.
1. Organization
2. Sectors.
3. Disks rotate constantly, traditionally at 3600 RPM; now 7200 RPM is common. It is also common to spin down automatically if not used for a while.
4. Organization
  1. Data is stored in blocks called sectors, typically 512 bytes.
  2. Newer disks may have more sectors on outer tracks.
  3. Sectors are arranged in circular tracks.
  4. The head can move to any cylinder.
  5. Tracks located over each other form a cylinder. All the data in a cylinder can be read without moving the head.
  6. Cylinders are numbered consecutively from the outside.
  7. Sectors may not be consecutive around the track; they may be interleaved.
5. Times
  1. The seek time is how long it takes to move the head to the desired track (cylinder).
  2. The rotational delay is the time waiting for the sector to reach the head.
  3. The latency is the time to transfer the sector once it reaches the head. (Many sources instead use “rotational latency” for the rotational delay.)
  4. Access time equals seek time + rotational delay.
  5. Transfer time equals access time + latency.
  6. Seek time is by far the largest. File systems usually aim to minimize it.
6. Errors.
  1. The controller may lose track of the head's position, in which case it moves the head to cylinder 0 and starts over.
  2. Each sector contains a header with its own location, plus an error-correcting code (ECC). Errors can be corrected, or the sector re-read.
  3. Disks are manufactured with spare sectors which the controller can substitute for ones which start to fail.
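The time definitions above can be turned into arithmetic. A minimal Python sketch, assuming the average rotational delay is half a revolution and that the track holds a fixed number of sectors (zoned drives vary this by track); the 9 ms seek and 500 sectors per track in the example are hypothetical numbers, not from the text.

```python
MS_PER_MINUTE = 60_000


def rotation_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return MS_PER_MINUTE / rpm


def avg_rotational_delay_ms(rpm):
    # On average the desired sector is half a revolution away.
    return rotation_ms(rpm) / 2


def sector_read_ms(rpm, sectors_per_track):
    # Time for one sector to pass under the head (the "latency" in these notes).
    return rotation_ms(rpm) / sectors_per_track


def access_time_ms(seek_ms, rpm):
    return seek_ms + avg_rotational_delay_ms(rpm)


def transfer_time_ms(seek_ms, rpm, sectors_per_track):
    return access_time_ms(seek_ms, rpm) + sector_read_ms(rpm, sectors_per_track)


# Hypothetical drive: 9 ms average seek, 7200 RPM, 500 sectors per track.
# One revolution takes about 8.33 ms, so the average rotational delay is about
# 4.17 ms, and reading one sector adds under 0.02 ms.
print(transfer_time_ms(9.0, 7200, 500))
```

The numbers show why file systems work to minimize head movement: seek and rotational delay dominate, while actually reading a sector is nearly free.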
CD-ROM
1. Designed for music; data an afterthought.
2. Bits = Pits & Lands.
3. Spiral instead of concentric tracks.
4. Constant Linear Velocity (CLV)
  1. The track moves past the head at a fixed linear rate.
  2. So the rotational speed must change with the track.
5. Fig. 7.18: Track format (mode 1 for CD-ROM).
6. Data is stored in 2352-byte sectors, each containing 2048 bytes of user data in mode 1.
7. Music is played at 75 sectors per second. Computer CD-ROMs operate at some multiple; 52x is typical.
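The sector sizes and the 75 sectors/second base rate give the data rates directly, and constant linear velocity determines the spindle speed at each radius. A minimal Python sketch; the linear velocity of about 1.3 m/s and the data-area radii of roughly 25 mm to 58 mm are approximate values assumed for illustration, and the function names are hypothetical.

```python
import math

SECTORS_PER_SECOND_1X = 75   # audio playback rate
RAW_SECTOR_BYTES = 2352
MODE1_DATA_BYTES = 2048


def cd_data_rate(speed_multiple, bytes_per_sector=MODE1_DATA_BYTES):
    """User-data bytes per second at the given speed multiple (e.g. 52 for 52x)."""
    return speed_multiple * SECTORS_PER_SECOND_1X * bytes_per_sector


def clv_rpm(linear_velocity_m_s, radius_m):
    """Spindle speed needed for constant linear velocity at a given radius."""
    revolutions_per_second = linear_velocity_m_s / (2 * math.pi * radius_m)
    return revolutions_per_second * 60


# At 1x, mode 1 delivers 75 * 2048 = 153,600 bytes/s; 52x is 52 times that.
# The raw 1x rate, 75 * 2352 = 176,400 bytes/s, matches CD audio:
# 44,100 samples/s * 2 channels * 2 bytes per sample.
print(cd_data_rate(1), cd_data_rate(52))
# Assumed values: about 1.3 m/s at 1x; data area from roughly 25 mm to 58 mm radius.
print(round(clv_rpm(1.3, 0.025)), round(clv_rpm(1.3, 0.058)))
```

The second line of output shows the CLV trade-off: the disc must spin more than twice as fast when reading near the center as near the edge.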
DVD.
1. A shorter-wavelength (higher-frequency) laser allows smaller pits (650 nm instead of 780 nm).
2. Can have two layers, selected by focusing the laser (Fig. 7.20).
3. Better error correction.
Blu-ray/HD DVD: 405 nm laser.
Writable CDs use a dye layer which changes color under the heat of a laser.
RAID
1. Multiple disks to improve reliability, performance, or both.
2. Level 0
  1. Simple interleaving: assign blocks in order across the drives.
  2. Speeds up operations due to parallel operation.
  3. Actually reduces reliability: the failure of any one drive loses data.
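  4. Strips and stripes: a strip is the chunk of data placed on one drive; a stripe is the set of strips at the same position across all the drives.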
3. Level 1
  1. Duplicate each stripe.
  2. Same speed gains as level 0.
  3. May be other gains on read if head position is considered.
  4. Doubles the amount of storage needed.
4. Level 2
  1. Each strip is one bit.
  2. Hamming codes are error-correcting. The number of check drives is logarithmic in the number of data drives.
  3. Any drive that fails can be reconstructed.
  4. Not often used.
5. Level 3
  1. Similar to level 2, but a single parity bit is kept.
  2. $p_k = d_{k,0} \oplus d_{k,1} \oplus d_{k,2} \oplus \dots \oplus d_{k,n}$
  3. Exclusive or has the nice property: $x \oplus y \oplus x = y$
  4. Any single failed disk can be reconstructed.
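  5. Only one extra disk is needed, but if you are too cheap and put many data disks behind a single parity disk, the chance of a second failure before the first is repaired grows, and the reliability improvement shrinks.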
6. Level 4
  1. Same parity idea, but the strip is back to a reasonable size (a block rather than a bit).
  2. Single parity disk creates a bottleneck on write.
7. Level 5
  1. Changes level 4 to distribute the parity information.
  2. Read efficiency is similar to level 0.
  3. Updating parity may be expensive, but:
    1. The whole stripe can be read in parallel,
    2. followed by a write of the updated parity.
    3. This need not delay the caller, since it can happen after the write, assuming the system as a whole has sufficient capacity.
  4. This level is popular in practice.
8. Level 6
  1. Two check disks, each computed using a different scheme.
  2. Can survive two disk failures within the repair time.
  3. Write performance is even worse.
9. Other multi-check schemes.
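The parity used by levels 3 through 5 is easy to sketch in Python on byte strings standing in for strips. This is an illustration, not any particular controller's implementation; the function names are hypothetical. `small_write_parity` is an alternative update not described in the notes: instead of reading the whole stripe, it reads only the old data strip and old parity and relies on $x \oplus y \oplus x = y$.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-by-byte exclusive or of two equal-length strips."""
    if len(a) != len(b):
        raise ValueError("strips must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def parity(strips):
    """Parity strip: p = d0 xor d1 xor ... xor dn."""
    return reduce(xor_bytes, strips)


def reconstruct(surviving, parity_strip):
    """Rebuild a single failed data strip from the survivors and the parity,
    using x xor y xor x = y."""
    return reduce(xor_bytes, surviving, parity_strip)


def small_write_parity(old_parity, old_data, new_data):
    """Alternative parity update for one changed strip (not described in the notes):
    p_new = p_old xor d_old xor d_new, so only the changed strip and the parity are read."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# Three data strips of two bytes each; the strip on disk 1 "fails" and is rebuilt.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(data)
print(reconstruct([data[0], data[2]], p) == data[1])
```

Reconstruction works for any single missing strip, which is why one parity disk suffices to survive one failure; surviving two failures needs a second, independent check, as in level 6.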
Supplemental sources.
1. Computer Organization and Design by David A. Patterson and John L. Hennessy, 4th ed., pub. by Morgan Kaufmann.
2. A nice discussion of buses by Edward Bosworth.