Chapter 6: Memory Systems

Types of memory.
1. Random-Access Memory (RAM).
  1. Volatile.
  2. Static RAM (SRAM)
    1. Transistor memory, much like registers.
    2. Retain contents so long as power is applied.
  3. Dynamic RAM (DRAM).
    1. Collection of small capacitors.
    2. Loses contents after a few ms. The memory controller (on the motherboard or CPU) refreshes periodically to maintain contents.
    3. Slower than SRAM.
    4. Denser and cheaper than SRAM.
    5. Comprises most of a PC's main memory.
2. Read-Only Memory (ROM).
  1. Holds fixed contents and cannot be written (stored to) by the CPU.
  2. Also random-access. (Unfortunate naming.)
  3. Non-volatile (of course).
  4. Primarily needed for booting.
  5. Types
    1. Basic ROM chips are manufactured with their content, which is unchangeable.
    2. Programmable ROMs (PROMs) can be “burned” once with appropriate equipment.
    3. Erasable-Programmable ROMs (EPROMs) can also be erased and re-programmed.
    4. Electrically Erasable PROMs (EEPROMs) can be erased by the computer that contains them, under program control, though they don't respond to plain store operations.
    5. Flash is a variety of EEPROM.
Memory Hierarchy
1. Speed up a large, slow, cheap (per-byte) storage technology by keeping the most active contents in a small, fast, expensive memory called a cache.
  1. Check in the cache.
  2. If present, produce the answer.
  3. If absent, fetch from the larger store.
    1. Save the value in the cache, perhaps replacing an older entry there.
    2. Produce the answer.
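
The lookup procedure above can be sketched in a few lines. This is a minimal illustration, not a hardware model: `slow_store` is a hypothetical dict standing in for the larger store, `CAPACITY` is an arbitrary size, and evicting the oldest entry is just one possible replacement choice.

```python
from collections import OrderedDict

CAPACITY = 2                                 # assumed cache size, in entries
slow_store = {addr: addr * 10 for addr in range(8)}   # stand-in for the large store
cache = OrderedDict()                        # small, fast store (insertion-ordered)

def read(addr):
    """Return (value, hit) for addr, filling the cache on a miss."""
    if addr in cache:                        # 1. check in the cache
        return cache[addr], True             # 2. present: produce the answer
    value = slow_store[addr]                 # 3. absent: fetch from the larger store
    if len(cache) >= CAPACITY:
        cache.popitem(last=False)            # replace the oldest entry
    cache[addr] = value                      # save the value in the cache
    return value, False                      # produce the answer
```

Calling `read(3)` twice returns `(30, False)` and then `(30, True)`: the first reference misses and fills the cache, the second hits.
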
2. Primary hierarchy
3. Each level is a cache for the one on its right.
4. Cache performance terms.
  1. Hit: The information was found in the cache.
  2. Miss: It wasn't.
  3. Hit rate: The proportion of references which are hits.
  4. Miss rate: The proportion of references which are misses.
  5. Hit time: The time required to access information at a given level.
  6. Miss penalty: The time required to process a miss, including the overhead of adding it to the cache.
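
These terms combine into the usual estimate of average access time, assuming every reference pays the hit time and the fraction that miss also pay the miss penalty. A small sketch, with made-up numbers chosen only for illustration:

```python
def hit_rate(hits, references):
    """Proportion of references that were hits."""
    return hits / references

def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average time per reference: hit time plus the expected miss cost."""
    return hit_time + miss_rate * miss_penalty

# Illustrative (assumed) numbers: 1 ns hit time, 100 ns miss penalty.
refs, hits = 1000, 950
miss_rate = 1 - hit_rate(hits, refs)         # miss rate = 1 - hit rate
amat = average_access_time(1.0, miss_rate, 100.0)
```

With a 95% hit rate the average comes to about 6 ns, so even a 5% miss rate dominates the cost when the penalty is 100 times the hit time.
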
5. Locality of reference
  1. Temporal locality: Recently accessed locations tend to be accessed again soon.
  2. Spatial locality: After a reference, nearby locations are more likely to be next.
  3. Sequential locality: Spatial locality that results from sequential instruction execution.
Cache memory.
1. Made of SRAM. Holds entries from RAM.
2. Managed automatically by hardware.
3. PCs typically have at least two: one on the CPU chip and one on the motherboard.
4. Arrangements.
  1. Direct-Mapped Cache.
    1. Division:
      Tag
      Block (Line Number)
      Offset (Within Block)
    2. Item is located on the indicated cache line, at the indicated offset.
    3. The valid bit is set to indicate the line contents are valid.
    4. A new reference must use that line, and replace anything that's already there.
  2. Fully-Associative Cache.
    1. Division:
      Tag
      Offset (Within Block)
    2. Any cache row can take any location.
    3. Look-up by search, not an index number from the address.
    4. Search is done in parallel.
  3. Set-Associative Cache
    1. Division:
      Tag
      Block (Line Number)
      Offset (Within Block)
    2. A compromise.
    3. Divide the address in three parts and choose a line.
    4. Each line is a small associative cache.
    5. Gives more flexibility, so a few rows with the same line number may be retained at once.
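
The three address divisions above differ only in how many bits select a line. A minimal sketch of the split, where the function name and the example sizes (16-byte blocks, 64 lines) are assumptions made for illustration:

```python
def split_address(addr, offset_bits, index_bits):
    """Split addr into (tag, index, offset).

    index_bits = 0 gives the fully associative division (tag and offset only).
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

addr = 0x12345
direct = split_address(addr, 4, 6)     # 64 lines, one block per line
fully = split_address(addr, 4, 0)      # no line number; the tag is searched for
set_assoc = split_address(addr, 4, 4)  # same 64 rows arranged as 16 lines of 4
```

Here `direct` is `(72, 52, 5)`, `fully` is `(4660, 0, 5)`, and `set_assoc` is `(291, 4, 5)`. Fewer index bits leave more tag bits, which is the price of letting a block live in more than one place.
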
5. Replacement policies: What to evict?
  1. Least-Recently Used (not practical).
  2. First-In First-Out.
  3. Random.
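
The policies can be compared on a small reference string. This sketch models a fully associative cache as an ordered dictionary; the function name, the reference string, and the fixed random seed are assumptions for illustration.

```python
import random
from collections import OrderedDict

def count_hits(refs, capacity, policy, seed=0):
    """Count hits for a fully associative cache holding `capacity` blocks."""
    rng = random.Random(seed)
    cache = OrderedDict()                    # keys kept oldest-first
    hits = 0
    for block in refs:
        if block in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(block)     # only LRU reorders on a hit
            continue
        if len(cache) >= capacity:
            if policy == "random":
                del cache[rng.choice(list(cache))]
            else:                            # "lru" and "fifo" both evict the front
                cache.popitem(last=False)
        cache[block] = True
    return hits

refs = [1, 2, 3, 1, 4, 1, 2]
lru_hits = count_hits(refs, 3, "lru")
fifo_hits = count_hits(refs, 3, "fifo")
```

On this string LRU gets 2 hits and FIFO gets 1, because FIFO evicts block 1 even though it was just used. The bookkeeping on every hit is what makes exact LRU costly in hardware.
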
6. Can start the cache and memory access in parallel.
7. When does caching not work well?
  1. A program might not exhibit good locality.
  2. An array scan where the step size equals the row size can be very bad.
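
A small simulation shows the effect of step size. It assumes a direct-mapped cache and takes "row size" to mean the cache block size; both the reading and the sizes used (16-unit blocks, 64 lines) are assumptions made for this sketch.

```python
def scan_hit_rate(n_refs, stride, block_size=16, num_lines=64):
    """Hit rate of a direct-mapped cache for addresses 0, stride, 2*stride, ..."""
    lines = [None] * num_lines               # tag stored per line, None = invalid
    hits = 0
    for i in range(n_refs):
        block = (i * stride) // block_size   # block number of this address
        line, tag = block % num_lines, block // num_lines
        if lines[line] == tag:
            hits += 1
        else:
            lines[line] = tag                # miss: replace whatever was there
    return hits / n_refs
```

`scan_hit_rate(1024, 1)` gives 0.9375 (one miss per 16-unit block), while `scan_hit_rate(1024, 16)` gives 0.0: every reference lands in a new block, so spatial locality never pays off.
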
8. Write policy: What to do with written data.
  1. Write-through: Stores sent both to cache and memory.
  2. Write-back.
    1. Stores just go to the cache.
    2. Copied from cache to memory when the entry is replaced.
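
The difference between the two policies shows up in how many stores reach memory. A rough sketch, assuming a two-entry cache that evicts its oldest entry (the capacity, eviction choice, and function name are all illustrative):

```python
def memory_writes(stores, policy, capacity=2):
    """Count writes reaching memory for a list of (address, value) stores."""
    cache = {}                               # address -> (value, dirty)
    writes = 0
    for addr, value in stores:
        if addr not in cache and len(cache) >= capacity:
            victim = next(iter(cache))       # oldest-inserted entry (assumed policy)
            _, dirty = cache.pop(victim)
            if dirty:
                writes += 1                  # write-back: copy out on replacement
        if policy == "write-through":
            cache[addr] = (value, False)
            writes += 1                      # every store also goes to memory
        else:                                # "write-back"
            cache[addr] = (value, True)      # store stays in the cache for now
    return writes

stores = [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5)]
```

For these five stores, write-through sends 5 writes to memory and write-back sends 1 (when address 0 is replaced). The two entries still in the write-back cache have not reached memory yet.
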
9. Special caches.
  1. Separate instruction and data caches.
  2. Victim cache: a small associative cache that holds entries evicted by conflict.
  3. Trace cache: holds decoded instructions.
10. Cache levels.
  1. Usually multiple levels of cache.
  2. Level 1 is on the chip with the CPU.
  3. Level 2 is on the system motherboard.
  4. Some computers have a high-speed level 2, and a slower level 3.
  5. Inclusive: Entry replicated in higher-level caches.
  6. Exclusive: Entry in just one place.
Virtual Memory
1. Name refers to a system that lets a program use more memory than the system has physical RAM.
2. Implemented by keeping memory contents on disk, and moving things into and out of memory as needed.
3. Uses RAM as a cache for information stored on disk.
4. Page Mapping.
  1. Addresses used by the program are not the true addresses of the information in RAM.
    1. The program uses virtual addresses
    2. The data are located in RAM at real addresses
  2. Memory contents are divided up into fixed-size blocks called pages.
    Typical page size 4K.
  3. Addresses are broken into a page number and an in-page offset.
    
    Page Number
    Offset (Within Page)
  4. The page size is a power of two, and the number of offset bits is that power (4K = 2^12, so 12 offset bits).
    1. All addresses within the page have the same page number.
    2. Offsets are from 0 to page size minus one.
  5. Main memory is divided into blocks of the same size, called page frames.
  6. Each frame holds one page (or is perhaps empty).
  7. Each address in memory is divided in a similar way.
  8. Each reference is
    1. Divided into a page number and offset.
    2. The page number translated to the frame number where the page is located.
    3. The frame number and the offset are sent to the actual RAM.
  9. Programs must be rounded up to a multiple of the page size. This waste is called internal fragmentation.
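
The address arithmetic above reduces to shifts and masks. A minimal sketch for 4K pages; the function names are invented for illustration, and the frame number is simply passed in rather than looked up.

```python
PAGE_SIZE = 4096                             # 4K pages, as in the text
OFFSET_BITS = PAGE_SIZE.bit_length() - 1     # 12, since 4096 == 2**12

def split(vaddr):
    """Return (page number, offset) for a virtual address."""
    return vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)

def real_address(frame, offset):
    """Combine a frame number with the unchanged offset."""
    return (frame << OFFSET_BITS) | offset

def internal_fragmentation(program_bytes):
    """Bytes wasted by rounding a program up to whole pages."""
    return -program_bytes % PAGE_SIZE
```

`split(0x12345)` gives page 0x12 and offset 0x345; if that page sits in frame 7, the real address is 0x7345. A 10000-byte program needs three pages and wastes 2288 bytes.
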
5. The page map.
  1. The mapping from page number to frame number is stored in a page table.
  2. An entry is located by using the page number as an offset (index) into the table.
  3. A page table entry contains
    1. A valid bit indicating if the entry contains valid data.
    2. The frame number where the page is located.
    3. The referenced and modified bits (later).
    4. Permission bits specifying the operations allowed on the page.
  4. References are translated by looking up the PTE in the table, and using the frame number to build the real address.
  5. Translation Look-Aside Buffer (TLB) speeds up translation.
    1. A special cache of PTEs.
    2. Fully associative.
    3. Avoids needing an extra memory access (for the page table) for every memory access.
6. Demand paging.
  1. There are more pages than page frames. Extra pages are kept on disk.
  2. When such a page is needed,
    1. The program waits until the page can be copied in from disk.
    2. It is placed in a free frame, if any, or a page is removed from memory to make room.
  3. Translation is handled by hardware.
  4. If a page is absent from RAM, its PTE is not valid, and the translation fails.
  5. This failure causes a page fault, which is a trap that invokes the O/S.
  6. The O/S must:
    1. Suspend the running program.
    2. Choose a frame and read the page into it.
    3. Update the page table.
    4. Restart the program.
7. Full Procedure.
  1. Note that part of this is performed by hardware, and part by software.
  2. Hardware produces a page fault to make it the software's problem.
  3. Depending on the CPU, this will be after the failure of either the TLB lookup or the page table lookup.
8. Referenced and modified bits.
  1. When a PTE is used, the hardware sets its referenced bit.
  2. If the use is a store, it also sets the modified bit.
  3. When the O/S brings a page in from disk, it clears both.
  4. When a page is removed from RAM, the modified bit tells if it needs to be copied back to disk.
  5. The referenced bit is used by the O/S to help choose which page to replace.
    1. The O/S has some replacement policy
    2. Most involve periodically clearing the referenced bit so it tells if the page was referenced recently.
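
One common policy built on the referenced bit is the clock (second-chance) algorithm. The text does not name a specific policy, so this is offered only as an example; here the bit is cleared during the sweep rather than by a periodic timer.

```python
def choose_victim(referenced, hand):
    """Clock sweep: return (victim frame, new hand position).

    `referenced` holds one referenced bit per frame and is modified in place.
    """
    n = len(referenced)
    while True:
        if referenced[hand]:
            referenced[hand] = False         # used recently: clear the bit, spare it once
            hand = (hand + 1) % n
        else:
            return hand, (hand + 1) % n      # not referenced since last cleared: evict
```

With bits `[True, False, True]` and the hand at frame 0, the sweep clears frame 0's bit and picks frame 1. If every bit is set, the hand goes all the way around and evicts the frame it started on.
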
9. Virtual memory as caching.
  1. Fully associative (a page may go anywhere in memory).
  2. Write-back.
Segmentation.
1. A segment is a logical division of the program: a single function, library, object or data structure.
2. Segments differ in size.
3. Addresses are explicitly two-part: [ segment number, offset (within segment) ], denoting a location inside a specified segment.
4. A segment table contains a descriptor for each segment.
  1. Offset into the table to find the segment descriptor.
  2. Descriptor contains the base, limit (size), and permissions for the segment.
  3. Verify the offset < the segment size, or fault.
  4. Verify the permissions allow the operation, or fault.
  5. Add the base to the offset to form the real address.
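
The translation steps above map directly onto code. A minimal sketch, in which the `Descriptor` layout and the permission strings are assumed encodings, not those of any particular machine:

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    base: int
    limit: int                               # segment size in bytes
    perms: str                               # e.g. "r", "rw", "rx" (assumed encoding)

class SegmentFault(Exception):
    pass

def translate(segment_table, seg, offset, op):
    """Translate [seg, offset] to a real address, or raise SegmentFault."""
    d = segment_table[seg]                   # offset into the table by segment number
    if not 0 <= offset < d.limit:
        raise SegmentFault("offset outside segment")
    if op not in d.perms:
        raise SegmentFault("operation not permitted")
    return d.base + offset                   # add the base to the offset

table = [Descriptor(0x1000, 0x200, "rx"), Descriptor(0x8000, 0x100, "rw")]
```

`translate(table, 1, 0x10, "w")` gives 0x8010. An offset of 0x200 into segment 0 faults on the limit check, and a write to segment 0 faults on permissions.
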
5. Variable-sized segments are difficult to manage.
  1. Segments must be given contiguous slots in RAM.
  2. Segments are frequently added or removed.
  3. Over time, unusably-small chunks accumulate.
  4. This is called external fragmentation
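
External fragmentation is easy to reproduce with a free list. The sketch below uses first-fit allocation over a made-up set of holes; the allocator and the numbers are illustrative assumptions.

```python
def first_fit(holes, size):
    """Allocate `size` units from a list of (start, length) free holes.

    Returns the start address, or None if no single hole is big enough.
    """
    for i, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                holes.pop(i)                 # hole used up exactly
            else:
                holes[i] = (start + size, length - size)
            return start
    return None

holes = [(0, 30), (50, 20), (90, 25)]        # assumed free chunks between live segments
total_free = sum(length for _, length in holes)
```

There are 75 free units in total, yet `first_fit(holes, 40)` returns `None` because no single hole is contiguous enough to hold the segment.
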
6. Segmentation with paging.
  1. Eliminates external fragmentation in favor of internal.
  2. Classic: Instead of a base address, the segment descriptor contains the location of a page table. Treat the offset as a virtual address using this table.
  3. Pentium: Add the base to the offset as above to produce a virtual address, and process it against a global page table.
7. History.
  1. Used in GE-645 supporting the Multics O/S.
  2. Pentium supports a Multics-inspired segmenting system. No one uses it.
Pentium.