Introduction to Computer Organization

Cache

Storage options

Option 1: SRAM

~2ns access time. Super fast. But the decoders and the storage array take up a lot of area.

Really damn expensive, and with a high area requirement. Volatile: without constant power, its contents are lost.

Option 2: DRAM

Slower: ~60ns access time, roughly 30x slower than SRAM. The "hurry up and wait" design philosophy doesn't work here.

Costs 0.2% as much as SRAM per megabyte. Still really expensive. Volatile: without constant power, its contents are lost.

Option 3: Flash

Slower still: ~250ns access time.

Costs 10% as much as DRAM per megabyte. Non-volatile: keeps its contents across power loss and reboots.

Option 4: Disks

These are obnoxiously slow, because every read waits on a physical spinning platter to bring the bits you actually want under the head.

Access times are in the milliseconds, so a disk can service accesses at a rate in the ~300Hz range: maybe 300-400 accesses per second.

However, this is super super cheap, and non-volatile.

Option 5: Optical Disks (Blu-Ray)

Sounds crazy and ridiculous, but people really do this. Facebook has a system called "Cold Storage" that burns your super old photos to optical discs.

Memory Hierarchy

Goals

  • Fast: Ideally run at processor clock speed
    • 1 ns access
  • Cheap: Ideally free
    • Not more expensive than rest of system

Choose one.

Can we combine multiple memory technologies?

Structure

You can build multiple levels of SRAM caches, then you can switch to DRAM, and finally go down to flash or disk.

  • Use a small array of SRAM
    • Fast!
    • For the cache (hopefully covers most loads and stores)
  • Use a larger amount of DRAM
    • Cheaper than SRAM, faster than flash/disk
    • For the main memory
  • Use a lot of flash and/or disk
    • Non-volatile. Cheap. Big.

Don't try to buy \(2^{64}\) bytes of anything. Virtual memory makes it look like the entire address range is available. A few TB is enough for most machines today.
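To see why the hierarchy wins on cost, here's a back-of-the-envelope sketch using the relative prices quoted above (DRAM at 0.2% of SRAM per megabyte, flash at 10% of DRAM). The absolute SRAM price is just a normalization, and the capacities (1MB of cache, 16GB of DRAM, 2TB of flash) are illustrative, not from the notes:

```python
# Relative cost sketch: only the ratios matter, SRAM is normalized to 1.0/MB.
SRAM_PER_MB = 1.0
DRAM_PER_MB = 0.002 * SRAM_PER_MB    # DRAM ~0.2% of SRAM per MB
FLASH_PER_MB = 0.10 * DRAM_PER_MB    # flash ~10% of DRAM per MB

MB = 1
GB = 1024 * MB
TB = 1024 * GB

# Hypothetical machine: small SRAM cache, medium DRAM, big flash.
hierarchy = 1 * MB * SRAM_PER_MB + 16 * GB * DRAM_PER_MB + 2 * TB * FLASH_PER_MB
all_sram = (1 * MB + 16 * GB + 2 * TB) * SRAM_PER_MB

print(f"hierarchy cost: {hierarchy:,.0f}")
print(f"all-SRAM cost:  {all_sram:,.0f}")
print(f"savings factor: {all_sram / hierarchy:,.0f}x")
```

The hierarchy's cost is dominated by the cheap bottom tier, which is the whole point: you pay flash prices for almost all of your capacity.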

Definitions

The architectural view of memory is

  • Defined by ISA
  • What the programmer sees: just a big array

Don't tell the programmer about any of this!

Breaking up the memory system into different pieces (cache, main memory, and disk) is not architectural; it is part of the microarchitecture.

Function of the Cache

The cache will hold data that we think is most likely to be referenced.

  • Minimize the average memory access latency.
  • Maximize the number of references that are serviced by the cache.
  • How do we decide this?
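"Minimize the average latency" is usually quantified as average memory access time (AMAT): the hit time plus the miss rate times the miss penalty. A minimal sketch, plugging in the ballpark numbers from earlier (~2ns SRAM cache, ~60ns DRAM miss penalty; the hit rates here are illustrative):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and a miss_rate fraction also pays the miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

print(amat(2, 0.05, 60))   # 95% hit rate -> 5.0 ns average
print(amat(2, 0.50, 60))   # 50% hit rate -> 32.0 ns average
```

Note how sensitive the average is to the hit rate: with a 95% hit rate the memory system looks almost as fast as the SRAM, which is why "maximize the references serviced by the cache" is the central design goal.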

How to design a cache

Cache memory can hold a copy of data from any part of main memory

  • Has 2 parts:
    • The TAG (CAM) holds the memory address
    • The BLOCK (SRAM) holds the memory data
  • Compare reference address with the TAG

CAMs: Content Addressable Memories

Instead of thinking of memory as an array of data indexed by a memory address, think of memory as a set of data matching a query

  • Instead of an address, we send a key to the memory, asking whether the key exists and, if so, what value is associated with it
  • Memory answers: yes/no (hit/miss for caches) and gives the associated value (if there is one)

Operations:

  • Search: Decide whether value exists or not
    • If found, either return index or value
  • Write: Send data to CAM to remember
    • Where should it be stored if the CAM is full?
    • Replace the oldest data in the CAM (FIFO)
    • Replace the least recently searched data (LRU)
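A toy model of these two operations, assuming a dict-backed CAM with the "least recently searched" replacement policy from the list above (the class name and addresses are made up for illustration; a real CAM compares every tag against the key in parallel hardware):

```python
from collections import OrderedDict

class TinyCAM:
    """Toy content-addressable memory: look up by key (tag), not by index.
    Capacity-limited; evicts the least recently searched entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()   # key (TAG) -> value (BLOCK data)

    def search(self, key):
        """Return (hit, value); a hit marks the entry as recently searched."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True, self.entries[key]
        return False, None

    def write(self, key, value):
        """Store a key/value pair, evicting if the CAM is full."""
        if key in self.entries:
            self.entries.move_to_end(key)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict least recently searched
        self.entries[key] = value

cam = TinyCAM(capacity=2)
cam.write(0x1000, "block A")
cam.write(0x2000, "block B")
print(cam.search(0x1000))     # hit: (True, 'block A')
cam.write(0x3000, "block C")  # full, so 0x2000 is evicted
print(cam.search(0x2000))     # miss: (False, None)
```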

Is there an optimal replacement policy?

You want to kick out the line whose eviction results in the fewest total misses over the program's execution. A guy named Belady devised an algorithm (Belady's algorithm, also called OPT) that consults an oracle, the full future reference stream, and evicts the line that will be accessed furthest in the future. You can't build that in hardware, but it gives a lower bound on misses to compare real policies against.
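In simulation the "oracle" is simply the rest of the recorded trace, so Belady's policy is easy to sketch (the trace and capacity below are illustrative):

```python
def belady_evict(cache, future_refs):
    """Pick the cached line whose next use is furthest in the future
    (a line never used again is the perfect victim)."""
    def next_use(line):
        try:
            return future_refs.index(line)
        except ValueError:
            return float("inf")   # never referenced again
    return max(cache, key=next_use)

def simulate_opt(trace, capacity):
    """Count misses for a fully associative cache under Belady's policy."""
    cache, misses = set(), 0
    for i, addr in enumerate(trace):
        if addr not in cache:
            misses += 1
            if len(cache) >= capacity:
                cache.remove(belady_evict(cache, trace[i + 1:]))
            cache.add(addr)
    return misses

trace = ["A", "B", "C", "A", "B", "D", "A", "B"]
print(simulate_opt(trace, capacity=2))   # 6 misses (4 are compulsory)
print(simulate_opt(trace, capacity=4))   # 4 misses: everything fits
```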

Picking the most likely addresses

If programs were random, any address would be just as likely to be reused as any other address. HOWEVER, programs are not random. They tend to use the same memory locations over and over (temporal locality). We can use this to pick the most referenced locations to put into the cache.
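A tiny illustration of this non-randomness: the (hypothetical) address stream of a loop that sums an array touches the same few loop-variable addresses on every iteration, so a cache holding just those addresses would service most references:

```python
from collections import Counter

def address_trace(n):
    """Made-up address stream for: for i in range(n): total += a[i]
    Addresses for total, i, and the array base are illustrative."""
    SUM, I, A_BASE = 0x100, 0x104, 0x200
    trace = []
    for i in range(n):
        # Per iteration: read i, read a[i], read sum, write sum, write i.
        trace += [I, A_BASE + 4 * i, SUM, SUM, I]
    return trace

counts = Counter(address_trace(100))
print(counts[0x100], counts[0x104])   # sum and i: touched 200 times each
print(counts[0x200])                  # each array element: touched once
```

Out of 500 references, 400 go to just two addresses; that heavy reuse is exactly what a small cache exploits.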