In a fully associative cache, a memory block can be placed in any cache line. In a direct-mapped cache, each memory block maps to exactly one cache line; you can think of memory as partitioned into regions, where all blocks that map to the same line share a "color."
In a direct-mapped cache, the address is broken into three components: the tag, the line index, and the block offset.
The tag store has a decoder which takes the line index and selects a single stored tag. That stored tag is compared against the tag bits of the address to determine hit or miss. In parallel, the same index selects the cache line we're looking for in the data store, and a MUX driven by the block offset picks out which bytes of the data block you want. The lookup returns a hit or miss, along with the data you are looking for.
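To make the lookup concrete, here is a minimal sketch of a direct-mapped lookup in C. The parameters (64 lines of 64-byte blocks, 32-bit addresses) and all names are hypothetical, chosen only for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical direct-mapped cache: 64 lines x 64-byte blocks (4 KB total). */
#define NUM_LINES   64
#define BLOCK_SIZE  64
#define OFFSET_BITS 6   /* log2(BLOCK_SIZE) */
#define INDEX_BITS  6   /* log2(NUM_LINES)  */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} cache_line_t;

static cache_line_t cache[NUM_LINES];

/* Look up one byte. Returns true on a hit and writes the byte to *out. */
bool dm_lookup(uint32_t addr, uint8_t *out)
{
    uint32_t offset = addr & (BLOCK_SIZE - 1);                  /* block offset   */
    uint32_t index  = (addr >> OFFSET_BITS) & (NUM_LINES - 1);  /* line index     */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);       /* remaining bits */

    cache_line_t *line = &cache[index];        /* decoder: index -> one line      */
    if (line->valid && line->tag == tag) {     /* tag comparison -> hit or miss   */
        *out = line->data[offset];             /* MUX: offset selects the byte    */
        return true;
    }
    return false;                              /* miss: go to the next level      */
}
```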
Set-associative caches have the advantages of both direct-mapped caches and fully associative caches. They partition memory into regions (fewer than in a direct-mapped cache) and associate each region with a set of cache lines. To determine a hit, the tags of all lines in the set are checked.
Another way to think about this: each set is treated like a small, fully associative cache. The tag search within a set can be done in parallel, and this still allows for an LRU replacement policy within the set.
Within each cache set there are multiple ways; each way holds one block, so an N-way set can hold N different blocks that map to the same index.
With each block in each way, you store the tag of the address it came from, so the cache can tell which of the candidate memory blocks is actually present.
First, you use the set index to select which set you want. Rather than looking at one tag, you look at two tags, one in each way, and compare each against the tag bits of the address to find which one (if either) matches. The same set index reads the data blocks out of both ways of the data store; a MUX driven by the tag comparison chooses the matching way, and a second MUX driven by the block offset selects the bytes you want.
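Here is a rough sketch of the same idea for a 2-way set-associative lookup, again with made-up parameters (32 sets, 2 ways, 64-byte blocks). Hardware compares the way tags in parallel; the sketch just loops over them:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical 2-way set-associative cache: 32 sets x 2 ways x 64-byte blocks. */
#define NUM_SETS    32
#define NUM_WAYS    2
#define BLOCK_SIZE  64
#define OFFSET_BITS 6   /* log2(BLOCK_SIZE) */
#define INDEX_BITS  5   /* log2(NUM_SETS)   */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
} way_t;

static way_t cache[NUM_SETS][NUM_WAYS];

/* Look up one byte: check the tags of both ways in the selected set. */
bool sa_lookup(uint32_t addr, uint8_t *out)
{
    uint32_t offset = addr & (BLOCK_SIZE - 1);
    uint32_t set    = (addr >> OFFSET_BITS) & (NUM_SETS - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);

    for (int w = 0; w < NUM_WAYS; w++) {         /* in hardware: parallel compares */
        way_t *line = &cache[set][w];
        if (line->valid && line->tag == tag) {   /* matching way found -> hit      */
            *out = line->data[offset];           /* offset MUX selects the byte    */
            return true;
        }
    }
    return false;                                /* no way matched -> miss         */
}
```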
So now that we have this wonderful sparkling unicorn of a cache system, how could it ever go wrong? There are three ways.
Capacity misses (the working set doesn't fit): reduce these by increasing the cache size.
Conflict misses (too many blocks compete for the same set): reduce these by increasing the associativity.
Compulsory (cold) misses (the first access to a block can never hit): reduce these by increasing the block size, so each miss brings in more nearby data.
In addition to the three methods above, you can also improve the replacement policy (which block to evict on a miss); there has been a lot of research on this.
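As one concrete example, here is a minimal sketch of per-set LRU bookkeeping, the kind of policy mentioned earlier. The age-counter scheme and the 4-way parameters are illustrative assumptions, not a specific design from these notes:

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_WAYS 4

/* Per-set LRU state: one age counter per way (0 = most recently used). */
typedef struct {
    bool     valid[NUM_WAYS];
    uint32_t tag[NUM_WAYS];
    uint8_t  age[NUM_WAYS];
} lru_set_t;

/* On an access, make way `used` the most recently used. */
static void lru_touch(lru_set_t *set, int used)
{
    for (int w = 0; w < NUM_WAYS; w++)
        if (set->age[w] < set->age[used])
            set->age[w]++;           /* everything younger than `used` ages by one */
    set->age[used] = 0;
}

/* On a miss, pick a victim: an invalid way if one exists, else the oldest way. */
static int lru_victim(const lru_set_t *set)
{
    int oldest = 0;
    for (int w = 0; w < NUM_WAYS; w++) {
        if (!set->valid[w])
            return w;
        if (set->age[w] > set->age[oldest])
            oldest = w;
    }
    return oldest;
}
```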
Cache size is the total data capacity (not including tags). A bigger cache can exploit temporal locality better, but bigger is not always better: the larger the cache, the slower it is to access, so an oversized cache hurts both hit and miss latency. Smaller is faster; bigger is slower.
The working set is the set of data you're using "now-ish."
A good cache size is a happy medium that roughly matches the working set size: you capture most of the data you need in the cache, even if not all of it.
Block size is the amount of data associated with one tag. If blocks are too small, the cache doesn't exploit spatial locality well and the tag overhead is larger. If blocks are too big, there are fewer total blocks, and each miss likely transfers a lot of unused data into and out of the cache. There is a sweet spot, typically around 32B-128B, where the hit rate peaks.
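To see the tag-overhead side of this trade-off, here is a small back-of-the-envelope calculation, assuming a hypothetical 64 KB direct-mapped data store and 32-bit addresses (made-up numbers): halving the block size doubles the number of tags you have to store.

```c
#include <stdio.h>

/* Integer log2 for powers of two. */
static unsigned ilog2(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void)
{
    const unsigned cache_bytes = 64 * 1024;   /* fixed data capacity        */
    const unsigned addr_bits   = 32;          /* assumed address width      */

    for (unsigned block = 8; block <= 256; block *= 2) {
        unsigned lines       = cache_bytes / block;      /* one tag per block */
        unsigned offset_bits = ilog2(block);
        unsigned index_bits  = ilog2(lines);
        unsigned tag_bits    = addr_bits - offset_bits - index_bits;
        unsigned tag_bytes   = lines * tag_bits / 8;     /* rough tag storage */
        printf("block %4uB: %5u lines, %2u tag bits, ~%5u B of tags\n",
               block, lines, tag_bits, tag_bytes);
    }
    return 0;
}
```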
Associativity answers the question: how many blocks can we map to the same index (or set)?
With larger associativity, you get a lower miss rate and less variation among programs, but with diminishing returns. With smaller associativity, you get lower cost and a faster hit time, which matters most for L1 caches. You always want the block size and the number of sets to be powers of 2 so that the offset, index, and tag fields are simple bit ranges of the address.
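For example, here is how the address fields fall out when everything is a power of 2, using hypothetical parameters (a 32 KB, 4-way set-associative cache with 64 B blocks and 32-bit addresses):

```c
#include <stdio.h>

/* Integer log2 for powers of two. */
static unsigned ilog2(unsigned x) { unsigned n = 0; while (x >>= 1) n++; return n; }

int main(void)
{
    const unsigned cache_bytes = 32 * 1024;   /* total data capacity     */
    const unsigned block_bytes = 64;          /* block size              */
    const unsigned ways        = 4;           /* associativity           */
    const unsigned addr_bits   = 32;          /* assumed address width   */

    unsigned sets        = cache_bytes / (block_bytes * ways);  /* 128 sets */
    unsigned offset_bits = ilog2(block_bytes);                  /* 6 bits   */
    unsigned index_bits  = ilog2(sets);                         /* 7 bits   */
    unsigned tag_bits    = addr_bits - index_bits - offset_bits;/* 19 bits  */

    printf("sets=%u  offset=%u bits  index=%u bits  tag=%u bits\n",
           sets, offset_bits, index_bits, tag_bits);
    return 0;
}
```

Because every size is a power of 2, no division or remainder is needed in hardware: the offset, index, and tag are just contiguous slices of the address bits.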