Introduction to Computer Organization

Introduction to Pipelining

There are a few common performance metrics:

Response time: How long does a job take?
Throughput: How much work can get done within a certain time?

In general, if you run more programs at the same time, overall throughput gets better while the response time of each individual program gets worse.

Response time for a program is its execution time:

$$\text{Execution time} = \text{number of instructions} \times \text{CPI} \times \text{clock period}$$

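The CPI (cycles per instruction) is the average number of clock cycles each instruction takes over the whole application. For the multi-cycle datapath, the CPI is around 4 to 5, depending on how many lw instructions you have, since lw takes the most cycles.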

To know the CPI for a program, you have to know how many cycles each type of instruction takes and what mix of instructions the program has (how often each type appears).
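As a rough sketch of how the instruction mix determines CPI, the snippet below computes a weighted average CPI and plugs it into the execution-time formula. The per-instruction cycle counts in `CYCLES`, the instruction mix, and the 2 ns clock period are made-up numbers chosen only to land in the 4 to 5 range mentioned above; they are not measurements of any particular datapath.

```python
# Assumed cycle counts per instruction type for a multi-cycle datapath
# (illustrative only, not taken from a real design).
CYCLES = {"add": 4, "nand": 4, "lw": 5, "sw": 4, "beq": 3}


def average_cpi(counts: dict[str, int]) -> float:
    """Weighted average cycles per instruction for a given instruction mix."""
    total_instructions = sum(counts.values())
    total_cycles = sum(CYCLES[op] * n for op, n in counts.items())
    return total_cycles / total_instructions


def execution_time(instruction_count: int, cpi: float, clock_period_ns: float) -> float:
    """Execution time = number of instructions x CPI x clock period (ns)."""
    return instruction_count * cpi * clock_period_ns


# Hypothetical mix: 100 instructions, 30 of them loads.
mix = {"add": 40, "nand": 10, "lw": 30, "sw": 10, "beq": 10}
cpi = average_cpi(mix)
print(cpi)
print(execution_time(sum(mix.values()), cpi, 2.0))
```

Raising the share of lw instructions in `mix` pushes the CPI toward 5, which is the effect described above.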

Pipelining

Pipelining is a way to improve the CPI. Instead of finishing one instruction before starting the next, you overlap them: each cycle a new instruction enters the pipeline, so multiple instructions are in flight at once, each one in a different stage.
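A minimal sketch of why overlapping helps, assuming an ideal pipeline with no hazards: the first instruction needs one cycle per stage to get through, and after that one instruction finishes every cycle, so the effective CPI approaches 1 for long programs.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 5) -> int:
    """Cycles to finish N instructions on an ideal k-stage pipeline (no hazards).

    The first instruction takes num_stages cycles to fill the pipeline;
    after that, one instruction completes every cycle.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


for n in (1, 10, 1000):
    cycles = pipeline_cycles(n)
    # Effective CPI = total cycles / instructions; it shrinks toward 1 as n grows.
    print(n, cycles, cycles / n)
```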

As you add more pipeline stages, each stage does less work, so the clock period gets shorter.
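A rough model of this, under two stated assumptions: the datapath's logic delay can be split evenly across stages, and every pipeline register adds a fixed overhead. The 10 ns logic delay and 0.2 ns register overhead below are hypothetical numbers for illustration.

```python
def clock_period_ns(total_logic_ns: float, num_stages: int, register_overhead_ns: float) -> float:
    """Clock period when the logic is split evenly into num_stages stages.

    Each stage contains total_logic_ns / num_stages of logic plus the
    fixed delay of its pipeline register (idealized even split).
    """
    return total_logic_ns / num_stages + register_overhead_ns


# Hypothetical: 10 ns of logic, 0.2 ns per pipeline register.
for k in (1, 5, 10, 20):
    print(k, clock_period_ns(10.0, k, 0.2))
```

In this model the period keeps shrinking as stages are added, but never below the register overhead, so each extra stage buys less than the one before.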

In our new datapath, there are five stages:

Fetch
- Use PC index to read instruction
- Increment PC
- Write the instruction to the IF/ID pipeline register (for Decode).
Decode
- Read the instruction from the IF/ID register
- Read the contents of the two source registers from the register file
- Pass the instruction bits, PC + 1, and the register contents to the ID/EX register
Execute
- Perform the ALU operation and compute the branch target PC + 1 + offset
- Write the branch target, the contents of register B, the ALU result, and the instruction bits to the EX/MEM register
Memory
- Only load and store instructions access memory; other instructions simply pass their values through to the MEM/WB register.
- Opcode bits control read and write for load and store instructions.
Writeback
- Write the ALU result to the register file for an add or nand instruction, or the data read from memory for an lw instruction.
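The five stages can be visualized as a pipeline diagram. The sketch below assumes an ideal pipeline (no stalls) where instruction i enters Fetch in cycle i and advances one stage per cycle; the stage abbreviations and the LC2K-style instruction strings are hypothetical labels used only for display.

```python
# Abbreviations for Fetch, Decode, Execute, Memory, Writeback.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions: list[str]) -> list[list[str]]:
    """Return one row per instruction showing its stage in each clock cycle.

    Assumes an ideal pipeline: instruction i is fetched in cycle i and
    moves forward one stage per cycle; "." means not in the pipeline.
    """
    total_cycles = len(STAGES) + len(instructions) - 1
    rows = []
    for i, _ in enumerate(instructions):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        rows.append(row)
    return rows


program = ["add 1 2 3", "nand 3 4 5", "lw 0 6 8"]
for text, row in zip(program, pipeline_diagram(program)):
    print(f"{text:<12}" + " ".join(f"{cell:>3}" for cell in row))
```

Reading down any column shows which stage each in-flight instruction occupies during that cycle.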

What can go wrong?

Data hazards: Since registers are read in stage 2 (Decode) but written in stage 5 (Writeback), an instruction can read a stale value from a register that an earlier instruction has not yet written back.
Control hazards: A branch instruction may change the PC, but not until stage 4. What do we fetch before that?
Exceptions: How do we handle them?
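A minimal sketch of detecting the data hazards described above. It assumes an ideal pipeline where instruction i reads registers in cycle i + 2 and writes its result in cycle i + 5, and that a value written in a cycle is not visible to a read in that same cycle. Instructions are modeled as hypothetical `(dest_reg, source_regs)` tuples, with `dest_reg` set to `None` when nothing is written.

```python
READ_STAGE = 2   # registers are read in Decode (stage 2)
WRITE_STAGE = 5  # registers are written in Writeback (stage 5)


def data_hazards(program):
    """Return (producer, consumer, register) triples that are data hazards.

    For each source register of each instruction, only the most recent
    earlier writer matters; it is a hazard if the read happens in or
    before the cycle in which that writer writes the register.
    """
    hazards = []
    for j, (_, sources) in enumerate(program):
        read_cycle = j + READ_STAGE
        for reg in sources:
            # Search backwards for the most recent instruction that writes reg.
            for i in range(j - 1, -1, -1):
                if program[i][0] == reg:
                    if read_cycle <= i + WRITE_STAGE:
                        hazards.append((i, j, reg))
                    break
    return hazards


# add 1 2 3 writes r3; the very next instruction (nand 3 4 5) reads r3.
program = [(3, (1, 2)), (5, (3, 4))]
print(data_hazards(program))  # [(0, 1, 3)]
```

Under these assumptions, any reader within three instructions of its writer is a hazard.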

Data Hazards

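Let's say you execute an add instruction followed by a nand instruction that reads the add's destination register. The nand reads its registers in Decode while the add is still in Execute, before the add writes its result in Writeback, so the nand would read the old value. One fix is to insert noops between the two instructions so that the nand decodes only after the add has written back. Problem: this makes your program larger. Another approach is detect and stall: the hardware detects the hazard in Decode and holds the dependent instruction (and the ones behind it) in place, sending noops down the pipeline until the value has been written back.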
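The noop-insertion fix can be sketched as a small pass over the program. It assumes, as a labeled simplification, that a dependent instruction must be at least 4 slots after its writer (registers read in stage 2, written at the end of stage 5, no same-cycle forwarding inside the register file). Instructions are hypothetical `(dest_reg, source_regs)` tuples.

```python
NOOP = (None, ())  # a noop writes no register and reads none
MIN_DISTANCE = 4   # assumed: reader must sit >= 4 slots after its writer


def insert_noops(program):
    """Return a copy of program with noops added before any instruction
    that would otherwise read a register fewer than MIN_DISTANCE slots
    after the instruction that writes it."""
    out = []
    for dest, sources in program:
        needed = 0
        for reg in sources:
            # Find the most recent writer of reg among instructions emitted so far.
            for k in range(len(out) - 1, -1, -1):
                if out[k][0] == reg:
                    distance = len(out) - k  # slots between writer and this reader
                    needed = max(needed, MIN_DISTANCE - distance)
                    break
        out.extend([NOOP] * needed)
        out.append((dest, sources))
    return out


program = [(3, (1, 2)), (5, (3, 4))]  # add writes r3, nand reads r3
padded = insert_noops(program)
print(len(padded) - len(program))  # 3 noops inserted
```

The growth in `len(padded)` is exactly the "program gets larger" cost mentioned above.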

The problems with detect and stall:

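CPI increases each time a hazard is detected, by one cycle for every cycle the pipeline stalls
It's not always necessary: the result often already exists in a pipeline register (for example, after Execute) before it is written back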
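To see how stalls raise CPI, here is a minimal sketch assuming each stall cycle is a cycle in which no instruction completes, added on top of the ideal "fill the pipeline, then one per cycle" count. The workload below (1000 instructions, 200 of which depend on the instruction right before them, each costing 3 stall cycles) is hypothetical.

```python
def cpi_with_stalls(num_instructions: int, stall_cycles: int, num_stages: int = 5) -> float:
    """Effective CPI of a pipeline that stalls (inserts bubbles) on hazards.

    Ideal cycles are num_stages + N - 1; every stall cycle adds one more
    cycle during which no instruction finishes.
    """
    total_cycles = num_stages + num_instructions - 1 + stall_cycles
    return total_cycles / num_instructions


# Hypothetical: 200 back-to-back dependences, assumed 3 stall cycles each.
print(cpi_with_stalls(1000, 0))
print(cpi_with_stalls(1000, 200 * 3))
```

Under these assumptions the stalls raise the effective CPI from about 1.0 to about 1.6.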