Lecture 11:
Exceptions and Pipelining Basics
Professor Mike Schulte Computer Architecture
ECE 201
Exceptions
°Exception = unprogrammed control transfer
• system takes action to handle the exception
- must record the address of the offending instruction - record any other information necessary to return afterwards • returns control to user
• must save & restore user state user program
normal control flow:
sequential, jumps, branches, calls, returns
System Exception Handler Exception:
Two Types of Exceptions
°Interrupts
• caused by external events
• asynchronous to program execution • may be handled between instructions • simply suspend and resume user program
°Traps (or Exceptions) • caused by internal events
- exceptional conditions (overflow) - invalid instruction
- faults (non-resident page in memory) - internal hardware error
• synchronous to program execution • condition must be remedied by the handler
• instruction may be retried or simulated and program continued or program may be aborted
MIPS convention
° Exception means any unexpected change in control flow, without distinguishing internal or external;
° Use the term interrupt only when the event is externally caused. Type of event From where? MIPS terminology I/O device request External Interrupt
Invoke OS from user program Internal Exception Arithmetic overflow Internal Exception Using an undefined instruction Internal Exception Hardware malfunctions Either Exception or
Additions to MIPS ISA to support Exceptions?
° EPC–a 32-bit register used to hold the address of the affected instruction.
° Cause–a register used to record the cause of the exception. To
simplify the discussion, assume
• undefined instruction=0
• arithmetic overflow=1
° Status - interrupt mask and enable bits and determines what
exceptions can occur.
° Control signals to write EPC , Cause, and Status
° Be able to write exception address into PC, increase mux set PC to exception address (C000 0000hex).
° May have to undo PC = PC + 4, since want EPC to point to offending instruction (not its successor); PC = PC - 4
Big Picture: user / system modes
°By providing two modes of execution (user/system) it is possible for the computer to manage itself
• operating system is a special program that runs in the
priviledged mode and has access to all of the resources of the computer
• presents “virtual resources” to each user that are more convenient that the physical resurces
- files vs. disk sectors
- virtual memory vs physical memory • protects each user program from others
°Exceptions allow the system to taken action in response to events that occur while user program is executing
How Control Detects Exceptions in our FSD
°Undefined Instruction–detected when no next state is defined from state 1 for the op value.
• We handle this exception by defining the next state value for all op values other than lw, sw, 0 (R-type), jmp, beq, and ori as new state 12.
• Shown symbolically using “other” to indicate that the op field does not match any of the opcodes that label arcs out of state 1.
°Arithmetic overflow– included logic in the ALU to detect overflow, and a signal called Overflow is provided as an output from the ALU. This signal is used in the modified finite state machine to specify an additional possible next state
°Note: Challenge in designing control of a real
machine is to handle different interactions between instructions and other exception-causing events such that control logic remains small and fast.
• Complex interactions makes the control unit the most challenging aspect of hardware design
Modification to the Control Specification
IR <= MEM[PC] PC <= PC + 4 PC <= exp_addr cause <= 0 (UI)
EPC <= PC - 4 PC <= exp_addr cause <=1 (Ovf) overflow
Pipelining is Natural!
°Pipelining provides a method for executing multiple
instructions at the same time.
°Laundry Example
°Ann, Brian, Cathy, Dave
each have one load of clothes to wash, dry, and fold
°Washer takes 30 minutes
°Dryer takes 40 minutes
°“Folder” takes 20 minutes
A B C D
Sequential Laundry
°Sequential laundry takes 6 hours for 4 loads
°If they learned pipelining, how long would laundry take?
A
B
C
D
30 40 20 30 40 20 30 40 20 30 40 20
6 PM 7 8 9 10 11 Midnight
T a s k
O r d e r
Pipelined Laundry: Start work ASAP
°Pipelined laundry takes 3.5 hours for 4 loads
A
° Pipelining doesn’t help
latency of single task, it helps throughput of entire workload
° Pipeline rate limited by
slowest pipeline stage
° Multiple tasks operating simultaneously using different resources
° Potential speedup =
Number pipe stages
° Unbalanced lengths of pipe stages reduces speedup
° Time to “fill” pipeline and time to “drain” it reduces speedup
° Stall for Dependences
The Five Stages of the Load Instruction
°Ifetch: Instruction Fetch
• Fetch the instruction from the Instruction Memory
°Reg/Dec: Registers Fetch and Instruction Decode
°Exec: Calculate the memory address
°Mem: Read the data from the Data Memory
°Wr: Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5
Ifetch Reg/Dec Exec Mem Wr Load
Pipelined Execution
°On a processor multple instructions are in various stages at the same time.
°Assume each instruction takes five cycles
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB
IFetch Dcd Exec Mem WB Program Flow
Single Cycle, Multiple Cycle, vs. Pipeline
Clk Cycle 1
Multiple Cycle Implementation:
Ifetch Reg Exec Mem Wr
Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Load Ifetch Reg Exec Mem Wr
Ifetch Reg Exec Mem
Load Store
Pipeline Implementation:
Ifetch Reg Exec Mem Wr Store
Clk
Single Cycle Implementation:
Load Store Waste
Ifetch R-type
Ifetch Reg Exec Mem Wr R-type
Cycle 1 Cycle 2
Graphically Representing Pipelines
°Can help with answering questions like:
• How many cycles does it take to execute this code? • What is the ALU doing during cycle 4?
• Are two instructions trying to use the same resource at the same time?
I n s t r.
Time (clock cycles)
Inst 0
Inst 1
ALU
Im Reg Dm Reg
ALU
Why Pipeline? Because the resources are there!
Time (clock cycles)
Inst 0
• 100 instructions are executed
• The single cycle machine has a cycle time of 45 ns
• The multicycle and pipeline machines have cycle times of 10 ns • The multicycle machine has a CPI of 4.6
°Single Cycle Machine
• 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
°Multicycle Machine
• 10 ns/cycle x 4.6 CPI x 100 inst = 4600 ns
°Ideal pipelined machine
• 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
°Ideal pipelined vs. single cycle speedup
• 4500 ns / 1040 ns = 4.33
Can pipelining get us into trouble?
°Yes: Pipeline Hazards
• structural hazards: attempt to use the same resource two different ways at the same time
- E.g., two instructions try to read the same memory at the same time
• data hazards: attempt to use item before it is ready
- instruction depends on result of prior instruction still in the pipeline
add r1, r2, r3 sub r4, r2, r1
• control hazards: attempt to make a decision before condition is evaulated
- branch instructions beq r1, loop add r1, r2, r3
°Can always resolve hazards by waiting • pipeline control must detect the hazard
• take action (or delay action) to resolve hazards
Mem
Single Memory is a Structural Hazard
I
Time (clock cycles)
Load
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Reg Mem Reg
ALU
Mem Reg Mem Reg
Structural Hazards limit performance
°Example: if 1.3 memory accesses per instruction and only one memory access per cycle then
• average CPI = 1.3
• otherwise resource is more than 100% utilized
°Solution 1: Use separate instruction and data memories
°Solution 2: Allow memory to read and write more than one word per cycle
°Solution 3: Stall
°Stall: wait until decision is clear
• Its possible to move up decision to 2nd stage by adding hardware to check registers as being read
°Impact: 2 clock cycles per branch instruction => slow
Control Hazard Solutions
I n s t r.
O r d e r
Time (clock cycles)
Add
Beq
Load
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
ALU
Reg Mem Reg
°Predict: guess one direction then back up if wrong
• Predict not taken
°Impact: 1 clock cycle per branch instruction if right, 2 if wrong (right - 50% of time)
°More dynamic scheme: history of 1 branch (- 90%) Control Hazard Solutions
I
Time (clock cycles)
Add
Beq
Load
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
Mem
ALU
Reg Mem Reg
°Redefine branch behavior (takes place after next instruction) “delayed branch”
°Impact: 1 clock cycles per branch instruction if can find instruction to put in “slot” (- 50% of time)
°Launch more instructions per clock cycle=>less useful Control Hazard Solutions
I
Time (clock cycles)
Add
Beq
Misc
ALU
Mem Reg Mem Reg
ALU
Mem Reg Mem Reg
Mem
ALU
Reg Mem Reg
Load
MemALU
Data Hazard on r1
add
r1
,r2,r3
sub r4,
r1
,r3
and r6,
r1
,r7
or r8,
r1
,r9
xor r10,
r1
,r11
Problem: r1 cannot be read by other instructions before it is written by the add.
• Dependencies backwards in time are hazards Data Hazard on r1:
I n s t r.
O r d e r
Time (clock cycles)
add
r1
,r2,r3
sub r4,
r1
,r3
and r6,
r1
,r7
or r8,
r1
,r9
xor r10,
r1
,r11
IF ID/RF ALUEX MEM WB
Im Reg Dm Reg
ALU
Im Reg Dm Reg
ALU
Im Reg Dm Reg
Im
ALU
Reg Dm Reg
ALU
•“Forward” result from one stage to another
• “or” OK if define read/write properly Data Hazard Solution:
I
Time (clock cycles)
add
r1
,r2,r3
• Dependencies backwards in time are hazards
• Can’t solve with forwarding:
• Must delay/stall instruction dependent on loads Forwarding (or Bypassing): What about Loads
Time (clock cycles)