12 Processor Structure and Function

(1)

William Stallings

Computer Organization and Architecture

8th Edition

Chapter 12

(2)

CPU Structure

• CPU must:
—Fetch instructions
—Interpret instructions
—Fetch data
—Process data

(3)

(4)

(5)

Registers

• CPU must have some working space (temporary storage)
• Called registers
• Number and function vary between processor designs
• One of the major design decisions

(6)

User Visible Registers

• General Purpose
• Data
• Address

(7)

General Purpose Registers (1)

• May be true general purpose
• May be restricted
• May be used for data or addressing
• Data
—Accumulator
• Addressing

(8)

General Purpose Registers (2)

• Make them general purpose
—Increase flexibility and programmer options
—Increase instruction size & complexity
• Make them specialized
—Smaller (faster) instructions

(9)

How Many GP Registers?

• Between 8 and 32
• Fewer = more memory references
• More does not reduce memory references and takes up processor real estate

(10)

How big?

• Large enough to hold full address
• Large enough to hold full word
• Often possible to combine two data registers
—C programming
—long long int a;

(11)

Condition Code Registers

• Sets of individual bits
—e.g. result of last operation was zero
• Can be read (implicitly) by programs
—e.g. Jump if zero

(12)

Control & Status Registers

• Program Counter (PC)
• Instruction Register (IR)
• Memory Address Register (MAR)
• Memory Buffer Register (MBR)

(13)

Program Status Word

• A set of bits
• Includes Condition Codes
• Sign of last result
• Zero
• Carry
• Equal
• Overflow
• Interrupt enable/disable
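As a rough illustration of how the sign, zero, carry and overflow bits listed above could be derived, here is a minimal sketch for an 8-bit addition. The function name `add8_flags` and the 8-bit word size are assumptions made for this example; they are not taken from the slides or from any particular processor.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive condition codes (hypothetical helper)."""
    a &= 0xFF
    b &= 0xFF
    raw = a + b                 # unbounded sum, may exceed 8 bits
    result = raw & 0xFF         # value actually stored in an 8-bit register
    flags = {
        "sign": bool(result & 0x80),   # top bit of the result
        "zero": result == 0,           # result of last operation was zero
        "carry": raw > 0xFF,           # unsigned carry out of bit 7
        # signed overflow: both operands share a sign the result does not
        "overflow": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


result, flags = add8_flags(0x7F, 0x01)
print(hex(result), flags)
```

A "jump if zero" instruction would then simply test the `zero` entry rather than re-examining the result.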

(14)

Supervisor Mode

• Intel ring zero
• Kernel mode
• Allows privileged instructions to execute
• Used by operating system

(15)

Other Registers

• May have registers pointing to:
—Process control blocks (see O/S)
—Interrupt Vectors (see O/S)
• N.B. CPU design and operating system

(16)

(17)

Instruction Cycle

• Revision

(18)

Indirect Cycle

• May require memory access to fetch operands
• Indirect addressing requires more memory accesses

(19)

(20)

(21)

Data Flow (Instruction Fetch)

• Depends on CPU design
• In general:
• Fetch
—PC contains address of next instruction
—Address moved to MAR
—Address placed on address bus
—Control unit requests memory read
—Result placed on data bus, copied to MBR, then to IR
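The fetch steps above can be sketched as a few register transfers. This is a toy model only: the dictionary-based `cpu` and `memory`, the word-addressed memory, and the step that advances the PC by one word are assumptions for illustration, not details given on the slide.

```python
def instruction_fetch(cpu, memory):
    """One fetch cycle on a toy word-addressed machine (hypothetical model)."""
    cpu["MAR"] = cpu["PC"]           # address moved to MAR
    address_bus = cpu["MAR"]         # address placed on address bus
    data_bus = memory[address_bus]   # control unit requests memory read
    cpu["MBR"] = data_bus            # result copied to MBR...
    cpu["IR"] = cpu["MBR"]           # ...then to IR
    cpu["PC"] += 1                   # assumption: PC advances by one word
    return cpu


cpu = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {100: 0x1A05, 101: 0x2B06}
instruction_fetch(cpu, memory)
print(cpu)
```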

(22)

Data Flow (Data Fetch)

• IR is examined
• If indirect addressing, indirect cycle is performed
—Rightmost N bits of MBR transferred to MAR
—Control unit requests memory read
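The indirect cycle can be sketched in the same register-transfer style. The helper name `indirect_cycle`, the dictionary model, and the choice to load the word that was read back into the MBR are assumptions for this example.

```python
def indirect_cycle(cpu, memory, n_bits):
    """Resolve one level of indirection on a toy machine (hypothetical model)."""
    mask = (1 << n_bits) - 1
    cpu["MAR"] = cpu["MBR"] & mask    # rightmost N bits of MBR moved to MAR
    cpu["MBR"] = memory[cpu["MAR"]]   # memory read; assumed to land in MBR
    return cpu


# MBR holds an instruction word whose low 8 bits are an address reference.
cpu = {"MAR": 0, "MBR": 0x1A05}
memory = {0x05: 0x0042}
indirect_cycle(cpu, memory, n_bits=8)
print(cpu)
```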

(23)

(24)

(25)

Data Flow (Execute)

• May take many forms
• Depends on instruction being executed
• May include
—Memory read/write
—Input/Output
—Register transfers

(26)

(27)

(28)

Prefetch

• Fetch accessing main memory
• Execution usually does not access main memory
• Can fetch next instruction during execution of current instruction

(29)

Improved Performance

• But not doubled:
—Fetch usually shorter than execution
– Prefetch more than one instruction?
—Any jump or branch means that prefetched instructions are not the required instructions

(30)

Pipelining

• Fetch instruction
• Decode instruction
• Calculate operands (i.e. EAs)
• Fetch operands
• Execute instructions
• Write result
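To see what overlapping the six stages above buys, the sketch below counts clock cycles for an idealized pipeline. It assumes every stage takes exactly one cycle and that there are no stalls or branches; the function names are chosen for this example.

```python
def pipeline_cycles(n_instructions, k_stages):
    """Cycles to finish n instructions on an ideal k-stage pipeline."""
    # The first instruction needs k cycles; each later one completes
    # one cycle after its predecessor.
    return k_stages + (n_instructions - 1)


def speedup(n_instructions, k_stages):
    """Ratio of unpipelined time (n * k cycles) to pipelined time."""
    return (n_instructions * k_stages) / pipeline_cycles(n_instructions, k_stages)


print(pipeline_cycles(9, 6))      # cycles for 9 instructions, 6 stages
print(round(speedup(9, 6), 2))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows, which is the upper bound that hazards then erode.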

(31)

(32)

Timing Diagram for

(33)

(34)

(35)

(36)

(37)

Pipeline Hazards

• Pipeline, or some portion of pipeline, must stall
• Also called pipeline bubble
• Types of hazards
—Resource
—Data

(38)

Resource Hazards

• Two (or more) instructions in pipeline need same resource
• Executed in serial rather than parallel for part of pipeline
• Also called structural hazard
• E.g. Assume simplified five-stage pipeline
—Each stage takes one clock cycle
• Ideal case is new instruction enters pipeline each clock cycle
• Assume main memory has single port
• Assume instruction fetches and data reads and writes performed one at a time
• Ignore the cache
• Operand read or write cannot be performed in parallel with instruction fetch
• Fetch instruction stage must idle for one cycle before fetching I3
• E.g. multiple instructions ready to enter execute instruction phase
• Single ALU
• One solution: increase available resources
—Multiple main memory ports

(39)

Data Hazards

• Conflict in access of an operand location
• Two instructions to be executed in sequence
• Both access a particular memory or register operand
• If in strict sequence, no problem occurs
• If in a pipeline, operand value could be updated so as to produce different result from strict sequential execution
• E.g. x86 machine instruction sequence:
• ADD EAX, EBX /* EAX = EAX + EBX */

(40)

(41)

Types of Data Hazard

• Read after write (RAW), or true dependency
—An instruction modifies a register or memory location
—Succeeding instruction reads data in that location
—Hazard if read takes place before write complete
• Write after read (WAR), or antidependency
—An instruction reads a register or memory location
—Succeeding instruction writes to location
—Hazard if write completes before read takes place
• Write after write (WAW), or output dependency
—Two instructions both write to same location
—Hazard if writes take place in reverse order of intended sequence
• Previous example is RAW hazard
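The three dependency types above can be told apart by comparing the locations each instruction reads and writes. The sketch below is a minimal classifier under that view; the dictionary layout and the second instruction (`SUB ECX, EAX`) are hypothetical and chosen only to show a read of a register the ADD has just written. Whether a dependency becomes an actual hazard depends on pipeline timing, which this sketch does not model.

```python
def classify_dependencies(first, second):
    """Return the dependency types between two instructions in program order.

    Each instruction is a dict with 'reads' and 'writes' sets of locations.
    """
    found = set()
    if first["writes"] & second["reads"]:
        found.add("RAW")   # second reads what first writes
    if first["reads"] & second["writes"]:
        found.add("WAR")   # second writes what first reads
    if first["writes"] & second["writes"]:
        found.add("WAW")   # both write the same location
    return found


add = {"reads": {"EAX", "EBX"}, "writes": {"EAX"}}   # ADD EAX, EBX
sub = {"reads": {"ECX", "EAX"}, "writes": {"ECX"}}   # hypothetical SUB ECX, EAX
print(classify_dependencies(add, sub))
```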

(42)

(43)


(44)

Control Hazard

• Also known as branch hazard
• Pipeline makes wrong decision on branch prediction
• Brings instructions into pipeline that must subsequently be discarded
• Dealing with Branches
—Multiple Streams
—Prefetch Branch Target
—Loop buffer
—Branch prediction

(45)

Multiple Streams

• Have two pipelines
• Prefetch each branch into a separate pipeline
• Use appropriate pipeline
• Leads to bus & register contention

(46)

Prefetch Branch Target

• Target of branch is prefetched in addition to instructions following branch
• Keep target until branch is executed

(47)

Loop Buffer

• Very fast memory
• Maintained by fetch stage of pipeline
• Check buffer before fetching from memory
• Very good for small loops or jumps
• cf. cache

(48)

(49)

Branch Prediction (1)

• Predict never taken
—Assume that jump will not happen
—Always fetch next instruction
—68020 & VAX 11/780
—VAX will not prefetch after branch if a page fault would result (O/S vs. CPU design)
• Predict always taken
—Assume that jump will happen

(50)

• Predict by Opcode
—Some instructions are more likely to result in a jump than others
—Can get up to 75% success
• Taken/Not taken switch
—Based on previous history
—Good for loops
—Refined by two-level or correlation-based branch history
• Correlation-based
—In loop-closing branches, history is good predictor
—In more complex structures, branch direction correlates with that of related branches
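One common way to realize a history-based taken/not taken switch is a two-bit saturating counter per branch. The sketch below uses that scheme as an assumption; it is a widely used variant and not necessarily the exact state machine the slides have in mind. The class and function names are chosen for this example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self):
        self.state = 0

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


def count_correct(outcomes):
    """Run one predictor over a sequence of branch outcomes."""
    predictor, correct = TwoBitPredictor(), 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct


# A loop-closing branch: taken 9 times, then falls through; loop run twice.
loop = [True] * 9 + [False]
print(count_correct(loop * 2), "of", len(loop * 2))
```

Because one loop exit only moves the counter one step, the predictor still guesses "taken" when the loop is entered again, which is why this kind of history works well for loop-closing branches.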

(51)

• Delayed Branch
—Do not take jump until you have to

(52)

(53)

(54)

(55)

Intel 80486 Pipelining

• Fetch
—From cache or external memory
—Put in one of two 16-byte prefetch buffers
—Fill buffer with new data as soon as old data consumed
—Average 5 instructions fetched per load
—Independent of other stages to keep buffers full
• Decode stage 1
—Opcode & address-mode info
—At most first 3 bytes of instruction
—Can direct D2 stage to get rest of instruction
• Decode stage 2
—Expand opcode into control signals
—Computation of complex address modes
• Execute
—ALU operations, cache access, register update
• Writeback
—Update registers & flags

(56)

(57)

(58)

(59)

(60)

MMX Register Mapping

• MMX uses several 64-bit data types
• Use 3-bit register address fields
—8 registers
• No MMX specific registers
—Aliasing to lower 64 bits of existing floating

(61)

(62)

Pentium Interrupt Processing

• Interrupts
—Maskable
—Nonmaskable
• Exceptions
—Processor detected
—Programmed
• Interrupt vector table
—Each interrupt type assigned a number
—Index to vector table
—256 * 32-bit interrupt vectors
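Using the interrupt number as an index into a table of 256 32-bit entries amounts to a simple multiplication. The sketch below assumes a flat table of 4-byte entries and returns a byte offset from the start of the table; the constant and function names are made up for this example and do not describe the processor's full descriptor mechanism.

```python
VECTOR_COUNT = 256        # number of interrupt vectors in the table
VECTOR_SIZE_BYTES = 4     # each vector is 32 bits wide


def vector_offset(interrupt_number):
    """Byte offset of a vector from the table base (hypothetical helper)."""
    if not 0 <= interrupt_number < VECTOR_COUNT:
        raise ValueError("interrupt number out of range")
    return interrupt_number * VECTOR_SIZE_BYTES


print(vector_offset(3))
print(VECTOR_COUNT * VECTOR_SIZE_BYTES, "bytes in the whole table")
```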

(63)

ARM Attributes

• RISC
• Moderate array of uniform registers
—More than most CISC, less than many RISC
• Load/store model
—Operations are performed on operands in registers only
• Uniform fixed-length instruction
—32 bits standard set, 16 bits Thumb
• Shift or rotation can preprocess source registers
—Separate ALU and shifter units
• Small number of addressing modes
—All load/store addresses from registers and instruction fields
—No indirect or indexed addressing involving values in memory
• Auto-increment and auto-decrement addressing
—Improve loops
• Conditional execution of instructions minimizes

(64)

(65)

ARM Processor Organization

• Many variations depending on ARM version
• Data exchanged between processor and memory through data bus
• Data item (load/store) or instruction (fetch)
• Instructions go through decoder before execution
• Pipeline and control signal generation in control unit
• Data goes to register file
—Set of 32-bit registers
—Byte & halfword twos complement data sign extended
• Typically two source and one result register

(66)

ARM Processor Modes

• User
• Privileged
—6 modes
– OS can tailor systems software use
– Some registers dedicated to each privileged mode
– Swifter context changes
• Exception
—5 of privileged modes
—Entered on given exceptions
—Substitute some registers for user registers

(67)

Privileged Modes

—Software interrupt used to invoke operating system services
• Abort mode
—Interrupt signal from designated fast interrupt source
—Fast interrupt cannot be interrupted
—May interrupt normal interrupt
• Interrupt mode

(68)

ARM

User      System    Supervisor  Abort     Undefined  Interrupt  Fast Interrupt
R0        R0        R0          R0        R0         R0         R0
R10       R10       R10         R10       R10        R10        R10_fiq
R11       R11       R11         R11       R11        R11        R11_fiq
R12       R12       R12         R12       R12        R12        R12_fiq
R13 (SP)  R13 (SP)  R13_svc     R13_abt   R13_und    R13_irq    R13_fiq
R14 (LR)  R14 (LR)  R14_svc     R14_abt   R14_und    R14_irq    R14_fiq
R15 (PC)  R15 (PC)  R15 (PC)    R15 (PC)  R15 (PC)   R15 (PC)   R15 (PC)

(69)

ARM Register Organization

• 37 x 32-bit registers
• 31 general-purpose registers
—Some have special purposes
—E.g. program counters
• Six program status registers
• Registers in partially overlapping banks
—Processor mode determines bank
• 16 numbered registers and one or two

(70)

General Register Usage

• R13 normally stack pointer (SP)
—Each exception mode has its own R13
• R14 link register (LR)
—Subroutine and exception mode return address
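The idea that the processor mode selects which physical register a number such as R13 refers to can be sketched as a lookup. The mode names follow the register table in these slides; the rule that fast interrupt mode also banks R8 to R12 comes from the ARM architecture, and the function name is an assumption for this example.

```python
# Bank suffix per mode; User and System share the unbanked registers.
BANK_SUFFIX = {
    "user": None, "system": None, "supervisor": "svc", "abort": "abt",
    "undefined": "und", "interrupt": "irq", "fast interrupt": "fiq",
}


def physical_register(mode, number):
    """Name of the physical register behind R<number> in a given mode."""
    suffix = BANK_SUFFIX[mode]
    banked = suffix is not None and (
        number in (13, 14)                        # SP and LR: banked per exception mode
        or (suffix == "fiq" and 8 <= number <= 12)  # extra FIQ-only bank
    )
    return f"R{number}_{suffix}" if banked else f"R{number}"


print(physical_register("user", 13))
print(physical_register("supervisor", 13))
print(physical_register("fast interrupt", 10))

# Distinct general-purpose registers across all modes.
print(len({physical_register(m, n) for m in BANK_SUFFIX for n in range(16)}))
```

Counting the distinct names gives the 31 general-purpose registers quoted for the ARM register organization; each exception mode's private R13 and R14 are what make context changes quicker.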

(71)

CPSR

• CPSR: current program status register
—Exception modes have dedicated SPSR
• 16 msb are user flags
—Condition codes (N, Z, C, V)
—Q – overflow or saturation in some SIMD instructions
—J – Jazelle (8 bit) instructions
—GE[3:0] – SIMD instructions use bits [19:16] as greater than or equal flags
• 16 lsb system flags for privilege modes
—E – endian
—Interrupt disable
—T – Normal or Thumb instruction
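The flag fields above can be pulled out of a 32-bit CPSR value with shifts and masks. The bit positions used here (N, Z, C, V in bits 31 to 28, Q in 27, J in 24, GE in 19:16, E in 9, I in 7, F in 6, T in 5, mode in 4:0) are taken from the ARM architecture rather than from the slide, and the function name and sample value are assumptions for this example.

```python
def decode_cpsr(value):
    """Split a 32-bit CPSR value into named fields (illustrative helper)."""
    def bit(n):
        return (value >> n) & 1

    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28),  # condition codes
        "Q": bit(27),                 # saturation / overflow sticky flag
        "J": bit(24),                 # Jazelle state
        "GE": (value >> 16) & 0xF,    # SIMD greater-than-or-equal flags
        "E": bit(9),                  # data endianness
        "I": bit(7), "F": bit(6),     # IRQ and FIQ disable
        "T": bit(5),                  # Thumb state
        "mode": value & 0x1F,         # processor mode field
    }


fields = decode_cpsr(0x600000D3)
print(fields)
```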

(72)

(73)

ARM Interrupt (Exception) Processing

• More than one exception allowed
• Seven types
• Execution forced from exception vectors
• Multiple exceptions handled in priority order
• Processor halts execution after current instruction
• Processor state preserved in SPSR for exception
—Address of instruction about to execute put in link register
—Return by moving SPSR to CPSR and R14 to

(74)

Foreground Reading

• Processor examples
• Stallings Chapter 12