William Stallings
Computer Organization and Architecture
8th Edition
Chapter 12
CPU Structure
• CPU must:
—Fetch instructions
—Interpret instructions
—Fetch data
—Process data
—Write data
Registers
• CPU must have some working space (temporary storage)
• Called registers
• Number and function vary between processor designs
• One of the major design decisions
User Visible Registers
• General Purpose
• Data
• Address
General Purpose Registers (1)
• May be true general purpose
• May be restricted
• May be used for data or addressing
• Data
—Accumulator
• Addressing
General Purpose Registers (2)
• Make them general purpose
—Increase flexibility and programmer options
—Increase instruction size & complexity
• Make them specialized
—Smaller (faster) instructions
How Many GP Registers?
• Between 8 and 32
• Fewer = more memory references
• Beyond about 32, more registers do not noticeably reduce memory references and take up processor real estate
How Big?
• Large enough to hold full address
• Large enough to hold full word
• Often possible to combine two data registers
—C programming
—long long int a; (e.g. a 64-bit value held in two 32-bit registers)
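To make "combining two data registers" concrete, the sketch below assumes a hypothetical machine with 32-bit registers and shows a 64-bit value split across a high and a low register, reassembled, and added using only 32-bit operations plus a carry. The names `RegPair`, `split64`, `join64` and `add64` are illustrative, not from any real toolchain.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit data registers holding one 64-bit value. */
typedef struct {
    uint32_t hi; /* most significant 32 bits */
    uint32_t lo; /* least significant 32 bits */
} RegPair;

/* Split a 64-bit value across two 32-bit registers. */
RegPair split64(uint64_t value) {
    RegPair r;
    r.hi = (uint32_t)(value >> 32);
    r.lo = (uint32_t)(value & 0xFFFFFFFFu);
    return r;
}

/* Reassemble the 64-bit value from the register pair. */
uint64_t join64(RegPair r) {
    return ((uint64_t)r.hi << 32) | r.lo;
}

/* 64-bit addition with 32-bit operations: add the low halves, then
   propagate the carry into the sum of the high halves. */
RegPair add64(RegPair a, RegPair b) {
    RegPair s;
    s.lo = a.lo + b.lo;             /* wraps modulo 2^32 */
    uint32_t carry = (s.lo < a.lo); /* a wrap means a carry out of bit 31 */
    s.hi = a.hi + b.hi + carry;
    return s;
}
```

On a 32-bit target a compiler typically emits a similar two-register sequence for `long long` arithmetic, often using an add-with-carry instruction for the high half.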
Condition Code Registers
• Sets of individual bits
—e.g. result of last operation was zero
• Can be read (implicitly) by programs
—e.g. Jump if zero
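The sketch below assumes a 32-bit ALU and shows how sign (N), zero (Z), carry (C) and overflow (V) condition codes could be set as a side effect of an addition, and how a "jump if zero" reads Z implicitly. `Flags`, `alu_add` and `jump_if_zero` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition codes left behind by the last ALU operation. */
typedef struct {
    bool n; /* result negative (sign bit set) */
    bool z; /* result zero */
    bool c; /* unsigned carry out of bit 31 */
    bool v; /* signed overflow */
} Flags;

/* 32-bit add that updates the condition codes. */
uint32_t alu_add(uint32_t a, uint32_t b, Flags *f) {
    uint64_t wide = (uint64_t)a + b;
    uint32_t r = (uint32_t)wide;
    f->n = (r >> 31) & 1u;
    f->z = (r == 0);
    f->c = (wide >> 32) & 1u;
    /* Overflow: operands share a sign and the result's sign differs. */
    f->v = ((~(a ^ b) & (a ^ r)) >> 31) & 1u;
    return r;
}

/* "Jump if zero" never names the flags; it reads Z to pick the next PC. */
uint32_t jump_if_zero(const Flags *f, uint32_t pc_next, uint32_t target) {
    return f->z ? target : pc_next;
}
```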
Control & Status Registers
• Program Counter (PC)
• Instruction Register (IR)
• Memory Address Register (MAR)
• Memory Buffer Register (MBR)
Program Status Word
• A set of bits
• Includes Condition Codes
• Sign of last result
• Zero
• Carry
• Equal
• Overflow
• Interrupt enable/disable
Supervisor Mode
• Intel ring zero
• Kernel mode
• Allows privileged instructions to execute
• Used by operating system
Other Registers
• May have registers pointing to:
—Process control blocks (see O/S)
—Interrupt vectors (see O/S)
• N.B. CPU design and operating system design are closely linked
Instruction Cycle
• Revision
Indirect Cycle
• May require memory access to fetch operands
• Indirect addressing requires more memory accesses
Data Flow (Instruction Fetch)
• Depends on CPU design
• In general:
• Fetch
—PC contains address of next instruction
—Address moved to MAR
—Address placed on address bus
—Control unit requests memory read
—Result placed on data bus, copied to MBR, then to IR
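The register transfers of the fetch can be simulated directly. This sketch assumes a toy machine with a word-addressed memory, one instruction per word, and a PC that advances by one after each fetch (that increment is an assumption; it is not among the steps listed). `Cpu`, `memory_read` and `fetch_cycle` are hypothetical names.

```c
#include <stdint.h>

#define MEM_WORDS 256 /* hypothetical memory size */

/* Toy CPU state: the registers named in the fetch data flow. */
typedef struct {
    uint32_t pc;             /* program counter: address of next instruction */
    uint32_t mar;            /* memory address register */
    uint32_t mbr;            /* memory buffer register */
    uint32_t ir;             /* instruction register */
    uint32_t mem[MEM_WORDS]; /* hypothetical word-addressed main memory */
} Cpu;

/* Memory read: the address on the "address bus" comes from MAR and the
   word returned on the "data bus" is latched into MBR. */
void memory_read(Cpu *cpu) {
    cpu->mbr = cpu->mem[cpu->mar % MEM_WORDS];
}

void fetch_cycle(Cpu *cpu) {
    cpu->mar = cpu->pc; /* PC -> MAR */
    memory_read(cpu);   /* control unit requests read; data bus -> MBR */
    cpu->ir = cpu->mbr; /* MBR -> IR */
    cpu->pc += 1;       /* assumed: PC now points at the next word */
}
```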
Data Flow (Data Fetch)
• IR is examined
• If indirect addressing, indirect cycle is performed
—Rightmost N bits of MBR transferred to MAR
—Control unit requests memory read
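The indirect cycle costs one extra trip through MAR and MBR. This sketch assumes a hypothetical instruction whose rightmost N = 8 bits are an address field, a word-addressed memory, and that MBR still holds the just-fetched instruction on entry. It counts memory reads to show why indirect addressing needs more accesses. `Cpu` and `fetch_operand_indirect` are illustrative names.

```c
#include <stdint.h>

#define ADDR_BITS 8u                       /* assumed N: width of address field */
#define ADDR_MASK ((1u << ADDR_BITS) - 1u) /* selects the rightmost N bits */
#define MEM_WORDS (1u << ADDR_BITS)

typedef struct {
    uint32_t mar;            /* memory address register */
    uint32_t mbr;            /* memory buffer register */
    unsigned reads;          /* memory reads performed so far */
    uint32_t mem[MEM_WORDS]; /* hypothetical word-addressed main memory */
} Cpu;

/* Control unit requests a memory read: word at MAR is copied to MBR. */
void memory_read(Cpu *cpu) {
    cpu->mbr = cpu->mem[cpu->mar & ADDR_MASK];
    cpu->reads++;
}

/* Indirect operand fetch: the address field names a word holding the
   operand's effective address, so two reads are needed. */
uint32_t fetch_operand_indirect(Cpu *cpu) {
    cpu->mar = cpu->mbr & ADDR_MASK; /* rightmost N bits of MBR -> MAR */
    memory_read(cpu);                /* MBR <- effective address */
    cpu->mar = cpu->mbr & ADDR_MASK; /* effective address -> MAR */
    memory_read(cpu);                /* MBR <- operand */
    return cpu->mbr;
}
```

With direct addressing only the second read would be needed; the first read is the price of the indirection.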
Data Flow (Execute)
• May take many forms
• Depends on instruction being executed
• May include
—Memory read/write
—Input/Output
—Register transfers
Prefetch
• Fetch accesses main memory
• Execution usually does not access main memory
• Can fetch next instruction during execution of current instruction
Improved Performance
• But not doubled:
—Fetch usually shorter than execution
– Prefetch more than one instruction?
—Any jump or branch means that prefetched instructions are not the required instructions
Pipelining
• Fetch instruction
• Decode instruction
• Calculate operand addresses (i.e. effective addresses, EAs)
• Fetch operands
• Execute instruction
• Write result
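Under the usual idealization (every stage takes one cycle, no stalls), n instructions flow through a k-stage pipeline in k + (n - 1) cycles instead of n·k. For the six stages listed here and nine instructions that is 6 + 8 = 14 cycles versus 54. The sketch assumes exactly those ideal conditions; the function names are hypothetical.

```c
/* Cycles for n instructions in an ideal k-stage pipeline (one cycle per
   stage, no stalls): the first instruction needs k cycles and each later
   instruction completes one cycle after its predecessor. */
unsigned long pipelined_cycles(unsigned k, unsigned long n) {
    return n == 0 ? 0 : k + (n - 1);
}

/* Without pipelining, every instruction passes through all k stages alone. */
unsigned long sequential_cycles(unsigned k, unsigned long n) {
    return (unsigned long)k * n;
}

/* Speedup = sequential / pipelined; it approaches k as n grows. */
double speedup(unsigned k, unsigned long n) {
    if (n == 0)
        return 1.0;
    return (double)sequential_cycles(k, n) / (double)pipelined_cycles(k, n);
}
```

Hazards, discussed next, add stall cycles on top of this ideal count, which is why real speedup stays below k.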
Timing Diagram for Instruction Pipeline Operation
Pipeline Hazards
• Pipeline, or some portion of pipeline, must stall
• Also called pipeline bubble
• Types of hazards
—Resource
—Data
—Control
Resource Hazards
• Two (or more) instructions in pipeline need same resource
• Executed in serial rather than parallel for part of pipeline
• Also called structural hazard
• E.g. assume simplified five-stage pipeline
—Each stage takes one clock cycle
—Ideal case is new instruction enters pipeline each clock cycle
—Assume main memory has single port
—Assume instruction fetches and data reads and writes performed one at a time
—Ignore the cache
—Operand read or write cannot be performed in parallel with instruction fetch
—If source operand of I1 is in memory, fetch instruction stage must idle for one cycle before fetching I3
• E.g. multiple instructions ready to enter execute instruction phase
—Single ALU
• One solution: increase available resources
—Multiple main memory ports
—Multiple ALUs
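The single-port memory example can be checked with a small scheduler. This sketch assumes the five-stage pipeline FI, DI, FO, EI, WO with one cycle per stage, that an instruction with a memory operand uses the port in FO (two cycles after its FI), and that no other stage ever stalls. `schedule_fetches` is a hypothetical name.

```c
#include <stdbool.h>

#define MAX_CYCLES 128 /* n must stay well below MAX_CYCLES / 2 */

/* Single memory port: FI cannot share a cycle with an FO that reads memory.
   Writes the (1-based) cycle in which each instruction is fetched. */
void schedule_fetches(const bool mem_operand[], int n, int fetch_cycle[]) {
    bool port_busy[MAX_CYCLES] = {false};
    int next = 1; /* ideal case: one new instruction per cycle */
    for (int i = 0; i < n; i++) {
        int c = next;
        while (port_busy[c])
            c++;                     /* FI idles until the port is free */
        fetch_cycle[i] = c;
        port_busy[c] = true;         /* FI uses the port in cycle c */
        if (mem_operand[i])
            port_busy[c + 2] = true; /* FO uses it two cycles later */
        next = c + 1;
    }
}
```

With a memory operand only for I1, the fetches land in cycles 1, 2, 4, 5: I1's operand read occupies cycle 3, so fetching I3 slips by one cycle, matching the slide.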
Data Hazards
• Conflict in access of an operand location
• Two instructions to be executed in sequence
• Both access a particular memory or register operand
• If in strict sequence, no problem occurs
• If in a pipeline, operand value could be updated so as to produce different result from strict sequential execution
• E.g. x86 machine instruction sequence:
—ADD EAX, EBX /* EAX = EAX + EBX */
—SUB ECX, EAX /* ECX = ECX - EAX */
Types of Data Hazard
• Read after write (RAW), or true dependency
—An instruction modifies a register or memory location
—Succeeding instruction reads data in that location
—Hazard if read takes place before write completes
• Write after read (WAR), or antidependency
—An instruction reads a register or memory location
—Succeeding instruction writes to location
—Hazard if write completes before read takes place
• Write after write (WAW), or output dependency
—Two instructions both write to same location
—Hazard if writes take place in reverse of intended order
• Previous example is RAW hazard
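The three hazard types can be detected mechanically from the set of locations each instruction reads and the set it writes. This sketch assumes a hypothetical encoding of registers as bits in a mask; `Instr` and `classify` are illustrative names, and a pair of instructions may carry more than one kind of dependency.

```c
/* Registers as bits in a set (hypothetical encoding for illustration). */
enum { EAX = 1u << 0, EBX = 1u << 1, ECX = 1u << 2, EDX = 1u << 3 };

/* Hazard kinds, combinable as bit flags. */
enum { NO_HAZARD = 0, RAW = 1u << 0, WAR = 1u << 1, WAW = 1u << 2 };

typedef struct {
    unsigned reads;  /* set of locations the instruction reads */
    unsigned writes; /* set of locations the instruction writes */
} Instr;

/* Dependencies of `later` on `earlier`, in program order. */
unsigned classify(Instr earlier, Instr later) {
    unsigned h = NO_HAZARD;
    if (earlier.writes & later.reads)
        h |= RAW; /* true dependency */
    if (earlier.reads & later.writes)
        h |= WAR; /* antidependency */
    if (earlier.writes & later.writes)
        h |= WAW; /* output dependency */
    return h;
}
```

For example, `ADD EAX, EBX` (reads EAX and EBX, writes EAX) followed by `SUB ECX, EAX` (reads ECX and EAX, writes ECX) yields only RAW, because the SUB reads the EAX the ADD writes.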
Control Hazard
• Also known as branch hazard
• Pipeline makes wrong decision on branch prediction
• Brings instructions into pipeline that must subsequently be discarded
• Dealing with Branches
—Multiple Streams
—Prefetch Branch Target
—Loop buffer
—Branch prediction
—Delayed branching
Multiple Streams
• Have two pipelines
• Prefetch each branch into a separate pipeline
• Use appropriate pipeline
• Leads to bus & register contention
Prefetch Branch Target
• Target of branch is prefetched in addition to instructions following branch
• Keep target until branch is executed
Loop Buffer
• Very fast memory
• Maintained by fetch stage of pipeline
• Check buffer before fetching from memory
• Very good for small loops or jumps
• cf. cache
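The loop buffer idea can be sketched as a tiny store of the most recently fetched instructions that the fetch stage checks before going to memory. This sketch assumes a hypothetical 8-entry buffer, word addresses, and oldest-first replacement; `LoopBuffer` and `lb_fetch` are illustrative names. Unlike a general cache it only ever holds the latest run of fetched instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define LB_ENTRIES 8 /* assumed capacity: 8 instructions */

/* Loop buffer: the most recently fetched instructions, in order. */
typedef struct {
    uint32_t addr[LB_ENTRIES];
    uint32_t instr[LB_ENTRIES];
    int count;          /* number of valid entries */
    int next;           /* slot to overwrite next (oldest when full) */
    unsigned mem_reads; /* fetches that had to go to main memory */
} LoopBuffer;

/* Fetch stage: check the buffer first, fall back to main memory. */
uint32_t lb_fetch(LoopBuffer *lb, const uint32_t *mem, uint32_t a) {
    for (int i = 0; i < lb->count; i++)
        if (lb->addr[i] == a)
            return lb->instr[i]; /* hit: no memory access */
    uint32_t ins = mem[a];       /* miss: read main memory */
    lb->mem_reads++;
    lb->addr[lb->next] = a;      /* remember it, evicting the oldest */
    lb->instr[lb->next] = ins;
    lb->next = (lb->next + 1) % LB_ENTRIES;
    if (lb->count < LB_ENTRIES)
        lb->count++;
    return ins;
}
```

A 4-instruction loop run three times costs only 4 memory reads, while a 10-instruction loop overflows the 8 entries and every fetch misses, which is why the buffer suits small loops.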
Branch Prediction (1)
• Predict never taken
—Assume that jump will not happen
—Always fetch next instruction
—68020 & VAX 11/780
—VAX will not prefetch after branch if a page fault would result (O/S v CPU design)
• Predict always taken
—Assume that jump will happen
Branch Prediction (2)
• Predict by Opcode
—Some instructions are more likely to result in a jump than others
—Can get up to 75% success
• Taken/Not taken switch
—Based on previous history
—Good for loops
—Refined by two-level or correlation-based branch history
• Correlation-based
—In loop-closing branches, history is good predictor
—In more complex structures, branch direction correlates with that of related branches
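One common way to build the taken/not taken switch is a 2-bit saturating counter per branch, so a single atypical outcome such as a loop exit does not immediately flip the prediction. The sketch below is one such predictor; the state encoding and the names `Predictor`, `predict`, `update` and `count_correct` are assumptions, not a particular processor's design.

```c
#include <stdbool.h>

/* 2-bit saturating counter: states 0,1 predict not taken; 2,3 taken. */
typedef struct { unsigned state; } Predictor;

bool predict(const Predictor *p) {
    return p->state >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
void update(Predictor *p, bool taken) {
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}

/* Run a branch outcome history through the predictor; count hits. */
int count_correct(Predictor *p, const bool *history, int n) {
    int correct = 0;
    for (int i = 0; i < n; i++) {
        if (predict(p) == history[i])
            correct++;
        update(p, history[i]);
    }
    return correct;
}
```

For a loop-closing branch taken four times then not taken, repeated twice, starting in state 3, the predictor misses only the two loop exits (8 of 10 correct); a single-bit switch would also miss the first iteration after each exit.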
Branch Prediction (3)
• Delayed Branch
—Do not take jump until you have to
Intel 80486 Pipelining
• Fetch
—From cache or external memory
—Put in one of two 16-byte prefetch buffers
—Fill buffer with new data as soon as old data consumed
—Average 5 instructions fetched per load
—Independent of other stages to keep buffers full
• Decode stage 1
—Opcode & address-mode info
—At most first 3 bytes of instruction
—Can direct D2 stage to get rest of instruction
• Decode stage 2
—Expand opcode into control signals
—Computation of complex address modes
• Execute
—ALU operations, cache access, register update
• Writeback
—Update registers & flags
MMX Register Mapping
• MMX uses several 64-bit data types
• Use 3-bit register address fields
—8 registers
• No MMX-specific registers
—Aliasing to lower 64 bits of existing floating point registers
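The aliasing can be pictured as two views of the same eight physical registers. This sketch models each 80-bit floating point register as a 64-bit significand plus a 16-bit sign/exponent field and lets the 3-bit MMX register field select one of them. It ignores the other side effects real hardware has on MMX writes (upper bits, tag word); `FpReg`, `fpu`, `mmx_write` and `mmx_read` are hypothetical names.

```c
#include <stdint.h>

/* 80-bit floating point register, modelled as two fields. */
typedef struct {
    uint64_t significand; /* bits 0-63: also visible as an MMX register */
    uint16_t sign_exp;    /* bits 64-79: not part of the MMX view */
} FpReg;

FpReg fpu[8]; /* eight physical registers shared by FP and MMX code */

/* A 3-bit register field can name only 8 registers, MM0-MM7. */
void mmx_write(unsigned reg_field, uint64_t value) {
    fpu[reg_field & 7u].significand = value; /* alias: low 64 bits */
}

uint64_t mmx_read(unsigned reg_field) {
    return fpu[reg_field & 7u].significand;
}
```

Because there are no separate MMX registers, mixing MMX and floating point code overwrites shared state, which is why software must switch carefully between the two.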
Pentium Interrupt Processing
• Interrupts
—Maskable
—Nonmaskable
• Exceptions
—Processor detected
—Programmed
• Interrupt vector table
—Each interrupt type assigned a number
—Index to vector table
—256 * 32-bit interrupt vectors
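With 256 vectors of 32 bits each, the interrupt type number indexes the table directly. This arithmetic sketch assumes the table is stored contiguously from address 0, as in x86 real-address mode; `vector_offset` and `table_size` are hypothetical names.

```c
#include <stdint.h>

#define NUM_VECTORS 256u
#define VECTOR_BYTES 4u /* one 32-bit vector */

/* Byte address of vector n, assuming a contiguous table at address 0. */
uint32_t vector_offset(uint8_t n) {
    return (uint32_t)n * VECTOR_BYTES;
}

/* Whole table: 256 * 4 = 1024 bytes. */
uint32_t table_size(void) {
    return NUM_VECTORS * VECTOR_BYTES;
}
```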
ARM Attributes
• RISC
• Moderate array of uniform registers
—More than most CISC, less than many RISC
• Load/store model
—Operations performed on operands in registers only
• Uniform fixed-length instructions
—32 bits standard set, 16 bits Thumb
• Shift or rotation can preprocess source registers
—Separate ALU and shifter units
• Small number of addressing modes
—All load/store addresses from registers and instruction fields
—No indirect or indexed addressing involving values in memory
• Auto-increment and auto-decrement addressing
—Improve loops
• Conditional execution of instructions minimizes need for conditional branch instructions
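Conditional execution means every ARM instruction carries a 4-bit condition field tested against the N, Z, C, V flags. The sketch below evaluates the standard conditions and uses them to model a branch-free maximum (`CMP a, b` then `MOVLT a, b`). In the C the `if` stands in for the hardware's condition check; on ARM the MOV would simply be annulled. `condition_passed` and `max_conditional` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition field encodings 0-14 (instruction bits 31-28). */
enum Cond { EQ, NE, CS, CC, MI, PL, VS, VC, HI, LS, GE, LT, GT, LE, AL };

/* Does an instruction with condition c execute, given N, Z, C, V? */
bool condition_passed(enum Cond c, bool n, bool z, bool cf, bool v) {
    switch (c) {
    case EQ: return z;
    case NE: return !z;
    case CS: return cf;
    case CC: return !cf;
    case MI: return n;
    case PL: return !n;
    case VS: return v;
    case VC: return !v;
    case HI: return cf && !z;
    case LS: return !cf || z;
    case GE: return n == v;
    case LT: return n != v;
    case GT: return !z && n == v;
    case LE: return z || n != v;
    default: return true; /* AL: always */
    }
}

/* Model of: CMP a, b ; MOVLT a, b  (no branch instruction needed). */
int32_t max_conditional(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                         /* CMP computes a - b */
    bool n = (r >> 31) != 0;
    bool z = (r == 0);
    bool c = ua >= ub;                            /* ARM: C = NOT borrow */
    bool v = (((ua ^ ub) & (ua ^ r)) >> 31) != 0; /* signed overflow */
    if (condition_passed(LT, n, z, c, v))         /* MOVLT a, b */
        a = b;
    return a;
}
```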
ARM Processor Organization
• Many variations depending on ARM version
• Data exchanged between processor and memory through data bus
• Data item (load/store) or instruction (fetch)
• Instructions go through decoder before execution
• Pipeline and control signal generation in control unit
• Data goes to register file
—Set of 32-bit registers
—Byte & halfword two's complement data sign extended
• Typically two source and one result register
ARM Processor Modes
• User
• Privileged
—6 modes
– OS can tailor systems software use
– Some registers dedicated to each privileged mode
– Swifter context changes
• Exception
—5 of the privileged modes
—Entered on given exceptions
—Substitute some registers for user registers
Privileged Modes
• Supervisor mode
—Software interrupt used to invoke operating system services
• Abort mode
—Entered in response to memory faults
• Fast interrupt mode
—Interrupt signal from designated fast interrupt source
—Fast interrupt cannot be interrupted
—May interrupt normal interrupt
• Interrupt mode
—Interrupt signal from any other interrupt source
ARM general-purpose registers by processor mode
User       System     Supervisor  Abort      Undefined  Interrupt  Fast Interrupt
R0         R0         R0          R0         R0         R0         R0
R1-R7      R1-R7      R1-R7       R1-R7      R1-R7      R1-R7      R1-R7
R8         R8         R8          R8         R8         R8         R8_fiq
R9         R9         R9          R9         R9         R9         R9_fiq
R10        R10        R10         R10        R10        R10        R10_fiq
R11        R11        R11         R11        R11        R11        R11_fiq
R12        R12        R12         R12        R12        R12        R12_fiq
R13 (SP)   R13 (SP)   R13_svc     R13_abt    R13_und    R13_irq    R13_fiq
R14 (LR)   R14 (LR)   R14_svc     R14_abt    R14_und    R14_irq    R14_fiq
R15 (PC)   R15 (PC)   R15 (PC)    R15 (PC)   R15 (PC)   R15 (PC)   R15 (PC)
ARM Register Organization
• 37 x 32-bit registers
• 31 general-purpose registers
—Some have special purposes
—E.g. program counter
• Six program status registers
• Registers in partially overlapping banks
—Processor mode determines bank
• 16 numbered registers and one or two program status registers visible at any time
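Banking can be expressed as a map from (mode, register number) to a physical register. This sketch assumes one particular numbering of the 31 physical general-purpose registers (User/System R0-R15 first, then the Fast Interrupt bank R8-R14, then R13/R14 for the other four exception modes); the numbering and the names `Mode`, `physical_reg` and `count_physical` are hypothetical. Counting distinct targets reproduces the 31 general-purpose registers; adding the CPSR and five SPSRs gives 37.

```c
#include <stdbool.h>

enum Mode { USR, SYS, SVC, ABT, UND, IRQ, FIQ, NUM_MODES };

/* Hypothetical physical numbering:
     0-15  : R0-R15 as seen in User and System mode
     16-22 : R8_fiq-R14_fiq
     23-30 : R13/R14 for SVC, ABT, UND, IRQ (two each) */
int physical_reg(enum Mode m, int r) {
    if (m == FIQ && r >= 8 && r <= 14)
        return 16 + (r - 8);
    if ((m == SVC || m == ABT || m == UND || m == IRQ) && (r == 13 || r == 14))
        return 23 + 2 * (m - SVC) + (r - 13);
    return r; /* shared with User mode, including R15 (PC) */
}

/* Count distinct physical registers reachable from any mode. */
int count_physical(void) {
    bool used[31] = {false};
    int n = 0;
    for (int m = 0; m < NUM_MODES; m++)
        for (int r = 0; r < 16; r++) {
            int p = physical_reg((enum Mode)m, r);
            if (!used[p]) {
                used[p] = true;
                n++;
            }
        }
    return n;
}
```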
General Register Usage
• R13 normally stack pointer (SP)
—Each exception mode has its own R13
• R14 link register (LR)
—Subroutine and exception mode return address
CPSR
• CPSR: current program status register
—Exception modes have dedicated SPSR (saved program status register)
• 16 msb are user flags
—Condition codes (N, Z, C, V)
—Q – overflow or saturation in some SIMD instructions
—J – Jazelle (8-bit) instructions
—GE[3:0] – SIMD instructions use bits [19:16] as greater-than-or-equal flags
• 16 lsb system flags for privilege modes
—E – endian
—I, F – interrupt disable (IRQ, FIQ)
—T – Normal or Thumb instruction
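The CPSR fields can be pulled out with shifts and masks. This sketch assumes the ARMv6-style layout (N, Z, C, V in bits 31-28, Q bit 27, J bit 24, GE bits 19-16, E bit 9, I bit 7, F bit 6, T bit 5, mode bits 4-0); `field`, `Cpsr` and `decode_cpsr` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive); assumes the field is under 32 bits wide. */
uint32_t field(uint32_t x, unsigned hi, unsigned lo) {
    return (x >> lo) & ((1u << (hi - lo + 1u)) - 1u);
}

typedef struct {
    bool n, z, c, v; /* condition codes, bits 31..28 */
    bool q;          /* bit 27: overflow/saturation in some SIMD instructions */
    bool j;          /* bit 24: Jazelle state */
    unsigned ge;     /* bits 19..16: SIMD greater-than-or-equal flags */
    bool e;          /* bit 9: data endianness (1 = big-endian) */
    bool i, f;       /* bits 7, 6: IRQ and FIQ disable */
    bool t;          /* bit 5: Thumb state */
    unsigned mode;   /* bits 4..0: processor mode */
} Cpsr;

Cpsr decode_cpsr(uint32_t x) {
    Cpsr s;
    s.n = field(x, 31, 31);
    s.z = field(x, 30, 30);
    s.c = field(x, 29, 29);
    s.v = field(x, 28, 28);
    s.q = field(x, 27, 27);
    s.j = field(x, 24, 24);
    s.ge = field(x, 19, 16);
    s.e = field(x, 9, 9);
    s.i = field(x, 7, 7);
    s.f = field(x, 6, 6);
    s.t = field(x, 5, 5);
    s.mode = field(x, 4, 0);
    return s;
}
```

For example, 0x600001D3 decodes to Z and C set, IRQ and FIQ disabled, ARM (not Thumb) state, and mode bits 0x13 (Supervisor).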
ARM Interrupt (Exception) Processing
• More than one exception allowed
• Seven types
• Execution forced from exception vectors
• Multiple exceptions handled in priority order
• Processor halts execution after current instruction
• Processor state preserved in SPSR for exception
—Address of instruction about to execute put in link register
—Return by moving SPSR to CPSR and R14 to PC
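Exception entry and return can be sketched as a handful of register moves. This simplified model assumes a single banked LR and SPSR for the mode being entered, passes the return address in explicitly (the exact offset stored in LR varies by exception type), and uses the IRQ vector at 0x18 in the test. `ArmState`, `take_exception` and `return_from_exception` are hypothetical names.

```c
#include <stdint.h>

/* Mode field values for CPSR bits 4..0. */
enum { MODE_USR = 0x10, MODE_FIQ = 0x11, MODE_IRQ = 0x12, MODE_SVC = 0x13,
       MODE_ABT = 0x17, MODE_UND = 0x1B };

typedef struct {
    uint32_t pc;   /* R15 */
    uint32_t lr;   /* banked R14 of the mode being entered (simplified) */
    uint32_t cpsr; /* current program status register */
    uint32_t spsr; /* saved program status register of that mode */
} ArmState;

/* Entry: save CPSR in SPSR, return address in LR, switch mode, disable
   IRQs, and force execution from the exception vector. (FIQ entry would
   also set the F bit; omitted here.) */
void take_exception(ArmState *s, uint32_t new_mode, uint32_t vector,
                    uint32_t return_addr) {
    s->spsr = s->cpsr;
    s->lr = return_addr;
    s->cpsr = (s->cpsr & ~0x1Fu) | new_mode; /* mode bits M[4:0] */
    s->cpsr |= 1u << 7;                      /* I bit: IRQ disabled */
    s->pc = vector;
}

/* Return: move SPSR back to CPSR and R14 to PC. */
void return_from_exception(ArmState *s) {
    s->cpsr = s->spsr;
    s->pc = s->lr;
}
```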
Foreground Reading
• Processor examples
• Stallings Chapter 12