• Tidak ada hasil yang ditemukan

William Stallings Computer Organization and Architecture 7th Edition Chapter 4 Cache Memory Characteristics - Cache Memory

N/A
N/A
Protected

Academic year: 2018

Membagikan "William Stallings Computer Organization and Architecture 7th Edition Chapter 4 Cache Memory Characteristics - Cache Memory"

Copied!
10
0
0

Teks penuh

(1)

Willia m St a llings Com put e r Orga niza t ion a nd Arc hit e c t ure 7 t h Edit ion

Cha pt e r 4 Ca c he M e m ory

Cha ra c t e rist ic s

• Locat ion

• Capacit y

• Unit of t r ansfer

• Access m et hod

• Per for m ance

• Physical t ype

• Physical char act er ist ics

• Or ganisat ion

Loc a t ion

• CPU

• I nt er nal

• Ext er nal

Ca pa c it y

• Wor d size

—The nat ural unit of organisat ion • Num ber of w or ds

—or Byt es

U nit of T ra nsfe r

• I nt er nal

—Usually governed by dat a bus w idt h • Ext er nal

—Usually a block w hich is m uch larger t han a w ord

• Addr essable unit

—Sm allest locat ion w hich can be uniquely addressed

—Word int ernally —Clust er on M$ disks

Ac c e ss M e t hods (1 )

• Sequent ial

—St art at t he beginning and read t hrough in order

—Access t im e depends on locat ion of dat a and previous locat ion

—e. g. t ape • Dir ect

—I ndividual blocks have unique address —Access is by j um ping t o vicinit y plus

sequent ial search

(2)

Ac c e ss M e t hods (2 )

• Random

—I ndividual addresses ident ify locat ions exact ly —Access t im e is independent of locat ion or

previous access —e. g. RAM • Associat ive

—Dat a is locat ed by a com parison w it h cont ent s of a port ion of t he st ore

—Access t im e is independent of locat ion or previous access

—e. g. cache

M e m ory H ie ra rc hy

• Regist er s

—I n CPU

• I nt er nal or Main m em or y

—May include one or m ore levels of cache —“ RAM”

• Ext er nal m em or y

—Backing st ore

M e m ory H ie ra rc hy - Dia gra m Pe rform a nc e

• Access t im e

—Tim e bet w een present ing t he address and get t ing t he valid dat a

• Mem or y Cycle t im e

—Tim e m ay be required for t he m em ory t o “ recover” before next access

—Cycle t im e is access + recovery • Tr ansfer Rat e

—Rat e at w hich dat a can be m oved

Physic a l T ype s

• Sem iconduct or

—RAM

• Magnet ic

—Disk & Tape • Opt ical

—CD & DVD

Physic a l Cha ra c t e rist ic s

• Decay

• Volat ilit y

• Er asable

(3)

Orga nisa t ion

• Physical ar r angem ent of bit s int o w or ds

• Not alw ays obvious

• e.g. int er leaved

T he Bot t om Line

• How m uch?

—Capacit y • How fast ?

—Tim e is m oney • How expensive?

H ie ra rc hy List

• Regist er s

• L1 Cache

• L2 Cache

• Main m em or y

• Disk cache

• Disk

• Opt ical

• Tape

So you w a nt fa st ?

• I t is possible t o build a com put er w hich uses only st at ic RAM ( see lat er )

• This w ould be ver y fast

• This w ould need no cache

—How can you cache cache?

• This w ould cost a ver y lar ge am ount

Loc a lit y of Re fe re nc e

• Dur ing t he cour se of t he execut ion of a pr ogr am , m em or y r efer ences t end t o clust er

• e.g. loops

Ca c he

• Sm all am ount of fast m em or y

• Sit s bet w een nor m al m ain m em or y and CPU

(4)

Ca c he /M a in M e m ory St ruc t ure Ca c he ope ra t ion – ove rvie w

• CPU r equest s cont ent s of m em or y locat ion

• Check cache for t his dat a

• I f pr esent , get fr om cache ( fast )

• I f not pr esent , r ead r equir ed block fr om m ain m em or y t o cache

• Then deliver fr om cache t o CPU

• Cache includes t ags t o ident ify w hich block of m ain m em or y is in each cache slot

Ca c he Re a d Ope ra t ion - Flow c ha rt Ca c he De sign

• Size

• Mapping Funct ion

• Replacem ent Algor it hm

• Wr it e Policy

• Block Size

• Num ber of Caches

Size doe s m a t t e r

• Cost

—More cache is expensive • Speed

—More cache is fast er ( up t o a point ) —Checking cache for dat a t akes t im e

(5)

Com pa rison of Ca c he Size s

Processor Type IntroductionYear of L1 cachea L2 cache L3 cache

IBM 360/85 Mainframe 1968 16 to 32 KB — —

PDP-11/70 Minicomputer 1975 1 KB — —

VAX 11/780 Minicomputer 1978 16 KB — —

IBM 3033 Mainframe 1978 64 KB — —

IBM 3090 Mainframe 1985 128 to 256 KB — —

Intel 80486 PC 1989 8 KB — —

Pentium PC 1993 8 KB/8 KB 256 to 512 KB —

PowerPC 601 PC 1993 32 KB — —

PowerPC 620 PC 1996 32 KB/32 KB — —

PowerPC G4 PC/server 1999 32 KB/32 KB 256 KB to 1 MB 2 MB IBM S/390 G4 Mainframe 1997 32 KB 256 KB 2 MB

IBM S/390 G6 Mainframe 1999 256 KB 8 MB —

Pentium 4 PC/server 2000 8 KB/8 KB 256 KB — IBM SP High-end server/ supercomputer 2000 64 KB/32 KB 8 MB —

CRAY MTAb Supercomputer 2000 8 KB 2 MB —

Itanium PC/server 2001 16 KB/16 KB 96 KB 4 MB SGI Origin 2001 High-end server 2001 32 KB/32 KB 4 MB —

Itanium 2 PC/server 2002 32 KB 256 KB 6 MB

IBM POWER5 High-end server 2003 64 KB 1.9 MB 36 MB CRAY XD-1 Supercomputer 2004 64 KB/64 KB 1MB —

M a pping Func t ion

• Cache of 64kByt e

• Cache block of 4 byt es

—i. e. cache is 16k ( 214) lines of 4 byt es

• 16MByt es m ain m em or y

• 24 bit addr ess

—( 224= 16M)

Dire c t M a pping

• Each block of m ain m em or y m aps t o only one cache line

—i. e. if a block is in cache, it m ust be in one specific place

• Addr ess is in t w o par t s

• Least Significant w bit s ident ify unique w or d

• Most Significant s bit s specify one m em or y block

• The MSBs ar e split int o a cache line field r and a t ag of s- r ( m ost significant )

Dire c t M a pping Addre ss St ruc t ure

Tag s-r Line or Slot r Word w

8 14 2

• 24 bit address

• 2 bit w ord identifier ( 4 byte block)

• 22 bit block identifier

—8 bit t ag ( = 22- 14)

—14 bit slot or line

• No tw o blocks in the sam e line have the sam e Tag field

• Check contents of cache by finding line and checking Tag

Dire c t M a pping Ca c he Line T a ble

• Cache line Main Mem ory blocks held

• 0 0, m , 2m , 3m …2s- m

• 1 1, m + 1, 2m + 1…2s- m + 1

• m - 1 m - 1, 2m - 1, 3m - 1…2s- 1

(6)

Dire c t M a pping

Ex a m ple Dire c t M a pping Sum m a ry

• Addr ess lengt h = ( s + w ) bit s

• Num ber of addr essable unit s = 2s+ w w or ds or byt es

• Block size = line size = 2w w or ds or byt es

• Num ber of blocks in m ain m em or y = 2s+ w / 2w = 2s

• Num ber of lines in cache = m = 2r

• Size of t ag = ( s – r ) bit s

Dire c t M a pping pros & c ons

• Sim ple

• I nexpensive

• Fixed locat ion for given block

—I f a program accesses 2 blocks t hat m ap t o t he sam e line repeat edly, cache m isses are very high

Assoc ia t ive M a pping

• A m ain m em or y block can load int o any line of cache

• Mem or y addr ess is int er pr et ed as t ag and w or d

• Tag uniquely ident ifies block of m em or y

• Ever y line’s t ag is exam ined for a m at ch

• Cache sear ching get s expensive

Fully Assoc ia t ive Ca c he Orga niza t ion

(7)

Tag 22 bit Word2 bit Assoc ia t ive M a pping

Addre ss St ruc t ure

• 22 bit t ag st ored w it h each 32 bit block of dat a • Com pare t ag field w it h t ag ent ry in cache t o

check for hit

• Least significant 2 bit s of address ident ify w hich 16 bit w ord is required from 32 bit dat a block • e. g.

—Address Tag Data Cache line

—FFFFFC FFFFFC24682468 3FFF

Assoc ia t ive M a pping Sum m a ry

• Addr ess lengt h = ( s + w ) bit s

• Num ber of addr essable unit s = 2s+ w w or ds or byt es

• Block size = line size = 2w w or ds or byt es

• Num ber of blocks in m ain m em or y = 2s+ w / 2w = 2s

• Num ber of lines in cache = undet er m ined

• Size of t ag = s bit s

Se t Assoc ia t ive M a pping

• Cache is divided int o a num ber of set s

• Each set cont ains a num ber of lines

• A given block m aps t o any line in a given set

—e. g. Block B can be in any line of set i • e.g. 2 lines per set

—2 w ay associat ive m apping

—A given block can be in one of 2 lines in only one set

Se t Assoc ia t ive M a pping Ex a m ple

• 13 bit set num ber

• Block num ber in m ain m em or y is m odulo 213

• 000000, 00A000, 00B000, 00C000 … m ap t o sam e set

T w o Wa y Se t Assoc ia t ive Ca c he Orga niza t ion

Se t Assoc ia t ive M a pping Addre ss St ruc t ure

• Use set field t o det er m ine cache set t o look in

• Com par e t ag field t o see if w e have a hit

• e.g

—Address Tag Dat a Set

num ber

—1FF 7FFC 1FF 12345678 1FFF

(8)

T w o Wa y Se t

Assoc ia t ive M a pping Ex a m ple

Se t Assoc ia t ive M a pping Sum m a ry

• Addr ess lengt h = ( s + w ) bit s

• Num ber of addr essable unit s = 2s+ w w or ds or byt es

• Block size = line size = 2w w or ds or byt es

• Num ber of blocks in m ain m em or y = 2d

• Num ber of lines in set = k

• Num ber of set s = v = 2d

• Num ber of lines in cache = kv = k * 2d

• Size of t ag = ( s – d) bit s

Re pla c e m e nt Algorit hm s (1 ) Dire c t m a pping

• No choice

• Each block only m aps t o one line

• Replace t hat line

Re pla c e m e nt Algorit hm s (2 ) Assoc ia t ive & Se t Assoc ia t ive

• Har dw ar e im plem ent ed algor it hm ( speed)

• Least Recent ly used ( LRU)

• e.g. in 2 w ay set associat ive

—Which of t he 2 block is lru? • Fir st in fir st out ( FI FO)

—replace block t hat has been in cache longest • Least fr equent ly used

—replace block w hich has had few est hit s • Random

Writ e Polic y

• Must not over w r it e a cache block unless m ain m em or y is up t o dat e

• Mult iple CPUs m ay have individual caches

• I / O m ay addr ess m ain m em or y dir ect ly

Writ e t hrough

• All w r it es go t o m ain m em or y as w ell as cache

• Mult iple CPUs can m onit or m ain m em or y t r affic t o keep local ( t o CPU) cache up t o dat e

(9)

Writ e ba c k

• Updat es init ially m ade in cache only

• Updat e bit for cache slot is set w hen updat e occur s

• I f block is t o be r eplaced, w r it e t o m ain m em or y only if updat e bit is set

• Ot her caches get out of sync

• I / O m ust access m ain m em or y t hr ough cache

• N.B. 15% of m em or y r efer ences ar e w r it es

Pe nt ium 4 Ca c he • 80386 – no on chip cache

• 80486 – 8k using 16 byt e lines and four w ay set associat ive organizat ion

• Pent ium ( all versions) – t w o on chip L1 caches —Data & instructions

• Pent ium I I I – L3 cache added off chip • Pent ium 4

—L1 caches

–8k byt es

–64 byt e lines

–four way set associat ive

—L2 cache

–Feeding bot h L1 caches

–256k

–128 byt e lines

–8 way set associat ive

—L3 cache on chip

I nt e l Ca c he Evolut ion

Problem Solution

Processor on which feature first appears

External memory slower than the system bus. Add external cache using faster memory technology.

386 Increased processor speed results in external bus

becoming a bottleneck for cache access. Move external cache on-chip, operating at the same speed as the processor.

486

Internal cache is rather small, due to limited space on

chip Add external L2 cache using faster technology than main memory

486 Contention occurs when both the Instruction Prefetcher

and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.

Create separate data and instruction caches.

Pentium

Increased processor speed results in external bus becoming a bottleneck for L2 cache access.

Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.

Pentium Pro

Move L2 cache on to the processor chip.

Pentium II Some applications deal with massive databases and

must have rapid access to large amounts of data. The on-chip caches are too small.

Add external L3 cache. Pentium III Move L3 cache on-chip. Pentium 4

Pe nt ium 4 Bloc k Dia gra m

Pe nt ium 4 Core Proc e ssor • Fet ch/ Decode Unit

—Fetches instructions from L2 cache

—Decode into m icro- ops

—Store m icro- ops in L1 cache

• Out of order execut ion logic —Schedules m icro- ops

—Based on data dependence and resources

—May speculativ ely execute

• Execut ion unit s —Execute m icro- ops

—Data from L1 cache

—Results in registers

Pe nt ium 4 De sign Re a soning

• Decodes instructions into RI SC like m icro- ops before L1 cache

• Micro- ops fixed length

—Superscalar pipelining and scheduling

• Pentium instructions long & com plex

• Perform ance im proved by separatin g decoding from scheduling & pipelining

—( More lat er – ch14)

• Data cache is w rite back

—Can be configured t o writ e t hrough

• L1 cache controlled by 2 bits in register

—CD = cache disable

—NW = not writ e t hrough

(10)

Pow e rPC Ca c he Orga niza t ion

• 601 – single 32kb 8 w ay set associat ive

• 603 – 16kb ( 2 x 8kb) t w o w ay set associat ive

• 604 – 32kb

• 620 – 64kb

• G3 & G4

—64kb L1 cache –8 w ay set associative

—256k, 512k or 1M L2 cache –tw o w ay set associative

• G5

—32kB inst ruct ion cache —64kB dat a cache

Pow e rPC G5 Bloc k Dia gra m

I nt e rne t Sourc e s

• Manufact ur er sit es

Referensi

Dokumen terkait

Nilai-nilai tersebut muncul dari acuan moral agama dan budaya, dan oleh karenanya perlu bagi pembuat kebijakan dan hukum untuk mengelaborasi kembali nilai-nilai dalam

Menurut manajer produksi PT Sayuran Siap Saji yaitu Hendro Tavip Nugroho, bahwa perkembangan penggunaan teknologi yang ada saat ini menjadi peluang bagi

(Tuliskan dengan rinci reaksi-reaksi kimia yang terjadi sehingga perubahan fisik dalam hal warna dapat diamati secara kasat mata). Persamaan

Di wilayah Kecamatan Tondo di Kelurahan Mantikolore memiliki Iklim yang panas dan curah hujan yang rendah sehingga mengakibatkan Udara yang kurang bersih akibat debu dan polusi,

Sarjana paling sedikit menguasai konsep teoritis bidang pengetahuan dan keterampilan tertentu secara umum dan konsep teoritis bagian khusus dalam bidang. pengetahuan dan

• Mengidentifikasi pengaruh atau konsekuensi dari Rencana Tata Ruang Wilayah terhadap lingkungan hidup sebagai upaya untuk mendukung proses pengambilan keputusan.. •

Upaya Pengelolaan Lingkungan Hidup dan Upaya Pemantauan Lingkungan Hidup, yang selanjutnya disebut UKL-UPL, adalah pengelolaan dan pemantauan terhadap Usaha

Pada pasien-pasien yang mengalami cedera kepala yang lebih parah,.. prognosisnya jauh