• Tidak ada hasil yang ditemukan

dagstuhl03 memory consistency

N/A
N/A
Protected

Academic year: 2017

Membagikan "dagstuhl03 memory consistency"

Copied!
38
0
0

Teks penuh

(1)

(C) 2003 Mulitfacet Project University of Wisconsin-Madison

Revisiting “Multiprocessors

Should Support Simple

Memory Consistency Models”

Mark D. Hill

Multifacet Project (www.cs.wisc.edu/multifacet)

Computer Sciences Department

University of Wisconsin—Madison

(2)

Wisconsin Multifacet Project

2

Dagstuhl 10/2003

High- vs. Low-Level Memory Model Interface

C w/ Posix

Java w/ Threads

HPF

SW

SW

SW

Multithreaded Hardware

Most of This

Workshop

This

Talk

(3)

Wisconsin Multifacet Project

3

Dagstuhl 10/2003

Outline

• Subroutine Call

– Value Prediction & Memory Model Subtleties

• Review Original Paper [Computer, Dec. 1998] – Commercial Memory Model Classes

– Performance Similarities & Differences – Predictions & Recommendation

• Revisit in 2003

– Revisiting 1998 Predictions – “SC + ILP = RC?” Paper

(4)

(C) 2003 Mulitfacet Project University of Wisconsin-Madison

Correctly Implementing Value Prediction

in Microprocessors that Support

Multithreading or Multiprocessing

Milo M.K. Martin, Daniel J. Sorin, Harold W. Cain,

Mark D. Hill, and Mikko H. Lipasti

Computer Sciences Department

(5)

Wisconsin Multifacet Project

6

Dagstuhl 10/2003

Value Prediction

• Predict the value of an instruction

– Speculatively execute with this value – Later verify that prediction was correct

• Example: Value predict a load that misses in cache

– Execute instructions dependent on value-predicted load – Verify the predicted value when the load data arrives

• Without concurrency: simple verification is OK

– Compare actual value to predicted
(6)

Wisconsin Multifacet Project

7

Dagstuhl 10/2003

Informal Example of Problem, part 1

• Student #2 predicts grades are on bulletin board B

• Based on prediction, assumes score is 60

Grades for Class

Student ID

score

1

75

2 60

3 85

(7)

Wisconsin Multifacet Project

8

Dagstuhl 10/2003

Informal Example of Problem, part 2

• Professor now posts actual grades for this class

– Student #2 actually got a score of 80

• Announces to students that grades are on board B

Grades for Class

Student ID

score

1

75

50

2 60

80

3 85

70

(8)

Wisconsin Multifacet Project

9

Dagstuhl 10/2003

Informal Example of Problem, part 3

• Student #2 sees prof’s announcement and says,

“ I made the right prediction (bulletin board B),

and my score is 60”!

• Actually, Student #2’s score is 80

• What went wrong here?

– Intuition: predicted value from future

• Problem is concurrency

– Interaction between student and professor

(9)

Wisconsin Multifacet Project

10

Dagstuhl 10/2003

Linked List Example of Problem (initial state)

head A

null

42

60

null

A.data

B.data

A.next

B.next

• Linked list with single writer and single reader

• No synchronization (e.g., locks) needed

Initial state of list

(10)

Wisconsin Multifacet Project

11

Dagstuhl 10/2003

Linked List Example of Problem (Writer)

head

B

null

42

80

A

• Writer sets up node B and inserts it into list

A.data

B.data

A.next

B.next

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W4:

store mem[Head] <- B

I

ns

er

t

{

Setup

no

(11)

Wisconsin Multifacet Project

12

Dagstuhl 10/2003

Linked List Example of Problem (Reader)

head

?

null

42

60

null

• Reader cache misses on head and

value predicts head=B.

• Cache hits on B.data and reads 60.

• Later “verifies” prediction of B. Is this execution legal?

A.data

B.data

A.next

B.next

Predict head=B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

= B

(12)

Wisconsin Multifacet Project

13

Dagstuhl 10/2003

Why This Execution Violates SC

• Sequential Consistency

– Simplest memory consistency model – Must exist total order of all operations

– Total order must respect program order at each processor

(13)

Wisconsin Multifacet Project

14

Dagstuhl 10/2003

Trying to Find a Total Order

• What orderings are enforced in this example?

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W4:

store mem[Head] <- B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

R2:

load reg2 <- mem[reg1]

Se tu p no de I ns er t

{

Setup

no

(14)

Wisconsin Multifacet Project

15

Dagstuhl 10/2003

Program Order

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W4:

store mem[Head] <- B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

R2:

load reg2 <- mem[reg1]

Se tu p no de I ns er t

{

(15)

Wisconsin Multifacet Project

16

Dagstuhl 10/2003

Data Order

• If we

predict that R1 returns the value B

, we can violate SC

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W4:

store mem[Head] <- B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

= B

R2:

load reg2 <- mem[reg1]

= 60

(16)

Wisconsin Multifacet Project

17

Dagstuhl 10/2003

Value Prediction and Sequential Consistency

• Key: value prediction reorders dependent operations

– Specifically, read-to-read data dependence order

• Execute dependent operations out of program order

• Applies to almost all consistency models

– Models that enforce data dependence order

• Must detect when this happens and recover

(17)

Wisconsin Multifacet Project

18

Dagstuhl 10/2003

How to Fix SC Implementations

• Address-based detection of violations

– Student watches board B between prediction and verification – Like existing techniques for out-of-order SC processors

– Track stores from other threads

– If address matches speculative load, possible violation

• Value-based detection of violations

– Student checks grade again at verification – Also an existing idea

– Replay all speculative instructions at commit

(18)

Wisconsin Multifacet Project

19

Dagstuhl 10/2003

Relaxed Consistency Models

• Relax some orderings between reads and writes

• Allows HW/SW optimizations

• Software must add

memory barriers

to get ordering

• Intuition: should make value prediction easier

(19)

Wisconsin Multifacet Project

20

Dagstuhl 10/2003

Weakly Ordered Consistency Models

• Relax orderings unless memory barrier between

• Examples:

– SPARC RMO – IA-64

– PowerPC – Alpha

(20)

Wisconsin Multifacet Project

21

Dagstuhl 10/2003

Relaxed Models that Enforce Data Dependence

• Examples: SPARC RMO, PowerPC, and IA-64

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W3b: Memory Barrier

W4:

store mem[Head] <- B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

R2:

load reg2 <- mem[reg1]

Memory barrier orders W4

after W1, W2, W3

I

ns

er

t

{

Setup

no

(21)

Wisconsin Multifacet Project

22

Dagstuhl 10/2003

Violating Consistency Model

• Simple value prediction can break RMO, PPC, IA-64

• How? By relaxing dependence order between reads

(22)

Wisconsin Multifacet Project

23

Dagstuhl 10/2003

Solutions to Problem

1. Don’t enforce dependence order (add memory barriers)

– Changes architecture

– Breaks backward compatibility – Not practical

2. Enforce SC or PC

– Potential performance loss

(23)

Wisconsin Multifacet Project

24

Dagstuhl 10/2003

Models that Don’t Enforce Data Dependence

• Example: Alpha

• Requires extra memory barrier (between R1 & R2)

Code For Writer Thread

W1:

store mem[B.data] <- 80

W2:

load reg0 <- mem[Head]

W3:

store mem[B.next] <- reg0

W3b: Memory Barrier

W4:

store mem[Head] <- B

Code For Reader Thread

R1:

load reg1 <- mem[Head]

R1b: Memory Barrier

R2:

load reg2 <- mem[reg1]

I

ns

er

t

{

Setup

no

(24)

Wisconsin Multifacet Project

25

Dagstuhl 10/2003

Issues in Not Enforcing Data Dependence

• Works correctly with value prediction

– No detection mechanism necessary

– Do not need to add any more memory barriers for VP

• Additional memory barriers

– Non-intuitive locations
(25)

Wisconsin Multifacet Project

26

Dagstuhl 10/2003

Summary of Memory Model Issues

SC

Relaxed

Models

Weakly Ordered

Models

PC

IA-32

SPARC TSO

Enforce

Data Dependence

NOT Enforce

Data Dependence

IA-64

SPARC RMO

(26)

Wisconsin Multifacet Project

27

Dagstuhl 10/2003

Conclusions

Naïve value prediction can violate consistency

Subtle issues for each class of memory model

Solutions for SC & PC require detection mechanism

– Use existing mechanisms for enhancing SC performance
(27)

Wisconsin Multifacet Project

28

Dagstuhl 10/2003

Outline

• Subroutine Call

– Value Prediction & Memory Model Subtleties

• Review Original Paper [Computer, Dec. 1998] – Commercial Memory Model Classes

– Performance Similarities & Differences – Predictions & Recommendation

• Revisit in 2003

– Revisiting 1998 Predictions – “SC + ILP = RC?” Paper

(28)

Wisconsin Multifacet Project

29

Dagstuhl 10/2003

1998 Commercial Memory Model Classes

• Sequential Consistency (SC)

– MIPS/SGI – HP PA-RISC

• Processor Consistency (PC)

– Relax writeread dependencies

– Intel x86 (a.k.a., IA-32) – Sun TSO

• Relaxed Consistency (RC)

– Relax all dependencies, but add fences – DEC Alpha

– IBM PowerPC

(29)

Wisconsin Multifacet Project

30

Dagstuhl 10/2003

With All Models, Hardware Can

• Use

– Coherent Caches

– Non-binding prefetches

– Simultaneous & vertical multithreading

• With Speculative Execution

– Allow “expected” misses to prefetch

– Speculatively perform all reads & writes

(30)

Wisconsin Multifacet Project

31

Dagstuhl 10/2003

Performance Difference

• RC/PC/SC can do same optimzations

• But RC/PC can sometimes commit early

• While SC can lose performance

– Undoing execution on (suspected) model violation – Stalls due to full instruction windows, etc.

• Performance over SC [Ranganathan et al. 1997]

– 11% for PC

– 20% for RC

(31)

Wisconsin Multifacet Project

32

Dagstuhl 10/2003

1998 Predictions & Recommendation

• My Performance Gap Predictions

+ Longer (relative) memory latency

- Larger caches, bigger windows, etc. - New inventions

- My Recommendation

- Implement SC (or PC) - Keep interface simple
(32)

Wisconsin Multifacet Project

33

Dagstuhl 10/2003

Outline

• Subroutine Call

– Value Prediction & Memory Model Subtleties

• Review Original Paper [Computer, Dec. 1998] – Commercial Memory Model Classes

– Performance Similarities & Differences – Predictions & Recommendation

• Revisit in 2003

– Revisiting 1998 Predictions – “SC + ILP = RC?” Paper

(33)

Wisconsin Multifacet Project

34

Dagstuhl 10/2003

Revisiting Predictions

• Evolutionary Predictions

+ Longer (relative) memory latency - Larger caches

- Larger instruction windows.

• New Inventions

- Run-ahead & Helper threads - SMT commercialized

- Chip Multiprocessors (CMPs) - SC + ILP = RC?

Wonderful prefetching

Many threads per processor

Many threads per chip

Can close gap

(34)

Wisconsin Multifacet Project

36

Dagstuhl 10/2003

2003 Commercial Memory Model Classes

• Sequential Consistency (SC)

– MIPS/SGI – HP PA-RISC

• Processor Consistency (PC)

– Relax writeread dependencies

– Intel x86 (a.k.a., IA-32) – Sun TSO

• Relaxed Consistency (RC)

– Relax all dependencies, but add fences – DEC Alpha

– IBM PowerPC

– Sun RMO (no implementations)

(35)

Wisconsin Multifacet Project

37

Dagstuhl 10/2003

Current Analysis

• Architectures changed mostly for business reasons

• No one substantially changed model class

(36)

Wisconsin Multifacet Project

38

Dagstuhl 10/2003

Current Options

• Assume Relaxed HLL model  Three HW Model Options

• Expose SC/PC & Implement SC/PC

– Add SC/PC mechanisms & speculate! (somewhat complex) – HW implementers & verifiers know what correct is

• Expose Relaxed & Implement Relaxed

– Many HW implementers & verifiers don’t understand relaxed – More performance?

– Deep speculation require HW to pass fences

• Run-ahead & throw all away?

• Speculative execution with SC/PC-like mechanisms?

• Expose Relaxed & Implement SC/PC

– Implement fences as no-ops

– Use SC/PC mechanisms, speculate!

(37)

Wisconsin Multifacet Project

39

Dagstuhl 10/2003

Predictions & Recommendation

Predictions

– Longer (relative) memory latency

– Only partially compensated by caches, etc.

– Will speculate further without larger windows (run-ahead) – Will need to speculate past synchronization & fences

– Use CMPs to get many outstanding misses per chip

Recommendations (unrepentant

)

- Implement SC (or PC) - Keep interface simple

(38)

Wisconsin Multifacet Project

40

Dagstuhl 10/2003

Outline

• Subroutine Call

– Value Prediction & Memory Model Subtleties

• Review Original Paper [Computer, Dec. 1998]

– High- vs. Low-Level Memory Models – Commercial Memory Model Classes – Performance Similarities & Differences – Predictions & Recommendation

• Revisit in 2003

– Revisiting 1998 Predictions – “SC + ILP = RC?” Paper

Referensi

Dokumen terkait

Berikut ini link untuk mendownload

Persentase pencari keadilan golongan tertentu yang mendapat layanan bantuan hukum (Posbakum) 20% 4 Terwujudnya pelayanan prima bagi masyarakat pencari keadilan Meningkatnya

Given the range of responses elicited via the different approaches to questions posed in interviews conducted at the end of the first year of program implementation, in the

Menyusun rencana pembangunan dan pengembangan teknologi informasi di sebuah institusi 

Pasar dengan volume perdagangan saham yang relatif kecil menyulitkan investor untuk bereaksi terhadap informasi yang baru dan menyebabkan tidak terjadinya perbedaan volume

Berdasarkan hasil pengujian asumsi klasik, uji analisis regresi, dan pembahasan, maka dapat diketahui hasil pengujian terhadap hipotesis penelitian, yaitu kinerja keuangan

Menentukan wawasan kedepan yang didasarkan atas pertimbangan potensi, kendala, peluang dan ancaman yang menuntut kita harus lebih efektif dan efisiean. dalam bertindak,

Untuk mengetahui hal tersebut, dengan suatu metode pengambilan keputusan multikriteria dengan memecahkan situasi kompleks dan tidak terstruktur kedalam bagian-bagian dan