• Tidak ada hasil yang ditemukan

Valor: Low-Overhead Region Conflict Detection in Software

N/A
N/A
Protected

Academic year: 2024

Membagikan "Valor: Low-Overhead Region Conflict Detection in Software"

Copied!
90
0
0

Teks penuh

(1)

Swarnendu Biswas

, Minjia Zhang, and Michael D. Bond

Ohio State University

Brandon Lucia

Carnegie Mellon University

OOPSLA 2015

Valor: Efficient, Software-Only Region Conflict

Exceptions

(2)

A Simple C++ Program

x = new X();

done = true;

if (done) { x->func();

} X* x = NULL;

bool done= false;

Thread T1 Thread T2

(3)

A Simple C++ Program

x = new X();

done = true;

if (done) { x->func();

}

X* x = NULL;

bool done= false;

Thread T1 Thread T2

(4)

Data Races

x = new X();

done = true;

if (done) { x->func();

} X* x = NULL;

bool done= false;

Thread T1

Data race

Thread T2

(5)

Data Races are Evil

Lack of semantic guarantees make software unsafe

No semantic guarantees

Challenging to reason about correctness for racy executions

Complicates language specifications

Leads to atomicity, order or sequential consistency violations

Indicates other concurrency errors

(6)

Catch-Fire Semantics

x = new X();

done = true;

if (done) { x->func();

} X* x = NULL;

bool done= false;

Thread T1 Thread T2

C++ treats data races as errors

(7)

Catch-Fire Semantics

x = new X();

done = true;

if (done) { x->func();

}

X* x = NULL;

bool done= false;

Thread T1 Thread T2

C++ treats data races as errors

(8)

Catch-Fire Semantics

x = new X();

done = true;

if (done) { x->func();

}

X* x = NULL;

bool done= false;

Thread T1 Thread T2

C++ treats data races as errors

(9)

A Java Example

X = new Object();

done = true;

while (!done) {}

X.compute();

Thread T1 Thread T2

(10)

A Java Example

X = new Object();

done = true;

while (!done) {}

X.compute();

Thread T1 Thread T2

Java tries to assign semantics,

which are unsatisfactory

(11)

C++ and Java Memory Models

Data-race-free

execution Strong semantics

(12)

C++ and Java Memory Models

Data-race-free

execution Strong semantics

Execution is sequentially consistent

(13)

C++ and Java Memory Models

Data-race-free

execution Strong semantics

Execution is sequentially consistent

Synchronization-free regions execute atomically

(14)

C++ and Java Memory Models

Data-race-free

execution Strong semantics

Execution is sequentially consistent

Synchronization-free regions execute atomically

lock(l)

lock(m)

unlock(m)

unlock(l)

SFR

(15)

C++ and Java Memory Models

But what about data

races?

(16)

C++ and Java Memory Models

Racy execution

???

But what about data

races?

(17)

Need for Stronger Memory Models

Adve and Boehm, CACM 2010

“The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in

the foundation of our languages and systems.”

(18)

Need for Stronger Memory Models

Adve and Boehm, CACM 2010

“The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in

the foundation of our languages and systems.”

“We call upon software and hardware communities to develop languages and systems that enforce data-race-freedom, ...”

(19)

• Programming language memory models and data races

• Data race and region conflict exceptions model

• Valor: Our contribution

• Evaluation

Outline

(20)

Data Race

Thread T1 Thread T2

X = new Object();

done = true;

while (!done) {}

X.compute();

(21)

Data Race Exceptions

Thread T1 Thread T2

X = new Object();

done = true;

while (!done) {}

X.compute();

EXCEPTION

(22)

REGION CONFLICT EXCEPTIONS MODEL

(23)

Region Conflict

Thread T1 Thread T2

(24)

Region Conflict

wr x

Thread T1 Thread T2

(25)

Region Conflict

rd/wr x wr x

Thread T1 Thread T2

(26)

Region Conflict

rd/wr x wr x

Thread T1

Conflict

Thread T2

(27)

Region Conflict

rd/wr x wr x

Thread T1

Conflict

Thread T2

Reports a subset of true data races that violate region serializability

(28)

Execution Models

rd/wr x wr x

Thread T1

Conflict

Thread T2

Reports a subset of true data races that violate region serializability

sequentially consistent

region-conflict- free

region

serializable

data-race-free

(29)

Region Conflict Exception Model

Develop a practical region conflict detection technique

GOAL

(30)

Region Conflict Detection

Hardware customizations required for good performance

 Limited by resources and applicability

o Needs extensive modifications and is unscalable1 o Detects serializability violations of bounded regions2

1. Lucia et al. Conflict Exceptions: Simplifying Concurrent Language Semantics With Precise Hardware Exceptions for Data-Races. ISCA 2010.

2. Marino et al. DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. PLDI 2010.

(31)

• Programming language memory models and data races

• Data race and region conflict exceptions model

• Valor: Our contribution

• Evaluation

Outline

(32)

Valor: Efficient, Software-Only Region Conflict Detector

Elides tracking last readers, only tracks last writer

Detect read-write conflicts lazily

(33)

Tracking last writers

Detecting read-write conflicts lazily

Impact of lazy conflict detection

(34)

Tracking Last Writer

(35)

Per-Variable Metadata

j@T1

Epoch

Has an ongoing region updated x?

Thread T1 j

wr x

(36)

Tracking Last Writer

Thread T1 Thread T2

x

p@T0
(37)

Tracking Last Writer

Thread T1 Thread T2

j

wr x

x

p@T0
(38)

Tracking Last Writer

wr x

Thread T1 j

Track last writer Thread T2

x

p@T0
(39)

Tracking Last Writer

wr x

Thread T1 j

Track last writer Thread T2

update metadata

x

j@T1
(40)

Tracking Last Writer

rd/wr x wr x

Thread T1 j

Track last writer Thread T2

x

j@T1
(41)

Tracking Last Writer

rd/wr x wr x

Thread T1 j

Conflict

Track last writer Thread T2

x

j@T1
(42)

Tracking Last Writer

rd/wr x wr x

Thread T1 j

Conflict

Track last writer

Allows precisely detecting write-write and write-read conflicts

Thread T2

x

j@T1
(43)

Tracking last writers

Detecting read-write conflicts lazily

Impact of lazy conflict detection

(44)

Detecting Read-Write Conflicts

rd x

Thread T1

wr x

Thread T2

(45)

Detecting Read-Write Conflicts

rd x

Thread T1 j

Thread T2

x

p@T0
(46)

Detecting Read-Write Conflicts

rd x

Thread T1 j

Thread T2

update read metadata

x

j@T1
(47)

Detecting Read-Write Conflicts

rd x

Thread T1 j

wr x

Thread T2

x

j@T1
(48)

Detecting Read-Write Conflicts

rd x

Thread T1 j

wr x

Conflict

Thread T2

x

j@T1
(49)

Detecting Read-Write Conflicts

rd x

Thread T1 j

wr x

Conflict

Thread T2

This simple mechanism used in prior work has problems

x

j@T1
(50)

Remote Cache Misses Due to Tracking of Metadata

Thread T1

rd x

j

Leads to remote cache misses

Write operation

update metadata

(51)

Metadata Updates

Thread T1

rd/wr x

j

(52)

Metadata Updates

Thread T1

rd/wr x

j

update metadata

(53)

Synchronization on Metadata Updates

Thread T1

rd/wr x lock l

j

(54)

Synchronization on Metadata Updates

Thread T1

rd/wr x lock l

j

update metadata

(55)

Synchronization on Metadata Updates

Thread T1

rd/wr x unlock l

lock l

j

update metadata

(56)

Synchronization on Metadata Updates

Thread T1

rd/wr x unlock l

lock l

j

Bad for mostly read-only data

update metadata

(57)

Elide Tracking Last Readers

(58)

Elide Tracking Last Readers

rd x

Thread T1

wr x

Thread T2

?

(59)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

wr x

(60)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

 Log read accesses in thread- local buffers

 Validate reads at region boundaries

1. Saha et al. McRT-STM: A High Performance Software Transactional Memory System for a Multi-Core Runtime. PPoPP 2006.

wr x

(61)

Per-Variable Metadata

<v, j@T1>

Version, Epoch

Has an ongoing region updated x?

Thread T1 j

wr x

(62)

Elide Tracking Last Readers

rd x

Thread T1 j

Thread T2 Detect read-write conflicts lazily

write metadata

 Log read accesses in thread- local buffers

 Validate reads at region boundaries

x

<v, p@T0>
(63)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v>

 Log read accesses in thread- local buffers

 Validate reads at region boundaries

j

x

<v, p@T0>
(64)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v> k

j

wr x

x

<v, p@T0>
(65)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v> k

j

wr x

x

<v+1, k@T2>
(66)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v>

read validation j k

wr x

x

<v+1, k@T2>
(67)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v>

read validation

T1 is not the last writer

Version read is outdated

j k

wr x

x

<v+1, k@T2>
(68)

Elide Tracking Last Readers

rd x

Thread T1 Thread T2 Detect read-write conflicts lazily

log <x, v>

read validation

T1 is not the last writer

Version read is outdated

Conflict

j k

wr x

x

<v+1, k@T2>
(69)

Elide Tracking Last Readers

Avoids

• Remote cache misses

• Synchronization overhead

(70)

Tracking last writers

Detecting read-write conflicts lazily

Impact of lazy conflict detection

(71)

Precise Conflict Detection

Thread T1

rd x

Thread T2

wr x

Conflict

(72)

Precise vs Lazy Conflict Detection

Thread T1

rd x

Thread T2

wr x

Conflict

Thread T1

rd x

Thread T2

wr x

read validation

Conflict

delayed exception

(73)

Precise vs Lazy Conflict Detection

Thread T1

rd x

Thread T2

wr x

Conflict

Thread T1

rd x

Thread T2

wr x

read validation

Conflict

delayed exception

Delayed exceptions

(74)

Delayed Exceptions

Thread T1

rd x

Thread T2

wr x

Conflict

Thread T1

rd x

Thread T2

wr x

read validation

Conflict

delayed exception

Delayed exceptions

Do not compromise semantic guarantees

Effects should not be externally visible

(75)

Delayed Exceptions

Thread T1

rd x

Thread T2

wr x

Conflict

Thread T1

rd x

Thread T2

wr x

read validation

Conflict

delayed exception

Debugging might be slightly harder

Exception will be thrown at the next

region boundary from the reader thread

But does not compromise on soundness

and precision

(76)

• Programming language memory models and data races

• Data race and region conflict exceptions model

• Valor: Our contribution

• Evaluation

Outline

(77)

IMPLEMENTATION

(78)

Implementation

Developed on top of Jikes RVM 3.1.3

(79)

Implementation

1. Flanagan and Freund. FastTrack: Efficient and Precise Dynamic Data Race Detection. PLDI 2009.

Developed on top of Jikes RVM 3.1.3

Implemented FastTrack, state-of-art happens-before analysis based data

race detector

(80)

Implementation

Developed on top of Jikes RVM 3.1.3

Implemented FastTrack, state-of-art happens-before analysis based data

race detector

Shared on Jikes RVM Research Archive and ACM DL

(81)

EVALUATION

(82)

• Benchmarks

▫ Large workload sizes of DaCapo 2006 and 9.12-bach suite

▫ Fixed-workload versions of SPECjbb2000 and SPECjbb2005

• Platform

▫ 64-core AMD Opteron 6272

Experimental Methodology

(83)

Performance Comparison

1006

342

99

0 200 400 600 800 1000

eclipse6 hsqldb6 lusearch6 xalan6 avrora9 jython9 luindex9 lusearch9 pmd9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean

Overheads (%) Over Unmodified Jikes RVM

FastTrack Valor

(84)

Performance Comparison

1006

342 267

99

0 200 400 600 800 1000

eclipse6 hsqldb6 lusearch6 xalan6 avrora9 jython9 luindex9 lusearch9 pmd9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean

Overheads (%) Over Unmodified Jikes RVM

FastTrack Eager conflict detection Valor

(85)

Performance Comparison

1006

342 267

99

0 200 400 600 800 1000

eclipse6 hsqldb6 lusearch6 xalan6 avrora9 jython9 luindex9 lusearch9 pmd9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean

Overheads (%) Over Unmodified Jikes RVM

FastTrack Eager conflict detection Valor

First software-only region conflict detector

with less than 100%

overhead

(86)

Performance Comparison: Intel Xeon

1016

408 303

115

0 200 400 600 800 1000

eclipse6 hsqldb6 lusearch6 xalan6 avrora9 jython9 luindex9 lusearch9 pmd9 sunflow9 xalan9 pjbb2000 pjbb2005 geomean

Overheads (%) Over Unmodified Jikes RVM

FastTrack Eager conflict detection Valor

Relative performance remains comparable on

an Intel Xeon architecture

(87)

Please check the paper

Space overheads Characterization

of FastTrack and Jikes RVM

Data race coverage

Additional Experiments

(88)

Valor: Contributions

Strong execution guarantees in software

Exception-free

execution Strong semantics

Detects all violations of region serializability

Advances state-of-art

 Provides strong semantics in software at less than 100%

overhead

(89)

New Opportunities with Valor

Language runtimes could integrate this

Semantic guarantees

Can be used to detect problematic data races

Debugging

All-the-time monitoring in certain environments

Conflict exceptions

Aggressive optimizations

Reorder and eliminate redundant loads and stores within synchronization- free regions

(90)

Swarnendu Biswas

, Minjia Zhang, and Michael D. Bond

Ohio State University

Brandon Lucia

Carnegie Mellon University

OOPSLA 2015

Valor: Efficient, Software-Only Region Conflict

Exceptions

Referensi

Dokumen terkait