panel-plw-2007.ppt 340KB Jun 23 2011 12:05:46 PM

(1)

Lizy Kurian John, LCA, UT Austin

The University of Texas at Austin

What Programming Language/Compiler Researchers Should Know about Computer Architecture

Lizy Kurian John

Department of Electrical and Computer Engineering

(2)

Somebody once said

Computers are dumb actors

(3)

Computer Architecture Basics

ISAs

RISC vs CISC

Assembly language coding

Datapath (ALU) and controller

Pipelining

Caches

Out of order execution

(4)

Basics

ILP

DLP

TLP

Massive parallelism

SIMD/MIMD

VLIW

Performance and Power metrics

(5)

The Bottom Line

Programming language choice affects performance and power, e.g., Java

(6)

A Java Hardware Interpreter

Radhakrishnan, Ph.D. 2000 (ISCA 2000, ICS 2001)

This technique is used by Nazomi Communications and Parthus (Chicory Systems)

[Figure: block diagram. Labels: Java class file, native executable, Fetch, hardware bytecode translator, Decode, Execute, bytecodes.]
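A rough software sketch of what a bytecode translator has to do: fold stack-based bytecodes into register-style operations by tracking the operand stack symbolically. This is only an illustration under stated assumptions, not the design of the hardware interpreter on this slide. The function name `translate`, the tuple encoding, and the output operation names are hypothetical; only the bytecode mnemonics (`iload`, `bipush`, `iadd`, `imul`, `istore`) come from the JVM instruction set.

```python
# Hypothetical sketch: fold a few JVM-style stack bytecodes into
# register-style operations by tracking the operand stack symbolically.
def translate(bytecodes):
    stack = []    # symbolic operand stack: holds value names, not values
    native = []   # emitted register-style operations
    temp = 0      # counter for fresh temporaries t0, t1, ...
    for op, *args in bytecodes:
        if op == "iload":               # push local variable n
            stack.append(f"local{args[0]}")
        elif op == "bipush":            # push a small integer constant
            stack.append(str(args[0]))
        elif op in ("iadd", "imul"):    # pop two operands, push the result
            rhs = stack.pop()
            lhs = stack.pop()
            dest = f"t{temp}"
            temp += 1
            native.append((op[1:], dest, lhs, rhs))
            stack.append(dest)
        elif op == "istore":            # pop into local variable n
            native.append(("mov", f"local{args[0]}", stack.pop()))
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return native


# local0 = (local1 + local2) * 3
program = [("iload", 1), ("iload", 2), ("iadd",),
           ("bipush", 3), ("imul",), ("istore", 0)]
for instr in translate(program):
    print(instr)
```

In this toy case six bytecodes collapse into three register-style operations, because the pushes are absorbed as operands of the arithmetic operations.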

(7)

HardInt Performance

[Figure: bar chart, 4-way performance. Y-axis: execution cycles (millions), 0 to 400. X-axis: benchmarks db, javac, jess, mpeg, mtrt. Series: JDK 1.1.6 Interpreter, JDK 1.1.6 JIT, JDK 1.2 Interpreter, JDK 1.2 JIT, Hard-Int.]

• Hard-Int performs consistently better than the interpreter
• In JIT mode, significant performance boost in 4 of 5

(8)

Compiler and Power

[Figure: a data dependence graph (DDG) over instructions A to F, with two schedules of it.]

First schedule: Peak Power = 3, Energy = 6

Second schedule: Peak Power = 2, Energy = 6
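The point of the two schedules (same energy, different peak power) can be reproduced with a small list scheduler. This is a minimal sketch under assumptions: the dependence edges in `deps` are hypothetical, since the original graph is not recoverable here, and each instruction is assumed to cost one unit of energy and draw one unit of power in the cycle it issues.

```python
def schedule(deps, max_per_cycle):
    """Greedy list scheduling: each cycle, issue up to max_per_cycle
    instructions whose predecessors have all completed."""
    done = set()
    cycles = []
    remaining = sorted(deps)
    while remaining:
        ready = [i for i in remaining if deps[i] <= done]
        if not ready:
            raise ValueError("dependence cycle in graph")
        issued = ready[:max_per_cycle]
        cycles.append(issued)
        done.update(issued)
        remaining = [i for i in remaining if i not in done]
    return cycles


# Hypothetical DDG: instruction -> set of instructions it depends on.
deps = {
    "A": set(), "C": set(), "E": set(),
    "B": {"A"}, "D": {"B"}, "F": {"C", "D", "E"},
}

for cap in (3, 2):
    cycles = schedule(deps, cap)
    peak_power = max(len(c) for c in cycles)   # instructions in busiest cycle
    energy = sum(len(c) for c in cycles)       # one unit per instruction
    print(f"cap={cap}: {cycles} peak={peak_power} energy={energy}")
```

With these assumed edges both schedules take four cycles and consume six units of energy, but capping issue width at two lowers the peak from three to two. The compiler changes when work happens, not how much work there is.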

(9)

Valluri et al., HPCA workshop, 2001

Quantitative Study

Influence of state-of-the-art optimizations on the energy and power of the processor examined

Optimizations studied

• Standard -O1 to -O4 of DEC Alpha’s cc compiler
• Four individual optimizations: simple basic-block instruction scheduling, loop unrolling, function inlining, and aggressive global

(10)

Standard Optimizations on Power

(all values relative to O0 = 100)

Benchmark      Opt level  Energy  Exec Time  Insts  Avg Power  IPC
compress       O0         100     100        100    100        100
               O1         74.48   81.55      81.52  91.33      99.96
               O2         75.13   81.44      82.04  92.25      100.73
               O3         75.13   81.44      82.04  92.25      100.73
               O4         79.01   82.77      86.11  95.45      104.03
go             O0         100     100        100    100        100
               O1         66.2    64.13      68.94  103.23     107.5
               O2         62.62   61.31      63.01  102.14     102.78
               O3         62.62   61.31      63.01  102.14     102.78
               O4         63.67   62.19      63.75  102.38     102.51
(label lost)   O0         100     100        100    100        100
               O1         81.32   83.66      83.18  97.2       99.42
               O2         79.6    75.97      82.97  104.78     109.21
               O3         79.6    75.97      82.97  104.78     109.21
               O4         85.71   77.89      90.96  110.05     116.78
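The columns of this table are tied together by two identities: energy is average power times execution time, and IPC is instructions divided by cycles. The sketch below checks that on the first group of rows. It assumes a fixed clock, so that normalized execution time can stand in for normalized cycle count; the names `first_group` and `derived` are illustrative only.

```python
# (exec_time, insts, avg_power) from the first group of rows, O0 = 100.
first_group = {
    "O1": (81.55, 81.52, 91.33),
    "O2": (81.44, 82.04, 92.25),
    "O4": (82.77, 86.11, 95.45),
}


def derived(exec_time, insts, avg_power):
    """Recompute normalized energy and IPC from the other three columns."""
    energy = avg_power * exec_time / 100.0   # energy = power x time
    ipc = insts / exec_time * 100.0          # IPC = instructions / cycles
    return energy, ipc


for level, row in first_group.items():
    energy, ipc = derived(*row)
    print(f"{level}: energy {energy:.2f}, IPC {ipc:.2f}")
```

The recomputed values agree with the Energy and IPC columns to within rounding. Read this way, the table shows that the energy saving comes almost entirely from the shorter execution time, while average power stays within about ten percent of the unoptimized value.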

(11)

Somebody once said

Computers are dumb actors

(12)

A large part of modern out-of-order processors

(13)

Let me get more arrogant

A large part of modern out-of-order processors was designed because computer architects thought

(14)

Value Prediction

Is a slap on your face

(15)

Value Locality

• Likelihood that an instruction’s computed result or a similar predictable result will occur soon
• Observation: a limited set of
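One way to make the definition concrete is to measure it on a trace of (instruction address, result) pairs: count how often a result matches one of the last few results produced by the same static instruction. The sketch below is a simplified illustration, not the measurement methodology of any particular study. The function name `value_locality`, the `depth` parameter, and the toy trace are assumptions.

```python
from collections import defaultdict, deque


def value_locality(trace, depth=1):
    """Fraction of results that match one of the last `depth` results
    produced by the same static instruction (identified by its PC)."""
    history = defaultdict(lambda: deque(maxlen=depth))
    hits = 0
    for pc, value in trace:
        if value in history[pc]:
            hits += 1
        history[pc].append(value)   # oldest entry drops out automatically
    return hits / len(trace) if trace else 0.0


# Toy trace: the instruction at 0x40 keeps producing 0, while the one
# at 0x44 alternates between 7 and 8.
trace = [(0x40, 0), (0x40, 0), (0x40, 0),
         (0x44, 7), (0x44, 8), (0x44, 7), (0x44, 8)]
print(value_locality(trace, depth=1))
print(value_locality(trace, depth=2))
```

With a history depth of 1 only the constant-producing instruction scores hits. A depth of 2 also captures the alternating pattern, which is why deeper histories report higher locality.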

(17)

Causes of value locality

• Data redundancy: many 0s, sparse matrices, white space in files, empty cells in spreadsheets
• Program constants
• Computed branches: base address for jump tables is a run-time constant
• Virtual function calls: involve code to

(18)

Causes of value locality

• Memory alias resolution: compiler conservatively generates code that may contain stores that alias with loads
• Register spill code: stores and subsequent loads
• Convergent algorithms: convergence in parts of algorithms before global convergence

(19)

2 Extremist Views

Anything that can be done in hardware should be done in hardware.

(20)

What do we need?

The Dumb actor

Or the

(21)

Challenging all compiler writers

• The last 15 years were the defiant actor’s era
• What about the next 15? TLP, multithreading, parallelizing compilers: it’s time for a lot more dumb acting from the architect’s side.

(23)

Compiler Optimizations

• cc: native C compiler on DEC Alpha 21064 running the OSF1 operating system
• gcc: used to study the effect of

(24)

Std Optimization Levels on cc

• -O0: No optimizations performed
• -O1: Local optimizations such as CSE, copy propagation, IVE, etc.
• -O2: Inline expansion of static procedures and global optimizations such as loop unrolling, instruction scheduling
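As an illustration of the kind of local optimization listed under -O1, here is a minimal sketch of common-subexpression elimination (CSE) over a single basic block of three-address code. It is a textbook-style sketch, not the algorithm used by cc or gcc. The tuple format `(dest, op, arg1, arg2)` and the function name `local_cse` are assumptions, and commutativity is ignored for brevity.

```python
def local_cse(block):
    """Local CSE over one basic block of (dest, op, arg1, arg2) tuples.
    A recomputed expression is replaced by a copy from the variable
    that already holds its value."""
    available = {}   # (op, arg1, arg2) -> variable holding that value
    out = []
    for dest, op, a, b in block:
        key = (op, a, b)
        if key in available:
            out.append((dest, "copy", available[key], None))
        else:
            out.append((dest, op, a, b))
        # Writing dest kills expressions that read dest or are held in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # dest now holds the expression, unless it overwrote its own operand.
        if dest not in (a, b):
            available.setdefault(key, dest)
    return out


block = [
    ("t1", "add", "a", "b"),
    ("t2", "add", "a", "b"),    # same expression: becomes a copy of t1
    ("a", "mul", "t2", "c"),    # redefines a, so "a + b" is no longer valid
    ("t3", "add", "a", "b"),    # must be recomputed
]
for instr in local_cse(block):
    print(instr)
```

The second addition turns into a copy, while the fourth is kept because `a` was redefined in between. Eliminating recomputation lowers the dynamic instruction count, which is the quantity the energy results track.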

(25)

Std Optimization Levels on gcc

• -O0: No optimizations performed
• -O1: Local optimizations such as CSE, copy propagation, dead-code elimination, etc.
• -O2: Aggressive instruction scheduling
• -O3: Inlining of procedures

• Almost the same optimizations in each level of cc and gcc
• In cc and gcc, optimizations that increase ILP are in levels -O2, -O3, and -O4
• cc used wherever possible; gcc used where specific hooks are required

(26)

Individual Optimizations

Four gcc optimizations, all applied on top of -O1

• -fschedule-insns: local register allocation followed by basic-block list scheduling
• -fschedule-insns2: postpass scheduling done
• -finline-functions: integrated all simple functions into their callers
• -funroll-loops: perform the optimization
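To show what loop unrolling changes, the sketch below sums a list with a plain loop and with a hand-unrolled loop, counting loop-back tests as a stand-in for loop-control overhead. It is a source-level illustration of the transformation, not what gcc emits. The function names and the unroll factor of 4 are assumptions.

```python
def sum_rolled(xs):
    """Plain loop: one loop-back test per element."""
    total, branches = 0, 0
    i = 0
    while i < len(xs):
        total += xs[i]
        i += 1
        branches += 1
    return total, branches


def sum_unrolled4(xs):
    """Loop unrolled by 4, with a remainder loop for leftover elements."""
    total, branches = 0, 0
    i = 0
    n = len(xs)
    while i + 4 <= n:       # main loop: four elements per loop-back test
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
        branches += 1
    while i < n:            # remainder loop: the last n % 4 elements
        total += xs[i]
        i += 1
        branches += 1
    return total, branches


data = list(range(10))
print(sum_rolled(data))
print(sum_unrolled4(data))
```

Both versions compute the same sum, but the unrolled one executes fewer loop-control steps (4 instead of 10 for ten elements). The four additions in the unrolled body are also independent loads feeding one expression, which gives a scheduler more instructions to overlap.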

(27)

Some observations

• Energy consumption falls when the number of instructions is reduced, i.e., when the total work done is less, energy is less
• Power dissipation is directly

(28)

Observations (contd.)

• Function inlining was found to be good for both power and energy
• Unrolling was found to be good for

(29)

MMX/SIMD

(30)

Standard Optimizations on Power (Contd)

(all values relative to O0 = 100)

Benchmark      Opt level  Energy  Exec Time  Insts  Avg Power  IPC
su2cor         O0         100     100        100    100        100
               O1         97.38   100.24     92.49  97.15      92.27
               O2         97.69   99.38      92.49  98.3       93.07
               O3         97.69   99.38      92.49  98.3       93.07
               O4         98.31   99.27      92.84  99.02      93.51
(label lost)   O0         100     100        100    100        100
               O1         42.09   51.04      33.21  82.46      65.06
               O2         40.99   47.52      33.1   86.28      69.67
               O3         40.99   46.37      33.1   87.65      71.38
(label lost)   O0         100     100        100    100        100
               O1         30.1    36.64      20.01  82.15      54.63
               O2         28.93   34.01      19.05  85.06      56.01
               O3         28.93   34.01      19.05  85.06      56.01
