
Tomorrow's Computing Engines

WJD Feb 3, 1998

Tomorrow’s Computing Engines

February 3, 1998

Symposium on High-Performance Computer Architecture

William J. Dally

Computer Systems Laboratory Stanford University


Focus on Tomorrow, not Yesterday

• Generals tend to always fight the last war

• Computer architects tend to always design the last computer

– old programs


Some Previous “Wars” (1/3)

MARS Router 1984

Torus Routing Chip 1985

Network Design Frame 1988


Some Previous “Wars” (2/3)


Tomorrow’s Computing Engines

• Driven by tomorrow’s applications - media


90% of Desktop Cycles will Be Spent on ‘Media’ Applications by 2000

• Quote from Scott Kirkpatrick of IBM (talk abstract)

• Media applications include

– video encode/decode

– polygon & image-based graphics

– audio processing: compression, music, speech recognition/synthesis

– modulation/demodulation at audio and video rates

• These applications involve stream processing

• So do


Typical Media Kernel

Image Warp and Composite

• Read 10,000 pixels from memory

• Perform 100 16-bit integer operations on each pixel

• Test each pixel

• Write 3,000 result pixels that pass to memory

• Little reuse of data fetched from memory

– each pixel used once

• Little interaction between pixels

– very insensitive to operation latency
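The kernel's shape (read once, compute heavily, test, write survivors) can be sketched as a stream loop. In this sketch, `warp_and_composite`, its placeholder arithmetic, and its pass test are hypothetical stand-ins rather than the actual warp; only the counts (10,000 pixels in, 100 operations each, 3,000 out) come from the slide. The helper `ops_per_memory_word` shows why such a kernel is compute-bound, not memory-bound.

```python
def warp_and_composite(pixels, ops_per_pixel=100):
    """Hypothetical stand-in for the warp-and-composite kernel.
    Each pixel is read once, run through `ops_per_pixel` 16-bit integer
    operations, tested, and kept only if it passes."""
    out = []
    for p in pixels:                      # single pass: no pixel is reused
        v = p & 0xFFFF
        for k in range(ops_per_pixel):    # placeholder arithmetic, wrapped to 16 bits
            v = (v * 3 + k) & 0xFFFF
        if v % 10 < 3:                    # placeholder pass/fail test
            out.append(v)
    return out


def ops_per_memory_word(pixels_read, ops_per_pixel, pixels_written):
    """Arithmetic intensity: integer operations per pixel moved to or from memory."""
    return pixels_read * ops_per_pixel / (pixels_read + pixels_written)


print(round(ops_per_memory_word(10_000, 100, 3_000), 1))  # 76.9
```

With the slide's numbers, the kernel does roughly 77 operations for every pixel word it moves. Because pixels never interact, running the kernel on two halves of the stream gives the same result as running it on the whole stream, which is what makes it easy to spread across many ALUs.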


Telepresence: A Driving Application

Acquire 2D Images → Extract Depth (3D Images) → Segmentation → Model Extraction → Compression → Decompression → Rendering → Display 3D Scene

Most kernels: Latency insensitive


Tomorrow’s Technology is Wire Limited


Technology scaling makes communication the scarce resource

• 1997: 0.35 µm, 64 Mb DRAM, 16 64-bit FP processors, 400 MHz; 18 mm, 12,000 tracks, 1 clock

• 2007: 0.10 µm, 4 Gb DRAM, 1K 64-bit FP processors, 2.5 GHz; 32 mm, 90,000 tracks, 20 clocks
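A little arithmetic on the 1997 and 2007 figures makes "communication is scarce" concrete. This sketch assumes "1K" means 1024 processors. It also assumes "1 clock" and "20 clocks" are the clock counts a signal needs to cross the chip. Both are readings of the slide, not stated on it.

```python
# Values read off the slide; "1K" is taken as 1024 (an assumption).
TECH = {
    1997: {"fp_procs": 16, "tracks": 12_000, "clock_hz": 400e6, "clocks": 1},
    2007: {"fp_procs": 1024, "tracks": 90_000, "clock_hz": 2.5e9, "clocks": 20},
}


def tracks_per_processor(year):
    """Wiring tracks available per 64-bit FP processor."""
    t = TECH[year]
    return t["tracks"] / t["fp_procs"]


def cross_chip_ns(year):
    """The slide's clock count converted to nanoseconds at that year's clock rate."""
    t = TECH[year]
    return t["clocks"] / t["clock_hz"] * 1e9


for year in TECH:
    print(year, round(tracks_per_processor(year), 1), "tracks/proc,",
          round(cross_chip_ns(year), 2), "ns")
```

Under these readings, wiring per processor falls by roughly 8.5× (from 750 tracks to about 88). Meanwhile, the time to cross the chip grows from 2.5 ns to 8 ns even though each clock gets shorter.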


On-chip wires are getting slower

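Shrinking the wire cross-section x by a factor s (s = 0.5 on the slide) while holding the wire length y fixed:

$$x_2 = s\,x_1 \qquad (0.5\times)$$
$$R_2 = R_1/s^2 \qquad (4\times)$$
$$C_2 = C_1 \qquad (1\times)$$
$$t_{w2} = R_2 C_2 y^2 = t_{w1}/s^2 \qquad (4\times)$$
$$t_{w2}/t_{g2} = t_{w1}/(t_{g1} s^3) \qquad (8\times)$$
$$v = 0.5\,(t_g R C)^{-1/2} \ \text{(m/s)}$$
$$v_2 = v_1 s^{1/2} \qquad (0.7\times)$$
$$v\,t_g = 0.5\,(t_g/RC)^{1/2} \ \text{(m/gate)}$$
$$v_2 t_{g2} = v_1 t_{g1} s^{3/2} \qquad (0.35\times)$$

Here R and C are per unit length, $t_w = RCy^2$ is the delay of a wire of length y, $t_g$ is the gate delay (scaling as $t_{g2} = s\,t_{g1}$), and v is the signal velocity along a repeated wire.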
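The scaling factors can be checked numerically by plugging normalized values into the slide's formulas. This sketch sets the first-generation R, C, $t_g$, and y to 1 (unitless, an arbitrary choice). It assumes $t_{g2} = s\,t_{g1}$, which is what the slide's $t_{w2}/t_{g2}$ line implies. The function names are hypothetical.

```python
def wire_delay(R, C, y):
    """t_w = R C y^2 for a wire of length y (R, C per unit length)."""
    return R * C * y**2


def repeated_velocity(t_g, R, C):
    """v = 0.5 (t_g R C)^(-1/2): signal velocity along a repeated wire."""
    return 0.5 * (t_g * R * C) ** -0.5


def scale(s, R1=1.0, C1=1.0, t_g1=1.0, y=1.0):
    """Ratios of generation-2 to generation-1 quantities for shrink factor s."""
    R2, C2, t_g2 = R1 / s**2, C1, t_g1 * s   # cross-section shrinks by s^2; C per length unchanged
    tw1, tw2 = wire_delay(R1, C1, y), wire_delay(R2, C2, y)
    v1, v2 = repeated_velocity(t_g1, R1, C1), repeated_velocity(t_g2, R2, C2)
    return {
        "t_w": tw2 / tw1,                          # fixed-length wire delay
        "t_w/t_g": (tw2 / t_g2) / (tw1 / t_g1),    # wire delay in gate delays
        "v": v2 / v1,                              # repeated-wire velocity
        "v*t_g": (v2 * t_g2) / (v1 * t_g1),        # distance covered per gate delay
    }


for name, factor in scale(0.5).items():
    print(f"{name:8s} {factor:.2f}x")
```

For s = 0.5 this reproduces the slide's 4×, 8×, 0.7×, and 0.35× factors.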


Bandwidth and Latency of Modern VLSI

[Figure: Latency and Bandwidth versus Size, log-scale axes]


Architecture for Locality: Exploit high on-chip bandwidth

[Figure: Off-chip RAM; pin bandwidth 2 GB/s; Vector Reg File; 50 GB/s; Switch; 10^4 32-bit ALUs]


Tomorrow’s Computing Engines

• Aimed at media processing

– stream based

– latency tolerant

– low-precision

– little reuse

– lots of conditionals

• Use the large number of devices available on future chips

• Make efficient use of scarce communication resources

– bandwidth hierarchy

– no centralized resources

• Approach the performance of a special-purpose


Why do Special-Purpose Processors Perform Well?


Care and Feeding of ALUs

[Figure: an ALU fed by Data Bandwidth (Regs) and Instruction Bandwidth (Instr. Cache, IR, IP)]


Three Key Problems

• Instruction bandwidth

• Data bandwidth


A Bandwidth Hierarchy

[Figure: SDRAM (×4) → Streaming Memory → Vector Register File → ALU Clusters; bandwidth labels 1.6 GB/s, 50 GB/s, 500 GB/s; 13 ALUs per cluster]

• Solves data bandwidth problem
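A hierarchy only helps if each level absorbs most of the traffic from the level above it. This sketch reads the three bandwidths as memory at 1.6 GB/s, the vector register file at 50 GB/s, and the ALU clusters at 500 GB/s. It treats 500 GB/s as the rate at which the ALUs consume operands. Both the pairing and the "consumption rate" interpretation are assumptions about the figure. The function computes the largest share of operand references each level could supply.

```python
# Bandwidths read off the bandwidth-hierarchy figure, in GB/s (pairing assumed).
HIERARCHY = [("streaming memory", 1.6), ("vector register file", 50.0), ("ALU clusters", 500.0)]


def max_fraction_from_each_level(hierarchy):
    """If the ALU clusters consume operands at the top-level rate, return the
    largest fraction of those operand references each level can supply."""
    peak = hierarchy[-1][1]
    return {name: bw / peak for name, bw in hierarchy}


for name, frac in max_fraction_from_each_level(HIERARCHY).items():
    print(f"{name:22s} {frac:.2%}")
```

Under this reading, at most about 0.3% of operand references can come from memory and about 10% from the vector register file. Keeping the ALUs busy therefore depends on the remaining references staying local to the clusters, which is the locality that stream kernels provide.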


A Streaming Memory System

[Figure: streaming memory system; labels: Address Generator (×2), IX, Crossbar, Reorder Queue (×2), SDRAM Bank]


Streaming Memory Performance

[Figure: Bank Queue Effectiveness, Cycles/Access versus Queue Size (1 to 64 entries, and infinite)]

• Exploit latency insensitivity for improved bandwidth
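Why a deeper bank queue lowers cycles per access can be illustrated with a deliberately simplified single-bank model. A controller may issue any request in its reorder window, prefers requests that hit the currently open SDRAM row, and pays one row activation per miss. The model, the function `row_activations`, and the row-hit policy are assumptions for illustration, not the controller described in the talk. The model shows only how tolerating extra latency per request lets the controller group row hits.

```python
def row_activations(rows, window):
    """rows: SDRAM row addressed by each request to one bank, in arrival order.
    window: number of pending requests the controller may choose among."""
    pending, nxt = [], 0
    open_row, activations = None, 0
    while nxt < len(rows) or pending:
        while len(pending) < window and nxt < len(rows):  # refill the reorder queue
            pending.append(rows[nxt])
            nxt += 1
        # Prefer the oldest request that hits the open row; otherwise take the oldest.
        idx = next((k for k, r in enumerate(pending) if r == open_row), 0)
        row = pending.pop(idx)
        if row != open_row:  # row miss: a new row must be activated
            activations += 1
            open_row = row
    return activations


rows = [0, 1, 0, 1, 0, 1, 0, 1]
print([row_activations(rows, w) for w in (1, 2, 8)])
```

For this alternating pattern, activations drop from 8 with no reordering to 4 with a two-entry window and 2 with a window covering the whole stream.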


Compound Vector Operations: 1 Instruction does lots of work

[Figure: labels Memory Instructions (LD Vd Vx), Mem AG, VRF; Compound Vector Instruction (Op V0 V1 ... V7); Control Store, uIP; fields Op Ra Rb (×3 shown); 1 CV Inst (50b)]
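The idea that one instruction triggers many operations can be sketched in software. An opcode names a microprogram held in a control store, and the sequencer runs that microprogram over every element of the vector operands. `CONTROL_STORE`, `compound_vector_op`, the `(op, dst, src_a, src_b)` micro-op format, and the register-loading convention are hypothetical. They are not the Imagine instruction format.

```python
import operator

# Hypothetical control store: each entry is a microprogram of
# (op, dst, src_a, src_b) micro-ops over a small local register file.
CONTROL_STORE = {
    "mul_add": [
        (operator.mul, 2, 0, 1),  # r2 = r0 * r1
        (operator.add, 3, 2, 0),  # r3 = r2 + r0
    ],
}


def compound_vector_op(opcode, inputs, out_reg, num_regs=8):
    """Issue one compound vector instruction: run the microprogram named by
    `opcode` on every element of the input vectors."""
    program = CONTROL_STORE[opcode]
    result = []
    for i in range(len(inputs[0])):
        regs = [0] * num_regs
        for r, vec in enumerate(inputs):  # element i of input vector r goes into register r
            regs[r] = vec[i]
        for op, dst, a, b in program:     # every micro-op runs without a new instruction fetch
            regs[dst] = op(regs[a], regs[b])
        result.append(regs[out_reg])
    return result


out = compound_vector_op("mul_add", [[1, 2, 3], [4, 5, 6]], out_reg=3)
print(out)
```

Here a single issued instruction performs six ALU operations (two micro-ops times three elements). With long vectors and longer microprograms, the ratio of operations to fetched instruction bits grows accordingly, which is how compound vector instructions address the instruction-bandwidth problem.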


Scheduling by Simulated Annealing

• List scheduling assumes global communication

– does poorly when communication is exposed

• View scheduling as a CAD problem (place and route)

– generate naïve ‘feasible’ schedule

– iteratively improve schedule by moving operations.

[Figure: schedule laid out as ALUs versus Time]
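The place-and-route view of scheduling can be sketched as generic simulated annealing over a grid of ALU and cycle slots. The sketch starts from a naive feasible schedule and repeatedly moves one operation to a random slot. It accepts worse schedules with a probability that shrinks as the temperature cools. This is a textbook annealing sketch, not the scheduler from the talk. `COMM_DELAY`, the cost function, and the cooling parameters are hypothetical choices.

```python
import math
import random

COMM_DELAY = 2  # hypothetical: extra cycles to send a result to a different ALU


def is_feasible(sched, preds):
    """sched[op] = (alu, cycle). Feasible if no two ops share a slot and each op
    starts after every producer's result can reach its ALU."""
    if len(set(sched)) != len(sched):
        return False
    for op, (alu, t) in enumerate(sched):
        for p in preds[op]:
            p_alu, p_t = sched[p]
            ready = p_t + 1 + (0 if p_alu == alu else COMM_DELAY)
            if t < ready:
                return False
    return True


def cost(sched):
    """Schedule length weighted by op count, plus the sum of start cycles
    to nudge operations toward earlier slots."""
    n = len(sched)
    length = max(t for _, t in sched) + 1
    return length * n + sum(t for _, t in sched)


def anneal_schedule(preds, n_alus, iters=20000, temp=5.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    n = len(preds)
    # Naive feasible schedule: assuming ops are listed in topological order,
    # run op i on ALU 0 at cycle i.
    sched = [(0, i) for i in range(n)]
    cur = cost(sched)
    best, best_sched = cur, list(sched)
    for _ in range(iters):
        temp *= cooling
        trial = list(sched)
        op = rng.randrange(n)
        trial[op] = (rng.randrange(n_alus), rng.randrange(n))  # move one operation
        if not is_feasible(trial, preds):
            continue
        delta = cost(trial) - cur
        # Always accept improvements; accept worse moves with probability exp(-delta / temp).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            sched, cur = trial, cur + delta
            if cur < best:
                best, best_sched = cur, list(sched)
    return best_sched


preds = [[], [], [], [], [0], [1], [2], [3]]  # four independent two-op chains
print(anneal_schedule(preds, n_alus=4))
```

Because `COMM_DELAY` makes cross-ALU moves expensive, the cost function pushes dependent operations onto the same ALU. That is the exposed-communication effect a list scheduler that assumes global communication would ignore.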


Typical Annealing Schedule

[Figure: plot of a typical annealing run]


Conventional Approaches to Data-Dependent Conditional Execution


Zero-Cost Conditionals

• Most Approaches to Conditional Operations are Costly

– Branching control flow: dead issue slots on mispredicted branches

– Predication (SIMD select, masked vectors): a large fraction of execution ‘opportunities’ go idle

• Conditional Vectors

– append an element to an output stream depending on a case variable

[Figure: a Case Stream of {0,1} values steers each element into Result Stream 0 or 1]
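A conditional vector operation can be sketched as a routing step. Each element is appended to the output stream chosen by its case value, so each branch's code then runs densely over only the elements that need it. The functions below are hypothetical. The slot-count comparison assumes predication spends issue slots on both paths for every element, which is a simplifying model rather than a measurement.

```python
def conditional_append(values, cases, n_streams=2):
    """Append each value to the output stream selected by its case value."""
    streams = [[] for _ in range(n_streams)]
    for v, c in zip(values, cases):
        streams[c].append(v)
    return streams


def predicated_slots(n, cost0, cost1):
    """Predication / SIMD select: every element occupies slots for both paths."""
    return n * (cost0 + cost1)


def conditional_slots(streams, cost0, cost1):
    """Conditional vectors: each path runs only over its own compacted stream."""
    return len(streams[0]) * cost0 + len(streams[1]) * cost1


values = [10, 11, 12, 13, 14, 15]
cases = [0, 1, 1, 0, 1, 1]
s = conditional_append(values, cases)
print(s)
print(predicated_slots(len(values), 5, 5), conditional_slots(s, 5, 5))
```

With two five-operation branches, this model gives 60 issue slots under predication versus 30 with conditional vectors. The saving comes from never issuing the untaken path.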


Application Sketch - Polygon Rendering

[Figure: triangle V1 V2 V3 processed as streams: Vertex (X Y RGB UV) → Span (Y X1 X2 RGB1 RGB UV1 UV) → Pixel (X Y RGB UV) → TexturedPixel (X Y RGB)]
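The last two stages of the sketch can be written as stream kernels chained with generators. A span kernel expands each span into pixels with linearly interpolated color and texture coordinates. A texture kernel then produces X Y RGB textured pixels. The record layouts, linear interpolation, nearest-texel lookup, and color modulation are assumptions. The vertex-to-span rasterization stage is omitted.

```python
from dataclasses import dataclass


@dataclass
class Span:  # hypothetical record: one scanline of a triangle
    y: int
    x1: int
    x2: int
    rgb1: tuple
    rgb2: tuple
    uv1: tuple
    uv2: tuple


@dataclass
class Pixel:
    x: int
    y: int
    rgb: tuple
    uv: tuple


def lerp(a, b, f):
    """Component-wise linear interpolation between tuples a and b."""
    return tuple(ai + (bi - ai) * f for ai, bi in zip(a, b))


def span_to_pixels(spans):
    """Span kernel: expand each span into pixels with interpolated color and UV."""
    for s in spans:
        width = max(s.x2 - s.x1, 1)
        for x in range(s.x1, s.x2 + 1):
            f = (x - s.x1) / width
            yield Pixel(x, s.y, lerp(s.rgb1, s.rgb2, f), lerp(s.uv1, s.uv2, f))


def texture_pixels(pixels, texture):
    """Texture kernel: modulate each pixel's color by its nearest texel."""
    for p in pixels:
        u, v = int(p.uv[0]), int(p.uv[1])
        texel = texture[v][u]
        yield (p.x, p.y, tuple(c * t for c, t in zip(p.rgb, texel)))


spans = [Span(y=0, x1=0, x2=2, rgb1=(0, 0, 0), rgb2=(2, 2, 2), uv1=(0, 0), uv2=(1, 0))]
texture = [[(1, 1, 1), (2, 2, 2)]]
print(list(texture_pixels(span_to_pixels(spans), texture)))
```

Each kernel consumes one stream and produces the next, touching each record once. This mirrors the stream structure of the media kernels described earlier.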


Status

• Working simulator of Imagine

• Simple kernels running on simulator

– FFT

• Applications being developed

– Depth extraction, video compression, polygon rendering, image-based graphics


Acknowledgements

• Students/Staff

– Don Alpert (Intel)

– Chris Buehler (MIT)

– J.P. Grossman (MIT)

– Brad Johanson

– Ujval Kapasi

– Brucek Khailany

– Abelardo Lopez-Lagunas

– Peter Mattson

– John Owens

– Scott Rixner

• Helpful Suggestions

– Henry Fuchs (UNC)

– Pat Hanrahan

– Tom Knight (MIT)

– Marc Levoy


Conclusion

• Work toward tomorrow’s computing engines

• Targeted toward media processing

– streams of low-precision samples

– little reuse

– latency tolerant

• Matched to the capabilities of communication-limited technology

– explicit bandwidth hierarchy

– explicit communication between units

– communication exposed
