
(1)

Tomorrow's Computing Engines

WJD Feb 3, 1998 1

Tomorrow’s Computing Engines

February 3, 1998

Symposium on High-Performance Computer Architecture

William J. Dally

Computer Systems Laboratory Stanford University

(2)

Focus on Tomorrow, not Yesterday

Generals tend to always fight the last war

Computer architects tend to always design the last computer

old programs

(3)

Some Previous “Wars” (1/3)

MARS Router 1984

Torus Routing Chip 1985

Network Design Frame 1988

(4)

Some Previous “Wars” (2/3)

(5)

(6)

Tomorrow’s Computing Engines

• Driven by tomorrow’s applications - media

(7)

90% of Desktop Cycles will Be Spent on ‘Media’ Applications by 2000

• Quote from Scott Kirkpatrick of IBM (talk abstract)

• Media applications include

– video encode/decode

– polygon & image-based graphics

– audio processing - compression, music, speech - recognition/synthesis

– modulation/demodulation at audio and video rates

• These applications involve stream processing

• So do

(8)

Typical Media Kernel

Image Warp and Composite

• Read 10,000 pixels from memory

• Perform 100 16-bit integer operations on each pixel

• Test each pixel

• Write 3,000 result pixels that pass to memory

• Little reuse of data fetched from memory

– each pixel used once

• Little interaction between pixels

– very insensitive to operation latency
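The kernel described above can be sketched as a stream filter, which also makes the ratio of arithmetic to memory traffic easy to check. This is only an illustration under stated assumptions: `stream_kernel`, `transform`, `keep`, and `ops_per_memory_access` are hypothetical names, and the per-pixel operation is a stand-in, not the actual warp-and-composite arithmetic.

```python
def stream_kernel(pixels, transform, keep):
    """Consume each input pixel exactly once and emit only those that pass the test.

    No pixel depends on any other, so per-operation latency can be hidden
    by working on many pixels at once.
    """
    for p in pixels:
        q = transform(p)      # stands in for the ~100 16-bit integer operations
        if keep(q):           # the per-pixel test
            yield q


def ops_per_memory_access(pixels_in, ops_per_pixel, pixels_out):
    """Arithmetic operations performed per pixel read from or written to memory."""
    return (pixels_in * ops_per_pixel) / (pixels_in + pixels_out)


# Figures from the slide: 10,000 pixels read, 100 ops each, 3,000 pixels written.
ratio = ops_per_memory_access(10_000, 100, 3_000)

# Toy run with a placeholder transform and test (both hypothetical).
sample = list(stream_kernel(range(10), lambda p: (p * 3) & 0xFFFF, lambda q: q % 2 == 0))
```

With the slide's numbers the kernel performs roughly 77 operations for every pixel moved to or from memory, which is why bandwidth rather than latency is the resource to manage.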

(9)

Telepresence: A Driving Application

Acquire 2D Images

Extract Depth (3D Images)

Segmentation

Model Extraction

Compression

Decompression

Rendering

Display 3D Scene

Most kernels: Latency insensitive

(10)

Tomorrow’s Technology is Wire Limited

(11)

Technology scaling makes communication the scarce resource

                     1997           2007
Feature size         0.35µm         0.10µm
DRAM                 64Mb           4Gb
64b FP Proc          16             1K
Clock                400MHz         2.5GHz
Chip edge            18mm           32mm
Wire tracks          12,000         90,000
Cross-chip delay     1 clock        20 clocks

(12)

On-chip wires are getting slower

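[Figure: a wire with cross-section dimension x1 scaled to x2, length y unchanged]

$t_w = R C y^2$

$x_2 = s\,x_1$ (0.5x)

$R_2 = R_1 / s^2$ (4x)

$C_2 = C_1$ (1x)

$t_{w2} = R_2 C_2 y^2 = t_{w1} / s^2$ (4x)

$t_{w2} / t_{g2} = t_{w1} / (t_{g1} s^3)$ (8x)

$v = 0.5\,(t_g R C)^{-1/2}$ (m/s)

$v_2 = v_1 s^{1/2}$ (0.7x)

$v\,t_g = 0.5\,(t_g / RC)^{1/2}$ (m/gate)

$v_2 t_{g2} = v_1 t_{g1} s^{3/2}$ (0.35x)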
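The scaling factors quoted on this slide can be checked numerically. The sketch below assumes, as the relations above imply, that resistance per unit length scales as 1/s², capacitance per unit length is unchanged, gate delay scales as s, and the wire length y is held fixed. The function name `wire_scaling` and the dictionary keys are illustrative only.

```python
def wire_scaling(s):
    """Return new/old ratios for wire quantities when dimensions scale by s."""
    r = 1.0 / s**2        # R2 / R1: resistance per unit length
    c = 1.0               # C2 / C1: capacitance per unit length
    t_gate = s            # tg2 / tg1: gate delay (assumed to scale with s)
    t_wire = r * c        # tw2 / tw1 for a wire of fixed length y
    return {
        "wire_delay": t_wire,                          # tw2 / tw1
        "wire_delay_in_gates": t_wire / t_gate,        # (tw2/tg2) / (tw1/tg1)
        "velocity": (t_gate * r * c) ** -0.5,          # v2 / v1
        "reach_per_gate": (t_gate / (r * c)) ** 0.5,   # (v2 tg2) / (v1 tg1)
    }


factors = wire_scaling(0.5)
for name, value in factors.items():
    print(f"{name}: {value:.2f}x")
```

For s = 0.5 this reproduces the 4x, 8x, 0.7x, and 0.35x annotations: a fixed-length wire gets slower in absolute terms and much slower when measured in gate delays.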

(13)

Bandwidth and Latency of Modern VLSI

[Figure: latency and bandwidth plotted against size on logarithmic axes. Size runs from 1 to 10^5, latency from 1 to 10^3, and bandwidth from 1 down to 10^-6.]

(14)

Architecture for Locality

Exploit high on-chip bandwidth

[Figure labels: Off-chip RAM; Pin Bandwidth, 2GB/s; Vector Reg File; 50GB/s; Switch; 104 32-bit ALUs]

(15)

Tomorrow’s Computing Engines

• Aimed at media processing

– stream based

– latency tolerant

– low-precision

– little reuse

– lots of conditionals

• Use the large number of devices available on future chips

• Make efficient use of scarce communication resources

– bandwidth hierarchy

– no centralized resources

• Approach the performance of a special-purpose

(16)

Why do Special-Purpose Processors Perform Well?

(17)

Care and Feeding of ALUs

[Figure labels: Data Bandwidth; Instruction Bandwidth; Regs; Instr. Cache; IR; IP]

(18)

Three Key Problems

• Instruction bandwidth

• Data bandwidth

(19)

A Bandwidth Hierarchy

[Figure labels: four SDRAMs; Streaming Memory, 1.6GB/s; Vector Register File, 50GB/s; ALU Clusters, 500GB/s]

13 ALUs per cluster

• Solves data bandwidth problem
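The three bandwidth figures on this slide imply how much data reuse each level must capture to keep the level below it busy. The following sketch just computes those ratios; the level names and variable names are illustrative labels, and only the GB/s values come from the slide.

```python
# Bandwidth available at each level of the hierarchy, in GB/s (values from the slide).
levels = [
    ("Streaming memory (SDRAM)", 1.6),
    ("Vector register file", 50.0),
    ("ALU clusters", 500.0),
]

# Ratio between each level and the one above it.
step_ratios = [
    inner_bw / outer_bw
    for (_, outer_bw), (_, inner_bw) in zip(levels, levels[1:])
]

# Ratio between the innermost and outermost levels.
overall_ratio = levels[-1][1] / levels[0][1]

for (outer, _), (inner, _), r in zip(levels, levels[1:], step_ratios):
    print(f"{inner} has {r:.2f}x the bandwidth of {outer}")
print(f"ALU clusters see {overall_ratio:.1f}x the SDRAM bandwidth")
```

Read this way, each word fetched from SDRAM has to be used on the order of 30 times out of the register file, and each register-file word about 10 times inside a cluster, for the ALUs to stay fed.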

(20)

A Streaming Memory System

[Figure labels: Address Generator (two shown); IX; D; Crossbar; Reorder Queue (two shown); SDRAM Bank]

(21)

Streaming Memory Performance

[Figure: Bank Queue Effectiveness. Cycles/Access (axis from 0 to 1.8) versus Queue Size (1, 2, 4, 8, 16, 32, 64, Infinite)]

• Exploit latency insensitivity for improved bandwidth
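A toy model can illustrate why deeper per-bank queues trade latency for bandwidth. The sketch below is not the simulator behind the chart: it assumes a hypothetical system in which one access is issued per cycle to a random bank, each bank is busy for a fixed number of cycles per access, and the issuer stalls whenever the target bank's queue is full. All names and parameters are assumptions.

```python
import random
from collections import deque


def cycles_per_access(queue_size, n_banks=4, busy=4, n_accesses=2000, seed=0):
    """Average cycles per access for a given per-bank queue depth (toy model)."""
    rng = random.Random(seed)
    targets = [rng.randrange(n_banks) for _ in range(n_accesses)]
    queues = [deque() for _ in range(n_banks)]
    free_at = [0] * n_banks      # cycle at which each bank can start another access
    issued = started = cycle = 0
    while started < n_accesses:
        # Each idle bank starts the access at the head of its queue.
        for b in range(n_banks):
            if queues[b] and free_at[b] <= cycle:
                queues[b].popleft()
                free_at[b] = cycle + busy
                started += 1
        # The address generator issues in order and stalls on a full queue.
        if issued < n_accesses:
            b = targets[issued]
            if len(queues[b]) < queue_size:
                queues[b].append(issued)
                issued += 1
        cycle += 1
    return cycle / n_accesses


shallow = cycles_per_access(queue_size=1)
deep = cycles_per_access(queue_size=64)
print(f"queue size 1:  {shallow:.2f} cycles/access")
print(f"queue size 64: {deep:.2f} cycles/access")
```

Because individual accesses may wait in a queue, each one takes longer, but the issuer stalls less often, so throughput improves. That trade is only acceptable because the workload is latency insensitive.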

(22)

Compound Vector Operations

1 Instruction does lots of work

[Figure labels: LD Vd Vx; Mem AG; VRF; Memory Instructions; Control Store; uIP; Compound Vector Instruction (Op V0 V1 V2 V3 V4 V5 V6 V7); Op Ra Rb (three shown); 1 CV Inst (50b)]

(23)

Scheduling by Simulated Annealing

• List scheduling assumes global communication

– does poorly when communication exposed

• View scheduling as a CAD problem (place and route)

– generate naïve ‘feasible’ schedule

– iteratively improve schedule by moving operations.

[Figure: schedule grid with axes ALUs and Time]
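The place-and-route view of scheduling can be sketched with a small simulated-annealing loop. This is a simplified illustration, not the scheduler described in the talk: it assumes unit-latency operations listed in topological order, a hypothetical fixed `comm_delay` whenever a value crosses between ALUs, and it anneals only the ALU placement of each operation.

```python
import math
import random


def schedule_length(deps, alu_of, n_alus, comm_delay=2):
    """Cycles needed when ops run in index order on their assigned ALUs."""
    alu_free = [0] * n_alus
    finish = []
    for op, preds in enumerate(deps):
        a = alu_of[op]
        ready = alu_free[a]
        for p in preds:
            delay = 0 if alu_of[p] == a else comm_delay   # exposed communication
            ready = max(ready, finish[p] + delay)
        finish.append(ready + 1)
        alu_free[a] = ready + 1
    return max(finish)


def anneal(deps, n_alus, steps=5000, t0=2.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    alu_of = [0] * len(deps)              # naive feasible schedule: all on ALU 0
    cost = schedule_length(deps, alu_of, n_alus)
    best, best_cost = list(alu_of), cost
    temp = t0
    for _ in range(steps):
        op = rng.randrange(len(deps))     # move one operation to a random ALU
        old = alu_of[op]
        alu_of[op] = rng.randrange(n_alus)
        new_cost = schedule_length(deps, alu_of, n_alus)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost               # accept (sometimes even if worse)
            if cost < best_cost:
                best, best_cost = list(alu_of), cost
        else:
            alu_of[op] = old              # reject and undo the move
        temp *= cooling
    return best, best_cost


# deps[i] lists the operations whose results operation i consumes (hypothetical graph).
deps = [[], [], [0], [1], [2, 3], [], [5], [4, 6]]
serial = schedule_length(deps, [0] * len(deps), n_alus=3)
best, best_cost = anneal(deps, n_alus=3)
print(f"serial schedule: {serial} cycles, annealed schedule: {best_cost} cycles")
```

Because the cost function charges for every cross-ALU transfer, the annealer weighs parallelism against communication directly, which is the point list scheduling misses when it assumes global communication.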

(24)

Typical Annealing Schedule

[Figure: annealing schedule plot. Vertical axis from 0 to 180, horizontal axis from 1 to 18001; labeled values 166 and 13]

(25)

Conventional Approaches to Data-Dependent Conditional Execution

(26)

Zero-Cost Conditionals

• Most Approaches to Conditional Operations are Costly

– Branching control flow - dead issue slots on mispredicted branches

– Predication (SIMD select, masked vectors) - large fraction of execution ‘opportunities’ go idle.

• Conditional Vectors

– append an element to an output stream depending on a case variable.

[Figure labels: Result Stream; Case Stream {0,1}; output streams 0 and 1]
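The conditional-vector idea can be sketched in a few lines: each result is appended to one of two output streams according to its case value, so downstream kernels see dense streams with no idle slots. The function name `conditional_split` is hypothetical, and this software sketch only mirrors the data movement, not any hardware mechanism.

```python
def conditional_split(results, cases):
    """Route each result to output stream 0 or 1 according to its case value."""
    out = ([], [])
    for value, case in zip(results, cases):
        out[case].append(value)   # append to the stream selected by the case variable
    return out


stream0, stream1 = conditional_split([10, 11, 12, 13], [0, 1, 1, 0])
print(stream0, stream1)
```

Each output stream can then be processed by its own kernel at full rate, in contrast to predication, where both cases occupy execution slots for every element.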

(27)

Application Sketch - Polygon Rendering

[Figure: polygon rendering as a stream pipeline. Record types shown: Vertex (V1, V2, V3, each with X, Y, RGB, UV); Span (fields include Y, X1, X2, RGB, UV); Pixel (X, Y, RGB, UV); TexturedPixel (X, Y, RGB)]

(28)

Status

• Working simulator of Imagine

• Simple kernels running on simulator

– FFT

• Applications being developed

– Depth extraction, video compression, polygon rendering, image-based graphics

(29)

Acknowledgements

• Students/Staff

– Don Alpert (Intel)

– Chris Buehler (MIT)

– J.P. Grossman (MIT)

– Brad Johanson

– Ujval Kapasi

– Brucek Khailany

– Abelardo Lopez-Lagunas

– Peter Mattson

– John Owens

– Scott Rixner

• Helpful Suggestions

– Henry Fuchs (UNC)

– Pat Hanrahan

– Tom Knight (MIT)

– Marc Levoy

(30)

Conclusion

• Work toward tomorrow’s computing engines

• Targeted toward media processing

– streams of low-precision samples

– little reuse

– latency tolerant

• Matched to the capabilities of communication-limited technology

– explicit bandwidth hierarchy

– explicit communication between units

– communication exposed
