Harnessing Moore’s Law
(with Selected Implications)
Mark D. Hill
Computer Sciences Department
University of Wisconsin-Madison
Motivation
• What do the following intervals have in common?
– Prehistory-2003
– 2004-2005
• Answer: Equal progress in absolute computer speed
• Furthermore, more doublings in 2006-07, 2008-09, …
• Questions
– Why do computers get better and cheaper?
– How do computer architects contribute (my bias)?
Outline
• Computer Primer
– Software
– Hardware
• Technology Primer
• Harnessing Moore’s Law
Computer Primer: Software
Application programmers write software:
int main (int argc, char *argv[]) {
  int i;
  int sum = 0;
  for (i = 0; i <= 100; i++)
    sum = sum + i * i;
  printf ("The sum from 0 .. 100 is %d\n", sum);
}
Computer Primer: Software, cont.
System software translates for hardware:
.main: ...
loop:  lw   $14, 28($sp)
       mul  $15, $14, $14    <--- multiply i * i
       lw   $24, 24($sp)
       addu $25, $24, $15    <--- add to sum
       sw   $25, 24($sp)
       addu $8, $14, 1
       sw   $8, 28($sp)
       ble  $8, 100, loop
       la   $4, str
       lw   $5, 24($sp)
       jal  printf
       move $2, $0
Computer Primer: Software, cont.
What the hardware really sees:
…
10001111101011100000000000011100
10001111101110000000000000011000
00000001110011100000000000011001 <--- multiply i * i
00100101110010000000000000000001
00101001000000010000000001100101
10101111101010000000000000011100
00000000000000000111100000010010
00000011000011111100100000100001 <--- add to sum
Computer Primer: Hardware Components
• Processor
– Rapidly executes instructions
– Commonly implemented as a microprocessor chip (e.g., Intel Pentium 4)
– Larger computers have multiple processors
• Memory
– Stores vast quantities of instructions and data
– Commonly: DRAM chips backed by magnetic disks
[Figure: Apple Mac 7200 (from Hennessy & Patterson)]
Computer Primer: Hardware Operation
E.g., do
  mul temp, i, i
& go on to next instruction

Fetch-Execute Loop:
  S1: read “current” instruction from memory
  S2: decode instruction to see what is to be done
  S3: read instruction input(s)
  S4: perform instruction operation
  S5: write instruction output(s)
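The five steps above can be sketched as a toy interpreter. The instruction format, opcodes, and register file here are hypothetical illustrations, not the MIPS encoding shown earlier:

```c
#include <assert.h>

/* Hypothetical 3-operand instruction: opcode plus register indices. */
enum { OP_ADD, OP_MUL, OP_HALT };
typedef struct { int op, dst, src1, src2; } Insn;

int regs[8];  /* tiny register file */

/* Fetch-execute loop over a small in-memory program. */
void run(const Insn *mem) {
    int pc = 0;                                /* index of "current" instruction   */
    for (;;) {
        Insn i = mem[pc];                      /* S1: read instruction from memory */
        if (i.op == OP_HALT) return;           /* S2: decode                       */
        int a = regs[i.src1], b = regs[i.src2];         /* S3: read inputs         */
        int r = (i.op == OP_ADD) ? a + b : a * b;       /* S4: operate             */
        regs[i.dst] = r;                       /* S5: write output                 */
        pc = pc + 1;                           /* make "next" instruction current  */
    }
}
```

For example, with 7 in register 1, the two-instruction program { MUL 2,1,1; ADD 3,2,2 } leaves 49 in register 2 and 98 in register 3, mirroring the mul/addu pair in the assembly above.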
Computer Big Picture
• Separate Software & Hardware (divide & conquer)
• Software
– Worry about applications only (hardware can already exist)
– Translate from one form to another (instructions & data interchangeable!)
• Hardware
Outline
• Computer Primer
• Technology Primer
– Exponential Growth
– Technology Background
– Moore’s Law
• Harnessing Moore’s Law
Exponential Growth
• Occurs when growth is proportional to current size
• Mathematically:
dy / dt = k * y
• Solution: y = e^(k*t)
• E.g., a bond with $100 principal yielding 10% interest
  – 1 year: $110 = $100 * (1 + 0.10)
  – 2 years: $121 = $100 * (1 + 0.10) * (1 + 0.10)
  – …
  – 8 years: $214 = $100 * (1 + 0.10)^8
• Other examples
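The bond arithmetic above is just repeated multiplication by (1 + rate); a small helper (the function name is mine, not from the slides) makes that concrete:

```c
#include <assert.h>

/* Value of `principal` after `years` periods of compound growth at `rate`,
   i.e., principal * (1 + rate)^years, computed by repeated multiplication. */
double compound(double principal, double rate, int years) {
    double value = principal;
    for (int y = 0; y < years; y++)
        value *= 1.0 + rate;   /* one year of interest */
    return value;
}
```

With the slide's numbers, compound(100, 0.10, 1) is about $110 and compound(100, 0.10, 8) about $214.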
Absurd Exponential Example
• Parameters
– $16 base
– 59% growth/year
– 36 years
• 1st year’s $16: buy book
• 3rd year’s $64: buy computer game
• 15th year’s $16,000: buy car
• 24th year’s $100,000: buy house
Technology Background
• Computer logic implemented with switches
– Like light switches, except that a switch can control others
– Yields a network (called a circuit) of switches
– Want circuits to be fast, reliable, & cheap
• Logic Technologies
– Mechanical switch & vacuum tube
– Transistor (1947)
– Integrated circuit (chip): circuit of many transistors made at once (1958)
(Technologist’s) Moore’s Law
• Parameters
– 16 transistors/chip circa 1964
– 59% growth/year
– 36 years (2000) and counting
• 1st year’s 16 ???
• 3rd year’s 64 ???
• 15th year’s 16,000 ???
• 24th year’s 100,000 ???
Other “Moore’s Laws”
• Other technologies improving rapidly
– Magnetic disk capacity
– DRAM capacity
– Fiber-optic network bandwidth
• Other aspects improving slowly
– Delay to memory
– Delay to disk
– Delay across networks
• Computer Implementor’s Challenge
Outline
• Computer Primer
• Technology Primer
• Harnessing Moore’s Law
– Microprocessor
– Bit-Level Parallelism
– Instruction-Level Parallelism
– Caching & Memory Hierarchies
– Cost & Implications
Microprocessor
• Computers of the 1960s were expensive, using 100s if not 1000s of chips
• First Microprocessor in 1971
  – Processor on one chip
  – Intel 4004
  – 2300 transistors
  – Barely a processor
Transistor Parallelism
• To use more transistors quickly,
  – use them side-by-side (or in parallel)
  – Approach depends on scale
• Consider organizing people
  – 10 people
– 1000 people
– 1,000,000 people
• Transistors
– Bit-level parallelism
Bit-Level Parallelism
• Less (e.g., 8 * 15 = 120):
      00001000   (8)
    * 00001111   (15)
    ----------
      00001000
     00001000
    00001000
   00001000
   ----------
   0001111000   (120)
• More:
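The partial products above come from shift-and-add multiplication; this sketch (my own helper, not from the slides) forms one partial product per multiplier bit, with a wide datapath handling every bit position of each add in parallel:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values by summing shifted partial products,
   as in the 8 * 15 example. */
uint32_t shift_add_mul(uint8_t a, uint8_t b) {
    uint32_t product = 0;
    for (int bit = 0; bit < 8; bit++)
        if (b & (1u << bit))                 /* is this multiplier bit set?  */
            product += (uint32_t)a << bit;   /* add shifted partial product  */
    return product;
}
```

Here shift_add_mul(8, 15) sums the four partial products shown above and returns 120.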
Instruction-Level Parallelism
• Limits to bit-level parallelism
  – Numbers are big enough
  – Operations are fast
• Seek parallelism by executing many instructions at once
• Recall Fetch-Execute Loop:
S1: read “current” instruction from memory
S2: decode instruction to see what is to be done
S3: read instruction input(s)
S4: perform instruction operation
S5: write instruction output(s)
Also determine “next” instruction and make it “current”
Instruction-Level Parallelism, cont.
• One-at-a-time: instructions per cycle = 1/5
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  .. .. .. .. .. S1 S2 S3 S4 S5
• Pipelining: instructions per cycle = 1 (or less)
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  .. S1 S2 S3 S4 S5
Instruction-Level Parallelism, cont.
• 4-way Superscalar: instructions per cycle = 4 (or less)
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  S1 S2 S3 S4 S5
  ORI  S1 S2 S3 S4 S5
  AND  S1 S2 S3 S4 S5
  MUL  .. S1 S2 S3 S4 S5
  SRL  .. S1 S2 S3 S4 S5
  XOR  .. S1 S2 S3 S4 S5
  LDW  .. S1 S2 S3 S4 S5
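The timing diagrams above reduce to simple cycle counts (the function names are mine): n instructions take 5n cycles one at a time, 5 + (n - 1) cycles pipelined, and 5 + ceil(n/w) - 1 cycles on a w-wide superscalar:

```c
#include <assert.h>

/* Cycles to complete n instructions through 5 stages (S1..S5). */
int serial_cycles(int n)             { return 5 * n; }          /* one at a time        */
int pipelined_cycles(int n)          { return 5 + (n - 1); }    /* one starts per cycle */
int superscalar_cycles(int n, int w) { return 5 + (n + w - 1) / w - 1; } /* w per cycle */
```

Matching the diagrams: 2 instructions take 10 cycles serially but 6 cycles pipelined, and all 8 instructions in the 4-way superscalar diagram finish by cycle 6.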
Instruction-Level Parallelism, cont.
• Current processors have dozens of instructions executing at once
• Must predict which instructions are next
• Limits to control prediction?
• Look elsewhere? (thread-level parallelism later)
Caching & Memory Hierarchies
• Memory can be
– Fast
– Vast
– But not both
• Use two memories
– Cache: small, fast (e.g., 64,000 bytes in 1 ns)
– Memory: large, slow (e.g., 64,000,000 bytes in 100 ns)
• Use prediction to fill cache
– Likely to re-reference information
– Likely to reference nearby information
Caching & Memory Hierarchies, cont.
• Cache + Memory makes memory look fast & vast
– If cache has information on 99% of accesses:
  1 ns + 1% * 100 ns = 2 ns
– E.g., P3 (w/o L2 cache)
• Caching Applied Recursively
  – Registers
  – Level-one cache
  – Level-two cache
  – Memory
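The 2 ns figure above follows from the standard average-access-time calculation; a small helper (hypothetical names, not from the slides) spells it out:

```c
#include <assert.h>

/* Average access time: every access pays the cache latency;
   the miss fraction also pays the memory latency. */
double avg_access_ns(double cache_ns, double mem_ns, double hit_rate) {
    return cache_ns + (1.0 - hit_rate) * mem_ns;
}
```

With a 1 ns cache, 100 ns memory, and a 99% hit rate, avg_access_ns(1.0, 100.0, 0.99) gives roughly 2 ns, so the cache-plus-memory pair looks nearly as fast as the cache and as vast as the memory.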
Cost Side of Moore’s Law
• About every two years: same computing at half cost
• Long-term effect:
– 1940s Prototypes for calculating ballistic trajectories
– 1950s Early mainframes for large banks
– 1960s Mainframes flourish in many large businesses
– 1970s Minicomputers for business, science, & engineering
– Early 1980s PCs for word processing & spreadsheets
– Late 1980s PCs for desktop publishing
– 1990s PCs for games, multimedia, e-mail, & web
Outline
• Computer Primer
• Technology Primer
• Harnessing Moore’s Law
• Future Trends
– Moore’s Law
Revolutions
• Industrial Revolution enabled by machines
– Interchangeable parts
– Mass production
– Lower costs expanded application
• Information Revolution enabled by machines
– Interchangeable purpose (software)
Future of Moore’s Law
• Short-Term (1-5 years)
– Will operate (due to prototypes in lab)
– Fabrication cost will go up rapidly
• Medium-Term (5-15 years)
– Exponential growth rate will likely slow
– Trillion-dollar industry is motivated
• Long-Term (>15 years)
– May need new technology (chemical or quantum)
– We can do better (e.g., human brain)
Future of Harnessing Moore’s Law
• Thread-Level Parallelism
– Multiple processors cooperating (exists today)
– More common in future with multiple processors per chip
– Parallelism in Internet? The Grid.
• System on a Chip
– Processor, memory, and I/O on one chip
– Cost-performance leap like microprocessor?
– (e.g., accelerometer at right)
• Communication
– World-wide web & wireless cell phone fuse!
Future Computer Uses
• Computer cost-effectiveness determines application viability
  – Spreadsheets on a US$2M mainframe do not make sense
  – A 10x cost-performance change enables new possibilities [Joy]
• Most computers will NOT be computers
– How many electric motors do you have in your home?
– How many did you buy as electric motors?
– I control several computers, but most computers I control are embedded in cars, remote controls, refrigerators, etc.
• Two Stories
Future Computer Uses, cont.
• Technologists have always been poor predictors of future use
  – Edison invented the motion picture machine
– Hollywood invented movies
• To Predict:
– What would you want if it were 10 times cheaper?
– What can be 10 times cheaper if you make more?
– Better yet, ask a ten year old!
Some Non-Technical Thoughts
• We make over a billion transistors/second
– One transistor per man/woman/child in < 10 seconds
  (humankind has made many more transistors than bricks!)
– But those transistors are not being distributed equally
• Computers can be incredibly effective tools
– Knowledge workers in medicine, law, & engineering
– But not unskilled laborers!
• Computer use will exacerbate the social gradient
Summary
• Computers are machines for purposes
“to be determined”
• Vast cost reductions have enabled new uses
– Software flexibility
– Moore’s Law and its harnessing
• Technology should be our tool, not our master
– Many benefits