Harnessing Moore’s Law
(with Selected Implications)
Mark D. Hill
Computer Sciences Department
University of Wisconsin-Madison
Motivation
• What do the following intervals have in common?
– Prehistory-2003
– 2004-2005
• Answer: Equal progress in absolute computer speed
• Furthermore, more doublings in 2006-07, 2008-09, …
• Questions
– Why do computers get better and cheaper?
– How do computer architects contribute (my bias)?
Outline
• Computer Primer
– Software
– Hardware
• Technology Primer
• Harnessing Moore’s Law
Computer Primer: Software
Application programmers write software:
int main (int argc, char *argv[]) {
  int i;
  int sum = 0;
  for (i = 0; i <= 100; i++)
    sum = sum + i * i;
  printf ("The sum from 0 .. 100 is %d\n", sum);
}
Computer Primer: Software, cont.
System software translates for hardware:
.main: ...
loop:  lw   $14, 28($sp)
       mul  $15, $14, $14    <--- multiply i * i
       lw   $24, 24($sp)
       addu $25, $24, $15    <--- add to sum
       sw   $25, 24($sp)
       addu $8, $14, 1
       sw   $8, 28($sp)
       ble  $8, 100, loop
       la   $4, str
       lw   $5, 24($sp)
       jal  printf
       move $2, $0
Computer Primer: Software, cont.
What the hardware really sees:
…
10001111101011100000000000011100
10001111101110000000000000011000
00000001110011100000000000011001 <--- multiply i * i
00100101110010000000000000000001
00101001000000010000000001100101
10101111101010000000000000011100
00000000000000000111100000010010
00000011000011111100100000100001 <--- add to sum
Computer Primer: Hardware Components
• Processor
– Rapidly executes instructions
– Commonly implemented as a microprocessor chip (e.g., Intel Pentium 4)
– Larger computers have multiple processors
• Memory
– Stores vast quantities of instructions and data
– Commonly: DRAM chips backed by magnetic disks
[Figure: Apple Mac 7200 (from Hennessy & Patterson)]
Computer Primer: Hardware Operation
E.g., do
  mul temp, i, i
& go on to next instruction

Fetch-Execute Loop:
  S1: read “current” instruction from memory
  S2: decode instruction to see what is to be done
  S3: read instruction input(s)
  S4: perform instruction operation
  S5: write instruction output(s)
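The five steps above can be sketched as a toy interpreter. The instruction format, opcodes, and register file here are hypothetical illustrations, not the MIPS encoding shown earlier:

```c
#include <assert.h>

/* Hypothetical 3-operand instruction: opcode plus register indices. */
enum { OP_ADD, OP_MUL, OP_HALT };
typedef struct { int op, dst, src1, src2; } Insn;

int regs[8];  /* tiny register file */

/* Fetch-execute loop over a small in-memory program. */
void run(const Insn *mem) {
    int pc = 0;                                /* index of "current" instruction   */
    for (;;) {
        Insn i = mem[pc];                      /* S1: read instruction from memory */
        if (i.op == OP_HALT) return;           /* S2: decode                       */
        int a = regs[i.src1], b = regs[i.src2];         /* S3: read inputs         */
        int r = (i.op == OP_ADD) ? a + b : a * b;       /* S4: operate             */
        regs[i.dst] = r;                       /* S5: write output                 */
        pc = pc + 1;                           /* make "next" instruction current  */
    }
}
```

For example, with 7 in register 1, the two-instruction program { MUL 2,1,1; ADD 3,2,2 } leaves 49 in register 2 and 98 in register 3, mirroring the mul/addu pair in the assembly above.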
Computer Big Picture
• Separate Software & Hardware (divide & conquer)
• Software
– Worry about applications only (hardware can already exist)
– Translate from one form to another (instructions & data interchangeable!)
• Hardware
Outline
• Computer Primer
• Technology Primer
– Exponential Growth
– Technology Background
– Moore’s Law
• Harnessing Moore’s Law
Exponential Growth
• Occurs when growth is proportional to current size
• Mathematically:
dy / dt = k * y
• Solution: y = e^(k*t)
• E.g., a bond with $100 principal yielding 10% interest
  – 1 year: $110 = $100 * (1 + 0.10)
  – 2 years: $121 = $100 * (1 + 0.10) * (1 + 0.10)
  – …
  – 8 years: $214 = $100 * (1 + 0.10)^8
• Other examples
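The bond arithmetic above is just repeated multiplication by (1 + rate); a small helper (the function name is mine, not from the slides) makes that concrete:

```c
#include <assert.h>

/* Value of `principal` after `years` periods of compound growth at `rate`,
   i.e., principal * (1 + rate)^years, computed by repeated multiplication. */
double compound(double principal, double rate, int years) {
    double value = principal;
    for (int y = 0; y < years; y++)
        value *= 1.0 + rate;   /* one year of interest */
    return value;
}
```

With the slide's numbers, compound(100, 0.10, 1) is about $110 and compound(100, 0.10, 8) about $214.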
Absurd Exponential Example
• Parameters
– $16 base
– 59% growth/year
– 36 years
• 1st year’s $16: buy book
• 3rd year’s $64: buy computer game
• 15th year’s $16,000: buy car
• 24th year’s $100,000: buy house
Technology Background
• Computer logic implemented with switches
– Like light switches, except that a switch can control others
– Yields a network (called a circuit) of switches
– Want circuits to be fast, reliable, & cheap
• Logic Technologies
– Mechanical switch & vacuum tube
– Transistor (1947)
– Integrated circuit (chip): circuit of many transistors made at once (1958)
(Technologist’s) Moore’s Law
• Parameters
– 16 transistors/chip circa 1964
– 59% growth/year
– 36 years (2000) and counting
• 1st year’s 16 ???
• 3rd year’s 64 ???
• 15th year’s 16,000 ???
• 24th year’s 100,000 ???
Other “Moore’s Laws”
• Other technologies improving rapidly
– Magnetic disk capacity
– DRAM capacity
– Fiber-optic network bandwidth
• Other aspects improving slowly
– Delay to memory
– Delay to disk
– Delay across networks
• Computer Implementor’s Challenge
Outline
• Computer Primer
• Technology Primer
• Harnessing Moore’s Law
– Microprocessor
– Bit-Level Parallelism
– Instruction-Level Parallelism
– Caching & Memory Hierarchies
– Cost & Implications
Microprocessor
• Computers of the 1960s were expensive, using 100s if not 1000s of chips
• First Microprocessor in 1971
  – Processor on one chip
  – Intel 4004
  – 2300 transistors
  – Barely a processor
Transistor Parallelism
• To use more transistors quickly,
  – use them side-by-side (or in parallel)
  – Approach depends on scale
• Consider organizing people
  – 10 people
– 1000 people
– 1,000,000 people
• Transistors
– Bit-level parallelism
Bit-Level Parallelism
• Less (e.g., 8 * 15 = 120):
      00001000   (8)
    * 00001111   (15)
    ----------
      00001000
     00001000
    00001000
   00001000
   ----------
   0001111000   (120)
• More:
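The partial products above come from shift-and-add multiplication; this sketch (my own helper, not from the slides) forms one partial product per multiplier bit, with a wide datapath handling every bit position of each add in parallel:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values by summing shifted partial products,
   as in the 8 * 15 example. */
uint32_t shift_add_mul(uint8_t a, uint8_t b) {
    uint32_t product = 0;
    for (int bit = 0; bit < 8; bit++)
        if (b & (1u << bit))                 /* is this multiplier bit set?  */
            product += (uint32_t)a << bit;   /* add shifted partial product  */
    return product;
}
```

Here shift_add_mul(8, 15) sums the four partial products shown above and returns 120.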
Instruction-Level Parallelism
• Limits to bit-level parallelism
  – Numbers are big enough
  – Operations are fast
• Seek parallelism by executing many instructions at once
• Recall Fetch-Execute Loop:
S1: read “current” instruction from memory
S2: decode instruction to see what is to be done
S3: read instruction input(s)
S4: perform instruction operation
S5: write instruction output(s)
Also determine “next” instruction and make it “current”
Instruction-Level Parallelism, cont.
• One-at-a-time: instructions per cycle = 1/5
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  .. .. .. .. .. S1 S2 S3 S4 S5
• Pipelining: instructions per cycle = 1 (or less)
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  .. S1 S2 S3 S4 S5
Instruction-Level Parallelism, cont.
• 4-way Superscalar: instructions per cycle = 4 (or less)
  Time 01 02 03 04 05 06 07 08 09 10
  ADD  S1 S2 S3 S4 S5
  SUB  S1 S2 S3 S4 S5
  ORI  S1 S2 S3 S4 S5
  AND  S1 S2 S3 S4 S5
  MUL  .. S1 S2 S3 S4 S5
  SRL  .. S1 S2 S3 S4 S5
  XOR  .. S1 S2 S3 S4 S5
  LDW  .. S1 S2 S3 S4 S5
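The timing diagrams above reduce to simple cycle counts (the function names are mine): n instructions take 5n cycles one at a time, 5 + (n - 1) cycles pipelined, and 5 + ceil(n/w) - 1 cycles on a w-wide superscalar:

```c
#include <assert.h>

/* Cycles to complete n instructions through 5 stages (S1..S5). */
int serial_cycles(int n)             { return 5 * n; }          /* one at a time        */
int pipelined_cycles(int n)          { return 5 + (n - 1); }    /* one starts per cycle */
int superscalar_cycles(int n, int w) { return 5 + (n + w - 1) / w - 1; } /* w per cycle */
```

Matching the diagrams: 2 instructions take 10 cycles serially but 6 cycles pipelined, and all 8 instructions in the 4-way superscalar diagram finish by cycle 6.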
Instruction-Level Parallelism, cont.
• Current processors have dozens of instructions executing at once
• Must predict which instructions are next
• Limits to control prediction?
• Look elsewhere? (thread-level parallelism later)
Caching & Memory Hierarchies
• Memory can be
– Fast
– Vast
– But not both
• Use two memories
– Cache: small, fast (e.g., 64,000 bytes in 1 ns)
– Memory: large, slow (e.g., 64,000,000 bytes in 100 ns)
• Use prediction to fill cache
– Likely to re-reference information
– Likely to reference nearby information
Caching & Memory Hierarchies, cont.
• Cache + Memory makes memory look fast & vast
– If cache has information on 99% of accesses:
  1 ns + 1% * 100 ns = 2 ns
– E.g., P3 (w/o L2 cache)
• Caching Applied Recursively
  – Registers
  – Level-one cache
  – Level-two cache
  – Memory
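The 2 ns figure above follows from the standard average-access-time calculation; a small helper (hypothetical names, not from the slides) spells it out:

```c
#include <assert.h>

/* Average access time: every access pays the cache latency;
   the miss fraction also pays the memory latency. */
double avg_access_ns(double cache_ns, double mem_ns, double hit_rate) {
    return cache_ns + (1.0 - hit_rate) * mem_ns;
}
```

With a 1 ns cache, 100 ns memory, and a 99% hit rate, avg_access_ns(1.0, 100.0, 0.99) gives roughly 2 ns, so the cache-plus-memory pair looks nearly as fast as the cache and as vast as the memory.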
Cost Side of Moore’s Law
• About every two years: same computing at half cost
• Long-term effect:
– 1940s Prototypes for calculating ballistic trajectories
– 1950s Early mainframes for large banks
– 1960s Mainframes flourish in many large businesses
– 1970s Minicomputers for business, science, & engineering
– Early 1980s PCs for word processing & spreadsheets
– Late 1980s PCs for desktop publishing
– 1990s PCs for games, multimedia, e-mail, & web
Outline
• Computer Primer
• Technology Primer
• Harnessing Moore’s Law
• Future Trends
– Moore’s Law
Revolutions
• Industrial Revolution enabled by machines
– Interchangeable parts
– Mass production
– Lower costs expanded application
• Information Revolution enabled by machines
– Interchangeable purpose (software)
Future of Moore’s Law
• Short-Term (1-5 years)
– Will operate (due to prototypes in lab)
– Fabrication cost will go up rapidly
• Medium-Term (5-15 years)
– Exponential growth rate will likely slow
– Trillion-dollar industry is motivated
• Long-Term (>15 years)
– May need new technology (chemical or quantum)
– We can do better (e.g., human brain)
Future of Harnessing Moore’s Law
• Thread-Level Parallelism
– Multiple processors cooperating (exists today)
– More common in future with multiple processors per chip
– Parallelism in Internet? The Grid.
• System on a Chip
– Processor, memory, and I/O on one chip
– Cost-performance leap like microprocessor?
– (e.g., accelerometer at right)
• Communication
– World-wide web & wireless cell phone fuse!
Future Computer Uses
• Computer cost-effectiveness determines application viability
  – Spreadsheets on a US$2M mainframe do not make sense
  – A 10x cost-performance change enables new possibilities [Joy]
• Most computers will NOT be computers
– How many electric motors do you have in your home?
– How many did you buy as electric motors?
– I control several computers, but most computers I control are embedded in cars, remote controls, refrigerators, etc.
• Two Stories
Future Computer Uses, cont.
• Technologists have always been poor predictors of future use
  – Edison invented the motion picture machine
– Hollywood invented movies
• To Predict:
– What would you want if it were 10 times cheaper?
– What can be 10 times cheaper if you make more?
– Better yet, ask a ten year old!
Some Non-Technical Thoughts
• We make over a billion transistors/second
– One transistor per man/woman/child in < 10 seconds
  (humankind has made many more transistors than bricks!)
– But those transistors are not being distributed equally
• Computers can be incredibly effective tools
– Knowledge workers in medicine, law, & engineering
– But not unskilled laborers!
• Computer use will exacerbate the social gradient
Summary
• Computers are machines for purposes
“to be determined”
• Vast cost reductions have enabled new uses
– Software flexibility
– Moore’s Law and its harnessing
• Technology should be our tool, not our master
– Many benefits