Lecture 1 1
CS 352H: Computer Systems Architecture
Lecture 1: What is Computer Architecture and why should I care?
Professor Emmett Witchel University of Texas at Austin
Goals
• Understand the “how” and “why” of computer system organization
– Instruction Set Architecture
– System Organization (processor, memory, I/O)
– Microarchitecture
– Virtualization
• Learn methods of evaluating performance
– Metrics & benchmarks
• Learn how to make systems go fast
– Pipelining, caching
– Parallelism (ILP, DLP, TLP)
Logistics
Lectures: T/Th 12:30-2:00pm, PAI 3.14
Instructor: Prof. Emmett Witchel, W 1:15-2:15
TA: Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1
Grading: see web page
Texts: Patterson & Hennessy, Computer Organization and Design (Fourth Edition), including CD
CS352H Online
URL: www.cs.utexas.edu/users/witchel/CS352H
I will occasionally email you via Blackboard and by your
registered email address. I expect this channel to be
reliable and timely.
Discussion group: via Blackboard
login at courses.utexas.edu
General, Homeworks, Project
Computer Architecture Seminar Series:
Assignment for Next Tuesday
• Turn in student survey forms, if you want
• Read the Moore paper (see webpage)
– Write a review of 1/2-1 page (see syllabus)
– Review should include
• Summary of content of paper
• Your observations on the most interesting/important aspects
Discussion
• Are you interested in taking this course?
• One question about computer science
Specification: compute the Fibonacci sequence
Program: for(i=2; i<100; i++) { a[i] = a[i-1]+a[i-2]; }
ISA (Instruction Set Architecture): load r1, a[i]; add r2, r2, r1;
microArchitecture (registers)
Logic
Transistors
Physics/Chemistry
CS352H Topics
• Technology Trends
• Instruction set architectures
• Pipelining
• Modern pipelined architectures
– Dynamic ILP machines
– Static ILP machines
• Cache memory systems
• Virtual memory
• Multiprocessors
Making This Class Work For You
• Plus and minus grades
• Clickers
CS352H Fall 2007
What is Computer Architecture?
Technology
Applications
Computer Architect
Interfaces (ISA, API, Link, I/O Chan)
Machine Organization
Measurement & Evaluation
Technology Constraints
• Yearly improvement
– Semiconductor technology
• 60% more devices per chip (doubles every 18 months)
• 15% faster devices (doubles every 5 years)
• Slower wires
– Magnetic Disks
• 60% increase in density
– Circuit boards
• 5% increase in wire density
– Cables
• no change
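The doubling times quoted for semiconductor technology follow from compound growth. The sketch below checks them; the function name `doubling_time_years` is made up for this illustration, and it assumes the yearly percentages compound at a constant rate.

```python
import math

def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

# 60% more devices per chip each year (figure from the slide).
devices_doubling = doubling_time_years(0.60)
# 15% faster devices each year (figure from the slide).
speed_doubling = doubling_time_years(0.15)

print(f"devices per chip double every {devices_doubling * 12:.1f} months")
print(f"device speed doubles every {speed_doubling:.1f} years")
```

Under that assumption the results come out close to the 18 months and 5 years on the slide.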
Figure (1989, 1992, 1995, 1998): >100x more devices since 1989, 10x faster devices
Changing Technology leads to Changing Architecture
• 1970s
– multi-chip CPUs
– semiconductor memory very expensive
– microcoded control
– complex instruction sets (good code density)
• 1980s
– single-chip CPUs, on-chip RAM feasible
– simple, hard-wired control
– simple instruction sets
– small on-chip caches
• 1990s
– lots of transistors
– complex control to exploit instruction-level parallelism
• 2000s
– even more transistors
– Power wall
– Transition to CMPs
– Multi-level caches
• 2010s
– Embedded vs. Desktop vs. Data center (cloud)
– New storage (PCM, flash)
– Simpler cores and lots of them
Intel 4004 - 1971
• The first microprocessor
• 2,300 transistors
• 108 kHz
Some Recent Chips!
Intel Pentium IV
• 42 million transistors
• 4GHz
• 0.13 µm process
• Could fit ~15,000 4004s on this chip!
NVidia - GeForce 6800
• 222 million transistors
• 400MHz
• 0.13 µm process
Intel Itanium II (Montecito)
• 1.7 billion transistors
• 1.6 GHz
• 90nm process
IBM Cell
• 8 vector processors + 1 PPC
Application Constraints
• Applications drive machine ‘balance’
– Numerical simulations
• floating-point performance
• main memory bandwidth
– Transaction processing
• I/Os per second
• integer CPU performance
– Decision support
• I/O bandwidth
– Embedded control
• I/O timing, power
– Media processing
Application-Driven Architectures
• General purpose - good performance on “all” programs
– x86 family, ARM, PowerPC, etc.
• Application specificity can focus on:
– Types of concurrency available
– Domain of deployment (server, handheld, desktop)
• Today - overview of graphics processors
– Interface (instruction set architecture - ISA)
– Processor organization
Apple’s iPad/iPhone4 Powered by A4 Chip
• A4 is modified ARM Cortex run at 1GHz
– Integrated processor, graphics, memory controller
• Among other claims, ARM says the processor gets a
near "25 percent processing power boost, even at
same processor speed, from the use of a new
instruction pipelining system."
– We will cover pipelining in this class.
Performance: Latency and Throughput
• Latency: time to complete an operation
• Throughput: work completed per unit time
• Consider plumbing
– Low latency: turn on faucet and water comes out
– High bandwidth: lots of water (e.g., to fill a pool)
• What is “High speed Internet?”
– Low latency: needed for interactive gaming
– High bandwidth: needed for downloading large files
– Marketing departments like to conflate latency and bandwidth
Relationship between Latency and Throughput
• Latency and bandwidth only loosely coupled
– Henry Ford: assembly lines increase bandwidth without reducing latency
• My factory takes 1 day to make a Model-T Ford.
– But I can start building a new car every 10 minutes
– At 24 hrs/day, I can make 24 * 6 = 144 cars per day
– A special order for 1 green car still takes 1 day
– Throughput is increased, but latency is not.
• Latency reduction is difficult
• Often, one can buy bandwidth
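The Model-T arithmetic can be written out as a small calculation. This is only an illustration: the helper `assembly_line` and its parameter names are invented here, and it assumes the factory runs around the clock with a new car started at a fixed interval.

```python
def assembly_line(latency_minutes, start_interval_minutes, minutes_per_day=24 * 60):
    """Return (latency in days, cars started per day) for a factory."""
    latency_days = latency_minutes / minutes_per_day
    cars_per_day = minutes_per_day // start_interval_minutes
    return latency_days, cars_per_day

# Numbers from the slide: 1 day per car, a new car started every 10 minutes.
latency, throughput = assembly_line(latency_minutes=24 * 60, start_interval_minutes=10)

# Without an assembly line, the next car starts only when the previous one is done.
_, unpipelined = assembly_line(latency_minutes=24 * 60, start_interval_minutes=24 * 60)

print(f"latency: {latency} day, throughput: {throughput} cars/day")
print(f"unpipelined throughput: {unpipelined} car/day")
```

Shortening the start interval raises throughput, but the first return value never changes: each individual car still takes a full day.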
What is cloud computing?
• Cloud computing is where dynamically scalable and
often virtualized resources are provided as a service
over the Internet (thanks, Wikipedia!)
• Infrastructure as a service (IaaS)
– Amazon’s EC2 (elastic compute cloud)
• Platform as a service (PaaS)
– Google Gears
– Microsoft Azure
• Software as a service (SaaS)
– Gmail
Graphics has dedicated chip in PCs
CPU: 582 million transistors (Intel “Kentsfield” quad core)
Graphics Processor: 681 million transistors (GeForce 8800, 90nm), on AGP or PCIe
Memory Controller Chip (“North Bridge”)
Memory
Input/Output Glue Chip (“South Bridge”)
Disk, Keyboard, PCIe, etc.
GPU/CPU Performance comparison
Figure: GFLOPS, GPU vs. CPU
G80 = GeForce 8800 GTX
G71 = GeForce 7900 GTX
G70 = GeForce 7800 GTX
NV40 = GeForce 6800 Ultra
NV35 = GeForce FX 5950 Ultra
NV30 = GeForce FX 5800
Source: NVIDIA (except CELL and Core2 Quad)
* IBM Cell ~200 GFlops
Why a dedicated processing chip?
• 1) Specialization – becoming less important with time
• 2) Parallelism – becoming more important
Graphics processors are the only highly-parallel processors in every desktop machine.
128 “processors” * 2 FLOPS @ 1.35 GHz
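Multiplying those three numbers gives a peak throughput figure. The sketch below does the arithmetic, assuming "2 FLOPS" means two floating-point operations per processor per clock cycle; the function name `peak_gflops` is hypothetical.

```python
def peak_gflops(num_processors, flops_per_cycle, clock_ghz):
    """Peak throughput in GFLOPS: processors * operations per cycle * clock (GHz)."""
    return num_processors * flops_per_cycle * clock_ghz

# Figures from the slide: 128 "processors", 2 FLOPS each, 1.35 GHz clock.
gpu_peak = peak_gflops(num_processors=128, flops_per_cycle=2, clock_ghz=1.35)

print(f"peak: {gpu_peak:.1f} GFLOPS")
```

This is a theoretical peak; sustained performance depends on how well an application keeps all the processors busy.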
Graphics requires programmability
void normalmapped(float2 normalMapTexCoord : TEXCOORD0,
                  …
                  out float4 color : COLOR,
                  uniform float ambient,
                  …)
{
  float3 normalTex, …;
  // Sample the per-pixel normal from the normal map texture.
  normalTex = tex2D(normalMap, normalMapTexCoord).xyz;
  …
  // Diffuse term: cosine of the angle to the light, clamped to [0, 1].
  diffuse = saturate(dot(normal, normLightDir));
  …
  // Final color: diffuse contribution plus specular highlight.
  color = Kd * (ambient + diffuse) +
          Ks * pow(specular, specularExponent);
}
Every application does something a bit different.
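For readers who have not seen shader code, the two visible lines of the lighting computation can be mimicked on the CPU. This is a rough sketch, not the shader itself: the elided parts stay elided, `normal`, `light_dir`, and `specular` are taken as inputs, and `Kd` and `Ks` are treated as plain scalars, which is an assumption.

```python
def saturate(x):
    """Clamp x to the range [0, 1], like the shader built-in of the same name."""
    return max(0.0, min(1.0, x))

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def shade(normal, light_dir, ambient, kd, ks, specular, specular_exponent):
    """Diffuse plus specular color for one pixel (scalar coefficients assumed)."""
    diffuse = saturate(dot(normal, light_dir))
    return kd * (ambient + diffuse) + ks * pow(specular, specular_exponent)

# Light shining straight along the surface normal.
lit = shade((0.0, 0.0, 1.0), (0.0, 0.0, 1.0),
            ambient=0.1, kd=0.5, ks=0.25, specular=1.0, specular_exponent=8)

# Light behind the surface: the diffuse term clamps to zero.
back = shade((0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
             ambient=0.1, kd=0.5, ks=0.25, specular=0.0, specular_exponent=8)

print(lit, back)
```

A GPU runs a function like this once per pixel, and the pixels are independent of each other, which is where the parallelism comes from.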
Next Time
• Performance evaluation
• Basic computer organization
• How chips are made
• Start in on instruction set review/overview