(1)

Lecture 1 ₁

CS 352H: Computer Systems Architecture

Lecture 1: What is Computer Architecture and why should I care?

Professor Emmett Witchel

University of Texas at Austin

(2)

Goals

• Understand the “how” and “why” of computer system organization

– Instruction Set Architecture

– System Organization (processor, memory, I/O)

– Microarchitecture

– Virtualization

• Learn methods of evaluating performance

– Metrics & benchmarks

• Learn how to make systems go fast

– Pipelining, caching

– Parallelism (ILP, DLP, TLP)

(3)

Logistics

Lectures T/Th 12:30-2:00pm, PAI 3.14

Instructor Prof. Emmett Witchel, W 1:15-2:15

TA Shalini Sahoo, MW 11:30-1:00pm, PAI 5.38 Desk 1

Grading see web page

Texts Patterson & Hennessy, Computer Organization and Design (Fourth Edition) Including CD

(4)

CS352H Online

URL:

www.cs.utexas.edu/users/witchel/CS352H

I will occasionally email you via Blackboard and at your registered email address. I expect this channel to be reliable and timely.

Discussion group: via Blackboard

login at courses.utexas.edu

General, Homeworks, Project

Computer Architecture Seminar Series:

(5)

Assignment for Next Tuesday

• Turn in student survey forms, if you want

• Read the Moore paper (see webpage)

– Write a review of 1/2-1 page (see syllabus)

– Review should include

• Summary of content of paper

• Your observations on the most interesting/important aspects

(6)

Discussion

• Are you interested in taking this course?

• One question about computer science

(7)

Specification

Program

ISA

(Instruction Set Architecture)

microArchitecture

Logic

Transistors

Physics/Chemistry

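Specification example: compute the Fibonacci sequence

Program example: for (i = 2; i < 100; i++) { a[i] = a[i-1] + a[i-2]; }

ISA example: load r1, a[i]; add r2, r2, r1;

[Figure: registers in a datapath, with logic and transistor diagrams below]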
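
The Program-level loop on this slide can be run directly. The sketch below is a Python transcription with the same bounds. The slide does not show how a[0] and a[1] are initialized, so seeding them with 0 and 1 is an assumption, and the function name is made up for illustration.

```python
def fibonacci_table(n=100, first=0, second=1):
    """Fill a table with the recurrence a[i] = a[i-1] + a[i-2].

    The seed values `first` and `second` are assumptions; the slide
    only shows the loop body.
    """
    a = [0] * n
    a[0], a[1] = first, second
    for i in range(2, n):  # same bounds as for(i=2; i<100; i++)
        a[i] = a[i - 1] + a[i - 2]
    return a


a = fibonacci_table()
print(a[:10])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Python integers grow as needed, so all 100 entries are exact. With these seeds a C version using 32-bit signed integers would overflow before reaching the end of the array, which is one place where the levels below the Program leak through.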

(8)

CS352H Topics

• Technology Trends

• Instruction set architectures

• Pipelining

• Modern pipelined architectures

– Dynamic ILP machines

– Static ILP machines

• Cache memory systems

• Virtual memory

• Multiprocessors

(9)

Making This Class Work For You

• Plus and minus grades

• Clickers

CS352H Fall 2007

(10)

What is Computer Architecture?

Technology

Applications

Computer Architect

Interfaces: ISA, API, Link, I/O Chan

Machine Organization

Measurement & Evaluation

(11)

Technology Constraints

• Yearly improvement

– Semiconductor technology

• 60% more devices per chip (doubles every 18 months)

• 15% faster devices (doubles every 5 years)

• Slower wires

– Magnetic Disks

• 60% increase in density

– Circuit boards

• 5% increase in wire density

– Cables

• no change
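
The doubling times quoted next to the yearly rates follow from compound growth: at a yearly rate r, the doubling time is ln 2 / ln(1 + r) years. A minimal check of the two figures on this slide (the function name is made up for illustration):

```python
import math


def doubling_time_years(annual_growth):
    """Years needed to double at a fixed compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


devices = doubling_time_years(0.60)  # 60% more devices per chip per year
speed = doubling_time_years(0.15)    # 15% faster devices per year

print(f"devices double every {devices * 12:.1f} months")  # 17.7 months
print(f"speed doubles every {speed:.1f} years")           # 5.0 years
```

Both results agree with the rounded values on the slide (18 months and 5 years).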

[Figure: chips from 1989, 1992, 1995 and 1998]

>100x more devices since 1989

10x faster devices

(12)

Changing Technology leads to

Changing Architecture

• 1970s

– multi-chip CPUs

– semiconductor memory very expensive

– microcoded control

– complex instruction sets (good code density)

• 1980s

– single-chip CPUs, on-chip RAM feasible

– simple, hard-wired control

– simple instruction sets

– small on-chip caches

• 1990s

– lots of transistors

– complex control to exploit instruction-level parallelism

• 2000s

– even more transistors

– Power wall

– Transition to CMPs

– Multi-level caches

• 2010s

– Embedded vs. Desktop vs. Data center (cloud)

– New storage (PCM, flash)

– Simpler cores and lots of them

(13)

Intel 4004 - 1971

• The first microprocessor

• 2,300 transistors

• 108 kHz

(14)

Some Recent Chips!

Intel Pentium IV

• 42 million transistors

• 4GHz

• 0.13 µm process

• Could fit ~15,000 4004s on this chip!

NVidia - GeForce 6800

• 222 million transistors

• 400MHz

• 0.13 µm process

Intel Itanium II (Montecito)

• 1.7 billion transistors

• 1.6 GHz

• 90nm process

IBM Cell

• 8 vector processors + 1 PPC
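
As a rough sanity check on the 4004 comparison, the numbers quoted on the slides can be divided directly. By transistor count alone the ratio comes out near 18,000, the same order of magnitude as the slide's ~15,000. How the slide arrived at its number is not stated, so this is only a back-of-the-envelope sketch.

```python
# Figures as quoted on the slides.
TRANSISTORS_4004 = 2_300
TRANSISTORS_PENTIUM4 = 42_000_000
CLOCK_HZ_4004 = 108_000
CLOCK_HZ_PENTIUM4 = 4_000_000_000


def ratio(new, old):
    """How many times larger `new` is than `old`."""
    return new / old


print(f"transistor ratio: {ratio(TRANSISTORS_PENTIUM4, TRANSISTORS_4004):,.0f}")
print(f"clock ratio: {ratio(CLOCK_HZ_PENTIUM4, CLOCK_HZ_4004):,.0f}")
```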

(15)

(16)

Application Constraints

• Applications drive machine ‘balance’

– Numerical simulations

• floating-point performance

• main memory bandwidth

– Transaction processing

• I/Os per second

• integer CPU performance

– Decision support

• I/O bandwidth

– Embedded control

• I/O timing, power

– Media processing

(17)

Application-Driven Architectures

• General purpose - good performance on “all” programs

– x86 family, ARM, PowerPC, etc.

• Application specificity can focus on:

– Types of concurrency available

– Domain of deployment (server, handheld, desktop)

• Today - overview of graphics processors

– Interface (instruction set architecture - ISA)

– Processor organization

(18)

Apple’s iPad/iPhone4 Powered by A4 Chip

• A4 is a modified ARM Cortex running at 1GHz

– Integrated processor, graphics, memory controller

• Among other claims, ARM says the processor gets a near "25 percent processing power boost, even at same processor speed, from the use of a new instruction pipelining system."

– We will cover pipelining in this class.

(19)

Performance: Latency and Throughput

• Latency: time to complete an operation

• Throughput: work completed per unit time

• Consider plumbing

– Low latency: turn on faucet and water comes out

– High bandwidth: lots of water (e.g., to fill a pool)

• What is “High speed Internet?”

– Low latency: needed for interactive gaming

– High bandwidth: needed for downloading large files

– Marketing departments like to conflate latency and bandwidth
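
One simple way to see why the two should not be conflated is to model transfer time as a fixed latency plus size divided by bandwidth. This model and the two link profiles below are assumptions chosen for illustration, not measurements of any real service.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simplified model: one fixed delay plus time to push the bytes through."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


# Hypothetical links: (latency in seconds, bandwidth in bytes per second).
LOW_LATENCY_LINK = (0.010, 1e6)       # 10 ms, 1 MB/s
HIGH_BANDWIDTH_LINK = (0.200, 100e6)  # 200 ms, 100 MB/s

for name, size in [("game update, 100 B", 100), ("file download, 1 GB", 1e9)]:
    fast_start = transfer_time_s(size, *LOW_LATENCY_LINK)
    fat_pipe = transfer_time_s(size, *HIGH_BANDWIDTH_LINK)
    print(f"{name}: low-latency link {fast_start:.4f} s, "
          f"high-bandwidth link {fat_pipe:.4f} s")
```

Under these assumed numbers the low-latency link wins for the small message and the high-bandwidth link wins for the large file, which matches the gaming versus downloading distinction on the slide.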

(20)

Relationship between Latency and Throughput

• Latency and bandwidth only loosely coupled

– Henry Ford: assembly lines increase bandwidth without reducing latency

• My factory takes 1 day to make a Model T Ford.

– But I can start building a new car every 10 minutes

– At 24 hrs/day, I can make 24 * 6 = 144 cars per day

– A special order for 1 green car still takes 1 day

– Throughput is increased, but latency is not.

• Latency reduction is difficult

• Often, one can buy bandwidth
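
The factory arithmetic can be written out as a tiny schedule. The sketch below uses only the numbers from the slide (1 day per car, a new car started every 10 minutes). The function names are made up for illustration.

```python
MINUTES_PER_DAY = 24 * 60


def assembly_line(n_cars, latency_min=MINUTES_PER_DAY, interval_min=10):
    """Return (start, finish) times in minutes for each car on the line."""
    return [(k * interval_min, k * interval_min + latency_min)
            for k in range(n_cars)]


def throughput_per_day(interval_min=10):
    """Cars completed per day once the line is full."""
    return MINUTES_PER_DAY // interval_min


print(assembly_line(3))      # [(0, 1440), (10, 1450), (20, 1460)]
print(throughput_per_day())  # 144
```

Every car still spends 1440 minutes on the line, so latency is unchanged. Only the rate at which finished cars appear goes up.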

(21)

What is cloud computing?

• Cloud computing is where dynamically scalable and often virtualized resources are provided as a service over the Internet (thanks, Wikipedia!)

• Infrastructure as a service (IaaS)

– Amazon’s EC2 (elastic compute cloud)

• Platform as a service (PaaS)

– Google Gears

– Microsoft Azure

• Software as a service (SaaS)

– Gmail

(22)

(23)

Graphics has dedicated chip in PCs

[Block diagram]

CPU: 582 Million transistors (Intel “Kentsfield” quad core, …)

Graphics Processor: 681 Million transistors (GeForce 8800, 90nm), with its own Memory

Memory Controller Chip (“North Bridge”): CPU, Memory, Graphics Processor (AGP, PCIe)

Input/Output Glue Chip (“South Bridge”): Disk, Keyboard, PCIe, etc.

(24)

GPU/CPU Performance comparison

[Figure: GFLOPS comparison chart]

G80 = GeForce 8800 GTX

G71 = GeForce 7900 GTX

G70 = GeForce 7800 GTX

NV40 = GeForce 6800 Ultra

NV35 = GeForce FX 5950 Ultra

NV30 = GeForce FX 5800

Source: NVIDIA (except CELL and Core2 Quad)

* IBM Cell ~200 GFlops

(25)

Why a dedicated processing chip?

• 1) Specialization – becoming less important with time

• 2) Parallelism – becoming more important

Graphics processors are the only highly-parallel processors in every desktop machine.

128 “processors” * 2 FLOPS @ 1.35 GHz
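
Multiplying out the three numbers on this slide gives the peak arithmetic rate. The sketch assumes "2 FLOPS" means two floating-point operations per processor per clock cycle, which is the reading that makes the units work out.

```python
def peak_gflops(processors, flops_per_cycle, clock_ghz):
    """Peak rate in GFLOPS: units * operations per cycle * cycles per ns."""
    return processors * flops_per_cycle * clock_ghz


peak = peak_gflops(processors=128, flops_per_cycle=2, clock_ghz=1.35)
print(f"peak: {peak:.1f} GFLOPS")  # peak: 345.6 GFLOPS
```

This is a peak figure. Sustained performance depends on keeping all 128 units busy, which is where the parallelism in the workload matters.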

(26)

Graphics requires programmability

void normalmapped(float2 normalMapTexCoord : TEXCOORD0,

…

out float4 color : COLOR,

uniform float ambient,

…)

{

float3 normalTex, …;

normalTex = tex2D(normalMap, normalMapTexCoord).xyz;

…

diffuse = saturate(dot(normal, normLightDir));

…

color = Kd * (ambient + diffuse) +

Ks * pow(specular, specularExponent);

}

Every application does something a bit different.
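
For readers who have not seen shader code, the final color computation can be mimicked on the CPU. The sketch below is a scalar Python stand-in for the last two visible statements of the shader. The elided parts of the shader are not reconstructed, Kd and Ks are treated as plain numbers rather than colors, and the helper names are assumptions made for illustration.

```python
def saturate(x):
    """Clamp to [0, 1], like the shader intrinsic of the same name."""
    return max(0.0, min(1.0, x))


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def shade(normal, light_dir, kd, ks, ambient, specular, specular_exponent):
    """color = Kd * (ambient + diffuse) + Ks * pow(specular, exponent)."""
    diffuse = saturate(dot(normal, light_dir))
    return kd * (ambient + diffuse) + ks * pow(specular, specular_exponent)


# Surface facing the light directly, then facing away from it.
lit = shade((0, 0, 1), (0, 0, 1), kd=0.5, ks=0.25,
            ambient=0.1, specular=1.0, specular_exponent=8)
unlit = shade((0, 0, 1), (0, 0, -1), kd=0.5, ks=0.25,
              ambient=0.1, specular=0.0, specular_exponent=8)
print(lit, unlit)
```

A GPU runs a function like this once per pixel, with each pixel independent of the others. That independence is the source of the parallelism discussed on the previous slide.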

(27)

(28)

Next Time

• Performance evaluation

• Basic computer organization

• How chips are made

• Start in on instruction set review/overview