
(1)

Techniques to Mitigate the Effects of Congenital Faults in Processors

Smruti R. Sarangi

(2)

Semiconductor Fabrication facility

(courtesy tabalcoaching.com)

(3)
(4)

Basic Lithographic Process

• The source of light is typically an argon-fluoride laser
• The light passes through an array of lenses to reach the silicon substrate
• The resolution limit is given by:

$R = k_1 \lambda / NA$, where $NA = n \sin\theta$

• To decrease R (that is, to print finer features) we need to:
  • Decrease the wavelength λ
  • Increase the refractive index n
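As a rough illustration of the resolution limit, the sketch below evaluates R = k1 λ / NA for 193 nm light in air and with a higher-index medium. The values of k1, the half-angle θ, and the refractive index 1.44 (roughly that of water at this wavelength) are illustrative assumptions, not numbers taken from the slides.

```python
import math

def resolution_nm(k1: float, wavelength_nm: float, n: float, theta_deg: float) -> float:
    """Smallest printable feature: R = k1 * lambda / NA, with NA = n * sin(theta)."""
    numerical_aperture = n * math.sin(math.radians(theta_deg))
    return k1 * wavelength_nm / numerical_aperture

# Assumed, illustrative process values (not from the slides).
K1 = 0.4
THETA_DEG = 70.0

dry = resolution_nm(K1, 193.0, n=1.0, theta_deg=THETA_DEG)
wet = resolution_nm(K1, 193.0, n=1.44, theta_deg=THETA_DEG)
print(f"n = 1.00: R = {dry:.1f} nm")
print(f"n = 1.44: R = {wet:.1f} nm")
```

Raising n shrinks R by the same factor, which is why both a shorter wavelength and a higher refractive index help.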

(5)

Resolution

• We currently use 193 nm light to make 14 nm structures
• This is what we get

(6)

Methods to Compensate for Process Variation – Optical Proximity Correction

• Pre-distort the shape such that it prints better

(7)
(8)

Assist Features

• Add small sub-resolution features to increase the exposure in areas that print sub-optimally

(9)

Phase-shift Masking

• Insert features that have a longer optical path length (this inverts the phase)
• Due to destructive interference the lines will not

(10)

Parameter Variation

Parameter variation (PVT):
• P: Process (threshold voltage Vt, transistor length Leff)
• V: Supply voltage
• T: Temperature

(11)

Why is Variation a Problem?

Unpredictability of Vt, Leff and T implies:

• Lower chip frequency and higher leakage

(12)

Implications on Design Decisions

• Static timing analysis not possible
• Overly conservative designs
• Chips too slow
• Performance of a generation lost
• Possible solution
• Clock the chip at an unsafe frequency
• Tolerate resulting timing errors
• Reduce timing errors
  • Architectural techniques
  • Circuit techniques

(13)

Overview

• Model for Process Variation
• Model for Timing Errors due to Process Variation
• Techniques to Tolerate Timing Errors
• Techniques to Reduce Timing Errors
• Dynamic Optimization

(14)

Process Variation

Systematic variation:
• Lens aberrations
• Mask deformities
• Thickness variation in CMP
• Photo-lithographic effects

Random variation:
• Variable dopant density
• Line edge roughness

(15)

Modeling Systematic Variation

• Break into a million cells

[Figure: variation map over a 1000 × 1000 grid of cells]

(16)

Systematic and Random Variation

• Superimpose random variation on top of systematic variation
  • Random component: normal distribution
• Distribution of systematic components
  • Normal distribution with spatial correlation
  • That is, a multivariate normal distribution
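A minimal sketch of how such a variation map could be sampled: a spatially correlated multivariate normal for the systematic component, with independent normal noise superimposed for the random component. The exponential correlation function, the grid size, and all sigma values are assumptions made for illustration; the slides do not specify them.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L * L^T = a, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def variation_map(side, sigma_sys, sigma_rand, corr_len, rng):
    """Return a side x side map of relative deviations (systematic + random)."""
    cells = [(x, y) for y in range(side) for x in range(side)]
    # Assumed correlation model: decays exponentially with distance between cells.
    cov = [[sigma_sys ** 2 * math.exp(-math.dist(p, q) / corr_len) for q in cells]
           for p in cells]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in cells]
    # Correlated systematic component: L * z is multivariate normal with covariance cov.
    systematic = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(cells))]
    # Independent random component superimposed on each cell.
    total = [s + rng.gauss(0.0, sigma_rand) for s in systematic]
    return [total[r * side:(r + 1) * side] for r in range(side)]

vmap = variation_map(side=8, sigma_sys=0.05, sigma_rand=0.03, corr_len=4.0,
                     rng=random.Random(0))
print(len(vmap), "x", len(vmap[0]), "cells sampled")
```

A real map with a million cells would need a faster sampling method than a dense Cholesky factorization; the small grid here only demonstrates the structure.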

(17)

Overview

• Model for Process Variation
• Model for Timing Errors due to Process Variation (ISQED '07)
• Techniques to Tolerate Timing Errors
• Techniques to Reduce Timing Errors
• Dynamic Optimization

(18)

[Figure: distribution of path delays in a pipe stage, with no variation and with variation, with the timing-error region marked]

$P(E) = 1 - \mathrm{cdf}(t_{clk})$

(19)

Model for Timing Errors

Basic assumptions

• A structure consists of many critical paths
• The critical path depends on the input
• critical path delay > clock period → timing error
• clock period = delay of the longest critical path at
  • maximum temperature
  • no variation
• All pipeline stages are tightly designed → 0 slack

(20)

Paths in a Pipeline Stage

Error rate: $P_E(t) = 1 - \mathrm{cdf}(t)$

[Figure: pdf(t) and cdf(t) of the path delays in a pipeline stage, with the timing-error region marked]
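The relation between error rate and clock period can be sketched numerically. The example below assumes, purely for illustration, that the path delays of a stage follow a normal distribution with a hypothetical mean and standard deviation, and evaluates 1 - cdf(t) as the clock frequency is raised above nominal.

```python
import math

def normal_cdf(t, mean, std):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

def error_rate(t_clk, mean, std):
    """P_E(t) = 1 - cdf(t): probability that an exercised path is slower than t_clk."""
    return 1.0 - normal_cdf(t_clk, mean, std)

# Hypothetical stage: delays ~ N(0.8, 0.05^2), in units of the nominal clock period.
MEAN, STD = 0.8, 0.05

for rel_freq in (1.0, 1.1, 1.2, 1.3):
    t_clk = 1.0 / rel_freq
    print(f"f = {rel_freq:.1f} x nominal -> P_E = {error_rate(t_clk, MEAN, STD):.3e}")
```

The error rate grows steeply once the clock period cuts into the bulk of the delay distribution.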

(21)

Basic Kinds of Structures

• Logic
  • Heterogeneous critical paths
  • ALUs, comparators, sense-amps
• Memory
  • Homogeneous critical paths
  • SRAMs, CAMs
• Mixed

(22)

Logic

Critical path:
• 35% wiring: Elmore delay model
• 65% gates: alpha-power law

$T_g \propto \dfrac{V_{DD}\, L_{eff}}{\mu(T)\,(V_{DD} - V_{th})^{\alpha}}$
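To see how variation in Vth, Leff, and temperature feeds into gate delay, here is a small sketch of the alpha-power law in its common form, T_g ∝ V_DD · L_eff / (μ(T) · (V_DD − V_th)^α). The exponent α = 1.3, the mobility model μ(T) ∝ T^(-1.5), and all operating points are assumptions chosen for illustration.

```python
def gate_delay(vdd, vth, leff, temp_k, alpha=1.3, mobility_exp=-1.5, temp_ref_k=300.0):
    """Relative gate delay under the alpha-power law (arbitrary units).

    alpha, mobility_exp and temp_ref_k are assumed, illustrative constants.
    """
    if vdd <= vth:
        raise ValueError("alpha-power law needs Vdd > Vth")
    mobility = (temp_k / temp_ref_k) ** mobility_exp  # assumed mu(T) ~ T^-1.5
    return vdd * leff / (mobility * (vdd - vth) ** alpha)

# Hypothetical nominal gate versus a gate with +10% Vth and +5% Leff.
nominal = gate_delay(vdd=1.0, vth=0.25, leff=1.0, temp_k=350.0)
slow = gate_delay(vdd=1.0, vth=0.275, leff=1.05, temp_k=350.0)
print(f"relative delay of the slow gate: {slow / nominal:.3f}")
```

Only ratios of the returned values are meaningful, since the law is a proportionality.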

(23)

Logic Delay

$D_{varlogic} = (d_{wire} + \square \cdot d_{gate}) \cdot D_{logic} + d_{gate} \cdot D_{extra}$

where $d_{wire} + d_{gate} = 1$ and:
• $D_{logic}$: distribution of path delays with no variation
• $D_{varlogic}$: distribution of path delays with variation
• $\square$ (symbol missing in the source): relative gate delay due to systematic variation in P, V, T
• $D_{extra}$: delay due to variation in the random and systematic components within a stage

(24)

Memory Delay

Memory cell:
• Use Kirchhoff's equations
• Long-channel transistor equations
• Multi-variable Taylor expansion → delay distribution

Memory line:
• $Delay_{line} = \max(Delay_{cell})$
• Max. distribution: extend the analysis done by Roy et al., IEEE TCAD '05

(25)

Combined Error Model

• We have the delay distributions, cdf(t), for memory and logic with variation
• For each structure
  • per access, P(E) = 1 - cdf(t)
  • per instruction, P(E) per inst. = (accesses/inst.) × P(E) per access
• Combined error rate per instruction: the sum over all structures of (accesses/inst.) × P(E) per access
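The per-instruction combination can be sketched as a weighted sum, assuming the combined rate is the sum over structures of accesses per instruction times the per-access error probability. The structure names and numbers below are hypothetical.

```python
import math

def per_instruction_error(structures):
    """Sum over structures of (accesses per instruction) * (P(E) per access)."""
    return sum(accesses * p_access for accesses, p_access in structures.values())

# Hypothetical structures: (accesses per instruction, error probability per access).
STRUCTURES = {
    "alu": (0.6, 1e-6),
    "issue_queue": (1.0, 2e-6),
    "l1_cache": (0.3, 5e-7),
}

total = per_instruction_error(STRUCTURES)
print(f"combined P(E) per instruction = {total:.3e}")
```

Summing is a good approximation only while the individual probabilities are small, so that simultaneous errors in two structures can be neglected.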

(26)

Validation – Logic

S. Das et al. '05

(27)

Overview

• Model for Process Variation
• Model for Timing Errors due to Process Variation
• Techniques to Tolerate Timing Errors
• Techniques to Reduce Timing Errors
• Dynamic Optimization

(28)

Variation Aware Timing Speculation (VATS)

[Figure: multicore chip. Each processor core, with its L1 cache, runs at an unsafe frequency. A checker (DIVA checker with L0 cache, or Razor latches) is error free: lower frequency, safe design.]

(29)

Other VATS Checkers

• TIMERRTOL: Uht et al.
• Razor: Dan Ernst et al., MICRO 2003
• X-Checker: X. Vera et al., SELSE 2006
• X-Pipe: X. Vera et al., ASGI 2006
• Sato and Arita, COSLP 2003

(30)

Overview

• Model for Process Variation
• Model for Timing Errors due to Process Variation
• Techniques to Tolerate Timing Errors
• Techniques to Reduce Timing Errors (submitted to ISCA '07)
• Dynamic Optimization

(31)

Basic Mechanisms – Shift and Tilt

[Figure: two plots of error rate (PE) against frequency f, each with a "before" and an "after" curve. One mechanism tilts the curve (Tilt); the other shifts it (Shift).]

(32)

Architectural Mechanisms

• Resizable issue queue (Albonesi et al.)
  • switch pass transistors off
  • smaller queue
  • shifts the error rate curve

[Figure: SRAM/CAM array divided into sections by pass transistors, with sense amps; original and new error rate curves]

(33)

Gate Sizing

• Transistor width: W
• Delay ~ A + B/W, Power ~ W
• Make faster paths slower to save power

[Figure: original path delay distribution and the distribution after gate sizing]
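A minimal sketch of the gate-sizing tradeoff, assuming the delay model A + B/W and power proportional to W. The constants and the example path are hypothetical; the function finds the narrowest width that still meets a target delay.

```python
import math

def min_width(a, b, target_delay):
    """Smallest W with a + b / W <= target_delay (assumed delay model: a + b / W)."""
    if target_delay <= a:
        raise ValueError("target delay is below the width-independent delay a")
    return b / (target_delay - a)

# Hypothetical fast path: currently W = 2.0, so delay = 0.2 + 0.4 / 2.0 = 0.4.
A, B = 0.2, 0.4
current_width = 2.0
clock_period = 0.6

new_width = min_width(A, B, clock_period)
# Power is assumed proportional to W, so the saving is the relative width reduction.
power_saving = 1.0 - new_width / current_width
print(f"W: {current_width} -> {new_width:.2f}, power saving {power_saving:.0%}")
```

Slowing the path down to the clock period removes its slack, which is why gate sizing saves power at the cost of more paths sitting close to the timing limit.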

(34)

Optimization: Replicate ALUs

• Tradeoff is power vs. errors
• Idea: switch between the two ALUs
  • Use the gate-sized ALU when the ALU is not timing critical, and the other ALU when it is

[Figure: difference in error rate]

(35)

Fine Grain ABB and ASV

• Adaptive Body Bias (ABB): Vbb
  • Vbb ↑: delay ↓, leakage ↑
  • Vbb ↓: delay ↑, leakage ↓
• Adaptive Supply Voltage (ASV): Vdd
  • Vdd ↑: delay ↓, leakage ↑, dynamic power ↑

[Figure: multicore chip in which the supply voltage (ASV) and body voltage (ABB) are varied; plot of error rate (PE) against frequency f]

(36)

Overview

• Model for Process Variation
• Model for Timing Errors due to Process Variation
• Techniques to Tolerate Timing Errors
• Techniques to Reduce Timing Errors
• Dynamic Optimization

(37)

Dynamic Behavior

Temperature Activity Factors

(38)

Formulate an Optimization Problem

• Constraints
  • Temperature: at all points T < TMAX
  • Power: total core power < PMAX
  • Error: total errors < ErrMAX
• Goal: maximize performance

[Figure: the optimization takes the inputs, constraints, and goals and produces the output]

(39)

Outputs

• 15 ABB/ASV regions
  • 30 values of (Vdd, Vbb)
• 33 outputs: 1 (f) + 30 (Vdd, Vbb) + 1 (ALU) + 1 (issue queue) = 33
• f, Vdd, Vbb can take many values
  • Very large state space

(40)

Dimensionality Reduction

[Figure: maximum frequency of each of stages 1 to 7; the minimum over the stages is the core frequency]

• Find the max. frequency that each stage can support
• Find the slowest stage
  • This is the core frequency
• Minimize power in the rest of the units
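The reduction step can be sketched in a few lines: take the maximum frequency each stage supports, pick the minimum as the core frequency, and report how much frequency headroom every other stage has left to trade for power. The stage names and frequencies are hypothetical.

```python
def core_frequency(stage_fmax_ghz):
    """Return (slowest stage, core frequency, per-stage frequency headroom)."""
    slowest = min(stage_fmax_ghz, key=stage_fmax_ghz.get)
    f_core = stage_fmax_ghz[slowest]
    # Headroom that the power-minimization step can trade for lower power.
    headroom = {stage: f - f_core for stage, f in stage_fmax_ghz.items()}
    return slowest, f_core, headroom

# Hypothetical maximum frequency (GHz) that each of seven stages can support.
STAGE_FMAX = {
    "fetch": 6.4, "decode": 6.9, "rename": 6.2, "issue": 5.8,
    "execute": 6.0, "memory": 6.1, "commit": 7.0,
}

slowest, f_core, headroom = core_frequency(STAGE_FMAX)
print(f"slowest stage: {slowest}, core frequency: {f_core} GHz")
```

Splitting the problem this way replaces one joint search over all outputs with independent per-stage searches.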

(41)

Inputs

Inputs: activity factor, TH, Vt0, Rth, Kleak

• Activity factor: accesses/cycle
• TH: heat sink temperature
• Rth: thermal resistance
• Kleak: constant in the leakage equation

[Figure: time scales over which the inputs change: phase, heat sink cycle, forever]

(42)

Optimization Overview

[Figure: for each region 1 to 15, a frequency algorithm maps the inputs to f(1) ... f(15); fcore is the minimum of these. For each region, a power algorithm then maps the inputs and fcore to (Vdd, Vbb).]

(43)

Fuzzy Logic Based Algorithm

Exhaustive Search (Freq/Power):
- Computationally expensive
- Requires detailed models
+ Accurate results

Fuzzy Logic Based Algorithm:
+ Very fast computation times
+ Incorporates detailed models
- Slight inaccuracy

(44)

Final Picture

[Figure: fuzzy sub-controllers 1 to 15 map the inputs to f(1) ... f(15); fcore is the minimum of these. A second set of fuzzy sub-controllers 1 to 15 maps the inputs and fcore to (Vdd, Vbb) for each region.]

(45)

Timeline

[Figure: timeline t. A phase lasts about 120 ms; a heat sink cycle lasts 2-3 secs. Labels at a new phase: 20 μs, 0.5 μs, 1 step (2 ms), test configuration, 6 μs, STOP, 10 μs, retuning cycles (2 ms).]

(46)

Results

(47)

Evaluation Framework

• Processor modeled
  • Athlon 64 floorplan
  • 3-wide processor
  • 12-stage pipeline
  • 45 nm, Vdd = 1 V, 6 GHz
  • 4-core, private L2 cache
  • Sherwood phase detector (ISCA '03)
• Variation modeling
  • PVT maps for 100 dies
• Fuzzy controller

[Figure: chip with four cores]

(48)

Terminology

Baseline      Proc. with variation effects
TS            Baseline + DIVA checker
TS+FU         TS + FU replication
TS+Queue      TS + issue-queue resizing
TS+ABB+ASV    Both circuit level techniques
TS+Dyn        TS + dynamic optimization
TS+All        TS+FU+Queue+ABB+ASV+dyn
NoVar         Without any variation effects

(49)

Error Plots

[Figure: error rate plots with the maximum performance points and the ErrMAX limit marked]

(50)

Execution Point

[Figure: execution point in the space of power, frequency, and log(timing error rate), with the tradeoffs power vs. errors and frequency vs. errors and the constant-error, constant-frequency, and constant-power cases marked]

(51)

Frequency

• Frequency increase: 10-49%

[Figure: frequency results, with series labeled Static Oracle and Fuzzy and increases of 23% and 49% marked]

(52)

Performance

• We can nullify the effects of variation and even achieve a speedup
• The performance loss due to fuzzy logic is minimal

[Figure: performance results, with a series labeled Static and increases of 19% and 34% marked]

(53)

Conclusion

• Do not design processors for the worst case
  • Need to tolerate variation-induced errors
• Contributions
  • Model for timing errors
  • New framework for tradeoffs in P, f and P(E)
  • High-dimensional dynamic adaptation
  • Evaluation of architectural techniques to tolerate/mitigate P(E)
    • 10-49% increase in frequency
    • 7-34% increase in performance

(54)

Conclusion II

• CADRE (DSN'06)
  • Arch. support to make a board-level computer cycle-accurate deterministic
• Phoenix (MICRO'06 & Top Picks'07)
  • Arch. support to detect and patch processor design bugs

(55)

BACKUP

(56)

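Algorithm

Inputs: activity factor, Rth, TH, Pleak0, Vt

[Figure: flow of the algorithm. Start from (f, Vdd, Vbb); compute Pdyn and Pleak (from Pleak0 and Vt); compute T using Rth and TH and verify T < TMAX; compute the delay from Vt; use the error model to find fmax and verify Err < ErrMAX.]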
(57)

Memory Delay

• Solve for Icell using long-channel equations
  • $I_{cell} = f(Vt_X, Vt_Y, L_X, L_Y)$
  • $Vt_X$, $Vt_Y$, $L_X$ and $L_Y$ are Gaussian variables
• $T_{mem} \propto 1 / I_{cell}$
• Each variable has a systematic component (subscripts vtx, vty, lx, ly) and a random component

[Figure: memory cell with VDD, word line WL, bit lines BL and BR, labels X and Y, and the cell current Icell]

(58)

Memory Delay - II

Find a distribution for Tmem

• Tmem is a function of four Gaussian variables
• Model Tmem as a normal distribution
• Find the mean and standard deviation of Tmem using a multi-variable Taylor expansion
• This is the access time distribution for 1 bit
• A typical entry has 32-128 bits
• Find the max distribution of 32-128 normal variables

$\text{Error probability} = 1 - \mathrm{cdf}(t_{mem})$
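The step from one bit to a whole entry can be sketched under a simplifying assumption: if the per-bit access times are independent and identically distributed normal variables, the cdf of their maximum is the per-bit cdf raised to the number of bits. The independence assumption and the numbers are illustrative; the analysis cited on the slides may treat correlation differently.

```python
import math

def normal_cdf(t, mean, std):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((t - mean) / (std * math.sqrt(2.0))))

def entry_error_probability(t, mean, std, bits):
    """1 - cdf_max(t), where cdf_max(t) = cdf_bit(t) ** bits for independent bits."""
    return 1.0 - normal_cdf(t, mean, std) ** bits

# Hypothetical per-bit access time: N(0.8, 0.04^2), in units of the clock period.
MEAN, STD = 0.8, 0.04

for bits in (1, 32, 128):
    p = entry_error_probability(1.0, MEAN, STD, bits)
    print(f"{bits:3d} bits -> error probability {p:.3e}")
```

The entry is as slow as its slowest bit, so wider entries fail more often at the same clock period.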

(59)

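Fuzzy Low Level

$W_{ij} = \exp\left[-\left(\dfrac{X_j - \mu_{ij}}{\sigma_{ij}}\right)^2\right]$

Final output: $y = \dfrac{\sum_i W_i\, y_i}{\sum_i W_i}$

[Figure: inputs $X_j$ feed membership functions with parameters $\mu_{ij}$ and $\sigma_{ij}$; each rule $i$ contributes a weight $W_i$ and an output $y_i$]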
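A minimal sketch of a fuzzy controller of this kind, under stated assumptions: each rule i scores every input X_j with a Gaussian membership function exp(-((X_j - mu_ij)/sigma_ij)^2), the rule weight W_i is taken as the product of these scores (an assumption, as is the naming of mu and sigma), and the final output is the weighted average of the rule outputs y_i. The rules below are hypothetical.

```python
import math

def fuzzy_output(x, rules):
    """Weighted average of rule outputs.

    rules: list of (mu, sigma, y_i), where mu and sigma hold one value per input.
    """
    weights = []
    for mu, sigma, _ in rules:
        w = 1.0
        for x_j, mu_ij, sigma_ij in zip(x, mu, sigma):
            # Gaussian membership of input j in rule i.
            w *= math.exp(-((x_j - mu_ij) / sigma_ij) ** 2)
        weights.append(w)
    total = sum(weights)
    return sum(w * y_i for w, (_, _, y_i) in zip(weights, rules)) / total

# Two hypothetical single-input rules with outputs 2.0 and 4.0.
RULES = [
    ([0.0], [1.0], 2.0),
    ([2.0], [1.0], 4.0),
]

midpoint = fuzzy_output([1.0], RULES)  # equally close to both rules
near_first = fuzzy_output([0.0], RULES)
print(f"output at x = 1.0: {midpoint:.3f}, at x = 0.0: {near_first:.3f}")
```

Evaluating the controller needs only a handful of exponentials per rule, which is consistent with the very fast computation times claimed for the fuzzy approach.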

(60)

Recovery Penalty

(61)

Validation – Memory

(62)

Power

Max Power Limit

Proc. with no variation: 25 W, PMAX = 30 W
