• Tidak ada hasil yang ditemukan

Chapter 2-1: CPUs

N/A
N/A
Protected

Academic year: 2024

Membagikan "Chapter 2-1: CPUs"

Copied!
31
0
0

Teks penuh

(1)

Chapter 2-1: CPUs

Soo-Ik Chae

© 2007 Elsevier 1

Topics p

„ CPU metrics.

„ Categories of CPUs. g

„ CPU mechanisms.

(2)

Performance as a design metric g

„ Performance = speed:

‰ Latency.

‰ Throughput.

„ Average vs. peak f

performance.

„ Worst-case and best- f

case performance.

High Performance Embedded Computing

© 2007 Elsevier 3

Other metrics

„ Cost (area).

„ Energy and power. gy p

„ Predictability: important for embedded systems

systems

‰ Pipelining: branch penalty.

‰ Memory system (Cache) : cache miss penalty

„ Security: difficult to measure because of the y fact that we do not know of a successful

attack.

(3)

Flynn’s taxonomy of processors y y p

„ Single-instruction single-data (SISD): RISC, etc.

„ Single-instruction multiple-data (SIMD): all processors perform the same operations processors perform the same operations.

„ Multiple-instruction multiple-data (MIMD):

homogeneo s or heterogeneo s homogeneous or heterogeneous multiprocessor.

„ Multiple-instruction multiple data (MISD).

High Performance Embedded Computing

© 2007 Elsevier 5

Other axes of comparison

„ RISC.

‰

Emphasis on software

Si l l i l i t ti

‰

Single-cycle, simple instructions

‰

Register to register: LOAD" and "STORE“ are independent instructions

‰

Low cycles per second,

‰

Large code sizes

‰

Spends more transistors on memory registers

‰

Spends more transistors on memory registers

„ CISC.

‰

Emphasis on hardware

‰

multi-cycle, complex instructions

‰

Memory-to-memory: LOAD" and "STORE“ incorporated in instructions

‰

High cycles per second

‰

Small code sizes

Transistors used for storing complex instructions

Transistors used for storing complex instructions

(4)

RISC CISC

1. 1-cycle simple instructions 1. multi-cycle complex instructions 2. only LD/ST can access memory 2. any instruction may access memory 3 designed around pipeline 3 designed around instn set

3. designed around pipeline 3. designed around instn. set

4. instns. executed by h/w 4. instns interpreted by micro-program 5. Fixed format instns 5. variable format instns

6 Few instns and modes 6 Many instns and modes 6. Few instns and modes 6. Many instns and modes

7. Complexity in the compiler 7. Complexity in the micro-program 8. Multiple register sets 8. Single register set

High Performance Embedded Computing

© 2007 Elsevier 7

Other axes of comparison p

„ Instruction issue width

„ Instruction issue width.

‰ Single issue

‰ Multiple issue: higher performance high cost increased

‰ Multiple issue: higher performance, high cost, increased power consumption

„ Scheduling for multiple-issue machines. Scheduling for multiple issue machines.

‰ Static scheduling: VLIW

‰ Dynamic schedule: superscalar y p

„ Vector processing: instruction for 1D or 2D arrays

„ Multithreading: a fine-grained concurrency

„ Multithreading: a fine grained concurrency

mechanism that allows the processor to quickly

switch between several threads of execution

(5)

Embedded vs. general-purpose processors g p p p

E b dd d b ti i d f

„ Embedded processors may be optimized for a category of applications.

‰ Must be flexible

‰ Must be flexible

‰ Customization may be narrow or broad.

‰ Billions of 8-bit processors sold each year p y

‰ 100s millions of 32-bit processors for embedded systems

„ We may judge embedded processors using different metrics:

‰ Code size.

M t f

‰ Memory system performance.

‰ Preditability.

High Performance Embedded Computing

© 2007 Elsevier 9

ARM Processor Familyy

Processor family

# of pipeline stages

Memory organization

Clock Rate

MIPS/MHz

ARM6 3 Von Neumann 25 MH

ARM6 3 Von Neumann 25 MHz

ARM7 3 Von Neumann 66 MHz 0.9

ARM8 5 Von Neumann 72 MHz 1 2

ARM8 5 Von Neumann 72 MHz 1.2

ARM9 5 Harvard 200 MHz 1.1

ARM10 6 Harvard 400 MHz 1.25

ARM10 6 400 MHz 1.25

StrongARM 5 Harvard 233 MHz 1.15

ARM11 8 Von Neumann/ 550 MHz 1.2

Harvard

(6)

ARM Architecture Version Summary y

Core Version Feature

ARM1020T v5T

„

Improved ARM/Thumb

ARM1020T v5T

„

Improved ARM/Thumb

Interworking

„

CLZ instruction for improved division improved division ARM9E-S, ARM10TDMI, ARM1020E v5TE

„

Extended multiplication

and saturated maths for DSP lik f ti lit DSP-like functionality ARM7EJ-S, ARM926EJ-S, ARM1026EJ-S v5TEJ

„

Jazelle Technology for

Java acceleration

ARM11, ARM1136J-S, v6

„

Low power needed

„

SIMD (Single Instruction Multiple Data) media processing extensions J: Jazelle

S: Synthesizable F: integral vector floating point unit E: Enhanced DSP instruction

High Performance Embedded Computing

© 2007 Elsevier 11

S: Synthesizable F: integral vector floating point unit

ARM7 3-stage pipeline organization g p p g

„ Organizations

‰ Address generating block

address register

A[31:0] control

g g

„

Address register

„

Incrementer

‰ Register bank

incrementer

register P C

g

PC

„

31-GPRs, 6-PSRs

„

2 read, 1 write ports

„

Additional 1 read,

multiply

instruction decode

&

register bank

A L

1 write port for PC

‰ Barrel shifter

‰ ALU

control

barrel shifter L

U b u s

A b u s

B b u s register

‰ ALU

‰ Data register

‰ Control logic

ALU

‰ Control logic

„

External interface

„

Instruction decoder

data out register data in register
(7)

ARM7 3-stage Pipeline g p

f

„ ARM7 family has 3 stage pipeline

„ 3 stage pipeline Fetch

‰

Fetch

„

Instruction fetch from memory

‰

Decode

PC

F D E

„

Instruction decoding

„

Datapath control signals for the next cycle

PC+i

F D E

for the next cycle

‰

Execute

„

Reading registers

PC+2i

F D E

„

Shift and ALU operations

„

Writing back to the register bank

PC+2i

F D E

High Performance Embedded Computing

© 2007 Elsevier 13

ARM7 multi-cycle instructions y

fetch ADD decode execute 1

fetch STR decode calc. addr.

fetchADD decode execute

2

3

data xfer

fetch ADD decode execute 3

fetch ADD decode execute 4

5 fetch ADD decode execute

i t ti

time

instruction

(8)

ARM7 multi-cycle instructions

„ Branch „ LDR

y

„ Branch „ LDR

LDR

F D E1

calc

E2

xfer

E3

move

B

F D E1

calc

E2

link

E3

adjust calc xfer move

F D E

j

PC+i

F

F D E

discarded

PC+2i

F

F D E

discarded

T

F D E

T+i

F D E

High Performance Embedded Computing

© 2007 Elsevier 15

ARM9TDMI core

„ LDR „ Branch

F D E M W ADD F D E M W

B F D E1 E2 E3 M W

LDR F D E M W

F

F D E M W

F

F D E M W

F D E M W

F D E M W Separated cache

Instruction and data cache

are accessible at the same time

(9)

ARM11

„ 8 stage pipeline

‰

Branch Prediction and Return Stack

‰

Branch Prediction and Return Stack

‰

Separate processing units for the ALU, MAC, and Load-Store (LS) instructions

lth h th i li i i l i

„

although the pipeline is single issue

High Performance Embedded Computing

© 2007 Elsevier 17

Feature Comparison p

Feature ARM9E ARM10E XScale ARM11

Feature ARM9E ARM10E XScale ARM11

Architecture ARMv5TE(J) ARMv5TE(J) ARMv5TE ARMv6

pipeline length 5 6 7~8 8

Java decode (ARM926EJ) (ARM1026EJ) No Yes

V6 SIMD

instructions No No No Yes

MIA instructions No No Yes Available as

coprocessor

Branch prediction No Static Dynamic Dynamic

Independent

Load-store unit No Yes Yes Yes

Instruction issue Scalar, in-order Scalar, in-order Scalar, in-order Scalar, in-order Concurrency None ALU/MAC, LSU ALU, MAC, LSU ALU, MAC, LSU Out-of-order

completion No Yes Yes Yes

Target

Synthesizable Synthesizable Custom chip Synthesizable and g

implementation Synthesizable Synthesizable Custom chip y

Hardmacro Performance

range Up to 250MHz Up to 325MHz 200MHz ~

> 1GHz

350MHz ~

> 1GHz

(10)

MIPS32 processor family p y

„ MIPS: MIPS32 4K has 5-stage pipeline; 4KE g p p ; family has DSP extension; 4KS is designed for security

for security.

High Performance Embedded Computing

© 2007 Elsevier 19

MIPS32 processor family p y

(11)

PowerPC processor family p y

„ PowerPC: 400 series includes several embedded processors; MPD7410 is two- issue machine; 970FX has 16-stage pipeline issue machine; 970FX has 16 stage pipeline.

High Performance Embedded Computing

© 2007 Elsevier 21

PowerPC processor family p y

(12)

PowerPC processor family p y

High Performance Embedded Computing

© 2007 Elsevier 23

What is DSP?

„ DSP = Digital Signal Processing g g g OR

DSP = Digital Signal Processor?

„ DSP used to denote both

i b d d d f th t t i hi h th

‰ meaning can be deduced from the context in which the term DSP is used.

„ What is a Digital Signal Processor (DSP)? g g ( )

‰ Microprocessor specifically designed to perform fast DSP

operations (e.g., Fast Fourier Transforms, inner products,

Multiply & Accumulate) p y )

(13)

DSP performance p

„ Wireless Systems requires more and more high y q g performance and higher bandwidth

P f DSP performance

3G

Performance

~100,000MIPS 384 2000 Kb

p

might not be enough for future applications

2.5G ~10,000MIPS 64-384 Kbps

384-2000 Kbps applications

2G ~100MIPS 8-13 Kbps Bit Rate

High Performance Embedded Computing

© 2007 Elsevier 25

Digital signal processors g g p

„ First DSP was AT&T DSP16:

‰ Hardware multiply- accumulate unit.

‰ Harvard architecture

‰ Harvard architecture.

„ Today, DSP is often used as a marketing used as a marketing term.

„ Modern DSPs are

„ Modern DSPs are

heavily pipelined.

(14)

TMS320C55x ™ DSP Generation, 16-bit Fi d P i t M t P Effi i t DSP Fixed Point – Most Power Efficient DSP

Features Applications

Specifications

• C55x™ DSP core delivers 300 MHz for up to 600-MIPS performance

• Feature-rich, miniaturized per- sonal and portable products

• Advanced automatic power management

• 1.6-volt core and 3.3-volt peripherals

sonal and portable products

• 2G, 2.5G and 3G cell phones and basestations

• Digital audio players

• Configurable idle domains to extend your battery life

• Shortened debug for faster time-to- market

144 MH /200 MH l k t • Digital still cameras

• Electronic books

• Voice recognition

• GPS receivers

• 144-MHz/200-MHz clock rate

• 256-KB RAM, 64-KB ROM

• Three McBSPs, I 2 C, watchdog timer, general-purpose timers

• Fingerprint/Pattern recognition

• Wireless modems

• Headsets

• Biometrics

• USB 2.0 full-speed (12 Mbps)

10-bit ADC

real-time clock (RTC)

High Performance Embedded Computing

© 2007 Elsevier 27

TMS320C55x ™ DSP + RISC, 16 bit Fi d P i t OMAP P

16-bit Fixed Point – OMAP Processor

Features Applications

Specifications

ƒ150 MHz TI enhanced ARM925

• Dual CPU processor integrating a TMS320C55x™ DSP core and an ARM925TDMI™ RISC @150 MHz

• 1.8-volt core and 1.8-volt peripherals

Internet appliances

Applications processing

Enhanced gaming

Webpad

ƒ150-MHz TI-enhanced ARM925

• 16 KB instruction cache and 8 KB data cache

• Data and instruction MMUs

• 32-bit and 16-bit instruction setsWebpad

Point-of-sale

Medical devices

Industry-specific PDAs

• 32-bit and 16-bit instruction sets 150-MHz TMS320C55x™ DSP

• 12 KW (24 KB) instruction cache

• 80 KW (160 KB) SRAM

Telematics

Digital media processing

Military and government cellular

• 16 KW (32 KB) ROM

• Two 16-bit memory interfaces for SDRAM and flash

• Nine-channel system DMA controller

• LCD controller

• USB 1.1 host and client

• MMC/SD card interface

• Seven serial ports plus three UARTs, Nine timers, Keyboard interface

• Less than 250 mW at 1.6 V

(15)

TMS320C62x ™ DSP Generation, 16-bit Fi d P i t Hi h P f DSP Fixed Point – High Performance DSP

Features Applications

Specifications

• 16-bit fixed-point DSPs

• Up to 2400 MIPS

• Pooled modems

• Digital Subscriber Line (xDSL)

• C6000™ DSP Platform VelociTI™

advanced architecture Up to 2400 MIPS

•Running at 300 Mhz

• Digital Subscriber Line (xDSL)

• Wireless basestations

• Central office switches

• Private Branch Exchange (PBX)

• Up to eight 32-bit instructions executed each cycle

• Eight independent, multi-purpose functional units thirty-two 32-bit registers

• Digital imaging

• Call processing

• 3D graphics

• Speech recognition registers

• Industry’s most advanced C compiler and Assembly Optimizer maximize efficiency and performance

• Voice over packet

High Performance Embedded Computing

© 2007 Elsevier 29

TMS320C67x ™ DSP Generation, 32-bit Fl ti P i t Hi h P f DSP Floating Point – High Performance DSP

Features Applications

Specifications

• 32-bit floating point DSPs

• Up to 1350 MFLOPS

• Pooled modems

• Digital Subscriber Line (xDSL)

• C6000™ DSP Platform VelociTI™

advanced architecture Up to 1350 MFLOPS

•Running at 225 Mhz

• Digital Subscriber Line (xDSL)

• Wireless basestations

• Central office switches

• Private Branch Exchange (PBX)

• Up to eight 32-bit instructions executed each cycle

• Eight independent, multi-purpose functional units thirty-two 32-bit registers

• Digital imaging

• Call processing

• 3D graphics

• Speech recognition registers

• Industry’s most advanced C compiler and Assembly Optimizer maximize efficiency and performance

• IEEE floating-point format

• Voice over packet

• Up to 1350 MFLOPS at 225

• Two new multi-channel serial ports (McASP) (C6713 DSP) can support up to stereo channels of I2S (Inter IC Sound) and compatible with S/PDIF transmit protocol. Note I2S is a protocol for transmitting 2 channels of digital audio over a single serial connection

(16)

TMS320C64x ™ DSP Generation, 16-bit Fi d P i t Hi h P f DSP Fixed Point – High Performance DSP

Features Applications

Specifications

•16-bit fixed point processor TMS320C64x DSP high per-

DSL and pooled modems

Basestation transceivers

• C6000™ DSP Platform VelociTI™

advanced architecture TMS320C64x DSP high per

formance core provides scalable performance of up to 1.1 GHz

• The industry’s fastest DSPs with t 600 MH (4800 MIPS)

Basestation transceivers

Wireless LAN

Enterprise PBX

Multimedia gateway

• Up to eight 32-bit instructions executed each cycle

• Eight independent, multi-purpose functional units thirty-two 32-bit registers

up to 600 MHz (4800 MIPS) performance

• C64x DSPs are software compatible with TI’s C62x™ DSPs

Broadband video transcoders

Streaming video servers and clients

Highspeed raster image processing (RIP) registers

• Industry’s most advanced C compiler and Assembly Optimizer maximize efficiency and performance

p g ( )

High Performance Embedded Computing

© 2007 Elsevier 31

Example: TI C5x DSP p

(17)

Example: TI C5x DSP p

„ 40-bit arithmetic unit

‰ 32-bit values with 8 guard bits g

„ Barrel shifter.

„ 17 x 17 multiplier

„ 17 x 17 multiplier.

„ Comparison unit for Viterbi encoding/decoding.

„ Single-cycle exponent encoder for wide- dynamic-range arithmetic.

dy a c a ge a t et c

„ Two address generators.

High Performance Embedded Computing

© 2007 Elsevier 33

TI C55x microarchitecture

(18)

TI C55x co-processors p

„ Designed to support

‰ Pixel interpolation

A U B

‰ Motion estimation

‰ DCT/IDCT computation

I t l t U M R

A U B

„ Interpolates U, M, R values given A, B, C, D pixels

M R

pixels.

C D

High Performance Embedded Computing

© 2007 Elsevier 35

Fixed Point Vs Floating Point g

Floating Point Fixed Point

Floating Point Fixed Point

Applications

•Modems

Applications

•Portable Products

•Digital Subscriber Line (DSL)

•Wireless Basestations

•2G, 2.5G and 3G Cell Phones

•Digital Audio Players Di it l Still C

•Central Office Switches

•Private Branch Exchange (PBX)

•Digital Imaging

•Digital Still Cameras

•Electronic Books

•Voice Recognition

•Digital Imaging

•3D Graphics

•Speech Recognition

Voice Recognition

•GPS Receivers

•Headsets

p g

•Voice over IP •Biometrics

•Fingerprint Recognition

(19)

Simple VLIW architecture p

„ Powerful compiler

„ A packet of instruction

„ Large register file with multiple ports feeds multiple function units

function units.

E box

Add 1 2 3 S b 4 5 6 Ld 7 f St 8 b NOP Register file

Add r1,r2,r3; Sub r4,r5,r6; Ld r7,foo; St r8,baz; NOP Register file

ALU ALU Load/store Load/store FU

High Performance Embedded Computing

© 2007 Elsevier 37

Clustered VLIW architecture

„ Register file, function units divided into clusters.

Cluster bus

Execution Execution

Register file Register file

(20)

Parallelism extraction in VLIW

„ Static:

‰ Use compiler to analyze program

„ Dynamic:

‰ Use hardware to identify opportunities analyze program.

‰ Simpler CPU.

‰ Can make use of high

identify opportunities.

‰ More complex CPU.

‰ Can make use of data

‰ Can make use of high- level language

constructs.

‰ Can make use of data values.

‰ Can’t depend on data values.

High Performance Embedded Computing

© 2007 Elsevier 39

Motorola Starcore SC140

DALU i l d 4 ALU 1 i t fil

„ DALU includes 4 ALUs, 1 register file.

„ AGU includes 2 address arithmetic units (AAU) 1 address register file

(AAU), 1 address register file.

„ Program sequencer and control unit (PSEQ).

P f

„ Performance:

‰ 4 MACs per cycle.

10 RISC MIPS MH l k

‰ 10 RISC MIPS per MHz clock.

(21)

SC140 Core

Program Program

sequencer Address Register file

Data Register file Power

mgt

Clock/PLL 2 AAUs BMU 4 ALUs

AGU DALU

High Performance Embedded Computing

© 2007 Elsevier 41

Typical SC140 configuration yp g

Level 1 memory expansion RAM, ROM

DMA, Cache Program sequencer Cache,

Interrupts, Level 2 mem, Program sequencer

ALU ALU AAU

ALU Etc.

ALU ALU AAU

peripherals

(22)

Instruction format

„ 16-bit instructions.

„ Up to 6 instructions per cycle. p p y

„ Instructions are grouped to define allowable simultaneous operations

simultaneous operations.

MACR –D0,D1,D7 AND D4,D5 MOVE.L (R0),+N0,R6 ADDA R2,R3

DALU AGU

High Performance Embedded Computing

© 2007 Elsevier 43

DALU AGU

AGU addressingg

„ Allowable addressing modes:

‰ Linear: useful for general purpose addressing. g g

‰ Modulo: useful for FIFO queues.

‰ Reverse-carry: useful for FFT.

‰ Reverse carry: useful for FFT.

‰ Automatic updating during register indirect.

‰ Stack

‰ Stack.

„ Array addressing: base, offset, modifier i t

registers.

(23)

TM-1 characteristics

27 function units

„ Characteristics

‰ 5 RISC operations/sec

‰ 5 RISC operations/sec

‰ Floating point support

‰ Sub-word parallelism

‰ Sub-word parallelism support

‰ Guarded operation (If Conversion)

‰ Additional custom ti

operations

High Performance Embedded Computing

© 2007 Elsevier 45

TM 1 VLIW CPU TM-1 VLIW CPU

i fil register file

read/write crossbar

FU1 ... FU27

slot 1 slot 2 slot 3 slot 4 slot 5

(24)

Trimedia TM 1 Trimedia TM-1

memory interface video in video out video in

audio in

video out audio out

I 2 C serial

timers image co-p

VLD co-p ge co p

PCI

VLIW CPU

High Performance Embedded Computing

© 2007 Elsevier 47

Superscalar processors p p

„ Instructions are dynamically scheduled.

‰ Dependencies are checked at run time in hardware.

„ Used to some extent in embedded Used to some extent in embedded processors.

‰ Embedded Pentium is two-issue in-order

‰ Embedded Pentium is two-issue in-order.

(25)

SIMD and subword parallelism p

„ Many special-purpose SIMD machines.

„ Subword parallelism is widely used for video. p y

‰ ALU is divided into subwords for independent operations on small operands.

operations on small operands.

„ Vector processing is widely used for integer values

values.

High Performance Embedded Computing

© 2007 Elsevier 49

SIMD Extensions

(26)

SIMD Extensions

High Performance Embedded Computing

© 2007 Elsevier 51

Multithreadingg

„ Low-level parallelism mechanism.

„ Hardware multithreading alternately fetches g y instructions from separate threads.

„ Simultaneous multithreading (SMT) fetches

„ Simultaneous multithreading (SMT) fetches instructions from several threads on each c cle

cycle.

(27)

Processor Resource Utilization

„ Processor choice depends on program characteristics.

‰ Leverage our knowledge of the core algorithms

„ Many researchers assume that multimedia

„ Many researchers assume that multimedia algorithms exhibit high levels of parallelism.

Experiments with SimpleScalar shows that this is

‰ Experiments with SimpleScalar shows that this is not the case.

Most applications exhibit fewer than 4 IPC

‰ Most applications exhibit fewer than 4 IPC.

High Performance Embedded Computing

© 2007 Elsevier 53

Available parallelism in multimedia li i (T ll l )

applications (Talla et al.)

(28)

Dynamic behavior of loops in y p MediaBench (Fritts)

„ Path ratio

‰ (instructions executed per iteration) / (total number of loop instructions)

loop instructions).

M di B h h ll th ti

„ MediaBench shows small path ratio ->

considerable conditional behavior in loops.

High Performance Embedded Computing

© 2007 Elsevier 55

Operand characteristics in MediaBench

More than 10

78%

78%

(29)

Operand characteristics in MediaBench

High Performance Embedded Computing

© 2007 Elsevier 57

Operand characteristics in Video Codecs

(30)

Dynamic voltage scaling (DVS) y g g ( )

P l ith V 2

„ Power scales with V 2 while performance scales roughly as V scales roughly as V.

„ Reduce operating voltage, add parallel voltage, add parallel operating units to make up for lower clock

d speed.

„ DVS doesn’t work in high leakage

high-leakage processors.

High Performance Embedded Computing

© 2007 Elsevier 59

Dynamic voltage and frequency scaling y g q y g (DVFS)

„ Scale both voltage and clock frequency.

„ Can use control algorithms to match

f t

performance to application, reduce power

power.

(31)

Razor architecture

„ Used specialized latch to detect errors.

„ Recovers only on errors, gains average-case

f

performance.

High Performance Embedded Computing

© 2007 Elsevier 61

Razor architecture

„ Used specialized latch to detect errors.

„ Recovers only on errors, gains average-case

f

performance.

Referensi

Dokumen terkait

Increased concentrations of soluble form of P-selectin (sP-selectin) from activated platelets or activated endothelial cells are associated with VTE events in cancer patients [16,

Menurut Undang-Undang Republik Indonesia Nomor 6 Tahun 1983 sebagaimana telah diubah terakhir dengan Undang-Undang Nomor 16 Tahun 2009 tentang Ketentuan Umum dan Tata

Kota Waringin Barat No.16 Th.2003 Ttg Perubahan Pertama Atas Peraturan Daerah Kabupaten Kota Waringin Barat Nomor 15 Tahun 2002 tentang Retribusi Pengangkutan Hasil Hutan,

Derive an equation for the electric field Eat point P, a distance Z from the disk along its central axis.. i Calculate the electric field at the surface of the disk ii Using the

Chi-squared analysis reveals no difference in answers between the metopic/unicoronal craniosynostosis n=15, sagittal craniosynostosis n=13 and age-matched controls n=16.. P-values shown

She has prepared the following analysis for her new shop: 19lj Sales price per pair of sandals Variable costs per pair of sandals Contribution margin per pair of sandals Fixed costs