computer03_simulation.ppt 804KB Jun 23 2011 12:31:56 PM

(1)

(C) 2003 Multifacet Project, University of Wisconsin-Madison

Simulating a $2M Commercial Server on a $2K PC

Alaa Alameldeen, Milo Martin, Carl Mauer,

Kevin Moore, Min Xu, Daniel Sorin,

Mark D. Hill, & David A. Wood

Multifacet Project (www.cs.wisc.edu/multifacet)

Computer Sciences Department

University of Wisconsin—Madison

(2)

Wisconsin Multifacet Project

2

Methods

Summary

• Context

– Commercial server design is important

– Multifacet project seeks improved designs

– Must evaluate alternatives

• Commercial Servers

– Processors, memory, disks → $2M

– Run large multithreaded transaction-oriented workloads

– Use commercial applications on commercial OS

• To Simulate on $2K PC

– Scale & tune workloads

– Manage simulation complexity

– Cope with workload variability

(3)


Outline

• Context

– Commercial Servers

– Multifacet Project

• Workload & Simulation Methods

• Separate Timing & Functional Simulation

• Cope with Workload Variability

(4)


Why Commercial Servers?

• Many (Academic) Architects

– Desktop computing

– Wireless appliances

• We focus on servers

– (Important Market)

(5)


3-Tier Internet Service

PCs w/ “soft” state

LAN / SAN

Servers running applications for “business” rules

LAN / SAN

Servers running databases for “hard” state

(6)


Multifacet: Commercial Server Design

• Wisconsin Multifacet Project

– Directed by Mark D. Hill & David A. Wood

– Sponsors: NSF, WI, Compaq, IBM, Intel, & Sun

– Current Contributors: Alaa Alameldeen, Brad Beckman, Nikhil Gupta, Pacia Harper, Jarrod Lewis, Milo Martin, Carl Mauer, Kevin Moore, Daniel Sorin, & Min Xu

– Past Contributors: Anastassia Ailamaki, Ender Bilir, Ross Dickson, Ying Hu, Manoj Plakal, & Anne Condon

• Analysis

– Want 4-64 processors

– Many cache-to-cache misses

– Neither snooping nor directories ideal

• Multifacet Designs

(7)


Outline

• Context

• Workload & Simulation Methods

– Select, scale, & tune workloads

– Transition workload to simulator

– Specify & test the proposed design

– Evaluate design with simple/detailed processor models

• Separate Timing & Functional Simulation

• Cope with Workload Variability

(8)


Multifacet Simulation Overview

• Virtutech Simics (www.virtutech.com)

• Rest is Multifacet software

Full System Functional Simulator (Simics)

Pseudo-Random Protocol Checker

Memory Timing Simulator (Ruby)

Processor Timing Simulator (Opal)

Commercial Server

(9)


Select Important Workloads

• Online Transaction Processing: DB2 w/ TPC-C-like

• Java Server Workload: SPECjbb

• Static web content serving: Apache

• Dynamic web content serving: Slashcode

• Java-based Middleware: (soon)

(10)


Setup & Tune Workloads (on real hardware)

• Tune workload, OS parameters

• Measure transaction rate, speed-up, miss rates, I/O

• Compare to published results

(11)


Scale & Re-tune Workloads

• Scale-down for PC memory limits

• Retaining similar behavior (e.g., L2 cache miss rate)

• Re-tune to achieve higher transaction rates

(OLTP: raw disk, multiple disks, more users, etc.)

Commercial Server

(12)


Transition Workloads to Simulation

• Create disk dumps of tuned workloads

• In simulator: Boot OS, start, & warm application

• Create Simics checkpoint (snapshot)

Full System Functional Simulator (Simics)

(13)


Specify Proposed Computer Design

• Coherence Protocol (control tables: states × events)

• Cache Hierarchy (parameters & queues)

• Interconnect (switches & queues)

• Processor (later)

Memory Timing Simulator (Ruby)

Memory Protocol
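The "control tables: states × events" idea can be sketched as a lookup keyed by (state, event). This is only an illustration: the three MSI states, the event names, and the action strings are assumptions chosen for brevity, not the Multifacet protocols or Ruby's actual specification format, and transient states are omitted.

```python
from enum import Enum

class State(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"

class Event(Enum):
    LOAD = "load"              # local processor read
    STORE = "store"            # local processor write
    OTHER_GETS = "other_gets"  # another cache requests a readable copy
    OTHER_GETX = "other_getx"  # another cache requests an exclusive copy

# Control table: (current state, event) -> (next state, action).
# Hypothetical simplified MSI protocol without transient states.
TABLE = {
    (State.I, Event.LOAD): (State.S, "issue GETS"),
    (State.I, Event.STORE): (State.M, "issue GETX"),
    (State.I, Event.OTHER_GETS): (State.I, "none"),
    (State.I, Event.OTHER_GETX): (State.I, "none"),
    (State.S, Event.LOAD): (State.S, "hit"),
    (State.S, Event.STORE): (State.M, "issue GETX"),
    (State.S, Event.OTHER_GETS): (State.S, "none"),
    (State.S, Event.OTHER_GETX): (State.I, "invalidate"),
    (State.M, Event.LOAD): (State.M, "hit"),
    (State.M, Event.STORE): (State.M, "hit"),
    (State.M, Event.OTHER_GETS): (State.S, "send data"),
    (State.M, Event.OTHER_GETX): (State.I, "send data, invalidate"),
}

def step(state, event):
    """Look up the next state and action for one cache block."""
    return TABLE[(state, event)]

def table_is_complete():
    """Check that every state/event pair has an entry."""
    return all((s, e) in TABLE for s in State for e in Event)
```

Keeping the protocol as data makes it easy to check mechanically that no state/event combination was forgotten.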

(14)


Test Proposed Computer Design

• Randomly select write action & later read check

• Massive false-sharing for interaction

• Perverse network stresses design

• Transient error & deadlock detection

• Sound but not complete

Memory Timing Simulator (Ruby)

Pseudo-Random Protocol Checker
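A minimal sketch of the "random write, later read check" idea follows. The two toy memory models and the `random_test` helper are hypothetical stand-ins, not the Multifacet tester; operations here are issued one at a time, whereas a real tester must also handle requests that overlap in time. A small address pool is used so that processors collide often.

```python
import random

class CoherentMemory:
    """Toy model under test: one shared store, so reads see the last write."""
    def __init__(self):
        self.mem = {}

    def write(self, cpu, addr, value):
        self.mem[addr] = value

    def read(self, cpu, addr):
        return self.mem.get(addr, 0)

class StaleCacheMemory(CoherentMemory):
    """Deliberately buggy model: per-CPU caches are never invalidated."""
    def __init__(self):
        super().__init__()
        self.caches = {}

    def write(self, cpu, addr, value):
        self.mem[addr] = value
        self.caches[(cpu, addr)] = value  # other CPUs keep stale copies

    def read(self, cpu, addr):
        # Fill the cache on a miss; afterwards always hit, even if stale.
        return self.caches.setdefault((cpu, addr), self.mem.get(addr, 0))

def random_test(memory, num_cpus=4, num_addrs=8, steps=2000, seed=1):
    """Issue random writes and later check reads against the expected value.

    Returns None if no mismatch was seen, else a description of the first one.
    """
    rng = random.Random(seed)
    expected = {}
    for step in range(steps):
        cpu = rng.randrange(num_cpus)
        addr = rng.randrange(num_addrs)
        if rng.random() < 0.5:
            value = step + 1  # unique value per write
            memory.write(cpu, addr, value)
            expected[addr] = value
        else:
            got = memory.read(cpu, addr)
            want = expected.get(addr, 0)
            if got != want:
                return f"step {step}: cpu {cpu} read {got} at {addr}, expected {want}"
    return None
```

A passing run only shows that no error was hit on that random sequence. It cannot prove that the design has none.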

(15)


Simulate with Simple Blocking Processor

• Warm-up caches or sometimes sufficient (SafetyNet)

• Run for fixed number of transactions

– Some transactions partially done at start

– Other transactions partially done at end

• Cope with workload variability (later)

Full System Functional Simulator (Simics)

Memory Timing Simulator (Ruby)

(16)


Simulate with Detailed Processor

• Accurate (future) timing & (current) function

• Simulation complexity decoupled (discussed soon)

• Same transaction methodology & workload variability issues

Full System Functional Simulator (Simics)

Memory Timing Simulator (Ruby)

(17)


Simulation Infrastructure & Workload Process

• Select important workloads: run, tune, scale, & re-tune

• Specify system & pseudo-randomly test

• Create warm workload checkpoint

• Simulate with simple or detailed processor

• Fixed #transactions, manage simulation complexity (next), cope with workload variability (next next)

Full System Functional Simulator (Simics)

Memory Timing Simulator (Ruby)

Processor Timing Simulator (Opal)

Commercial Server (Sun E6000)

Full Workloads

Scaled Workloads

(18)


Outline

• Context

• Simulation Infrastructure & Workload Process

• Separate Timing & Functional Simulation

– Simulation Challenges

– Managing Simulation Complexity

– Timing-First Simulation

– Evaluation

(19)


Challenges to Timing Simulation

• Execution driven simulation is getting harder

• Micro-architecture complexity

– Multiple “in-flight” instructions

– Speculative execution

– Out-of-order execution

• Thread-level parallelism

(20)


Challenges to Functional Simulation

• Commercial workloads have high functional fidelity demands

[Figure: target applications along an "application complexity" axis (Database, Operating System, SPEC Benchmarks, Kernels, Web Server) and the components of the (simulated) target system (RAM, Processor, PCI Bus, Ethernet Controller, Fiber Channel Controller, Graphics Card, SCSI Controller, CD-ROM, SCSI Disk, SCSI Disk, DMA Controller, Terminal I/O MMU Controller, IRQ Controller, Status)]
(21)


Managing Simulator Complexity

• Functional-First (Trace-driven): Functional Simulator feeding a Timing Simulator

– (−) Timing feedback

• Integrated (SimOS): Timing and Functional Simulator

– (−) Complex

• Timing-Directed: Timing Simulator (Complete Timing, No? Function) with Functional Simulator (No Timing, Complete Function)

– (+) Timing feedback

– (−) Tight coupling

– (−) Performance?

• Timing-First (Multifacet): Timing Simulator (Complete Timing, Partial Function) with Functional Simulator (No Timing, Complete Function)

– (+) Timing feedback

– (+) Using existing simulators

(22)


Timing-First Simulation

• Timing Simulator

– does functional execution of user and privileged operations

– does speculative, out-of-order multiprocessor timing simulation

– does NOT implement functionality of full instruction set or any devices

• Functional Simulator

– does full-system multiprocessor simulation

– does NOT model detailed micro-architectural timing

(23)


Timing-First Operation

• As instruction retires, step CPU in functional simulator

• Verify instruction’s execution

• Reload state if timing simulator deviates from functional

– Loads in multi-processors

– Instructions with unidentified side-effects

– NOT loads/stores to I/O devices
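The retire, verify, and reload steps above can be sketched with two toy simulators. Everything here is hypothetical: the three-instruction ISA, the class names, the one-cycle latency, and the 10-cycle recovery penalty are assumptions for illustration and do not describe TFsim or Simics. The timing model deliberately omits one instruction to stand in for "partial function".

```python
class FunctionalSim:
    """Reference model: implements every instruction of a toy ISA."""
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.regs = [0] * 4

    def step(self):
        op, rd, val = self.program[self.pc]
        if op == "li":
            self.regs[rd] = val
        elif op == "addi":
            self.regs[rd] += val
        elif op == "muli":
            self.regs[rd] *= val
        self.pc += 1

class TimingSim:
    """Timing model with partial function: 'muli' is not implemented."""
    def __init__(self, program):
        self.program = program
        self.pc = 0
        self.regs = [0] * 4
        self.cycles = 0

    def retire_next(self):
        op, rd, val = self.program[self.pc]
        if op == "li":
            self.regs[rd] = val
        elif op == "addi":
            self.regs[rd] += val
        # 'muli' falls through: architectural state is left wrong on purpose.
        self.pc += 1
        self.cycles += 1  # placeholder latency of one cycle per instruction

    def reload(self, functional):
        """Copy architectural state from the functional simulator."""
        self.pc = functional.pc
        self.regs = list(functional.regs)
        self.cycles += 10  # hypothetical recovery penalty

def run_timing_first(program):
    func, timing = FunctionalSim(program), TimingSim(program)
    deviations = 0
    while timing.pc < len(program):
        timing.retire_next()  # timing simulator retires first
        func.step()           # then the functional simulator steps
        if (timing.pc, timing.regs) != (func.pc, func.regs):
            deviations += 1   # verification failed: reload state
            timing.reload(func)
    return timing.regs, timing.cycles, deviations

program = [("li", 0, 3), ("addi", 0, 2), ("muli", 0, 4), ("addi", 0, 1)]
regs, cycles, deviations = run_timing_first(program)
```

For this program the timing model deviates once, on `muli`, and finishes with the same register values as the functional model.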

(24)


Benefits of Timing-First

• Supports speculative multi-processor timing models

• Leverages existing simulators

• Software development advantages

– Increases flexibility and reduces code complexity

– Immediate, precise check on timing simulator

• However:

(25)


Evaluation

• Our implementation, TFsim, uses:

– Functional simulator: Virtutech Simics

– Timing simulator: implemented in less than one person-year

• Evaluated using OS-intensive commercial workloads

– OS Boot: > 1 billion instructions of Solaris 8 startup

– OLTP: TPC-C-like benchmark using a 1 GB database

– Dynamic Web: Apache serving message board, using code and data similar to slashdot.org

(26)


Measured Deviations

(28)


Analysis of Results

• Runs full-system workloads!

• Timing performance impact of deviations

– Worst case: less than 3% performance error

• ‘Overhead’ of redundant execution

– 18% on average for uniprocessors

– 18% (2 processors) up to 36% (16 processors)

Total Execution Time

(29)


Performance Comparison

• Absolute simulation performance comparison

– In kilo-instructions committed per second (KIPS)

– RSIM Scaled: 107 KIPS

– Uniprocessor TFsim: 119 KIPS

[Table: columns (Simulated) Target System, Target Application, Host Computer; entries: Out-of-Order MP SPARC V9, SPLASH-2 Kernels, 400 MHz SPARC running Solaris; Out-of-Order MP Full-system SPARC V9, SPLASH-2 Kernels]

(30)


Timing-First Conclusions

• Execution-driven simulators are increasingly complex

• How to manage complexity?

• Our answer: Timing-First Simulation

– Introduces relatively little performance error (worst case: 3%)

– Has low overhead (18% uniprocessor average)

– Rapid development time

[Figure: Timing Simulator (Complete Timing, Partial Function) paired with Functional Simulator (No Timing)]

(31)


Outline

• Context

• Workload Process & Infrastructure

• Separate Timing & Functional Simulation

• Cope with Workload Variability

– Variability in Multithreaded Workloads

– Coping in Simulation

– Examples & Statistics

(32)


What is Happening Here?

(33)


What is Happening Here?

• How can slower memory lead to faster workload?

• Answer: Multithreaded workload takes different path

– Different lock race outcomes

– Different scheduling decisions

• (1) Does this happen for real hardware?

(34)


One Second Intervals (on real hardware)

(35)


60 Second Intervals (on real hardware)

16-day simulation

(36)


Coping with Workload Variability

• Running (simulating) long enough not appealing

• Need to separate coincidental & real effects

• Standard statistics on real hardware

– Variation within base system runs vs. variation between base & enhanced system runs

– But deterministic simulation has no “within” variation

• Solution with deterministic simulation

– Add pseudo-random delay on L2 misses
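A toy sketch of the mechanism: each run starts from the same conditions but draws a different pseudo-random extra delay on every L2 miss, so repeated runs of one configuration differ from one another. The function names, the number of misses, and the 0 to 4 cycle delay range are assumptions for illustration. This model only sums miss latencies, whereas in a full-system simulation the perturbation would also change lock races and scheduling decisions.

```python
import random
import statistics

def simulate_run(l2_miss_latency, num_misses=1000, max_extra_delay=4, seed=0):
    """Toy stand-in for one simulation run from a fixed checkpoint.

    Returns the total cycles spent on L2 misses for this run.
    """
    rng = random.Random(seed)  # the seed selects the perturbation sequence
    cycles = 0
    for _ in range(num_misses):
        # Base latency plus a small pseudo-random perturbation.
        cycles += l2_miss_latency + rng.randint(0, max_extra_delay)
    return cycles

def sample_runs(l2_miss_latency, num_runs=10):
    """Run the same configuration several times with different seeds."""
    return [simulate_run(l2_miss_latency, seed=s) for s in range(num_runs)]

def summarize(samples):
    """Mean and sample standard deviation across runs."""
    return statistics.mean(samples), statistics.stdev(samples)

runs = sample_runs(80)
mean_cycles, stdev_cycles = summarize(runs)
```

Because each seed is fixed, any single run remains reproducible, while the set of runs supplies the variation within one system that the statistics need.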

(38)


Wrong Conclusion Ratio

(39)


More Generally: Use Standard Statistics

• As one would for a measurement of a “live” system

• Confidence Intervals

– 95% confidence intervals contain true value 95% of the time

– Non-overlapping confidence intervals give statistically significant conclusions
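As a sketch of the comparison, the helper below computes a confidence interval for the mean of several runs and tests two intervals for overlap. It assumes a normal approximation (a Student's t quantile would be more appropriate for a handful of runs), and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, confidence=0.95):
    """Confidence interval for the mean, using a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / sqrt(len(samples))
    m = mean(samples)
    return (m - half_width, m + half_width)

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical cycles-per-transaction measurements from five runs each.
base = [100, 102, 98, 101, 99]
enhanced = [90, 91, 89, 92, 88]

base_ci = confidence_interval(base)
enhanced_ci = confidence_interval(enhanced)
significant = not intervals_overlap(base_ci, enhanced_ci)
```

With these made-up numbers the two intervals do not overlap, so the difference would be reported as statistically significant.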

(40)


Confidence Interval Example

• Estimate #runs to get non-overlapping confidence intervals
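One way to make such an estimate, sketched under stated assumptions: take the means and standard deviations from a few pilot runs, assume they stay the same as more runs are added, use a normal approximation, and find the smallest number of runs n per configuration for which the two half-widths together are smaller than the gap between the means, i.e. z(s_a + s_b)/sqrt(n) < |m_a - m_b|. The function name and inputs are hypothetical.

```python
from math import floor
from statistics import NormalDist

def runs_needed(mean_a, std_a, mean_b, std_b, confidence=0.95):
    """Smallest n (same for both systems) giving non-overlapping intervals.

    Solves z * (std_a + std_b) / sqrt(n) < |mean_a - mean_b| for n.
    """
    gap = abs(mean_a - mean_b)
    if gap == 0:
        raise ValueError("identical means: the intervals always overlap")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z * (std_a + std_b) / gap) ** 2
    return max(2, floor(n) + 1)  # strict inequality, and at least two runs

# Pilot estimates (made-up numbers): a 5% gap needs many more runs than a 10% gap.
small_gap = runs_needed(100, 5, 95, 5)
large_gap = runs_needed(100, 5, 90, 5)
```

The required number of runs grows with the square of the ratio of variability to the expected improvement, so small improvements on highly variable workloads are expensive to confirm.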

(41)


Also Time Variability (on real hardware)

• Therefore, select checkpoint(s) carefully

(42)


Workload Variability Summary

• Variability is a real phenomenon for multi-threaded workloads

– Runs from same initial conditions are different

• Variability is a challenge for simulations

– Simulations are short

– Wrong conclusions may be drawn

• Our solution accounts for variability

– Multiple runs, confidence intervals

(43)


Talk Summary

• Simulations of $2M Commercial Servers must

– Complete in reasonable time (on $2K PCs)

– Handle OS, devices, & multithreaded hardware

– Cope with variability of multithreaded software

• Multifacet

– Scale & tune transactional workloads

– Separate timing & functional simulation

– Cope w/ workload variability via randomness & statistics

• References (www.cs.wisc.edu/multifacet/papers)

– Simulating a $2M Commercial Server on a $2K PC [Computer03]

– Full-System Timing-First Simulation [Sigmetrics02]

(44)


Other Multifacet Methods Work

• Specifying & Verifying Coherence Protocols

– [SPAA98], [HPCA99], [SPAA99], & [TPDS02]

• Workload Analysis & Improvement

– Database systems [VLDB99] & [VLDB01]

– Pointer-based [PLDI99] & [Computer00]

– Middleware [HPCA03]

• Modeling & Simulation

– Commercial workloads [Computer02] & [HPCA03]

– Decoupling timing/functional simulation [Sigmetrics02]

– Simulation generation [PLDI01]

– Analytic modeling [Sigmetrics00] & [TPDS TBA]

(46)


One Ongoing/Future Methods Direction

• Middleware Applications

– Memory system behavior of Java Middleware [HPCA 03]

– Machine measurements

– Full-system simulation

• Future Work: Multi-Machine Simulation

– Isolate middle-tier from client emulators and database

• Understand fundamental workload behaviors

(47)


ECPerf vs. SPECjbb

(48)


Online Transaction Processing (OLTP)

DB2 with a TPC-C-like workload. The TPC-C benchmark is widely used to evaluate system performance for the on-line transaction processing market. The benchmark itself is a specification that describes the schema, scaling rules, transaction types, and transaction mix, but not the exact implementation of the database. TPC-C defines five transaction types, all related to an order-processing environment. Performance is measured by the number of “New Order” transactions performed per minute (tpmC).

(49)


Java Server Workload (SPECjbb)

• Java-based middleware applications are increasingly used in modern e-business settings. SPECjbb is a Java benchmark emulating a 3-tier system with emphasis on the middle-tier server business logic. SPECjbb runs in a single Java Virtual Machine (JVM) in which threads represent terminals in a warehouse. Each thread independently generates random input (tier 1 emulation) before calling transaction-specific business logic. The business logic operates on the data held in binary trees of Java objects (tier 3 emulation). The specification states that the benchmark does no disk or network I/O.

(50)


Static Web Content Serving: Apache

• Web servers such as Apache represent an important enterprise server application. Apache is a popular open-source web server used in many internet/intranet settings. In this benchmark, we focus on static web content serving.

(51)


Dynamic Web Content Serving: Slashcode

• Dynamic web content serving has become increasingly important for web sites that serve large amounts of information. Dynamic content is used by online stores, instant news, and community message board systems. Slashcode is an open-source dynamic web message posting system used by the popular slashdot.org message board system.
