(C) 2003 Multifacet Project, University of Wisconsin-Madison
Evaluating a $2M Commercial Server on a $2K PC and Related Challenges
Mark D. Hill
Multifacet Project (www.cs.wisc.edu/multifacet)
Computer Sciences Department, University of Wisconsin-Madison
Wisconsin Multifacet Project
Methods
Context & Summary
• Commercial Servers
– Processors, memory, disks: $2M
– Run large multithreaded transaction-oriented workloads
– Use commercial applications on a commercial OS
• To Simulate on a $2K PC
– Scale & tune workloads (keep L2 miss rates, etc.)
– Manage simulation complexity (separate timing & function)
– Cope with workload variability (use randomness & statistics)
• NSF Challenges in Computer Architecture Evaluation
Multifacet: Commercial Server Design
• Wisconsin Multifacet Project
– Directed by Mark D. Hill & David A. Wood
– Sponsors: NSF, WI, IBM, Intel, & Sun
– Current Contributors: Alaa Alameldeen, Brad Beckman, Milo Martin, Mike Marty, Kevin Moore, & Min Xu
• Commercial Server Availability
– SafetyNet tolerates some transient faults [ISCA 2002]
• Commercial Server Software Complexity
– Flight Data Recorder aids debugging of multithreaded programs [ISCA 2003]
• Commercial Server Design Complexity
– Token Coherence eases coherence protocol design
Outline
• Workload & Simulation Methods
– Select, scale, & tune workloads
– Transition workload to simulator
– Specify & test the proposed design
– Evaluate design with simple/detailed processor models
• Separate Timing & Functional Simulation
• Cope with Workload Variability
Multifacet Simulation Overview
• Virtutech Simics (www.virtutech.com)
• Rest is Multifacet software
[Figure: Multifacet simulation overview. Full workloads run on a Commercial Server (Sun Fire V880); scaled workloads run on the Full System Functional Simulator (Simics) together with the Timing Simulator, i.e., the Memory Timing Simulator (Ruby) and the Processor Timing Simulator (Opal). Protocol development uses the Memory Protocol Generator (SLICC) and the Pseudo-Random Protocol Checker.]
Select Important Workloads
• Online Transaction Processing: DB2 with a TPC-C-like workload
• Java Server Workload: SPECjbb
• Static web content serving: Apache
• Dynamic web content serving: Slashcode
• Java-based Middleware
Setup & Tune Workloads (on real hardware)
• Tune workload, OS parameters
• Measure transaction rate, speed-up, miss rates, I/O
• Compare to published results
Scale & Re-tune Workloads
• Scale down for PC memory limits
• Retain similar behavior (e.g., L2 cache miss rate)
• Re-tune to achieve higher transaction rates (OLTP: raw disk, multiple disks, more users, etc.)
[Figure: simulation overview, highlighting the Commercial Server]
Transition Workloads to Simulation
• Create disk dumps of tuned workloads
• In simulator: boot OS, start & warm application
• Create Simics checkpoint (snapshot)
[Figure: simulation overview, highlighting the Full System Functional Simulator (Simics)]
Specify Proposed Computer Design
• Coherence Protocol (control tables: states × events)
• Cache Hierarchy (parameters & queues)
• Interconnect (switches & queues)
• Processor (later)
[Figure: simulation overview, highlighting the Memory Timing Simulator (Ruby) and Memory Protocol]
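A protocol written as a control table maps each (state, event) pair to the actions to take and the next state. The sketch below illustrates that idea with a hypothetical three-state MSI cache controller; it is not a Multifacet or SLICC protocol, and every state, event, and action name in it is an illustrative assumption.

```python
from enum import Enum


class State(Enum):
    I = "Invalid"
    S = "Shared"
    M = "Modified"


class Event(Enum):
    LOAD = "Load"
    STORE = "Store"
    OTHER_GETS = "Other_GETS"  # another cache asks for a shared copy
    OTHER_GETX = "Other_GETX"  # another cache asks for an exclusive copy


# Control table: (state, event) -> (actions, next state).
# Hypothetical toy MSI controller with atomic transitions (no transient states).
TABLE = {
    (State.I, Event.LOAD):       (["issue_GETS"], State.S),
    (State.I, Event.STORE):      (["issue_GETX"], State.M),
    (State.I, Event.OTHER_GETS): ([], State.I),
    (State.I, Event.OTHER_GETX): ([], State.I),
    (State.S, Event.LOAD):       (["hit"], State.S),
    (State.S, Event.STORE):      (["issue_GETX"], State.M),
    (State.S, Event.OTHER_GETS): ([], State.S),
    (State.S, Event.OTHER_GETX): (["invalidate"], State.I),
    (State.M, Event.LOAD):       (["hit"], State.M),
    (State.M, Event.STORE):      (["hit"], State.M),
    (State.M, Event.OTHER_GETS): (["send_data", "writeback"], State.S),
    (State.M, Event.OTHER_GETX): (["send_data"], State.I),
}


def transition(state, event):
    """Look up one table entry; a missing entry is a specification error."""
    try:
        return TABLE[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for {state.name} x {event.name}") from None


def missing_entries():
    """Every state x event pair should be specified; list any that are not."""
    return [(s, e) for s in State for e in Event if (s, e) not in TABLE]
```

Keeping the protocol as an explicit table makes completeness mechanically checkable: `missing_entries()` enumerates the full states × events grid.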
Test Proposed Computer Design
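• Randomly select write actions & later read checks
• Massive false sharing to force interactions
• Perverse network stresses the design
• Transient error & deadlock detection
• Sound but not complete: errors found are real, but passing does not prove correctness
[Figure: simulation overview, highlighting the Memory Timing Simulator (Ruby) and the Pseudo-Random Protocol Checker]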
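The tester's core loop can be sketched as follows: store random values, later read each value back and compare, and flag any request that stays outstanding too long as a possible deadlock. This is a minimal sketch under stated assumptions: `ToyMemorySystem` is a hypothetical stand-in for the design under test (a flat memory with random completion delays), not Ruby, and all names and parameters are illustrative. A real tester would map its few hot addresses into the same cache blocks to force false sharing.

```python
import random


class ToyMemorySystem:
    """Hypothetical design under test: a flat memory whose requests complete
    after a random delay (a crude stand-in for a perverse network)."""

    def __init__(self, rng, max_delay=50, drop_stores=False):
        self.rng = rng
        self.max_delay = max_delay
        self.drop_stores = drop_stores  # injected bug, to show the checker catching it
        self.mem = {}
        self.pending = []  # (completion_time, kind, addr, value, callback)

    def request(self, now, proc, kind, addr, value, callback):
        # A real design would route this through processor `proc`'s cache;
        # this toy model ignores `proc`.
        done = now + self.rng.randint(1, self.max_delay)
        self.pending.append((done, kind, addr, value, callback))

    def tick(self, now):
        ready = [p for p in self.pending if p[0] <= now]
        self.pending = [p for p in self.pending if p[0] > now]
        for _, kind, addr, value, callback in ready:
            if kind == "store":
                if not self.drop_stores:
                    self.mem[addr] = value
                callback(None)
            else:
                callback(self.mem.get(addr, 0))


def random_test(memory, rng, num_procs=8, num_addrs=4, steps=10_000, deadlock_limit=1_000):
    """Random stores followed by later read checks.
    Returns a list of (addr, expected, observed) mismatches."""
    expected = {a: 0 for a in range(num_addrs)}
    busy = set()         # addresses with a request in flight
    needs_check = set()  # addresses written but not yet read back
    issued_at = {}       # addr -> time its in-flight request was issued
    errors = []

    for now in range(steps):
        memory.tick(now)  # completed requests run their callbacks here
        for addr, t in issued_at.items():
            if now - t > deadlock_limit:
                raise RuntimeError(f"possible deadlock: address {addr} stuck since t={t}")
        idle = [a for a in range(num_addrs) if a not in busy]
        if not idle:
            continue
        addr = rng.choice(idle)          # few hot addresses, many processors
        proc = rng.randrange(num_procs)
        busy.add(addr)
        issued_at[addr] = now
        if addr in needs_check:
            def check(value, addr=addr, want=expected[addr]):
                if value != want:
                    errors.append((addr, want, value))
                needs_check.discard(addr)
                busy.discard(addr)
                del issued_at[addr]
            memory.request(now, proc, "load", addr, None, check)
        else:
            value = rng.randrange(1, 1 << 16)
            def stored(_, addr=addr, value=value):
                expected[addr] = value
                needs_check.add(addr)
                busy.discard(addr)
                del issued_at[addr]
            memory.request(now, proc, "store", addr, value, stored)
    return errors
```

Note how this is sound but not complete: every reported mismatch is a real error in the toy model, but an empty result only means no error was observed in this run.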
Simulate with Simple Blocking Processor
• Use to warm up caches, or on its own when sufficient (e.g., SafetyNet)
• Run for a fixed number of transactions
– Some transactions are partially done at the start
– Other transactions are partially done at the end
• Cope with workload variability (later)
[Figure: simulation overview, highlighting the Full System Functional Simulator (Simics) and Memory Timing Simulator (Ruby)]
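A fixed-transaction run can be turned into a per-transaction metric roughly as follows. This sketch assumes a hypothetical log of the cycle at which each transaction completes; the function name and log format are illustrative, not Multifacet's.

```python
def cycles_per_transaction(completion_cycles, start_cycle, num_transactions):
    """Simulate from start_cycle until num_transactions more transactions
    complete, and report cycles per transaction.

    completion_cycles: sorted cycles at which transactions completed.
    A transaction already in progress at start_cycle is counted when it
    completes; one still in progress when the run stops is not counted,
    so partial work at the two ends tends to offset over a long run.
    """
    after_start = [c for c in completion_cycles if c > start_cycle]
    if len(after_start) < num_transactions:
        raise ValueError("run too short for the requested number of transactions")
    end_cycle = after_start[num_transactions - 1]
    return (end_cycle - start_cycle) / num_transactions
```

For example, with completions at cycles 100, 250, 400, 600, and 900, starting at cycle 200 and stopping after 3 transactions ends at cycle 600, giving 400 / 3 cycles per transaction.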
Simulate with Detailed Processor
• Accurate (future) timing & (current) function
• Simulation complexity decoupled (discussed soon)
• Same transaction methodology & workload variability issues
[Figure: simulation overview, highlighting the Full System Functional Simulator (Simics) and Memory Timing Simulator (Ruby)]
Simulation Infrastructure & Workload Process
• Select important workloads: run, tune, scale, & re-tune
• Specify system & pseudo-randomly test
• Create warm workload checkpoint
• Simulate with simple or detailed processor
• Run a fixed number of transactions, manage simulation complexity (next), & cope with workload variability (after that)
[Figure: simulation overview. Full workloads run on the Commercial Server (Sun Fire V880); scaled workloads run on the Full System Functional Simulator (Simics), Memory Timing Simulator (Ruby), and Processor Timing Simulator (Opal).]
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
– Simulation Challenges & Complexity
– Timing-First Simulation
• Cope with Workload Variability
Simulating Function Getting Harder!
[Figure: the (simulated) target system. Target applications range from kernels and SPEC benchmarks to databases and web servers, running on an operating system over hardware that includes the processor, RAM, PCI bus, Ethernet controller, Fibre Channel controller, graphics card, terminal controller, I/O MMU controller, and IRQ controller.]
Simulating Timing Getting Harder!
• Micro-architecture complexity
– Multiple “in-flight” instructions
– Speculative execution
– Out-of-order execution
• Thread-level parallelism
Managing Simulator Complexity
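[Figure: four ways to organize timing and functional simulation]
• Integrated (SimOS): a single Timing and Functional Simulator
– Con: complex
• Functional-First (Trace-driven): Functional Simulator (complete function, no timing) feeds Timing Simulator (complete timing, no function)
– Con: no timing feedback
• Timing-Directed: Timing Simulator (complete timing, no function) directs Functional Simulator (complete function, no timing)
– Pro: timing feedback; Cons: tight coupling, performance?
• Timing-First (Multifacet): Timing Simulator (complete timing, partial function) checked by Functional Simulator (complete function, no timing)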
Timing-First Operation
[Figure: the Timing Simulator executes & commits; the Functional Simulator verifies & reloads the Timing Simulator's state]
• Timing Simulator runs speculatively ahead
• On commit, calls Functional Simulator to verify
• Reloads Timing Simulator state from the Functional Simulator if necessary
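The execute, commit, verify, and reload loop can be illustrated with a deliberately tiny model. This is a minimal sketch under loud assumptions: a hypothetical two-instruction toy ISA, an in-order uniprocessor with no speculation or out-of-order execution, and a full register-state comparison at every commit. It is not Multifacet's Opal/Simics interface; it only shows how a timing simulator with partial function can stay correct by deferring to a complete functional simulator.

```python
# Hypothetical toy ISA:
#   ("add", dst, src, imm): regs[dst] = regs[src] + imm
#   ("io", dst):            device read; stands in for rare, complex behavior
#                           that the timing simulator does not implement.


class FunctionalSim:
    """Reference simulator: complete function, no timing."""

    def __init__(self, nregs=4):
        self.regs = [0] * nregs
        self.device_reads = 0

    def step(self, inst):
        if inst[0] == "add":
            _, dst, src, imm = inst
            self.regs[dst] = self.regs[src] + imm
        elif inst[0] == "io":
            self.device_reads += 1
            self.regs[inst[1]] = 100 + self.device_reads  # device-dependent value
        return list(self.regs)


class TimingSim:
    """Timing simulator: complete timing, partial function."""

    def __init__(self, nregs=4):
        self.regs = [0] * nregs
        self.cycles = 0

    def execute(self, inst):
        if inst[0] == "add":
            _, dst, src, imm = inst
            self.regs[dst] = self.regs[src] + imm
            self.cycles += 1
        elif inst[0] == "io":
            self.regs[inst[1]] = 0  # value not modeled, so it may be wrong...
            self.cycles += 20       # ...but its latency is modeled

    def reload(self, regs):
        self.regs = list(regs)


def run_timing_first(program):
    timing, functional = TimingSim(), FunctionalSim()
    reloads = 0
    for inst in program:
        timing.execute(inst)               # 1. timing simulator executes
        reference = functional.step(inst)  # 2. at commit, functional simulator executes too
        if timing.regs != reference:       # 3. verify architectural state
            timing.reload(reference)       # 4. on mismatch, reload from functional state
            reloads += 1
    return timing.regs, timing.cycles, reloads
```

For the program `[("add", 0, 0, 5), ("io", 1), ("add", 2, 1, 1)]`, the `io` result mismatches once, the timing simulator is reloaded, and the run still ends with the correct registers `[5, 101, 102, 0]` after 22 modeled cycles.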
Timing-First Discussion
• Supports speculative multiprocessor timing models
• Leverages existing simulators
• Rapid development time (e.g., immediate checks)
• Low simulation overhead (18% for a uniprocessor)
• Relatively little performance error (< 3%)
• BUT duplicates some code & function
[Figure: Timing-First Simulation: Timing Simulator (complete timing, partial function) and Functional Simulator (complete function, no timing)]
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
– Variability in Multithreaded Workloads
– Coping in Simulation
What is Happening Here?
What is Happening Here?
• How can slower memory lead to faster workload?
• Answer: Multithreaded workload takes different path
– Different lock race outcomes
– Different scheduling decisions
• (1) Does this happen for real hardware?
One Second Intervals (on real hardware)
60 Second Intervals (on real hardware)
[Figure annotation: 16-day simulation]
Coping with Workload Variability
• Running (simulating) long enough is not appealing
• Need to separate coincidental & real effects
• Standard statistics on real hardware
– Variation within base system runs vs. variation between base & enhanced system runs
– But deterministic simulation has no “within” variation
• Solution with deterministic simulation
– Add pseudo-random delay on L2 misses
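A pseudo-random miss delay can be sketched as a latency model with a per-run seed: each seed gives a slightly different interleaving (and so possibly a different execution path), while any single run stays reproducible. The class name, the 200-cycle base latency, and the 0 to 4 cycle uniform perturbation are illustrative assumptions, not the values or distribution used in Multifacet's simulator.

```python
import random


class L2MissLatency:
    """Hypothetical L2-miss latency model: a fixed latency plus a small
    pseudo-random extra delay drawn uniformly from [0, max_extra_cycles]."""

    def __init__(self, base_cycles, max_extra_cycles, seed):
        self.base_cycles = base_cycles
        self.max_extra_cycles = max_extra_cycles
        self.rng = random.Random(seed)  # one seed per run: each run is reproducible

    def next_latency(self):
        return self.base_cycles + self.rng.randint(0, self.max_extra_cycles)


def latencies(seed, n, base_cycles=200, max_extra_cycles=4):
    """Latencies of the first n L2 misses for one run with the given seed."""
    model = L2MissLatency(base_cycles, max_extra_cycles, seed)
    return [model.next_latency() for _ in range(n)]
```

Running the same checkpoint once per seed yields several runs per configuration; their spread supplies the “within” variation that standard statistics need.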
Confidence Interval Example
• Estimate the number of runs needed to get non-overlapping confidence intervals
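One way to estimate the number of runs is to take a few pilot runs per configuration, assume their means and standard deviations hold, and find the smallest n for which the two 95% Student-t intervals no longer overlap. This is a sketch under that assumption; the function names are hypothetical, and the t table uses standard two-sided 95% critical values, rounding down to the nearest tabulated degrees of freedom (which is conservative).

```python
import math
import statistics

# Two-sided 95% Student-t critical values t_{0.975, df} for selected df.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
         7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 15: 2.131, 20: 2.086, 30: 2.042}


def t_crit(df):
    """Conservative lookup: use the largest tabulated df that is <= df."""
    return T_975[max(d for d in T_975 if d <= df)]


def conf_interval(samples):
    """95% confidence interval for the mean of a set of runs (needs >= 2 runs)."""
    n = len(samples)
    mean = statistics.mean(samples)
    half = t_crit(n - 1) * statistics.stdev(samples) / math.sqrt(n)
    return mean - half, mean + half


def runs_needed(base, enhanced, max_runs=1000):
    """Smallest runs-per-configuration n whose two 95% intervals stop overlapping,
    assuming the pilot means and standard deviations stay the same."""
    gap = abs(statistics.mean(base) - statistics.mean(enhanced))
    s_sum = statistics.stdev(base) + statistics.stdev(enhanced)
    for n in range(2, max_runs + 1):
        if t_crit(n - 1) * s_sum / math.sqrt(n) < gap:
            return n
    return None  # difference too small to resolve within max_runs
```

With fully deterministic simulation every run is identical, both standard deviations are zero, and this calculation says nothing useful; that is why the pseudo-random L2 miss delay is needed first.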
Outline
• Workload & Simulation Methods
• Separate Timing & Functional Simulation
• Cope with Workload Variability
• NSF Challenges in Computer Architecture Evaluation
NSF Challenges in Computer Architecture Evaluation
• Dec 2001 NSF Computer Systems Architecture Workshop
– Report in IEEE Computer, Aug 2003
– By Kevin Skadron, Margaret Martonosi, David August, Mark Hill, David Lilja, & Vijay Pai
• Simulation Frameworks
– P (Problem): Need more modularity, portability, & reuse
– R (Recommendation): More simulation frameworks, e.g., ASIM & Liberty
• Benchmarking
– P: Benchmarks for too few domains
NSF Challenges in Computer Architecture Evaluation
• Abstractions & Methodology
– P: Trust simulation too much and other methods too little (1985 ISCA: 30% simulation & 30% modeling; 2001 ISCA: 90% simulation & 0% modeling)
– R: Push analytic models for insight, cross-validation, & far-reaching research
• Metrics, Accuracy, & Validation
– P: Too dependent on relative & aggregate metrics
Talk Summary
• Simulations of $2M Commercial Servers must
– Complete in reasonable time (on $2K PCs)
– Handle OS, devices, & multithreaded hardware
– Cope with variability of multithreaded software
• Multifacet
– Scale & tune transactional workloads
– Separate timing & functional simulation
– Cope w/ workload variability via randomness & statistics
• References (www.cs.wisc.edu/multifacet/papers)
– Simulating a $2M Commercial Server on a $2K PC [Computer 2/03]
– Full-System Timing-First Simulation [Sigmetrics 02]
– Variability in Architectural Simulations … [HPCA 03]
• NSF Panel
Other Multifacet Methods Work
• Specifying & Verifying Coherence Protocols
– [SPAA98], [HPCA99], [SPAA99], & [TPDS02]
• Workload Analysis & Improvement
– Database systems [VLDB99] & [VLDB01]
– Pointer-based [PLDI99] & [Computer00]
– Middleware [HPCA03]
• Modeling & Simulation
– Commercial workloads [Computer02] & [HPCA03]
– Decoupling timing/functional simulation [Sigmetrics02]
– Simulation generation [PLDI01]
– Analytic modeling [Sigmetrics00] & [TPDS TBA]
– Micro-architectural slack [ISCA02]