Design and Implementation - System Software Techniques Leveraging Emerging Hardware for High-Pe

3.3.1 Overview and Terminologies

The main goal of HyPart is to configure the state of each application in a way that maximizes the user-specified efficiency metric. HyPart is a hybrid memory bandwidth partitioning technique in the sense that it employs the three memory bandwidth partitioning techniques (thread packing, clock modulation, and MBA) in a coordinated manner. HyPart is also application characteristic-aware: it dynamically prunes inefficient states based on the characteristics of the target applications, which are analyzed at runtime.

Without loss of generality, we assume that there are N_C cores on the target system and N_A applications to be executed, where N_A ≤ N_C. A state (i.e., s_i) of application i (i ∈ [0, N_A−1]) is defined as (c_i, d_i, m_i), where c_i, d_i, and m_i denote the settings of thread packing (i.e., core count), clock modulation, and MBA, respectively. The total core count allocated to all the applications must be equal to or less than N_C, and the sharing of cores between applications is disallowed.

The system state (i.e., S) is then defined as follows: S = {s_0, s_1, ···, s_{N_A−1}}. The goal of HyPart is to find the optimal system state (i.e., S_Opt) that maximizes the user-defined efficiency metric. In this work, we employ the overall throughput metric defined in Equation 1, where IPS denotes instructions per second. However, HyPart can be configured to perform optimizations based on other metrics (e.g., fairness).

\[
\text{Throughput} = \sqrt[N_A]{\prod_{i=0}^{N_A-1} \mathrm{IPS}_i} \tag{1}
\]
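As a minimal sketch, Equation 1 can be read as the geometric mean of the per-application IPS values. The function name and the log-domain formulation below are illustrative choices, not part of HyPart itself.

```python
import math


def throughput(ips_per_app):
    """Geometric mean of per-application IPS values (our reading of Equation 1)."""
    if not ips_per_app:
        raise ValueError("at least one application is required")
    if any(ips <= 0 for ips in ips_per_app):
        raise ValueError("IPS values must be positive")
    # Summing logarithms avoids overflow when multiplying many large IPS values.
    log_sum = sum(math.log(ips) for ips in ips_per_app)
    return math.exp(log_sum / len(ips_per_app))
```

Because the metric is a product rather than a sum, a state that starves one application lowers the overall score more than an arithmetic mean would.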

[Figure 5 diagram. Boxes: Phase 1 (Pareto-Frontier State Selection), Phase 2 (Application Profiling), Phase 3 (Application Char-Aware State Pruning), Phase 4 (State Space Exploration), Phase 5 (Idle). Edge labels: Apps, Pareto-Frontier States, Application Characteristics, Candidate States, Not Converged, Best System State, Re-adaptation.]

Figure 5: Overall execution flow of HyPart

Figure 5 shows the overall execution flow of HyPart, which comprises five phases: (1) Pareto-frontier state selection, (2) application profiling, (3) application characteristic-aware state pruning, (4) state space exploration, and (5) idle phases.

3.3.2 Pareto-Frontier State Selection Phase

The first phase of HyPart is the Pareto-frontier state selection phase. The main goal of this phase is to reduce the system state space by pruning suboptimal states for each application. The key idea behind this phase is based on the observation, from our performance characterization studies, that applications tend to achieve the best performance with the state with the highest clock modulation setting and the lowest MBA setting when the core count is fixed. Therefore, we determine the Pareto-frontier state for each core count and prune all the other suboptimal states.
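The selection rule above can be sketched as follows. This is a minimal illustration under stated assumptions: the `State` tuple, the numeric encoding of the clock modulation and MBA settings, and the lexicographic tie-break (clock modulation first, then MBA) are hypothetical, and the input list is assumed to contain only states that are already feasible for the application.

```python
from collections import namedtuple

# Hypothetical encoding of a state (c_i, d_i, m_i).
State = namedtuple("State", ["cores", "clock_mod", "mba"])


def pareto_frontier_states(states):
    """Keep one state per core count: highest clock modulation, then lowest MBA."""
    best = {}
    for s in states:
        cur = best.get(s.cores)
        # Compare (clock_mod, -mba) so that a larger tuple is always preferred.
        if cur is None or (s.clock_mod, -s.mba) > (cur.clock_mod, -cur.mba):
            best[s.cores] = s
    # Return the surviving states ordered by core count.
    return [best[c] for c in sorted(best)]
```

The result has at most one candidate state per core count, which is what shrinks the per-application search space before exploration begins.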

3.3.3 Application Profiling Phase

The second phase of HyPart is the application profiling phase. The main goal of this phase is to dynamically profile each of the target applications (in parallel) in order to determine its tolerance to the performance anomalies of thread packing when the allocated core count is not a divisor of the thread count. Specifically, we set the core count to a value that is not a divisor of the thread count and measure the portion of Idle cycles (out of the total cycles) for a short period at runtime. We classify the target application as tolerant of the performance anomalies of thread packing if the portion of Idle cycles is below a configurable threshold.³

3.3.4 Application Characteristic-Aware State Pruning Phase

The third phase of HyPart is the application characteristic-aware state pruning phase. The goal of this phase is to eliminate the states in which the core count is not a divisor of the thread count of the target application from its candidate states if the target application has been classified to be intolerant against the performance anomalies of thread packing. The key idea behind this phase is to increase the probability of finding the optimal system state and reduce the time required to converge to the optimal system state by eliminating the suboptimal states from the applications that are intolerant against the performance anomalies of thread packing.
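The profiling decision and the pruning rule can be sketched together as below. This is only an illustration: the function names and the tuple layout `(cores, clock_mod, mba)` are assumptions, and the default threshold of 0.20 mirrors the 20% value stated in the footnote.

```python
def is_tolerant(idle_cycles, total_cycles, threshold=0.20):
    """Classify an application as tolerant if its idle-cycle ratio is below the threshold."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return idle_cycles / total_cycles < threshold


def prune_candidate_states(states, thread_count, tolerant):
    """Drop states whose core count does not divide the thread count (intolerant apps only)."""
    if tolerant:
        # Tolerant applications keep every candidate state.
        return list(states)
    # states are hypothetical (cores, clock_mod, mba) tuples.
    return [s for s in states if thread_count % s[0] == 0]
```

For a 16-thread application classified as intolerant, states with 3 or 5 cores would be removed while states with 4 or 8 cores would remain.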

³ In this work, we set the threshold to 20%.

[Figure 6 flowchart. Node labels recovered from the figure are listed below in their order of appearance; the True/False edges between them are not recoverable from the extracted text.]

Start exploreSystemStateSpace()
Decision: isInitialized()
sortedAppList ← sort(appList, app.TPEffect, increasing)
for app in sortedAppList: app.candidateStates.sortBy(computationCapacity, decreasing)
rollbackCount ← 0
coordinationSuccess ← true
for app in sortedAppList: worseState ← app.getWorseState(app.currState, app.prevState); app.removeCandidateState(worseState)
Decision: currEfficiency − prevEfficiency < δ_e
for app in sortedAppList: app.currState ← app.prevState
rollbackCount ← rollbackCount + 1
for app in sortedAppList: app.prevState ← app.currState
Decision: rollbackCount < T_rollback
for app in sortedAppList: if app.idleRatio < threshold or app.candidateStates.isEmpty(): sortedAppList.remove(app)
Decision: sortedAppList.isEmpty()
for app in sortedAppList: app.currState ← app.getBestState()
Decision: appList.requiredCoreCount() > N_C
coordinate()
Decision: coordinationSuccess
Decision: currStateSet ≠ prevStateSet
victimApp ← sortedAppList.getRandomApp(); victimApp.removeState(victimApp.currState)
for app in sortedAppList: app.currState ← app.prevState
phase ← idle
End exploreSystemStateSpace()

Figure 6: Execution flow of the state space exploration phase

3.3.5 State Space Exploration Phase

The fourth phase of HyPart is the state space exploration phase. HyPart employs a variant of the Tabu search algorithm [52] to dynamically explore the system state space and find a system state that achieves high efficiency in terms of the user-defined metric.

We first present a high-level description of the state space exploration phase of HyPart. At each adaptation period (set to 1 second in this work), HyPart explores a new system state. If the efficiency of the currently explored system state is higher than that of the previous state, HyPart continues to explore a new system state. Otherwise, HyPart rolls back to the previous system state and generates a new system state to explore. At each adaptation period, HyPart also removes some of the inefficient states that have been explored to reduce the search space. HyPart repeats this search process until all the system states have been explored or the rollback count exceeds the predefined threshold (set to 10 in this work).

Figure 6 shows the execution flow of this phase, which primarily consists of six sub-phases – (1) the initialization, (2) the efficiency evaluation, (3) the system state generation, (4) the coordination, (5) the duplicated system state elimination, and (6) the exit sub-phases.⁴

⁴ Each sub-phase is marked with its own number and enclosed in the blue, orange, grey, yellow, light blue, or green dotted box, respectively.

Initialization sub-phase: During the initialization sub-phase, the key data structures that are required for the entire state space exploration phase are initialized. Specifically, all the applications are sorted with respect to their tolerance to the performance anomalies of thread packing, which is primarily required in the coordination sub-phase. In addition, the candidate states of each application are sorted in the order of their computation capacity (i.e., the product of the thread packing and clock modulation parameters). Because the candidate states provide similar memory bandwidth, HyPart considers the states with larger computation capacity with higher priority. Note that the initialization sub-phase is executed only once for a given set of the applications.
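A minimal sketch of the two sorting steps follows. The `App` class and its field names are hypothetical; in particular, `tp_effect` is assumed to be a numeric measure of the thread-packing effect (such as the profiled idle-cycle ratio), sorted in increasing order as in Figure 6.

```python
from dataclasses import dataclass, field


@dataclass
class App:
    name: str
    tp_effect: float  # assumed: lower value means more tolerant of thread packing
    candidate_states: list = field(default_factory=list)  # (cores, clock_mod, mba)


def computation_capacity(state):
    """Product of the thread packing (core count) and clock modulation parameters."""
    cores, clock_mod, _mba = state
    return cores * clock_mod


def initialize(apps):
    """Sort applications by tp_effect and each app's states by computation capacity."""
    sorted_apps = sorted(apps, key=lambda a: a.tp_effect)
    for app in sorted_apps:
        # Larger computation capacity first, so it is explored with higher priority.
        app.candidate_states.sort(key=computation_capacity, reverse=True)
    return sorted_apps
```

With this ordering, the first entry of each candidate list is the state that would be tried first during exploration.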

Efficiency evaluation sub-phase: During the efficiency evaluation sub-phase, HyPart compares the efficiency of the previous and current states and acts on the result. Specifically, HyPart performs the efficiency evaluation at both the application and system levels.

At the application level, HyPart uses the efficiency data of each application under the previous and current states to identify which of the two states exhibits lower efficiency, and removes that state from the candidate state list of the application. This reduces the search space and ensures faster convergence to an efficient system state.

At the system level, HyPart evaluates the overall efficiency across the applications with the previous and current system states. If the efficiency of the current system state is lower than that of the previous system state, HyPart rolls back to the previous system state. If the rollback count exceeds the predefined threshold, it transitions to the exit sub-phase.
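The two evaluation levels can be sketched as below. This is an illustration under assumptions: applications are plain dictionaries with hypothetical keys (`curr`, `prev`, `curr_eff`, `prev_eff`, `candidates`), and `delta_e` stands in for the δ_e margin that appears in Figure 6.

```python
def evaluate(apps, curr_eff, prev_eff, rollback_count, delta_e=0.0):
    """Prune the worse state per application, then roll back or commit system-wide."""
    # Application level: drop whichever of (prev, curr) performed worse.
    for app in apps:
        if app["prev"] is None or app["curr"] == app["prev"]:
            continue  # nothing to compare yet
        worse = app["curr"] if app["curr_eff"] < app["prev_eff"] else app["prev"]
        if worse in app["candidates"]:
            app["candidates"].remove(worse)

    # System level: roll back when the gain falls short of delta_e.
    if curr_eff - prev_eff < delta_e:
        for app in apps:
            app["curr"] = app["prev"]
        rollback_count += 1
    else:
        for app in apps:
            app["prev"] = app["curr"]
    return rollback_count
```

The caller would compare the returned rollback count against the rollback threshold (10 in this work) to decide whether to move to the exit sub-phase.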

System state generation sub-phase: If the rollback count is below the predefined threshold or the efficiency of the current system state is higher than the previous system state, HyPart transitions to the system state generation sub-phase. In this sub-phase, HyPart generates the next system state to explore by combining the best state of each application. HyPart then determines the validity of the newly generated system state by checking if the total core count requested by the newly generated system state is equal to or less than the total core count in the system. If so, HyPart transitions to the duplicated system state elimination sub-phase.

Coordination sub-phase: Otherwise, HyPart transitions to the coordination sub-phase to resolve the core oversubscription issue (i.e., the total core count requested by the applications exceeds the total core count in the system). Figure 7 shows the execution flow of the coordinate function, which implements the logic for the coordination between the applications. Specifically, HyPart iterates over the application list, which is sorted in the order of tolerance to the performance anomalies of thread packing. If an application is tolerant of these anomalies, it is expected to achieve high performance even if the allocated core count is not a divisor of the thread count.

For each application in the sorted list, HyPart finds the feasible state of the application that requires the minimum core count and attempts to resolve the core oversubscription issue by reclaiming the corresponding number of cores from the application. If HyPart fails to resolve the core oversubscription issue, it transitions to the exit sub-phase.
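The coordination logic can be sketched as follows. This is one possible reading of Figure 7, not a definitive implementation: applications are dictionaries with hypothetical keys `curr` and `candidates`, states are `(cores, clock_mod, mba)` tuples, every candidate list is assumed to be non-empty, and the "feasible best state" is assumed to be the candidate with the largest computation capacity that still fits.

```python
def required_cores(apps):
    """Total core count requested by the current state of every application."""
    return sum(app["curr"][0] for app in apps)


def coordinate(sorted_apps, total_cores):
    """Reclaim cores from victims in sorted order; return True if the request fits."""
    for victim in sorted_apps:
        # Shrink the victim to its candidate state with the minimum core count.
        victim["curr"] = min(victim["candidates"], key=lambda s: s[0])
        if required_cores(sorted_apps) <= total_cores:
            # Cores left for the victim once all other applications are served.
            others = required_cores(sorted_apps) - victim["curr"][0]
            budget = total_cores - others
            feasible = [s for s in victim["candidates"] if s[0] <= budget]
            # The minimum-core state always fits, so feasible is non-empty here.
            victim["curr"] = max(feasible, key=lambda s: s[0] * s[1])
            return True
    return False
```

Iterating from the most tolerant application onward means cores are reclaimed first from the applications expected to suffer least from an awkward core count.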

[Figure 7 flowchart. Node labels recovered from the figure are listed below in their order of appearance; the True/False edges between them are not recoverable from the extracted text.]

Start coordinate()
victimApp ← sortedAppList.getNext()
Decision: victimApp ≠ NULL
victimApp.currState ← victimApp.getMinimumCoreState()
coordinationSuccess ← False
Decision: appList.requiredCoreCount() ≤ N_C
victimApp.currState ← victimApp.getFeasibleBestState()
coordinationSuccess ← True
End coordinate()

Figure 7: Execution flow of the coordinate function

Duplicated system state elimination sub-phase: If the coordination has been successful, HyPart transitions to the duplicated system state elimination sub-phase. In this sub-phase, HyPart checks if the newly generated system state is the same as the previously explored system state. If so, there is a possibility that HyPart would be indefinitely stuck at the same system state. To prevent this scenario, HyPart randomly selects a state among the states in the newly generated system state and removes it from the candidate state list of the corresponding application. This essentially eliminates the newly generated system state from the candidate system states. HyPart then transitions back to the system state generation sub-phase to generate a new system state to explore.
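A minimal sketch of the duplicate check is shown below. The dictionary keys (`curr`, `candidates`), the tuple representation of a system state, and the injectable `rng` parameter are assumptions made for illustration.

```python
import random


def eliminate_duplicate(apps, prev_state_set, rng=random):
    """If the new system state repeats the previous one, drop one of its states.

    Returns True when a state was removed, so the caller should generate a new
    system state, and False when the new system state is not a duplicate.
    """
    curr_state_set = tuple(app["curr"] for app in apps)
    if curr_state_set != prev_state_set:
        return False
    # Pick a random application and forbid its current state from now on.
    victim = rng.choice(apps)
    if victim["curr"] in victim["candidates"]:
        victim["candidates"].remove(victim["curr"])
    return True
```

Removing a single per-application state is enough to make the duplicated system state unreachable, since that system state can no longer be assembled from the remaining candidates.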

Exit sub-phase: Finally, if HyPart transitions to the exit sub-phase, it terminates the system state exploration phase because there is no more system state to explore or the rollback count has exceeded the threshold. It then transitions to the idle phase.

3.3.6 Idle Phase

During the idle phase, HyPart keeps monitoring the target system and applications without performing any adaptation activities. If a change (e.g., the termination of an application or a change in the memory bandwidth budgets) is detected, HyPart terminates the idle phase and re-triggers the aforementioned adaptation process.
