I have followed the norms and guidelines given in the institute's ethical code of conduct. As far as I know, this work has not been submitted elsewhere for the award of any degree.
Challenges
Real-time systems are implemented on platforms with a limited number of processing elements, limited memory, network bandwidth, etc. Modern real-time systems are increasingly based on heterogeneous multi-core platforms, which help them efficiently meet the diverse and high computing demands of applications.
Motivation for this dissertation
Proposed Framework
Contributions
- EAFBFS: An Energy Aware Frame Based Fair Scheduler
- DPFair Scheduling with Slowdown and Suspension
- A Cluster-Oriented Scheduling Technique for Heterogeneous Multi-cores
- A Low Overhead Scheduler for Real-Time Periodic Tasks on Heterogeneous Multi-cores
- An Energy-Aware Scheduler for Heterogeneous Multi-core Real-time Systems
- A Temperature-Aware Real-Time Semi-partitioned Scheduler
Experimental results show that our proposed DPFair-SS scheduling technique exhibits significant energy savings compared to the state-of-the-art [57] in situations where the system has low workloads over a fairly long period. Experimental studies show that our proposed scheduling mechanism is able to schedule a significantly larger number of task sets compared to the state-of-the-art [108].
Organization of the Thesis
The Application Layer
- A Real-time Task Model
Execution time (ei) is the time the processor takes to complete the computation of a task without interruption. Slack time or laxity is the maximum time a task can be delayed after it has been triggered and still complete within its deadline: di − ei.
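As a concrete illustration, the temporal parameters above can be collected into a small data structure. This is our own sketch (field and method names are not from the thesis):

```python
from dataclasses import dataclass

@dataclass
class Task:
    """A periodic real-time task and its temporal parameters (illustrative)."""
    e: float  # execution requirement e_i
    d: float  # relative deadline d_i
    p: float  # period p_i

    def laxity(self) -> float:
        # Slack time (laxity): maximum delay after release that still
        # allows the task to finish within its deadline: d_i - e_i.
        return self.d - self.e

    def weight(self) -> float:
        # Task weight (utilization): wt_i = e_i / p_i.
        return self.e / self.p

t = Task(e=2.0, d=8.0, p=10.0)
print(t.laxity(), t.weight())  # 6.0 0.2
```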
A Real-time Scheduler
Static and dynamic task system: In a static task system, the set of tasks to be executed on the platform is fully defined before execution begins. A task set is said to be schedulable if there exists at least one algorithm that can produce a feasible schedule for it.
Processing Platform
In a work-conserving scheduling algorithm, a processor is never kept idle while a task is waiting to execute. On a uniform multiprocessor platform, all processors can execute all tasks, but the speed at which tasks execute, and hence their worst-case execution times, varies with the processor on which they run.
A Classification of Real-time Scheduling Approaches
Thus, some tasks cannot be executed on some processors in the platform, while their execution speeds (and their worst-case execution times) may differ on the other processors. Dynamic priority scheduling: The distinction between static priority and dynamic priority scheduling is based on the priority management policy adopted by a priority-driven scheduler.
A brief survey of scheduling algorithms
- Partitioning Strategies
- Traditional Real-time Scheduling Strategies
- Rate-based Resource Allocation Strategies
- Server-based Allocation
- Liu and Layland Style Allocation
- Fluid-flow Allocation (Proportional Share Scheduling)
- Temperature-Aware Scheduling strategies
In [17], the authors addressed the problem of task-to-core allocation on heterogeneous multi-core platforms so that the overall energy consumption of the system is minimized. However, with more and more energy-constrained embedded devices running interactive and QoS-sensitive applications, such as streaming media, games, etc., the need for energy-aware proportional fair scheduling algorithms is rapidly becoming more important.
Summary
The scheduling strategy should not only allocate and schedule tasks on the available cores, but do so in a way that minimizes energy consumption, which makes the problem challenging. In the following sections, we present the working principles of the EAFBFS and DPFair-SS algorithms, which offer solutions to such problems.
Energy Aware Frame Based Fair Scheduling (EAFBFS)
Specifications
- System Model
- Power Model
Symbol — Description
- n: Number of tasks.
- si: Start time of task Ti.
- ei: Execution requirement of Ti.
- pi: Period (duration) of Ti.
- rei: Remaining execution requirement of Ti.
- rpi: Remaining period of Ti.
- wti: Weight of Ti.
- shri: Share of Ti.
- U: Utilization factor of the system.
- frcritical: Normalized critical frequency.
- fmax: Maximum normalized operating frequency.
- frTi: Minimum normalized frequency sufficient to complete Ti on a single core within a time slice.
- TD: Set of tasks for which frTi > fr1.
- m: Number of cores in the system.
- V: Set of cores.
- fr1: Optimal operating frequency selected in a time slice.
- frg: Global operating frequency selected in a time slice.
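Using this notation, the per-slice share of a task under a DPFair-style scheduler is its weight multiplied by the slice length. A minimal sketch (our own illustration, not the thesis pseudocode):

```python
def slice_shares(weights, slice_len):
    """Per-slice execution shares under a DPFair-style scheduler:
    shr_i = wt_i * |TS_k|, i.e. each task receives processor time in
    proportion to its weight within every time slice."""
    return [wt * slice_len for wt in weights]

# Deadline partitioning: slice boundaries are placed at the distinct
# period/deadline boundaries of all tasks, so no deadline falls inside
# a slice and each task's weight is constant across the slice.
wts = [0.5, 0.3, 0.2]            # wt_i = e_i / p_i
print(slice_shares(wts, 4.0))    # [2.0, 1.2, 0.8]
```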
EAFBFS Scheduling Strategy
- Algorithm EAFBFS
- Frequency Allocation and Mapping (FAM)
- Scheduling within an Individual Core
Example 3: Continuing with the previous example, let us discuss the handling of the migrating and fixed tasks in the system. The total over-allocation of the fixed tasks on Vk during TEk − NTSk is then given.
Analysis of the Algorithm
Calculating the system-level minimum operating frequency fr1 (using Equation 3.8) takes a constant amount of time. Therefore, the time complexity of the EAFBFS scheduler is the same as that of the original ERFair scheduler.
Experiment and Results
- Experimental Set Up
- Performance Evaluation of EAFBFS Algorithm
- Performance Comparison with EA-DPFair Algorithm
In Figure 3.4a, we observe that as the utilization factor of the system increases from 50% to 70%, the difference in energy consumption between type 1 and type 2 systems increases. We further observe that the energy consumption in Type 1 systems is higher when the number of tasks is low (n = 32) than when it is high (n = 96).
DPFair Scheduling with Slowdown and Suspension (DPFair-SS)
- Power Specifications
- DPFair-SS Scheduling Strategy
- The Slowdown-Suspend-Schedule Function (SSS)
- Analysis of the algorithm
- An Illustrative Example
- Experimental Set Up and Results
The pseudocode of the Slowdown-Suspend-Schedule (SSS) function, shown in Algorithm 6, describes the overall energy-aware scheduling strategy within a time slice of DPFair-SS. DPFair-SS: Similar to DVFS-based DP-Wrap, T1 is allocated to V0, and the global frequency frg for the remaining cores becomes 0.26 (using Equation 3.27).
Summary
As the number of processors and tasks in the system grows, scheduling sets of tasks using such models becomes very expensive. Some important terminologies used in later sections of the chapter are listed in Table 4.1.
Motivational Example
Cluster-Oriented Scheduling Technique (COST)
- COST: A Cluster-Oriented Scheduling Technique
- An Illustrative Example
- Analysis of the Algorithm
- Experimental Set Up and Results
- Experimental Set Up
- Experimental Results
We can observe from Figure 4.3c that an increase in the number of tasks also leads to an increased acceptance ratio for both algorithms. Since the utilization factor and the number of cores remain constant, an increase in the number of tasks leads to reduced individual task weights.
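This effect can be reproduced with a toy first-fit packing experiment: with total utilization and core count fixed, more tasks means smaller individual weights, which pack onto the cores more easily. This is illustrative only; the algorithms compared in the chapter are more involved:

```python
import random

def first_fit(utils, m):
    """Try to place task utilizations onto m unit-capacity cores (first-fit)."""
    cores = [0.0] * m
    for u in utils:
        for j in range(m):
            if cores[j] + u <= 1.0 + 1e-9:
                cores[j] += u
                break
        else:
            return False  # task fits on no core: the set is rejected
    return True

def random_utils(n, total):
    """n random utilizations scaled so that they sum to `total`."""
    xs = [random.random() for _ in range(n)]
    s = sum(xs)
    return [x * total / s for x in xs]

# Fixed system utilization (3.6 on 4 cores); only the task count varies.
random.seed(1)
for n in (8, 64):
    accepted = sum(first_fit(random_utils(n, 3.6), 4) for _ in range(200))
    print(n, accepted)
```

With n = 64 nearly every trial is accepted, while with n = 8 some sets contain a task too heavy for any single core.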
HETERO-SCHED: A Low-overhead Heterogeneous Multi-core Scheduler
- HETERO-SCHED Algorithm
- COMPUTE-ALLOCATION
- ASSIGN-NON-MIGRATE
- ASSIGN-MIGRATE
- COMPUTE-SCHEDULE
- Analysis of the Algorithm
- An Illustrative Example
- Experimental Set Up and Results
- Experimental Set Up
- Experimental Results
The time complexity of Algorithm 14 (COMPUTE-SCHEDULE) is O(mn lg n), because the construction of the schedule matrix requires O(n lg n) iterations on each of the m processor cores. Therefore, the acceptance ratio decreases with an increasing number of processor cores for SA-M, while it increases for the HETERO-SCHED algorithm.
Summary
System Model
The considered system consists of a set of n periodic tasks T = {T1, T2, ..., Tn} to be scheduled on a set of m heterogeneous cores V = {V1, V2, ..., Vm} that can run at a discrete set of normalized frequencies F = {f1, f2, ..., fmax}, where fmax represents the normalized frequency 1 and all other frequencies lie between f1/fmax and 1.
Power Model
Critical frequency: It can be observed from Equation (5.2) that although the dynamic power consumption of a core (Pd) has a cubic relationship with the operating frequency, lowering f increases task execution times; when execution times become sufficiently long at small values of f, this can lead to higher overall energy consumption. Thus there exists a critical frequency (fcr) below which further frequency reduction actually increases the net energy consumption (the sum of dynamic and static energy).
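Under a cubic dynamic-power model of this kind, the critical frequency can be derived by minimizing energy per cycle. The constants below are assumptions for illustration, not the thesis parameters:

```python
# Assumed power model: P(f) = P_s + c * f^3 (static + cubic dynamic term).
# Energy per cycle E(f) = P(f) / f = P_s / f + c * f^2 is minimized at
# f_cr = (P_s / (2 * c)) ** (1 / 3); below f_cr, the static term P_s / f
# dominates and slowing down further increases net energy.
P_s, c = 0.1, 1.0  # assumed static power and dynamic-power constant

f_cr = (P_s / (2 * c)) ** (1 / 3)

def energy_per_cycle(f):
    return P_s / f + c * f * f

# Numerical sanity check: a frequency sweep should find its minimum at f_cr.
fs = [0.05 + 0.001 * i for i in range(950)]
f_best = min(fs, key=energy_per_cycle)
print(round(f_cr, 3), round(f_best, 3))  # 0.368 0.368
```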
Motivational Example
As a result of this effect, beyond a certain amount of frequency reduction, the waste of static power exceeds the gain obtained by dynamic voltage/frequency scaling.
HEALERS Algorithm
COMPUTE-SCHEDULE
- SCHEDULE-NON-MIGRATE
- SCHEDULE-MIGRATE
First, it creates a list L3 from L2 to keep track of the unallocated portion of Ti (lines 2 to 4). If ucj is non-zero, SCHEDULE-MIGRATE calculates the unallocated share of Ti with respect to Vj, i.e. usi,j (lines 9 to 13).
COMPUTE-EA-SCHEDULE
Let Tj denote the set of jobs scheduled on Vj in TSk. Compute the unused capacity of Vj: ucj. Scale the share of each non-migrating task Ti on Vj: shi,j,k = shi,j,k / fcr. Set fopt ← fcr.
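The share-inflation step above can be sketched as follows. This is a simplified illustration under our own assumptions (a single core, shares expressed as fractions of the slice), not the thesis code:

```python
def scale_shares(shares, capacity, f_cr):
    """Inflate per-slice shares for running at the critical frequency:
    sh_i <- sh_i / f_cr.  Accept the slowdown only if the inflated
    shares still fit within the core's capacity for the slice."""
    scaled = [sh / f_cr for sh in shares]
    if sum(scaled) <= capacity + 1e-9:
        return scaled, f_cr   # enough slack: run the slice at f_cr
    return shares, 1.0        # no slack: stay at the maximum frequency

print(scale_shares([0.2, 0.3], capacity=1.0, f_cr=0.5))  # ([0.4, 0.6], 0.5)
print(scale_shares([0.6, 0.5], capacity=1.0, f_cr=0.5))  # ([0.6, 0.5], 1.0)
```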
Analysis of the Algorithm
If there is enough time to complete the execution of the new task between the end of the current time slice and its deadline, the scheduling of the new task is postponed until the start of the next time slice. Otherwise, the system suspends execution at the current time, recomputes the time slices taking the period of Ti into account, and then prepares a new schedule for the tasks.
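The admission decision described here can be condensed into a small predicate. This is our own simplification; the actual test in the thesis involves the slice recomputation details:

```python
def admit_new_task(slice_end, new_exec, new_deadline):
    """Admission of a task arriving mid-slice: postpone it to the next
    slice boundary if it can still meet its deadline from there;
    otherwise suspend and recompute the time slices immediately."""
    if slice_end + new_exec <= new_deadline:
        return "postpone"      # start it from the next time slice
    return "repartition"       # recompute slices including the new period

print(admit_new_task(slice_end=5.0, new_exec=2.0, new_deadline=8.0))  # postpone
print(admit_new_task(slice_end=5.0, new_exec=4.0, new_deadline=8.0))  # repartition
```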
An Illustrative Example
The Gantt chart representation of the schedule matrix SM1[7×4] (including migrating tasks) for the time slice TS1 is shown in Figure 5.1b. Energy-Aware Scheduling: As we can observe from the schedule matrix SM1 (in Figure 5.1a), V1 and V2 do not have any spare capacity in the current time slice.
Experimental Set Up and Results
Experimental Set Up
To create task sets with a specific UF, the randomly generated utilization values are scaled accordingly. The following three metrics were used to compare the performance of our proposed algorithm with MaxMin-M.
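The UF-scaling step can be sketched as a simple normalization (a generic illustration; the thesis may use a different generator):

```python
import random

def scaled_utilizations(n, target_uf, m, seed=42):
    """Generate n random task utilizations and scale them so that the
    system utilization factor equals target_uf on m cores."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    s = sum(xs)
    return [x * target_uf * m / s for x in xs]

us = scaled_utilizations(n=8, target_uf=0.7, m=4)
print(round(sum(us), 6))  # 2.8  (= 0.7 * 4)
```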
Experimental Results
- Benchmark Program Results
- Synthetic Task Set Results
We can see from Figure 5.3a that there is a significant difference in the ARat values for HEALERS, HEART and MaxMin-M at U F = 0.9. We calculated the total number of context switches in the system over the entire duration of the simulation.
Summary
System Model
Each instance of Ti has an execution requirement ei that must be completed within its period pi. At any given instant, rei and rpi indicate the remaining execution requirement and the remaining period of the current instance of Ti.
Thermal Model
Γl denotes the system-level temperature limit, and shri_rem the remaining share of Ti in TSk (Eq. 6.11). The steady-state temperature of a task represents the core temperature reached when the task runs continuously on a core.
Motivational Example
The TARTS Algorithm
Function TARTS()
At the beginning of the kth time slice TSk, TARTS determines the length |TSk| of the upcoming time slice. Then the task list TL is sorted in non-increasing order of task share values.
Function Task Schedule()
Function Find Mapping()
However, M[j] can be delayed only if the remaining share shrrem of M[j] is less than the remaining time in the current time slice. Reserve slot allocation: If no task in the ready queue meets the above condition, core Vj is left idle (thus allowing it to cool down) for the current time slot.
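The reserve-slot decision can be sketched as follows. The field names and the exact condition are our own illustration of the idea, not the TARTS pseudocode:

```python
def pick_task_or_cool(ready, remaining_time, temp_limit):
    """Pick a ready task whose steady-state temperature respects the
    system-level limit and whose remaining share fits the remaining
    time; otherwise reserve the slot so the core idles and cools."""
    for task in ready:
        if task["steady_temp"] <= temp_limit and task["rem_share"] <= remaining_time:
            return task["id"]
    return None  # reserve slot: the core stays idle this slot

ready = [{"id": "T1", "steady_temp": 82.0, "rem_share": 1.0},
         {"id": "T2", "steady_temp": 68.0, "rem_share": 0.5}]
print(pick_task_or_cool(ready, remaining_time=0.8, temp_limit=75.0))  # T2
```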
An Illustrative Example
With dynamic selection of frame sizes, TARTS is able to achieve performance almost equivalent to TARTS with a fixed frame size (g = 1), as shown in the experimental results.
Analysis of the Algorithm
In the TARTS function, the initial calculation of the time slice size (using Equation 6.6) and of the shares of all tasks (using Equation 6.7) for the time slice takes O(n) time. Therefore, the time complexity of the TARTS() function can be considered as O(m) per time slice.
Experimental Set Up and Results
Experimental Set Up
For a given utilization factor (U), task weights wti are generated from a normal distribution with mean µwt = 0.3 and standard deviation σwt = 0.2. To create task sets with a specific ATemp, the randomly generated steady-state temperature values are appropriately scaled.
Experimental Results
We can observe from Figure 6.3 that the ARat values decrease for both algorithms as the system utilization factor increases. We can observe from Figure 6.7 that increasing the ATemp values of the task sets results in decreasing the ARat values for both algorithms.
Summary
Over the years, the industry has seen a significant shift in the nature of processing platforms used in real-time embedded systems. This thesis therefore delved into the design of various energy- and temperature-aware scheduling strategies for such real-time multicore platforms.
Future Works
In addition, the inter-processor message transfer time can vary significantly depending on the type and structure of the interconnection network between processing elements. To meet the requirements of the aforementioned distributed systems, the research presented here needs to be adapted accordingly.
Pictorial representation of the scheduling framework
Temporal Characteristics of real-time task Ti
Motivational Example
For the first subset, the ratio of the summation of the task weights to the system utilization is considered. We can observe from Figure 4.5a that the acceptance ratio for both algorithms decreases with an increase in the utilization factor of the system.
Effect of varying utilization factor on power consumption (n =
Effect of varying number of cores on power consumption (n =
Effect of Skewness on Normalized Power Consumption
Result Comparison: EAFBFS vs EA-DPFair
Task Allocation for Example
Deadline Partitioning & Cluster Formation for Example
Task Schedule for Example
COST: Experimental Results
Example
Experimental Results
An example to illustrate our proposed algorithm HEALERS