In the third contribution of the thesis, we discuss the reliability-aware scheduling of tasks with hard deadlines in the cloud environment. Notation: Cjvm denotes the maximum number of VMs that a PM can host, and fj denotes the probability of failure of machine Mj.
Overview of Cloud Computing
- Characteristics
- Service Models
- Cloud System Architecture
- Examples of Commercial Cloud Platforms
- Scheduling in Cloud Environment
SaaS: This is the highest layer in the cloud service stack and is responsible for providing software services to consumers. Cloud computing has several features that benefit both the user and the cloud service provider.
Deployment Models of Cloud
Virtualization Technology
Resource Allocation in Cloud Environment
Using VM migration in an efficient way maximizes resource utilization, improves performance, and reduces the power consumption of the cloud system [32]. The power consumption of the system can be reduced by reducing the number of idle systems and increasing resource utilization.
Brief Literature Review
- Related Research on Task Scheduling in Cloud Computing
- Related Research on Virtual Machine Allocation
- Related Research on Interference Aware Scheduling
- Related Research on Constraint Aware Scheduling
- Related Research on Profit Maximization Based Scheduling
- Related Research on Reliability Aware Scheduling in Cloud Systems
In the same way, resource utilization is considered as the total utilized capacity of cloud resources. Profit is one of the most important factors in the cloud economy and all cloud service providers want to maximize their profit.
Motivation
[93] proposed a resource allocation algorithm that introduces an analytical model to analyze system reliability. However, the replication-based approach is widely used to ensure system reliability.
Objectives
We want to incorporate a resource prediction model into the scheduler design to predict the number of PMs required for a job package based on the resource usage pattern of the previous job package. The goal of the scheduler is to optimize resource utilization by deploying fewer host servers.
Contributions
- Interference Aware Scheduling
- Constraint Aware Scheduling
- Reliability Aware Scheduling
- Reliability Ensured Scheduling
In this contribution, we consider a set of real-time tasks to be scheduled in a virtualized cloud environment where all the VMs of the cloud are homogeneous. Here we schedule the real-time tasks with priorities or weights (high-priority safety-critical and low-priority mission-critical tasks) while considering machine failures.
Thesis organization
Static Allocation
Static allocation strategies spend more time determining a mapping since it is performed in an offline mode, using estimated values of the application parameters. Many researchers worked on static allocation strategies in the cloud environment, and some of the works are discussed here.
Dynamic Allocation
Adjusting these constraints determines the effectiveness of their proposed scheduling algorithm in the cloud environment. Their approach takes the network traffic load over the network connections into account to minimize the problems associated with overprovisioning.
Optimization Objectives of Resource Allocation
Resource Consolidation
The problem was transformed into a stochastic bin packing (SBP) problem, and an online packing algorithm was proposed in which the number of servers required remains close to the optimum. Their model is used to improve the consolidation of the VMs so as to maximize resource utilization while meeting the application performance requirements.
Energy Consumption
[142] proposed makespan-conservative energy reduction along with simple energy-aware scheduling to find a trade-off between makespan and energy consumption; both the makespan and the energy consumption of a precedence-constrained task graph are reduced on heterogeneous multiprocessor systems supporting DVFS. [196] proposed an approach to optimize the lifetime reliability of applications and their energy consumption while guaranteeing QoS constraints.
User Cost
From the service provider's perspective, the goal is to minimize operating costs, which maximizes the overall profit of the system. From the user's perspective, minimizing execution costs while meeting budget constraints is required to increase service demand.
Evolutionary Approaches for Resource Allocation
On-line Resource Allocation
The resource allocation problem considered here can be formulated as a vector bin packing problem (VBPP), and some algorithms are available to solve it. The VBPP is a variant of the bin packing problem proposed by Garey [105].
Summary
The most promising solution to address many user requirements in a cloud system is virtualization technology [45]. To improve the utilization of cloud resources, the service provider allows multiple VMs to be hosted on a single physical machine, which can result in performance degradation due to interference between concurrently running applications.
Related Research Work
Problem Formulation
Task Environment
Machine Environment
For example, if λ = 0.9, then the total resource used by the VMs running on a physical machine should not exceed 90% of the total available resource of that physical machine. In this work we take the same value of λ for all resource types to simplify the model.
Optimization Goal
The resource request of the VMs deployed to run concurrently on a host machine must not exceed the upper threshold of the resources available on that host. Here, λ represents this upper threshold, meaning that the total resource consumption cannot exceed this limit, with 0 < λ ≤ 1.
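As a concrete illustration, the following minimal Python sketch (the function and variable names are ours, not from the thesis) checks whether adding a VM's resource request to a host keeps every resource dimension within the λ threshold.

```python
def fits_under_threshold(host_capacity, host_usage, vm_request, lam=0.9):
    """Return True if the VM can be placed without pushing any resource
    dimension (e.g. CPU, memory) of the host above the lambda threshold."""
    for resource, capacity in host_capacity.items():
        used = host_usage.get(resource, 0.0) + vm_request.get(resource, 0.0)
        if used > lam * capacity:
            return False
    return True

# Example: a host with 32 cores and 64 GB of memory, lambda = 0.9
capacity = {"cpu": 32, "mem": 64}
usage = {"cpu": 24, "mem": 40}
print(fits_under_threshold(capacity, usage, {"cpu": 4, "mem": 8}))  # True
print(fits_under_threshold(capacity, usage, {"cpu": 8, "mem": 8}))  # False: 32 > 0.9 * 32
```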
Methodology
System Architecture
Interference Prediction in Virtualized Cloud Environment
We choose the linear regression model to predict the application runtime because building a linear regression model at runtime takes less time than building a nonlinear model. The coefficients of the prediction model also change as more tasks are included in the model-building process at runtime.
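To make the idea concrete, the sketch below fits an ordinary least-squares model from already-completed tasks and uses it to predict the runtime of a new task; the two features shown (input size and number of co-located VMs) are illustrative assumptions, not necessarily the predictors used in the thesis.

```python
import numpy as np

def fit_runtime_model(features, runtimes):
    """Fit runtime ~ w.x + b by ordinary least squares at runtime.
    `features` is an (n_tasks, n_features) array of observed predictors,
    `runtimes` the corresponding observed runtimes."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    coeffs, *_ = np.linalg.lstsq(X, runtimes, rcond=None)
    return coeffs

def predict_runtime(coeffs, feature_vector):
    return float(np.dot(coeffs[:-1], feature_vector) + coeffs[-1])

# Illustrative use: four finished tasks, features = [input size (GB), co-located VMs]
X = np.array([[1.0, 1], [2.0, 2], [3.0, 1], [4.0, 3]])
y = np.array([10.0, 22.0, 28.0, 46.0])
w = fit_runtime_model(X, y)
print(predict_runtime(w, [2.5, 2]))
```

Refitting such a model as new tasks complete is cheap in the linear case, which is why the coefficients can be updated online as noted above.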
Prediction of Resource Usage of Online Tasks
In all cases, the predicted value of resource consumption is greater than the observed value, which helps our interference-aware scheduling approach deploy slightly more physical machines than strictly required and thereby reduce the number of deadline misses. Since a VM locks an entire core from the moment it is initialized until it is destroyed, the number of active machines required for deployment can be calculated using Eq.
Interference Aware Scheduling of Tasks with Admission Control
For the other cases, we take the overlapping time of the different tasks into account to capture the effect of interference. For the entire expected execution of a task (from its start time s to its finish time f), the expected utilization of each resource on the machine must remain below that resource's threshold utilization.
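A minimal sketch of this feasibility test follows (the helper names are ours, and resource demand is assumed constant over a task's execution interval): a candidate task with window [s, f) is accepted on a machine only if, at every instant where it overlaps already-placed tasks, the summed demand stays below the threshold.

```python
def can_place(candidate, placed, capacity, lam=0.9):
    """candidate and placed tasks are dicts: {"start": s, "finish": f, "cpu": u}.
    Accept the candidate only if the total CPU demand stays below lam * capacity
    at every instant of its execution window [start, finish)."""
    # Utilization only changes at task start/finish times, so it is enough to
    # check the demand at each such event inside the candidate's window.
    events = {candidate["start"]}
    for t in placed:
        for point in (t["start"], t["finish"]):
            if candidate["start"] <= point < candidate["finish"]:
                events.add(point)
    for point in sorted(events):
        demand = candidate["cpu"] + sum(
            t["cpu"] for t in placed if t["start"] <= point < t["finish"]
        )
        if demand > lam * capacity:
            return False
    return True
```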
Simulation Setup
Using DES and FUSD, we predict the number of physical machines that will be active in a given time window. Based on the forecast, we activate the same number of physical machines that will be deployed in that specific time window (5 min.).
Performance of the Scheduler for Google Cluster Data
The number of lost tasks for IARPS is lower than for the other approaches because it schedules tasks considering the effect of interference from co-allocated tasks, the deadline, and the admission control mechanism. Owing to the dynamic nature of the cloud environment, even unlimited resource availability would not allow us to reach 100% TGR and PGR.
Performance of the Scheduler with Different Co-allocated VMs
But the overhead associated with virtualization is about 15%, which reduces the profit for the cloud service provider. Missing the deadline for the task increases the penalty, thus reducing the profit for the service provider.
Related Research Work
These schedulers ignore hard and soft constraints for task placement in heterogeneous data centers. None of the well-known schedulers discussed above considers both the placement constraints and the task deadline.
Problem Formulation
- Machine Environment
- Task Environment
- Revenue Model
- Cost Model
- Penalty Model
- Problem Statement
The income (revenuei) collected for executing task Ti is defined by the corresponding revenue equation. The penalty cap is the maximum penalty that the service provider repays for violating the SLA.
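Since the exact revenue and penalty equations are given in the thesis and not reproduced here, the sketch below only illustrates the general shape of such a model: per-task profit as revenue minus execution cost minus an SLA penalty that grows with tardiness but is capped. All names and the linear penalty form are illustrative assumptions.

```python
def task_profit(revenue, exec_cost, tardiness, penalty_rate, penalty_cap):
    """Profit for one task: revenue minus execution cost minus the SLA penalty.
    The penalty is assumed to grow linearly with tardiness (time past the
    deadline) but is never repaid beyond the penalty cap."""
    penalty = min(penalty_rate * max(tardiness, 0.0), penalty_cap)
    return revenue - exec_cost - penalty

# A task finishing 3 time units late, with the penalty capped at 2.0
print(task_profit(revenue=10.0, exec_cost=4.0, tardiness=3.0,
                  penalty_rate=1.0, penalty_cap=2.0))  # 4.0
```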
System Architecture
Solution for CAPM
- Ordering of Tasks
- Task Mapping without Allowing Deadline Miss
- Task Mapping with Allowing Deadline Miss
- Simulated Annealing for CAPM
Our objective here is to map all the tasks of the task set (TG) to the machine set (MG) so that the total profit is maximized. We rank the tasks in descending order of their expected payoff (Eq. 4.7 without the penalty term), which is the difference between a task's revenue and its execution cost.
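A minimal sketch of this ranking step follows (the dictionary keys are our own illustrative names): tasks are sorted in non-increasing order of expected payoff, i.e. revenue minus execution cost, before the mapping phase.

```python
def order_by_expected_payoff(tasks):
    """Rank tasks by descending expected payoff = revenue - execution cost,
    ignoring the penalty term, before mapping them to machines."""
    return sorted(tasks, key=lambda t: t["revenue"] - t["exec_cost"], reverse=True)

tasks = [
    {"id": "T1", "revenue": 8.0, "exec_cost": 5.0},
    {"id": "T2", "revenue": 6.0, "exec_cost": 1.0},
    {"id": "T3", "revenue": 9.0, "exec_cost": 7.0},
]
print([t["id"] for t in order_by_expected_payoff(tasks)])  # ['T2', 'T1', 'T3']
```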
Experimental Setup and Results
- Simulation Setup
- Performance of Different Task Ordering
- Performance of HOM-CAPM with or without Allowing Tasks to be Missed
- Performance of Different Scheduling Approaches
- Impact of Threshold
- Performance of SA-CAPM and HOM-CAPM
Figure: profit versus number of tasks for HOM-CAPM with and without allowing tasks to be missed. The profit obtained with HOM-CAPM when tasks are allowed to be missed differs by about 3% from the case in which no task is allowed to be missed.
Summary
There are many causes of failure in a cloud environment, which affect the reliability of the cloud service [80]. Here we want to schedule real-time tasks with priorities or weights (high-priority safety-critical tasks and low-priority mission-critical tasks) considering machine failure.
Related Research Work
In general, the reliability of a parallel application is represented as the product of the reliability values of all tasks [262]. [35] developed an approach to reliable resource allocation that tries to increase reliability while minimizing costs.
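For reference, the product form cited above can be written as follows; this is a sketch using the exponential failure model assumed elsewhere in this thesis, where task Ti with execution time ei runs on a machine with failure rate f_{j(i)}.

```latex
R_{\mathrm{app}} \;=\; \prod_{i=1}^{n} R(T_i) \;=\; \prod_{i=1}^{n} e^{-f_{j(i)}\, e_i}
```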
Problem Statement
Task Environment
Machine Environment
Reliability Model
Optimization Goal
Study of Task Scheduling Approaches on Reliable Machines
Scheduling of Tasks with Common Deadline on Machines with Equal Failure Rate
Machines with the same failure rate can be represented as Mj(γj, fj = f), where the failure rate of all the machines is f. This case of the problem can be reduced to the 0-1 knapsack problem, which is pseudo-polynomially solvable [160].
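To illustrate the reduction, the sketch below solves the underlying 0-1 knapsack instance with the standard pseudo-polynomial dynamic program. The mapping used here (task weight as knapsack value, execution time as item size, the common deadline as capacity) is our simplified assumption and not necessarily the exact construction used in the thesis.

```python
def knapsack_max_weight(exec_times, weights, deadline):
    """0-1 knapsack DP: pick a subset of tasks whose total execution time fits
    within the common deadline and whose total weight is maximum.
    Runs in O(n * deadline), i.e. pseudo-polynomial time."""
    best = [0.0] * (deadline + 1)
    for e, w in zip(exec_times, weights):
        for cap in range(deadline, e - 1, -1):  # backwards: each task used at most once
            best[cap] = max(best[cap], best[cap - e] + w)
    return best[deadline]

# Three tasks with execution times 2, 3, 4 and weights 3, 4, 5; common deadline 6
print(knapsack_max_weight([2, 3, 4], [3.0, 4.0, 5.0], 6))  # 8.0 (tasks with times 2 and 4)
```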
Scheduling of Equal Execution Time Tasks on Unreliable Machines
Either a task is assigned to the most reliable machine or the least reliable machine, depending on the weight of the task. This approach assigns the current task to the most reliable machine (the machine with the lowest failure rate) that is available at the time the scheduling decision for the task is made.
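A minimal sketch of this rule follows (the helper and threshold names are ours): tasks whose weight is at or above a threshold go to the most reliable available machine, and the remaining tasks go to the least reliable one, with reliability ranked by failure rate.

```python
def pick_machine(task_weight, available_machines, weight_threshold):
    """available_machines: list of (machine_id, failure_rate) pairs.
    High-weight (safety-critical) tasks get the machine with the lowest
    failure rate; low-weight (mission-critical) tasks get the highest."""
    if task_weight >= weight_threshold:
        return min(available_machines, key=lambda m: m[1])  # most reliable
    return max(available_machines, key=lambda m: m[1])      # least reliable

machines = [("M1", 0.001), ("M2", 0.01), ("M3", 0.05)]
print(pick_machine(0.9, machines, weight_threshold=0.5))  # ('M1', 0.001)
print(pick_machine(0.2, machines, weight_threshold=0.5))  # ('M3', 0.05)
```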
Task Scheduling on Unreliable Machines with Repetition and Replication
- Parameter Setup
- Result of Different Task Ordering and Task Mapping
- Result for Different Task Mapping for Different Execution Time
- Result for Different Weight Distributions
- Result for Real World Traces
- Result for Task Repetition and Replication
Based on the values of α0 and β0, this approach allows up to one repetition or one replication of a task to improve performance. Here we report the performance of the different heuristics based on the weights assigned to the tasks.
Summary
VM replication [242] is used to deploy redundant copies of VMs to meet application reliability requirements. We propose a heuristic for efficient replication on machines with the same failure rate (called REFR), which uses the derived minimum number of replicas to ensure the reliability requirement of delay-sensitive independent jobs in the target environment.
Related Research Work
We also calculate and use the subreliability requirement of the jobs, together with other parameters, to deploy the minimum number of machines while efficiently scheduling jobs to meet their deadline and reliability requirements. We define and use two performance comparison metrics, namely average VMs per task (AVT) and reliability guarantee ratio (RGR), to compare the proposed approaches with other state-of-the-art approaches.
Problem Formulation
- Application Environment
- Machine Environment
- Reliability Model
- Optimization Goal
All tasks of a job can be executed in parallel on independent VMs (i.e. job Ji consists of a number of tasks) without any communication between them. Suppose a task Til with execution time ei is scheduled on a machine Mj with failure probability fj; then the reliability of task Til can be defined as R(Til) = e^{−fj·ei}.
State-of-the-art Approaches
K-redundancy Method (KR)
The term ki represents the number of additional VMs deployed for job Ji to meet its reliability requirement. By combining the two constraint equations (Eq. 6.7 and Eq. 6.9), the minimum number of hosting servers mKR required by the k-redundancy method can be derived.
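Because Eq. 6.7 and Eq. 6.9 are not reproduced here, the sketch below only shows the general shape such a bound can take, under our own simplifying assumption that every job Ji deploys vi + ki VMs and that each PM hosts at most Cvm VMs, so the server count is the total VM count divided by the per-host capacity, rounded up.

```python
import math

def min_hosts_k_redundancy(jobs, c_vm):
    """jobs: list of (v_i, k_i) pairs, where v_i is the number of tasks of job J_i
    and k_i the extra redundant VMs deployed for it (assumed model, see text).
    Assumed bound: total VMs divided by the per-PM VM capacity, rounded up."""
    total_vms = sum(v + k for v, k in jobs)
    return math.ceil(total_vms / c_vm)

print(min_hosts_k_redundancy([(3, 1), (2, 2), (4, 1)], c_vm=4))  # 13 VMs -> 4 PMs
```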
First-Fit Decreasing (FFD)
As the value of k increases, the reliability requirement is met, but the number of PMs required to host the VMs grows. For example, mFFD = 4 (machines M1, M2, M3, M4) is the minimum number of PMs required for a feasible job assignment.
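For clarity, here is a minimal first-fit decreasing sketch for a single resource dimension (the names are ours): VMs are sorted by decreasing demand, each VM is placed on the first already-open PM where it fits, and a new PM is opened otherwise.

```python
def first_fit_decreasing(vm_demands, pm_capacity):
    """Place all VMs with FFD and return the resulting per-PM loads."""
    pms = []                                    # current load of each open PM
    for demand in sorted(vm_demands, reverse=True):
        for i, load in enumerate(pms):
            if load + demand <= pm_capacity:    # first PM with enough room
                pms[i] = load + demand
                break
        else:
            pms.append(demand)                  # no open PM fits: open a new one
    return pms

loads = first_fit_decreasing([4, 3, 3, 2, 2, 2], pm_capacity=6)
print(len(loads), loads)  # 3 PMs with loads [6, 6, 4]
```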
Scheduling on Machines with Equal Failure Rate
Since the number of PMs (m) must be an integer, the lower bound on m is rounded up to the nearest integer.
R(Til) = 1 − (1 − e^{−f·ei})^k   (6.15)
At least vi copies must be scheduled for each task, and more copies of the tasks may be needed to meet the reliability requirement.
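Inverting Eq. 6.15 gives the smallest number of copies needed for one task; the sketch below does this computation (the helper name is ours), never returning fewer than the copies that must be scheduled anyway.

```python
import math

def min_replicas(f, exec_time, r_req, v_min=1):
    """Smallest k with 1 - (1 - exp(-f * exec_time))**k >= r_req (Eq. 6.15),
    but never fewer than the v_min copies that must be scheduled in any case."""
    p_fail = 1.0 - math.exp(-f * exec_time)   # failure probability of a single copy
    k = math.ceil(math.log(1.0 - r_req) / math.log(p_fail))
    return max(k, v_min)

# One copy succeeds with probability exp(-0.1 * 5) ~ 0.61; requiring 0.99 needs 5 copies
print(min_replicas(f=0.1, exec_time=5.0, r_req=0.99))  # 5
```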
Scheduling on Machines with Different Failure Rates
Task Prioritization
Longer-runtime (HDJ) jobs need to be scheduled on more reliable machines, regardless of their reliability requirement and the number of tasks per job. Medium-runtime (MDJ) jobs with high reliability requirements and a larger number of tasks per job also need to be scheduled on more reliable machines.
Host Machine Selection
Any job with a larger number of tasks should therefore be given priority for scheduling on highly reliable machines, as the reliability requirement of each of its tasks is high. For task Tivi, the subreliability requirement is calculated based on the actual allocation of its preceding tasks [261].
Overall Approach
Ensuring that all tasks satisfy their reliability requirement and finish their execution before the deadline is challenging. From the set of active machines on which the task can meet its deadline, the allocation policy selects the machine with the highest failure rate that still satisfies the task's reliability requirement.
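A minimal sketch of this allocation rule follows (the field names and the exponential reliability check are our own assumptions): among the active machines on which the task can still meet its deadline, pick the one with the highest failure rate that satisfies the task's subreliability requirement, keeping the most reliable machines free for tasks that need them more.

```python
import math

def select_host(task, active_machines):
    """task: {"exec_time": e, "deadline": d, "sub_rel": r}.
    active_machines: list of {"id": ..., "failure_rate": f, "free_at": t}.
    Return the feasible machine with the highest failure rate, or None."""
    feasible = []
    for m in active_machines:
        meets_deadline = m["free_at"] + task["exec_time"] <= task["deadline"]
        reliable_enough = math.exp(-m["failure_rate"] * task["exec_time"]) >= task["sub_rel"]
        if meets_deadline and reliable_enough:
            feasible.append(m)
    return max(feasible, key=lambda m: m["failure_rate"]) if feasible else None
```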
Experimental Setup and Results
Simulation Environment, Parameter Setup and Evaluation Criteria
As capacity increases, the number of PMs required to allocate the jobs decreases. As the value of k increases for KR and FFD, the number of jobs missing their reliability requirement decreases.
System architecture
Machine and task grouping
Example of problem formulation for simulated annealing
Arrival of different types of tasks in Google traces
Effect of task ordering on the profit
Profit considering HOM-CAPM approaches with or without allowing tasks to be missed
Performance of different scheduling approaches
Impact of ρ on system performance with different loads
Performance of HOM-CAPM against SA-CAPM
Performance of HOM-CAPM against SA-CAPM in terms of running time
Performance of various approaches on reliable machines
Scheduling methodology
Reliability distribution of tasks
Failure detection example
Reliability distribution of tasks
Reliability model
Results for the machines with equal failure rate
Results for the machines with different failure rate
Average VMs per task (AVT)
Reliability guarantee ratio (RGR)
Results by varying the reliability requirement level
Results by varying the number of VMs per host (v)
Google trace results