The work in this thesis addresses the problems of cost-aware capacity provisioning and cost-aware load balancing in large-scale fault-tolerant GDCs, i.e., GDCs designed to mask single-site failures.
Introduction
Motivation of the Research Work
In this thesis we consider two models for using renewable energy for green data center design. Each data center is characterized by spatiotemporal variation in electricity price, renewable energy availability and failure rate.
Contributions of the Thesis
- Cost-aware Provisioning of Spare Capacity for Fault-tolerant GDCs
- Capacity Planning in Fault-tolerant GDCs Collocated with Renewable Energy Sources
- Optimizing Energy Cost in Fault-tolerant GDCs Satisfying Green Energy Bound
- Game-theoretic Model for Load Balancing in Fault-tolerant GDCs
- Distributed Failure Detection and Efficient Load Balancing in Fault-tolerant GDCs
- Organisation of the Thesis
Most of the work in the literature on load balancing in distributed data centers solves the proposed formulation centrally. In Chapter 7, we propose a data center-initiated distributed load balancing algorithm that ensures QoS after an outage while minimizing operating costs.
Architecture of a GDC
We discuss the popular model to ensure high availability in GDCs and the efforts to minimize operating costs. We also show how demand multiplexing can be leveraged to minimize the operating costs of GDCs.
High Availability Requirement
We provide spare capacity across the sites to mask the unavailability of the data center at any single site. We assume that the failure of a data center at a site is an independent process, that is, simultaneous failure of data centers at more than one location is rare.
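As a minimal sketch of this spare-capacity idea (with hypothetical site capacities and demand, not values from the thesis), the shortfall the surviving sites must cover can be computed per single-site failure scenario:

```python
# Sketch (hypothetical numbers): spare capacity so that the surviving
# sites can absorb total demand after any single-site failure.

def spare_needed(capacity, total_demand):
    """Extra capacity required at the surviving sites for each
    single-site failure scenario; assumes at most one site fails."""
    shortfall = {}
    total_cap = sum(capacity)
    for failed, cap_f in enumerate(capacity):
        surviving = total_cap - cap_f
        shortfall[failed] = max(0, total_demand - surviving)
    return shortfall

# Three sites with 400 servers each; total demand needs 900 servers.
print(spare_needed([400, 400, 400], 900))  # {0: 100, 1: 100, 2: 100}
```

The provisioning model in the thesis optimizes where this slack is placed, trading off TCO against the worst-case shortfall above.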
Energy Cost Components and their Dynamics
- Brown Energy Pricing
- Renewable Energy Sources and Cost Model
- Demand Multiplexing for Improving Utilization
- Geographical Load Balancing
On-site renewable energy generation: In this model, renewable energy sources such as solar panels and wind turbines are connected to the data center as shown in the figure. Renewable Energy Certificate (REC): Also known as a green certificate, it is a market-based instrument to promote renewable energy and facilitate the fulfillment of renewable energy purchase obligations (RPO).
Related Work
- Data Center Placement and Capacity Provisioning
- Geo-Distributed Load Balancing Approaches
The authors of [18] proposed an optimization framework that jointly handles data center placement/capacity provisioning and demand flow control/resource allocation. Several studies have taken additional factors into account, such as powering the data centers (at least partially) from renewable energy sources.
Summary
Therefore, the work in this thesis proposes optimization models and algorithms for the cost-aware design of fault-tolerant GDCs and their load balancing. In the next chapter, we discuss the cost-aware spare capacity provisioning problem and the optimization model for the same.
Introduction
- Motivation
However, multi-site replication involves large replication costs, as data center operators are typically charged for the number of bytes transferred [63] and/or the bandwidth between the replication sites [16]. Although the basic problem is similar to earlier work, we use minimization of the TCO as the objective when provisioning spare capacity, while explicitly accounting for the replication costs under different data replication models.
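A rough sketch of the per-byte replication charge mentioned above; the $0.02/GB tariff is an illustrative assumption, not a value from the thesis or the cited providers:

```python
# Replication cost charged per byte (here per GB) transferred between
# replication sites; the tariff is an illustrative assumption.

def replication_cost(gb_replicated, price_per_gb=0.02):
    """Cost in dollars for the replicated volume at the given tariff."""
    return gb_replicated * price_per_gb

print(replication_cost(5000))  # dollars per period at the assumed tariff
```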
MILP Model Formulation
- Assumptions
- System Model
- Cost Models
- CACP Model
- Example for Working of the CACP Model
- Complexity Analysis
In this case, the data is replicated from a primary data center to the nearest data center. Power consumption: let θ_hs denote the electricity price at data center location s in hour h of the day. With the MS model, the same number of servers is allocated at every data center location.
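The power-consumption cost term built on θ_hs can be sketched as below; the price matrix, per-site server counts, and per-server power draw are illustrative assumptions:

```python
# Hourly energy cost: sum over hours h and sites s of
# theta[h][s] * servers[s] * p_server, with theta in $/kWh and
# p_server the assumed kW drawn per active server.

def energy_cost(theta, servers, p_server):
    """Total cost = sum_h sum_s theta[h][s] * servers[s] * p_server."""
    return sum(theta[h][s] * servers[s] * p_server
               for h in range(len(theta))
               for s in range(len(servers)))

theta = [[0.10, 0.12], [0.08, 0.15]]      # 2 hours x 2 sites, $/kWh
print(energy_cost(theta, [100, 50], 0.3))  # 0.3 kW per server (assumed)
```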
Numerical Results
- System Parameters
- Results
We also studied how the choice of data center locations affects the TCO under the CACP model. The CACP model without replication costs lowers the TCO by about 20% compared to the model with replication. In the plot, CACP-w/ ft and CACP-w/o ft indicate the TCO achieved by the CACP model with and without fault tolerance, respectively.
Conclusion
In this chapter we have assumed that a data center at a site fails completely and that the data centers are powered only by brown energy sources. In the next chapter, we consider a general failure model (partial and complete failure at a site) and data centers co-located with renewable energy sources. For such GDCs, we aim to determine the optimal server distribution that minimizes the total cost while maximizing the use of renewable energy.
Introduction
An interesting aspect of capacity provisioning in green data centers is that the cost of powering servers depends strongly on the spatio-temporal variation in electricity price, green energy availability, and user demand (which varies over the day). Although renewable energy production is highly intermittent, it becomes more predictable as the number of renewable energy sources connected to the grid across multiple locations increases [74]. This is due to geographical diversity and the law of large numbers: aggregating a large number of geographically distributed renewable sources makes total renewable availability predictable with a certain degree of accuracy [27, 28].
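The geographic-diversity argument can be illustrated with a small synthetic simulation (the uniform output model is purely an assumption for illustration): the relative variability of aggregate output falls as more independent sites are pooled.

```python
# Coefficient of variation (std/mean) of aggregate renewable output
# over synthetic hours, as a function of the number of pooled sites.
import random
import statistics

def aggregate_cv(n_sites, n_hours=10_000, seed=1):
    """CV of total output when n_sites independent sources are pooled."""
    rng = random.Random(seed)
    totals = [sum(rng.uniform(0, 100) for _ in range(n_sites))
              for _ in range(n_hours)]
    return statistics.pstdev(totals) / statistics.fmean(totals)

# Variability shrinks roughly as 1/sqrt(n_sites).
print(aggregate_cv(1) > aggregate_cv(4) > aggregate_cv(16))  # True
```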
MILP Framework
- System Setup and Assumptions
- Definitions and Cost Model
- Optimization Problem Formulation
G_hs is the actual renewable energy generated (both wind and solar) at data center s during hour h. For each data center s, hour h, and failed data center f, we denote by GU_sfh the renewable energy used, by GS_sfh the renewable energy sold (net metering), and by Z_sfh the energy supplied to the battery (when Z_sfh > 0) or drawn from it (when Z_sfh < 0). The battery level at any data center s during hour h, after data center f has failed, is given by B_sfh = B_sf(h-1) + Z_sfh.
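The battery-level recursion implied by these definitions can be sketched directly (scenario indices s and f are dropped for brevity, and the initial level is an assumption):

```python
# Battery level B[h] = B[h-1] + Z[h]: Z > 0 charges the battery,
# Z < 0 discharges it; b0 is the assumed initial level.

def battery_levels(z, b0=0):
    """Hour-by-hour battery levels for a sequence of net flows z."""
    levels, b = [], b0
    for z_h in z:
        b += z_h  # charge (z_h > 0) or discharge (z_h < 0)
        assert b >= 0, "battery cannot go below empty"
        levels.append(b)
    return levels

print(battery_levels([5, -2, 3]))  # [5, 3, 6]
```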
Numerical Results
- Experimental Setup
- Results
We see that the TCO with the GCACP model decreases as the number of data centers increases, due to demand multiplexing and the variation in electricity prices and renewable energy availability. In contrast, as seen in Fig. 4.2, the cost increases slightly with the number of data centers under the MS model. We conclude that the GCACP model is beneficial when not all data centers operate at peak utilization.
Conclusion
However, even if a data center fails completely, the GCACP solution yields about 19% and 46% lower cost than the MS and CDN models, respectively. We conclude that under partial failure the GCACP model is advantageous, owing to the lower spare capacity requirement and the lower brown energy consumption (since the available renewable energy remains the same). Considering that data center operators try to gradually increase their use of renewable energy, we next model the problem of provisioning spare capacity to meet a target green energy usage at minimum cost, when the data centers are powered by a combination of brown and green energy sources.
Introduction
We observe that partial data center failures are frequent, while complete data center failures are rare (perhaps once every two years) [80]. We provision spare capacity in the data centers so that demand is met even after a site failure (partial or complete) while minimizing the TCO. The model's solution gives the optimal distribution of servers across sites and the demand distribution that minimizes the TCO.
Optimization Model
- System Architecture
- System Model
- Cost Model
Each data center consolidates its workload to keep power consumption proportional to the load. Simultaneous failures at more than one data center are assumed to be avoided through the choice of locations. Let λ_afhsu denote the number of requests mapped from client region u to data center s (s ∈ S) at hour h for application type a, given that data center f has failed.
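The consolidation idea can be sketched as follows: only enough servers to carry the current load stay active, so power tracks demand. The per-server capacity and power figures are illustrative assumptions:

```python
# Workload consolidation: pack the load onto the fewest servers so
# that power consumption scales with demand.
import math

def active_power(load_rps, cap_per_server=100, p_server=300):
    """Watts drawn when load_rps requests/s run on the minimum
    number of servers (assumed capacity and wattage per server)."""
    active = math.ceil(load_rps / cap_per_server)
    return active * p_server

print(active_power(250))   # 3 active servers -> 900 W
print(active_power(1000))  # 10 active servers -> 3000 W
```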
Numerical Results
- TCO Comparison
- Impact of Failure Percentage
- Impact of Demand
- Impact of Latency
- Sensitivity Analysis
In this experiment, we investigated the impact of varying the failure rate on the TCO while meeting the green energy usage bound. The TCO with the GACED model increases with demand due to the green energy consumption constraint. In a further experiment, we investigated the effect of relaxing the latency bound on the TCO.
Conclusion
The proposed model optimally plans the demand taking into account the availability of green energy and its price change to minimize TCO. We conclude that with an appropriate model, green energy integration lowers the design cost of fault-tolerant GDCs with reduced carbon footprints. Therefore, we expect our work to help data center operators make informed capacity planning decisions in the presence of the green energy usage objective and variation in electricity prices, demand, and failure rates.
Introduction
Load balancing and resource management in distributed systems have been addressed by non-cooperative game theory in previous works such as [81] and [82]. The problem is formulated as a non-cooperative game between users who try to minimize the expected response time of their own tasks. For the first time, we model load balancing in GDCs as a non-cooperative game between front-end brokers.
System Model
Data center energy consumption includes three components: the power consumed by idle servers, given by m_s·P_idle; the power consumed by servers operating at utilization η, given by m_s(P_peak − P_idle)η; and a third term captured in Eq. (6.7). Given the load λ_su and the electricity price per unit ρ_s, the cost arising from the consumed energy in the data center (also called the operating cost in this chapter) is given by ρ_s[m_s·P_idle + m_s(P_peak − P_idle)η]. The cost incurred due to the delay experienced by the requests λ_su of client region u at data center s is denoted by ∆_su.
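The two power terms named in the text, and the resulting operating cost, can be sketched directly; the P_idle/P_peak figures and the price ρ are illustrative assumptions:

```python
# Idle term m_s * P_idle plus dynamic term m_s * (P_peak - P_idle) * eta,
# then operating cost = electricity price * energy over one hour.

def dc_power_kw(m_s, eta, p_idle=0.15, p_peak=0.30):
    """Power (kW) of m_s servers at utilization eta (assumed per-server
    idle/peak draws of 150 W and 300 W)."""
    return m_s * p_idle + m_s * (p_peak - p_idle) * eta

def operating_cost(m_s, eta, rho):
    """rho: electricity price in $/kWh; one-hour horizon."""
    return rho * dc_power_kw(m_s, eta)

print(round(operating_cost(1000, 0.5, 0.10), 2))
```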
Load Balancing as a Non-cooperative Game
The Nash equilibrium of the above load-balancing game is a load-balancing strategy λ such that no front-end proxy u can lower its own cost by unilaterally changing its strategy. The following statement defines a player's best-response strategy, i.e., the solution to the best-response problem. Feasibility: among the three constraints of the optimization framework, the stability constraint (Eq. 6.25) is always satisfied by the Nash equilibrium solution, due to Eq. (6.21) and the fact that the total computing capacity of the data centers exceeds the cumulative client demand.
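A toy best-response iteration in the spirit of the game described above: each proxy in turn re-routes its demand to minimize its own cost given the others' current choices, stopping when no proxy changes. The linear congestion cost below is an illustrative assumption, not the thesis model:

```python
# Best-response dynamics for a toy load-balancing game: cost of a site
# is its price plus the load already routed there.

def best_response(my_load, other_load, prices):
    """Index of the site with the lowest marginal cost for this proxy."""
    costs = [p + other_load[s] + my_load for s, p in enumerate(prices)]
    return min(range(len(prices)), key=lambda s: costs[s])

def nash_iterate(loads, prices, rounds=20):
    choice = [0] * len(loads)          # every proxy starts at site 0
    for _ in range(rounds):
        changed = False
        for u, lu in enumerate(loads):
            other = [sum(loads[v] for v in range(len(loads))
                         if v != u and choice[v] == s)
                     for s in range(len(prices))]
            new = best_response(lu, other, prices)
            changed |= (new != choice[u])
            choice[u] = new
        if not changed:                # no profitable deviation: Nash
            break
    return choice

# Two equal proxies, two identical sites: they split, one per site.
print(nash_iterate([10, 10], [0, 0]))  # [1, 0]
```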
A Distributed Load Balancing Algorithm
Numerical Results
- System Setup
- Results
- Impact of Demand
- Impact of β
- Client Latency
- Convergence of NCG Algorithm
Demand from the different client locations is proportional to the number of Internet users in each region [67]. We vary the number of data centers in the system from 6 to 10 and investigate the impact on cost and fairness. To determine the number of iterations required for NCG convergence, we consider two scenarios.
Summary
Introduction
A cost-aware load balancing strategy operating in the presence of failures should select a new data center for request rerouting considering the renewable energy consumption requirements, electricity costs, and QoS requirements. For a scalable load balancing system, we propose a data center-initiated, distributed load balancing strategy that satisfies post-failover QoS requirements while minimizing the operating cost. We model load balancing in fault-tolerant data centers as a linear program (LP) that minimizes both the energy consumption cost and the client latency, even after a failure.
Problem Formulation
- System Model
- Optimization Model
Decision variables: we define λ_ufs as the number of requests mapped from client region u to data center s. We assume that among the m_s servers in a data center, m_fs are activated after data center f fails. Φ_ufsi indicates the workload of client region u served at data center s by consuming energy of type i, after data center f has failed.
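A greedy simplification of this formulation (with hypothetical numbers): after data center f fails, route the demand to the cheapest surviving sites subject to their capacities. With site-only costs this greedy fill is optimal; the full LP additionally handles per-pair latency terms:

```python
# Cheapest-first rerouting after a site failure: fill surviving sites
# in order of increasing cost until all demand is placed.

def reroute(demand, capacity, cost, failed):
    """Requests assigned to each surviving site (0 for the failed one)."""
    alloc = [0] * len(capacity)
    order = sorted((c, s) for s, c in enumerate(cost) if s != failed)
    left = demand
    for c, s in order:
        take = min(left, capacity[s])
        alloc[s], left = take, left - take
        if left == 0:
            break
    assert left == 0, "insufficient surviving capacity"
    return alloc

# Site 0 fails; 150 requests go to the cheaper of the two survivors first.
print(reroute(150, [100, 100, 100], [0.12, 0.08, 0.10], failed=0))
```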
Distributed Load Balancing Algorithms
- Shift Workload Algorithm
- Request Re-routing Algorithm
- Time Complexity Analysis
The workload to be migrated to other data centers is then estimated. Next, the data centers that can absorb the load of the failed data center are selected based on their power-consumption cost. All demand previously served by the failed data center is thus reassigned to servers in the remaining data centers.
Numerical Results
- Experimental Setup
- Results
The performance metric considered is the normalized average energy cost relative to the peak energy cost in both solutions for a given scenario. For each hour, we distribute the demand among different customer locations in proportion to the number of Internet users at each location [67]. It can be seen that as the number of data centers increases, energy costs decrease due to more options available to take advantage of demand multiplexing and power price variation.
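The normalization behind this metric can be sketched in a few lines: each hourly energy cost is divided by the peak hourly cost in the same scenario (the cost values below are illustrative):

```python
# Normalized average energy cost: divide each cost by the scenario's
# peak cost, so the metric lies in (0, 1].

def normalized_costs(costs):
    """Costs scaled so that the peak hour maps to 1.0."""
    peak = max(costs)
    return [c / peak for c in costs]

print(normalized_costs([45.0, 90.0, 60.0]))  # peak hour -> 1.0
```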
Conclusion