Resource Allocation in Streaming Environments

The proliferation of the Internet and sensor networks has fueled the development of applications that process, analyze, and respond to continuous streams of data in near real time. The goal is not to minimize the makespan, as in the conventional multiprocessor scheduling problem, but rather to optimize the net economic value of the computations.

Thesis Structure

The study of resource allocation and scheduling has moved from uniprocessor systems to multiprocessor systems, from offline problems to online problems, from independent tasks to interacting tasks, and from one-time tasks to repetitive tasks. Some resource allocation methods use conventional computer science approaches, while others draw on microeconomic theory and use market or auction mechanisms.

Scheduling Heuristics

One simple heuristic randomly chooses a task from the set of unassigned tasks and assigns it to the next available machine. In genetic algorithms [14,19], each chromosome is a vector that encodes one particular mapping of the tasks to machines.
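The two ideas above can be sketched together in a few lines. This is an illustration under stated assumptions, not the formulation used in the cited work: machines are taken to be identical, `durations` is a hypothetical list of task running times, and "next available" is read as the machine that becomes free earliest. The returned vector has the same shape as a genetic-algorithm chromosome (entry t holds the machine chosen for task t).

```python
import random

def random_next_available(durations, num_machines, seed=0):
    """Pick unassigned tasks in random order; give each to the machine that frees up first."""
    rng = random.Random(seed)
    unassigned = list(range(len(durations)))
    rng.shuffle(unassigned)                 # random order of selection
    ready = [0.0] * num_machines            # time at which each machine becomes free
    mapping = [None] * len(durations)       # chromosome-style vector: task -> machine
    for task in unassigned:
        machine = min(range(num_machines), key=lambda m: ready[m])
        mapping[task] = machine
        ready[machine] += durations[task]
    return mapping

def makespan(mapping, durations, num_machines):
    """Completion time of the busiest machine under a given task-to-machine mapping."""
    load = [0.0] * num_machines
    for task, machine in enumerate(mapping):
        load[machine] += durations[task]
    return max(load)
```

A genetic algorithm would treat `makespan` (or another objective) as the fitness of each such vector and evolve a population of them.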

Figure 2.1: An example showing poor RM scheduling performance

State-of-the-Art Resource Management Systems

MSHN

Moreover, the system works in an online environment, so the amount of resource allocated to each task changes dynamically depending on the machine resource usage by other tasks and their bid amounts. In the following chapters, we focus on the resource allocation problem for this new class of tasks.

Computational Fabric

In this chapter, we first describe the components of the problem domain: the computational fabric and the streaming applications.

Streaming Application

The vertices of the graph represent tasks, and a directed edge represents the flow of information from one task to another. We allow communication between applications, which means that the output of one computation can be the input of another. Each application has an existence interval $[T_i^{start}, T_i^{end}]$, where $T_i^{start}$ is the instant at which the application starts receiving inputs and becomes active, and $T_i^{end}$ is the instant at which it stops receiving inputs and becomes inactive.

Each application also has a utility function $U(r)$ that maps the amount $r$ of a resource to the value realized by processing inputs with that guaranteed amount of the resource.
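As a minimal sketch, a streaming application as described above could be represented by a small record that holds its task graph, its interval of activity, and its utility function. The class name, field names, and the example utility `2 * sqrt(r)` are illustrative assumptions and do not come from the original system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class StreamingApplication:
    name: str
    t_start: float                        # instant at which inputs start arriving
    t_end: float                          # instant at which inputs stop arriving
    utility: Callable[[float], float]     # U(r): value realized with resource amount r
    edges: Dict[str, List[str]] = field(default_factory=dict)  # task -> downstream tasks

    def is_active(self, t: float) -> bool:
        """True while the application is receiving inputs."""
        return self.t_start <= t <= self.t_end

    def downstream(self, task: str) -> List[str]:
        """Tasks that consume the output of `task` (directed edges of the graph)."""
        return self.edges.get(task, [])

# Hypothetical example: a three-task pipeline with a concave utility function.
app = StreamingApplication(
    name="trade-monitor",
    t_start=0.0,
    t_end=10.0,
    utility=lambda r: 2.0 * r ** 0.5,
    edges={"parse": ["aggregate"], "aggregate": ["alert"]},
)
```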

Proposed Solution Space: Resource Reservation Systems

However, computations on these streams often carry substantial state; for example, a computation in a trading application keeps track of the state of the trade. Tasks T1 and T2 are assigned to machine M1, and each receives a certain percentage of M1's resources reserved for its computations; task T3 is assigned to machine M3 and gets 100% of M3's resources; tasks T4 and T5 are assigned to machine M2, and each receives a certain percentage of M2's resources. Our focus is on the first step: assigning tasks to machines and making a reservation for each streaming application.

We investigate a simpler version of the problem, in which each streaming application consists of a single processing unit instead of a graph of interacting processing units, and all streaming applications have the same existence interval. In our system, we assume that machine capacities are large compared to the resource demand of an individual streaming application. The objective is to determine the resource allocation, $r_i$ (the amount of resource allocated to stream $i$ during the existence interval) and $x_{ij}$ (a value of zero or one indicating whether the resource of machine $j$ is allocated to stream $i$), that maximizes the total utility $\sum_i U_i(r_i)$.
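To make the decision variables concrete, the following sketch evaluates the total utility of a candidate allocation. The feasibility rules it checks (each stream placed on at most one machine, and the allocations on a machine not exceeding its capacity) are our reading of the problem and are assumptions, since the constraints are not spelled out in the text above; the function and parameter names are hypothetical.

```python
def total_utility(utilities, r, x, capacities, tol=1e-9):
    """Return sum_i U_i(r_i) for a feasible (r, x); raise ValueError otherwise.

    utilities[i] is U_i, r[i] is the amount reserved for stream i, and
    x[i][j] is 1 if stream i is served by machine j and 0 otherwise.
    """
    n, m = len(r), len(capacities)
    for i in range(n):
        if any(v not in (0, 1) for v in x[i]):
            raise ValueError(f"x[{i}] must contain only 0 or 1")
        if sum(x[i]) > 1:
            raise ValueError(f"stream {i} is placed on more than one machine")
        if sum(x[i]) == 0 and r[i] > 0:
            raise ValueError(f"stream {i} has resources but no machine")
    for j in range(m):
        load = sum(r[i] * x[i][j] for i in range(n))
        if load > capacities[j] + tol:      # assumed capacity constraint
            raise ValueError(f"machine {j} is over capacity")
    return sum(u(ri) for u, ri in zip(utilities, r))
```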

Hardness Proof

Therefore, we can convert any instance of the multiple knapsack problem (MKP) into an instance of our problem. To avoid requiring each consumer to disclose its utility function to a centralized solver, we design a distributed method that uses a market mechanism to solve the dual (pricing) problem. The supplier and the consumers interact with each other and participate in the market price adjustment process.

The supplier reacts to the demand and updates the market price according to the excess demand. The intersection of the aggregate demand and supply curves corresponds to the market equilibrium price. The consumer demands at that price, $r_1^*$ and $r_2^*$, are the optimal solution to the resource allocation problem.
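The price adjustment process can be sketched as a simple iteration. The sketch assumes, purely for illustration, that every consumer has utility b * sqrt(r), so that its demand at price p has the closed form (b / 2p)^2; the fixed step size and the function names are also assumptions. Note that consumers only report demands, never their utility functions.

```python
def demand(b, price):
    """Demand of a consumer with assumed utility b*sqrt(r): argmax_r b*sqrt(r) - price*r."""
    return (b / (2.0 * price)) ** 2

def find_equilibrium(bs, capacity, price=1.0, step=0.5, tol=1e-9, max_iter=10000):
    """Raise the price when demand exceeds supply, lower it otherwise."""
    for _ in range(max_iter):
        excess = sum(demand(b, price) for b in bs) - capacity
        if abs(excess) < tol:
            break
        # Supplier update: move the price in the direction of the excess demand.
        price = max(price + step * excess, 1e-9)
    return price, [demand(b, price) for b in bs]

price, allocation = find_equilibrium([1.0, 2.0], capacity=1.0)
```

For this two-consumer example the demands at the final price sum to the capacity, with the consumer that has the larger coefficient receiving the larger share. A fixed step size is not guaranteed to converge for arbitrary utility functions; a production implementation would need a step-size rule suited to its demand curves.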

Figure 4.1: Demand, supply and optimal allocations

Single Market Resource Reservation System

In this step, streaming applications are assigned to machines based on the optimal allocation found in the previous step. Assign the remaining group of tasks {RT} to machines using a best-fit heuristic in order of decreasing value, even when the sizes of these streaming applications are larger than the remaining capacity on the machines. Let $U = \sum_i U_i(r_i)$ be the total utility achieved by our solution and $U_{opt}$ be the total utility achieved by the optimal allocation; then $U \geq \frac{1}{2} U_{opt}$.

Thus, the total utility obtained at the first step, $\hat{U}$, serves as an upper bound on the optimal solution: $\hat{U} \geq U_{opt}$. Therefore, according to (2a), the total utility realized by the tasks assigned to the machines is more than half of the optimal one. Consequently, the total utility achieved by our heuristic is at least half of the optimal one.
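The relation between the first-step bound and what a packing step can realize is easy to see in code. The sketch below is a generic best-fit packing in order of decreasing value; it is not the (2a)/(2b) procedure of our heuristic, whose details are not reproduced here, and it carries no approximation guarantee. Here `sizes` stands for the allocations found on the single aggregated machine and `values` for the corresponding utilities; both names are hypothetical.

```python
def best_fit_by_value(sizes, values, capacities, tol=1e-12):
    """Place reservations on machines, most valuable first, in the tightest machine that fits."""
    remaining = list(capacities)
    assignment = [None] * len(sizes)        # None means "not placed"
    order = sorted(range(len(sizes)), key=lambda i: values[i], reverse=True)
    for i in order:
        fits = [j for j in range(len(remaining)) if sizes[i] <= remaining[j] + tol]
        if fits:
            j = min(fits, key=lambda k: remaining[k])   # best fit: least leftover space
            assignment[i] = j
            remaining[j] -= sizes[i]
    realized = sum(values[i] for i, j in enumerate(assignment) if j is not None)
    upper_bound = sum(values)               # utility on the aggregated machine
    return assignment, realized, upper_bound
```

Whatever the packing rule, the realized utility of the placed reservations can never exceed the aggregated-machine total, which is why that total is usable as an upper bound.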

Multiple Market Resource Reservation System

This means that if the original assignment ($r_i^*$ for the tasks assigned during (2a) and 0 for the tasks assigned during (2b)) is optimal, then it is preserved; otherwise, the reallocation achieves a higher total utility. Initially, each streaming application is assigned to one market (randomly or based on location). We enforce a restriction on when a streaming application can move from one machine to another: the move is allowed if and only if it increases the sum of the total utilities of the source and destination machines (their pair-total utility).

Each step in this process involves a pair of machines attempting to move a streaming application from one to the other and comparing the pair-total utilities before and after this attempt. This is an iterative monotonic process that increases (or leaves constant) the total utility at each step and terminates when no streaming application can be moved from one machine to another to increase the total utility. In other words, the process repeats while there are still streams in the system whose locations can be changed to increase the overall utility.
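The move rule and its termination condition can be sketched as a local search. To keep the example self-contained we assume every stream has utility b * sqrt(r), for which the best achievable total utility on a machine of capacity C has the closed form sqrt(C * sum of b^2); this utility form, the function names, and the tolerance `eps` are assumptions made for illustration only.

```python
import math

def machine_utility(bs, capacity):
    """Optimal total utility of one machine, assuming U_i(r) = b_i * sqrt(r)."""
    return math.sqrt(capacity * sum(b * b for b in bs))

def system_utility(machines, capacities):
    return sum(machine_utility(bs, c) for bs, c in zip(machines, capacities))

def local_search(placement, capacities, eps=1e-9):
    """Move a stream between two machines only if their pair-total utility increases."""
    machines = [list(bs) for bs in placement]
    improved = True
    while improved:                          # stops when no improving move exists
        improved = False
        for src in range(len(machines)):
            for dst in range(len(machines)):
                if src == dst:
                    continue
                for b in list(machines[src]):
                    before = (machine_utility(machines[src], capacities[src])
                              + machine_utility(machines[dst], capacities[dst]))
                    new_src = list(machines[src])
                    new_src.remove(b)
                    new_dst = machines[dst] + [b]
                    after = (machine_utility(new_src, capacities[src])
                             + machine_utility(new_dst, capacities[dst]))
                    if after > before + eps:
                        machines[src], machines[dst] = new_src, new_dst
                        improved = True
    return machines
```

Because every accepted move strictly increases the total utility and the number of possible placements is finite, the loop terminates. Only the two machines involved in a move need to exchange information, which is what makes the scheme decentralized.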

Figure 4.4: System view of the multiple-market method

Summary

In the experiments, we assume that all streaming applications have utility functions of the form Ui(ri)=bi. We choose this particular class of functions because it captures or approximates many concave functions commonly used as utility functions [25]. Baseline U: the total utility achieved by the naive balanced-streams heuristic, which assigns streams so as to balance the number of streams per machine.
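A minimal sketch of the baseline placement follows. It assumes that "balanced" means a round-robin assignment in which machine loads, counted in streams, differ by at most one; the function name is hypothetical, and how resources are then divided within each machine is not shown.

```python
def balanced_streams(num_streams, num_machines):
    """Round-robin placement: machine j receives streams j, j + M, j + 2M, ..."""
    machines = [[] for _ in range(num_machines)]
    for stream in range(num_streams):
        machines[stream % num_machines].append(stream)
    return machines
```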

In most of the figures below, we compare the performance of our heuristic and the balanced-streams heuristic by plotting the normalized performance gap (NPG) for each. The NPG of a heuristic is calculated as $(\hat{U} - U)/\hat{U}$, where $U$ is the total utility obtained by the heuristic and $\hat{U}$ is the upper bound on the performance. NPG is thus a number between 0 and 1; the smaller the NPG, the better the performance.
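For concreteness, the NPG can be computed as below, reading it as the gap between the upper bound and the heuristic's utility divided by the upper bound, which is the reading consistent with a value between 0 and 1. The function and argument names are illustrative.

```python
def normalized_performance_gap(heuristic_utility, upper_bound):
    """NPG = (upper_bound - heuristic_utility) / upper_bound; smaller is better."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return (upper_bound - heuristic_utility) / upper_bound
```

A heuristic that attains the upper bound has an NPG of 0, and one that realizes no utility at all has an NPG of 1.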

A Motivating Example

Heavy-tail Distribution

The streaming applications follow a heavy-tail distribution if, under the optimal allocation on the virtual machine, many streams have small allocations, a few streams have large allocations, and very few streams fall in between. This distribution is typical of flows on the Internet and is known as the “elephants and mice” phenomenon. To test the performance of our heuristics under this distribution, we conduct the following experiments.

Suppose there are 2 machines in the system, each with unit capacity, and N streaming applications each with a utility function Ui(ri)=bir. To compare the performance of the heuristics, we calculate the average performance of the balanced-streams heuristic. The fully decentralized multi-market heuristic performs as well as the single-market heuristic.

Uniform Distribution

The performance of our two market-based heuristics quickly approaches the optimum as the number of streaming applications increases. The performance gap for the balanced-streams heuristic decreases as the number of streams increases, but does not converge to optimal (for up to 500 streams in the system), remaining 1% to 10% below optimal. For any fixed number of streaming applications in the system, the performance gap for the naive balanced-streams heuristic increases as the number of machines increases.

Initially, for each fixed number of streaming applications in the system, the performance gaps for the two market-based heuristics increase as the number of machines increases. The performance gap decreases as the number of streams increases and eventually converges to zero.

When there are a large number of streams in the system, the complementary effect has the greater influence. It is clear from the empirical results that the time complexity of the multi-market heuristic is approximately quadratic in the number of streams.

Figure 5.4: The effect of number of machines on heuristic performance

Summary

This makes it a good candidate for use in large distributed systems and grid-like environments. To determine the complexity of this heuristic, we conducted an empirical study by measuring the time it takes to complete in simulation. The results show that this heuristic's running time is independent of the number of machines in the system, and is approximately quadratic in the number of streaming applications.
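One common way to read a growth rate off timing measurements is to fit a straight line to the logarithm of the running time against the logarithm of the problem size; a slope near 2 indicates roughly quadratic growth. The sketch below shows the fit itself. It is not the procedure used to produce our results, and the timings in the usage example are synthetic values generated from an exact quadratic, not measurements.

```python
import math

def fit_exponent(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic illustration only: times that grow exactly quadratically.
sizes = [50, 100, 200, 400]
synthetic_times = [1e-6 * n ** 2 for n in sizes]
exponent = fit_exponent(sizes, synthetic_times)
```

Repeating the fit with the number of machines on the horizontal axis would, by the same reasoning, give a slope near 0 when the running time does not depend on that parameter.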

Thus, it scales well with an increasing number of streams and machines, making it a practical and attractive alternative to the single-market heuristic.

Applications

Enterprise Computing

As sensors and RFID tags become more widely deployed, and data streams from commodity exchanges, telecom services and other sources become more widely and freely available, new application areas for business computing systems are emerging. It is becoming increasingly important for enterprises to build applications that use this data to detect and respond to potential threats and opportunities (“critical states”). Streaming applications analyze events from many different sources and in many different forms (numerical, textual, and visual) to determine when a critical condition exists and what the appropriate response should be.

They allow enterprise computer systems to act as "information factories"; just as industrial factories create value by transforming raw materials into finished products, information factories create value by transforming raw events into structured data. This transformation can require significant computational resources, so it is important to both design efficient and reliable heuristics and to make the best possible use of the available computing infrastructure when executing these heuristics.

Future Directions

References

A comparison of eleven static heuristics for mapping a class of independent tasks on heterogeneous distributed computing systems. Journal of Parallel and Distributed Computing.
Timing Analysis for Fixed-Priority Scheduling of Hard Real-Time Systems. IEEE Transactions on Software Engineering.
On the Complexity of Fixed-Priority Scheduling of Periodic Real-Time Tasks. Performance Evaluation, February 1982.