
LOAD BALANCING IN CONSERVATIVE SIMULATION

Jouni Ikonen and Jari Porras

Lappeenranta University of Technology, P.O. Box 20, FIN-53851 Lappeenranta, Finland

{ jouni.ikonen, jari.porras }@lut.fi

KEYWORDS

Load balancing, distributed simulation, cellular network simulation, conservative simulation

ABSTRACT

In this paper the need for load balancing in distributed conservative simulation is studied. The performance of a GSM simulator under different load distributions is presented. The results indicate that special attention has to be given to load distribution: a 3- to 7-fold slowdown was observed in a simple parallel simulation with an unwise problem distribution. An algorithm for dynamic load balancing is proposed. The algorithm performs the problem distribution automatically, so the user is not required to understand the internals of the application.

INTRODUCTION

The importance of distributed simulation has increased over the years. Large technical systems are simulated whenever analytical models are too complex. Serial simulation can be insufficient because of its time requirements: as such systems often have to be inspected under tight time constraints, parallel simulation methods must be used. New processor generations are developed quickly and more powerful processors frequently come to the market, yet there are still applications that require even more processing power to be completed in a reasonable time. A common example of such an application is the simulation of a mobile communication network [Hut99a][Hut99b]. Serial simulation of the emerging mobile networks can easily take weeks. Long simulation times slow down development, and all solutions for speeding up the development are welcomed by the industry. If better processors are not available, the choices are very limited. In some cases the algorithms can be improved, but beyond that, some kind of parallel approach is needed.

Many problems lie in the way of a successful system: how to divide the problem among multiple processors, how to handle communication between processors, what kind of hardware is required, and how to balance the computation between processors. Depending on the application, applying parallel methods can be an easy or a very difficult task. If the application has been written without any thought for parallelism, a good speedup might require the whole application to be redesigned and rewritten. Some problems are naturally parallel, like the ray tracing of a radio signal [Fri95] [Hut98] [Ina94] [Sal94] [Tut96], where the tracing of beams can easily be divided between processors. Load balancing can be achieved by dividing the work into small chunks, which are given to different processors. When a processor is idle it can request a new task to be executed. Processors that are slower or have more to do will request new tasks less often. This approach has been used e.g. in [Hut98].
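The chunk-based approach described above can be illustrated with a small sketch. It is not the implementation used in [Hut98]; it only simulates idle processors requesting the next chunk from a shared queue, and the names `self_schedule`, `chunk_costs` and `worker_speeds` are hypothetical.

```python
import heapq

def self_schedule(chunk_costs, worker_speeds):
    """Simulate idle workers pulling the next chunk from a shared queue.

    chunk_costs: work units per chunk (illustrative values).
    worker_speeds: work units each worker completes per second.
    Returns (chunks handled per worker, total elapsed time).
    """
    # Each heap entry is (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(len(worker_speeds))]
    heapq.heapify(idle_at)
    handled = [0] * len(worker_speeds)
    finish = 0.0
    for cost in chunk_costs:
        now, w = heapq.heappop(idle_at)      # next worker to become idle asks for work
        done = now + cost / worker_speeds[w]
        handled[w] += 1
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, w))
    return handled, finish

# Two workers, the second twice as fast as the first, share 30 equal chunks.
handled, finish = self_schedule([1.0] * 30, [1.0, 2.0])
print(handled, finish)
```

In this sketch the faster worker ends up with twice as many chunks without any explicit knowledge of the relative speeds, which is the property the text relies on.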


In this paper the problem of dividing the logical parts of a problem between workstations in an efficient way is tackled. An attempt is made to remove the need for users to perform this division themselves. This requires the system to analyze the simulation problem and to place the problem parts so that communication between workstations is minimized, thus reducing simulation time. This can be done by moving problem parts from one workstation to another. These problems are considered later in this paper. Load balancing is used in many applications; e.g. [Lu98] presents a dynamic load distribution algorithm for relocating application tasks. The use of idle workstations for parallel computation has also been considered in many papers, e.g. [Arc97], [Arp95] and [Du97]. Much of the work done on load balancing in distributed simulation is presented in [Fuj92]. Most load balancing work in distributed simulation concerns optimistic simulation models. In this work, load balancing in conservative distributed simulation is considered.

DISTRIBUTED SIMULATION OF A GSM

The simulation of a problem requires the logical parts of the problem to be identified. A GSM network simulation is used as an example application. It was chosen because of its requirement for computing power and the demand from industry to reduce the time required. The GSM system is composed of mobile equipment, a base station subsystem and a network subsystem. Each base transceiver station (later referred to as a transmission tower or base station) has a set of frequencies that it can use. These frequencies are divided into time slots. A call normally reserves one time slot on one frequency. In this way multiple users can use the same frequency concurrently. If the radio interface of the system is simulated, the simulation could focus on the interference on a radio channel caused by other users, devices, etc. Other goals could be to create a signal quality estimate for some area or to estimate the call capacity needed. Figure 1 presents a four cell GSM system. The users can move around with their cellular phones, and a handover must be performed between cells if a user moves from one cell to another during a call. If the cell into which the user is moving does not have the capacity to handle the call, the call must be terminated. Calls terminated by the network are very irritating for the user and must be minimized to keep customers happy.

Figure 1. A four cell GSM system
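As an illustration of the slot reservation and handover behaviour described above, the following sketch models a cell as a set of frequencies with a fixed number of time slots each. It is a simplification made for this example: every slot is assumed to be available for calls, and the names `Cell`, `admit` and `handover` are hypothetical rather than part of the simulator.

```python
from dataclasses import dataclass, field

SLOTS_PER_FREQUENCY = 8  # a GSM TDMA frame has 8 time slots per carrier

@dataclass
class Cell:
    frequencies: int                       # number of carrier frequencies in the cell
    calls: set = field(default_factory=set)

    def capacity(self):
        # Simplification: every slot on every frequency can carry one call.
        return self.frequencies * SLOTS_PER_FREQUENCY

    def admit(self, call_id):
        """Reserve one time slot for the call; return False if the cell is full."""
        if len(self.calls) >= self.capacity():
            return False
        self.calls.add(call_id)
        return True

def handover(call_id, source, target):
    """Move an ongoing call to the target cell, or drop it if the target is full."""
    source.calls.discard(call_id)
    return "handed over" if target.admit(call_id) else "dropped"
```

A call that moves into a full cell is reported as dropped, which corresponds to the network-terminated calls that the text says must be minimized.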

For distributed simulation, the GSM problem can be divided in many ways. A natural way is to treat a transmission tower and a mobile station as separate logical entities. If there are many mobiles and the user is not interested in simulating the operation of a particular cellular phone model, all mobile phones in the area of a transmission tower could be mapped into one process. If a different approach is needed, the problem could be divided into parts by frequency. In advanced systems a user can reserve multiple time slots to gain larger bandwidth for digital data transmission. Possible uses for the larger bandwidths are file transfers and videoconferencing. Often there are multiple ways in which a problem could be divided, and the division should be chosen by looking at the requirements of the simulation.


relevant and can be handled by some communication protocol. It is, however, required that messages cannot overtake each other in the network.

Figure 2. Four logical processes and channels between them [Diagram: LP1, LP2, LP3 and LP4.]

In Diworse [Por97] the communication of an LP is taken care of by a Scheduler, which knows where each LP is located. The structure of a simulation process is illustrated in Figure 3. The Scheduler gives execution time slots for each LP in that simulation process. The simulation process is later referred to as an Agent. Each workstation participating in a simulation has an Agent. This is illustrated in Figure 4.

If an LP is transferred, all Schedulers have to update their location registers. The problem of transferring an LP is unfortunately not this simple. An important part of load balancing is the logic used in balancing decisions. This requires Schedulers to acquire information about the load in other workstations. How the load is measured and how the measurements are acquired are important questions. The cost of load measurement and acquisition must be reasonable. A protocol for transferring LPs is also required. Load balancing is not very useful if the workstations notice that one workstation has free time and all the others transfer part of their work to that workstation and overwhelm it. A negotiation about who can transfer LPs, and how many, is required. In some cases it might be feasible to transfer LPs in clusters. This is the case with the GSM application, where mobiles always communicate with the logical process of the transmission tower of their current area.
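The location register mentioned above can be pictured with a minimal sketch. It is not the Diworse implementation: the class `Scheduler`, its `route` method and the helper `announce_transfer` are hypothetical names, and the register is assumed to be a simple mapping from LP name to Agent name.

```python
class Scheduler:
    """Keeps a location register: LP name -> name of the Agent hosting it."""

    def __init__(self, agent, register):
        self.agent = agent
        self.register = dict(register)   # each Scheduler holds its own copy

    def route(self, lp):
        """Return 'local' or the name of the Agent a message must be sent to."""
        dest = self.register[lp]
        return "local" if dest == self.agent else dest

def announce_transfer(schedulers, lp, new_agent):
    """After a transfer, every Scheduler must learn the new location of the LP."""
    for s in schedulers:
        s.register[lp] = new_agent
```

The sketch covers only the register update; as the text notes, the load measurement, the negotiation and the transfer protocol are the harder parts.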

Figure 3. Simulation processes run in each workstation [Diagram: a simulation process containing LPs and a Scheduler.]

Before an LP is transferred, its state must be frozen and a snapshot must be taken of it. During the transfer of an LP, either the other LPs must hold their messages to that LP and wait until the transfer is complete, or one of the Schedulers participating in the transfer must hold the messages and give them to the LP when the transfer is complete. The freezing of an LP will eventually stop the whole simulation, as other LPs will need a “safe to proceed” signal from the frozen LP at some point. It is clear that the simulation must not stop too often and that the frequency of the load balancing operation must be limited, so that the gain from better load division is not lost along the way. How often load balancing can be performed depends on the simulation problem. Another problem is how to decide which LP to transfer. The decision can be based on the connectivity of LPs, the message volume transferred between LPs, etc. It can be assumed that the number of messages traveling via the network must be minimized.
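One possible way to base the selection on message volume is sketched below. This is only an illustration of the criterion named above, not a method proposed in the paper: `placement` (LP to workstation) and `traffic` (message counts per LP pair) are assumed inputs, and the function names are hypothetical.

```python
def cross_traffic(placement, traffic):
    """Number of messages that travel between different workstations."""
    return sum(n for (a, b), n in traffic.items() if placement[a] != placement[b])

def best_lp_to_move(placement, traffic, src, dst):
    """Pick the LP on src whose move to dst leaves the least network traffic."""
    candidates = [lp for lp, host in placement.items() if host == src]

    def traffic_after(lp):
        trial = dict(placement)
        trial[lp] = dst                  # try the move on a copy
        return cross_traffic(trial, traffic)

    return min(candidates, key=traffic_after) if candidates else None

# Illustrative data: two base stations with their mobile groups.
placement = {"bs1": "w1", "mob1": "w1", "bs2": "w1", "mob2": "w2"}
traffic = {("bs1", "mob1"): 100, ("bs2", "mob2"): 80, ("bs1", "bs2"): 5}
choice = best_lp_to_move(placement, traffic, "w1", "w2")
```

With this data the base station whose mobiles already reside on the receiving workstation is selected, since moving it removes most of the traffic crossing the network.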

[Figure 4. Diagram labels: Manager, Agents, Schedulers, LPs.]


Experiments with different load distributions

These experiments demonstrate the need for load balancing. Partitioning of the simulation problem between workstations is an important task. With a good partitioning, the workstations can carry out their tasks as independently as possible. This means that there are fewer message transmissions between workstations, which minimizes the delay caused by the network. Running the simulation on a single workstation minimizes the network usage, but it will hardly speed up the simulation. A better approach is to divide the problem among all workstations according to their relative speeds. Figure 5 presents simulation runs with three different LP partitionings. The GSM network application was used in these tests. 15 minutes of the life of a GSM system was simulated with 9 workstations. The base stations represented one class of LPs and the mobiles in the area of a base station another class. The best balancing is marked with “A”. In this setup the LPs are grouped in such a way that communication between workstations is minimized. The test where network traffic was intentionally maximized is marked with “C”. A possible, but randomly chosen, division is marked with “B”. It can easily be seen that it is important to focus on load balancing. A 5-fold increase in simulation time can be seen between the best and the worst division.

The results show that good simulation time requires knowledge of the application. The load balancing approach could be used to free the user from this. A simulation could be started with a random LP distribution and load balancing could be used to relocate the LPs. A good distribution requires an analysis of communication between LPs and knowledge of the location of LPs.

Figure 5. Balanced vs. unbalanced simulation. [Plot axes: Load (Calls/h) and Time (s); series: configurations A, B and C.]
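The idea of dividing the problem according to the relative speeds of the workstations can be sketched as follows. The function `lp_quota` is hypothetical and only computes how many LPs each workstation should host; it assumes LPs of equal size and says nothing about which LPs should be grouped together to minimize communication.

```python
def lp_quota(num_lps, speeds):
    """Split num_lps among workstations in proportion to their relative speeds."""
    total = sum(speeds)
    exact = [num_lps * s / total for s in speeds]
    quota = [int(x) for x in exact]                  # round down first
    # Hand the leftover LPs to the largest fractional remainders.
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - quota[i], reverse=True)
    for i in order[:num_lps - sum(quota)]:
        quota[i] += 1
    return quota

# Three workstations, the third twice as fast as the others.
print(lp_quota(12, [1, 1, 2]))
```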

ALGORITHM FOR LOAD BALANCING

In this section an algorithm for load balancing is presented. The algorithm assumes that the load in the LPs is fairly constant and does not change heavily over time. If the LPs can be in different states during the simulation, e.g. sleeping or active, this will affect the number of LP transfers considerably. A suitable transfer threshold and a restriction on the time between transfers might be required. In the following algorithm, the load in each LP is assumed to be slowly changing and the LPs are assumed to be of equal size.

During a distributed simulation the Agents have to wait for synchronization messages from other Agents. In other words, the Agents are frequently idle while they wait for messages. During this time the Agents can calculate the load in their workstation and send the information to other Agents. Load announcements should be made from time to time, but not every time the Agent is idle. How often the announcement should be made depends on the simulation problem, the length of the simulation and the network. A 10 or 30 second pause between load announcements could be reasonable, as load balancing must not degrade the performance and processes cannot be transferred too often.

Figure 6. Load announcement [Flowchart: if the simulation is idle and there has been no recent load announcement, calculate the load and announce it; in either case, continue the simulation.]
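The decision in Figure 6 could be sketched as below. The class `Announcer` and the callbacks `measure_load` and `send` are hypothetical; the default pause of 10 seconds is simply one of the values suggested in the text.

```python
class Announcer:
    """Decides when an idle Agent should announce its load (cf. Figure 6)."""

    def __init__(self, min_interval=10.0):
        self.min_interval = min_interval   # pause between announcements, in seconds
        self.last_sent = None              # time of the previous announcement

    def maybe_announce(self, idle, now, measure_load, send):
        """Call once per scheduling round; return True if an announcement was sent."""
        recent = (self.last_sent is not None
                  and now - self.last_sent < self.min_interval)
        if idle and not recent:
            send(measure_load())           # calculate load and announce it
            self.last_sent = now
            return True
        return False                       # continue simulation without announcing
```

Passing the clock value `now` as a parameter keeps the sketch independent of how an Agent measures time.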


load announcements when they are idle. There is no need to announce load if there is no free capacity. When a load announcement message is received, the value carried by the message is stored in the load table. These announcements can be sent by an unreliable multicast protocol, as an error in a load value is not critical. The transfer of load must be negotiated even if an Agent advertises free capacity. This is to avoid overloading.

Figure 7. Load table update [Flowchart: receive a load announcement, update the load announcement table, continue the simulation.]
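A load table of the kind updated in Figure 7 might look like the following sketch. The class name `LoadTable` is hypothetical, and the `max_age` cutoff is an assumption added here because announcements sent over an unreliable protocol may be lost, leaving stale entries; the paper itself does not specify such a cutoff.

```python
class LoadTable:
    """Latest announced load per Agent, with the time it was received."""

    def __init__(self, max_age=60.0):
        self.max_age = max_age      # assumed: ignore announcements older than this
        self.entries = {}           # agent name -> (load, time received)

    def update(self, agent, load, now):
        """Store the value carried by a load announcement."""
        self.entries[agent] = (load, now)

    def underutilized(self, now, free_threshold):
        """Agents whose fresh announcement shows a load below free_threshold."""
        return sorted(a for a, (load, t) in self.entries.items()
                      if now - t <= self.max_age and load < free_threshold)
```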

Simulation Agents must check the load situation in the workstation periodically. If the workstation has more work to do than defined by some threshold it must check the load announcements done by the other Agents. This is illustrated in Figure 8. If there are announcements that indicate that there exist hosts that could do more work, one of them is selected randomly. This is to avoid situations where all Agents want to transfer a load to the same workstation. A load balancing query is sent to the selected host. This is for confirmation that there will not be multiple hosts transferring loads to the same workstation at the same time and that the receiver can handle the extra load. If there are no hosts that have indicated that they can handle more work, the simulation continues without changes.

When a load balancing request is received (Figure 9), an Agent must check whether it has recently received another request. If the processing of an earlier request has not been completed, the Agent must reject the new request and inform the requesting Agent with a rejection message. If the Agent has no other requests pending and still has idle processing power, it can accept the request and reply with an accept message. After sending the accept message the Agent continues the simulation as before and waits for the actual load balancing operation to start.

Figure 8. Initiation of load balancing [Flowchart: if the load exceeds the threshold and underutilized workstations are available, randomly select a host from the possible receivers and send it a load balancing request; otherwise continue the simulation.]
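The initiation step in Figure 8 can be sketched as a single function. The name `initiate_balancing` and its parameters are hypothetical; `candidates` stands for the hosts whose announcements indicate free capacity.

```python
import random

def initiate_balancing(own_load, threshold, candidates, rng=random):
    """Return the host to send a load balancing request to, or None (cf. Figure 8)."""
    if own_load <= threshold:
        return None                        # not overloaded: continue simulation
    if not candidates:
        return None                        # nobody has announced free capacity
    # A random pick keeps overloaded Agents from all choosing the same host.
    return rng.choice(list(candidates))
```

The optional `rng` argument accepts a seeded `random.Random` instance, which makes the choice reproducible in tests.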

Figure 9. Handling of load balancing request [Flowchart: on receiving a load balancing request, send a reject if a recent request is pending or has been accepted; otherwise, if the load is below the accept threshold, send an accept; then continue the simulation.]
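The receiver side of the negotiation in Figure 9 could be sketched as follows. The class `RequestHandler` is hypothetical. Replying with a reject when the load is not below the accept threshold is an assumption of this sketch; the text only states when a request can be accepted.

```python
class RequestHandler:
    """Receiver side of the load balancing negotiation (cf. Figure 9)."""

    def __init__(self, accept_threshold):
        self.accept_threshold = accept_threshold
        self.pending_from = None    # Agent whose request is pending or accepted

    def on_request(self, sender, own_load):
        """Return the reply to send: 'accept' or 'reject'."""
        if self.pending_from is not None:
            return "reject"         # another request is still being processed
        if own_load < self.accept_threshold:
            self.pending_from = sender
            return "accept"
        return "reject"             # assumed reply when there is no idle capacity

    def clear(self):
        """Forget the negotiation, e.g. after a cancel or a completed transfer."""
        self.pending_from = None
```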


whether the load balancing is still needed. If it is no longer needed, the accept is answered with a cancel message. The processing of the accept and cancel messages is shown in Figure 10 and Figure 11. If load balancing is continued, an LP must be selected for the transfer, if not selected earlier. A message must be sent to the Agents that have LPs connected to the selected LP. This message informs the other Agents that messages cannot be sent to the LP to be transferred and must be stored until later (Figure 12). The Agent must form a message from the LP, its variables, messages and queues. This information is sent to the selected host. After this the Agent can continue the simulation. The transferred LP should be deleted only once it is certain that there are no messages in transit for it. If messages arrive after the transfer of an LP, they must be forwarded to the new Agent.

Figure 10. Receive accept load balancing message. [Flowchart: if load balancing is no longer needed, send a cancel; otherwise send a message to the other workstations to stop sending messages to the LP to be transferred, take a snapshot of the LP and transfer it; then continue the simulation.]

Figure 11. Cancel load balancing request message. [Flowchart: receive a cancel, clear the information about the negotiation, continue the simulation.]
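The requester's handling of an accept message (Figure 10) might be sketched as below. The function `on_accept` and the callback `send(kind, destination, payload)` are hypothetical, and the LP state is assumed to be a plain dictionary of variables, messages and queues that can be copied as a snapshot.

```python
import copy

def on_accept(receiver, own_load, threshold, lp_name, lp_state,
              connected_agents, send):
    """Requester side after an accept arrives (cf. Figure 10).

    Returns 'cancelled' or 'transferred'.
    """
    if own_load <= threshold:
        send("cancel", receiver, None)         # balancing is no longer needed
        return "cancelled"
    for agent in connected_agents:             # ask peers to hold their messages
        send("hold", agent, lp_name)
    snapshot = copy.deepcopy(lp_state)         # frozen variables, messages, queues
    send("lp_transfer", receiver, (lp_name, snapshot))
    return "transferred"
```

The sketch leaves out the deletion of the old LP, which the text says should wait until no messages are in transit for it.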

When a new LP message is received (Figure 13), the Agent creates the LP object and reconstructs messages into its queues. After this the other Agents must be informed that the LP has a new location and messages must be sent there. After it has been confirmed that there are no more messages in transit via the old location of the LP, the old LP can be deleted and a new one started.

Figure 12. Receive info about the transfer of an LP. [Flowchart: receive an LP transfer message, stop the message queues to that LP, continue the simulation.]


The usual load measure in UNIX-based operating systems is the average length of the process queue over some time interval. Usually the system gives load averages for 1, 5 and 15 minutes. For simulation purposes these numbers are suitable in most cases. Runtime versus load has been analyzed in [Mey97]. When a simulation is starting, load averages over shorter times are needed, as the load should be distributed quickly. An analysis of the load based on the LPs could be a fruitful approach. In this way the LP to be transferred could be negotiated so that it is a suitable chunk of work for the receiver. This can be too complex for an initial approach to load balancing, so the use of the direct load measurement given by the system is proposed, as it shows the overall capacity in the system.
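On UNIX systems the load averages mentioned above can be read in Python with `os.getloadavg()`, which returns the 1, 5 and 15 minute values. The sketch below picks the 1 minute average while the simulation is starting and the 5 minute average afterwards; this switching policy and the function name `current_load` are assumptions made for illustration.

```python
import os

def current_load(starting_up, loadavg=None):
    """Return the load average an Agent would announce.

    loadavg is an optional callable returning (1 min, 5 min, 15 min) averages;
    by default the UNIX-only os.getloadavg is used.
    """
    one_min, five_min, _fifteen_min = (loadavg or os.getloadavg)()
    # Assumed policy: react quickly at start-up, more smoothly afterwards.
    return one_min if starting_up else five_min
```

The optional `loadavg` argument allows the function to be exercised with fixed values on systems where the load average is not available.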

Figure 13. Receive an LP. [Flowchart: receive an LP, set up the LP, send a message to the other Agents to release messages to this LP, confirm that there are no messages in transit for this LP, start the LP, continue the simulation.]
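The holding and releasing of messages around a transfer (Figures 12 and 13) could be sketched from the point of view of a sending Agent. The class `Outbox` and the callback `deliver(agent, lp, msg)` are hypothetical; messages are kept in FIFO order so that they do not overtake each other after the release.

```python
from collections import deque

class Outbox:
    """Holds outgoing messages for LPs that are being transferred."""

    def __init__(self, deliver, location):
        self.deliver = deliver           # deliver(agent, lp, msg): assumed transport
        self.location = dict(location)   # LP name -> Agent currently hosting it
        self.held = {}                   # LP name -> queue of held messages

    def on_hold(self, lp):
        """The LP is about to be transferred: stop sending to it (Figure 12)."""
        self.held.setdefault(lp, deque())

    def send(self, lp, msg):
        if lp in self.held:
            self.held[lp].append(msg)    # store until the transfer is complete
        else:
            self.deliver(self.location[lp], lp, msg)

    def on_release(self, lp, new_agent):
        """The LP has a new location: flush held messages in their original order."""
        self.location[lp] = new_agent
        for msg in self.held.pop(lp, ()):
            self.deliver(new_agent, lp, msg)
```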

CONCLUSIONS AND FUTURE WORK

Distributed simulation was shown to require load balancing. Load balancing can be performed by hand as an initial division of the problem and placement of the simulation problem. Load distribution by hand requires good knowledge of the application. This work can be automated by analyzing the load in workstations during the simulation and communication between LPs and by creating a distribution file from this information. Load balancing done this way can not respond to load changes in simulation equipment or periodic load changes in a simulation problem.

An algorithm for dynamic load balancing was presented. It requires load information from the system and offers process transfer negotiation between Agents. Processor thrashing caused by sending too much work to an Agent is avoided by this protocol.

Further work on dynamic load balancing requires an implementation of the presented load balancing algorithm. The implementation will be tested on a previously implemented distributed simulation environment (Diworse). The tests will be used to determine suitable load transfer thresholds for the GSM application. Methods for determining suitable LPs for transfer, and the cost of transferring them, will also be examined.

REFERENCES

[Arc97] Acharya A., Edjlali G. and Saltz J.: The Utility of Exploiting Idle Workstations for Parallel Computation, ACM SIGMETRICS, 1997.

[Arp95] Arpaci R. H., Dusseau A. C., Vahdat A. M., Liu L. T., Anderson T. E. and Patterson D. A.: The Interaction of Parallel and Sequential Workloads on a Network of Workstations, SIGMETRICS, 1995.

[Cha79] Chandy K. and Misra J.: Distributed Simulation: A Case Study in Design and Verification of Distributed Programs, IEEE Transactions on Software Engineering, 1979, pp. 440-452.


and Distributed Computing, vol. 46, 1997, pp. 125-135.

[Fri95] Fritsch T., Tutschku K. and Leibnitz K.: Field strength prediction by ray-tracing for adaptive base station positioning in mobile communication networks, University of Wurzburg, Research report No. 122, Aug. 1995.

[Fuj92] Fujimoto R. and Nicol D.: State of the Art in Parallel Simulation, Proceedings of the Winter Simulation Conference 1992, Arlington, USA, 1992, pp. 246-254.

[Fuj98] Fujimoto R. M.: Time Management in The High Level Architecture, Simulation, vol. 71, Number 6, December 1998, pp. 388-400.

[Hut98] Huttunen P., Porras J., Ikonen J. and Sipilä K.: Using Cray T3E for the Parallel Simulation of Cellular Radio Coverage Calculation, Proceedings of the Eurosim’98, 1998.

[Hut99a] Huttunen P.: Increasing the Performance of a WCDMA System Simulator Through Parallel Computing Techniques, Lappeenranta University of Technology, Master’s thesis, 1999.

[Hut99b] Huttunen P., Ikonen J. and Porras J.: Parallelization of a WCDMA System Simulator for a Shared Memory Multiprocessor Machine, to be published in European Simulation Symposium (ESS ’99), Erlangen-Nuremberg, October 26-28, 1999.

[Ina94] Inanoglu H. and Topuz E.: A ray based indoor propagation model for DECT applications, European Simulation Symposium 1994.

[Lu98] Lu Q. and Lau S-M.: A Negotiation Protocol for Dynamic Load Distribution Using Batch Task Assignments, Journal of Parallel and Distributed Computing 55, 1998, pp. 166-191.

[Mey97] Meyer T., Davis J. and Davidson J.: Analysis of Load Average and its Relationship to Program Run Time on Networks of Workstations, Journal of Parallel and Distributed Computing, Vol. 44, 1997, pp. 141-146.

[Por97] Porras J., Hara V., Harju J. and Ikonen J.: Improving the Performance of the Chandy-Misra Parallel Simulation Algorithm in a Distributed Workstation Environment, Proceedings of the 1997 Summer Computer Simulation Conference, Arlington, USA, 1997, pp. 657-662.

[Sal94] Salmi M.: Parallel ray tracing in propagation model of indoor mobile radio communication, Research report 51, Department of Information Technology, Lappeenranta University of Technology, Lappeenranta, Finland, 1994.

[Tut96] Tutschku K. and Leibnitz K.: Fast Ray-Tracing for Field Strength Prediction in Cellular Mobile Network Planning, University of Wurzburg, Research report No. 134, 1996.

BIOGRAPHY

Jouni Ikonen received a Master of Science degree from Michigan Technological University in 1994 and a Master of Science degree from Lappeenranta University of Technology in 1995. He is currently working as a research engineer at Lappeenranta University of Technology. His interests include distributed simulation, wireless networks and data communications.

Jari Porras received a Master of Science degree from Michigan Technological University in 1993, a Master of Science degree in 1993, a Licentiate in Technology degree in 1996 and a Doctor of Technology degree from Lappeenranta University of Technology in 1998. He is currently working as a professor at Lappeenranta University of Technology. His main research interests are the use of clustered workstations for distributed computing, parallel algorithms and cellular networks.

ACKNOWLEDGEMENTS