Polyagents for Battle Planning - Polyagent Combat Simulation

MAS Combat Simulation

H. Van Dyke Parunak

3. Polyagent Combat Simulation

3.3. Polyagents for Battle Planning

potential ﬁeld algorithm that makes it much better suited for centralized computation.

• The swarming approach is dynamic. Ghosts are continually emitted by their avatars as the avatars move, and any changes to the landscape during the course of the mission automatically vary the portion of the path not yet traversed. The potential ﬁeld method plans a complete path based on a snapshot of the terrain being traversed. If the landscape changes, the user must decide whether to continue to use an old path that may no longer be optimal, or recompute a new path.

• The swarming approach is stochastic. The polyagents ghosts explore a range of alternative trajectories for the robot, reflecting the uncertainty of movement in the physical world, and the path that is computed is a weighted combination of these trajectories. The potential field method is determinis- tic. It reflects the likely experience of the entity that is to follow the path only in the single loss, cost, or friction value that it assigns to its initial raster, and does not account for possible variation in the experience of the entity as it follows the path.

current time t. The timeline is divided into discrete “pages”, each representing a successive value ofτ. The avatar inserts the ghosts at the insertion horizon. In one instantiation of this system, the insertion horizon is atτ−t=−30, meaning that ghosts are inserted into a page representing the state of the world 30 minutes ago.

At the insertion horizon, each ghosts behavioral parameters (desires and dispo- sitions) are sampled from distributions to explore alternative personalities of the entity it represents.

Figure 3. Behavioral Evolution and Extrapolation. Each avatar generates (a) a stream of ghosts that sample the personality space of its entity. They evolve (b, c) against the entitys recent observed behavior. The ﬁttest ghosts run into the future (d), and the avatar analyzes their behavior (e) to generate predictions.

Each page between the insertion horizon andτ =t(“now”) records the his- torical state of the world at the point in the past to which it corresponds. As ghosts move from page to page, they interact with this past state, based on their behavioral parameters. These interactions mean that their ﬁtness depends not just on their own actions, but also on the behaviors of the rest of the population, which is also evolving. Becauseτ advances faster than real time, eventuallyτ =t(actual time). At this point, each ghost is evaluated based on its location compared with the actual location of its corresponding real-world entity.

The ﬁttest ghosts have three functions.

1. The personality of each entitys ﬁttest ghost is reported to the rest of the system as the likely personality of that entity. This information can be used

to detect the emotional state of individual entities [15], or to identify diﬀerent roles in the adversarial organization.

2. The fittest ghosts breed genetically and their offspring return to the insertion horizon to continue the fitting process.

3. The fittest ghosts for each entity form the basis for a population of ghosts that run past the avatar’s present into the future. Each ghost that runs into the future explores a different possible future of the battle, analogous to how some people plan ahead by mentally simulating different ways that a situation might unfold. Analysis of the behaviors of these different possible futures yields predictions.

Thus BEE has three distinct notions of time, all of which may be distinct from real-world time.

1. Domain time t is the current time in the domain being modeled. If BEE is applied to a real-world situation, this time is the same as real-world time. In our experiments, we apply BEE to a simulated battle, and domain time is the time stamp published by the simulator. During actual runs, the simulator is often paused, so domain time runs slower than real time. When we replay logs from simulation runs, we can speed them up so that domain time runs faster than real time.

2. BEE timeτ for a page records the domain time corresponding to the state of the world represented on that page, and is oﬀset from the current domain time.

3. Shift time is incremented every time the ghosts move from one page to the next. The relation between shift time and real time depends on the processing resources available.

The distribution of each pheromone flavor over the environment forms a field that represents some aspect of the state of the world at an instant in time. Each page of the timeline is a complete pheromone field for the world at the BEE time τ represented by that page. The behavior of the pheromones on each page depends on whether the page represents the past or the future.

In pages representing the future (τ > t), the usual pheromone mechanisms apply. Ghosts deposit pheromone each time they move to a new page, and pheromones evaporate and propagate from one page to the next.

In pages representing the past (τ≤t), we have an observed state of the real world. This has two consequences for pheromone management. First, we will have observed some of the entities, and can generate the pheromone ﬁelds directly from their observed locations. Second, we can adjust the pheromone intensities based on the changed locations of entities from page to page, so we do not need to evaporate or propagate the pheromones.

Execution of the pheromone infrastructure proceeds on two time scales, run- ning in separate threads.

The ﬁrst thread updates the book of pages each time the domain time advances past the next page boundary, executing this algorithm at each time step:

Replace ‘‘now + 1’’ page with page showing locations and strengths of observed units;

Add empty page at the prediction horizon;

Discard the oldest page (since it has passed the insertion horizon).

The second thread moves the ghosts from one page to the next, as fast as the processor allows, executing the following algorithm at each step:

For each ghost reaching the τ=t page Evaluate fitness

Remove or breed

Insert new ghosts from avatars and evolution at insertion horizon

Insert fittest ghosts at τ =t to run into the future Remove ghosts that have reached the prediction horizon For each ghost

Plan next actions based on pheromone field in current page Move to next page

Execute planned actions (including pheromone deposits) For each future page, evaporate and propagate pheromones

Ghost movement based on pheromone gradients is a simple process, so this system can support realistic agent populations without excessive computer load.

In our current system, each avatar generates eight ghosts per shift. Since there are about 50 entities in the battlespace (about 20 units each of Red and Blue and about 5 of Green), we must support about 400 ghosts per page, or about 24000 over the entire book.

How fast a processor do we need? Letpbe the real-time duration of a page in seconds. If each page represents 60 seconds of domain time, and we are replaying a simulation at 2x domain time, p= 30. Let n be the number of pages between the insertion horizon andτ =t. In our current system,n= 30. Then a shift rate ofn/pshifts per second will permit ghosts to run from the insertion horizon to the current time at least once before a new page is generated. Empirically, this level is

a lower bound for reasonable performance, and easily achievable on stock WinTel platforms.

The ﬂexibility of the BEEs pheromone infrastructure permits the integration of numerous information sources as input to our characterizations of entity personalities and predictions of their future behavior. Our current system draws on three sources of information, but others can readily be added.

Observations from the real world are encoded into the pheromone ﬁeld each increment of BEE time, as a new “current page” is generated. Statistical tech- niques⁴ estimate the level of threat to each force (Red or Blue), based on the topology of the battleﬁeld and the known disposition of forces. For example, a broad open area with no cover is threatening, especially if the opposite force occupies its margins. The results of this process are posted to the pheromone pages as “RedThreat” pheromone (representing a threat to red) and “BlueThreat”

pheromone (representing a threat to Blue).

While plan recognition is not sufficient for effective prediction, it is a valuable input. We dynamically configure a Bayes net based on heuristics to identify the likely goals that each entity may hold⁵. The destinations of these goals function as

“virtual pheromones”. Ghosts include their distance to such points in their action decisions, achieving the result of gradient following without the computational expense of maintaining a pheromone ﬁeld.

The BEE technology performs impressively compared both with human and alternative computational predictors [14].

In one series of wargames, one of the operators controlling a conventional simulation used to evaluate the consequences of players decisions superimposed a coward personality on two units. If they were in a threatening situation, they would ignore the commands issued by the decision-makers and instead ﬂee. Experienced human observers tried to identify which units were the cowards. Fig. 4 shows that our polyagent system was able to identify the cowards as accurately as experienced human oﬃcers, but more rapidly.

Fig. 5 shows the superior accuracy of our predictions of the positions of Red units compared with those of human observers. The Wilcoxon test shows that the difference between the H15 scores is significant at the 99.76% level, while that between the H0 scores is significant at more than 99.999%. Fig. 6 shows that our predictions are also more accurate than those produced by a game-theoretic predictor [29].

4This process, known as SAD (Statistical Anomaly Detection), was developed by our colleagues Rafael Alonso, Hua Li, and John Asmuth at Sarnoﬀ Corporation. Alonso and Li are now at SET Corporation.

5This process, known as KIP (Knowledge-based Intention Projection), was developed by our colleagues Paul Nielsen, Jacob Crossman, and Rich Frederiksen at Soar Technology.

Figure 4. BEE vs. Human Emotion Detection.

Figure 5. Box-and-whisker plots of RAID (BEE) and Staﬀ predictions at 0 and 15 minutes Horizons. Y-axis is CEP radius in meters; lower values indicate greater accuracy.

Dalam dokumen Autonomous Agents and Multi-Agent Systems (Halaman 148-153)