Experiments - Algorithms for Large-Scale Adversarial Decision Problems

our case, using the parity basis, this sum of products can be reorganized as a product of sums:g^a(y) =∏_i|X_i_∈CP(xi=0|y,a)−P(xi=1|y,a). These terms can be precomputed for each state variable allowing efficient computation. Second, we considerα =∑

xα(x)h(x) =

∑

c∈C

α(c)h(c), where α(c) is the marginal of α over Dom[C]. In the case of parity basis functions,α_l=0,∀l6=0 andβ =|∑ku^γg_k ^l^−h^lλ_k|.

Algorithm 4Greedy Factored MDP Interdiction A_av=A

A_m= /0 A_n=A_av Vˆ =∞

whileA_n6= /0do

a=C^HOOSERANDOM(A_av)

V =ATTACKERPOLICY(A_av\a,H) ifV^A<Vˆ then

A_m=A_m∪a Vˆ =V A_av=A_av\a A_n=A_av ifA_av=/0then

break else

A_n=A_n\a

returnV =GENERATEBASIS(A\A_m,H)

advising and c) wildfire. While these have little direct connection to security, they provide the most meaningful evaluation of our approaches in terms of effectiveness and scalability:

prior security-related domains which consider multi-stage attacks use toy examples which would not provide a meaningful evaluation.

For all experiments, each defender mitigationm∈M blocks exactly one actiona. We also letR^A(x,a) =−R^D(x,a)−C_a, whereC_ais the cost of action a, which we set to 0 for the default (no-op) action and to 0.5 for all other actions.. We set the cost of imposing a mitigationC_m=1 for allm. We use the discount factor ofγ =0.9. The experiments are run on a 2.4GHz hyperthreaded 8-core Ubuntu Linux machine with 16 GB RAM, with CPLEX version 12.51 used to solve MILP instances.

4.6.2 Comparison with Exact MDP Interdiction

First, we compare the performance of the constraint generation with basis selection algorithm to the state-of-the-art optimal solution in MDP interdiction proposed by Letchford and Vorobeychik [19]. We consider the sysadmin domain with n=2−10 state variables

2 3 4 5 6 7 8 9 10 Number of state variables 0.0

0.5 1.0 1.5 2.0

Runtime in seconds

1e6 optimal s=4s=3 s=2s=1

2 3 4 5 6 7 8 9 10 Number of state variables 105

1520 2530 3540 45

Utility

optimal s=4s=3 s=2s=1

Figure 4.1: Comparison of exact and approximate MDP interdiction in terms of runtime (left) and attacker utility (right; lower is better for the defender).

(2ⁿstates) and 10 actions. We evaluate our approach withs=1,2,3 and 4, wheresis the maximum number of state variables in the scope of any basis.

As expected, the runtime of the exact MDPI is dominated by our approach for suffi- ciently many state variables (Figure 4.1(a)); more significantly, the exact approach runs out of memory for larger problem sizes.

From Figure 4.1(b) we can see that while the utility of approximate interdiction improves significantly ass increases from 1 to 2, it already becomes close to optimal when s=2, with little added value from increasing it further. The results are similar for other IPC domains. Consequently, our experiments below uses=2.

4.6.3 Scalability

We evaluate the constraint generation approach on larger problem sizes on the sysadmin domain (up to 60 state variables and 60 actions). Even with constraint generation with only a subset of basis functions, our baseline algorithm (marked as “slow bilevel”) scales poorly forn>30. On the other hand, the use of fast constraint generation (Algorithm 2, marked as “fast bilevel”), significantly improves scalability (Figure 5.1 left). Indeed, the baseline (slow bilevel) becomes intractable forn≥50, whereas we can successfully solve these with the “fast” approach. Since we compute the utility of the final attacker policy using basis

generation, the solution accuracy is not compromised (Figure 5.1 right).

10 20 30 40 50 60

Number of state variables 10¹

10² 10³ 10⁴ 10⁵ 10⁶

Runtime in seconds slow bilevel fast_bilevel

10 15 20 25 30 35 40 45 Number of state variables 20

40 60 80 100 120

Utility

slow bilevel fast_bilevel

Figure 4.2: Comparison between baseline (slow) and fast interdiction on the sysadmin domain in terms of runtime (left) and utility (right).

In the second set of scalability experiments, we evaluate our approaches on 10 problem instances of the academic advising domain. The problem size increases with problem number from 10 to 30 courses (20 to 60 state variables and 10 to 30 actions). For each problem size, there are two instances, corresponding to different program requirements and course prerequisites. The first (odd numbered) problem instance is somewhat simpler (fewer prerequisites per course). The second (even numbered) instance is more complicated, with a larger number of prerequisites per course (larger number of connections in the underlying DBN). Problem 10 has the largest problem size with 30 courses, 11 program requirements, 3 prerequisites for most courses and 4 prerequisites for 8 courses. As demonstrated in Figure 5.2, we observe a similar trend as before: the fast constraint generation approach significantly outperforms baseline without compromising much solution quality. The baseline is intractable for problems 7 to 10 (n≥50).

In the third set of experiments, we evaluate on 6 problem instances of the wildfire domain. The grid size increases with problem number fromm=3 to 5 (n=2×m²=18 to 50 state variables, and 36 to 100 actions). For each grid size, there are two instances, corresponding to different neighbourhood configurations and targets (cells on the grid that need to be protected). The first (odd numbered) problem instance has fewer targets than the second (even numbered) instance. The results are shown in Figure 5.3. The baseline is

1 2 3 4 5 6 7 8 9 10 Problem number

10² 10³ 10⁴ 10⁵ 10⁶

Runtime in seconds

slow bilevel fast_bilevel

1 2 3 4 5 6

Problem number

−280−260

−240−220

−200−180

−160−140

−120−100

Ut ilit y

slow bilevel fast_bilevel

Figure 4.3: Comparison between baseline (slow) and fast interdiction on the academic advising domain in terms of runtime (left) and utility (right).

again intractable on problems 5 and 6 (n=50) which can be solved by fast bilevel.

1 2 3 4 5 6

Problem number 10²

10³ 10⁴ 10⁵ 10⁶

Runtime in seconds slow bilevel fast bilevel

1.0 1.5 2.0 2.5 3.0 3.5 4.0 Problem number

−12000

−10000

−8000

−6000

−4000

−2000 0

Utility

slow bilevel fast bilevel

Figure 4.4: Comparison between baseline (slow) and fast interdiction on the wildfire domain in terms of runtime (left) and utility (right).

4.6.4 Effectiveness of Greedy Interdiction

Finally, we compare the greedy interdiction algorithm to fast constraint generation. As shown in Figures 4.5-4.7, the greedy algorithm is faster for larger problem sizes, saving up to an order of magnitude of computation time, without significantly compromising solution quality.

10 20 30 40 50 60 Number of state variables 10¹

10² 10³ 10⁴ 10⁵

Runtime in seconds ^greedyfast_bilevel

10 20 30 40 50 60

Number of state variables 2040

6080 100120 140160

Utility

greedy fast_bilevel

Figure 4.5: Comparison between fast interdiction and greedy in terms of runtime (left) and utility (right) on the sysadmin domain.

1 2 3 4 5 6 7 8 9 10 Problem number

10² 10³ 10⁴ 10⁵ 10⁶

Runtime in seconds

greedy fast_bilevel

1 2 3 4 5 6 7 8 9 10 Problem number

−300

−250

−200

−150

−100

Ut ilit y

fast_bilevel greedy

Figure 4.6: Comparison between fast interdiction and greedy in terms of runtime (left) and utility (right) on the academic advising domain.

1 2 3 4 5 6

Problem number 10²

10³ 10⁴ 10⁵ 10⁶

Runtime in seconds ^greedyfast bilevel

1 2 3 4 5 6

Problem number

−18000

−16000

−14000

−12000

−10000−8000−6000−4000−20000

Utility

greedy fast bilevel

Figure 4.7: Comparison between fast interdiction and greedy in terms of runtime (left) and utility (right) on the wildfire domain.

Dalam dokumen Algorithms for Large-Scale Adversarial Decision Problems (Halaman 67-72)