
constraints will have to be made if the involved I-state belongs to the $G^{ss}$ states of the FSC controller.

In addition, if an I-state is added to the FSC, it must also be assigned to either $G^{tr}$ or $G^{ss}$, because the next policy evaluation iteration depends on the I-state partitioning in the computation of $T^{mod}_{PM^{\varphi}}$. The procedure for adding I-states is provided in Algorithm 6.4.

Algorithm 6.4 can be understood as follows. Assume that a tangent belief $b$ exists for some I-state $g$.

Similar to Algorithm 6.2, instead of directly improving the value of the tangent belief, the algorithm tries to improve the value of forwarded beliefs reachable in one step from the tangent beliefs. This is given in Step 4 of Algorithm 6.4. Recall from Section 6.4.1 that when a new I-state is added, its successor states are chosen from the existing I-states. A similar approach is used in Algorithm 6.4.

However, a new node may be added to either $G^{tr}$ or $G^{ss}$, depending on the I-state that generated the original tangent belief. Recall that I-states in $G^{ss}$ have two additional constraints. First, no state in $G^{ss}$ can transition to any state in $G^{tr}$. This is enforced by limiting the successor state candidates in Steps 6-9. Secondly, for improving a node in $G^{ss}$, the allowed actions and transitions must satisfy the Poisson Equation constraints of Equation (6.43). This further reduces or prunes the possible successor candidates in Step 10, which is elaborated as a separate procedure in Algorithm 6.5. The rest of the procedure is identical to Algorithm 6.2, except for Step 20, in which any newly added I-state is placed in the correct partition, $G^{tr}$ or $G^{ss}$.
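The candidate restriction of Steps 6-9 and the partition assignment of Step 20 can be sketched in a few lines of Python. This is only an illustration under assumed data structures: I-states and actions are plain hashable labels, the partitions are Python sets, and the function names `successor_candidates` and `assign_partition` are hypothetical. The additional pruning of Step 10 is not included here.

```python
from itertools import product


def successor_candidates(g, G_tr, G_ss, actions):
    """Steps 6-9: (successor I-state, action) pairs allowed for a node spawned from g."""
    if g in G_tr:
        # A transient node may transition to any existing I-state in G.
        successors = set(G_tr) | set(G_ss)
    else:
        # A node in G_ss may never transition back into G_tr.
        successors = set(G_ss)
    return set(product(successors, actions))


def assign_partition(g_new, g, G_tr, G_ss):
    """Step 20, Eq. (6.50): the new I-state inherits the partition of g."""
    if g in G_tr:
        G_tr.add(g_new)
    else:
        G_ss.add(g_new)
```

For a node generated from an I-state in $G^{ss}$, the returned set would still have to be passed through the pruning procedure of Algorithm 6.5 before the backup is applied.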

Algorithm 6.5 prevents any new I-state from choosing a pair of action and successor I-state that may violate the Feasibility Constraints of Equation (6.43). To carry out this procedure, a phantom I-state $g_{phantom} \in G^{ss}$ is temporarily added to the current FSC for a pair $(g,\alpha) \in candidates$. Next, the modified transition distribution $T^{mod,phantom}_{PM^{\varphi}}$ is computed using Equation (5.3), and the Poisson Equation is solved to obtain a new $\vec{g}$, which can be used to verify the Feasibility Constraint. If this constraint is violated, then $(g,\alpha)$ is removed from the set $candidates$. Note that the algorithm works on a copy of the original FSC and of the solution of the Poisson Equation computed at the last Policy Evaluation step. The addition of $g_{phantom}$ and the recomputation of the Poisson Equation are used only within Algorithm 6.5.

Algorithm 6.4 Adding I-states to Escape Local Maxima of Constrained Optimization Criterion

Input: (a) Set $B$ of tangent beliefs for each I-state. (b) A function $node : B \to G$ identifying the I-state which yields each tangent belief. (c) $N_{new}$, the maximum number of I-states to add.

1: $N_{added} \leftarrow 0$.
2: repeat
3:   Pick $b \in B$, $B \leftarrow B \setminus \{b\}$, $g \leftarrow node(b)$.
4:   Compute the set of forwarded beliefs, $Fwd$, as in Steps 4-10 of Algorithm 6.2.
5:   for all $b_{fwd} \in Fwd$ do
6:     if $g \in G^{tr}$ then
7:       $candidates \leftarrow G \times Act$.
8:     else
9:       $candidates \leftarrow G^{ss} \times Act$.
10:      $candidates \leftarrow$ PruneCandidates($candidates$, $b_{fwd}$, $\vec{V}_{av}$, $\vec{g}$) using Algorithm 6.5.
11:    end if
12:    if $candidates = \emptyset$ then
13:      Go to step 5.
14:    end if

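15:    Apply the r.h.s. of the DP Backup Equation to $b_{fwd}$:

$$V_{\beta,backedup}(b_{fwd}) = \max_{(g,\alpha) \in candidates} \left\{ r_{\beta}(b_{fwd}) + \beta \sum_{o \in O} \Pr(o \mid b_{fwd}) \left( \sum_{s} b^{o,\alpha}_{fwd}(s)\, V^{g}_{\beta}(s) \right) \right\} \tag{6.48}$$

where $b^{o,\alpha}_{fwd}$ is computed for each product state $s' \in S$ as follows:

$$b^{o,\alpha}_{fwd}(s') = \sum_{s} \frac{T(s' \mid s,\alpha)\, O(o \mid s)\, b_{fwd}(s)}{\sum_{o' \in O} O(o' \mid s)\, b_{fwd}(s)}. \tag{6.49}$$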

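16:    Note the maximizing action $\alpha^{*}$ and I-state $g^{*}$.
17:  end for
18:  if $V_{\beta,backedup}(b_{fwd}) > V_{\beta}(b_{fwd})$ then
19:    Add new deterministic I-state $g_{new}$ such that $\omega(g^{*}, \alpha^{*} \mid g_{new}, o) = 1\ \forall o \in O$.
20:    Assign $g_{new}$ to the correct FSC partition as follows:

$$g_{new} \in \begin{cases} G^{tr} & \text{if } g \in G^{tr} \\ G^{ss} & \text{otherwise.} \end{cases} \tag{6.50}$$

21:    $N_{added} \leftarrow N_{added} + 1$.
22:  end if
23:  if $N_{added} \geq N_{new}$ then
24:    return
25:  end if
26: until $B = \emptyset$.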
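The maximization in Step 15 can be illustrated with a small NumPy sketch. It assumes that the forwarded beliefs $b^{o,\alpha}_{fwd}$ and the observation probabilities $\Pr(o \mid b_{fwd})$ have already been computed and are passed in as arrays. The function name `backup_over_candidates` and the container layout (dictionaries keyed by action and by I-state) are hypothetical choices made for this example, not part of the original algorithm.

```python
import numpy as np


def backup_over_candidates(candidates, r_b, beta, pr_obs, b_next, V):
    """Evaluate the r.h.s. of Eq. (6.48); return (best value, best (g, alpha)).

    candidates: iterable of (g, alpha) pairs
    r_b:        scalar reward r_beta(b_fwd)
    beta:       discount factor
    pr_obs:     array of shape (|O|,), entries Pr(o | b_fwd)
    b_next:     dict alpha -> array (|O|, |S|); row o holds b_fwd^{o,alpha}
    V:          dict g -> array (|S|,), value vector of I-state g
    """
    best_value, best_pair = None, None
    for g, alpha in candidates:
        # Inner sum over s for every observation, then expectation over o.
        expected_next = pr_obs @ (b_next[alpha] @ V[g])
        value = r_b + beta * expected_next
        if best_value is None or value > best_value:
            best_value, best_pair = value, (g, alpha)
    # (None, None) signals an empty candidate set, as in Steps 12-14.
    return best_value, best_pair
```

The returned pair plays the role of $(g^{*}, \alpha^{*})$ in Step 16, and the returned value is what Step 18 compares against the current value of the forwarded belief.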

Algorithm 6.5 Pruning candidate successor I-states and actions to satisfy recurrence constraints.

Input: Set of candidate successor states and actions $candidates \subseteq G^{ss} \times Act$.

1: for all $(g,\alpha) \in candidates$ do
2:   Add new state $g_{phantom}$ to $G^{ss}$ to create a larger FSC where
     $\omega(g, \alpha \mid g_{phantom}, o) = 1\ \forall o \in O$. (6.51)
3:   Compute $T^{mod}_{PM^{\varphi}}$ and $\vec{\iota}^{ss}_{init}$ for the new larger global ssd Markov chain.
4:   Solve the Poisson Equation for the new larger global Markov chain to obtain solutions $\vec{g}$, $\vec{V}_{av}$.
5:   if any Feasibility Constraints in Equation (6.43) are violated under the larger FSC then
6:     $candidates \leftarrow candidates \setminus \{(g,\alpha)\}$.
7:   end if
8: end for
9: return $candidates$
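The control flow of Algorithm 6.5 can be sketched as follows. The sketch deliberately leaves Steps 2-5 to a user-supplied callback, here called `violates_feasibility`, which is a hypothetical placeholder: it is assumed to add the phantom I-state to the copy it receives, recompute the Markov chain, solve the Poisson Equation, and report whether any Feasibility Constraint is violated. The representation of the FSC is likewise an assumption; any object that supports `copy.deepcopy` would do.

```python
import copy


def prune_candidates(candidates, fsc, violates_feasibility):
    """Keep only the (g, alpha) pairs that pass the feasibility check.

    violates_feasibility(trial_fsc, g, alpha) -> bool is a caller-supplied
    (hypothetical) routine standing in for Steps 2-5 of Algorithm 6.5.
    """
    kept = set()
    for g, alpha in candidates:
        # Work on a copy so the phantom I-state never leaks into the real FSC.
        trial_fsc = copy.deepcopy(fsc)
        if not violates_feasibility(trial_fsc, g, alpha):
            kept.add((g, alpha))
    return kept
```

Building a new set instead of deleting from `candidates` during iteration avoids modifying a collection while looping over it. The result is the same set that Step 9 returns.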

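$$\vec{g}_{feas} = T^{mod}_{PM^{\varphi}}\, \vec{g}_{feas}, \quad \text{and} \quad \vec{V}^{feas}_{av} + \vec{g}_{feas} = \vec{r}_{\beta} + T^{mod}_{PM^{\varphi}}\, \vec{V}^{feas}_{av} \tag{6.52}$$

Then, it can be shown that some state in $Repeat^{PM^{\varphi}}_{r} \times G^{ss}$ is recurrent and can be reached from the initial distribution with positive probability if and only if $\exists g \in G^{ss}$ such that

$$\left(\vec{\iota}^{PM^{\varphi}}_{init}\right)^{T} \vec{g}_{feas,g} > 0. \tag{6.53}$$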

However, the constraint of never visiting the avoid states still applies. These procedures and constraints can be collected together in the following bilinear maximization problem.

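$$
\begin{aligned}
\max_{\omega,\ \vec{V}_{av},\ \vec{V}^{feas}_{av},\ \vec{g},\ \vec{g}_{feas}} \quad & \left(\vec{\iota}^{PM^{\varphi}}_{init}\right)^{T} \vec{g}_{feas,g} \\
\text{subject to} \quad & \\
\text{Poisson Equation 1:} \quad & \vec{V}_{av} + \vec{g} = \vec{r}_{av} + T^{mod}_{PM^{\varphi}}\, \vec{V}_{av} \\
& \vec{g} = T^{mod}_{PM^{\varphi}}\, \vec{g} \\
\text{Poisson Equation 2:} \quad & \vec{V}^{feas}_{av} + \vec{g}_{feas} = \vec{r}_{\beta} + T^{mod}_{PM^{\varphi}}\, \vec{V}^{feas}_{av} \\
& \vec{g}_{feas} = T^{mod}_{PM^{\varphi}}\, \vec{g}_{feas} \\
\text{Feasibility constraints } (\forall g \in G^{ss}): \quad & \left(\vec{\iota}^{ss}_{init,g}\right)^{T} \vec{g} = 0 \\
\text{FSC Structure Constraints:} \quad & \omega(g', \alpha \mid g, o) = 0 \quad \text{if } g' \in G^{tr} \text{ and } g \in G^{ss} \\
\text{Probability constraints:} \quad & \sum_{g'} \omega(g', \alpha \mid g, o) = 1 \quad \forall o \\
& \omega(g', \alpha \mid g, o) \geq 0 \quad \forall g', \alpha, o
\end{aligned}
\tag{6.54}
$$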

Any positive value of the objective $\left(\vec{\iota}^{PM^{\varphi}}_{init}\right)^{T} \vec{g}_{feas,g}$ gives a feasible controller, and therefore the optimization need not be carried out to optimality. If the problem is infeasible, then states in $G^{ss}$ can be successively added to search for a positive objective.
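For a fixed controller, the Poisson Equation pair and the positivity test of the objective can be checked numerically. The sketch below is a minimal illustration and not the solution method of this chapter. It assumes a row-stochastic matrix `T` (entry `T[i, j]` is the probability of moving from state `i` to state `j`), solves the two linear equations jointly by least squares, and ignores the per-I-state indexing of $\vec{g}_{feas,g}$. The function names are hypothetical. The gain component is determined by the two equations, whereas the returned bias is only one of many valid solutions.

```python
import numpy as np


def solve_poisson(T, r):
    """Solve g = T g and V + g = r + T V for a finite Markov chain.

    Returns (g, V): the gain vector and one (minimum-norm) bias vector.
    """
    n = T.shape[0]
    I = np.eye(n)
    # Stack (I - T) g = 0 and g + (I - T) V = r into a single linear system.
    A = np.block([[I - T, np.zeros((n, n))], [I, I - T]])
    rhs = np.concatenate([np.zeros(n), r])
    x, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return x[:n], x[n:]


def has_positive_objective(iota_init, g_feas, tol=1e-9):
    """Check the sign condition of Eq. (6.53) for a given initial distribution."""
    return float(iota_init @ g_feas) > tol
```

As a usage example, a two-state chain that alternates deterministically between its states and earns reward 1 in the first state has gain 0.5 in both states, so any initial distribution yields a positive objective.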