Finding an Initial Feasible Controller - Formal Methods for Control Synthesis in Partially Obse

constraints will have to be made if the involved I-state belongs to the $G^{ss}$ states of the FSC.

In addition, if an I-state is added to the FSC, it must also be assigned to either $G^{tr}$ or $G^{ss}$, because the next policy evaluation iteration depends on the I-state partitioning in the computation of $T^{PM^{\phi}}_{mod}$. The procedure for adding I-states is provided in Algorithm 6.4.

Algorithm 6.4 can be understood as follows. Assume that a tangent belief $b$ exists for some I-state $g$.

Similar to Algorithm 6.2, instead of directly improving the value of the tangent belief, the algorithm tries to improve the value of forwarded beliefs reachable in one step from the tangent beliefs. This is given in Step 4 of Algorithm 6.4. Recall from Section 6.4.1 that when a new I-state is added, its successor states are chosen from the existing I-states. A similar approach is used in Algorithm 6.4.

However, a new node may be added to either $G^{tr}$ or $G^{ss}$, depending on the I-state that generated the original tangent belief. Recall that I-states in $G^{ss}$ have two additional constraints. First, no state in $G^{ss}$ can transition to any state in $G^{tr}$. This is enforced by limiting the successor state candidates in Steps 6-9. Second, for improving a node in $G^{ss}$, the allowed actions and transitions must satisfy the Poisson Equation constraints of Equation (6.43). This further prunes the possible successor candidates in Step 10, which is elaborated as a separate procedure in Algorithm 6.5. The rest of the procedure is identical to Algorithm 6.2, except for Step 20, in which any newly added I-state is placed in the correct partition, $G^{tr}$ or $G^{ss}$.
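The partition rule for successor candidates (Steps 6-9) can be sketched in a few lines of Python. This is only an illustration under assumed data structures: I-states and actions are plain hashable labels, `G_tr` and `G_ss` are sets, and the function name `successor_candidates` is hypothetical. The pruning of Step 10 is deliberately left out.

```python
from itertools import product


def successor_candidates(g, G_tr, G_ss, actions):
    """Return the (successor I-state, action) pairs allowed for a node spawned from g.

    Assumed layout (not from the source): G_tr and G_ss are disjoint sets of
    I-state labels, and actions is a set of action labels.
    """
    if g in G_tr:
        # A transient node may move to any I-state in G = G_tr U G_ss.
        successors = G_tr | G_ss
    else:
        # A steady-state node must never transition back into G_tr.
        successors = G_ss
    return set(product(successors, actions))
```

With two transient I-states, two steady-state I-states, and two actions, a node spawned from a transient I-state gets eight candidate pairs, while a node spawned from a steady-state I-state gets four, all of them with successors inside `G_ss`.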

Algorithm 6.5 prevents any new I-state from choosing a pair of action and successor I-state that may violate the Feasibility Constraints of Equation (6.43). To carry out this procedure, a phantom I-state $g_{phantom} \in G^{ss}$ is temporarily added to the current FSC for each pair $(g, \alpha) \in candidates$. Next, the modified transition distribution $T^{PM^{\phi}}_{mod,phantom}$ is computed using Equation (5.3), and the Poisson Equation is solved to obtain a new $\vec{g}$, which can be used to verify the Feasibility Constraint. If this constraint is violated, then $(g, \alpha)$ is removed from the set $candidates$. Note that the algorithm works on a copy of the original FSC and of the solution of the Poisson Equation computed at the last Policy Evaluation step. The addition of $g_{phantom}$ and the recomputation of the Poisson Equation are used only within Algorithm 6.5.

Algorithm 6.4 Adding I-states to Escape Local Maxima of Constrained Optimization Criterion

Input: (a) Set $B$ of tangent beliefs for each I-state. (b) A function $node : B \to G$ identifying the I-state which yields each tangent belief. (c) $N_{new}$, the maximum number of I-states to add.

1: $N_{added} \leftarrow 0$.
2: repeat
3:   Pick $b \in B$, $B \leftarrow B \setminus \{b\}$, $g \leftarrow node(b)$.
4:   Compute the set of forwarded beliefs, $Fwd$, as in Steps 4-10 of Algorithm 6.2.
5:   for all $b_{fwd} \in Fwd$ do
6:     if $g \in G^{tr}$ then
7:       $candidates \leftarrow G \times Act$.
8:     else
9:       $candidates \leftarrow G^{ss} \times Act$.
10:      $candidates \leftarrow$ PruneCandidates($candidates$, $b_{fwd}$, $\vec{V}^{av}$, $\vec{g}$) using Algorithm 6.5.
11:    end if
12:    if $candidates = \emptyset$ then
13:      Go to step 5.
14:    end if
15:    Apply the r.h.s. of the DP Backup Equation to $b_{fwd}$:

$$V^{\beta,backedup}(b_{fwd}) = \max_{(g,\alpha) \in candidates} \left\{ r^{\beta}(b_{fwd}) + \beta \sum_{o \in O} \Pr(o \mid b_{fwd}) \left( \sum_{s \in S} b^{o,\alpha}_{fwd}(s)\, V^{\beta}_{g}(s) \right) \right\} \tag{6.48}$$

       where $b^{o,\alpha}_{fwd}$ is computed for each product state $s' \in S$ as follows:

$$b^{o,\alpha}_{fwd}(s') = \sum_{s \in S} T(s' \mid s, \alpha)\, \frac{O(o \mid s)\, b_{fwd}(s)}{\sum_{o' \in O} O(o' \mid s)\, b_{fwd}(s)}. \tag{6.49}$$

16:    Note the maximizing action $\alpha^{*}$ and I-state $g^{*}$.
17:  end for
18:  if $V^{\beta,backedup}(b_{fwd}) > V^{\beta}(b_{fwd})$ then
19:    Add new deterministic I-state $g_{new}$ such that $\omega(g^{*}, \alpha^{*} \mid g_{new}, o) = 1 \ \forall o \in O$.
20:    Assign $g_{new}$ to the correct FSC partition as follows:

$$g_{new} \in \begin{cases} G^{tr} & \text{if } g \in G^{tr} \\ G^{ss} & \text{otherwise.} \end{cases} \tag{6.50}$$

21:    $N_{added} \leftarrow N_{added} + 1$.
22:  end if
23:  if $N_{added} \geq N_{new}$ then
24:    return
25:  end if
26: until $B = \emptyset$.
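The backup of Step 15 can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the source's definitions: the function name `dp_backup`, the nested-list layout of `T` and `O`, the reading of $r^{\beta}(b_{fwd})$ as the belief-weighted sum of a per-state reward vector, and the normalization of the updated belief by $\Pr(o \mid b_{fwd})$, which is the standard Bayesian update and may differ from a literal transcription of Equation (6.49).

```python
def dp_backup(b, r, T, O, V, candidates, beta):
    """Return (value, g_star, a_star) maximizing the one-step backup of belief b.

    Assumed layout (hypothetical):
      b[s]         belief over product states
      r[s]         per-state reward, so r(b) = sum_s b[s] * r[s]
      T[a][s][s2]  transition probability T(s2 | s, a)
      O[s][o]      observation probability O(o | s)
      V[g][s]      value vector of I-state g
    Returns (-inf, None, None) when candidates is empty.
    """
    n = len(b)
    n_obs = len(O[0])
    reward = sum(b[s] * r[s] for s in range(n))
    best = (float("-inf"), None, None)
    for g, a in candidates:
        total = reward
        for o in range(n_obs):
            pr_o = sum(O[s][o] * b[s] for s in range(n))  # Pr(o | b)
            if pr_o <= 0.0:
                continue  # this observation cannot occur from b
            # Posterior over the current state given o, then one step through T.
            post = [O[s][o] * b[s] / pr_o for s in range(n)]
            b_next = [sum(post[s] * T[a][s][s2] for s in range(n)) for s2 in range(n)]
            total += beta * pr_o * sum(b_next[s2] * V[g][s2] for s2 in range(n))
        if total > best[0]:
            best = (total, g, a)
    return best
```

The returned pair plays the role of $g^{*}$ and $\alpha^{*}$ in Step 16. An empty candidate set yields a value of negative infinity, which mirrors the skip in Steps 12-14.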

Algorithm 6.5 Pruning candidate successor I-states and actions to satisfy recurrence constraints.

Input: Set of candidate successor states and actions $candidates \subseteq G^{ss} \times Act$.

1: for all $(g, \alpha) \in candidates$ do
2:   Add new state $g_{phantom}$ to $G^{ss}$ to create a larger FSC where

$$\omega(g, \alpha \mid g_{phantom}, o) = 1 \quad \forall o \in O. \tag{6.51}$$

3:   Compute $T^{PM^{\phi}}_{mod}$ and $\vec{\iota}^{\,ss}_{init}$ for the new larger global ssd Markov chain.
4:   Solve the Poisson Equation for the new larger global Markov chain to obtain solutions $\vec{g}$, $\vec{V}^{av}$.
5:   if any Feasibility Constraints in Equation (6.43) are violated under the larger FSC then
6:     $candidates \leftarrow candidates \setminus \{(g, \alpha)\}$.
7:   end if
8: end for
9: return $candidates$
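The control flow of Algorithm 6.5 can be sketched as follows. The sketch does not solve the Poisson Equation itself: that step is delegated to a caller-supplied callback `violates_feasibility`, which is a hypothetical stand-in for Steps 3-5. The dictionary layout of `fsc` (keys `"omega"` and `"G_ss"`) and the label `"g_phantom"` are also assumptions made only for illustration.

```python
import copy


def prune_candidates(candidates, fsc, violates_feasibility):
    """Keep the (g, a) pairs whose phantom-augmented FSC passes the feasibility check.

    fsc is assumed to be a dict with
      fsc["omega"]  mapping an I-state to its deterministic (successor, action) choice
      fsc["G_ss"]   set of steady-state I-states
    violates_feasibility(trial_fsc) is a hypothetical callback that should
    recompute the modified transition distribution, re-solve the Poisson
    Equation, and return True if a constraint of Equation (6.43) fails.
    """
    kept = set()
    for g, a in candidates:
        trial = copy.deepcopy(fsc)            # the caller's FSC is never modified
        trial["G_ss"].add("g_phantom")        # phantom node lives in G^ss
        trial["omega"]["g_phantom"] = (g, a)  # same choice for every observation
        if not violates_feasibility(trial):
            kept.add((g, a))
    return kept
```

Because each trial works on a deep copy, the phantom I-state never leaks into the FSC used by the rest of the procedure, which matches the note that its addition is used only within Algorithm 6.5.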

$$\vec{g}_{feas} = T^{PM^{\phi}}_{mod}\, \vec{g}_{feas}, \quad \text{and} \quad \vec{V}^{av}_{feas} + \vec{g}_{feas} = \vec{r}^{\,\beta} + T^{PM^{\phi}}_{mod}\, \vec{V}^{av}_{feas} \tag{6.52}$$

Then, it can be shown that some state in $Repeat^{PM^{\phi}}_{r} \times G^{ss}$ is recurrent and can be reached from the initial distribution with positive probability if and only if $\exists g \in G^{ss}$ such that

$$\left(\vec{\iota}^{\,PM^{\phi}}_{init}\right)^{T} \vec{g}_{feas,g} > 0. \tag{6.53}$$
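As a small worked check of Equations (6.52) and (6.53), the sketch below measures how far a candidate pair of vectors is from satisfying the two Poisson relations for a given row-stochastic matrix, and then applies the sign test. The function names, the nested-list matrix layout, and the per-I-state dictionary `gain_by_istate` are assumptions for illustration. The code only verifies a given solution; it does not compute one.

```python
def poisson_residual(P, r, gain, bias):
    """Largest absolute violation of  gain = P gain  and  bias + gain = r + P bias.

    P[s][s2] is assumed to be the probability of moving from s to s2.
    """
    n = len(r)

    def apply(vec):
        # (P vec)(s) = sum over s2 of P[s][s2] * vec[s2]
        return [sum(P[s][s2] * vec[s2] for s2 in range(n)) for s in range(n)]

    p_gain, p_bias = apply(gain), apply(bias)
    res_gain = max(abs(gain[s] - p_gain[s]) for s in range(n))
    res_bias = max(abs(bias[s] + gain[s] - r[s] - p_bias[s]) for s in range(n))
    return max(res_gain, res_bias)


def exists_feasible_istate(iota_init, gain_by_istate, G_ss, tol=1e-9):
    """Sign test in the spirit of (6.53): iota_init . gain_by_istate[g] > 0 for some g in G_ss."""
    return any(
        sum(p * x for p, x in zip(iota_init, gain_by_istate[g])) > tol
        for g in G_ss
    )
```

For a two-state chain in which state 0 moves to an absorbing, rewarded state 1, that is `P = [[0, 1], [0, 1]]` and `r = [0, 1]`, the pair `gain = [1, 1]`, `bias = [0, 1]` has zero residual, and the sign test succeeds for any initial distribution.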

However, the constraint of never visiting the avoid states still applies. These procedures and constraints can be collected together in the following bilinear maximization problem.

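$$
\begin{aligned}
\max_{\omega,\ \vec{V}^{av},\ \vec{V}^{av}_{feas},\ \vec{g},\ \vec{g}_{feas}} \quad & \left(\vec{\iota}^{\,PM^{\phi}}_{init}\right)^{T} \vec{g}_{feas,g} \\
\text{subject to} \quad & \\
\text{Poisson Equation 1:} \quad & \vec{V}^{av} + \vec{g} = \vec{r}^{\,av} + T^{PM^{\phi}}_{mod}\, \vec{V}^{av} \\
& \vec{g} = T^{PM^{\phi}}_{mod}\, \vec{g} \\
\text{Poisson Equation 2:} \quad & \vec{V}^{av}_{feas} + \vec{g}_{feas} = \vec{r}^{\,\beta} + T^{PM^{\phi}}_{mod}\, \vec{V}^{av}_{feas} \\
& \vec{g}_{feas} = T^{PM^{\phi}}_{mod}\, \vec{g}_{feas} \\
\text{Feasibility constraints } (\forall g \in G^{ss}): \quad & \left(\vec{\iota}^{\,ss}_{init,g}\right)^{T} \vec{g} = 0 \\
\text{FSC structure constraints:} \quad & \omega(g', \alpha \mid g, o) = 0 \quad \text{if } g' \in G^{tr} \text{ and } g \in G^{ss} \\
\text{Probability constraints:} \quad & \sum_{g', \alpha} \omega(g', \alpha \mid g, o) = 1 \quad \forall o \\
& \omega(g', \alpha \mid g, o) \geq 0 \quad \forall g', \alpha, o
\end{aligned}
\tag{6.54}
$$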

Any positive value of the objective $\left(\vec{\iota}^{\,PM^{\phi}}_{init}\right)^{T} \vec{g}_{feas,g}$ gives a feasible controller, and therefore the optimization need not be carried out to optimality. If the problem is infeasible, then states can be successively added to $G^{ss}$ to search for a positive objective.
