4.2 Maximizing Probability of LTL Satisfaction for an FSC
ordering $\preceq_1$. That is, for any two $R_i, R_j$, either $R_i \preceq_1 R_j$ or $R_j \preceq_1 R_i$, and $\preceq_1$ also satisfies the transitivity and antisymmetry conditions of Definition 4.2.1. Let each $R_k \in \text{RecSets}_{\mathcal{G}(\mathcal{I})}$ have a total order $\preceq_k \subseteq R_k \times R_k$ over its member global states. Further, since some states in the global state space $S \times G$ are transient, collect all transient states in the set $T$ and endow it with an arbitrary but fixed total ordering $\preceq_T \subseteq T \times T$. Finally, define the relation $\preceq$ on the global state space $S \times G$ as follows. For all $[s,g], [s',g'] \in S \times G$,

$$
[s,g] \preceq [s',g'] \quad\text{if and only if}\quad
\begin{cases}
[s,g],[s',g'] \in R_k \text{ and } [s,g] \preceq_k [s',g'], & \text{or,} \\
[s,g] \in R_k,\ [s',g'] \in R_l,\ k \neq l, \text{ and } R_k \preceq_1 R_l, & \text{or,} \\
[s,g] \in R_k \subseteq \text{RecSets}_\mathcal{G} \text{ and } [s',g'] \in T, & \text{or,} \\
[s,g],[s',g'] \in T \text{ and } [s,g] \preceq_T [s',g'].
\end{cases}
\tag{4.19}
$$
Proposition 4.2.2 The relation $\preceq$ in Equation (4.19) defines a total order over the set of states $S \times G$.
Proof Sketch: See Appendix B.2.
When the transition probabilities of the controlled system are written in matrix form, the above ordering results in a canonical block diagonal form

$$
T^{PM_\varphi,\mathcal{G}} =
\begin{bmatrix}
T_{R_1} & 0 & \dots & 0 & 0 \\
0 & T_{R_2} & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & T_{R_N} & 0 \\
T_{T\to R_1} & T_{T\to R_2} & \dots & T_{T\to R_N} & T_{T\to T}
\end{bmatrix}_{|S||G| \times |S||G|}
\tag{4.20}
$$

where the matrices $T_{R_k}$, corresponding to the recurrent sets, are stochastic (each row sums to 1) and can be used directly to represent the restriction of the global Markov chain to $R_k$.
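As an illustration of this canonical form, the following sketch permutes a transition matrix into the state ordering of Equation (4.19) and extracts the blocks of Equation (4.20). The four-state chain, its partition into recurrent sets, and all probability values are invented purely for the example:

```python
import numpy as np

# Hypothetical 4-state chain: state 0 is transient, states {1,2} form
# recurrent set R1, and state {3} forms recurrent set R2.
P = np.array([
    [0.2, 0.3, 0.1, 0.4],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

recurrent_sets = [[1, 2], [3]]   # R_1, R_2, each with its internal order <=_k
transient = [0]                  # T, with its arbitrary fixed order <=_T

# Canonical ordering (4.19): recurrent sets first (in the <=_1 order),
# transient states last.
order = [s for R in recurrent_sets for s in R] + transient
P_canon = P[np.ix_(order, order)]

# Blocks of the canonical form (4.20)
n_rec = sum(len(R) for R in recurrent_sets)
T_R1 = P_canon[:2, :2]                   # stochastic block for R_1
T_TT = P_canon[n_rec:, n_rec:]           # T_{T->T}
T_T_to_R1 = P_canon[n_rec:, :2]          # T_{T->R_1}
```

The recurrent rows of `P_canon` carry no mass into the transient columns, which is exactly the zero block structure of Equation (4.20).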
Similar to the canonical form in Equation (4.20), the same global state ordering of Equation (4.19) results in a block form for the initial distribution of the controlled system:

$$
\vec\iota_{init}^{PM_\varphi,\mathcal{G}} =
\begin{bmatrix}
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_1) \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_2) \\
\vdots \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_N) \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)
\end{bmatrix}_{|S||G| \times 1}.
\tag{4.21}
$$
4.2.3 Probability of Absorption in ϕ-feasible Recurrent Sets
Recall that for a given FSC $\mathcal{G} \in \mathcal{G}(\mathcal{I})$ of fixed structure, the recurrent set $\varphi\text{-RecSets}_\mathcal{G}$, a subset of the global state space $S \times G$, denotes the union of all recurrent sets that are $\varphi$-feasible, and is uniquely determined by the structure $\mathcal{I}$. This section aims to compute the probability of absorption into this set, given the initial distribution of the product-POMDP $PM_\varphi$. This probability will be shown to be a function of the parameters $\Phi$ and $\Theta$, and analytical expressions for the probability of absorption in terms of these parameters will be derived.
The probability of absorption into a recurrent set $R_k$ for finite Markov chains is well known [71]. Let $\vec{1}_{M\times 1}$ denote a column vector of size $M$ with all entries equal to 1. Then, using the block decomposition in Equations (4.20) and (4.21), the following holds.
$$
\Lambda(R_k) = \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)\right)^{\!T} \vec{1}_{|R_k|\times 1}}_{\text{Term 1}}
+ \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Term 2}}.
\tag{4.22}
$$
In the above equation, Term 1 is simply the probability that, under the initial distribution $\vec\iota_{init}^{PM_\varphi,\mathcal{G}}$, the initial global state $[s_0, g_0] \in S \times G$ lies in the recurrent set $R_k$. If this is so, any resulting path of the controlled system is guaranteed to remain in $R_k$ forever. Next, Term 2 can be rewritten as
$$
\text{Term 2} = \sum_{t=0}^{\infty} \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} \left(T_{T\to T}\right)^{t}\, T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}.
\tag{4.23}
$$
For each $t$, the corresponding summand is the probability that the execution starts in, and stays within, the transient states in $T$ through time step $t$, and is then absorbed into $R_k$ at time step $t+1$. The following lemma shows that the infinite sum in Equation (4.23) converges.
Lemma 4.2.3 [71]: The limit

$$
\lim_{t\to\infty} \sum_{k=0}^{t} T_{T\to T}^{k}
\tag{4.24}
$$

exists and is equal to $(I - T_{T\to T})^{-1}$.
Equation (4.22) together with Lemma 4.2.3 allows the probability of absorption into any $\varphi$-feasible set to be computed as

$$
\Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Lambda(R_k)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right),
\tag{4.25}
$$

which gives the LTL satisfaction probability as a function of the parameters of $\mathcal{G}$.
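To make Equations (4.22) and (4.25) concrete, the following sketch computes the absorption probabilities for a small chain already in canonical form, using the matrix inverse of Lemma 4.2.3 for the infinite sum. All block values are invented for illustration: two recurrent sets $R_1 = \{0,1\}$ and $R_2 = \{2\}$, and one transient state:

```python
import numpy as np

# Hypothetical canonical-form blocks (cf. Equations (4.20)-(4.21)).
T_TT = np.array([[0.2]])                 # T_{T->T}
T_T_to_R = {0: np.array([[0.3, 0.1]]),   # T_{T->R_1}
            1: np.array([[0.4]])}        # T_{T->R_2}
iota_R = {0: np.array([0.0, 0.0]),       # initial mass on R_1
          1: np.array([0.5])}            # initial mass on R_2
iota_T = np.array([0.5])                 # initial mass on T

N_inv = np.linalg.inv(np.eye(1) - T_TT)  # (I - T_{T->T})^{-1}, Lemma 4.2.3

def absorption_prob(k):
    """Lambda(R_k) via Equation (4.22): Term 1 + Term 2."""
    term1 = iota_R[k].sum()
    term2 = iota_T @ N_inv @ T_T_to_R[k] @ np.ones(T_T_to_R[k].shape[1])
    return term1 + term2

# If both recurrent sets were phi-feasible, the sum in (4.25) would equal 1
# here, since every path of this chain is eventually absorbed.
probs = [absorption_prob(k) for k in (0, 1)]
```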
4.2.3.1 Complexity and Efficient Approximation
Before going into the details of how Equation (4.22) is used to optimize the satisfaction probability of $\varphi$, it is worthwhile to examine the complexity of computing the r.h.s. of that equation. One source of computational complexity is the need to compute the recurrent sets of the global Markov chain; the growth of this complexity is analyzed in Section 4.5.1. Here, the computational complexity arising from the infinite sum in Term 2 of Equation (4.22) is presented.
Lemma 4.2.3 offers one method of computing the r.h.s., replacing the infinite sum $I + T_{T\to T} + T_{T\to T}^2 + \dots$ by the inverse $(I - T_{T\to T})^{-1}$, which has complexity $O(|T|^3)$. A less computationally intensive way to compute Term 2 is to recognize that this sum is ultimately multiplied by $T_{T\to R_k}\vec{1}_{|R_k|\times 1}$, which evaluates to a column vector of size $|T| \times 1$. As suggested in [1], this allows an iterative method of lower complexity, as follows. Initialize variables $v_0$ and $x_0$:

$$
v_0 = T_{T\to R_k}\,\vec{1}_{|R_k|\times 1}, \qquad x_0 = 0.
\tag{4.26}
$$
Then iterate the equations

$$
v_{n+1} = T_{T\to T}\, v_n, \qquad x_{n+1} = x_n + v_n
\tag{4.27}
$$

until convergence, i.e., for some tolerance $\varepsilon_x > 0$, there exists $N_x$ such that $\|x_N - x_{N-1}\|_\infty \le \varepsilon_x$ for all $N \ge N_x$. The convergence of Equation (4.27) is guaranteed because it is well known that $T_{T\to T}^n \to 0$ as $n \to \infty$ [71]. Then, use the approximation

$$
\text{Term 2} \approx \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} x_{N_x}.
\tag{4.28}
$$
This approximation method has complexity $O(|T|^2 N_x)$. In fact, if the underlying POMDP's transition distribution is sparse, then sparse matrix multiplication and addition can be used in the approximation method, reducing the practical complexity to $O(c\,|T| N_x)$ with $c \ll |T|$. The sparsity assumption holds in many engineering examples, such as robotic systems in which only a few other states are reachable from any particular state. The constant $c$ depends on the sparsity level.
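The iterative scheme of Equations (4.26)–(4.28) can be sketched as follows. The blocks are hypothetical, and the stopping test uses $\|v_n\|_\infty$, which here equals $\|x_{n+1} - x_n\|_\infty$ since the iterates differ by exactly $v_n$:

```python
import numpy as np

# Hypothetical substochastic transient block and absorption block; any
# T_{T->T} with spectral radius < 1 works.
T_TT = np.array([[0.5, 0.2],
                 [0.1, 0.3]])            # T_{T->T}
T_T_to_Rk = np.array([[0.3],
                      [0.6]])            # T_{T->R_k}
iota_T = np.array([0.4, 0.6])            # iota_init(T)

def term2_iterative(T_TT, T_T_to_Rk, iota_T, eps=1e-12):
    """Approximate Term 2 of (4.22) via the iteration (4.26)-(4.27)."""
    v = T_T_to_Rk @ np.ones(T_T_to_Rk.shape[1])   # v_0, a |T|-vector
    x = np.zeros_like(v)                          # x_0
    while np.linalg.norm(v, np.inf) > eps:        # |x_{n+1} - x_n| = |v_n|
        x = x + v
        v = T_TT @ v
    return iota_T @ x                             # Equation (4.28)

# Cross-check against the direct inverse of Lemma 4.2.3
exact = iota_T @ np.linalg.inv(np.eye(2) - T_TT) @ T_T_to_Rk @ np.ones(1)
approx = term2_iterative(T_TT, T_T_to_Rk, iota_T)
```

In this toy chain the transient rows place all remaining mass on $R_k$, so absorption into $R_k$ is certain and both computations return 1.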
4.2.4 Gradient of Probability of Absorption
Equation (4.22) provides an expression for the probability of absorption; an analytical expression for its gradient with respect to $\Phi$ and $\Theta$ can also be derived. In Terms 1 and 2 of Equation (4.22), the expression $\vec\iota_{init}^{PM_\varphi,\mathcal{G}}$ is a function of the parameters $\Theta$ only, via the initial FSC node distribution $\kappa$. In Term 2, the expressions $\lim_{t\to\infty}\sum_{k=0}^{t} T_{T\to T}^{k}$ and $T_{T\to R_k}$ are functions of the $\Phi$ parameters only. It therefore suffices to provide the derivatives of Terms 1 and 2 w.r.t. $\Theta$, and the derivative of Term 2 w.r.t. $\Phi$. The rest of this section computes these quantities.
From Equation (3.1) in the definition of a global Markov chain,

$$
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}([s,g]) = \vec\iota_{init}^{PM_\varphi}(s)\,\kappa\!\left(g \,\middle|\, \vec\iota_{init}^{PM_\varphi}, \Theta\right)
\;\implies\;
\frac{\partial\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}([s,g])}{\partial \theta_i}
= \vec\iota_{init}^{PM_\varphi}(s)\,\frac{\partial\, \kappa\!\left(g \,\middle|\, \vec\iota_{init}^{PM_\varphi}, \Theta\right)}{\partial \theta_i},
\tag{4.29}
$$

where $\frac{\partial \kappa(g \mid \vec\iota_{init}^{PM_\varphi}, \Theta)}{\partial \theta_i}$ can be computed using Equation (4.17).
Next, it is shown how to compute the gradient of a general entry of the matrix $T^{PM_\varphi,\mathcal{G}}$. From Equation (3.2),

$$
T^{PM_\varphi,\mathcal{G}}\!\left([s',g'] \,\middle|\, [s,g], \Phi\right)
= \sum_{o\in O}\sum_{\alpha\in Act} O(o\mid s)\,\omega(g',\alpha \mid g,o)\,T(s'\mid s,\alpha)
$$

$$
\implies\quad
\frac{\partial\, T^{PM_\varphi,\mathcal{G}}\!\left([s',g'] \mid [s,g], \Phi\right)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}
= \sum_{o\in O}\sum_{\alpha\in Act} O(o\mid s)\,\frac{\partial\, \omega(g',\alpha \mid g,o)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}\,T(s'\mid s,\alpha),
\tag{4.30}
$$

where $\frac{\partial \omega(g',\alpha\mid g,o)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}$ is computed using Equation (4.15).
Finally, the following shows how to compute the gradient of the infinite sum in Term 2. From Lemma 4.2.3,

$$
\lim_{t\to\infty}\sum_{k=0}^{t} T_{T\to T}^{k} = (I - T_{T\to T})^{-1}.
\tag{4.31}
$$

This implies that

$$
\begin{aligned}
\nabla_\Phi\Big(\lim_{t\to\infty}\textstyle\sum_{k=0}^{t} T_{T\to T}^{k}\Big)
&= \nabla_\Phi\big((I - T_{T\to T})^{-1}\big) \\
&= -(I - T_{T\to T})^{-1}\,\nabla_\Phi (I - T_{T\to T})\,(I - T_{T\to T})^{-1} \\
&= +(I - T_{T\to T})^{-1}\,\nabla_\Phi T_{T\to T}\,(I - T_{T\to T})^{-1} \\
&= \big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big)\,\nabla_\Phi T_{T\to T}\,\big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big),
\end{aligned}
\tag{4.32}
$$

where the first equality implies the second via standard linear algebra identities [112], and can be derived easily by differentiating both sides of the equation $(I - T_{T\to T})(I - T_{T\to T})^{-1} = I$. Thus the computation has been reduced to computing $\nabla_\Phi T_{T\to T}$, which is done using Equation (4.30).
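The identity in Equation (4.32) can be checked numerically. The sketch below compares the analytical gradient of $(I - T_{T\to T})^{-1}$ against a finite-difference estimate on a hypothetical, smoothly parameterized transient block $T_{T\to T}(\phi)$:

```python
import numpy as np

def T_of(phi):
    # Hypothetical substochastic 2x2 transient block depending smoothly on
    # a single scalar parameter phi (rows sum to less than 1).
    return np.array([[0.3 * np.tanh(phi) + 0.3, 0.1],
                     [0.2, 0.25 * np.sin(phi) + 0.3]])

phi, h = 0.7, 1e-6
I = np.eye(2)

# Finite-difference gradient of the inverse (I - T(phi))^{-1}
fd = (np.linalg.inv(I - T_of(phi + h))
      - np.linalg.inv(I - T_of(phi - h))) / (2 * h)

# Analytical gradient per Equation (4.32):
# (I - T)^{-1} (dT/dphi) (I - T)^{-1}
inv = np.linalg.inv(I - T_of(phi))
dT = (T_of(phi + h) - T_of(phi - h)) / (2 * h)   # central-difference dT/dphi
analytic = inv @ dT @ inv
```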
In closing, the aggregate of these computations yields the gradient of the probability of satisfaction of $\varphi$ when the structure of the FSC is fixed. For $\nabla \in \{\nabla_\Theta, \nabla_\Phi\}$,

$$
\nabla \Pr(PM \models \varphi)
= \nabla \Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \nabla \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right).
\tag{4.33}
$$
From Equation (4.22),

$$
\nabla_\Theta \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \left(\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)\right)^{\!T} \vec{1}_{|R_k|\times 1}
+ \left(\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1},
\tag{4.34}
$$

and
$$
\nabla_\Phi \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} \nabla_\Phi\!\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Grad Term 1}}
+ \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) \nabla_\Phi T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Grad Term 2}}.
\tag{4.35}
$$

4.2.4.1 Complexity and Efficient Computation
For Grad Terms 1 and 2, one source of computational complexity is the infinite sums $\nabla_\Phi\big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big)$ and $I + T_{T\to T} + T_{T\to T}^2 + \dots$, respectively, where computing each successive power term has $O(|T|^3)$ complexity. However, this computation can be reduced to quadratic complexity using a trick similar to that of Section 4.2.3.1.

First, in Grad Term 2, note that the infinite sum is pre-multiplied by the row vector $\big(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\big)^{T}$. To compute this product, the following iteration can be set up. Initialize variables $v'_0$ and $x'_0$:

$$
v'_0 = \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}, \qquad x'_0 = 0.
\tag{4.36}
$$
Then, carry out the iteration

$$
v'_{n+1} = v'_n\, T_{T\to T}, \qquad x'_{n+1} = x'_n + v'_n
\tag{4.37}
$$

until $\|x'_{n+1} - x'_n\|_\infty \le \varepsilon_{x'}$, where $\varepsilon_{x'}$ is a given tolerance. Next, for Grad Term 1 in Equation (4.35), expand its gradient factor using Equation (4.32):
$$
\text{Grad Term 1} =
\underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right)}_{\text{Term A}}
\;\nabla_\Phi T_{T\to T}\;
\underbrace{\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Term B}}.
\tag{4.38}
$$

Note that Terms A and B are precisely the quantities computed by the iterative schemes of Equations (4.36)–(4.37) and (4.26)–(4.27), respectively.
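Putting these pieces together, Grad Term 1 of Equation (4.38) can be assembled from the two iterative schemes without forming any matrix inverse. In the sketch below, the blocks and the single-parameter gradient $\nabla_\Phi T_{T\to T}$ are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical blocks of the canonical form, plus an invented gradient
# dT_{T->T}/dphi for one scalar parameter phi.
T_TT = np.array([[0.5, 0.2],
                 [0.1, 0.3]])
T_T_to_Rk = np.array([[0.3],
                      [0.6]])
iota_T = np.array([0.4, 0.6])
dT_TT = np.array([[0.05, -0.02],
                  [0.00,  0.01]])

def neumann_left(row, T, eps=1e-12):
    """Term A: row vector times (I + T + T^2 + ...), as in (4.36)-(4.37)."""
    v, x = row.copy(), np.zeros_like(row)
    while np.linalg.norm(v, np.inf) > eps:
        x = x + v
        v = v @ T
    return x

def neumann_right(T, col, eps=1e-12):
    """Term B: (I + T + T^2 + ...) times a column vector, as in (4.26)-(4.27)."""
    v, x = col.copy(), np.zeros_like(col)
    while np.linalg.norm(v, np.inf) > eps:
        x = x + v
        v = T @ v
    return x

term_A = neumann_left(iota_T, T_TT)
term_B = neumann_right(T_TT, T_T_to_Rk @ np.ones(1))
grad_term1 = term_A @ dT_TT @ term_B               # Equation (4.38)

# Cross-check against the explicit inverses of (4.32)
N = np.linalg.inv(np.eye(2) - T_TT)
exact = iota_T @ N @ dT_TT @ N @ T_T_to_Rk @ np.ones(1)
```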
For the gradient of the absorption probability w.r.t. $\Phi$, as shown in Equation (4.35), all that remains is to establish the overall complexity of computing $\nabla_\Phi T$. This can be quite an expensive operation, since the gradient must be taken for each element of $T$ with respect to each $\phi \in \Phi$. In the worst case, the complexity is $O(|S|^2|G|^2|\Phi||Act||O|)$. However, for systems described by sparse transition and observation functions, the practical complexity is $O(c\,|S||G||\Phi||Act|)$ with $c \ll |S||G||O|$.
For the gradient w.r.t. $\Theta$, the complexity of evaluating $\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)$ is $O(|S|^2|\Theta|)$.
4.2.4.2 Gradient Based Optimization
In the preceding sections, the analytical expression for the gradient of the satisfaction probability was derived. This gradient can be used in first-order methods [150] to optimize the objective

$$
\max_{\Phi,\Theta}\, \Pr(PM \models \varphi)
\;\implies\;
\max_{\Phi,\Theta}\, \Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
\;\implies\;
\max_{\Phi,\Theta}\, \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
\tag{4.39}
$$

over the parameters $\Phi$ and $\Theta$.
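A minimal sketch of such a first-order ascent loop is given below. The objective and its gradient here are toy concave surrogates, not the true satisfaction probability, which would instead be evaluated via Equation (4.22) with the gradient of Equations (4.33)–(4.35):

```python
import numpy as np

def J(phi):
    # Toy surrogate objective standing in for Pr(PM |= phi); maximum at phi = 1.
    return -np.sum((phi - 1.0) ** 2)

def grad_J(phi):
    # Gradient of the toy surrogate (in the true problem: Equation (4.33)).
    return -2.0 * (phi - 1.0)

phi = np.zeros(3)          # stand-in for the Phi (or Theta) parameters
step = 0.1                 # fixed step size; line search is also common
for _ in range(200):       # plain gradient ascent
    phi = phi + step * grad_J(phi)
```

In the actual problem a fixed step size may need a line search or projection to keep the FSC parameters valid; this sketch only illustrates the update structure.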