4.2 Maximizing Probability of LTL Satisfaction for an FSC
ordering $\preceq_1$. That is, for any two $R_i, R_j$, either $R_i \preceq_1 R_j$ or $R_j \preceq_1 R_i$, and $\preceq_1$ also satisfies the transitivity and antisymmetry conditions of Definition 4.2.1. Let each $R_k \in \text{RecSets}_{\mathcal{G}(\mathcal{I})}$ have a total order $\preceq_k \subseteq R_k \times R_k$ over its member global states. Further, since some states in the global state space $S \times G$ are transient, collect all transient states in the set $T$ and endow it with an arbitrary but fixed total ordering $\preceq_T \subseteq T \times T$. Finally, define the relation $\preceq$ on the global state space $S \times G$ as follows. For all $[s,g], [s',g'] \in S \times G$,

$$
[s,g] \preceq [s',g'] \quad\text{if and only if}\quad
\begin{cases}
[s,g],[s',g'] \in R_k \text{ and } [s,g] \preceq_k [s',g'], & \text{or,} \\
[s,g] \in R_k,\ [s',g'] \in R_l,\ k \neq l, \text{ and } R_k \preceq_1 R_l, & \text{or,} \\
[s,g] \in R_k \subseteq \text{RecSets}_\mathcal{G} \text{ and } [s',g'] \in T, & \text{or,} \\
[s,g],[s',g'] \in T \text{ and } [s,g] \preceq_T [s',g'].
\end{cases}
\tag{4.19}
$$
Proposition 4.2.2 The relation $\preceq$ in Equation (4.19) defines a total order over the set of states $S \times G$.
Proof Sketch: See Appendix B.2.
When the transition probabilities of the controlled system are written in matrix form, the above ordering results in a canonical block diagonal form

$$
T^{PM_\varphi,\mathcal{G}} =
\begin{bmatrix}
T_{R_1} & 0 & \dots & 0 & 0 \\
0 & T_{R_2} & \dots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \dots & T_{R_N} & 0 \\
T_{T\to R_1} & T_{T\to R_2} & \dots & T_{T\to R_N} & T_{T\to T}
\end{bmatrix}_{|S||G| \times |S||G|}
\tag{4.20}
$$

where the matrices $T_{R_k}$, corresponding to the recurrent sets, are stochastic (each row sums to 1) and can be used directly to represent the restriction of the global Markov chain to $R_k$.
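As an illustration of this canonical form, the following sketch permutes a transition matrix into the state ordering of Equation (4.19) and extracts the blocks of Equation (4.20). The four-state chain, its partition into recurrent sets, and all probability values are invented purely for the example:

```python
import numpy as np

# Hypothetical 4-state chain: state 0 is transient, states {1,2} form
# recurrent set R1, and state {3} forms recurrent set R2.
P = np.array([
    [0.2, 0.3, 0.1, 0.4],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.2, 0.8, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

recurrent_sets = [[1, 2], [3]]   # R_1, R_2, each with its internal order <=_k
transient = [0]                  # T, with its arbitrary fixed order <=_T

# Canonical ordering (4.19): recurrent sets first (in the <=_1 order),
# transient states last.
order = [s for R in recurrent_sets for s in R] + transient
P_canon = P[np.ix_(order, order)]

# Blocks of the canonical form (4.20)
n_rec = sum(len(R) for R in recurrent_sets)
T_R1 = P_canon[:2, :2]                   # stochastic block for R_1
T_TT = P_canon[n_rec:, n_rec:]           # T_{T->T}
T_T_to_R1 = P_canon[n_rec:, :2]          # T_{T->R_1}
```

The recurrent rows of `P_canon` carry no mass into the transient columns, which is exactly the zero block structure of Equation (4.20).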
Similar to the canonical form in Equation (4.20), the same global state ordering of Equation (4.19) results in a block form for the initial distribution of the controlled system:

$$
\vec\iota_{init}^{PM_\varphi,\mathcal{G}} =
\begin{bmatrix}
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_1) \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_2) \\
\vdots \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_N) \\
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)
\end{bmatrix}_{|S||G| \times 1}.
\tag{4.21}
$$
4.2.3 Probability of Absorption in ϕ-feasible Recurrent Sets
Recall that for a given FSC $\mathcal{G} \in \mathcal{G}(\mathcal{I})$ of fixed structure, the recurrent set $\varphi\text{-RecSets}_\mathcal{G}$, a subset of the global state space $S \times G$, denotes the union of all recurrent sets that are $\varphi$-feasible, and is uniquely determined by the structure $\mathcal{I}$. This section aims to compute the probability of absorption into this set, given the initial distribution of the product-POMDP $PM_\varphi$. This probability will be shown to be a function of the parameters $\Phi$ and $\Theta$, and analytical expressions for the probability of absorption in terms of these parameters will be derived.
The probability of absorption into a recurrent set $R_k$ for finite Markov chains is well known [71]. Let $\vec{1}_{M\times 1}$ denote a column vector of size $M$ with all entries equal to 1. Then, using the block decomposition in Equations (4.20) and (4.21), the following holds.
$$
\Lambda(R_k) = \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)\right)^{\!T} \vec{1}_{|R_k|\times 1}}_{\text{Term 1}}
+ \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Term 2}}.
\tag{4.22}
$$
In the above equation, Term 1 is simply the probability that, under the initial distribution $\vec\iota_{init}^{PM_\varphi,\mathcal{G}}$, the initial global state $[s_0, g_0] \in S \times G$ lies in the recurrent set $R_k$. If this is so, any resulting path of the controlled system is guaranteed to remain in $R_k$ forever. Next, Term 2 can be rewritten as
$$
\text{Term 2} = \sum_{t=0}^{\infty} \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} \left(T_{T\to T}\right)^{t}\, T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}.
\tag{4.23}
$$
For each $t$, the corresponding summand is the probability that the execution starts in, and stays within, the transient states in $T$ through time step $t$, and is then absorbed into $R_k$ at time step $t+1$. The following lemma shows that the infinite sum in Equation (4.23) converges.
Lemma 4.2.3 [71]: The limit

$$
\lim_{t\to\infty} \sum_{k=0}^{t} T_{T\to T}^{k}
\tag{4.24}
$$

exists and is equal to $(I - T_{T\to T})^{-1}$.
Equation (4.22) together with Lemma 4.2.3 allows the probability of absorption into any $\varphi$-feasible set to be computed as

$$
\Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Lambda(R_k)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right),
\tag{4.25}
$$

which gives the LTL satisfaction probability as a function of the parameters of $\mathcal{G}$.
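To make Equations (4.22) and (4.25) concrete, the following sketch computes the absorption probabilities for a small chain already in canonical form, using the matrix inverse of Lemma 4.2.3 for the infinite sum. All block values are invented for illustration: two recurrent sets $R_1 = \{0,1\}$ and $R_2 = \{2\}$, and one transient state:

```python
import numpy as np

# Hypothetical canonical-form blocks (cf. Equations (4.20)-(4.21)).
T_TT = np.array([[0.2]])                 # T_{T->T}
T_T_to_R = {0: np.array([[0.3, 0.1]]),   # T_{T->R_1}
            1: np.array([[0.4]])}        # T_{T->R_2}
iota_R = {0: np.array([0.0, 0.0]),       # initial mass on R_1
          1: np.array([0.5])}            # initial mass on R_2
iota_T = np.array([0.5])                 # initial mass on T

N_inv = np.linalg.inv(np.eye(1) - T_TT)  # (I - T_{T->T})^{-1}, Lemma 4.2.3

def absorption_prob(k):
    """Lambda(R_k) via Equation (4.22): Term 1 + Term 2."""
    term1 = iota_R[k].sum()
    term2 = iota_T @ N_inv @ T_T_to_R[k] @ np.ones(T_T_to_R[k].shape[1])
    return term1 + term2

# If both recurrent sets were phi-feasible, the sum in (4.25) would equal 1
# here, since every path of this chain is eventually absorbed.
probs = [absorption_prob(k) for k in (0, 1)]
```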
4.2.3.1 Complexity and Efficient Approximation
Before going into the details of how Equation (4.22) is used to optimize the satisfaction probability of $\varphi$, it is worthwhile to examine the complexity of computing the r.h.s. of that equation. One source of computational complexity is the need to compute the recurrent sets of the global Markov chain; the growth of this complexity is analyzed in Section 4.5.1. Here, the computational complexity arising from the infinite sum in Term 2 of Equation (4.22) is presented.
Lemma 4.2.3 offers one method of computing the r.h.s., replacing the infinite sum $I + T_{T\to T} + T_{T\to T}^2 + \dots$ by the inverse $(I - T_{T\to T})^{-1}$, which has complexity $O(|T|^3)$. A less computationally intensive way to compute Term 2 is to recognize that this sum is ultimately multiplied by $T_{T\to R_k}\vec{1}_{|R_k|\times 1}$, which evaluates to a column vector of size $|T| \times 1$. As suggested in [1], this allows an iterative method of lower complexity, as follows. Initialize variables $v_0$ and $x_0$:

$$
v_0 = T_{T\to R_k}\,\vec{1}_{|R_k|\times 1}, \qquad x_0 = 0.
\tag{4.26}
$$
Then iterate the equations

$$
v_{n+1} = T_{T\to T}\, v_n, \qquad x_{n+1} = x_n + v_n
\tag{4.27}
$$

until convergence, i.e., for some tolerance $\varepsilon_x > 0$, there exists $N_x$ such that $\|x_N - x_{N-1}\|_\infty \le \varepsilon_x$ for all $N \ge N_x$. The convergence of Equation (4.27) is guaranteed because it is well known that $T_{T\to T}^n \to 0$ as $n \to \infty$ [71]. Then, use the approximation

$$
\text{Term 2} \approx \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} x_{N_x}.
\tag{4.28}
$$
This approximation method has complexity $O(|T|^2 N_x)$. In fact, if the underlying POMDP's transition distribution is sparse, then sparse matrix multiplication and addition can be used in the approximation method, reducing the practical complexity to $O(c\,|T| N_x)$ with $c \ll |T|$. The sparsity assumption holds in many engineering examples, such as robotic systems in which only a few other states are reachable from any particular state. The constant $c$ depends on the sparsity level.
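The iterative scheme of Equations (4.26)–(4.28) can be sketched as follows. The blocks are hypothetical, and the stopping test uses $\|v_n\|_\infty$, which here equals $\|x_{n+1} - x_n\|_\infty$ since the iterates differ by exactly $v_n$:

```python
import numpy as np

# Hypothetical substochastic transient block and absorption block; any
# T_{T->T} with spectral radius < 1 works.
T_TT = np.array([[0.5, 0.2],
                 [0.1, 0.3]])            # T_{T->T}
T_T_to_Rk = np.array([[0.3],
                      [0.6]])            # T_{T->R_k}
iota_T = np.array([0.4, 0.6])            # iota_init(T)

def term2_iterative(T_TT, T_T_to_Rk, iota_T, eps=1e-12):
    """Approximate Term 2 of (4.22) via the iteration (4.26)-(4.27)."""
    v = T_T_to_Rk @ np.ones(T_T_to_Rk.shape[1])   # v_0, a |T|-vector
    x = np.zeros_like(v)                          # x_0
    while np.linalg.norm(v, np.inf) > eps:        # |x_{n+1} - x_n| = |v_n|
        x = x + v
        v = T_TT @ v
    return iota_T @ x                             # Equation (4.28)

# Cross-check against the direct inverse of Lemma 4.2.3
exact = iota_T @ np.linalg.inv(np.eye(2) - T_TT) @ T_T_to_Rk @ np.ones(1)
approx = term2_iterative(T_TT, T_T_to_Rk, iota_T)
```

In this toy chain the transient rows place all remaining mass on $R_k$, so absorption into $R_k$ is certain and both computations return 1.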
4.2.4 Gradient of Probability of Absorption
Equation (4.22) provides an expression for the probability of absorption; an analytical expression for its gradient with respect to $\Phi$ and $\Theta$ can also be derived. In Terms 1 and 2 of Equation (4.22), the expression $\vec\iota_{init}^{PM_\varphi,\mathcal{G}}$ is a function of the parameters $\Theta$ only, via the initial FSC node distribution $\kappa$. In Term 2, the expressions $\lim_{t\to\infty}\sum_{k=0}^{t} T_{T\to T}^{k}$ and $T_{T\to R_k}$ are functions of the $\Phi$ parameters only. It therefore suffices to provide the derivatives of Terms 1 and 2 w.r.t. $\Theta$, and the derivative of Term 2 w.r.t. $\Phi$. The rest of this section computes these quantities.
From Equation (3.1) in the definition of a global Markov chain,

$$
\vec\iota_{init}^{PM_\varphi,\mathcal{G}}([s,g]) = \vec\iota_{init}^{PM_\varphi}(s)\,\kappa\!\left(g \,\middle|\, \vec\iota_{init}^{PM_\varphi}, \Theta\right)
\;\implies\;
\frac{\partial\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}([s,g])}{\partial \theta_i}
= \vec\iota_{init}^{PM_\varphi}(s)\,\frac{\partial\, \kappa\!\left(g \,\middle|\, \vec\iota_{init}^{PM_\varphi}, \Theta\right)}{\partial \theta_i},
\tag{4.29}
$$

where $\frac{\partial \kappa(g \mid \vec\iota_{init}^{PM_\varphi}, \Theta)}{\partial \theta_i}$ can be computed using Equation (4.17).
Next, it is shown how to compute the gradient of a general entry of the matrix $T^{PM_\varphi,\mathcal{G}}$. From Equation (3.2),

$$
T^{PM_\varphi,\mathcal{G}}\!\left([s',g'] \,\middle|\, [s,g], \Phi\right)
= \sum_{o\in O}\sum_{\alpha\in Act} O(o\mid s)\,\omega(g',\alpha \mid g,o)\,T(s'\mid s,\alpha)
$$

$$
\implies\quad
\frac{\partial\, T^{PM_\varphi,\mathcal{G}}\!\left([s',g'] \mid [s,g], \Phi\right)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}
= \sum_{o\in O}\sum_{\alpha\in Act} O(o\mid s)\,\frac{\partial\, \omega(g',\alpha \mid g,o)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}\,T(s'\mid s,\alpha),
\tag{4.30}
$$

where $\frac{\partial \omega(g',\alpha\mid g,o)}{\partial \phi_{\bar g\bar\alpha\mid\bar{\bar g}\bar o}}$ is computed using Equation (4.15).
Finally, the following shows how to compute the gradient of the infinite sum in Term 2. From Lemma 4.2.3,

$$
\lim_{t\to\infty}\sum_{k=0}^{t} T_{T\to T}^{k} = (I - T_{T\to T})^{-1}.
\tag{4.31}
$$

This implies that

$$
\begin{aligned}
\nabla_\Phi\Big(\lim_{t\to\infty}\textstyle\sum_{k=0}^{t} T_{T\to T}^{k}\Big)
&= \nabla_\Phi\big((I - T_{T\to T})^{-1}\big) \\
&= -(I - T_{T\to T})^{-1}\,\nabla_\Phi (I - T_{T\to T})\,(I - T_{T\to T})^{-1} \\
&= +(I - T_{T\to T})^{-1}\,\nabla_\Phi T_{T\to T}\,(I - T_{T\to T})^{-1} \\
&= \big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big)\,\nabla_\Phi T_{T\to T}\,\big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big),
\end{aligned}
\tag{4.32}
$$

where the first equality implies the second via standard linear algebra identities [112], and can be derived easily by differentiating both sides of the equation $(I - T_{T\to T})(I - T_{T\to T})^{-1} = I$. Thus the computation has been reduced to computing $\nabla_\Phi T_{T\to T}$, which is done using Equation (4.30).
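The identity in Equation (4.32) can be checked numerically. The sketch below compares the analytical gradient of $(I - T_{T\to T})^{-1}$ against a finite-difference estimate on a hypothetical, smoothly parameterized transient block $T_{T\to T}(\phi)$:

```python
import numpy as np

def T_of(phi):
    # Hypothetical substochastic 2x2 transient block depending smoothly on
    # a single scalar parameter phi (rows sum to less than 1).
    return np.array([[0.3 * np.tanh(phi) + 0.3, 0.1],
                     [0.2, 0.25 * np.sin(phi) + 0.3]])

phi, h = 0.7, 1e-6
I = np.eye(2)

# Finite-difference gradient of the inverse (I - T(phi))^{-1}
fd = (np.linalg.inv(I - T_of(phi + h))
      - np.linalg.inv(I - T_of(phi - h))) / (2 * h)

# Analytical gradient per Equation (4.32):
# (I - T)^{-1} (dT/dphi) (I - T)^{-1}
inv = np.linalg.inv(I - T_of(phi))
dT = (T_of(phi + h) - T_of(phi - h)) / (2 * h)   # central-difference dT/dphi
analytic = inv @ dT @ inv
```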
In closing, the aggregate of these computations yields the gradient of the probability of satisfaction of $\varphi$ when the structure of the FSC is fixed. For $\nabla \in \{\nabla_\Theta, \nabla_\Phi\}$,

$$
\nabla \Pr(PM \models \varphi)
= \nabla \Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \nabla \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right).
\tag{4.33}
$$
From Equation (4.22),

$$
\nabla_\Theta \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \left(\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)\right)^{\!T} \vec{1}_{|R_k|\times 1}
+ \left(\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1},
\tag{4.34}
$$

and
$$
\nabla_\Phi \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
= \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T} \nabla_\Phi\!\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Grad Term 1}}
+ \underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) \nabla_\Phi T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Grad Term 2}}.
\tag{4.35}
$$

4.2.4.1 Complexity and Efficient Computation
For Grad Terms 1 and 2, one source of computational complexity is the infinite sums $\nabla_\Phi\big(I + T_{T\to T} + T_{T\to T}^2 + \dots\big)$ and $I + T_{T\to T} + T_{T\to T}^2 + \dots$, respectively, where computing each successive power term has $O(|T|^3)$ complexity. However, this computation can be reduced to quadratic complexity using a trick similar to that of Section 4.2.3.1.

First, in Grad Term 2, note that the infinite sum is pre-multiplied by the row vector $\big(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\big)^{T}$. To compute this product, the following iteration can be set up. Initialize variables $v'_0$ and $x'_0$:

$$
v'_0 = \left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}, \qquad x'_0 = 0.
\tag{4.36}
$$
Then, carry out the iteration

$$
v'_{n+1} = v'_n\, T_{T\to T}, \qquad x'_{n+1} = x'_n + v'_n
\tag{4.37}
$$

until $\|x'_{n+1} - x'_n\|_\infty \le \varepsilon_{x'}$, where $\varepsilon_{x'}$ is a given tolerance. Next, for Grad Term 1 in Equation (4.35), expand its gradient factor using Equation (4.32):
$$
\text{Grad Term 1} =
\underbrace{\left(\vec\iota_{init}^{PM_\varphi,\mathcal{G}}(T)\right)^{\!T}\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right)}_{\text{Term A}}
\;\nabla_\Phi T_{T\to T}\;
\underbrace{\left(I + T_{T\to T} + T_{T\to T}^2 + \dots\right) T_{T\to R_k}\, \vec{1}_{|R_k|\times 1}}_{\text{Term B}}.
\tag{4.38}
$$

Note that Terms A and B are precisely the quantities computed by the iterative schemes of Equations (4.36)–(4.37) and (4.26)–(4.27), respectively.
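Putting these pieces together, Grad Term 1 of Equation (4.38) can be assembled from the two iterative schemes without forming any matrix inverse. In the sketch below, the blocks and the single-parameter gradient $\nabla_\Phi T_{T\to T}$ are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical blocks of the canonical form, plus an invented gradient
# dT_{T->T}/dphi for one scalar parameter phi.
T_TT = np.array([[0.5, 0.2],
                 [0.1, 0.3]])
T_T_to_Rk = np.array([[0.3],
                      [0.6]])
iota_T = np.array([0.4, 0.6])
dT_TT = np.array([[0.05, -0.02],
                  [0.00,  0.01]])

def neumann_left(row, T, eps=1e-12):
    """Term A: row vector times (I + T + T^2 + ...), as in (4.36)-(4.37)."""
    v, x = row.copy(), np.zeros_like(row)
    while np.linalg.norm(v, np.inf) > eps:
        x = x + v
        v = v @ T
    return x

def neumann_right(T, col, eps=1e-12):
    """Term B: (I + T + T^2 + ...) times a column vector, as in (4.26)-(4.27)."""
    v, x = col.copy(), np.zeros_like(col)
    while np.linalg.norm(v, np.inf) > eps:
        x = x + v
        v = T @ v
    return x

term_A = neumann_left(iota_T, T_TT)
term_B = neumann_right(T_TT, T_T_to_Rk @ np.ones(1))
grad_term1 = term_A @ dT_TT @ term_B               # Equation (4.38)

# Cross-check against the explicit inverses of (4.32)
N = np.linalg.inv(np.eye(2) - T_TT)
exact = iota_T @ N @ dT_TT @ N @ T_T_to_Rk @ np.ones(1)
```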
For the gradient of the absorption probability w.r.t. $\Phi$, as shown in Equation (4.35), all that remains is to establish the overall complexity of computing $\nabla_\Phi T$. This can be quite an expensive operation, since the gradient must be taken for each element of $T$ with respect to each $\phi \in \Phi$. In the worst case, the complexity is $O(|S|^2|G|^2|\Phi||Act||O|)$. However, for systems described by sparse transition and observation functions, the practical complexity is $O(c\,|S||G||\Phi||Act|)$ with $c \ll |S||G||O|$.
For the gradient w.r.t. $\Theta$, the complexity of evaluating $\nabla_\Theta\, \vec\iota_{init}^{PM_\varphi,\mathcal{G}}(R_k)$ is $O(|S|^2|\Theta|)$.
4.2.4.2 Gradient Based Optimization
In the preceding sections, the analytical expression for the gradient of the satisfaction probability was derived. This gradient can be used in first-order methods [150] to optimize the objective

$$
\max_{\Phi,\Theta}\, \Pr(PM \models \varphi)
\;\implies\;
\max_{\Phi,\Theta}\, \Pr\!\left(\pi \to \varphi\text{-RecSets}_\mathcal{G} \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
\;\implies\;
\max_{\Phi,\Theta}\, \sum_{R_k \subseteq \varphi\text{-RecSets}_\mathcal{G}} \Pr\!\left(\pi \to R_k \,\middle|\, \vec\iota_{init}^{PM_\varphi}\right)
\tag{4.39}
$$

over the parameters $\Phi$ and $\Theta$.
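A minimal sketch of such a first-order ascent loop is given below. The objective and its gradient here are toy concave surrogates, not the true satisfaction probability, which would instead be evaluated via Equation (4.22) with the gradient of Equations (4.33)–(4.35):

```python
import numpy as np

def J(phi):
    # Toy surrogate objective standing in for Pr(PM |= phi); maximum at phi = 1.
    return -np.sum((phi - 1.0) ** 2)

def grad_J(phi):
    # Gradient of the toy surrogate (in the true problem: Equation (4.33)).
    return -2.0 * (phi - 1.0)

phi = np.zeros(3)          # stand-in for the Phi (or Theta) parameters
step = 0.1                 # fixed step size; line search is also common
for _ in range(200):       # plain gradient ascent
    phi = phi + step * grad_J(phi)
```

In the actual problem a fixed step size may need a line search or projection to keep the FSC parameters valid; this sketch only illustrates the update structure.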