Before proposing algorithms to solve the quantitative problems mentioned above, note that LTL formulas are verified over infinite executions of the POMDP, with the requirement that certain states be visited infinitely often while others are avoided entirely after finitely many execution steps.

Since a finite state controller leads to a finite state space Markov chain when controlling a POMDP, the long-term (steady-state) behavior of finite Markov chains is the key to synthesizing controllers that satisfy LTL formulas. This observation also relates to the qualitative questions posed in Section 3.3.1. On the other hand, the short-term (transient) behavior will be crucial to the analysis of the quantitative problems in Section 3.3.2.

In this section, a few well-known properties of Markov chains, especially those with finite state space, are reviewed. A full mathematical background can be found in [56, 71, 101, 126, 127]. In the following, I will consider a Markov chain $M$ with state space $S$, transition probability defined as the conditional distribution $T(\cdot \mid s) : S \to [0,1]$ such that
$$\sum_{s' \in S} T(s' \mid s) = 1, \quad \forall s \in S,$$
and the initial distribution $\iota_{init}$ such that
$$\sum_{s \in S} \iota_{init}(s) = 1.$$

All probabilities and expectation operators assume the underlying probability space associated with paths $\pi = s_0 s_1 \ldots$, as described in Section 2.2.4, via cylinder sets.

Note that the transition probabilities $T$ form a linear operator which can be represented as a matrix. Henceforth $T$ will denote both the individual conditional distributions and the overall matrix representation. The conditional-distribution form is meant when $T$ takes explicit arguments, usually written $T(\cdot \mid \cdot)$; the matrix form is meant in expressions involving vectors and matrices.

$$T := \begin{bmatrix} T_{11} & T_{12} & \ldots & T_{1|S|} \\ T_{21} & T_{22} & \ldots & T_{2|S|} \\ \vdots & & \ddots & \vdots \\ T_{|S|1} & T_{|S|2} & \ldots & T_{|S||S|} \end{bmatrix} : M_S \to M_S \tag{3.14}$$

where $T_{ij} = T(s_j \mid s_i)$.

Next, a distribution or belief $\vec{b}_t$ over states of the Markov chain at some time $t$ can be written as a row vector
$$\vec{b}_t = \begin{pmatrix} b_t(s_1) & b_t(s_2) & \ldots & b_t(s_{|S|}) \end{pmatrix}. \tag{3.15}$$
Then the operator $T$ maps $\vec{b}_t$ to the state distribution or belief $\vec{b}_{t+1}$ at time $t+1$. Using the matrix representation, this is $\vec{b}_{t+1} = \vec{b}_t T$.
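As a concrete illustration of this belief update, here is a minimal numerical sketch; the 3-state chain and its transition matrix are hypothetical, chosen only so that each row sums to one.

```python
import numpy as np

# A hypothetical 3-state Markov chain; row i is the conditional
# distribution T(.|s_i), so each row sums to 1.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Initial belief (row vector): start in state s_1 with certainty.
b = np.array([1.0, 0.0, 0.0])

# One application of the operator: b_{t+1} = b_t T.
b_next = b @ T
print(b_next)  # [0.5 0.5 0. ]
```

Iterating `b = b @ T` propagates the belief forward one step at a time; note the row-vector convention, which matches Equation (3.15).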

Definition 3.4.1 (Occupation Time, First Return Time and Return Probability) Let $\pi = s_0 s_1 \ldots$ be a path in the Markov chain and $A \subseteq S$.

(a) The variable
$$f_A := \sum_{t=1}^{\infty} \mathbb{1}(s_t \in A) \tag{3.16}$$
is called the occupation time of set $A$, and
$$\mathbb{1}(\varphi) = \begin{cases} 1 & \text{the mathematical statement } \varphi \text{ holds}, \\ 0 & \text{otherwise}, \end{cases} \tag{3.17}$$
is the indicator function. Thus $f_A$ counts the number of times the set $A$ is visited after time step $0$.

(b) Next, the variable
$$\tau_A := \min\{t \ge 1 \mid s_t \in A\} \tag{3.18}$$
is called the first return time, denoting the first time after time $0$ that set $A$ is visited.

(c) Lastly, define
$$L(s, A) := \Pr(\tau_A < \infty \mid s_0 = s) \tag{3.19}$$
as the return probability. It denotes the probability of set $A$ being visited in finite time when the start state is $s$.

By abuse of notation, when $A$ is a singleton set, i.e., $A = \{s'\}$ for some $s' \in S$, then $f_{s'}$, $\tau_{s'}$ and $L(s, s')$ will respectively denote the occupation time, first return time and return probability.
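These quantities lend themselves to straightforward simulation. The sketch below, over a hypothetical 3-state chain in which state 2 is absorbing, estimates the return probability $L(0, \{2\})$ by Monte Carlo; since the chain is eventually absorbed in state 2 almost surely, the estimate should be close to 1.

```python
import random

# Hypothetical 3-state chain as a dict of (successor, probability) lists.
T = {0: [(0, 0.5), (1, 0.5)],
     1: [(1, 0.5), (2, 0.5)],
     2: [(2, 1.0)]}           # state 2 is absorbing

def sample_next(s):
    r, acc = random.random(), 0.0
    for s1, p in T[s]:
        acc += p
        if r < acc:
            return s1
    return T[s][-1][0]

def first_return_time(s0, A, horizon=10_000):
    """tau_A truncated at `horizon`; returns None if A is not hit."""
    s = s0
    for t in range(1, horizon + 1):
        s = sample_next(s)
        if s in A:
            return t
    return None

# Estimate L(0, {2}) as the fraction of sampled paths that hit {2}.
random.seed(0)
hits = sum(first_return_time(0, {2}) is not None for _ in range(2000))
print(hits / 2000)  # close to 1.0
```

Starting from state 2 itself, $\tau_{\{2\}} = 1$ deterministically, since the self-loop keeps the chain there.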

Definition 3.4.2 (Communicating Classes) The state $s \in S$ is said to lead to state $s' \in S$, denoted $s \to s'$, if $L(s, s') > 0$. By convention $s \to s$. Next, distinct states $s, s'$ are said to communicate, denoted $s \leftrightarrow s'$, when $L(s, s') > 0$ and $L(s', s) > 0$. Moreover, the relation "$\leftrightarrow$" is an equivalence relation, and the equivalence classes $C(s) = \{s' : s \leftrightarrow s'\}$ cover $S$, with $s \in C(s)$ [56].
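Since $s \to s'$ holds exactly when some positive-probability path connects the two states, communicating classes can be computed from the transitive closure of the one-step reachability relation. A minimal sketch, over a hypothetical 4-state chain with two classes and a one-way leak between them:

```python
import numpy as np

# Hypothetical 4-state chain: states 0,1 communicate; 2,3 communicate;
# there is a one-way leak from class {0,1} into class {2,3}.
T = np.array([
    [0.8, 0.1, 0.1, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
])

# s -> s' iff L(s, s') > 0, i.e. a positive-probability path exists.
# Compute the transitive closure of the edge relation (s -> s by
# convention, hence the identity).
n = len(T)
reach = (T > 0) | np.eye(n, dtype=bool)
for k in range(n):                        # Warshall-style closure
    reach |= reach[:, [k]] & reach[[k], :]

# s and s' communicate iff each reaches the other.
comm = reach & reach.T
classes = {frozenset(np.flatnonzero(row)) for row in comm}
print(sorted(sorted(c) for c in classes))  # [[0, 1], [2, 3]]
```

Note that $\{0, 1\}$ is a communicating class even though it is not absorbing: the leak $0 \to 2$ does not destroy communication within the class.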

Definition 3.4.3 (Irreducibility and Absorbing Sets) If $C(s) = S$ for some $s \in S$, then the Markov chain $M$ is called irreducible. This means that all states communicate. In addition, $C(s)$ is absorbing if
$$\sum_{s'' \in C(s)} T(s'' \mid s') = 1 \quad \forall s' \in C(s). \tag{3.20}$$

Definition 3.4.4 (Restriction of $M$ to an Absorbing Set) Let $C \subseteq S$ be an absorbing set. Then by Definition 3.4.3, if the initial state $s_0$ lies in $C$, then for any path $\pi = s_0 s_1 \ldots$, $s_t$ lies in $C$ for all $t \ge 0$. Hence, the Markov chain can be studied exclusively in the smaller state space $C$; the resulting chain is called the restriction of $M$ to $C$ and is denoted by $M_{S|C}$.

An absorbing set is alternatively called invariant or stochastically closed. It is possible that some communicating class $C(s)$ is not absorbing; in such a case there exists $s' \notin C(s)$ such that $s \to s'$. An absorbing set is said to be minimal if it does not contain a proper subset that is absorbing. A Markov chain $M$ is indecomposable if $S$ does not contain two disjoint absorbing sets.
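The absorbing condition (3.20) is a simple row-sum check over the candidate set. A sketch, reusing a hypothetical 4-state chain in which $\{2, 3\}$ is absorbing while the communicating class $\{0, 1\}$ leaks mass into it:

```python
import numpy as np

# Hypothetical 4-state chain: {2, 3} is absorbing; {0, 1} is not.
T = np.array([
    [0.8, 0.1, 0.1, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
])

def is_absorbing(T, C):
    """Check Eq. (3.20): sum_{s'' in C} T(s''|s') = 1 for all s' in C."""
    idx = sorted(C)
    return bool(np.allclose(T[np.ix_(idx, idx)].sum(axis=1), 1.0))

print(is_absorbing(T, {2, 3}))  # True
print(is_absorbing(T, {0, 1}))  # False: state 0 sends mass 0.1 outside
```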

Definition 3.4.5 (Recurrence and Transience) The state $s \in S$ is called recurrent if $\mathbb{E}[f_s \mid s_0 = s] = \infty$ and transient if $\mathbb{E}[f_s \mid s_0 = s] < \infty$, with $f_s$ given by Equation (3.16).

Recurrence and transience are class properties. In fact, recurrent classes coincide with minimal absorbing classes. Furthermore, let $m_s = \mathbb{E}[\tau_s \mid s_0 = s]$. Then state $s \in S$ is called positive recurrent if $m_s < \infty$, and null recurrent if $m_s = \infty$. In a recurrent class, either all states are positive recurrent or all are null recurrent. In addition, for a finite discrete-time Markov chain, all recurrent classes are positive recurrent [56].

Definition 3.4.6 (Ergodic Markov Chain) A Markov chain $M$ is said to be ergodic if the whole state space $S$ is a single recurrent class. Equivalently, it is ergodic if it is irreducible and positive recurrent.

Definition 3.4.7 (Invariant and Ergodic Probability Measures) Let $\nu \in M_S$ be a probability measure (p.m.) on $S$. Then $\nu$ is an invariant p.m. if it remains unchanged when operated upon by the transition operator $T$. In vector/matrix representation, this can be written as
$$\vec{\nu} T = \vec{\nu}. \tag{3.21}$$
An invariant p.m. $\nu$ is ergodic if $\nu(A) = 0$ or $\nu(A) = 1$ for every invariant set $A \subseteq S$. Here
$$\nu(A) = \sum_{s \in A} \nu(s).$$

Proposition 3.4.8 [56] Let $T$ be the transition probability function of Markov chain $M$. If $T$ has a unique invariant p.m. $\nu$, then $\nu$ is ergodic.
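For a finite chain, an invariant p.m. can be computed by solving the linear system $\vec{\nu} T = \vec{\nu}$ together with the normalization $\sum_{s} \nu(s) = 1$. A minimal sketch with a hypothetical irreducible 3-state chain:

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# nu T = nu means nu is a left eigenvector of T with eigenvalue 1.
# Solve (T^T - I) nu^T = 0 jointly with sum(nu) = 1 by least squares.
n = len(T)
A = np.vstack([T.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
nu, *_ = np.linalg.lstsq(A, b, rcond=None)

print(nu)                       # the unique invariant p.m.
print(np.allclose(nu @ T, nu))  # True: invariance check
```

Since the chain is irreducible, the invariant p.m. is unique, and by Proposition 3.4.8 it is ergodic.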

Definition 3.4.9 (Occupation Measures) Define the $t$-step expected occupation measure with initial state $s_0$ as
$$T^{(t)}(A \mid s_0) := \sum_{s \in A} \frac{1}{t} \sum_{k=0}^{t-1} T^k(s \mid s_0), \quad A \subseteq S, \; t = 1, 2, \ldots \tag{3.22}$$
Note that on the r.h.s. of Equation (3.22), $T^k$ is the composition of $T$ with itself $k - 1$ times, i.e.,
$$T^k = \underbrace{T \circ \cdots \circ T}_{k-1 \text{ times}} \circ\, T.$$
It has the effect of transforming a belief or distribution at time step $t$ into the distribution at time step $t + k$. In matrix notation, this computation is realized by taking the $k$th power of the matrix $T$, to get $T^k$.

Additionally, an empirical or pathwise occupation measure can be defined as follows:
$$\pi^{(t)}(A) = \frac{1}{t} \sum_{k=1}^{t} \mathbb{1}(s_k \in A), \quad A \subseteq S, \; t = 1, 2, \ldots \tag{3.23}$$

Proposition 3.4.10 [56] The expected value of the pathwise occupation measure is the $t$-step expected occupation measure:
$$\mathbb{E}\left[\pi^{(t)}(A) \,\middle|\, s_0\right] = T^{(t)}(A \mid s_0), \quad \forall t \ge 1. \tag{3.24}$$
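Proposition 3.4.10 can be spot-checked by Monte Carlo: average the pathwise occupation measure over many sampled paths and compare with the expected occupation measure computed from powers of $T$. The sketch below uses a hypothetical 3-state chain; to keep the comparison like-for-like, both sides here sum over steps $1$ through $t$ (Equations (3.22) and (3.23) index from $k = 0$ and $k = 1$ respectively, which differ by a one-step shift).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])
s0, A, t = 0, [1, 2], 20

# Expected side: (1/t) * sum of T^k(A|s0) over steps k = 1..t.
Tk = np.eye(3)
expected = 0.0
for _ in range(t):
    Tk = Tk @ T                 # T^k for k = 1..t
    expected += Tk[s0, A].sum()
expected /= t

# Pathwise side: average of (1/t) * #{1 <= k <= t : s_k in A}
# over many simulated paths.
runs, total = 5000, 0.0
for _ in range(runs):
    s, hits = s0, 0
    for _ in range(t):
        s = rng.choice(3, p=T[s])
        hits += s in A
    total += hits / t

print(abs(expected - total / runs))  # small, up to Monte Carlo error
```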

Proposition 3.4.11 [56] (a) For every $s, s' \in S$ the following limit exists:
$$\lim_{t \to \infty} T^{(t)}(s' \mid s) = \lim_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} T^k(s' \mid s) = \begin{cases} \rho_{s' \mid s} & \text{if } s' \text{ is recurrent}, \\ 0 & \text{if } s' \text{ is transient}. \end{cases}$$

(b) For every positive recurrent state $s \in S$ with period $d_s$,
$$\lim_{t \to \infty} T^{t d_s}(s \mid s) = \frac{d_s}{m_s},$$
where $m_s := \mathbb{E}[\tau_s \mid s_0 = s]$ is the expected time of the first return to state $s$ when starting in $s$.

(c) Let $C = \{s_{c_1}, s_{c_2}, \ldots, s_{c_{|C|}}\} \subseteq S$ be a recurrent class and $s_c, s'_c \in C$. Then $\rho_{s_c \mid s'_c} = \nu(s_c)$ is independent of $s'_c$. In addition, the collection $\nu(s_{c_1}), \nu(s_{c_2}), \ldots, \nu(s_{c_{|C|}})$ gives the unique invariant p.m. of the restriction of $M$ to the class $C$.

Definition 3.4.12 (Limiting Matrix) From Proposition 3.4.11, the matrix representation of $T^{(t)}$ is given by the Cesàro sum [71],
$$T^{(t)} = \frac{1}{t} \sum_{k=0}^{t-1} T^k, \quad t = 1, 2, \ldots \tag{3.25}$$
and the limiting matrix
$$\Pi := \lim_{t \to \infty} T^{(t)} \tag{3.26}$$
exists for all finite Markov chains.
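Numerically, $\Pi$ can be approximated by truncating the Cesàro sum at a large $t$. A sketch with a hypothetical irreducible 3-state chain, for which every row of $\Pi$ converges to the same invariant p.m. (Proposition 3.4.11(c)):

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Truncated Cesaro sum: T^(t) = (1/t) * sum_{k=0}^{t-1} T^k.
t = 20000
Tk = np.eye(3)
Pi = np.zeros_like(T)
for _ in range(t):
    Pi += Tk
    Tk = Tk @ T
Pi /= t

print(Pi)  # rows are (approximately) identical: the invariant p.m.
# Pi T - Pi = (1/t)(T^t - I), so invariance holds up to O(1/t):
print(np.allclose(Pi @ T, Pi, atol=1e-3))  # True
```

The error of the truncated average decays like $O(1/t)$, which is why a fairly large $t$ is used.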

Proposition 3.4.13 Given the limiting matrix $\Pi$, the limit of the Cesàro sum of the transition matrix $T$, the quantity $I - T + \Pi$ is non-singular and its inverse
$$Z := (I - T + \Pi)^{-1} \tag{3.27}$$
is called the fundamental matrix [15, 56, 123].
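The non-singularity claim and the fundamental matrix itself can be checked directly in a numerical sketch; the 3-state chain is hypothetical, and $\Pi$ is approximated by a finite Cesàro average as in Equation (3.26).

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Approximate Pi by a truncated Cesaro average.
t = 20000
Tk, Pi = np.eye(3), np.zeros_like(T)
for _ in range(t):
    Pi += Tk
    Tk = Tk @ T
Pi /= t

# I - T + Pi is non-singular (Prop. 3.4.13), so Z is well defined.
M = np.eye(3) - T + Pi
print(abs(np.linalg.det(M)) > 1e-9)   # True
Z = np.linalg.inv(M)                  # the fundamental matrix
print(np.allclose(Z @ M, np.eye(3)))  # True
```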