Before proposing algorithms to solve the quantitative problems mentioned above, note that LTL formulas are verified over infinite executions of the POMDP, with the requirement that certain states be visited infinitely often while others are avoided entirely after finitely many execution steps.

Since a finite state controller leads to a finite state space Markov chain when controlling a POMDP, the long-term (steady-state) behavior of finite Markov chains is the key to synthesizing controllers that satisfy LTL formulas. This observation also relates to the qualitative questions posed in Section 3.3.1. On the other hand, the short-term (transient) behavior will be crucial to the analysis of the quantitative problems in Section 3.3.2.

In this section, a few well-known properties of Markov chains, especially those with finite state space, are reviewed. A full mathematical background can be found in [56, 71, 101, 126, 127]. In the following, I will consider a Markov chain $M$ with state space $S$, transition probability defined as the conditional distribution $T(\cdot \mid s) : S \to [0,1]$ such that
$$\sum_{s' \in S} T(s' \mid s) = 1, \quad \forall s \in S,$$
and the initial distribution $\iota_{init}$ such that
$$\sum_{s \in S} \iota_{init}(s) = 1.$$

All probabilities and expectation operators assume the underlying probability space associated with paths $\pi = s_0 s_1 \ldots$, as described in Section 2.2.4, via cylinder sets.

Note that the transition probabilities $T$ form a linear operator which can be represented as a matrix. Henceforth $T$ will denote both the individual conditional distributions and the overall matrix representation. The conditional-distribution form is meant when $T$ takes explicit arguments, usually written $T(\cdot \mid \cdot)$; the matrix form is meant in expressions involving vectors and matrices.

$$T := \begin{bmatrix} T_{11} & T_{12} & \ldots & T_{1|S|} \\ T_{21} & T_{22} & \ldots & T_{2|S|} \\ \vdots & & \ddots & \vdots \\ T_{|S|1} & T_{|S|2} & \ldots & T_{|S||S|} \end{bmatrix} : M_S \to M_S \tag{3.14}$$

where $T_{ij} = T(s_j \mid s_i)$.

Next, a distribution or belief $\vec{b}_t$ over states of the Markov chain at some time $t$ can be written as a row vector
$$\vec{b}_t = \begin{pmatrix} b_t(s_1) & b_t(s_2) & \ldots & b_t(s_{|S|}) \end{pmatrix}. \tag{3.15}$$
Then the operator $T$ maps $\vec{b}_t$ to the state distribution or belief $\vec{b}_{t+1}$ at time $t+1$. Using the matrix representation, this is $\vec{b}_{t+1} = \vec{b}_t T$.
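As a concrete illustration of this belief update, here is a minimal numerical sketch; the 3-state chain and its transition matrix are hypothetical, chosen only so that each row sums to one.

```python
import numpy as np

# A hypothetical 3-state Markov chain; row i is the conditional
# distribution T(.|s_i), so each row sums to 1.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Initial belief (row vector): start in state s_1 with certainty.
b = np.array([1.0, 0.0, 0.0])

# One application of the operator: b_{t+1} = b_t T.
b_next = b @ T
print(b_next)  # [0.5 0.5 0. ]
```

Iterating `b = b @ T` propagates the belief forward one step at a time; note the row-vector convention, which matches Equation (3.15).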

Definition 3.4.1 (Occupation Time, First Return Time and Return Probability) Let $\pi = s_0 s_1 \ldots$ be a path in the Markov chain and $A \subseteq S$.

(a) The variable
$$f_A := \sum_{t=1}^{\infty} \mathbb{1}(s_t \in A) \tag{3.16}$$
is called the occupation time of set $A$, and
$$\mathbb{1}(\varphi) = \begin{cases} 1 & \text{the mathematical statement } \varphi \text{ holds}, \\ 0 & \text{otherwise}, \end{cases} \tag{3.17}$$
is the indicator function. Thus $f_A$ counts the number of times the set $A$ is visited after time step $0$.

(b) Next, the variable
$$\tau_A := \min\{t \ge 1 \mid s_t \in A\} \tag{3.18}$$
is called the first return time, denoting the first time after time $0$ that set $A$ is visited.

(c) Lastly, define
$$L(s, A) := \Pr(\tau_A < \infty \mid s_0 = s) \tag{3.19}$$
as the return probability. It denotes the probability of set $A$ being visited in finite time when the start state is $s$.

By abuse of notation, when $A$ is a singleton set, i.e., $A = \{s'\}$ for some $s' \in S$, then $f_{s'}$, $\tau_{s'}$ and $L(s, s')$ will respectively denote the occupation time, first return time and return probability.
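These quantities lend themselves to straightforward simulation. The sketch below, over a hypothetical 3-state chain in which state 2 is absorbing, estimates the return probability $L(0, \{2\})$ by Monte Carlo; since the chain is eventually absorbed in state 2 almost surely, the estimate should be close to 1.

```python
import random

# Hypothetical 3-state chain as a dict of (successor, probability) lists.
T = {0: [(0, 0.5), (1, 0.5)],
     1: [(1, 0.5), (2, 0.5)],
     2: [(2, 1.0)]}           # state 2 is absorbing

def sample_next(s):
    r, acc = random.random(), 0.0
    for s1, p in T[s]:
        acc += p
        if r < acc:
            return s1
    return T[s][-1][0]

def first_return_time(s0, A, horizon=10_000):
    """tau_A truncated at `horizon`; returns None if A is not hit."""
    s = s0
    for t in range(1, horizon + 1):
        s = sample_next(s)
        if s in A:
            return t
    return None

# Estimate L(0, {2}) as the fraction of sampled paths that hit {2}.
random.seed(0)
hits = sum(first_return_time(0, {2}) is not None for _ in range(2000))
print(hits / 2000)  # close to 1.0
```

Starting from state 2 itself, $\tau_{\{2\}} = 1$ deterministically, since the self-loop keeps the chain there.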

Definition 3.4.2 (Communicating Classes) The state $s \in S$ is said to lead to state $s' \in S$, denoted $s \to s'$, if $L(s, s') > 0$. By convention $s \to s$. Next, distinct states $s, s'$ are said to communicate, denoted $s \leftrightarrow s'$, when $L(s, s') > 0$ and $L(s', s) > 0$. Moreover, the relation "$\leftrightarrow$" is an equivalence relation, and the equivalence classes $C(s) = \{s' : s \leftrightarrow s'\}$ cover $S$, with $s \in C(s)$ [56].
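Since $s \to s'$ holds exactly when some positive-probability path connects the two states, communicating classes can be computed from the transitive closure of the one-step reachability relation. A minimal sketch, over a hypothetical 4-state chain with two classes and a one-way leak between them:

```python
import numpy as np

# Hypothetical 4-state chain: states 0,1 communicate; 2,3 communicate;
# there is a one-way leak from class {0,1} into class {2,3}.
T = np.array([
    [0.8, 0.1, 0.1, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
])

# s -> s' iff L(s, s') > 0, i.e. a positive-probability path exists.
# Compute the transitive closure of the edge relation (s -> s by
# convention, hence the identity).
n = len(T)
reach = (T > 0) | np.eye(n, dtype=bool)
for k in range(n):                        # Warshall-style closure
    reach |= reach[:, [k]] & reach[[k], :]

# s and s' communicate iff each reaches the other.
comm = reach & reach.T
classes = {frozenset(np.flatnonzero(row)) for row in comm}
print(sorted(sorted(c) for c in classes))  # [[0, 1], [2, 3]]
```

Note that $\{0, 1\}$ is a communicating class even though it is not absorbing: the leak $0 \to 2$ does not destroy communication within the class.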

Definition 3.4.3 (Irreducibility and Absorbing Sets) If $C(s) = S$ for some $s \in S$, then the Markov chain $M$ is called irreducible. This means that all states communicate. In addition, $C(s)$ is absorbing if
$$\sum_{s'' \in C(s)} T(s'' \mid s') = 1 \quad \forall s' \in C(s). \tag{3.20}$$

Definition 3.4.4 (Restriction of $M$ to an Absorbing Set) Let $C \subseteq S$ be an absorbing set. Then by Definition 3.4.3, if the initial state $s_0$ lies in $C$, then for any path $\pi = s_0 s_1 \ldots$, $s_t$ lies in $C$ for all $t \ge 0$. Hence, the Markov chain can be studied exclusively in the smaller state space $C$; the resulting chain is called the restriction of $M$ to $C$ and is denoted by $M_{S|C}$.

An absorbing set is alternatively called invariant or stochastically closed. It is possible that some communicating class $C(s)$ is not absorbing; in such a case there exists $s' \notin C(s)$ such that $s \to s'$. An absorbing set is said to be minimal if it does not contain a proper subset that is absorbing. A Markov chain $M$ is indecomposable if $S$ does not contain two disjoint absorbing sets.
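The absorbing condition (3.20) is a simple row-sum check over the candidate set. A sketch, reusing a hypothetical 4-state chain in which $\{2, 3\}$ is absorbing while the communicating class $\{0, 1\}$ leaks mass into it:

```python
import numpy as np

# Hypothetical 4-state chain: {2, 3} is absorbing; {0, 1} is not.
T = np.array([
    [0.8, 0.1, 0.1, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 0.6, 0.4],
])

def is_absorbing(T, C):
    """Check Eq. (3.20): sum_{s'' in C} T(s''|s') = 1 for all s' in C."""
    idx = sorted(C)
    return bool(np.allclose(T[np.ix_(idx, idx)].sum(axis=1), 1.0))

print(is_absorbing(T, {2, 3}))  # True
print(is_absorbing(T, {0, 1}))  # False: state 0 sends mass 0.1 outside
```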

Definition 3.4.5 (Recurrence and Transience) The state $s \in S$ is called recurrent if $\mathbb{E}[f_s \mid s_0 = s] = \infty$ and transient if $\mathbb{E}[f_s \mid s_0 = s] < \infty$, with $f_s$ given by Equation (3.16).

Recurrence and transience are class properties. In fact, recurrent classes coincide with minimal absorbing classes. Furthermore, let $m_s = \mathbb{E}[\tau_s \mid s_0 = s]$. Then state $s \in S$ is called positive recurrent if $m_s < \infty$, and null recurrent if $m_s = \infty$. In a recurrent class, either all states are positive recurrent or all are null recurrent. In addition, for a finite discrete-time Markov chain, all recurrent classes are positive recurrent [56].

Definition 3.4.6 (Ergodic Markov Chain) A Markov chain $M$ is said to be ergodic if the whole state space $S$ is a single recurrent class. Equivalently, it is ergodic if it is irreducible and positive recurrent.

Definition 3.4.7 (Invariant and Ergodic Probability Measures) Let $\nu \in M_S$ be a probability measure (p.m.) on $S$. Then $\nu$ is an invariant p.m. if it remains unchanged when operated upon by the transition operator $T$. In vector/matrix representation, this can be written as
$$\vec{\nu} T = \vec{\nu}. \tag{3.21}$$
An invariant p.m. $\nu$ is ergodic if $\nu(A) = 0$ or $\nu(A) = 1$ for every invariant set $A \subseteq S$. Here
$$\nu(A) = \sum_{s \in A} \nu(s).$$

Proposition 3.4.8 [56] Let $T$ be the transition probability function of Markov chain $M$. If $T$ has a unique invariant p.m. $\nu$, then $\nu$ is ergodic.
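For a finite chain, an invariant p.m. can be computed by solving the linear system $\vec{\nu} T = \vec{\nu}$ together with the normalization $\sum_{s} \nu(s) = 1$. A minimal sketch with a hypothetical irreducible 3-state chain:

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# nu T = nu means nu is a left eigenvector of T with eigenvalue 1.
# Solve (T^T - I) nu^T = 0 jointly with sum(nu) = 1 by least squares.
n = len(T)
A = np.vstack([T.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
nu, *_ = np.linalg.lstsq(A, b, rcond=None)

print(nu)                       # the unique invariant p.m.
print(np.allclose(nu @ T, nu))  # True: invariance check
```

Since the chain is irreducible, the invariant p.m. is unique, and by Proposition 3.4.8 it is ergodic.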

Definition 3.4.9 (Occupation Measures) Define the $t$-step expected occupation measure with initial state $s_0$ as
$$T^{(t)}(A \mid s_0) := \sum_{s \in A} \frac{1}{t} \sum_{k=0}^{t-1} T^k(s \mid s_0), \quad A \subseteq S, \; t = 1, 2, \ldots \tag{3.22}$$
Note that on the r.h.s. of Equation (3.22), $T^k$ is the composition of $T$ with itself $k - 1$ times, i.e.,
$$T^k = \underbrace{T \circ \cdots \circ T}_{k-1 \text{ times}} \circ\, T.$$
It has the effect of transforming a belief or distribution at time step $t$ into the distribution at time step $t + k$. In matrix notation, this computation is realized by taking the $k$th power of the matrix $T$, to get $T^k$.

Additionally, an empirical or pathwise occupation measure can be defined as follows:
$$\pi^{(t)}(A) = \frac{1}{t} \sum_{k=1}^{t} \mathbb{1}(s_k \in A), \quad A \subseteq S, \; t = 1, 2, \ldots \tag{3.23}$$

Proposition 3.4.10 [56] The expected value of the pathwise occupation measure is the $t$-step expected occupation measure:
$$\mathbb{E}\left[\pi^{(t)}(A) \,\middle|\, s_0\right] = T^{(t)}(A \mid s_0), \quad \forall t \ge 1. \tag{3.24}$$
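Proposition 3.4.10 can be spot-checked by Monte Carlo: average the pathwise occupation measure over many sampled paths and compare with the expected occupation measure computed from powers of $T$. The sketch below uses a hypothetical 3-state chain; to keep the comparison like-for-like, both sides here sum over steps $1$ through $t$ (Equations (3.22) and (3.23) index from $k = 0$ and $k = 1$ respectively, which differ by a one-step shift).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])
s0, A, t = 0, [1, 2], 20

# Expected side: (1/t) * sum of T^k(A|s0) over steps k = 1..t.
Tk = np.eye(3)
expected = 0.0
for _ in range(t):
    Tk = Tk @ T                 # T^k for k = 1..t
    expected += Tk[s0, A].sum()
expected /= t

# Pathwise side: average of (1/t) * #{1 <= k <= t : s_k in A}
# over many simulated paths.
runs, total = 5000, 0.0
for _ in range(runs):
    s, hits = s0, 0
    for _ in range(t):
        s = rng.choice(3, p=T[s])
        hits += s in A
    total += hits / t

print(abs(expected - total / runs))  # small, up to Monte Carlo error
```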

Proposition 3.4.11 [56] (a) For every $s, s' \in S$ the following limit exists:
$$\lim_{t \to \infty} T^{(t)}(s' \mid s) = \lim_{t \to \infty} \frac{1}{t} \sum_{k=0}^{t-1} T^k(s' \mid s) = \begin{cases} \rho_{s' \mid s} & \text{if } s' \text{ is recurrent}, \\ 0 & \text{if } s' \text{ is transient}. \end{cases}$$

(b) For every positive recurrent state $s \in S$ with period $d_s$,
$$\lim_{t \to \infty} T^{t d_s}(s \mid s) = \frac{d_s}{m_s},$$
where $m_s := \mathbb{E}[\tau_s \mid s_0 = s]$ is the expected time of the first return to state $s$ when starting in $s$.

(c) Let $C = \{s_{c_1}, s_{c_2}, \ldots, s_{c_{|C|}}\} \subseteq S$ be a recurrent class and $s_c, s'_c \in C$. Then $\rho_{s_c \mid s'_c} = \nu(s_c)$ is independent of $s'_c$. In addition, the collection $\nu(s_{c_1}), \nu(s_{c_2}), \ldots, \nu(s_{c_{|C|}})$ gives the unique invariant p.m. of the restriction of $M$ to the class $C$.

Definition 3.4.12 (Limiting Matrix) From Proposition 3.4.11, the matrix representation of $T^{(t)}$ is given by the Cesàro sum [71],
$$T^{(t)} = \frac{1}{t} \sum_{k=0}^{t-1} T^k, \quad t = 1, 2, \ldots \tag{3.25}$$
and the limiting matrix
$$\Pi := \lim_{t \to \infty} T^{(t)} \tag{3.26}$$
exists for all finite Markov chains.
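Numerically, $\Pi$ can be approximated by truncating the Cesàro sum at a large $t$. A sketch with a hypothetical irreducible 3-state chain, for which every row of $\Pi$ converges to the same invariant p.m. (Proposition 3.4.11(c)):

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Truncated Cesaro sum: T^(t) = (1/t) * sum_{k=0}^{t-1} T^k.
t = 20000
Tk = np.eye(3)
Pi = np.zeros_like(T)
for _ in range(t):
    Pi += Tk
    Tk = Tk @ T
Pi /= t

print(Pi)  # rows are (approximately) identical: the invariant p.m.
# Pi T - Pi = (1/t)(T^t - I), so invariance holds up to O(1/t):
print(np.allclose(Pi @ T, Pi, atol=1e-3))  # True
```

The error of the truncated average decays like $O(1/t)$, which is why a fairly large $t$ is used.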

Proposition 3.4.13 Given the limiting matrix $\Pi$, the limit of the Cesàro sum of the transition matrix $T$, the quantity $I - T + \Pi$ is non-singular and its inverse
$$Z := (I - T + \Pi)^{-1} \tag{3.27}$$
is called the fundamental matrix [15, 56, 123].
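The non-singularity claim and the fundamental matrix itself can be checked directly in a numerical sketch; the 3-state chain is hypothetical, and $\Pi$ is approximated by a finite Cesàro average as in Equation (3.26).

```python
import numpy as np

# Hypothetical irreducible 3-state chain.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
])

# Approximate Pi by a truncated Cesaro average.
t = 20000
Tk, Pi = np.eye(3), np.zeros_like(T)
for _ in range(t):
    Pi += Tk
    Tk = Tk @ T
Pi /= t

# I - T + Pi is non-singular (Prop. 3.4.13), so Z is well defined.
M = np.eye(3) - T + Pi
print(abs(np.linalg.det(M)) > 1e-9)   # True
Z = np.linalg.inv(M)                  # the fundamental matrix
print(np.allclose(Z @ M, np.eye(3)))  # True
```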