Directory UMM :Data Elmu:jurnal:O:Operations Research Letters:Vol26.Issue4.2000:

(1)

Operations Research Letters 26 (2000) 155–158

www.elsevier.com/locate/dsw

A fast algorithm for the transient reward distribution in

continuous-time Markov chains

H.C. Tijms

a;∗

_{, R. Veldman}

b

a_{Department of Econometrics, Vrije Universiteit, De Boelelaan, 1081 HV Amsterdam, The Netherlands} b_{ORTEC Consultants, P.O. Box 490, 2800 AL Gouda, The Netherlands}

Received 1 July 1998; received in revised form 1 November 1999

Abstract

This note presents a generally applicable discretization method for computing the transient distribution of the cumulative reward in a continuous-time Markov chain. A key feature of the algorithm is an error estimate for speeding up the calculations. The algorithm is easy to program and is numerically stable. c2000 Elsevier Science B.V. All rights reserved.

Keywords:Transient reward distribution; Discretization algorithm

1. Introduction

A fundamental and practically important problem is the calculation of the transient probability distri-bution of the cumulative reward in a continuous-time Markov chain. Practical applications include computer systems and oil-production platforms with guaranteed levels of performability over nite time periods. In such systems high penalties must be paid when not meeting the guaranteed levels of performability over the specied time interval and thus the computation of the probability of not meeting the performability re-quirement becomes necessary. More specically, con-sider a continuous-time Markov chain {X(t); t¿₀_} with nite state space I and innitesimal transition ratesqij; j6=i. A reward at rateriis earned for each

∗_{Corresponding author.}

unit of time the system is in statei. Dening the ran-dom variable O(t) by

O(t) = the cumulative reward earned up to timet;

the problem is to calculate the transient reward distri-butionP{O(t)¿ x}for given value of t. An impor-tant special case of this general problem is the case in which the state space I splits up in two disjoint setsIo andIf of operational and failed states and the

reward functionri= 1 fori∈Io andri= 0 fori∈If.

In this case, the transient distribution of the cumula-tive reward reduces to the transient distribution of the cumulative operational time. A transparent and ele-gant algorithm for the 0 –1 reward case was given in De Souza e Silva and Gail [1] using the well-known idea of uniformization in continuous-time Markov chains. Recently, De Souza e Silva and Gail [2] ex-tended their algorithm for the 0 –1 case to the case of general reward rates. Unfortunately, this generalized

(2)

156 H.C. Tijms, R. Veldman / Operations Research Letters 26 (2000) 155–158

algorithm is quite complicated and, worse, it is not always numerically stable so its numerical answers may be unreliable. In this note, we present an alter-native algorithm that is both easy to program and numerically stable. This alternative algorithm is based upon discretization. The discretization algorithm it-self is a direct generalization of an earlier algorithm proposed by Goyal and Tantawi [3] for the 0 –1 case. However, we could considerably speed up this naive discretization algorithm by using a remarkably sim-ple estimate for the error made in the discretization. The discretization method has the additional advan-tage of being directly extendable to continuous-time Markov chains with time-dependent transition rates.

2. The discretization algorithm

Let us rst dene fi(t; x) as the joint probability

density of the cumulative reward O(t) up to time t

and the stateX(t) of the process at time t. In other

In the sequel, it is assumed that the reward rates

ri are nonnegative integers. It is no restriction to

make this assumption. Next, we discretize x and t

in multiples of , where is chosen suciently small (i.e. the probability of more than one state transition during a time period should be small). For xed ¿0, the probability density fi(t; x) is

approximated by a discretized function f

i (t; x). In

view of the probabilistic interpretation offi(t; x)x,

this discretized function is dened by the recursion scheme

where{i; i∈I}denotes the probability distribution

of the starting state at time 0. Using the simple-minded approximation

The computational complexity of this algorithm is O(|Nt(t−x)|=2) where N denotes the number of states. In practical applications, one is usually inter-ested inP{O(t)¿ x} forxsuciently close tormaxt

The advantage of this representation is that it requires fewer function evaluations of f

i (u; y). For xed x

and t, the computational eort of the algorithm is proportional to 1=2 _{and so it quadruples when}_is

halved. Thus, the computation time of the algorithm gets very large when the probability P{O(t)¿ x}

is desired in a high accuracy. Another drawback is that no estimate is given for the discretization error. Fortunately, both diculties can be overcome. De-note byP() the right-hand side of (1) (or (2)), and let e() be the dierence between the exact value of P{O(t)¿ x} and the approximate value P(). The following remarkable result was empirically found:

e()≈P()−P(2); (3) when is not too large. In other words, the esti-mateP() of the true value ofP{O(t)¿ x}is much improved when it is replaced by

˜

(3)

H.C. Tijms, R. Veldman / Operations Research Letters 26 (2000) 155–158 157 Table 1

Numerical results for the rst example

t= 75; x= 178 t= 75; x= 191

Numerical results for the second example

t= 25; x= 110 t= 25; x= 115

We could not nd in the literature on partial dierential equations a result covering (3) nei-ther we could prove (3) directly. However, on the basis of numerous examples tested, it is our conjecture that e()∼P() − P(2) for

→0.

The numerical results in Tables 1 and 2 demon-strate convincingly that replacing P() by ˜P() =

P() + (P() − P(2)) leads to great improve-ments in accuracy and thus to considerable reduc-tions in computation times. Table 1 refers to a production-reliability system with three operating units and a single repairman. The operation time of each unit is exponentially distributed with mean 1= = 10 and the repair time of any failed unit is exponentially distributed with mean 1= = 1. The repairman can repair only one unit at a time. The system earns a reward at rate ri=i for each unit of

time that i units are in operation. The system starts with all units in good condition. This example can be formulated as a continuous-time Markov chain with state space I ={0;1;2;3}, where state i means that

i units are in operation. The innitesimal transition rates are given byqi; i−1=iandqi; i+1=. In Table

2 we give some results for a second example dealing with a continuous-time Markov chain with state space

I ={1;2;3;4;5}, initial state i = 1, reward vector

Remark. The discretization algorithm needs only a minor modication when in addition to the reward ratesri a xed jump rewardFki is earned each time

(4)

158 H.C. Tijms, R. Veldman / Operations Research Letters 26 (2000) 155–158

We found empirically that the error estimate for speed-ing up the convergence also applies in the case with xed jump rewards.

References

[1] E. de Souza e Silva, H.R. Gail, Calculating cumulative operational time distributions of repairable computer systems, IEEE Trans. Comput. 35 (1986) 322–332.

[2] E. de Souza e Silva, H.R. Gail, An algorithm to calculate transient distributions of cumulative rate and impulse rate based rewards, Stochastic Models 14 (1998) 509–536.