Electronic Journal of Probability
Vol. 14 (2009), Paper no. 63, pages 1863–1883.
Journal URL: http://www.math.washington.edu/~ejpecp/

Nonlinear filtering with signal dependent observation noise

Dan Crisan, Michael Kouritzin and Jie Xiong

Abstract

The paper studies the filtering problem in a non-classical framework: we assume that the observation equation is driven by a signal dependent noise. We show that the support of the conditional distribution of the signal lies on the corresponding level set of the derivative of the quadratic variation process. Depending on the intrinsic dimension of the noise, we distinguish two cases: in the first case, the conditional distribution has discrete support and we deduce an explicit representation for it. In the second case, the filtering problem is equivalent to a classical one defined on a manifold, and we deduce the evolution equation of the conditional distribution. The results are applied to the filtering problem where the observation noise is an Ornstein-Uhlenbeck process.

Key words: Nonlinear Filtering, Ornstein-Uhlenbeck Noise, Signal-Dependent Noise.
AMS 2000 Subject Classification: Primary 60G35; Secondary 60G30.

Submitted to EJP on October 9, 2007, final version accepted August 10, 2009.

Department of Mathematics, Imperial College London, 180 Queen’s Gate, London SW7 2BZ, UK (d.crisan@imperial.ac.uk)

Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, T6G 2G1, Canada (mkouritz@math.ualberta.ca). Research supported by NSERC.

1 Introduction

Let (Ω, F, P) be a probability space on which we have defined a homogeneous Markov process X. We can obtain information on X by observing an associated process y which is a function of X plus random noise n:

y_t = h(X_t) + n_t.  (1.1)

In most of the literature n_t is modeled by white noise, which does not exist in the ordinary sense, but rather as the generalized derivative of a Brownian motion. That is, the process W defined formally as

W_t = ∫_0^t n_s ds,  t ≥ 0,

is a Brownian motion and the (integrated) observation model turns into one of the (classical) form

Y_t = ∫_0^t y_s ds = ∫_0^t h(X_s) ds + W_t,

where Y is the accumulated observation. In this case, the observation σ-field is F_t^Y = σ{Y_s, s ≤ t} (suitably augmented) and the desired conditional distribution

π_t(·) := P(X_t ∈ · | F_t^Y),  t ≥ 0,

satisfies the Kushner-Stratonovich or Fujisaki-Kallianpur-Kunita equation

dπ_t(f) = π_t(Lf) dt + (π_t(f h^T) − π_t(f)π_t(h^T)) dν̂_t,  (1.2)

where L is the generator of X and ν̂, defined as

ν̂_t = Y_t − ∫_0^t π_s(h) ds,  t ≥ 0,

is a Brownian motion called the innovation process.
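For readers who want to experiment, the classical integrated model above can be simulated with a simple Euler–Maruyama scheme. The signal dynamics and the sensor function h below are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 1.0, 1000
dt = T / n

# Hypothetical choices: an OU-type signal dX = -X dt + dB and h(x) = sin(x)
X = np.empty(n + 1); X[0] = 0.5
Y = np.empty(n + 1); Y[0] = 0.0
for k in range(n):
    dB = rng.normal(0.0, np.sqrt(dt))     # signal noise increment
    dW = rng.normal(0.0, np.sqrt(dt))     # independent observation noise increment
    X[k + 1] = X[k] - X[k] * dt + dB
    Y[k + 1] = Y[k] + np.sin(X[k]) * dt + dW   # dY = h(X) dt + dW
```

A particle or Kalman-type approximation of π_t would consume the increments of Y produced here.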

Balakrishnan (see Kallianpur and Karandikar [8], p. 3) states that this (integrated) approach is not suitable for application since the results obtained cannot be instrumented. Kunita [9], Mandal and Mandrekar [11] and Gawarecki and Mandrekar [4] studied the model (1.1) when n is a general Gaussian process. The most important example is the case when n is an Ornstein-Uhlenbeck process given by

dO_t = −βO_t dt + β dW_t,  (1.3)

and W is a standard m-dimensional Wiener process. The observation model becomes

y_t = h(X_t) + O_t,  t ≥ 0,  (1.4)

or, in integrated form,

Y_t = ∫_0^t h(X_s) ds + ∫_0^t O_s ds.
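As a quick numerical sketch of the OU-noise model (1.3)–(1.4): the code below uses a hypothetical signal (a Brownian motion) and a bounded sensor function (tanh), both assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n, beta = 1.0, 2000, 5.0
dt = T / n

O = np.zeros(n + 1)     # OU noise: dO = -beta*O dt + beta dW, O_0 = 0
X = np.zeros(n + 1)     # hypothetical signal: dX = dB
for k in range(n):
    O[k + 1] = O[k] - beta * O[k] * dt + beta * rng.normal(0.0, np.sqrt(dt))
    X[k + 1] = X[k] + rng.normal(0.0, np.sqrt(dt))

y = np.tanh(X) + O      # y_t = h(X_t) + O_t with the assumed bounded h = tanh
```

Note that y has finite quadratic variation here, unlike the white-noise model; this is the feature exploited in the rest of the paper.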

To fix the ideas, we let the signal X be an R^d-valued process given by

dX_t = b(X_t) dt + c(X_t) dB_t,  t ≥ 0,

where b, c are R^d-, respectively R^{d×d}-valued functions and B is a standard d-dimensional Wiener process independent of W. The optimal filter π is given by

π_t f = E(f(X_t) | F_t^Y).

In the aforementioned papers, a Kallianpur-Striebel formula is given, the filtering equation for π is derived and it is proved that π converges to the solution of the (classical) FKK equation as β tends to infinity. However, the conditions imposed in these papers [9], [11], [4] are very restrictive. Most notably, the authors assume that the map t ↦ h(X_t) is differentiable. To remove this restrictive condition, Bhatt and Karandikar [1] consider the variant observation model

y_t = α ∫_{(t−α^{−1})∨0}^t h(X_s) ds + O_t

for α > 0 and obtain the same results for this modified model.

In this paper, we deal with the original model (1.4) but no longer assume differentiability of the map t ↦ h(X_t). As we will see, this leads to an observation model with signal dependent observation noise

dY_t = h(X_t) dt + σ(X_t) dW_t.

In turn, the filtering problem with signal dependent observation noise will be converted into a classical one (with signal independent observation noise) via a suitably chosen stochastic flow mapping. This article is organized as follows: in Section 2 we set up the filtering problem with signal dependent observation noise and the framework for transforming this singular filtering problem into the classical one. Then, we discuss this transformation in two cases. In Section 3 we consider the case when the signal dimension is small, so that (under mild regularity) the level set M_z = {x ∈ R^d : σ(x) = z} is discrete for each positive definite matrix z. In Section 4 we study the transformation when the M_z are manifolds. In this case, we decompose the vectors in R^d into their tangent and normal components, and study the signal according to this decomposition. In Section 5, we convert the filtering problem with OU noise to a special case of the general singular filtering model.

The methods and results presented here benefited a lot from the work of Ioannides and LeGland (see [6] and [7]). In particular, in [7], they study the filtering problem with perfect observations. That is, in their set-up, the observation process Y is a deterministic function of the signal X. Here we show that the filtering problem that we are interested in can be reduced to one where the observation process has two components: one that is perfect (in the language of Ioannides and LeGland) and one that is of the classical form (see Lemma 2.3 below).

2 A general singular filtering model

Motivated by the filtering problem with OU-process as noise, we will consider a general filtering problem with signal and observation given by

dX_t = b(X_t) dt + c(X_t) dW_t + c̃(X_t) dB_t
dY_t = h(X_t) dt + σ(X_t) dW_t,  (2.1)

where B and W are two independent Brownian motions in R^d and R^m respectively, and b, c, c̃, h, σ are functions defined on R^d with values in R^d, R^{d×m}, R^{d×d}, R^m, R^{m×m}, respectively. We will assume that Y_0 = 0 and that X_0 is independent of B and W. We will denote the law of X_0 by π_0 and assume that it is absolutely continuous with respect to the Lebesgue measure on R^d; we denote its density by π̃_0.

We will also assume that σ(x) is symmetric and positive definite for each x. If not, we can define the m-dimensional process W̃ by

dW̃_t = (√(σσ^T)(X_t))^{−1} σ(X_t) dW_t,  t ≥ 0.

Then the pair process (W̃, B) is a standard (m+d)-dimensional Brownian motion and the system (2.1) is equivalent to the following:

dX_t = b(X_t) dt + c(X_t) dW̃_t + c̃(X_t) dB_t
dY_t = h(X_t) dt + √(σσ^T)(X_t) dW̃_t.

The analysis can be easily extended to cover the case when all terms in (2.1) depend on both X and Y. We make the following assumptions throughout the rest of the paper.

Condition (BC): The functions b, c, c̃ are Lipschitz continuous and h is bounded and measurable.

Condition (ND): For any x ∈ R^d, the m×m-matrix σ(x) is invertible.

Condition (S): The partial derivatives of the function σ up to order 2 are continuous.

Condition (X0): The law of X_0 has a continuous density π̃_0 with respect to the Lebesgue measure on R^d.

Let ⟨Y⟩ be the quadratic covariation process of the m-dimensional semimartingale Y. From (2.1) it follows that

⟨Y⟩_t = ∫_0^t σ²(X_s) ds.

Hence, the process σ²(X_t) = (d/dt)⟨Y⟩_t is F_t^Y-measurable for any t > 0.
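This observability of σ²(X_t) can be checked numerically: over a window short enough that the signal is essentially frozen, the realized quadratic variation of Y divided by the window length approximates σ²(X_t). The scalar σ below matches the paper's later examples; the rest of the set-up is an illustrative sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
n, dt = 200_000, 1e-5
sigma = lambda x: 2.0 + np.sin(x)   # scalar sigma used in the paper's examples

x = 0.3                              # signal value, frozen over this short window
# observation increments dY = h(X) dt + sigma(X) dW; the drift contributes
# O(dt) per step and is omitted, since the noise dominates at O(sqrt(dt))
dY = sigma(x) * rng.normal(0.0, np.sqrt(dt), size=n)

qv_rate = np.sum(dY**2) / (n * dt)   # empirical d<Y>_t / dt
err = abs(qv_rate - sigma(x)**2)     # small: sigma^2(X_t) is recoverable from Y
```

In other words, the path of Y itself reveals σ²(X_t), which is why part of the observation is effectively noiseless.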

Remark 2.1. σ²(X_0) may not be F_0^Y-measurable, but it is F_{0+}^Y-measurable.

We decompose the observation σ-field F_t^Y into two parts: one is generated by σ(X_t) and the other by an observation process of the classical form.

Denote by S_m^+ the set of symmetric positive definite m×m-matrices. Then for x ∈ R^d, we have σ(x), σ²(x) ∈ S_m^+. Next, we define the mapping a from S_m^+ to R^m̃, with m̃ = m(m+1)/2, as the list of the diagonal entries and those above the diagonal in lexicographical order, i.e., for any r ∈ S_m^+, a(r) is defined as

a(r)_1 = r_{11}, a(r)_2 = r_{12}, ..., a(r)_m = r_{1m}, a(r)_{m+1} = r_{22}, ..., a(r)_{2m−1} = r_{2m},
a(r)_{2m} = r_{33}, ..., a(r)_{m̃} = r_{mm}.
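In code, the mapping a is just a row-major listing of the upper triangle. The sketch below (plain NumPy, not from the paper) returns the m(m+1)/2 entries in exactly the order displayed above.

```python
import numpy as np

def a(r):
    """List the diagonal and above-diagonal entries of a symmetric matrix
    in lexicographic (row-major, upper-triangular) order."""
    m = r.shape[0]
    return np.array([r[i, j] for i in range(m) for j in range(i, m)])

r = np.array([[2.0, 0.5, 0.1],
              [0.5, 3.0, 0.2],
              [0.1, 0.2, 4.0]])
v = a(r)   # length m(m+1)/2 = 6: r11, r12, r13, r22, r23, r33
```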

Lemma 2.2. a(S_m^+) is open in R^m̃.

Proof: Suppose σ ∈ R^{m×m} is symmetric. Then σ ∈ S_m^+ if and only if det(σ_k) > 0, k = 1, 2, ..., m, where σ_k is the k×k sub-matrix obtained from σ by removing the last m−k rows and m−k columns. Note that det(σ_k) is a polynomial in the entries of σ. Thus the image a(S_m^+) consists of the points in R^m̃ such that these polynomials of their coordinates are positive. This implies that a(S_m^+) is open.

Now let s be the square root mapping s : S_m^+ → S_m^+ such that for any r ∈ S_m^+, s(r) is the unique matrix belonging to S_m^+ such that s(r)² = r. It is easy to see that s is a continuous and, in particular, a Borel measurable mapping. Hence, since σ²(X_t) is F_t^Y-measurable for any t > 0,

Z_t = σ(X_t) = s(σ²(X_t)),  t > 0,

is F_t^Y-measurable for any t > 0, too. We have the following

Lemma 2.3. Let Ŷ be the stochastic process defined by

dŶ_t = σ^{−1}(X_t) dY_t = h̃(X_t) dt + dW_t,

where h̃ = σ^{−1}h. Then

F_t^Y = F_t^Ŷ ∨ F_t^Z,  t > 0.

Proof: From the above, F_t^Z ⊂ F_t^Y. Also, σ^{−1}(X_t) is F_t^Z-measurable, hence Ŷ_t is F_t^Y-measurable, i.e., F_t^Ŷ ⊂ F_t^Y. So

F_t^Ŷ ∨ F_t^Z ⊂ F_t^Y.

On the other hand, as dY_t = Z_t dŶ_t, we have, indeed,

F_t^Y ⊂ F_t^Ŷ ∨ F_t^Z.
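The square-root mapping s used above in the construction of Z_t can be computed via an eigendecomposition, which also makes its continuity on S_m^+ plausible. This is a generic numerical sketch, not part of the paper.

```python
import numpy as np

def s(r):
    """Unique symmetric positive definite square root of an SPD matrix r."""
    w, U = np.linalg.eigh(r)           # r = U diag(w) U^T with w > 0
    return U @ np.diag(np.sqrt(w)) @ U.T

r = np.array([[4.0, 1.0],
              [1.0, 3.0]])
root = s(r)
check = root @ root                    # should recover r
```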

By Lemma 2.3, Z can be considered as part of the observations and the signal-observation pair can be written as

dX_t = b(X_t) dt + c(X_t) dW_t + c̃(X_t) dB_t
dY_t = h(X_t) dt + Z_t dW_t
Z_t = σ(X_t).  (2.2)

We see now that the framework is truly non-classical, as part of the observation process is noiseless. It follows that, given the observation, X_t takes values in the level set M_{Z_t} as defined in the introduction. Hence, π_t has support on M_{Z_t}. Therefore, π_t will not have total support (unless σ is constant) and will be singular with respect to the Lebesgue measure on R^d.

As was seen, only the diagonal entries and those above the diagonal of the process Z_t (in other words, a(Z_t)) are required to generate F_t^Z. Hence, we only need to take into account the properties of the mapping a_σ defined as a_σ(x) = a(σ(x)) for all x ∈ R^d. Then, defining the usual gradient operator for R^m̃-valued functions f by

∇f(x)_{ij} = ∂_j f_i(x);  i = 1, ..., m̃;  j = 1, ..., d;  x ∈ R^d,

q(x) = ∇a_σ(x) is a linear mapping from R^d to R^m̃, i.e., an m̃×d matrix.

Definition 2.4. The vector x ∈ R^d is a regular point for a_σ if and only if the matrix q(x) has full rank min(d, m̃). We shall denote the collection of all regular points by R.
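Regularity of a point can be tested numerically by forming the Jacobian q(x) = ∇a_σ(x) by finite differences and checking its rank. The particular map a_σ below (m = 1, σ(x) = e^{x_1}, so m̃ = 1) is borrowed from Example 4.2 for illustration; the differencing step is an arbitrary choice.

```python
import numpy as np

def a_sigma(x):
    # Illustrative map with m = 1: sigma(x) = exp(x[0]), so a_sigma(x) = [exp(x[0])]
    return np.array([np.exp(x[0])])

def q(x, eps=1e-6):
    """Central-difference Jacobian of a_sigma: an (m~ x d) matrix."""
    x = np.asarray(x, dtype=float)
    m_tilde, d = a_sigma(x).size, x.size
    J = np.empty((m_tilde, d))
    for j in range(d):
        e = np.zeros(d); e[j] = eps
        J[:, j] = (a_sigma(x + e) - a_sigma(x - e)) / (2 * eps)
    return J

x = np.array([0.2, 1.5])
Q = q(x)
regular = np.linalg.matrix_rank(Q) == min(Q.shape)   # full rank => x is regular
```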

We will study the optimal filter π_t in the next two sections according to the type of the level set M_z.

3 The case d ≤ m̃

In this section, we consider the case when d ≤ m̃. We will assume that M_z consists of countably many points and that its connected components do not branch or coalesce. Namely, we assume

Condition (R1): There exists a countable set I such that for any z ∈ R^m̃,

M_z = {x_i(z) : i ∈ I},

where x_i : R^m̃ → R^d, i ∈ I, are continuous functions such that x_i(z) ≠ x_j(z) for any i, j ∈ I, i ≠ j.

This condition holds true if a_σ satisfies the following assumption

Condition (R): Every point in R^d is regular for a_σ; in other words, R = R^d.

In this case, for x ∈ R^d, by the inverse function theorem (see, for example, Rudin [12]) there is a continuous bijection between an open neighborhood of w(a_σ(x)) and an open neighborhood of x, where w : R^m̃ → R^d is the projection of R^m̃ onto R^d corresponding to those coordinates which give a minor of maximal rank for the matrix q(x). The composition of this continuous bijection with the projection w is, in effect, one of the continuous functions x_i appearing in Condition (R1). In particular, if x ∈ M_z, then there is no other element of M_z within the corresponding open neighborhood. In other words, M_z contains only isolated points. Thus, Condition (R) implies that all the level sets of σ (respectively, a_σ) are discrete and must be finite on any compact set, and therefore countable overall. Hence there is a countable set of continuous functions describing the level sets. These continuous functions do not coalesce or branch, as that would contradict the existence of the bijection at the point of coalescence/branching (in topological language, the number of connected components of the level sets is locally constant). Hence Condition (R) implies Condition (R1).

By Condition (R1) and the continuity of the process X_t, we see that if X_0 is non-random and X_0 = x_i(Z_0) for some i ∈ I, then X_t = x_i(Z_t) for the same value of i ∈ I (the process does not ‘jump’ between connected components). In other words, the noiseless component of the observation uniquely identifies the conditional distribution of the signal given the observation:

π_t = δ_{x_i(Z_t)},  t ≥ 0.

Next, we consider the case where X_0 is not constant. We need to take into account the additional information that may arise from observing the quadratic variation of the process a(Z_t) and the covariation process between a(Z_t) and Y_t. This will not influence the trajectory of π: from the above we already know that it is deterministic given its initial value π_0 and the process a(Z_t). However, this additional information may restrict π_0.

Applying Itô’s formula to a_σ(X_t), for X_t given by (2.1), we get

d a_σ(X_t) = L a_σ(X_t) dt + dV_t,  (3.1)

where V is a martingale and L is the second order differential operator given by

L f = (1/2) Σ_{k,j=1}^d (cc^T + c̃c̃^T)_{kj} ∂²_{kj} f + Σ_{k=1}^d b_k ∂_k f.

It follows from (3.1) and (2.1) that the quadratic covariation process between a(Z_t) and Y_t is

⟨a(Z), Y⟩_t = ∫_0^t (qcσ)(X_s) ds.

The analysis that follows hinges upon an explicit characterization of the information contained in F_{0+}^Y. Such a characterization may not be available in general. However, we present two cases where it is possible and note that other cases may be deduced in an inductive manner. From Remark 2.1 we know that σ²(X_0), which is the derivative of the quadratic variation of the process Y at 0, is F_{0+}^Y-measurable. Moreover, from the discussion following Lemma 2.2, σ(X_0) and a_σ(X_0) are also F_{0+}^Y-measurable. In Case 1, no additional information is available from Z̃_0. This means that the F_{0+}^Y-measurable random variables obtained by differentiating at zero the quadratic variation of a_σ(X) and the quadratic covariation process between a_σ(X) and Y are functions of a_σ(X_0). In Case 2, these two new processes offer new information (they are not functions of a_σ(X_0)), but their corresponding quadratic variations and quadratic covariation processes do not have informative derivatives. In subsequent cases, which can be treated in a similar manner to the first two, more and more of the processes constructed by computing quadratic variations and quadratic covariations and differentiating at zero offer information (they are not functions of the already computed F_{0+}^Y-measurable random variables).

Case 1: The matrices q(cc^T + c̃c̃^T)q^T and qc are functions of a_σ; in other words, there exist two Borel measurable functions H_1 and H_2 from a(S_m^+) to R^{m̃×m̃} and R^{m̃×m} respectively such that

q(cc^T + c̃c̃^T)q^T = H_1(a_σ) and qc = H_2(a_σ).  (3.3)

Let us note that x ∈ R^d is a regular point if and only if J(x) := √(det(q^T q)(x)) > 0. Under Condition (3.3), the additional information described above brings no new knowledge, hence we can ignore it. Following the proof of Theorem 2.8 in [6], we let µ_z be the conditional probability distribution of X_0 given Z_0 = z, i.e., µ_z(dx) = P(X_0 ∈ dx | Z_0 = z); its expression involves the Hausdorff measure H^d (cf. Evans and Gariepy [2], p. 60). Taking conditional expectations, and following the case with constant X_0, we then have (Theorem 3.1)

π_t f = Σ_{x ∈ M_{Z_0}} p_x f(X̂_t^x),  t > 0,

where X̂_t^x = x_i(Z_t) with i ∈ I such that x = x_i(Z_0), and p_x = (π̃_0(x)/J(x)) / Σ_{x ∈ M_{Z_0}} π̃_0(x)/J(x).

Before we consider the second case, we give an example for which the conditions of Theorem 3.1 are satisfied. The level sets are of the form M_z = {x_i(z) : i ∈ Z}, where Z denotes the collection of integers and the continuous functions x_i : R³ → R are defined as

x_i(z) = x_i(z_1, z_2, z_3) = 2πi + arcsin(z_1 − 2) if z_3 ≥ 2, and 2πi + π + arcsin(2 − z_1) if z_3 < 2.

This proves Condition (R1). Condition (3.3) is also satisfied, as can be checked by computing q(x).

Now we consider the other case.

Case 2: If q(cc^T + c̃c̃^T)q^T and qc are not functions of a_σ, then Z̃_0, as defined in (3.2), offers new information; that is, F_{0+}^Y is larger than the σ-field generated by Z_0. Hence the support of the distribution of X_0 given F_{0+}^Y may be smaller than that given σ(Z_0). To handle the new information, we need to impose additional constraints. Let σ_0 = σ, q_0 = q, and let σ_k be the following matrix valued function:

σ_k(x) = [ σ_{k−1}(x)      c^T q_{k−1}^T(x)
           q_{k−1}c(x)     q_{k−1}(cc^T + c̃c̃^T)q_{k−1}^T(x) ],  x ∈ R^d,  k = 1, 2, 3, ...,  (3.5)

let m̃_k be the dimension of the image of the mapping a_{σ_k}, and let q_k = ∇a_{σ_k}. We replace Conditions (S), (R1) and (3.3) with the following:

Condition (S̃): The partial derivatives of σ_1 up to order 2 are continuous.

Condition (R̃1): There exists a countable set Ĩ such that for any z ∈ R^{m̃_1},

M_z ≡ {x ∈ R^d : σ_1(x) = z} = {x_i(z) : i ∈ Ĩ},

where x_i : R^{m̃_1} → R^d, i ∈ Ĩ, are continuous functions such that x_i(z) ≠ x_j(z) for any i, j ∈ Ĩ, i ≠ j.

Condition (IN_k): q_k c and q_k(cc^T + c̃c̃^T)q_k^T are functions of a_{σ_k}.

Then, the following analogue of Theorem 3.1 holds true.

Theorem 3.3. Suppose that the conditions (R̃1, S̃, BC, X0, IN1) are satisfied. If

Σ_{x ∈ M_{Z_0}} π̃_0(x)/J(x) < ∞,

then

π_t = Σ_{x ∈ M_{Z_0}} p_x δ_{X̂_t^x},  t > 0,

where X̂_t^x = x_i(Z_t) with i ∈ Ĩ such that x = x_i(Z_0), and

p_x = (π̃_0(x)/J(x)) / Σ_{x ∈ M_{Z_0}} π̃_0(x)/J(x).
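The weights p_x can be evaluated explicitly once the level set and J are known. The sketch below uses σ(x) = 2 + sin x (as in Example 3.4), takes J(x) = |σ′(x)| = |cos x| — the natural choice when d = m̃ = 1 — and assumes a standard normal initial density π̃_0; the truncation of the level set to finitely many points (and to a single branch) is purely for illustration.

```python
import numpy as np

# One branch of the level set of sigma(x) = 2 + sin(x) at z = 2.5:
# x_i = 2*pi*i + arcsin(0.5), keeping only points with cos(x) > 0
ii = np.arange(-50, 51)
xs = 2 * np.pi * ii + np.arcsin(0.5)

pi0 = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)  # assumed N(0,1) density of X_0
J = lambda x: np.abs(np.cos(x))                          # |q(x)| = |sigma'(x)| here

w = pi0(xs) / J(xs)
p = w / w.sum()        # conditional weights p_x on the (truncated) level set
```

The weight concentrates on the level-set point closest to the mode of π̃_0, as one would expect.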

We now give an example for which the conditions of Theorem 3.3 are satisfied.

Example 3.4. Let d = m = 1. The coefficients of the system are b(x) = x, c(x) = 1, c̃ = 1, h(x) bounded, and σ(x) = 2 + sin x. Then q(x) = cos x is not a function of σ. However, q′ is a function of σ, as

q′(x) = −sin x = 2 − σ(x).

In this case m̃_1 = 3,

σ_1(x) = [ 2 + sin x   cos x
           cos x       2(cos x)² ],

and

q_1(x) = (cos x, −sin x, −4 cos x sin x)^T,  x ∈ R,

is a function of a_{σ_1}, so (IN1) holds, and the level sets M_z are described by

M_z = {x_i(z) : i ∈ Z},

where Z denotes the collection of integers and the continuous functions x_i : R³ → R are defined as

x_i(z) = x_i(z_1, z_2, z_3) = 2πi + arcsin(z_1 − 2) if z_2 ≥ 0, and 2πi + π + arcsin(2 − z_1) if z_2 < 0.

This proves Condition (R̃1). The other conditions are easy to verify.
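The level-set parametrization of Example 3.4 can be verified numerically: given z = (σ(x), cos x, 2cos²x), the function x_i recovers x exactly on either branch. The index computation below is an illustrative device for picking the right i.

```python
import numpy as np

sigma = lambda x: 2.0 + np.sin(x)

def x_i(i, z1, z2):
    # Level-set parametrization from Example 3.4 (branches split by the sign of z2)
    if z2 >= 0:
        return 2 * np.pi * i + np.arcsin(z1 - 2.0)
    return 2 * np.pi * i + np.pi + np.arcsin(2.0 - z1)

# Check: for x on either branch, x_i recovers x from (z1, z2) = (sigma(x), cos(x))
for x in (0.4, np.pi - 0.4, 2 * np.pi + 0.4):
    z1, z2 = sigma(x), np.cos(x)
    i = round((x if z2 >= 0 else x - np.pi) / (2 * np.pi))  # branch index
    assert abs(x_i(i, z1, z2) - x) < 1e-12
ok = True
```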

The analysis can continue in this manner: if (IN1) is not satisfied by q_1, we can define σ_2 in a similar manner, with σ, q in (3.5) replaced by σ_1 and q_1 respectively; the procedure is then continued until Condition (IN_k) is satisfied.

4 The case d > m̃

In this section, we consider the case when d > m̃. We will show that M_z is no longer a discrete set but rather a surface (manifold), and that the optimal filter π_t is a probability measure on the manifold M_z which is absolutely continuous with respect to the surface measure. For this, we follow closely the analysis in [7].

Note that, if d > m̃, then x ∈ R^d is a regular point if and only if

J(x) = √(det qq^T(x)) > 0.

We shall use T_x M_z to denote the space consisting of all the tangent vectors to the manifold (surface) M_z at the point x ∈ M_z; T_x M_z is called the tangent space of M_z at x. We will use N_x M_z to denote the orthogonal complement of T_x M_z in R^d; we call N_x M_z the normal space of M_z at x. We list in the following lemma some well-known facts about the transformation a_σ without giving their proofs:

Lemma 4.1. i) For any u ∈ R^m̃, M_{a^{−1}(u)} is a d̃-dimensional manifold, where d̃ = d − m̃. The Hausdorff measure H^d̃ on M_{a^{−1}(u)} is the surface measure.

ii) For any x ∈ M_z, the rows of the matrix q generate the normal space N_x M_z to the manifold M_z at the point x.

iii) Let

ρ(x) = q^T(qq^T)^{−1}(x) and p(x) = ρ(x)q(x).

Then p(x) is the orthogonal projection matrix from R^d onto the subspace N_x M_z.

For simplicity of the presentation, we make an assumption which is slightly stronger than (3.3).

Condition (IN): There exist two Borel measurable functions H_1 and H_2 from a(S_m^+) to R^{m̃×m} and R^{m̃×d} respectively such that

qc = H_1(a_σ) and qc̃ = H_2(a_σ).  (4.1)

(12)

Example 4.2. Let d = 2and m = 1. Define the coefficients by b(x) = x, c(x) =0, ˜c(x) = I and

σ(x) =ex1. Then J(x) =ex1>0for all xR2and hence, Condition (R) is satisfied. Further,

Mz={(x1,x2): ex1=z}

is a (line) manifold of dimensiond˜=2−1=1. It is clear that the row vector of q generates the normal space

NxMz={λ(1, 0): λ∈R}.

In this example, p(x) =

‚

1 0 0 0

Œ

is the orthogonal projection matrix fromR2 to NxMz. To verify

(IN), we note that qc=0and

q˜c= (ex1, 0)I= (σ(x), 0).
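The projection of Lemma 4.1 iii) can be reproduced numerically for this example: with q(x) = (e^{x_1}, 0), the matrix p = ρq comes out as the projection onto the normal direction (1, 0). The evaluation point below is arbitrary.

```python
import numpy as np

x1 = 0.7
q = np.array([[np.exp(x1), 0.0]])       # q(x) = grad a_sigma(x), a 1x2 matrix
rho = q.T @ np.linalg.inv(q @ q.T)      # rho = q^T (q q^T)^{-1}
p = rho @ q                             # orthogonal projection onto N_x M_z

# p projects onto span{(1, 0)} and annihilates the tangent direction (0, 1)
```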

We also assume:

Condition (C): Both c and c̃ are continuously differentiable.

Throughout the rest of this section, the assumptions (R, S, BC, X0, IN, ND, C) will be in force. The following theorem gives us the conditional distribution of X_0 given Z_0. Since, in this case, F_{0+}^Y coincides with the σ-field generated by Z_0, it gives, in effect, π_{0+} in the case d > m̃.

We introduce λ_u, the surface measure on the level set M_{a^{−1}(u)} for u ∈ R^m̃, and µ_z, the conditional probability distribution of X_0 given Z_0 = z, i.e.,

µ_z(dx) = P(X_0 ∈ dx | Z_0 = z).

The following theorem shows that µ_z is absolutely continuous with respect to λ_u.

Theorem 4.3. Suppose that the density π̃_0 is not identically zero on M_{a^{−1}(u)} and satisfies the following integrability condition:

∫_{M_{a^{−1}(u)}} π̃_0(x)/J(x) λ_u(dx) < ∞.

Then

µ_z(dx) = p(x) λ_{a(z)}(dx),

where

p(x) = (π̃_0(x)/J(x)) / ∫_{M_z} π̃_0(y)/J(y) λ_u(dy).

Proof: For any test function φ defined on R^d and any Borel set D in R^{m×m}, one computes E[φ(X_0)1_D(Z_0)] by the co-area formula (cf. Evans and Gariepy [2], chapter 3). The result follows from the definition of the conditional expectation.

We now decompose the vector fields in the SDE satisfied by the signal according to their components in the spaces T_x M_z and N_x M_z. It is more convenient to use the Stratonovich form for the signal process. That is, the signal X satisfies the following SDE in Stratonovich form:

dX_t = b̃(X_t) dt + c(X_t) ∘ dW_t + c̃(X_t) ∘ dB_t.  (4.2)

Recall from (3.1) that the F_t^Y-measurable process a(Z_t) satisfies

d a(Z_t) = L a_σ(X_t) dt + dV_t,  (4.3)

where V_t is the m̃-dimensional continuous martingale

V_t = ∫_0^t qc(X_s) dW_s + ∫_0^t qc̃(X_s) dB_s.

Lemma 4.4. The second observation process of (2.2), a(Z_t), satisfies

d a(Z_t) = qc(X_t) ∘ dW_t + qc̃(X_t) ∘ dB_t + qb̃(X_t) dt.  (4.4)

Proof: We have

dV_t = qc(X_t) ∘ dW_t + qc̃(X_t) ∘ dB_t − (1/2) Σ_{k,j=1}^d Σ_{ℓ=1}^m ∂_k(q_{·j} c_{jℓ}) c_{kℓ}(X_t) dt − (1/2) Σ_{k,j,ℓ=1}^d ∂_k(q_{·j} c̃_{jℓ}) c̃_{kℓ}(X_t) dt

     = qc(X_t) ∘ dW_t + qc̃(X_t) ∘ dB_t − (1/2) Σ_{k,j=1}^d (cc^T)_{jk} ∂²_{kj} a_σ dt − (1/2) Σ_{k=1}^d q ∂_k c c^T_{·k} dt
       − (1/2) Σ_{k,j=1}^d (c̃c̃^T)_{jk} ∂²_{kj} a_σ dt − (1/2) Σ_{k=1}^d q ∂_k c̃ c̃^T_{·k} dt

     = qc(X_t) ∘ dW_t + qc̃(X_t) ∘ dB_t − L a_σ(X_t) dt + qb̃(X_t) dt.

Combining this with (4.3), we see that (4.4) holds.

Finally, we arrive at the main decomposition result.

Theorem 4.5. Under Conditions (BC, S, R, X0, C), the filtering model (2.1) can be rewritten as

dX_t = (I − p)(b̃(X_t) dt + c(X_t) ∘ dW_t + c̃(X_t) ∘ dB_t) + ρ(X_t) ∘ d a(Z_t),  (4.5)

with observations

d a(Z_t) = L a_σ(X_t) dt + dV_t  (4.6)

and

dY_t = h(X_t) dt + Z_t dW_t.  (4.7)

Proof: From equation (4.2) we have

dX_t = b̃(X_t) dt + c(X_t) ∘ dW_t + c̃(X_t) ∘ dB_t
     = (I − p)b̃(X_t) dt + (I − p)c(X_t) ∘ dW_t + (I − p)c̃(X_t) ∘ dB_t
       + pb̃(X_t) dt + pc(X_t) ∘ dW_t + pc̃(X_t) ∘ dB_t.  (4.8)

By (4.4), we get

ρ(X_t) ∘ d a(Z_t) = pb̃(X_t) dt + pc(X_t) ∘ dW_t + pc̃(X_t) ∘ dB_t.

Plugging this back into (4.8), we see that (4.5) holds. The equality (4.6) is a rewriting of (4.3), and equation (4.7) is just the original observation model (2.1).

Let{ξt,s: 0≤st}be the stochastic flow associated with the SDE:

(15)

Lemma 4.6. The flowξt,s maps MZ

s to MZt. Further, the processξt isF

Z

t -adapted.

Proof: Applying the Stratonovich form of Itô’s formula, we get

d aσt) =∇Taσt)ρ(ξt)◦d a(Zt) =d a(Zt).

Thus forσ(ξs) =Zs, we getσ(ξt) = Zt. The second conclusion follows from the uniqueness of the solution to the SDE (4.9).

Denote the column vectors of (I − p)c and (I − p)c̃ by g_1, ..., g_m and g̃_1, ..., g̃_d respectively, and let b_0 = (I − p)b̃. Then, for each x ∈ M_z, the vectors g_1(x), ..., g_m(x); g̃_1(x), ..., g̃_d(x); b_0(x) all lie in T_x M_z. The signal process X_t satisfies

dX_t = ρ(X_t) ∘ d a(Z_t) + Σ_{i=1}^m g_i(X_t) ∘ dW_t^i + Σ_{j=1}^d g̃_j(X_t) ∘ dB_t^j + b_0(X_t) dt.  (4.10)

It is well known that the Jacobian matrix ξ′_{t,s} of the stochastic flow ξ_{t,s} is invertible (cf. Ikeda and Watanabe [5]). The operator (ξ^{−1}_{t,s})_* defined below pulls a vector at ξ_{t,s}(x) back to a vector at x.

Definition 4.7. Let g be a vector field in R^d. The random vector field (ξ^{−1}_{t,s})_* g is defined as

(ξ^{−1}_{t,s})_* g(x) ≡ (ξ′_{t,s})^{−1} g(ξ_{t,s}(x))

for any a_σ-regular point x ∈ R^d.

We consider the following SDE on R^d:

dκ_t = (ξ^{−1}_{t,0})_* b_0(κ_t) dt + Σ_{i=1}^m (ξ^{−1}_{t,0})_* g_i(κ_t) ∘ dW_t^i + Σ_{j=1}^d (ξ^{−1}_{t,0})_* g̃_j(κ_t) ∘ dB_t^j.  (4.11)

Lemma 4.8. The SDE (4.11) has a unique strong solution. Further, if Z_0 = z, then κ_t ∈ M_z for all t ≥ 0 a.s.

Proof: Note that (ξ′_{t,0})^{−1} satisfies a diffusion SDE with bounded coefficients (cf. Ikeda and Watanabe [5]). By Gronwall's inequality, it is easy to show that

E sup_{0≤t≤T} ‖(ξ′_{t,0})^{−1}‖^p < ∞,  ∀p > 1.

Therefore, we can prove that there is a constant K such that

E |(ξ^{−1}_{t,0})_* b_0(κ_1) − (ξ^{−1}_{t,0})_* b_0(κ_2)|² ≤ K|κ_1 − κ_2|²,  ∀κ_1, κ_2 ∈ R^d.

Applying Itô's formula to (4.11), we get

d a_σ(κ_t) = q(κ_t)(ξ^{−1}_{t,0})_* b_0(κ_t) dt + Σ_{i=1}^m q(κ_t)(ξ^{−1}_{t,0})_* g_i(κ_t) ∘ dW_t^i + Σ_{j=1}^d q(κ_t)(ξ^{−1}_{t,0})_* g̃_j(κ_t) ∘ dB_t^j.

Since b_0(ξ_{t,0}(κ_t)) ∈ T_{ξ_{t,0}(κ_t)} M_{Z_t}, we have (ξ^{−1}_{t,0})_* b_0(κ_t) ∈ T_{κ_t} M_{σ(κ_t)}. As the row vectors of q(κ_t) lie in N_{κ_t} M_{σ(κ_t)}, we see that

q(κ_t)(ξ^{−1}_{t,0})_* b_0(κ_t) = 0.

The above equality also holds with b_0 replaced by g_i, 1 ≤ i ≤ m, or g̃_j, 1 ≤ j ≤ d. Thus d a_σ(κ_t) = 0, and hence a_σ(κ_t) = a_σ(κ_0) for all t ≥ 0. This proves that σ(κ_t) = z, and hence κ_t ∈ M_z, ∀t ≥ 0, a.s.

The next theorem gives the decomposition of the signal process.

Theorem 4.9. For almost all ω ∈ Ω, we have

X_t(ω) = ξ_{t,0}(κ_t(ω), ω),  ∀t ≥ 0.  (4.12)

Proof: Denote the right-hand side of (4.12) by X̃_t(ω). Applying Itô's formula, we get

dX̃_t = Σ_{j=1}^d ∂_j ξ_{t,0}(κ_t) ∘ dκ_t^j + ρ(ξ_{t,0}(κ_t)) ∘ d a(Z_t)
      = ξ′_{t,0}(κ_t)(ξ′_{t,0}(κ_t))^{−1} b_0(ξ_{t,0}(κ_t)) dt + Σ_{i=1}^m ξ′_{t,0}(κ_t)(ξ′_{t,0}(κ_t))^{−1} g_i(ξ_{t,0}(κ_t)) ∘ dW_t^i
        + Σ_{j=1}^d ξ′_{t,0}(κ_t)(ξ′_{t,0}(κ_t))^{−1} g̃_j(ξ_{t,0}(κ_t)) ∘ dB_t^j + ρ(X̃_t) ∘ d a(Z_t)
      = ρ(X̃_t) ∘ d a(Z_t) + Σ_{i=1}^m g_i(X̃_t) ∘ dW_t^i + Σ_{j=1}^d g̃_j(X̃_t) ∘ dB_t^j + b_0(X̃_t) dt.

By the uniqueness of the solution to (4.10), we see that the representation (4.12) holds.

The optimal filter then satisfies

π_t f = E(f(ξ_{t,0}(κ_t)) | F_t^Ŷ ∨ F_t^Z),  t > 0.

Note that ξ_{t,0} is F_t^Z-measurable. Thus, we may regard ξ_{t,0} as known, and the singular filtering problem can be transformed into a classical one as follows: for f ∈ C_b(M_z), let

U_t f ≡ E[f(κ_t) | F_t^Ŷ ∨ F_t^Z].

Then U_t is the optimal filter with the signal process κ_t given by (4.11) and the observation (Ŷ_t, a(Z_t)) given by

dŶ_t = h̃(ξ_{t,0}(κ_t)) dt + dW_t  (4.13)

and

d a(Z_t) = L a_σ(ξ_{t,0}(κ_t)) dt + H_1(a(Z_t)) dW_t + H_2(a(Z_t)) dB_t.  (4.14)

Note that the filtering problem with signal (4.11) and observations (4.13) and (4.14) is classical. We sketch below the derivation of the Zakai equation for the unnormalized version Ũ_t of the process U_t, and leave the details to the reader. Once this is done, we note that

π_t f = U_t(f ∘ ξ_{t,0}).

Let Z̃ be the normalized observation process given by (4.15). Then Z̃_t is observable and satisfies dZ̃_t = h̃(t, Y, κ_t) dt plus a Brownian increment, where h̃(t, Y, κ) is given by (4.16). Note that (4.11) can be written in the Itô form

dκ_t = b̃(t, Y, κ_t) dt + σ̃_1(t, Y, κ_t) dW_t + σ̃_2(t, Y, κ_t) dB_t

for suitably defined coefficients b̃, σ̃_1 and σ̃_2. Denote

σ̂(t, Y, κ_t) = (σ̃_1(t, Y, κ_t), σ̃_2(t, Y, κ_t) H_2(a(Z_t))^T (H_2(a(Z_t)) H_2(a(Z_t))^T)^{−1/2})

and

L_{t,Y} f(κ) = (1/2) Σ_{ij} ã_{ij}(t, Y, κ) ∂²_{ij} f(κ) + b̃(t, Y, κ)^T ∇f(κ),  (4.17)

where ã(t, Y, κ) = Σ_{i=1}^2 σ̃_i(t, Y, κ) σ̃_i(t, Y, κ)^T. Then, by the Kallianpur-Striebel formula, we have

U_t = (Ũ_t 1)^{−1} Ũ_t,  (4.18)

while the unnormalized filter Ũ_t satisfies the following Zakai equation:

dŨ_t f = Ũ_t(L_{t,Y} f) dt + Ũ_t(∇^T f σ̂(t, Y) + h̃(t, Y)^T f) dZ̃_t.  (4.19)

A filtering equation for U_t can be derived by applying Itô's formula to (4.18) and (4.19). We now summarize the above discussion in a theorem.

Theorem 4.10. Suppose that the assumptions (R, S, BC, X0, IN, ND) are satisfied. Let Z̃ be the observation process given by (4.15), let h̃ and L_{t,Y} be given by (4.16) and (4.17) respectively, and let ξ_{t,s} be the stochastic flow given by (4.9). Let Ũ_t be the unique M_F(M_{Z_0})-valued solution to the Zakai equation (4.19) and U_t = (Ũ_t 1)^{−1} Ũ_t. Then the optimal filter π_t is given by

π_t(f) = U_t(f ∘ ξ_{t,0}),  ∀f ∈ C_b(R^d).

Finally, we mention that an analogue of the Kalman filter for the singular filtering setup is part of work in progress by Liu [10]. We indicate the model and the anticipated result. For simplicity, we take m = 1.

Let the signal be given by the linear equation

dX_t = bX_t dt + dB_t,

and let the observation be

dY_t = h^T X_t dt + |Q^T X_t| dW_t,

where (B, W) is a (d+1)-dimensional Brownian motion, b is a d×d-matrix and h, Q ∈ R^d. It is anticipated that π_t will be a linear combination of two normal measures on the planes {x ∈ R^d : Q^T x = ±Z_t}, where Z_t = |Q^T X_t| is observable. Equations for the conditional means and variances, as well as for the weight process in the linear combination, will be derived.

5 The filtering model with Ornstein-Uhlenbeck noise

In this section, we convert the filtering model with Ornstein-Uhlenbeck observation noise to a singular filtering problem of the form studied in the previous sections. To be more general, we assume that the OU process is given by

dO_t = −a_1 O_t dt + b_1 dW_t

and y is as in (1.4). Let

Y_t = ∫_0^t e^{−a_1 s} d(e^{a_1 s} y_s).

Then F_t^Y = F_t^y and, by Itô's formula,

dY_t = (Lh(X_t) + a_1 h(X_t)) dt + b_1 dW_t + ∇h c(X_t) dB_t.

Define the pair process

dV_t = (b_1² I + ∇h c(∇h c)^T)^{−1/2}(X_t)(b_1 dW_t + ∇h c(X_t) dB_t),
dṼ_t = (b_1² I + (∇h c)^T ∇h c)^{−1/2}(X_t)((∇h c)^T(X_t) dW_t − b_1 dB_t).

Then the pair process (V, Ṽ) is an (m+d)-dimensional Brownian motion and

dB_t = (b_1² I + (∇h c)^T ∇h c)^{−1}(∇h c)^T(b_1² I + ∇h c(∇h c)^T)^{1/2}(X_t) dV_t − b_1(b_1² I + (∇h c)^T ∇h c)^{−1/2}(X_t) dṼ_t.

Hence, if we define the functions

σ = (b_1² I + ∇h c(∇h c)^T)^{1/2},
c_1 = c(b_1² I + (∇h c)^T ∇h c)^{−1}(∇h c)^T(b_1² I + ∇h c(∇h c)^T)^{1/2},
c̃_1 = −b_1 c(b_1² I + (∇h c)^T ∇h c)^{−1/2},

then the signal/observation pair can be written as

dX_t = b(X_t) dt + c_1(X_t) dV_t + c̃_1(X_t) dṼ_t
dY_t = (Lh + a_1 h)(X_t) dt + σ(X_t) dV_t.  (5.1)

Hence the filtering model (5.1) is of the form (2.1).
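The algebra behind (5.1) can be sanity-checked numerically: writing A for ∇h c (frozen at a point and replaced here by a random matrix, an assumption for illustration), the new noises (V, Ṽ) must reproduce the law of the original ones, which forces c_1 c_1^T + c̃_1 c̃_1^T = cc^T alongside σσ^T = b_1² I + AA^T. The sketch below verifies the first identity.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, b1 = 4, 2, 0.8
A = rng.normal(size=(m, d))        # stands in for (grad h) c at a fixed point
c = rng.normal(size=(d, d))

S = b1**2 * np.eye(m) + A @ A.T    # b1^2 I + (grad h c)(grad h c)^T
T = b1**2 * np.eye(d) + A.T @ A

def sqrtm_spd(r):
    """SPD square root via eigendecomposition."""
    w, U = np.linalg.eigh(r)
    return U @ np.diag(np.sqrt(w)) @ U.T

c1 = c @ np.linalg.inv(T) @ A.T @ sqrtm_spd(S)
c1_tilde = -b1 * c @ np.linalg.inv(sqrtm_spd(T))

# Consistency: c1 c1^T + c1~ c1~^T must equal c c^T
lhs = c1 @ c1.T + c1_tilde @ c1_tilde.T
```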

Next, we give an example demonstrating that, for the filtering problem with OU observation noise, both discrete and continuous singularity can occur. This example is a special case of those considered by Liu [10], so we omit the details here.

Example 5.1. Let the signal X_t^i, i = 1, 2, be governed by the SDEs

dX_t^i = σ_i √(X_t^i) dB_t^i,

where σ_1 and σ_2 are two constants, and B_t^1 and B_t^2 are two independent one-dimensional Brownian motions. Suppose that the observation model is

y_t = X_t^1 + X_t^2 + O_t.

Applying the quadratic variation procedure indicated above, we see that Z_t ≡ σ_1² X_t^1 + σ_2² X_t^2 is observable. It can be proved that when σ_1 = σ_2, the singular filter is of continuous type. In this case, the additional condition (IN) of Section 4 is satisfied after transforming the filtering problem with OU noise to a singular filtering problem.

When σ_1 ≠ σ_2, condition (IN) is not satisfied. In this case,

Z_t^p ≡ σ_1^{2p} X_t^1 + σ_2^{2p} X_t^2,  p = 1, 2, ...,

are all observable, and hence the level sets are countable and the conditional distribution is of discrete type.

Finally, we point out that linear filtering with an OU process as the observation noise has been studied by Liu [10]. The model is given by

dX_t = (b_1 X_t + b_0) dt + C dB_t,
y_t = H X_t + O_t,

where O_t is the m-dimensional OU-process independent of B_t, and H, b_1, b_0, C are matrices of dimensions m×d, d×d, d×1, d×d, respectively. The model is classified into three classes (classical, continuous singular, discrete singular) according to the ranks of certain matrices determined by the coefficients H, b_1, b_0, C. We refer the interested reader to the working paper [10] for details.

Acknowledgement. The authors would like to thank the referee for helpful suggestions.

References

[1] A. G. Bhatt and R. L. Karandikar (2003). On filtering with Ornstein-Uhlenbeck process as noise. J. Indian Statist. Assoc. 41, no. 2, 205–220. MR2101994

[2] L.C. Evans and R.F. Gariepy (1992). Measure Theory and Fine Properties of Functions. CRC Press, Boca Raton, FL. MR1158660

[3] A. Friedman (1975). Stochastic Differential Equations and Applications, Volume 1, Academic Press. MR0494490

[4] L. Gawarecki and V. Mandrekar (2000). On the Zakai equation of filtering with Gaussian noise. Stochastics in Finite and Infinite Dimensions, volume in honor of Gopinath Kallianpur, eds. T. Hida et al., 145–151. MR1797085

[5] N. Ikeda and S. Watanabe (1989). Stochastic Differential Equations and Diffusion Processes. North-Holland Publishing Company. MR1011252

[6] M. Joannides and F. LeGland (1997). Nonlinear filtering with perfect discrete time observations. Proceedings of the 34th IEEE Conference on Decision and Control, New Orleans, December 13–15, 1995, pp. 4012–4017.

[8] G. Kallianpur and R.L. Karandikar (1988), White Noise Theory of Prediction, Filtering and Smoothing. Gordon and Breach. MR0966884

[9] H. Kunita (1993). Representation and stability of nonlinear filters associated with Gaussian noises. Stochastic Processes: A Festschrift in Honour of Gopinath Kallianpur, eds. Cambanis et al., Springer-Verlag, New York, 201–210. MR1427316

[10] Z. Liu and J. Xiong (2009). Some computable examples for singular filtering. Technical report, http://www.math.utk.edu/~jxiong/publist.html.

[11] V. Mandrekar and P. K. Mandal (2000). A Bayes formula for Gaussian processes and its applications. SIAM J. Control Optim. 39, 852–871. MR1786333
