arXiv:1711.06600v1 [math.DS] 17 Nov 2017

On optimal coding of non-linear dynamical systems

Christoph Kawan and Serdar Yüksel

Abstract—We consider the problem of zero-delay coding of a deterministic dynamical system over a discrete noiseless channel under three estimation criteria concerned with the low-distortion regime. For these three criteria, formulated stochastically in terms of a probability distribution for the initial state, we characterize the smallest channel capacity above which the estimation objectives can be achieved. The results establish further connections between topological and metric entropy of dynamical systems and information theory.

Keywords: Source coding; state estimation; non-linear systems; topological entropy; metric entropy; dynamical systems

AMS Classification: 93E10, 37A35, 93E99, 94A17, 94A29

I. INTRODUCTION

In this paper, we develop further connections between the ergodic theory of dynamical systems and information theory, in the context of zero-delay coding and state estimation for non-linear dynamical systems, under three different criteria. In the following, we first introduce the problems considered in the paper, and present a comprehensive literature review.

A. The Problem

Let $f: X \to X$ be a homeomorphism (a continuous map with a continuous inverse) on a compact metric space $(X,d)$. We consider the dynamical system generated by $f$, i.e.,
$$x_{t+1} = f(x_t), \qquad x_0 \in X, \quad t \in \mathbb{Z}.$$
The space $X$ is endowed with a Borel probability measure $\pi_0$ which describes the uncertainty in the initial state.

Suppose that a sensor measures the state at discrete sampling times, and a coder translates the measurements into symbols from a finite coding alphabet $\mathcal{M}$ and sends this information over a noiseless digital channel to an estimator. The input signal sent at time $t$ through the channel is denoted by $q_t$.

At the other end of the channel, the estimator generates an estimate $\hat{x}_t \in X$ of $x_t$, using the information it has received through the channel up to time $t$. We consider arbitrary (but causal / zero-delay) coding and estimation policies. That is, $q_t$ is generated by a (measurable) map
$$q_t = \delta_t(x_0, x_1, \ldots, x_t), \qquad \delta_t: X^{t+1} \to \mathcal{M}, \qquad (1)$$
and the estimator output at time $t$ is given by a map
$$\hat{x}_t = \gamma_t(q_0, q_1, \ldots, q_t), \qquad \gamma_t: \mathcal{M}^{t+1} \to X. \qquad (2)$$

C. Kawan is with the Faculty of Computer Science and Mathematics, University of Passau, 94032 Passau, Germany (e-mail: christoph.kawan@uni-passau.de). S. Yüksel is with the Department of Mathematics and Statistics, Queen's University, Kingston, Ontario, Canada, K7L 3N6 (e-mail: yuksel@mast.queensu.ca).

Throughout the paper, we assume that the channel is noiseless with finite input alphabet $\mathcal{M}$. Hence, its capacity is given by
$$C = \log_2 |\mathcal{M}|.$$
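To make this setup concrete, here is a minimal runnable sketch (our illustration, not from the paper) of the interfaces (1)-(2) over a one-bit noiseless channel, so $C = 1$: the system is the doubling map on $[0,1)$, the placeholder coder $\delta_t$ is a memoryless one-bit quantizer, and the placeholder estimator $\gamma_t$ outputs a bin midpoint.

```python
# A minimal sketch (not from the paper) of the zero-delay coding setup in
# (1)-(2): at each time t the coder sees x_0..x_t and emits one symbol from
# a finite alphabet M; the estimator sees q_0..q_t and outputs xhat_t.
# Example system: the doubling map on [0,1) with a binary alphabet.

M = [0, 1]                      # channel input alphabet, C = log2 |M| = 1 bit

def f(x):                       # dynamical system x_{t+1} = f(x_t)
    return (2.0 * x) % 1.0

def coder(x_prefix):            # delta_t: X^{t+1} -> M (here: memoryless quantizer)
    return 0 if x_prefix[-1] < 0.5 else 1

def estimator(q_prefix):        # gamma_t: M^{t+1} -> X (here: bin midpoint)
    return 0.25 if q_prefix[-1] == 0 else 0.75

x, xs, qs = 0.123, [], []
for t in range(10):
    xs.append(x)
    qs.append(coder(xs))        # symbol sent over the noiseless channel
    xhat = estimator(qs)        # zero-delay estimate of x_t
    print(f"t={t}  x={x:.4f}  q={qs[-1]}  xhat={xhat:.2f}  err={abs(x - xhat):.4f}")
    x = f(x)
```

Any causal policy pair fits this interface; the schemes constructed in the paper differ only in how $\delta_t$ and $\gamma_t$ use the full observation and symbol prefixes.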

Our aim is to characterize the smallest capacity $C_0$ above which one of the following estimation objectives can be achieved for every $\varepsilon > 0$.

(E1) Eventual almost sure estimation: There exists $T = T(\varepsilon) \in \mathbb{N}$ s.t.
$$P(d(x_t, \hat{x}_t) \le \varepsilon) = 1 \quad\text{for all } t \ge T.$$

(E2) Asymptotic almost sure estimation:
$$P\Bigl(\limsup_{t\to\infty} d(x_t, \hat{x}_t) \le \varepsilon\Bigr) = 1.$$

(E3) Asymptotic estimation in expectation:
$$\limsup_{t\to\infty} E[d(x_t, \hat{x}_t)^p] \le \varepsilon, \qquad p > 0.$$

The smallest channel capacity above which one of these objectives can be achieved for a fixed $\varepsilon > 0$ will be denoted by $C_\varepsilon$. We are particularly interested in
$$C_0 := \lim_{\varepsilon \downarrow 0} C_\varepsilon.$$

In most of our results we will additionally assume that $\pi_0$ is $f$-invariant. Sometimes we will need the stronger assumption that $\pi_0$ is ergodic.

Given an encoding and decoding policy, the estimate $\hat{x}_t$ at time $t$ is determined by the initial state $x_0$, e.g., $\hat{x}_1 = \gamma_1(\delta_0(x_0), \delta_1(x_0, f(x_0)))$. To express this, we sometimes write $\hat{x}_t(x_0)$. In terms of the map $f$ and the measure $\pi_0$, the estimation criteria described above can be reformulated as follows.

Eventual almost sure estimation:
$$\pi_0(\{x_0 \in X : d(f^t(x_0), \hat{x}_t(x_0)) \le \varepsilon\}) = 1 \quad\text{for all } t \ge T. \qquad (3)$$

Asymptotic almost sure estimation:
$$\pi_0\Bigl(\Bigl\{x_0 \in X : \limsup_{t\to\infty} d(f^t(x_0), \hat{x}_t(x_0)) \le \varepsilon\Bigr\}\Bigr) = 1. \qquad (4)$$

Asymptotic estimation in expectation:
$$\limsup_{t\to\infty} \int_X \pi_0(dx_0)\, d(f^t(x_0), \hat{x}_t(x_0))^p \le \varepsilon. \qquad (5)$$


B. Literature Review

The results developed and the setup considered in this paper are related to the following three general areas.

Relations between dynamical systems, ergodic theory, and information theory. There has been a mutually beneficial relation between the ergodic theory of dynamical systems and information theory (see, e.g., [35] for a comprehensive review). Information-theoretic tools have been very effective in answering many questions on the behavior of dynamical systems; for example, the metric (also known as Kolmogorov-Sinai or measure-theoretic) entropy is crucial in the celebrated Shannon-McMillan-Breiman theorem [5] as well as in the isomorphism theorem [27], [28], [10], [3]. The concept of sliding block encoding [6] can be viewed as a stationary encoding of a dynamical system defined by the shift process, leading to fundamental results on the existence of stationary codes which perform as well as the limit performance of a sequence of optimal block codes. Entropy concepts, as is well known in the information theory community, have extensive operational usage in identifying fundamental limits on source and channel coding for a very large class of sources [35]. We refer the reader to [6] and [7] for further relations and discussions, which also include information-theoretic results for processes which are not necessarily stationary.

In this paper, we provide further connections by answering the problems posed in the previous section and relating the answers to the concepts of either metric or topological entropy.

Zero-delay coding. In our setup, we will impose causality as a restriction in coding and decoding. Structural results for the finite-horizon coding problem have been developed in a number of important papers. The classic works by Witsenhausen [40] and Walrand and Varaiya [39] (with extensions by Teneketzis [37]) are crucial. The findings of [39] have been generalized to continuous sources in [43] (see also [17] and [2]). However, these contributions are only concerned with structural results for the optimal encoding policies. Related works include [41], [19], [1], [8], which have primarily considered the coding of discrete sources.

Causal coding under a high rate assumption for stationary sources and individual sequences was studied in [18]. The authors, among many other results, established asymptotic quantitative relations between the differential entropy rate and the entropy rate of a uniformly and memorylessly quantized stationary process, and through this analysis established the near optimality of uniform quantizers in the low distortion regime. This result, unfortunately, does not apply to the system we consider, since this system has a differential entropy rate of $-\infty$.

Networked control. In networked control, there has been an interest in identifying limitations on state estimation under information constraints. The results in this area have typically involved linear systems; in the non-linear case, the studies have mostly been on deterministic systems estimated or controlled over deterministic channels, with few exceptions. State estimation over discrete noiseless channels was studied in [20], [23] for linear discrete-time systems in a stochastic framework with the objective to bound the estimation error in probability. In these works, the inequality
$$C \ge H(A) := \sum_{\lambda\in\sigma(A)} \max\{0,\, n_\lambda \log|\lambda|\} \qquad (6)$$
for the channel capacity $C$ was obtained as a necessary and almost sufficient condition. Here $A$ is the dynamical matrix of the system and the summation is over its eigenvalues $\lambda$ with multiplicities $n_\lambda$. This result is also in agreement with related data-rate theorem bounds established earlier in the literature [42], [36], [25]. Some relevant studies that have considered non-linear systems are the following. The papers [16], [30] and [21] studied state estimation for non-linear deterministic systems and noise-free channels. In [16], Liberzon and Mitra characterized the critical data rate $C_0$ for exponential state estimation with a given exponent $\alpha \ge 0$ for a continuous-time system on a compact subset $K$ of its state space. As a measure for $C_0$, they introduced a quantity called estimation entropy $h_{\mathrm{est}}(\alpha, K)$, which equals the topological entropy on $K$ in case $\alpha = 0$, but for $\alpha > 0$ is no longer a purely topological quantity. The paper [12] provided a lower bound on $h_{\mathrm{est}}(\alpha, K)$ in terms of Lyapunov exponents under the assumption that the system preserves a smooth measure. In [21], Matveev and Pogromsky studied three estimation objectives of increasing strength for discrete-time non-linear systems. For the weakest one, the smallest bit rate was shown to be equal to the topological entropy. For the other ones, general upper and lower bounds were obtained which can be computed directly in terms of the linearized right-hand side of the equation generating the system. A further related paper is [34], which studies state estimation for a class of non-linear systems over noise-free digital channels; there, connections with topological entropy are also established.

In recent work [14] and [13], we considered the state estimation problem for non-linear systems driven by noise, and established negative bounds due to the presence of noise by viewing the noisy system as an infinite-dimensional random dynamical system subjected to the shift operation. In this paper, we drop the noise and are able to obtain positive results involving both the metric and the topological entropy.


Contributions. In view of this literature review, we make the following contributions. We establish that for (E1), the topological entropy on the support of the measure $\pi_0$ essentially provides upper and lower bounds, and for (E3) the metric entropy essentially provides the relevant figure of merit for both upper and lower bounds. For (E2), the lower bound is provided by the metric entropy, whereas the upper bound can be given either by the topological entropy or by the metric entropy, depending on the properties of the dynamical system. Through the analysis, we provide further connections between information theory and dynamical systems by identifying the operational usage of entropy concepts for three different estimation criteria. We obtain explicit bounds on the performance of optimal zero-delay codes; and regarding the contributions in networked control, we provide refined upper and lower bounds when compared with the existing literature.

Organization of the paper. In Section II, we introduce notation and recall relevant entropy concepts and related results from ergodic theory. Sections III, IV and V contain our main results on lower and upper estimates of the channel capacity for the estimation objectives (E1)-(E3). Section VI provides a discussion of our results and relates them to similar results in the literature. In Section VII, we formulate some corollaries for systems in the differentiable category, relating the critical channel capacities to Lyapunov exponents, and we also present several examples. Finally, some fundamental theorems in ergodic theory used in our proofs are collected in the appendix, Section VIII.

II. NOTATION AND REVIEW OF RELEVANT RESULTS FROM DYNAMICAL SYSTEMS

Throughout the paper, all logarithms are taken to the base 2; we write $\log := \log_2$. By $|A|$ we denote the cardinality of a set $A$. If $(X,d)$ is a metric space and $\emptyset \ne A \subset X$, we write $\operatorname{diam} A := \sup\{d(x,y) : x,y \in A\}$ for the diameter of $A$. If $(x_t)_{t\in\mathbb{Z}_+}$ is a sequence, we write $x_{[0,t]} = (x_0, x_1, \ldots, x_t)$ for any $t \ge 0$.

Let $f: X \to X$ be a uniformly continuous map on a metric space $(X,d)$. For $n \in \mathbb{N}$ and $\varepsilon > 0$, a subset $F \subset X$ is said to $(n,\varepsilon)$-span another subset $K \subset X$ provided that for each $x \in K$ there is $y \in F$ with $d(f^i(x), f^i(y)) \le \varepsilon$ for $0 \le i < n$. Alternatively, we can describe this in terms of the metric
$$d_{n,f}(x,y) := \max_{0 \le i < n} d(f^i(x), f^i(y)),$$
which is topologically equivalent to $d$. Any $d_{n,f}$-ball of radius $\varepsilon > 0$ is called a Bowen-ball of order $n$ and radius $\varepsilon$. In these terms, a set $F$ $(n,\varepsilon)$-spans $K$ if the closed Bowen-balls of order $n$ and radius $\varepsilon$ centered at the points in $F$ form a cover of $K$. If $K$ is compact, we write $r_{\mathrm{span}}(n,\varepsilon;K)$ to denote the minimal cardinality of a set which $(n,\varepsilon)$-spans $K$, observing that this number is finite. The topological entropy of $f$ on $K$ is then defined by
$$h_{\mathrm{top}}(f;K) := \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty} \frac{1}{n}\log r_{\mathrm{span}}(n,\varepsilon;K).$$

By the monotonicity properties of $r_{\mathrm{span}}(n,\varepsilon;K)$, the limit for $\varepsilon \downarrow 0$ can be replaced by the supremum over $\varepsilon > 0$. An alternative definition can be given in terms of $(n,\varepsilon)$-separated sets. A subset $E \subset X$ is called $(n,\varepsilon)$-separated if for any $x, y \in E$ with $x \ne y$ one has $d_{n,f}(x,y) > \varepsilon$. Writing $r_{\mathrm{sep}}(n,\varepsilon;K)$ for the maximal cardinality of an $(n,\varepsilon)$-separated subset of a compact set $K$, one finds that
$$r_{\mathrm{span}}(n,\varepsilon;K) \le r_{\mathrm{sep}}(n,\varepsilon;K) \le r_{\mathrm{span}}(n,\varepsilon/2;K),$$
and hence
$$h_{\mathrm{top}}(f;K) = \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty} \frac{1}{n}\log r_{\mathrm{sep}}(n,\varepsilon;K).$$
If $X$ is compact, we also write $h_{\mathrm{top}}(f) = h_{\mathrm{top}}(f;X)$ and call this number the topological entropy of $f$.
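For intuition, the growth rate $\frac{1}{n}\log r_{\mathrm{sep}}(n,\varepsilon;K)$ can be estimated numerically for simple maps. The following sketch (ours, not from the paper) greedily builds $(n,\varepsilon)$-separated sets from a fine grid for the doubling map $f(x) = 2x \bmod 1$ on the circle, whose topological entropy is $\log 2 = 1$ bit.

```python
# A numerical sketch (ours, not from the paper): estimate (1/n) log2 of the
# maximal cardinality of an (n, eps)-separated set for the doubling map on
# the circle, greedily, from a grid of candidate points. h_top = log2(2) = 1.
import numpy as np

def circle_dist(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def d_nf(x, y, n):                      # Bowen metric d_{n,f} for f(x) = 2x mod 1
    return max(circle_dist((2 ** i * x) % 1.0, (2 ** i * y) % 1.0)
               for i in range(n))

def r_sep(n, eps, grid):                # greedy (n, eps)-separated subset
    E = []
    for x in grid:
        if all(d_nf(x, y, n) > eps for y in E):
            E.append(x)
    return len(E)

grid = np.linspace(0.0, 1.0, 2000, endpoint=False)
eps = 0.1
for n in (2, 3, 4, 5, 6):
    # overshoots by roughly (1/n) log2(1/(2*eps)); decreases toward 1 bit
    print(n, np.log2(r_sep(n, eps, grid)) / n)
```

The greedy set is only approximately maximal, so the printed values are a lower bound on $\frac{1}{n}\log r_{\mathrm{sep}}$ for the grid, but the $\varepsilon$-dependent overshoot vanishing like $1/n$ is clearly visible.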

Now assume that $X$ is compact and $\mu$ is an $f$-invariant Borel probability measure on $X$, i.e., $\mu \circ f^{-1} = \mu$. Let $\mathcal{P}$ be a finite Borel partition of $X$ and put
$$h_\mu(f;\mathcal{P}) := \lim_{n\to\infty} \frac{1}{n} H_\mu(\mathcal{P}^n), \qquad \mathcal{P}^n := \bigvee_{i=0}^{n-1} f^{-i}\mathcal{P}, \qquad (7)$$
where $H_\mu(\mathcal{P}^n) = -\sum_{P\in\mathcal{P}^n} \mu(P)\log\mu(P)$ denotes the entropy of the partition $\mathcal{P}^n$ and $\vee$ is the usual join operation for partitions, i.e., $\bigvee_{i=0}^{n-1} f^{-i}\mathcal{P}$ is the partition of $X$ whose members are the sets
$$P_{i_0} \cap f^{-1}(P_{i_1}) \cap \ldots \cap f^{-n+1}(P_{i_{n-1}}), \qquad P_{i_j} \in \mathcal{P}.$$
The existence of the limit in (7) follows from subadditivity. The metric entropy of $f$ w.r.t. $\mu$ is defined by
$$h_\mu(f) := \sup_{\mathcal{P}} h_\mu(f;\mathcal{P}),$$
the supremum taken over all finite Borel partitions $\mathcal{P}$.
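The partition entropies $\frac{1}{n}H_\mu(\mathcal{P}^n)$ in (7) are easy to approximate by sampling. A small Monte Carlo sketch (ours, not from the paper) for the doubling map with the invariant Lebesgue measure and the binary partition $\mathcal{P} = \{[0,\tfrac12), [\tfrac12,1)\}$, for which $h_\mu(f;\mathcal{P}) = \log 2 = 1$ bit:

```python
# A Monte Carlo sketch (ours, not from the paper): approximate (1/n) H_mu(P^n)
# for the doubling map with Lebesgue measure and P = {[0,1/2), [1/2,1)}.
# Elements of P^n correspond to length-n itineraries; here h = log2(2) = 1.
import numpy as np
from collections import Counter

rng = np.random.default_rng(1)

def itinerary(x, n):
    syms = []
    for _ in range(n):
        syms.append(0 if x < 0.5 else 1)
        x = (2.0 * x) % 1.0
    return tuple(syms)

n, samples = 10, 200_000
counts = Counter(itinerary(x, n) for x in rng.random(samples))
probs = np.array([c / samples for c in counts.values()])
H = -np.sum(probs * np.log2(probs))      # empirical entropy of P^n
print("(1/n) H =", H / n, "bits; exact value: 1")
```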

If $\mu$ is ergodic, i.e.,
$$f^{-1}(A) = A \ \Rightarrow\ \mu(A) \in \{0,1\},$$
there is an alternative characterization of $h_\mu(f)$ in terms of $(n,\varepsilon)$-spanning sets, due to Katok [9]. For a fixed $\delta \in (0,1)$ put
$$r_{\mathrm{span}}(n,\varepsilon,\delta) := \min\{|F| : F\ (n,\varepsilon)\text{-spans a set of } \mu\text{-measure} \ge 1-\delta\}.$$
Then the metric entropy of $f$ satisfies
$$h_\mu(f) = \lim_{\varepsilon\downarrow 0}\limsup_{n\to\infty} \frac{1}{n}\log r_{\mathrm{span}}(n,\varepsilon,\delta),$$
and this limit is independent of $\delta$.

Note that both $h_\mu(f)$ and $h_{\mathrm{top}}(f)$ are nonnegative quantities, which can assume the value $+\infty$. They are related by the variational principle, i.e.,
$$h_{\mathrm{top}}(f) = \sup_\mu h_\mu(f),$$
the supremum taken over all $f$-invariant Borel probability measures $\mu$.


III. EVENTUAL ALMOST SURE ESTIMATION

For the estimation objective (E1), we can characterize $C_0$ as follows.

III.1 Theorem: Assume that $\pi_0$ is $f$-invariant and $h_{\mathrm{top}}(f;\operatorname{supp}\pi_0) < \infty$. Then the smallest channel capacity above which (E1) can be achieved for every $\varepsilon > 0$ satisfies
$$h_{\mathrm{top}}(f;\operatorname{supp}\pi_0) \le C_0 \le \log\bigl(1 + \lfloor 2^{h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)}\rfloor\bigr).$$

Proof: The lower bound was proved in [13, Thm. 3.3]. The upper bound follows by applying the coding and estimation scheme described in [21, Thm. 8] for the system $(\operatorname{supp}\pi_0, f|_{\operatorname{supp}\pi_0})$, using that $\operatorname{supp}\pi_0$ is a compact $f$-invariant set. This scheme works as follows. Assume that the channel alphabet $\mathcal{M}$ satisfies
$$|\mathcal{M}| \ge 1 + \lfloor 2^{h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)}\rfloor,$$
implying $C > h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)$. Let $\varepsilon > 0$ and fix $k \in \mathbb{N}$ such that
$$\frac{1}{k}\log r_{\mathrm{span}}(k,\varepsilon;\operatorname{supp}\pi_0) < C, \qquad (8)$$
which is possible by the definition of $h_{\mathrm{top}}$. Let $S$ be a set which $(k,\varepsilon)$-spans $\operatorname{supp}\pi_0$ with $|S| = r_{\mathrm{span}}(k,\varepsilon;\operatorname{supp}\pi_0)$. The coder at time $t_j := jk$ computes $f^k(x_{jk})$ and chooses an element $y \in S$ such that $d(f^t(f^k(x_{jk})), f^t(y)) \le \varepsilon$ for $0 \le t < k$, provided that $f^k(x_{jk}) \in \operatorname{supp}\pi_0$. By (8), $y$ can be encoded and sent over the channel in the forthcoming time interval of length $k$. In the case when $f^k(x_{jk}) \notin \operatorname{supp}\pi_0$, it is not important what the coder sends. In the first case, at time $t_{j+1}$, the estimator has $y$ available and can use $f^t(y)$ as an estimate for $x_{(j+1)k+t}$. In this way, the estimation objective is achieved for all $x_0 \in \operatorname{supp}\pi_0$ and $t \ge k$. Hence,
$$P(d(x_t,\hat x_t) \le \varepsilon) = \pi_0\bigl(\{x_0 \in X : d(f^t(x_0), \hat x_t(x_0)) \le \varepsilon\}\bigr) = \pi_0(\operatorname{supp}\pi_0) = 1$$
for all $t \ge k$, which completes the proof.
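The block structure of this scheme is easy to simulate. The sketch below (our illustration, under the simplifying assumption that a $(k,\varepsilon)$-spanning set is available as an explicit codebook) runs the pipeline for the doubling map, whose topological entropy is 1 bit, over a ternary channel, so $|\mathcal{M}| = 3 \ge 1 + \lfloor 2^1 \rfloor$ and $C = \log 3 > 1$: in each block of $k$ channel uses the coder sends the index of a codeword $y$ that $\varepsilon$-shadows $f^k(x_{jk})$, and the estimator then outputs $f^t(y)$ during the next block.

```python
# A simulation sketch (ours, not from the paper) of the scheme in the proof of
# Theorem III.1 for f(x) = 2x mod 1 on the circle (h_top = 1 bit), with a
# ternary channel alphabet, so C = log2(3) > h_top. A uniform grid S is a
# (k,eps)-spanning set: |x - y| <= eps/2^(k-1) forces d(f^i x, f^i y) <= eps.
import numpy as np

def f(x):
    return (2.0 * x) % 1.0

def cdist(a, b):                      # metric on the circle
    d = abs(a - b)
    return min(d, 1.0 - d)

eps, k = 0.05, 6
S = np.arange(0.0, 1.0, eps / 2 ** (k - 1))   # (k,eps)-spanning codebook
assert len(S) <= 3 ** k               # index fits into k ternary channel symbols

x, max_err = 0.3141592, 0.0           # x = x_{jk}, start of the current block
for block in range(200):
    z = x
    for _ in range(k):
        z = f(z)                      # coder computes z = f^k(x_{jk})
    y = S[np.argmin(np.abs(S - z))]   # shadowing codeword; its index is sent
    x = z                             # over the channel during this block
    for t in range(k):                # next block: estimator outputs f^t(y)
        max_err = max(max_err, cdist(z, y))
        z, y = f(z), f(y)
print(f"max error for t >= k: {max_err:.4f} <= eps = {eps}")
```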

IV. ASYMPTOTIC ALMOST SURE ESTIMATION

In this section, we study the asymptotic estimation objective (E2). As it turns out, the associated $C_0$ is always bounded below by the metric entropy $h_{\pi_0}(f)$ when $\pi_0$ is $f$-invariant. Since (E1) implies (E2), the topological entropy on $\operatorname{supp}\pi_0$ still provides an upper bound on $C_0$. Under an additional mixing condition for the measure-theoretic dynamical system $(X,f,\pi_0)$, we are able to improve the upper bound by putting $h_{\pi_0}(f)$ in the place of $h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)$.

IV.1 Theorem: Assume that $\pi_0$ is $f$-invariant. Then the smallest channel capacity above which (E2) can be achieved for every $\varepsilon > 0$ satisfies
$$C_0 \le \log\bigl(1 + \lfloor 2^{h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)}\rfloor\bigr).$$

Proof: This follows immediately from Theorem III.1, since (E1) is stronger than (E2).

Under an additional strong mixing assumption, we can improve the bound given in the above theorem by replacing $h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)$ with $h_{\pi_0}(f)$.

IV.2 Theorem: Assume that $\pi_0$ is an ergodic measure for $f$. Furthermore, assume that there exists a finite Borel partition $\mathcal{P}$ of $X$ of arbitrarily small diameter such that $h_{\pi_0}(f) = h := h_{\pi_0}(f;\mathcal{P})$ and
$$\limsup_{n\to\infty}\frac{1}{n}\log\pi_0\Bigl(\Bigl\{x : \Bigl|-\frac{1}{n}\log\pi_0(\mathcal{P}^n(x)) - h\Bigr| > \delta\Bigr\}\Bigr) < 0 \qquad (9)$$
for all sufficiently small $\delta > 0$, where $\mathcal{P}^n(x)$ denotes the unique element of $\mathcal{P}^n$ containing $x$. Then the smallest channel capacity $C_0$ above which (E2) can be achieved for every $\varepsilon > 0$ satisfies
$$C_0 \le \log\bigl(1 + \lfloor 2^{h_{\pi_0}(f)}\rfloor\bigr).$$

Proof: The proof is subdivided into three steps.

Step 1. Without loss of generality, we may assume that $h_{\pi_0}(f) < \infty$, since otherwise the statement trivially holds. Consider a channel with input alphabet $\mathcal{M}$ satisfying
$$|\mathcal{M}| \ge 1 + \lfloor 2^{h_{\pi_0}(f)}\rfloor,$$
which implies $C > h_{\pi_0}(f)$, and fix $\varepsilon > 0$. Choose a number $\delta > 0$ satisfying
$$C \ge h_{\pi_0}(f) + \delta. \qquad (10)$$
Using uniform continuity of $f$ on the compact space $X$, we find $\rho \in (0,\varepsilon)$ so that
$$d(x,y) < \rho \ \Rightarrow\ d(f(x), f(y)) < \varepsilon. \qquad (11)$$
Let $\mathcal{P}$ be a finite Borel partition of $X$ satisfying assumption (9) whose elements have diameter smaller than $\rho$ and put
$$h := h_{\pi_0}(f;\mathcal{P}) = \lim_{n\to\infty}\frac{1}{n}H_{\pi_0}(\mathcal{P}^n).$$

Recall that, by ergodicity of $\pi_0$, the Shannon-McMillan-Breiman Theorem states that
$$-\frac{1}{n}\log\pi_0(\mathcal{P}^n(x)) \to h \quad\text{for a.e. } x \in X,$$
implying that
$$\pi_0\Bigl(\Bigl\{x : \Bigl|-\frac{1}{n}\log\pi_0(\mathcal{P}^n(x)) - h\Bigr| > \delta\Bigr\}\Bigr) \to 0$$
as $n \to \infty$. Assumption (9) tells us that this convergence is exponential.

Let $\mathcal{P} = \{P_1,\ldots,P_r\}$ and write $i^n(x)$ for the length-$n$ itinerary of $x$ w.r.t. $\mathcal{P}$, i.e., $i^n(x) \in \{1,\ldots,r\}^n$ with $f^j(x) \in P_{i^n(x)_j}$ for $0 \le j < n$. Consider the sets
$$\Sigma_{n,\delta} = \bigl\{a \in \{1,\ldots,r\}^n : 2^{-n(h+\delta)} \le \pi_0(\{x : i^n(x) = a\}) \le 2^{-n(h-\delta)}\bigr\}.$$

Step 2. We define sampling times by
$$\tau_0 := 0, \qquad \tau_{j+1} := \tau_j + j + 1, \quad j \ge 0,$$
and put
$$\zeta_j := \pi_0\bigl(\{x \in X : i^j(f^{\tau_j}(x)) \notin \Sigma_{j,\delta}\}\bigr)$$
for all $j \in \mathbb{Z}_+$. We obtain
$$\zeta_j = \pi_0\bigl(\{x \in X : i^j(x) \notin \Sigma_{j,\delta}\}\bigr),$$
where we used $f$-invariance of $\pi_0$ in the last equality. Then (9) implies the existence of a constant $\alpha > 0$ so that for all sufficiently large $j$ we have $\zeta_j \le 2^{-\alpha j}$; in particular, $\sum_j \zeta_j < \infty$. Consequently, the Borel-Cantelli lemma implies
$$\pi_0\bigl(\{x \in X : \exists j_0 \text{ with } i^j(f^{\tau_j}(x)) \in \Sigma_{j,\delta}\ \forall j \ge j_0\}\bigr) = 1. \qquad (12)$$

Step 3. Using Corollary VIII.2 in the appendix, $h = h_{\pi_0}(f)$ and (10), we obtain
$$|\Sigma_{j,\delta}| \le 2^{j(h+\delta)} \le 2^{jC} = |\mathcal{M}|^j. \qquad (13)$$
Hence, $j$ channel uses are sufficient to transmit an encoded element of $\Sigma_{j,\delta}$. Now we specify the coding and estimation policies. In the time interval from $\tau_j$ to $\tau_{j+1}-1$, which has length $j+1$, the coder encodes the itinerary $i^{j+1}(f^{\tau_{j+1}}(x))$; if $i^{j+1}(f^{\tau_{j+1}}(x)) \in \Sigma_{j+1,\delta}$, this information can be sent through the channel in the time interval of length $j+1$ by (13). At time $\tau_{j+1}+k$, $0 \le k \le j$, the estimator output $\hat x_{\tau_{j+1}+k}$ is an arbitrary element of the partition set $P_{s_k}$, with $s_k$ being the symbol at position $k$ in the transmitted string $i^{j+1}(f^{\tau_{j+1}}(x))$. At these times the estimation error is smaller than $\rho < \varepsilon$ by the choice of the partition $\mathcal{P}$, and at the remaining time $\tau_{j+2}-1$ the estimator uses $\hat x_{\tau_{j+2}-1} := f(\hat x_{\tau_{j+2}-2})$, so that
$$d(\hat x_{\tau_{j+2}-1}, f^{\tau_{j+2}-1}(x)) = d(f(\hat x_{\tau_{j+2}-2}), f(f^{\tau_{j+2}-2}(x))) < \varepsilon$$
by (11). According to (12), for almost every $x$ we have $i^j(f^{\tau_j}(x)) \in \Sigma_{j,\delta}$ for all sufficiently large $j$, so that the estimation accuracy $\varepsilon$ will eventually be achieved for those $x$, implying
$$\pi_0\Bigl(\Bigl\{x \in X : \limsup_{t\to\infty} d(f^t(x), \hat x_t(x)) \le \varepsilon\Bigr\}\Bigr) = 1,$$
which completes the proof.

An example satisfying the assumptions of the above theorem will be given in Section VII.

To provide a lower bound on $C_0$, we will use the following lemma, which can be found in [26, Prop. 1]. For completeness, we provide its proof.

IV.3 Lemma: Let $\mathcal{P} = \{P_1,\ldots,P_s, X\setminus\bigcup_i P_i\}$ be a finite Borel partition of $X$, where $P_1,\ldots,P_s$ are disjoint compact sets, and let $\rho > 0$. Then there exist $\delta > 0$ and $\alpha > 0$ such that for every measurable set $\tilde X \subset X$ with $\pi_0(\tilde X) \ge 1-\alpha$,
$$h_{\pi_0}(f;\mathcal{P}) \le h(f;\tilde X,\delta) + \rho, \qquad\text{where } h(f;\tilde X,\delta) := \limsup_{n\to\infty}\frac{1}{n}\log r_{\mathrm{sep}}(n,\delta;\tilde X,f).$$

Proof: Throughout the proof, we use the notation $\mathcal{P}^n|_{\tilde X} = \{P \cap \tilde X : P \in \mathcal{P}^n\}$. Given the partition $\mathcal{P}$, choose $\delta > 0$ so small that every closed $\delta$-ball in $X$ intersects at most two elements of $\mathcal{P}$. Now let $\tilde X \subset X$ be an arbitrary measurable set with $\pi_0(\tilde X) \ge 1-\alpha$ and consider the partition $\mathcal{Q} := \{\tilde X, X\setminus\tilde X\}$. Let $k, N \in \mathbb{N}$ with $(\log 2)/N < \rho/2$. Let $E \subset \tilde X$ be a maximal $(k,\delta)$-separated subset for the map $f^N$. Maximality guarantees that for each $z \in \tilde X$ there is $\varphi(z) \in E$ such that $d(f^{jN}(z), f^{jN}(\varphi(z))) \le \delta$ for $0 \le j < k$. Using elementary properties of conditional entropy, one estimates $H_{\pi_0}(\mathcal{P}^{kN})$ in terms of $\mathcal{Q}$, the fibers of $\varphi$, and the cardinality of $E$. Since each ball $B_\delta(z)$ intersects at most two elements of $\mathcal{P}$, each fiber of $\varphi$ meets at most $2^k$ elements of the iterated partition for $f^N$ restricted to $\tilde X$, and $E$ is a $(kN,\delta)$-separated (though not necessarily maximal) subset of $\tilde X$ for the map $f$. This yields
$$H_{\pi_0}(\mathcal{P}^{kN}) \le \log 2 + k\log 2 + \log r_{\mathrm{sep}}(kN,\delta;\tilde X, f) + \frac{\rho}{2}\,kN.$$
Dividing by $kN$ and letting $k \to \infty$ leads to the desired inequality.


IV.4 Theorem: Assume that $\pi_0$ is $f$-invariant. Then the smallest channel capacity above which (E2) can be achieved for every $\varepsilon > 0$ satisfies
$$C_0 \ge h_{\pi_0}(f). \qquad (14)$$
If $\pi_0$ is not invariant, but there exists an invariant measure $\pi_*$ which is absolutely continuous w.r.t. $\pi_0$, then
$$C_0 \ge h_{\pi_*}(f). \qquad (15)$$
Here the cases $h_{\pi_0}(f) = \infty$ and $h_{\pi_*}(f) = \infty$, respectively, are included. They correspond to the situation when the estimation objective cannot be achieved over a finite-capacity channel.

Proof: Consider a channel of capacity $C$ over which the objective (E2) can be achieved for every $\varepsilon > 0$. Fix coding and estimation policies achieving the objective for a fixed $\varepsilon > 0$. Pick $\delta > \varepsilon$ and define for $x \in X$,
$$T(x,\delta) := \min\bigl\{k \in \mathbb{N} : d(f^t(x), \hat x_t(x)) \le \delta \text{ for all } t \ge k\bigr\}.$$
For every $K \in \mathbb{N}$ put $B_K(\delta) := \{x \in X : T(x,\delta) \le K\}$. Since $x \mapsto d(f^t(x), \hat x_t(x))$ is a measurable map, $B_K(\delta)$ is a measurable set. From (4) it follows that
$$\lim_{K\to\infty} \pi_0(B_K(\delta)) = \pi_0\Bigl(\bigcup_{K\in\mathbb{N}} B_K(\delta)\Bigr) = 1. \qquad (16)$$
Now, for a given $\alpha \in (0,1)$, pick $K \in \mathbb{N}$ such that
$$\pi_0(B_K(\delta)) \ge 1-\alpha.$$
We consider the set $\tilde X := f^K(B_K(\delta))$. Observe that $x \in \tilde X$, $x = f^K(y)$, implies
$$d(f^t(x), \hat x_{t+K}(y)) \le \delta \quad\text{for all } t \ge 0.$$
Now consider a maximal $(n,2\delta)$-separated set $E_n \subset \tilde X$ for some $n \in \mathbb{N}$. Recall that $\hat x_t = \gamma_t(q_0, q_1, \ldots, q_t)$. For arbitrary integers $m \ge m_0$, write
$$\hat E_{m_0,m} := \bigl\{(\gamma_{m_0}(q_{[0,m_0]}), \gamma_{m_0+1}(q_{[0,m_0+1]}), \ldots, \gamma_m(q_{[0,m]})) : q_{[0,m]} \in \mathcal{M}^{m+1}\bigr\}.$$
Define a map $\alpha: E_n \to \hat E_{K,K+n}$ by assigning to each $x \in E_n$ the unique element of $\hat E_{K,K+n}$ that is generated by the coding and estimation policy when the system starts in the initial state $x_0 = f^{-K}(x) \in B_K(\delta)$. We claim that $\alpha$ is injective. Indeed, assume that $\alpha(x_1) = \alpha(x_2)$ for some $x_1, x_2 \in E_n$. Then, for $0 \le j \le n$,
$$d(f^j(x_1), f^j(x_2)) \le d(f^j(x_1), \hat x_{K+j}) + d(\hat x_{K+j}, f^j(x_2)) \le \delta + \delta = 2\delta,$$
implying $x_1 = x_2$, because $E_n$ is $(n,2\delta)$-separated. Hence,
$$\log|E_n| \le \log|\hat E_{K,K+n}| \le \log|\mathcal{M}|^{K+n} = (K+n)C.$$
We thus obtain
$$C \ge \limsup_{n\to\infty}\frac{1}{K+n}\log|E_n| = \limsup_{n\to\infty}\frac{1}{n}\log|E_n| = h(f;\tilde X, 2\delta),$$
where $\pi_0(\tilde X) \ge 1-\alpha$ and $\tilde X$ depends on the choice of $\delta > \varepsilon$. By Lemma IV.3, for a partition $\mathcal{P} = \{P_1,\ldots,P_s, X\setminus\bigcup P_i\}$ and $\rho > 0$, we can choose $\varepsilon > 0$, $\delta > \varepsilon$ and $\alpha > 0$ such that the above construction yields the inequality
$$h_{\pi_0}(f;\mathcal{P}) \le h(f;\tilde X, 2\delta) + \rho \le C + \rho.$$
Since the entropy of any finite measurable partition of $X$ can be approximated by $h_{\pi_0}(f;\mathcal{P})$ with $\mathcal{P}$ as above (see, e.g., [24]), and $\rho$ is arbitrary, the inequality (14) follows.

Now assume that $\pi_*$ is an $f$-invariant probability measure which is absolutely continuous w.r.t. $\pi_0$. Then for every $\alpha > 0$ there is $\beta > 0$ such that $\pi_0(B) < \beta$ implies $\pi_*(B) < \alpha$. Looking at the complements of the sets $B_K(\delta)$, we see that the measure $\pi_0$ can be replaced by $\pi_*$ in (16), which implies the inequality (15).

IV.5 Remark: Observe that in the whole proof of Theorem IV.4 (including Lemma IV.3) we do not use invariance of $\pi_0$. That is, without the assumption of invariance, we still have
$$C_0 \ge \sup_{\mathcal{P}} \limsup_{n\to\infty}\frac{1}{n}H_{\pi_0}(\mathcal{P}^n),$$
the supremum taken over all finite measurable partitions $\mathcal{P}$.

V. ASYMPTOTIC ESTIMATION IN EXPECTATION

Finally, we consider the estimation objective (E3). As we will see, under mild assumptions on the admissible coding and estimation policies, $C_0$ is characterized by the metric entropy $h_{\pi_0}(f)$.

V.1 Theorem: Assume that $\pi_0$ is an ergodic measure for $f$. Then the smallest channel capacity above which (E3) can be achieved for every $\varepsilon > 0$ satisfies
$$C_0 \le \log\bigl(1 + \lfloor 2^{h_{\pi_0}(f)}\rfloor\bigr).$$

Proof: Without loss of generality, we may assume that $h_{\pi_0}(f) < \infty$, since otherwise the statement trivially holds. Consider a channel with input alphabet $\mathcal{M}$ satisfying
$$|\mathcal{M}| \ge 1 + \lfloor 2^{h_{\pi_0}(f)}\rfloor,$$
which implies $C > h_{\pi_0}(f)$, and fix $\varepsilon > 0$. Choose a number $\delta > 0$ satisfying
$$C \ge h_{\pi_0}(f) + \delta. \qquad (17)$$
Let $\mathcal{P} = \{P_1,\ldots,P_r\}$ be a finite Borel partition of $X$ whose elements have diameter smaller than $(\varepsilon/2)^{1/p}$ and put
$$h := h_{\pi_0}(f;\mathcal{P}) = \lim_{n\to\infty}\frac{1}{n}H_{\pi_0}(\mathcal{P}^n).$$
Using the assumption that $\pi_0$ is ergodic, the Shannon-McMillan-Breiman Theorem states that
$$-\frac{1}{n}\log\pi_0(\mathcal{P}^n(x)) \to h \quad\text{for a.e. } x \in X, \qquad (18)$$
where $\mathcal{P}^n(x)$ is the element of $\mathcal{P}^n$ containing $x$. Write $i^n(x)$ for the length-$n$ itinerary of $x$ w.r.t. $\mathcal{P}$ (see the appendix), and consider the sets
$$\Sigma_{n,\delta} = \bigl\{a \in \{1,\ldots,r\}^n : 2^{-n(h+\delta)} \le \pi_0(\{x : i^n(x) = a\}) \le 2^{-n(h-\delta)}\bigr\}.$$

By Corollary VIII.2 in the appendix, we have
$$\pi_0\bigl(\{x \in X : \exists m \in \mathbb{N} \text{ with } i^n(x) \in \Sigma_{n,\delta}\ \forall n \ge m\}\bigr) = 1 \qquad (19)$$
and $|\Sigma_{n,\delta}| \le 2^{n(h+\delta)}$. By (19), we can fix $k$ large enough so that
$$\pi_0\bigl(\{x \in X : i^k(x) \in \Sigma_{k,\delta}\}\bigr) \ge 1 - \frac{\varepsilon}{2(\operatorname{diam}X)^p}. \qquad (20)$$
Using that $h \le h_{\pi_0}(f)$, we also obtain
$$|\Sigma_{k,\delta}| \le 2^{k(h+\delta)} \le 2^{kC} = |\mathcal{M}|^k. \qquad (21)$$
Thus, $k$ channel uses are sufficient to transmit an encoded element of $\Sigma_{k,\delta}$. Now we specify the coding and estimation policy. For any $j \in \mathbb{Z}_+$, in the time interval from $jk$ to $(j+1)k-1$, the coder encodes the information regarding the orbit in the time interval from $(j+1)k$ to $(j+2)k-1$, i.e., the itinerary $i^k(f^{(j+1)k}(x))$. For all $x$ with $i^k(f^{(j+1)k}(x)) \in \Sigma_{k,\delta}$, this information can be sent through the channel in a time interval of length $k$ by (21). At time $(j+1)k+i$, $0 \le i < k$, the estimator output $\hat x_{(j+1)k+i}$ is an arbitrary element of the partition set $P_{s_i}$ with $s_i$ being the symbol at position $i$ in the transmitted string $i^k(f^{(j+1)k}(x))$. Provided that $i^k(f^{(j+1)k}(x)) \in \Sigma_{k,\delta}$, the estimation error satisfies
$$d(\hat x_{(j+1)k+i}, f^{(j+1)k+i}(x)) \le \operatorname{diam} P_{s_i} < \Bigl(\frac{\varepsilon}{2}\Bigr)^{1/p}$$
for $0 \le i < k$, by the choice of the partition $\mathcal{P}$. Putting $X_1 := \{x : i^k(f^{(j+1)k}(x)) \in \Sigma_{k,\delta}\}$ and $X_2 := X\setminus X_1$, and using (20) together with invariance of $\pi_0$ under $f$, we obtain
$$E[d(x_{(j+1)k+i}, \hat x_{(j+1)k+i})^p] = \int_{X_1}\pi_0(dx)\,d(f^{(j+1)k+i}(x), \hat x_{(j+1)k+i}(x))^p + \int_{X_2}\pi_0(dx)\,d(f^{(j+1)k+i}(x), \hat x_{(j+1)k+i}(x))^p \le \pi_0(X_1)\frac{\varepsilon}{2} + \pi_0(X_2)(\operatorname{diam}X)^p \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Hence, for all $t \ge k$, we have $E[d(x_t,\hat x_t)^p] \le \varepsilon$, implying that the estimation objective is achieved.
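The one-block-lookahead pipeline of this proof is also easy to simulate. The sketch below (ours, not the paper's scheme verbatim) runs it for the doubling map with Lebesgue measure, $p = 1$, and a uniform partition: while block $j$ is being transmitted, the coder already knows the itinerary of block $j+1$, and the estimator outputs the midpoint of the partition cell named by each received symbol.

```python
# A sketch (ours, not from the paper) of the itinerary-based scheme in the
# proof of Theorem V.1 for the doubling map, pi_0 = Lebesgue, p = 1, and the
# uniform partition of [0,1) into r cells with diam = 1/r < eps/2. During
# block j the coder sends the itinerary of block j+1 (this pipeline delay is
# why estimation only starts at t = k); the estimator outputs cell midpoints.
import numpy as np

def f(x):
    return (2.0 * x) % 1.0

eps = 0.1
r = int(np.ceil(2.0 / eps)) + 1        # cell width 1/r < eps/2
k = 8                                  # block length; at most r * 2^(k-1) = 2688
                                       # itineraries occur, and 2688 <= 3^8 = 6561,
                                       # so k ternary symbols suffice (not simulated)
rng = np.random.default_rng(2)
errs = []
for x in rng.random(2000):             # Monte Carlo over the initial measure
    for _ in range(k):                 # block 0: nothing has been received yet
        x = f(x)
    for t in range(k, 5 * k):          # later blocks: itinerary already received
        s = int(x * r)                 # symbol of the cell containing x_t
        errs.append(abs(x - (s + 0.5) / r))   # midpoint estimate of P_s
        x = f(x)
print(f"E[d(x_t, xhat_t)] ~ {np.mean(errs):.4f} <= eps = {eps}")
```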

A stationary source can be expressed as a mixture of ergodic components; if these can be transmitted separately, the proof would also hold without the condition of ergodicity [6, Thm. 11.3.1]. If the process is not ergodic, but stationary, and if there exist only finitely many ergodic invariant measures in the support of $\pi_0$, then the proof above can be trivially modified so that the encoder sends a special message informing the estimator which ergodic subsource is transmitted, and the coding scheme for the ergodic subsource can then be applied.

To obtain the analogous lower bound, we need to introduce some notation. Let $\mu$ be an ergodic $f$-invariant Borel probability measure on $X$. For $x \in X$, $n \in \mathbb{N}$, $\varepsilon > 0$ and $r \in (0,1)$ define the sets
$$B(x,n,\varepsilon,r) := \Bigl\{y \in X : \frac{1}{n}\bigl|\{0 \le k < n : d(f^k(x), f^k(y)) \le \varepsilon\}\bigr| > 1-r\Bigr\}.$$
A set $F \subset X$ is called $(n,\varepsilon,r,\delta)$-spanning for $\delta \in (0,1)$ if
$$\mu\Bigl(\bigcup_{x\in F} B(x,n,\varepsilon,r)\Bigr) \ge 1-\delta.$$
We write $r_{\mathrm{span}}(n,\varepsilon,r,\delta)$ for the minimal cardinality of such a set and define
$$h_r(f,\mu,\varepsilon,\delta) := \limsup_{n\to\infty}\frac{1}{n}\log r_{\mathrm{span}}(n,\varepsilon,r,\delta).$$
According to [31, Thm. A], for every $\rho \in (0,1)$ we have
$$\lim_{r\downarrow 0}\lim_{\varepsilon\downarrow 0} h_r(f,\mu,\varepsilon,\rho) = h_\mu(f).$$
Since $h_r(f,\mu,\varepsilon,\rho)$ is increasing as $r \downarrow 0$ and as $\varepsilon \downarrow 0$, we can write this as
$$h_\mu(f) = \sup_{r>0}\sup_{\varepsilon>0} h_r(f,\mu,\varepsilon,\rho) = \sup_{(r,\varepsilon)\in\mathbb{R}^2_{>0}} h_r(f,\mu,\varepsilon,\rho).$$
In particular, this implies
$$h_\mu(f) = \lim_{\varepsilon\downarrow 0} h_{\varphi_1(\varepsilon)}(f,\mu,\varphi_2(\varepsilon),\rho) \qquad (22)$$
for any functions $\varphi_1, \varphi_2$ with $\lim_{\varepsilon\downarrow 0}\varphi_i(\varepsilon) = 0$, $i = 1,2$.

In the following theorem, we will restrict ourselves to periodic coding and estimation schemes using finite memory. The result of Theorem V.2 is aligned with the celebrated result of Ornstein [27], which states that for two stationary dynamical systems to be metrically isomorphic (i.e., related by a bijective map on a set of measure 1) to each other through a stationary map, their entropies must be identical. It is then plausible to claim that, for (E3) to hold, the entropy of $\hat x_t$, which is upper bounded by the channel capacity, must be at least the metric entropy of $x_t$. Note though that in our result, we do not require the encoder to be a stationary mapping or the process $x_t$ to be stationary; but we do restrict the encoder to be periodic.

V.2 Theorem: Assume that $\pi_0$ is an ergodic measure for $f$. Additionally, assume that there exists $\tau > 0$ so that the coder map $\delta_t$ is of the form
$$q_t = \delta_t(x_{[t-\tau+1,t]})$$
and $\delta_{t+\tau} \equiv \delta_t$. Further assume that the estimator map is of the form
$$\hat x_t = \gamma_t(q_{[t-\tau+1,t]})$$
and also $\gamma_{t+\tau} \equiv \gamma_t$. Then the smallest channel capacity above which (E3) can be achieved for every $\varepsilon > 0$ satisfies
$$C_0 \ge h_{\pi_0}(f).$$

Proof: The proof is subdivided into three steps.

Step 1. Fix $\varepsilon > 0$ and consider a coding and estimation policy that achieves (5) for $\varepsilon/2$. Then we find $T \in \mathbb{N}$ so that
$$E[d(x_t,\hat x_t)^p] \le \varepsilon \quad\text{for all } t \ge T. \qquad (23)$$
For each $t \ge T$ we decompose $X$ into the two sets
$$X^t_1 := \bigl\{z \in X : d(z, \hat x_t(f^{-t}(z)))^p < \sqrt{\varepsilon}\bigr\}, \qquad X^t_2 := X\setminus X^t_1.$$
Now assume to the contrary that for some $t \ge T$,
$$\pi_0(X^t_2) > \sqrt{\varepsilon}. \qquad (24)$$
In contradiction to (23), this implies
$$E[d(x_t,\hat x_t)^p] = \int_X \pi_0(dz)\, d(z, \hat x_t(f^{-t}(z)))^p \ge \pi_0(X^t_2)\sqrt{\varepsilon} > \varepsilon,$$
where we used that $\pi_0$ is $f$-invariant. Hence, we obtain
$$\pi_0(X^t_2) \le \sqrt{\varepsilon}. \qquad (25)$$

Step 2. By Birkhoff's Ergodic Theorem, the measure of a set determines how frequently this set is visited over time. We want to know how frequently the sets $X^t_2$ are visited by the trajectories of $f$; here the periodicity assumption about the coding and estimation policies enters. Using the notation $\alpha_t(z) := d(z, \hat x_t(f^{-t}(z)))^p$, the set $X^t_2$ can be written as $X^t_2 = \{z \in X : \alpha_t(z) \ge \sqrt{\varepsilon}\}$. The $\tau$-periodicity of the coder and estimator maps implies $\hat x_{t+\tau}(x_0) = \hat x_t(f^\tau(x_0))$, hence $\alpha_{t+\tau} \equiv \alpha_t$, and we have the same periodicity for the sets $X^t_2$, i.e., $X^{t+\tau}_2 = X^t_2$. Hence, Birkhoff's Ergodic Theorem applied to $f^\tau$, combined with (25) and Egorov's theorem, implies that for a given $\rho \in (0,1)$ there are $m_0 \in \mathbb{N}$ and a set $\tilde X_{m_0} \subset X$ with $\pi_0(\tilde X_{m_0}) \ge 1-\rho$ such that for every $x \in \tilde X_{m_0}$ and every $n \ge m_0$, at most $2\sqrt{\varepsilon}\,n$ of the indices $0 \le i < n$ satisfy $f^{T+i}(x) \in X^{T+i}_2$.

Step 3. For $n \in \mathbb{N}$, choose a set $F_n \subset \tilde X_{m_0}$ containing exactly one point for each realization of the estimator outputs $(\hat x_T, \ldots, \hat x_{T+n-1})$ that occurs for some initial state in $\tilde X_{m_0}$. The cardinality of $F_n$ is dominated by the largest possible number of coding symbol sequences the estimator has available at time $T+n-1$, i.e., $|F_n| \le |\mathcal{M}|^{T+n}$. If $y \in \tilde X_{m_0}$ produces the same estimator outputs as its representative $x \in F_n$, then for all but at most $2\sqrt{\varepsilon}\,n$ of the indices $0 \le i < n$ we have $f^{T+i}(y) \in X^{T+i}_1$, and the same with $x$ in place of $y$. Hence, the number of $i$'s where both hold is $\ge n(1-4\sqrt{\varepsilon})$, and for such $i$ the triangle inequality implies
$$d(f^{T+i}(y), f^{T+i}(x)) \le 2\sqrt[2p]{\varepsilon},$$
so that $f^T(y) \in B(f^T(x), n, 2\sqrt[2p]{\varepsilon}, 4\sqrt{\varepsilon})$. Since $\pi_0(f^T(\tilde X_{m_0})) = \pi_0(\tilde X_{m_0}) \ge 1-\rho$, we thus obtain
$$\pi_0\Bigl(\bigcup_{x\in F_n} B(f^T(x), n, 2\sqrt[2p]{\varepsilon}, 4\sqrt{\varepsilon})\Bigr) \ge 1-\rho,$$
showing that the set $f^T(F_n)$ is $(n, 2\sqrt[2p]{\varepsilon}, 4\sqrt{\varepsilon}, \rho)$-spanning. Consequently,
$$h_{4\sqrt{\varepsilon}}(f,\pi_0, 2\sqrt[2p]{\varepsilon}, \rho) \le \limsup_{n\to\infty}\frac{1}{n}\log|\mathcal{M}|^{T+n} = C.$$
Since $\varepsilon > 0$ was chosen arbitrarily, by (22) this implies
$$C \ge \lim_{\varepsilon\downarrow 0} h_{4\sqrt{\varepsilon}}(f,\pi_0, 2\sqrt[2p]{\varepsilon}, \rho) = h_{\pi_0}(f),$$
completing the proof.

VI. DISCUSSION

Hierarchy of estimation criteria. Our results together with the results in the paper [21] by Matveev and Pogromsky yield a hierarchy of estimation criteria for a deterministic nonlinear dynamical system
$$x_{t+1} = f(x_t)$$
with a continuous map (or a homeomorphism) $f: X \to X$ on a compact metric space $(X,d)$. These estimation criteria, ordered by increasing strength, are the following.

(1) Asymptotic estimation in expectation (criterion (E3)):
$$\limsup_{t\to\infty}\int_X \pi_0(dx_0)\, d(f^t(x_0), \hat x_t(x_0))^p \le \varepsilon$$
with a Borel probability measure $\pi_0$ on $X$.

(2) Asymptotic almost sure estimation (criterion (E2)):
$$\pi_0\Bigl(\Bigl\{x_0 \in X : \limsup_{t\to\infty} d(f^t(x_0), \hat x_t(x_0)) \le \varepsilon\Bigr\}\Bigr) = 1$$
with a Borel probability measure $\pi_0$ on $X$.

(3) Eventual almost sure estimation (criterion (E1)):
$$\pi_0\bigl(\{x_0 \in X : d(f^t(x_0), \hat x_t(x_0)) \le \varepsilon\}\bigr) = 1 \quad\text{for all } t \ge T(\varepsilon),$$
with a Borel probability measure $\pi_0$ on $X$.

(4) Deterministic estimation (called observability in [21]):
$$d(x_t, \hat x_t) \le \varepsilon \quad\text{for all } t \ge T(\varepsilon).$$

Finally, two even stronger estimation criteria are considered in [21], which do not fit exactly into the above hierarchy, since they are based on an initial error $d(x_0,\hat x_0) \le \delta$, which is known to both coder and estimator, and can be much smaller than the desired accuracy $\varepsilon$. In terms of this maximal initial error $\delta$, the criteria can be formulated as follows:

(5) There are $\delta_* > 0$ and $G \ge 1$ such that for all $\delta \in (0,\delta_*)$,
$$d(x_t, \hat x_t) \le G\delta \quad\text{for all } t \ge 0.$$
This is called regular observability in [21].

(6) There are $\delta_* > 0$, $G \ge 1$ and $g \in (0,1)$ such that for all $\delta \in (0,\delta_*)$,
$$d(x_t, \hat x_t) \le G\delta g^t \quad\text{for all } t \ge 0.$$
This is called fine observability in [21].

Let us denote by $C_0^{(i)}$, $1 \le i \le 6$, the smallest channel capacity for the above criteria. Then, modulo specific assumptions such as invertibility of $f$, ergodicity of $\pi_0$ and finiteness of entropies, our results together with those in [21] essentially yield the following chain of inequalities:
$$\begin{aligned}
h_{\pi_0}(f) &\le C_0^{(1)} \le h_{\pi_0}(f)\\
&\le C_0^{(2)} \le h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)\\
&\le C_0^{(3)} \le h_{\mathrm{top}}(f;\operatorname{supp}\pi_0)\\
&\le h_{\mathrm{top}}(f) \le C_0^{(4)} \le h_{\mathrm{top}}(f)\\
&\le C_0^{(5)}\\
&= C_0^{(6)}.
\end{aligned}$$
For the last three lines, see [21, Thm. 8 and Lem. 14], and note that the second line can be replaced by $C_0^{(2)} \le h_{\pi_0}(f)$ under the assumptions of Theorem IV.2. Recently, in [22] the notion of restoration entropy was proposed as an intrinsic quantity of a system which characterizes $C_0^{(5)} = C_0^{(6)}$. The results in [21], [22] show that this quantity can be strictly larger than $h_{\mathrm{top}}(f)$.

Open questions. Since in most of our results we had to use conditions more specific than $f$-invariance of $\pi_0$, the question remains to which extent those results also hold without these specific assumptions. Particularly strong assumptions have been used in Theorem IV.2 and in Theorem V.2. We have not been able to find examples where these assumptions are violated and the results fail, so this is definitely a topic for future research. Finally, we remark that the assumption of invertibility of $f$ was only used to obtain lower bounds for each of the objectives (E1)-(E3), while we did not use this assumption to estimate $C_0$ from above. Hence, another open question is whether the lower bounds also hold when $f$ is not invertible.

VII. COROLLARIES AND EXAMPLES

For differentiable maps, smooth ergodic theory provides a relation between the metric entropy and the Lyapunov exponents of a map, whose existence follows from Oseledec's theorem (see Theorem VIII.3 in the appendix). In particular, we have the following corollary of our main theorems.

VII.1 Corollary: Assume that $f$ is a $C^1$-diffeomorphism on a smooth compact manifold preserving an ergodic measure $\pi_0$. Then, in the setting of Theorem IV.2 or Theorem V.1,
$$C_0 \le \log\Bigl(1 + \Bigl\lfloor 2^{\int \pi_0(dx)\,\lambda^+(x)}\Bigr\rfloor\Bigr),$$
where $\lambda^+(x)$ is a shortcut for the sum of the positive Lyapunov exponents at $x$. If $f$ is a $C^2$-diffeomorphism and $\pi_0$ an SRB measure (not necessarily ergodic), then, in the setting of Theorem IV.4 or Theorem V.2,
$$C_0 \ge \int \pi_0(dx)\,\lambda^+(x).$$

Proof: The first statement follows from the Margulis-Ruelle inequality (Theorem VIII.4 in the appendix). The second statement follows from the characterization of SRB measures for $C^2$-diffeomorphisms in terms of metric entropy (Theorem VIII.5 in the appendix).
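As a quick worked instance (with a hypothetical value for the exponent sum, not taken from the paper): if an ergodic $\pi_0$ has $\int \pi_0(dx)\,\lambda^+(x) = 1.2$ bits per time step, the first bound evaluates to
$$C_0 \le \log_2\bigl(1 + \lfloor 2^{1.2}\rfloor\bigr) = \log_2(1+2) = \log_2 3 \approx 1.585 \text{ bits},$$
so a three-symbol channel alphabet suffices, while in the SRB case the second bound reads $C_0 \ge 1.2$ bits.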

VII.2 Example: Consider the map
$$f(\theta, x, y) = \bigl(2\theta,\ x/4 + \cos(2\pi\theta)/2,\ y/4 + \sin(2\pi\theta)/2\bigr),$$
regarded as a map on $X := S^1 \times D^2$, where $S^1$ is the unit circle and $D^2$ the unit disk, respectively. It is known that $f$ has an Axiom A attractor $\Lambda$, which is called the solenoid attractor. In this case, there exists a unique SRB measure $\pi_0$ supported on $\Lambda$. It is known that $f$ is topologically transitive on $\Lambda$, implying that $\pi_0$ is ergodic (cf. [32, Sec. 7.7]). The metric entropy $h_{\pi_0}(f)$ is known to be $\log 2$, cf. [4, Ex. 6.3]. Hence, in this case
$$C_0 = \log 2 = 1,$$
i.e., the channel must support a transmission of at least one bit at each time instant in order to make the state estimation objectives (E2) and (E3) achievable.
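A numerical cross-check (ours, not from the paper): the standard QR method approximates the Lyapunov exponents of $f$ along an orbit on the attractor, and by the SRB property and Corollary VII.1 the sum of the positive exponents should reproduce $h_{\pi_0}(f) = \log 2 = 1$ bit.

```python
# A sketch (ours, not from the paper): estimate the Lyapunov exponents of the
# solenoid map by the QR method and compare the sum of the positive exponents
# with h_{pi_0}(f) = log2(2) = 1 bit, as suggested by Corollary VII.1.
import numpy as np

def solenoid(p):
    th, x, y = p
    return np.array([(2.0 * th) % 1.0,
                     x / 4.0 + np.cos(2.0 * np.pi * th) / 2.0,
                     y / 4.0 + np.sin(2.0 * np.pi * th) / 2.0])

def jacobian(p):
    th = p[0]
    return np.array([[2.0, 0.0, 0.0],
                     [-np.pi * np.sin(2.0 * np.pi * th), 0.25, 0.0],
                     [ np.pi * np.cos(2.0 * np.pi * th), 0.0, 0.25]])

rng = np.random.default_rng(0)
p = np.array([rng.random(), 0.1, 0.1])
for _ in range(1000):                  # let the orbit settle onto the attractor
    p = solenoid(p)

Q = np.eye(3)
sums = np.zeros(3)
n = 20000
for _ in range(n):
    Q, R = np.linalg.qr(jacobian(p) @ Q)
    sums += np.log2(np.abs(np.diag(R)))   # base-2 logs, as in the paper
    p = solenoid(p)

print("Lyapunov exponents (bits/step):", np.sort(sums / n)[::-1])
# Expected: approximately [1, -2, -2]; the positive sum is 1 bit, matching
# h_{pi_0}(f) = log 2 and C_0 = 1.
```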

VII.3 Example: We consider the map
$$f(x,y) = (5 - 0.3y - x^2,\ x), \qquad f: \mathbb{R}^2 \to \mathbb{R}^2.$$
The nonwandering set of $f$, i.e., the set of all points $(x,y)$ so that for every neighborhood $U$ of $(x,y)$ there is $n \ge 1$ with $f^n(U) \cap U \ne \emptyset$, can be shown to be uniformly hyperbolic, but it is not an attractor. Hence, there exists no SRB measure on this set. However, we can take $\pi_0$ to be the unique equilibrium state of the potential $\varphi(x) := -\log|\det Df(x)|_{E^u_x}|$ ($E^u_x$ being the unstable subspace at $x$), i.e., the unique probability measure $\pi_0$ satisfying
$$P_{\mathrm{top}}(f;\varphi) = h_{\pi_0}(f) + \int \varphi\, d\pi_0,$$
where $P_{\mathrm{top}}(\cdot)$ is the topological pressure. The restriction of $f$ to its nonwandering set is an Axiom A system. A numerical approximation of the metric entropy $h_{\pi_0}(f)$ is given in [4, Ex. 6.4], namely
$$h_{\pi_0}(f) \approx 0.655.$$

VII.4 Example: Consider a volume-preserving $C^2$-Anosov diffeomorphism $f: \mathbb{T}^n \to \mathbb{T}^n$ of the $n$-dimensional torus. It is known that the volume measure $m$ is ergodic for such $f$. In this case, Pesin's entropy formula takes the form
$$h_m(f) = \int_{\mathbb{T}^n} \log J^u f(x)\, m(dx),$$
where $J^u f$ stands for the unstable determinant of $Df$, i.e., $J^u f(x) = |\det Df(x)|_{E^u_x}|$, with $E^u_x$ being the unstable subspace at $x$. Moreover, $f$ is topologically conjugate to the toral automorphism induced by its linearization, an integer matrix $A$, so that $h_{\mathrm{top}}(f) = \sum_{\lambda\in\sigma(A)} \max\{0, n_\lambda\log|\lambda|\}$. Observe that, in general, $h_m(f) \ne h_{\mathrm{top}}(f)$. In fact, by [11, Cor. 20.4.5], in the case $n = 2$ the equality $h_m(f) = h_{\mathrm{top}}(f)$ implies that $f$ is $C^1$-conjugate to a linear automorphism $g: \mathbb{T}^2 \to \mathbb{T}^2$. Since $C^1$-conjugacy implies that the eigenvalues of the corresponding linearizations along any periodic orbits coincide, the existence of a $C^1$-conjugacy is a very strict condition. Hence, most area-preserving Anosov diffeomorphisms on $\mathbb{T}^2$ satisfy the strict inequality $h_m(f) < h_{\mathrm{top}}(f)$.

VII.5 Example: An example of an area-preserving Anosov diffeomorphism on $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$ is Arnold's Cat Map, which is the group automorphism induced by the linear map on $\mathbb{R}^2$ associated with the integer matrix
$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.$$
The fact that this matrix has determinant 1 implies that the induced map $f_A: \mathbb{T}^2 \to \mathbb{T}^2$, $x + \mathbb{Z}^2 \mapsto Ax + \mathbb{Z}^2$, is area-preserving and invertible. The Anosov property follows from the fact that $A$ has two distinct real eigenvalues $\lambda_1$ and $\lambda_2$ with $|\lambda_1| > 1$ and $|\lambda_2| < 1$, namely
$$\lambda_{1,2} = \frac{3 \pm \sqrt{5}}{2}.$$
At each point in $\mathbb{T}^2$, the Lyapunov exponents of $f_A$ exist and their values are $\log|\lambda_1|$ and $\log|\lambda_2|$. Pesin's entropy formula (see Theorem VIII.4 in the appendix) thus implies
$$h_m(f_A) = \log|\lambda_1| = \log\frac{3+\sqrt{5}}{2}.$$
The formula for the topological entropy given in the preceding example shows that $h_{\mathrm{top}}(f_A) = h_m(f_A)$.
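The entropy value is immediate to verify numerically (our sketch, not from the paper):

```python
# Eigenvalues of the cat-map matrix and the entropy h_m(f_A) = log2|lambda_1|.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
lam = np.linalg.eigvals(A)             # (3 +/- sqrt(5))/2
h = np.log2(max(abs(lam)))             # ~1.3885 bits per step
print("eigenvalues:", np.sort(lam), " h_m(f_A) =", h, "bits")
```

In particular, the lower bound $C_0 \ge h_m(f_A) \approx 1.39$ bits shows that a binary channel cannot achieve (E3) for $f_A$, while $\log_2(1 + \lfloor 2^{1.3885}\rfloor) = \log_2 3$ suffices for the upper bounds.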

Finally, we present an example taken from [38] in which the assumptions of Theorem IV.2 are satisfied.

VII.6 Example: Let $f: S^1 \to S^1$ be an expanding circle map. (Observe that we did not use invertibility of $f$ in Theorem IV.2.) Write $S^1 = [0,1]/\!\sim$, where the equivalence relation $\sim$ identifies the extremal points of the interval. Assume that there exists an open interval $I \subset (0,1)$ so that $f|_I$ is injective and $f(I) = (0,1)$. Then the set
$$\Lambda := \bigcap_{n\ge 0} f^{-n}(S^1\setminus I)$$
is an $f$-invariant Cantor set and $f|_\Lambda$ is expanding. By [38, Ex. 5.2], we find ergodic measures $\mu$ of $f|_\Lambda$ and partitions $\mathcal{P}$ of $\Lambda$ (of arbitrarily small diameter) such that there exists $\delta_* > 0$ for which the exponential convergence condition (9) holds for all $\delta \in (0,\delta_*)$. For those measures, Theorem IV.2 yields
$$C_0 \le \log\bigl(1 + \lfloor 2^{h_\mu(f|_\Lambda)}\rfloor\bigr),$$
where $C_0$ is the smallest channel capacity above which (E2) can be achieved for every $\varepsilon > 0$.

VIII. APPENDIX

In this appendix, we review some classical results of ergodic theory that are used in the paper. We start with the Shannon-McMillan-Breiman theorem, cf. [6].

VIII.1 Theorem: Let $(X,\mathcal{F},\mu,f)$ be an ergodic measure-preserving dynamical system, i.e., $(X,\mathcal{F},\mu)$ is a probability space and $\mu$ is ergodic w.r.t. the map $f: X \to X$. Let $\mathcal{P}$ be a finite measurable partition of $X$. Then, for almost every $x \in X$, as $n \to \infty$,
$$-\frac{1}{n}\log\mu(\mathcal{P}^n(x)) \to h_\mu(f;\mathcal{P}),$$
where $\mathcal{P}^n(x)$ is the element of $\mathcal{P}^n$ containing $x$.

If $\mathcal{P} = \{P_1,\ldots,P_r\}$, for $n \in \mathbb{N}$ and $x \in X$, let $i^n(x) = (i^n_0(x),\ldots,i^n_{n-1}(x))$ be the element of $\{1,\ldots,r\}^n$ with $f^j(x) \in P_{i^n_j(x)}$ for $0 \le j < n$. We call $i^n(x)$ the length-$n$ itinerary of $x$ w.r.t. $\mathcal{P}$. Putting $h := h_\mu(f;\mathcal{P})$, we also define for each $\varepsilon > 0$ and $n \in \mathbb{N}$ the set
$$\Sigma_{n,\varepsilon} := \bigl\{a \in \{1,\ldots,r\}^n : 2^{-n(h+\varepsilon)} \le \mu(\{x : i^n(x) = a\}) \le 2^{-n(h-\varepsilon)}\bigr\}.$$

As an immediate corollary of the Shannon-McMillan-Breiman theorem we obtain the following.

VIII.2 Corollary: Under the assumptions of Theorem VIII.1, it holds that
$$\mu\bigl(\{x \in X : \exists m \in \mathbb{N} \text{ with } i^n(x) \in \Sigma_{n,\varepsilon}\ \forall n \ge m\}\bigr) = 1 \qquad (29)$$
and
$$|\Sigma_{n,\varepsilon}| \le 2^{n(h+\varepsilon)}. \qquad (30)$$

Proof: The identity (29) follows from the observation that $\mu(\mathcal{P}^n(x)) = \mu(\{y \in X : i^n(y) = i^n(x)\})$. From the estimates
$$1 \ge \mu(\{x : i^n(x) \in \Sigma_{n,\varepsilon}\}) = \sum_{a\in\Sigma_{n,\varepsilon}} \mu(\{x : i^n(x) = a\}) \ge |\Sigma_{n,\varepsilon}| \cdot 2^{-n(h+\varepsilon)},$$
the inequality (30) immediately follows.

In the ergodic theory of smooth dynamical systems, the most fundamental result is Oseledec's theorem, also known as the Multiplicative Ergodic Theorem.

VIII.3 Theorem: Let $(\Omega,\mathcal{F},\mu)$ be a probability space and $\theta: \Omega \to \Omega$ a measurable map preserving $\mu$. Let $T: \Omega \to \mathbb{R}^{d\times d}$ be a measurable map such that $\log^+\|T(\cdot)\| \in L^1(\Omega,\mu)$ and write
$$T^n_x := T(\theta^{n-1}(x)) \cdots T(\theta(x))\,T(x).$$
Then there is $\tilde\Omega \subset \Omega$ with $\mu(\tilde\Omega) = 1$ so that for all $x \in \tilde\Omega$ the following holds:
$$\lim_{n\to\infty}\bigl[(T^n_x)^*(T^n_x)\bigr]^{1/(2n)} =: \Lambda_x$$
exists and, moreover, if $\exp\lambda^{(1)}_x < \cdots < \exp\lambda^{(s(x))}_x$ denote the eigenvalues of $\Lambda_x$ and $U^{(1)}_x, \ldots, U^{(s(x))}_x$ the associated eigenspaces, then
$$\lim_{n\to\infty}\frac{1}{n}\log\|T^n_x v\| = \lambda^{(r)}_x \quad\text{if } v \in V^{(r)}_x\setminus V^{(r-1)}_x,$$
where $V^{(r)}_x = U^{(1)}_x + \cdots + U^{(r)}_x$ and $r = 1,\ldots,s(x)$.

The numbers $\lambda^{(i)}_x$ are called ($\mu$-)Lyapunov exponents. We apply the theorem to the situation when $\theta$ is a diffeomorphism on a compact Riemannian manifold and $T(x)$ is the derivative of $\theta$ at $x$. In this situation, the Lyapunov exponents measure the rates at which nearby orbits diverge (or converge). Despite the fact that this is a purely geometric concept, the Lyapunov exponents are closely related to the metric entropy. Fundamental results here are the Margulis-Ruelle inequality [33] and Pesin's entropy formula [29], which we summarize in the following theorem.

VIII.4 Theorem: Let $M$ be a compact Riemannian manifold and $f: M \to M$ a $C^1$-diffeomorphism preserving a Borel probability measure $\mu$. Then
$$h_\mu(f) \le \int \mu(dx)\sum_i \max\bigl\{0,\ \dim U^{(i)}_x \cdot \lambda^{(i)}_x\bigr\}$$
with summation over all Lyapunov exponents at $x$. If $f$ is a $C^2$-diffeomorphism and $\mu$ is absolutely continuous w.r.t. Riemannian volume, then equality holds. If $\mu$ is ergodic, then the Lyapunov exponents are constant almost everywhere, and hence the integration can be omitted.

If a diffeomorphism does not preserve a measure absolutely continuous with respect to volume, it might still preserve a measure that is well-behaved with respect to volume in the sense that the measure determines the behavior of all trajectories with initial values in a set of positive volume. This is made precise in the definition of an SRB measure (Sinai-Ruelle-Bowen measure). We will not give the technical definition here, but just mention the following result of Ledrappier and Young [15], which can in fact be regarded as one possible definition of an SRB measure.

VIII.5 Theorem: Let $f: M \to M$ be a $C^2$-diffeomorphism of a compact Riemannian manifold, preserving a Borel probability measure $\mu$. Then Pesin's entropy formula for $h_\mu(f)$ holds if and only if $\mu$ is an SRB measure.

REFERENCES

[1] H. Asnani, T. Weissman. Real-time coding with limited lookahead. IEEE Trans. Inform. Theory 59 (2013), no. 6, 3582–3606.

[2] V. S. Borkar, S. K. Mitter, and S. Tatikonda. Optimal sequential vector quantization of Markov sources. SIAM J. Control Optim. 40 (2001), no. 1, 135–148.

[3] T. Downarowicz. Entropy in dynamical systems. Cambridge University Press, 2011.

[4] G. Froyland. Using Ulam's method to calculate entropy and other dynamical invariants. Nonlinearity 12 (1999), no. 1, 79–101.

[5] R. M. Gray. Probability, Random Processes, and Ergodic Properties. 2nd edition, Springer, Dordrecht, 2009.

[6] R. M. Gray. Entropy and Information Theory. 2nd edition, Springer, New York, 2011.

[7] R. M. Gray, D. L. Neuhoff. Quantization. Information theory: 1948–1998. IEEE Trans. Inform. Theory 44 (1998), no. 6, 2325–2383.

[8] T. Javidi, A. Goldsmith. Dynamic joint source-channel coding with feedback. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pp. 16–20. IEEE, 2013.

[9] A. Katok. Lyapunov exponents, entropy and periodic orbits for diffeomorphisms. Inst. Hautes Études Sci. Publ. Math. No. 51 (1980), 137–173.

[10] A. Katok. Fifty years of entropy in dynamics: 1958–2007. J. Mod. Dyn. 1 (2007), no. 4, 545–596.

[11] A. Katok, B. Hasselblatt. Introduction to the Modern Theory of Dynamical Systems. Cambridge University Press, Cambridge, 1995.

[12] C. Kawan. Exponential state estimation, entropy and Lyapunov exponents. To be published in: Systems & Control Lett., arXiv:1605.03210, 2016.

[13] C. Kawan, S. Yüksel. Entropy bounds on state estimation for stochastic non-linear systems under information constraints. Submitted, 2016. arXiv:1612.00564

[14] C. Kawan, S. Yüksel. Metric and topological entropy bounds on state estimation for stochastic non-linear systems. Proc. International Symposium on Information Theory, 2017.

[15] F. Ledrappier, L.-S. Young. The metric entropy of diffeomorphisms. I. Characterization of measures satisfying Pesin's entropy formula. Ann. of Math. (2) 122 (1985), no. 3, 509–539.

[16] D. Liberzon, S. Mitra. Entropy and minimal data rates for state estimation and model detection. In Proceedings of the 19th International Conference on Hybrid Systems: Computation and Control, pp. 247–256. ACM, 2016.

[17] T. Linder, S. Yüksel. On optimal zero-delay quantization of vector Markov sources. IEEE Trans. Inform. Theory 60 (2014), no. 10, 5975–5991.

[18] T. Linder, R. Zamir. Causal coding of stationary sources and individual sequences with high resolution. IEEE Trans. Inform. Theory 52 (2006), no. 2, 662–680.

[19] A. Mahajan, D. Teneketzis. Optimal design of sequential real-time communication systems. IEEE Trans. Inform. Theory 55 (2009), no. 11, 5317–5338.

[20] A. S. Matveev. State estimation via limited capacity noisy communication channels. Math. Control Signals Systems 20 (2008), no. 1, 1–35.

[21] A. Matveev, A. Pogromsky. Observation of nonlinear systems via finite capacity channels: constructive data rate limits. Automatica J. IFAC 70 (2016), 217–229.

[22] A. Matveev, A. Pogromsky. Two Lyapunov methods in nonlinear state estimation via finite capacity communication channels. IFAC-PapersOnLine 50 (2017), no. 1, 4132–4137.

[23] A. S. Matveev, A. V. Savkin. Estimation and control over communication networks. Control Engineering. Birkhäuser Boston, Inc., Boston, MA, 2009.

[24] M. Misiurewicz. Topological entropy and metric entropy. Ergodic theory (Sem., Les Plans-sur-Bex, 1980) (French), pp. 61–66, Monograph. Enseign. Math., 29, Univ. Genève, Geneva, 1981.

[25] G. N. Nair and R. J. Evans. Stabilizability of stochastic linear systems with finite feedback data rates. SIAM J. Control Optim. 43 (2004), 413–436.

[26] S. E. Newhouse. Continuity properties of entropy. Ann. of Math. (2) 129 (1989), no. 2, 215–235.

[27] D. Ornstein. Bernoulli shifts with the same entropy are isomorphic. Advances in Math. 4 (1970), 337–352.

[28] D. Ornstein. An application of ergodic theory to probability theory. Ann. Probability 1 (1973), no. 1, 43–65.

[29] Ja. B. Pesin. Characteristic Ljapunov exponents, and smooth ergodic theory. (Russian) Uspehi Mat. Nauk 32 (1977), no. 4 (196), 55–112.

[30] A. Pogromsky, A. Matveev. A topological entropy approach for observation via channels with limited data rate. IFAC Proceedings Volumes 44 (2011), no. 1, 14416–14421.

[31] Y. Ren, L. He, J. Lü, G. Zheng. Topological r-entropy and measure-theoretic r-entropy of a continuous map. Sci. China Math. 54 (2011), no. 6, 1197–1205.

[32] C. Robinson. Dynamical Systems. Stability, Symbolic Dynamics, and Chaos. Studies in Advanced Mathematics. CRC Press, Boca Raton, FL, 1995.

[33] D. Ruelle. An inequality for the entropy of differentiable maps. Bol. Soc. Brasil. Mat. 9 (1978), no. 1, 83–87.

[34] A. V. Savkin. Analysis and synthesis of networked control systems: Topological entropy, observability, robustness and optimal control. Automatica J. IFAC 42 (2006), no. 1, 51–62.

[35] P. C. Shields. The interactions between ergodic theory and information theory. Information theory: 1948–1998. IEEE Trans. Inform. Theory 44 (1998), no. 6, 2079–2093.

[36] S. Tatikonda. Control under Communication Constraints. PhD dissertation, Massachusetts Institute of Technology, Cambridge, MA, 2000.

[37] D. Teneketzis. On the structure of optimal real-time encoders and decoders in noisy communication. IEEE Trans. Inform. Theory 52 (2006), no. 9, 4017–4035.

[38] P. Varandas, Y. Zhao. Weak Gibbs measures: speed of convergence to entropy, topological and geometrical aspects. Ergodic Theory Dynam. Systems 37 (2017), no. 7, 2313–2336.

[39] J. C. Walrand, P. Varaiya. Optimal causal coding-decoding problems. IEEE Trans. Inform. Theory 29 (1983), no. 6, 814–820.

[40] H. S. Witsenhausen. On the structure of real-time source coders. Bell Syst. Tech. J. 58 (1979), 1437–1451.

[41] R. G. Wood, T. Linder, and S. Yüksel. Optimal zero delay coding of Markov sources: stationary and finite memory codes. IEEE Trans. Inform. Theory 63 (2017), no. 9, 5968–5980.

[42] W. S. Wong and R. W. Brockett. Systems with finite communication bandwidth constraints - part II: Stabilization with limited information feedback. IEEE Trans. on Automatic Control 42 (1997), 1294–1299.

[43] S. Yüksel. On optimal causal coding of partially observed Markov sources in single and multi-terminal settings. IEEE Trans. Inform. Theory 59 (2013), no. 1, 424–437.
