

Chapter III: Learning Recurrent Patterns in a Jump Process

3.3 Controller Architecture for MJSs Based on Pattern-Learning

The controller architecture we propose is visualized in Figure 3.2. It consists of three main parts: 1) Mode Process Identification (ID), 2) Pattern-Learning for Prediction (PLP) on the mode process, and 3) Control Law Design for the system dynamics. In this section, we begin with a brief description of each part, including an introduction of the main notations used, to provide a coherent view of the architecture (Figure 3.2) as a whole. The details of each individual part and the choice of algorithms used to implement them are discussed in the next sections: Mode Process ID in Section 3.3.1, PLP in Section 3.3.2, and Control Law Design in Section 3.3.3. We emphasize that our choice of algorithm to implement each component is specific to the uncertain linear discrete-time MJS setup described in Section 3.2 and that alternative implementations can be made for other dynamics; this will be demonstrated in the applications described in the subsequent chapters.
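The data flow among the three components can be sketched as a single step of a control loop. The component interfaces below (`estimator`, `plp`, `controller` and their method names) are hypothetical placeholders for illustration, not part of the architecture's formal specification:

```python
def control_loop_step(t, x_t, estimator, plp, controller):
    """One step of the architecture in Figure 3.2. The three calls mirror
    the three components: Mode Process ID (Section 3.3.1) refines the
    estimates (P_hat, phi_hat); PLP (Section 3.3.2) converts them into
    pattern-occurrence statistics; Control Law Design (Section 3.3.3)
    maps those statistics and the current state to a control input."""
    P_hat, phi_hat = estimator.update(t, x_t)
    tau_hat, q_hat, patterns = plp.update(P_hat, phi_hat)
    return controller.input(t, x_t, tau_hat, q_hat, patterns)
```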

[Figure 3.2 diagram: blocks for Mode Process ID, Pattern-Learning for Prediction (PLP), and Control Law Design, connected to the cost function, constraints, plant dynamics, mode process, and pattern-occurrence formulas.]

Figure 3.2: A flow diagram representation of the proposed controller architecture specifically for linear MJS dynamics of the form (3.34). Circles represent inputs to the algorithm; user-defined inputs are colored blue and unknown/unobservable parameters are colored gray. The architecture consists of three main parts (violet boxes): 1) Mode Process ID (Section 3.3.1), 2) Pattern-Learning for Prediction (Section 3.3.2), and 3) Control Law Design (Section 3.3.3).

3.3.1 Mode Process Identification

The first part of our architecture, Mode Process Identification (ID), is responsible for learning the unknown statistics of the mode process. For each time t ∈ ℕ, we write n ≜ N[t] for the corresponding mode-index.

Definition 27 (Mode Process Estimates). For each time t ∈ ℕ and corresponding mode-index n ≜ N[t], the system maintains the following estimated statistics about the mode process {ξn} and system dynamics (3.34): an estimate P̂(t) of the true TPM P, and an estimate ϕ̂n(t) of the current mode ϕn. We use hats and superscripts (t) to emphasize that these quantities are estimates which change over time.
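As one concrete (and standard) way to realize the TPM estimate P̂(t), the identifier can maintain empirical transition counts over the estimated mode sequence. This is a minimal sketch, not the thesis's prescribed estimator; the Laplace-style pseudo-count `prior` is an illustrative smoothing choice so that every row remains a valid distribution:

```python
import numpy as np

def estimate_tpm(modes, num_modes, prior=1.0):
    """Estimate the transition probability matrix P-hat from an observed
    (or estimated) mode sequence via smoothed transition counts."""
    counts = np.full((num_modes, num_modes), prior, dtype=float)
    for a, b in zip(modes[:-1], modes[1:]):
        counts[a, b] += 1.0
    # normalize each row into a probability distribution over next modes
    return counts / counts.sum(axis=1, keepdims=True)

P_hat = estimate_tpm([0, 1, 1, 2, 0, 1, 2, 0], num_modes=3)
```

Refreshing this estimate at every time t yields the time-varying P̂(t) of Definition 27.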

3.3.2 Pattern-Learning in the Mode Process

Once P̂(t) and ϕ̂n(t) are obtained from Mode Process ID (Section 3.3.1) for each t ∈ ℕ and n ≜ N[t], Pattern-Learning for Prediction (PLP) in Figure 3.2 computes additional statistics about the mode process that facilitate the creation of predictions, which will be used in the Control Law Design component. These additional statistics are precisely the pattern-occurrence quantities described in Section 3.1, which can be solved using the method of Section 3.1.2 for Markov chains. In the context of the uncertain linear MJS (3.34), however, we require some additional adjustments, detailed below.

Definition 28 (Prediction Horizon). Define the constant L ∈ ℕ to be the prediction horizon on the mode process, i.e., the length of the sequences of modes.

In the setup of the Markovian jump system from Section 3.2, "patterns" refer to finite-length sequences of modes in the mode process underlying the system (3.34), formalized in Definition 13. Because we consider a fixed future horizon of length L, however, the length of each pattern ψk is dk ≡ L for each k ∈ {1, …, K}.

Definition 29 (Pattern-Occurrence Times for Uncertain MJS). Denote n ≜ N[t] ∈ ℕ to be the current mode-index at current time t ∈ ℕ, and suppose the estimated current mode is ξn = ϕ̂n(t). Then for each of the patterns in the collection Ψ from Definition 13, define the following stopping times for each k ∈ {1, …, K}:

τ̂k|n(t) ≜ min{ i ∈ ℕ | ξn = ϕ̂n(t), ξn+i−L+1:n+i = ψk }. (3.35)

Definition 30 (Time and Probability of First Occurrence for Uncertain MJS). Under the setup of Definition 29 and given the collection Ψ, the 1) minimum time of occurrence of any pattern in Ψ and 2) the first-occurrence probabilities (i.e., the probability that each pattern ψk will be the first observed) are given by:

τ̂n(t) ≜ min_{k∈{1,…,K}} τ̂k|n(t),   q̂k(t) ≜ P( τ̂n(t) = τ̂k|n(t) ). (3.36)

This means ξn+τ̂n(t)−L+1:n+τ̂n(t) = ψk if pattern ψk will be the first observed.
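When the closed-form route of Section 3.1.2 is inconvenient, the quantities of Definitions 29 and 30 can also be approximated by simulating the mode process under P̂(t). The following Monte Carlo sketch is ours (function and parameter names, sample sizes, and the truncation `horizon` are illustrative assumptions):

```python
import numpy as np

def pattern_occurrence_mc(P_hat, phi_hat, patterns, L,
                          trials=2000, horizon=200, seed=0):
    """Monte Carlo estimate of E[tau_hat] and the first-occurrence
    probabilities q_hat_k: simulate the chain from the estimated current
    mode phi_hat and record which length-L pattern's window
    xi_{n+i-L+1 : n+i} matches first."""
    rng = np.random.default_rng(seed)
    keyed = [tuple(p) for p in patterns]
    first = np.zeros(len(patterns))
    taus = []
    for _ in range(trials):
        seq = [phi_hat]
        for i in range(1, horizon + 1):
            seq.append(rng.choice(len(P_hat), p=P_hat[seq[-1]]))
            if i >= L - 1:  # a full length-L window exists at offset i
                window = tuple(seq[i - L + 1: i + 1])
                if window in keyed:
                    first[keyed.index(window)] += 1
                    taus.append(i)
                    break
    return float(np.mean(taus)), first / first.sum()
```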

To generate predictions from the mode process, we are interested in characterizing the pattern-occurrence quantities defined by Problem 1. In the context of the MJS (3.34), these are the following. First, we want the estimate E[τ̂n(t)] of the mean minimum occurrence time, which counts the number of mode-indices until the occurrence of any pattern from Ψ, given the estimated current mode ϕ̂n(t). Second, we want the estimated first-occurrence probabilities {q̂k(t)}, where q̂k(t) ∈ [0, 1] is the probability that pattern ψk ∈ Ψ is the first to be observed among all of Ψ. Because the statistics of the mode process are estimates instead of true values, all the quantities from Section 3.1 are expressed with hats and superscript (t)'s. Moreover, it becomes necessary to consider a pattern collection Ψ (from Definition 13) which varies with time.

Definition 31 (Time-Varying Collection). Let L ∈ ℕ be the prediction horizon from Definition 28. We construct the collection of patterns Ψ[t], with time-varying cardinality K[t] ∈ ℕ, to be a subset of feasible length-L future sequences of modes given the estimated current mode ϕ̂n(t):

Ψ[t] ≜ {ψ1(t), …, ψK[t](t)} ⊆ { feasible (α1, …, αL) | P̂(t)[ϕ̂n(t), α1] > 0, αi ∈ 𝒳 }. (3.37)

That is, each pattern ψk ∈ Ψ[t] is feasible with respect to P̂(t) in the sense of Definition 20.
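A small sketch of how the feasible set on the right-hand side of (3.37) could be enumerated, assuming (as in Definition 20) that feasibility means every transition along the sequence has positive probability under P̂(t). Exhaustive enumeration is only for illustration; Ψ[t] may be any subset of this set:

```python
import itertools
import numpy as np

def feasible_patterns(P_hat, phi_hat, L, tol=0.0):
    """Enumerate length-L mode sequences (a1, ..., aL) whose every
    transition, starting from the estimated current mode phi_hat,
    has probability exceeding tol under P_hat."""
    X = range(len(P_hat))  # the mode alphabet
    out = []
    for seq in itertools.product(X, repeat=L):
        prev, ok = phi_hat, True
        for a in seq:
            if P_hat[prev, a] <= tol:
                ok = False
                break
            prev = a
        if ok:
            out.append(seq)
    return out
```

Note that the number of candidates grows as |𝒳|^L, which is one reason Ψ[t] is defined as a subset rather than the full feasible set.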

A key difference between here and Problem 1 is that we keep the hat and superscript (t) on the τ and qk quantities to emphasize that they depend on Ψ[t] and on P̂(t), ϕ̂n(t) from Section 3.3.1, which change over time. Thus, we can obtain these formulas by extending the results derived for the Markov chain case of the previous Section 3.1 to the uncertain statistics P̂(t) and ϕ̂n(t).

3.3.3 Control Law Design from Memory and Prediction

Let g : ℝ+ × 𝒳 × ℝ^nx → ℝ^nu be a generic function representing the mode-dependent state-feedback control law designed by the Control Law Design component in Figure 3.2. The Control Law Design component uses the expected occurrence time E[τ̂n(t)] and probabilities {q̂k(t)}k=1…K[t] computed from PLP (Section 3.3.2) to store the control policies of previously-occurred patterns and to schedule control policies in advance. This procedure is described more carefully in the following two propositions.

Proposition 1 (Scheduling Future Control Inputs). Suppose we are given the estimated pattern-occurrence quantities E[τ̂n(t)] and {q̂k(t)}k from PLP. Let τ ≡ E[τ̂n(t)] be shorthand notation (with a temporary abuse of notation) for the estimated expected minimum occurrence time for the specific pattern collection Ψ[t] given the estimated current mode ϕ̂n(t). To schedule a control law in advance, we simply choose the pattern ψk(t) ∈ Ψ[t] corresponding to the largest occurrence probability q̂k(t). Then, until mode-index τ, the future sequence of


Figure 3.3: A visualization of PLP with the pattern collection of three patterns described in Example 2. Note that L = 3. The middle row of circles shows the evolution of ϕ̂n(t) over time, while the bottom row of boxes shows the process on the mode timescale (see Assumption 7). The red circle and box indicate the estimated mode at current time t ∈ ℕ. The blue circles indicate the expected pattern which is first to occur; in this example, (3, 1, 4) has the highest first-occurrence probability among any pattern in the collection Ψ[t]. Control input sequences are then scheduled according to Proposition 1, shown in the green box.

control inputs u[t : Tn+⌊τ⌋+1 − 1] is

u[s] = g(s, ψk,1(t), x[s]),   s ∈ [t : Tn+1 − 1],
   ⋮                                              (3.38)
u[s] = g(s, ψk,L(t), x[s]),   s ∈ [Tn+⌊τ⌋ : Tn+⌊τ⌋+1 − 1].

Aside from operating on a longer timescale (the mode process instead of the system dynamics), Proposition 1 is similar in principle to standard model predictive control (MPC): only the first control law in the sequence (3.38), corresponding to the first mode ψk,1(t), is applied at the next mode-index n + ⌊τ⌋.
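The selection-and-scheduling step of Proposition 1 can be sketched as follows. The names are ours, and the uniform assignment of the ⌊τ⌋ + 1 upcoming mode segments to the L pattern entries (clipping at the last entry when ⌊τ⌋ + 1 exceeds L) is our simplification of (3.38):

```python
import math

def schedule_pattern(q_hat, patterns, tau):
    """Pick psi_k* with the largest estimated first-occurrence probability
    q_hat_k, then assign one pattern entry per upcoming mode segment
    j = 0, ..., floor(tau); segment j covers system times
    [T_{n+j} : T_{n+j+1} - 1] as in (3.38). MPC-style, only the first
    scheduled law would actually be applied before re-planning."""
    k_star = max(range(len(q_hat)), key=lambda k: q_hat[k])
    psi = patterns[k_star]
    return [(j, psi[min(j, len(psi) - 1)]) for j in range(math.floor(tau) + 1)]
```

Each returned pair (j, mode) would then be realized as the control law u[s] = g(s, mode, x[s]) over the j-th segment.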

Proposition 2 (Storing Past Control Inputs in Memory). Define U to be a table which maps mode patterns ψk(t) to control policies {g(t, ψk,1(t), ·), …, g(t, ψk,L(t), ·)} and the accumulated state and control trajectories over each occurrence time. When ψk(t) ∈ Ψ[t] is first observed, a new entry U[ψk(t)](t), defined by (3.38) for the specific ψk(t), is created. For anticipated future occurrences of ψk(t), the system predicts future control inputs using the control gain U[ψk(t)](t) in the form of (3.38). The table entry for ψk(t) is then updated at every occurrence time after its first.
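The table U of Proposition 2 can be realized as a keyed store. The class and method names below, and the overwrite-and-append update rule, are illustrative choices rather than the prescribed implementation:

```python
class PatternMemory:
    """Sketch of the table U: maps a mode pattern to its control policies
    and the accumulated state/control trajectories over its occurrences."""

    def __init__(self):
        self._table = {}

    def observe(self, pattern, policies, trajectory):
        """Create an entry on a pattern's first occurrence; on later
        occurrences, update the stored policies and append the trajectory."""
        key = tuple(pattern)
        if key not in self._table:
            self._table[key] = {"policies": list(policies), "history": []}
        else:
            self._table[key]["policies"] = list(policies)
        self._table[key]["history"].append(trajectory)

    def lookup(self, pattern):
        """Return stored policies for an anticipated occurrence, or None
        if the pattern has never been observed."""
        entry = self._table.get(tuple(pattern))
        return entry["policies"] if entry else None
```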