the level. This might seem to be the connection between the job and the substance, as in the common perception of society found in movies, TV, and novels.
III Data analysis using Dynamic Mode Decomposition
When we observe real-world phenomena, measurements of those phenomena are abundant. Using this huge amount of data, we look for functions or governing equations that describe how the system evolves in time. A dynamical system is a mathematical framework for describing phenomena with such functions or equations. Generally, dynamical systems are used to analyze real-world phenomena for purposes such as prediction of the future state, state estimation, feedback control, and so on. A dynamical system is typically described in the following form
dx/dt = f(x, t, µ)    (3.1)
where x(t) ∈ R^n is the state vector at time t, µ denotes the parameters of the system, and f is a vector field that describes the dynamics of the phenomena. The form (3.1) is continuous in time. We also have a discrete-time version of a dynamical system
x_{k+1} = F(x_k, t)    (3.2)
The form (3.2) is also known as a map or a flow.
In many real-world problems, such as finance, epidemiology, and neuroscience, we have abundant data while the dynamics f or the flow F remain elusive. In other words, we face the problem that the dynamics of the dynamical system may not be known, so we focus on the data to derive the dynamics of the given system. Even for classical problems like fluids and turbulence, we focus on the abundant data of the phenomena because first principles that fully describe them are lacking, and many techniques are therefore data-driven.
To identify unknown dynamics from abundant data, regression is the most common technique. The method introduced in this subsection is a dimension-reduction method related to regression, called Dynamic Mode Decomposition (DMD). DMD was first introduced by Schmid [95, 96] in applied fluid dynamics. DMD identifies spatial patterns together with the frequencies and growth, decay, or oscillation rates that characterize the behavior of the given system. The approach of Schmid [95, 96] is closely related to Proper Orthogonal Decomposition (POD), a complexity-reduction technique in fluid dynamics. In [96], DMD algorithms were first introduced and developed via the Singular Value Decomposition (SVD). Later, Tu et al. [109] developed the exact DMD. DMD also has a connection to the spectral analysis of the Koopman operator [92]. We will introduce each type of DMD algorithm and propose our algorithm for analyzing the spatio-temporal crime data.
Generally, these quantities are set in complex values, but that is beyond our scope, so we assume the data is real-valued since it records occurrences of crime. In the typical applications of Dynamic Mode Decomposition, such as fluid dynamics or non-linear dynamics, the number of data points is much greater than the number of snapshots, i.e., n ≫ m. The key idea of the DMD algorithm is that there exists a linear operator A that connects the data
x_{k+1} ≈ A x_k    (3.3)
Here the matrix A is the matrix of a discrete-time system that locally approximates the linear dynamics of the dynamical system of the form (3.1) from which the data snapshots are collected, where the dynamics f may or may not be known. The matrix A is often called a system matrix. It can be obtained from the following least-squares minimization problem [57, 109]
min_{A ∈ R^{n×n}} Σ_{j=0}^{m-1} ‖x_{j+1} − A x_j‖_2^2    (3.4)
where ‖·‖_2 is the Euclidean norm. In matrix form, we define the data matrices X_1 and X_2 by
X_1 = [x_0  x_1  ···  x_{m-1}],    X_2 = [x_1  x_2  ···  x_m]
By convention we denote these matrices X_1 and X_2; sometimes the more precise notation

V_0^{m-1} = [x_0  x_1  ···  x_{m-1}]

is used, but we simply use the notation X_1 and X_2. Recall that (3.3) may be written in the matrix form
X_2 ≈ A X_1    (3.5)
and we have the minimization problem equivalent to (3.4)
min_{A ∈ R^{n×n}} ‖X_2 − A X_1‖_F    (3.6)
where ‖·‖_F is the Frobenius norm of a matrix, defined by ‖A‖_F = (Σ_i Σ_j |a_{ij}|^2)^{1/2}. The best fit for the problem (3.6) is
A = X_2 X_1^†    (3.7)
where † denotes the Moore–Penrose pseudoinverse. Note that (3.7) is a linear regression of the data onto the dynamics given by A.
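As a minimal NumPy sketch of (3.7) (the map A_true and the snapshot count are illustrative assumptions, not part of the text): snapshots from a known linear system are stacked into X_1 and X_2, and the pseudoinverse fit recovers the map.

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, -0.1],
                   [0.0,  0.8]])      # hypothetical ground-truth dynamics

m = 20
x = rng.standard_normal(2)
snapshots = [x]
for _ in range(m):
    x = A_true @ x
    snapshots.append(x)

X1 = np.column_stack(snapshots[:-1])  # x_0 .. x_{m-1}
X2 = np.column_stack(snapshots[1:])   # x_1 .. x_m

A_fit = X2 @ np.linalg.pinv(X1)       # eq. (3.7): A = X2 X1^+

assert np.allclose(A_fit, A_true)
```

Because the snapshots here evolve exactly linearly, the least-squares fit reproduces the generating matrix up to numerical error.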
At this point, the idea is a simple regression on the data. What distinguishes DMD from plain regression is dimension reduction. As mentioned before, we consider n ≫ m in applications of DMD. This means X_1 and X_2 are tall and skinny, so A is a high-dimensional matrix that might be difficult to compute explicitly and to decompose. From the condition n ≫ m,

rank(A) ≤ min{rank(X_1), rank(X_2)}

Thus the rank of A is at most m, and therefore at least n − m eigenvalues are zero. We therefore reduce the dimension by projecting the data onto the POD modes; that is, we carry out the calculation with a low-dimensional matrix Ã in place of the original matrix A. To analyze Ã, we apply the eigendecomposition. In sum, we have the following definition of DMD.
Definition 3.1 (Dynamic Mode Decomposition [57, 109]) Suppose we have a dynamical system

dx/dt = f(x, t)    (3.8)

where x(t) ∈ R^n, and let F be the discrete-time flow map obtained by evolving (3.8) over a time step ∆t:

x_{k+1} = F(x_k)    (3.9)

Let x_0, ···, x_m ∈ R^n be data collected from the dynamical system (3.8) satisfying the flow map (3.9). Define the data matrices X_1 and X_2 associated with x_0, ···, x_m:

X_1 = [x_0  x_1  ···  x_{m-1}],    X_2 = [x_1  x_2  ···  x_m]

Then the dynamic mode decomposition is the eigendecomposition of the best-fit linear operator A of the relation X_2 ≈ A X_1, i.e., A = X_2 X_1^†. The eigenvectors of A are called the DMD modes.
Also note that if A is diagonalizable, with eigenpairs (λ_j, φ_j), the solution of (3.3) can be written in the following form:

x_k = A x_{k-1} = A^2 x_{k-2} = ··· = A^k x_0 = Φ Λ^k Φ^{-1} x_0 = Φ Λ^k b = Σ_j λ_j^k b_j φ_j

where Φ is the matrix whose columns are the eigenvectors φ_j and Λ is the diagonal matrix whose diagonal entries are the eigenvalues λ_j. Here b = Φ^{-1} x_0 holds the coefficients of the initial condition x_0 in the eigenvector basis. In this light, the diagonalizability of A is an important ingredient in the reconstruction or prediction of future states from the initial condition x_0. Moreover, the existence of the matrix A is also important, since the DMD method is meaningless without it. Thus the existence of an A satisfying X_2 = A X_1 is another key ingredient of the reconstruction and of DMD.
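The spectral evolution formula can be checked numerically on a small example (the matrix A and initial state below are arbitrary assumptions for illustration):

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.1, 0.9]])            # toy diagonalizable matrix
x0 = np.array([1.0, 2.0])

evals, Phi = np.linalg.eig(A)         # Lambda (as a vector) and Phi
b = np.linalg.solve(Phi, x0)          # b = Phi^{-1} x0

k = 7
x_k_spectral = Phi @ (evals ** k * b) # x_k = Phi Lambda^k b
x_k_direct = np.linalg.matrix_power(A, k) @ x0

assert np.allclose(x_k_spectral, x_k_direct)
```

Raising the eigenvalues to the k-th power replaces k matrix multiplications, which is the computational point of the diagonalization.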
After applying DMD, we have DMD eigenvalues λ_j and DMD modes θ_j. These pairs help us understand the behavior of the dynamics collected in the data. Moreover, to emphasize specific DMD modes, we need to calculate the DMD amplitudes a_j. With these amplitudes we can pick the dominant modes by checking |a_j|. Traditionally, the DMD amplitudes are given by a = Θ^† x_0, where a = (a_1, ···, a_m) and Θ is the matrix whose columns are the DMD modes.
The Arnoldi approach
We first examine the first approach to DMD, the Arnoldi-like method introduced in [95, 96]. This method was studied with a focus on its connection to Koopman operator theory [91]. From the definition of the data matrices, X_1 and X_2 share most of their columns, so each column of X_2 can be written as a linear combination of the columns of X_1, except for the last column x_m, which may carry an error term:
x_m = c_0 x_0 + c_1 x_1 + ··· + c_{m-1} x_{m-1} + r    (3.10)

By the assumption (3.3), the other columns satisfy the relation x_{k+1} = A x_k. In matrix form,
x_m = X_1 c + r    (3.11)
where c = (c_0, ···, c_{m-1}) collects the coefficients and r is the residual. Combining these results, we have
A X_1 = X_2 = X_1 C_c + r e_m^T    (3.12)
where C_c is the companion matrix defined by

C_c =
⎡ 0 0 ··· 0 c_0     ⎤
⎢ 1 0 ··· 0 c_1     ⎥
⎢ 0 1 ··· 0 c_2     ⎥
⎢ ⋮ ⋮ ⋱  ⋮ ⋮        ⎥
⎣ 0 0 ··· 1 c_{m-1} ⎦
and e_m ∈ R^m is the unit vector whose m-th entry is 1. The main purpose of this approach is to minimize the residual r and to approximate the matrix A using the companion matrix C_c. Applying the QR decomposition X_1 = QR and setting H = R C_c R^{-1}, we obtain the relation A Q ≈ Q H, so the eigenvalues of H approximate some eigenvalues of A [96]. To compute H, we first calculate the companion matrix C_c, more precisely the coefficient vector c. This is the least-squares solution of (3.11), given by
c = R^{-1} Q^* x_m = X_1^† x_m    (3.13)
where X_1 = QR is the economy-size QR decomposition. Let λ_j and φ_j be the eigenvalues and eigenvectors of C_c. Then the λ_j are the DMD eigenvalues, and the DMD modes θ_j and amplitudes a_j are obtained by θ_j = X_1 φ_j and a = Θ^† x_0, where a = (a_1, ···, a_m) and Θ is the matrix whose columns are the DMD modes [1]. The residual r can be calculated by r = x_m − X_1 c = x_m − X_1 X_1^† x_m.
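The companion-matrix bookkeeping of (3.11)–(3.13) can be sketched as follows; the snapshots here are random placeholders (an assumption for illustration), so C_c only demonstrates the construction, not a meaningful dynamical approximation:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 8
X = rng.standard_normal((n, m + 1))   # placeholder snapshots x_0 .. x_m
X1, X2 = X[:, :-1], X[:, 1:]

# c = X1^+ x_m, the least-squares solution of (3.11)
c = np.linalg.lstsq(X1, X[:, -1], rcond=None)[0]

# Companion matrix C_c: ones on the subdiagonal, c in the last column
Cc = np.zeros((m, m))
Cc[1:, :-1] = np.eye(m - 1)
Cc[:, -1] = c

# Residual r and the relation X2 = X1 Cc + r e_m^T, eq. (3.12)
r = X[:, -1] - X1 @ c
E = X2 - X1 @ Cc
assert np.allclose(E[:, :-1], 0.0) and np.allclose(E[:, -1], r)
```

The assertion confirms that the error of the shifted relation is confined to the last column, exactly as (3.12) states.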
This approach is mathematically correct, but practical implementations show that the algorithm is ill-conditioned, which motivated the development of an alternative method [96].
Singular Value Decomposition Based Approach
To achieve a more numerically stable method, we consider the Singular Value Decomposition (SVD). Recall that we have the condition n ≫ m, so n − m of the eigenvalues of A are equal to zero and there are at most m non-zero singular values. Thus we focus on the economy SVD. Moreover, as mentioned at the start of this section, the difference between DMD and plain regression is dimension reduction. To realize it, we consider the truncated SVD of the data matrix. The Eckart–Young theorem states that the optimal rank-r approximation to a matrix in the Frobenius norm is its rank-r truncated SVD. We will use this for the dimension reduction of the data matrix.
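The Eckart–Young truncation can be illustrated with NumPy's economy SVD; the toy matrix below is an arbitrary rank-r example (dimensions and names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 100, 10, 3
X1 = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # exactly rank r

U, s, Vh = np.linalg.svd(X1, full_matrices=False)   # economy SVD
Ur, Sr, Vr = U[:, :r], np.diag(s[:r]), Vh[:r].conj().T

# By Eckart-Young, Ur Sr Vr^* is the best rank-r Frobenius approximation;
# since X1 already has rank r, the truncation reproduces it.
assert np.allclose(Ur @ Sr @ Vr.conj().T, X1)
```

For noisy data the truncation would discard the trailing singular values instead of exact zeros, which is where the threshold of Gavish and Donoho applies.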
Let X_1 and X_2 be the data matrices associated with the data x_0, ···, x_m ∈ R^n. We apply the truncated SVD with rank r:
X_1 = U_r Σ_r V_r^*    (3.14)
where U_r ∈ R^{n×r}, Σ_r ∈ R^{r×r}, V_r ∈ R^{m×r}, and ∗ denotes the conjugate transpose. The rank r is important here, since it sets the reduced dimension of the DMD method. A principled threshold for this truncation of noisy data was developed by Gavish and Donoho [38]. Also, the columns of U_r are the POD modes, and the columns of V_r are orthonormal. Then, from the solution of (3.6),
A = X_2 V_r Σ_r^{-1} U_r^*    (3.15)
Now, since we only need r eigenvalues and eigenvectors of A, we consider the projection of A onto the POD modes:
Ã := U_r^* A U_r ∈ R^{r×r}    (3.16)
More precisely, the matrix Ã defines a low-dimensional model of our dynamical system (3.3) on the Proper Orthogonal Decomposition associated with U_r:

x̃_{k+1} = Ã x̃_k,    x_k = U_r x̃_k    (3.17)
So, we compute the eigendecomposition of Ã,
Ã W = W Λ    (3.18)
where Λ is the diagonal matrix whose entries are the eigenvalues λ_j of Ã, and the columns of W are the corresponding eigenvectors w_j. Then the DMD modes corresponding to the DMD eigenvalues λ_j are given by
θ_j = U_r w_j,    Θ = U_r W    (3.19)
Note that the DMD modes are eigenvectors of the matrix A with corresponding eigenvalues Λ:

A Θ = (X_2 V_r Σ_r^{-1} U_r^*)(U_r W)
    = X_2 V_r Σ_r^{-1} W
    = U_r U_r^* X_2 V_r Σ_r^{-1} W
    = U_r Ã W
    = U_r W Λ
    = Θ Λ
The DMD amplitudes a can be obtained in the same way, a = Θ^† x_0. The DMD modes (3.19) were introduced by Schmid [96]. We call them projected modes, since they are projections onto the POD modes.
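The SVD-based procedure can be made concrete on synthetic rank-2 linear data (the lifting matrix P, A_true, and all dimensions below are illustrative assumptions): we form Ã, take its eigendecomposition (3.18), and build the projected modes (3.19).

```python
import numpy as np

# Toy data: a 2-d linear system z_{k+1} = A_true z_k lifted into R^n by P.
rng = np.random.default_rng(3)
n, m, r = 30, 12, 2
A_true = np.diag([0.95, 0.7])
P = rng.standard_normal((n, r))
z0 = rng.standard_normal(r)
X = np.column_stack([P @ np.linalg.matrix_power(A_true, k) @ z0
                     for k in range(m + 1)])
X1, X2 = X[:, :-1], X[:, 1:]

# Rank-r truncated SVD of X1, eq. (3.14)
U, s, Vh = np.linalg.svd(X1, full_matrices=False)
Ur, Vr = U[:, :r], Vh[:r].conj().T
Sr_inv = np.diag(1 / s[:r])

A_tilde = Ur.conj().T @ X2 @ Vr @ Sr_inv   # eq. (3.20)
evals, W = np.linalg.eig(A_tilde)          # eq. (3.18): DMD eigenvalues
Theta = Ur @ W                             # eq. (3.19): projected DMD modes

# The DMD eigenvalues recover the spectrum of the underlying map.
assert np.allclose(np.sort(evals.real), [0.7, 0.95], atol=1e-6)
```

Because the data is exactly rank 2, the r-dimensional matrix Ã carries the full spectrum of the hidden dynamics.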
From (3.19), the DMD modes lie in the column space of the left singular matrix U_r, which implies they lie in the image of X_1, i.e., its column space. However, the exact eigenvectors of A lie in the column space of X_2, since A = X_2 X_1^†, so there may be a gap. In practical implementations, however, X_1 and X_2 have the same column space. A more delicate result is proven in [109].
In many applications of DMD we have n ≫ m, so computing A directly is inefficient. However, computing (3.16) appears to require A, so we substitute (3.15) into (3.16):

Ã = U_r^* A U_r = U_r^* (X_2 V_r Σ_r^{-1} U_r^*) U_r = U_r^* X_2 V_r Σ_r^{-1}    (3.20)

Using this, we reduce the computational cost. Computationally efficient SVDs of X are also studied in [24, 103]. A more exact method was developed in [109]. There, the matrix Φ is defined by
Φ = X_2 V_r Σ_r^{-1} W Λ^{-1},    φ_j = (1/λ_j) X_2 V_r Σ_r^{-1} w_j    (3.21)
The formula (3.21) defines the exact DMD modes, since these are exact eigenvectors of the matrix A [109]. We now check that (3.21) gives an eigenvector of A with eigenvalue λ_j:
A φ_j = A (1/λ_j) X_2 V_r Σ_r^{-1} w_j
      = (1/λ_j) X_2 V_r Σ_r^{-1} U_r^* X_2 V_r Σ_r^{-1} w_j
      = (1/λ_j) X_2 V_r Σ_r^{-1} Ã w_j
      = X_2 V_r Σ_r^{-1} w_j
      = λ_j φ_j
Note that this method identifies the nonzero eigenvalues of A. Also, the DMD amplitudes a are slightly different for exact DMD [109]:
a = Λ^{-1} Φ^† x_0    (3.22)
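Continuing the same kind of synthetic example (P, A_true, and all dimensions are assumptions), the exact modes (3.21) can be verified to be true eigenvectors of A = X_2 X_1^†:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, r = 30, 12, 2
P = rng.standard_normal((n, r))       # lifts a 2-d system into R^n
A_true = np.diag([0.9, 0.6])
z0 = rng.standard_normal(r)
X = np.column_stack([P @ np.linalg.matrix_power(A_true, k) @ z0
                     for k in range(m + 1)])
X1, X2 = X[:, :-1], X[:, 1:]

U, s, Vh = np.linalg.svd(X1, full_matrices=False)
Ur, Vr = U[:, :r], Vh[:r].conj().T
Sr_inv = np.diag(1 / s[:r])

A_tilde = Ur.conj().T @ X2 @ Vr @ Sr_inv
evals, W = np.linalg.eig(A_tilde)
Phi = X2 @ Vr @ Sr_inv @ W @ np.diag(1 / evals)   # exact DMD modes, eq. (3.21)

# Each exact mode is an eigenvector of the full operator A = X2 X1^+.
A = X2 @ np.linalg.pinv(X1)
for lam, phi in zip(evals, Phi.T):
    assert np.allclose(A @ phi, lam * phi, atol=1e-6)
```

This is exactly the verification carried out symbolically above, repeated numerically.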
Now we introduce further DMD machinery for analyzing spatio-temporal data. At this point we have the modern DMD algorithm as stated in [109]. We will add further concepts for the visualization of DMD and for error-free reconstruction, introduced in [55, 56].
Before introducing these concepts, we return to the definition of DMD. DMD is a dimension-reducing, data-driven method. For this reason, the DMD procedure does not require the explicit form of the dynamics f, and its computational cost is lower than a full-matrix computation. After the decomposition, we have the spectral components of the dynamics collected from the data. The outputs of DMD (modes, eigenvalues, and amplitudes) describe the spatial identity of the dynamics through frequencies and growth (or decay, oscillation) rates. However, Krake et al. [55] point out problems with the DMD components: visualizations of DMD are easily misread, and the relevance of a component to the given system is unclear; they address this by comparison with the Discrete Fourier Transform (DFT). When dealing with functions that are difficult to analyze in the time domain, the Fourier transform decomposes them into the frequency domain to facilitate analysis, and the DFT is its approximation on discrete data.
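The DFT used in this comparison can be checked directly against NumPy's FFT (the toy signal below is an arbitrary assumption):

```python
import numpy as np

# Direct DFT: x_hat_k = sum_j x_j * exp(-2*pi*i*j*k/n)
x = np.array([1.0, 2.0, 0.0, -1.0])   # hypothetical discrete signal
n = len(x)
idx = np.arange(n)

x_hat = np.array([np.sum(x * np.exp(-2j * np.pi * idx * k / n))
                  for k in range(n)])

assert np.allclose(x_hat, np.fft.fft(x))
```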
The output of the DFT has the form

x̂_k = Σ_{j=0}^{n-1} W^{jk} x_j = Σ_{j=0}^{n-1} x_j e^{-2πi jk/n}

where W = e^{-2πi/n} is a root of unity. This form shows that the DFT decomposes the data x_k into complex numbers built from roots of unity. Note that these roots of unity can be expressed in exponential form with magnitude 1 and angle ω_j, which implies the DFT result is determined by a real frequency f_j = ω_j/2π in the complex domain. DMD, in contrast, computes eigenvalues, modes, and amplitudes related to a complex frequency together with a growth (or decay) rate. This means the DMD outputs are more intricate, while each DFT output is determined by a single real frequency. With this idea, Krake et al. [55] propose the following outputs. With the same definitions of X_1, X_2, Ã, W, and φ_j,
a = Λ^{-1} Φ^† x_1    (3.23)
First, the amplitudes are computed from the second snapshot, not the first. In applications of DMD we have n ≫ m, which produces the zero eigenvalues mentioned before; the corresponding missing components would affect a reconstruction based on the initial data. Also, define a scaled mode θ_j:
θ_j = a_j φ_j    (3.24)
This scaled mode plays a role similar to that of x̂_j in the DFT: it contains the spatial decomposition together with its influence on the system. (3.24) adjusts the orientation and the computed influence through the amplitude a_j.
Since the DMD modes have a large influence on the system, picking the dominant, most influential modes is important for describing the system correctly. Traditionally the modes are picked by the norm of the amplitude, but a better criterion is given in [55]:

‖λ_j^p a_j θ_j‖    (3.25)

where p is a number of time steps, i.e., some number less than or equal to the number of data snapshots. With this p-th power, we can check whether |λ_j| > 1 or |λ_j| < 1. Thus, our algorithm for DMD is the following.
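The selection rule (3.25) reduces to ranking ‖λ_j^p a_j θ_j‖ = |λ_j|^p |a_j| ‖θ_j‖; a small sketch with made-up eigenvalues, amplitudes, and unit modes (all values hypothetical):

```python
import numpy as np

evals = np.array([1.01, 0.95, 0.60])   # hypothetical DMD eigenvalues
amps = np.array([0.5, 2.0, 3.0])       # hypothetical amplitudes a_j
modes = np.eye(3)                      # placeholder unit modes theta_j
p = 20                                 # number of time steps

scores = np.array([np.linalg.norm(lam ** p * a * theta)
                   for lam, a, theta in zip(evals, amps, modes.T)])
dominant = int(np.argmax(scores))

# With p = 20 the strongly decaying third mode is heavily penalized despite
# its large amplitude; here the second mode wins.
assert dominant == 1
```

The p-th power is what lets growth or decay of |λ_j| outweigh raw amplitude, which a pure |a_j| ranking would miss.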
Prediction Of Future State Using Dynamic Mode Decomposition
Now we go back to the basic idea of DMD, (3.3). If A is diagonalizable, we can express x_k in terms of the eigenvalues and eigenvectors of A. As in DMD, we have DMD eigenvalues λ_j and DMD modes φ_j, which are eigenvalues and eigenvectors of A. So the following future-state prediction/reconstruction of the data holds:
x_k = Σ_{j=1}^{r} λ_j^k a_j φ_j = Φ Λ^k a    (3.26)
where a are the amplitudes, traditionally calculated as a = Φ^† x_0. But this carries some computational cost, so a POD projection is sometimes used for this computation [15]. As mentioned before, since the situation n ≫ m leads to missing components, we adopt a = Λ^{-1} Φ^† x_1. The soundness of this prediction comes down to the following:
1. ker(X_1) ⊂ ker(X_2)
2. the diagonalizability of A
3. the exactness of A, i.e., X_2 = A X_1
These statements are proven in [56, 109].
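The prediction formula (3.26) with a = Λ^{-1} Φ^† x_1 can be sketched with toy values (Φ, the eigenvalues, and x_1 below are made up for illustration):

```python
import numpy as np

# Toy eigen-structure (assumed): modes Phi, eigenvalues, second snapshot x1.
Phi = np.array([[1.0, 1.0],
                [0.0, 1.0]])          # columns are DMD modes
evals = np.array([0.9, 0.5])          # DMD eigenvalues
x1 = np.array([2.0, 1.0])             # second snapshot

a = np.diag(1 / evals) @ np.linalg.pinv(Phi) @ x1   # a = Lambda^{-1} Phi^+ x_1

# Prediction/reconstruction x_k = Phi Lambda^k a, eq. (3.26)
def predict(k):
    return Phi @ (evals ** k * a)

# Consistency check: k = 1 recovers the snapshot used to fit a
assert np.allclose(predict(1), x1)
```

Folding Λ^{-1} into the amplitudes makes k = 1, not k = 0, the anchor of the reconstruction, which is the point of using the second snapshot.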
Algorithm 1 Dynamic Mode Decomposition
Input: data x_0, ···, x_m ∈ R^n.
1: Define the data matrices X_1 and X_2 associated with the input data:
   X_1 = [x_0  x_1  ···  x_{m-1}],    X_2 = [x_1  x_2  ···  x_m]
2: Compute the reduced SVD of X_1 with rank r:
   X_1 = U_r Σ_r V_r^*
3: Define the DMD matrix Ã:
   Ã = U_r^* X_2 V_r Σ_r^{-1}
4: Compute the DMD eigenvalues λ_j and eigenvectors w_j of Ã.
5: Calculate the DMD modes φ_j:
   Φ = X_2 V_r Σ_r^{-1} W Λ^{-1},    φ_j = (1/λ_j) X_2 V_r Σ_r^{-1} w_j
6: Compute the DMD amplitudes a:
   a = Λ^{-1} Φ^† x_1
7: Compute the scaled modes θ_j:
   θ_j = a_j φ_j
8: Compute the dominant structure index ‖λ_j^p a_j θ_j‖, where p is a chosen time step (simply pick p = m).
Output: DMD eigenvalues λ_j ∈ C, (scaled) DMD modes θ_j ∈ C^n, and DMD amplitudes a_j ∈ C.
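Algorithm 1 can be sketched end to end in NumPy; the data below are synthetic rank-2 linear snapshots (P, A_true, and all dimensions are assumptions), and steps 5–6 follow the exact-mode formulas (3.21) and (3.23) used in the text.

```python
import numpy as np

def dmd(X, r, p):
    """Sketch of Algorithm 1 on a snapshot matrix X (exact-DMD conventions)."""
    X1, X2 = X[:, :-1], X[:, 1:]                                 # step 1
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)            # step 2
    Ur, Vr, Sr_inv = U[:, :r], Vh[:r].conj().T, np.diag(1 / s[:r])
    A_tilde = Ur.conj().T @ X2 @ Vr @ Sr_inv                     # step 3
    evals, W = np.linalg.eig(A_tilde)                            # step 4
    Phi = X2 @ Vr @ Sr_inv @ W @ np.diag(1 / evals)              # step 5
    a = np.diag(1 / evals) @ np.linalg.pinv(Phi) @ X[:, 1]       # step 6
    Theta = Phi * a                                              # step 7
    scores = np.abs(evals) ** p * np.abs(a) * np.linalg.norm(Theta, axis=0)  # step 8
    return evals, Theta, a, scores

# Synthetic rank-2 linear data (assumed for illustration)
rng = np.random.default_rng(5)
n, m, r = 40, 15, 2
P = rng.standard_normal((n, r))
A_true = np.diag([0.97, 0.5])
z0 = rng.standard_normal(r)
X = np.column_stack([P @ np.linalg.matrix_power(A_true, k) @ z0
                     for k in range(m + 1)])

evals, Theta, a, scores = dmd(X, r=r, p=m)
assert np.allclose(np.sort(evals.real), [0.5, 0.97], atol=1e-6)
```

On this exactly linear data the recovered eigenvalues match the hidden dynamics; on real spatio-temporal data the same pipeline would be applied after choosing the truncation rank r.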