Matrix Decompositions
Comparing the left-hand side of (4.45) and the right-hand side of (4.46) shows that there is a simple pattern in the diagonal elements $l_{ii}$:
$$
l_{11} = \sqrt{a_{11}}, \quad l_{22} = \sqrt{a_{22} - l_{21}^2}, \quad l_{33} = \sqrt{a_{33} - (l_{31}^2 + l_{32}^2)}. \tag{4.47}
$$
Similarly, for the elements below the diagonal ($l_{ij}$, where $i > j$), there is also a repeating pattern:
$$
l_{21} = \frac{1}{l_{11}} a_{21}, \quad l_{31} = \frac{1}{l_{11}} a_{31}, \quad l_{32} = \frac{1}{l_{22}} \left(a_{32} - l_{31} l_{21}\right). \tag{4.48}
$$
Thus, we constructed the Cholesky decomposition for any symmetric, positive definite $3 \times 3$ matrix. The key realization is that we can backward calculate what the components $l_{ij}$ of $L$ should be, given the values $a_{ij}$ for $A$ and previously computed values of $l_{ij}$.
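The backward calculation above translates directly into code. A minimal NumPy sketch (the function name and the test matrix are our own, not from the text) that fills in $L$ entry by entry via (4.47) and (4.48), then checks the result against the library routine:

```python
import numpy as np

def cholesky_3x3(A):
    """Build L for a symmetric positive definite 3x3 matrix A
    using the closed-form entries (4.47) and (4.48)."""
    L = np.zeros((3, 3))
    L[0, 0] = np.sqrt(A[0, 0])                              # l11
    L[1, 0] = A[1, 0] / L[0, 0]                             # l21
    L[1, 1] = np.sqrt(A[1, 1] - L[1, 0]**2)                 # l22
    L[2, 0] = A[2, 0] / L[0, 0]                             # l31
    L[2, 1] = (A[2, 1] - L[2, 0] * L[1, 0]) / L[1, 1]       # l32
    L[2, 2] = np.sqrt(A[2, 2] - (L[2, 0]**2 + L[2, 1]**2))  # l33
    return L

# Illustrative symmetric, positive definite matrix.
A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L = cholesky_3x3(A)
print(np.allclose(L @ L.T, A))                 # True: A = L L^T
print(np.allclose(L, np.linalg.cholesky(A)))   # True: matches NumPy
```

Note that each entry only uses values of $A$ and previously computed entries of $L$, which is exactly the "backward calculation" described above.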
The Cholesky decomposition is an important tool for the numerical computations underlying machine learning. Here, symmetric positive definite matrices require frequent manipulation, e.g., the covariance matrix of a multivariate Gaussian variable (see Section 6.5) is symmetric, positive definite. The Cholesky factorization of this covariance matrix allows us to generate samples from a Gaussian distribution. It also allows us to perform a linear transformation of random variables, which is heavily exploited when computing gradients in deep stochastic models, such as the variational auto-encoder (Jimenez Rezende et al., 2014; Kingma and Welling, 2014). The Cholesky decomposition also allows us to compute determinants very efficiently. Given the Cholesky decomposition $A = LL^\top$, we know that $\det(A) = \det(L)\det(L^\top) = \det(L)^2$. Since $L$ is a triangular matrix, the determinant is simply the product of its diagonal entries, so that $\det(A) = \prod_i l_{ii}^2$. Thus, many numerical software packages use the Cholesky decomposition to make computations more efficient.
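Both uses mentioned here, Gaussian sampling and determinants, can be sketched with NumPy; the covariance matrix below is an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Covariance matrix of a 2D Gaussian (symmetric, positive definite).
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
L = np.linalg.cholesky(Sigma)     # Sigma = L @ L.T

# Determinant: det(Sigma) = det(L)^2 = (product of diag(L))^2.
det_via_chol = np.prod(np.diag(L))**2
print(np.isclose(det_via_chol, np.linalg.det(Sigma)))   # True

# Sampling: x = L z with z ~ N(0, I) has covariance L L^T = Sigma.
z = rng.standard_normal((2, 100_000))
x = L @ z
print(np.allclose(np.cov(x), Sigma, atol=0.05))
```

The sampling trick is the linear transformation of random variables referred to above: standard normal samples are mapped through $L$ so that their covariance becomes $LL^\top = \Sigma$.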
4.4 Eigendecomposition and Diagonalization
A *diagonal matrix* is a matrix that has value zero on all off-diagonal elements, i.e., it is of the form
$$
D = \begin{bmatrix} c_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & c_n \end{bmatrix}. \tag{4.49}
$$
Diagonal matrices allow fast computation of determinants, powers, and inverses. The determinant is the product of the diagonal entries, a matrix power $D^k$ is given by raising each diagonal element to the power $k$, and the inverse $D^{-1}$ is the diagonal matrix of the reciprocals of the diagonal elements, provided all of them are nonzero.
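These three shortcuts are easy to verify numerically; a small NumPy sketch with illustrative values:

```python
import numpy as np

c = np.array([2.0, 3.0, 5.0])
D = np.diag(c)

# Determinant: product of the diagonal entries.
print(np.isclose(np.prod(c), np.linalg.det(D)))                  # True
# Power: raise each diagonal entry to the power k.
k = 4
print(np.allclose(np.diag(c**k), np.linalg.matrix_power(D, k)))  # True
# Inverse: reciprocal of each (nonzero) diagonal entry.
print(np.allclose(np.diag(1.0 / c), np.linalg.inv(D)))           # True
```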
In this section, we will discuss how to transform matrices into diagonal form. This is an important application of the basis change we discussed in Section 2.7.2 and eigenvalues from Section 4.2.
Recall that two matrices $A, D$ are similar (Definition 2.22) if there exists an invertible matrix $P$ such that $D = P^{-1}AP$. More specifically, we will look at matrices $A$ that are similar to diagonal matrices $D$ that contain the eigenvalues of $A$ on the diagonal.
Definition 4.19 (Diagonalizable). A matrix $A \in \mathbb{R}^{n \times n}$ is *diagonalizable* if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix $P \in \mathbb{R}^{n \times n}$ such that $D = P^{-1}AP$.
In the following, we will see that diagonalizing a matrix $A \in \mathbb{R}^{n \times n}$ is a way of expressing the same linear mapping in another basis (see Section 2.6.1), which will turn out to be a basis that consists of the eigenvectors of $A$.
Let $A \in \mathbb{R}^{n \times n}$, let $\lambda_1, \ldots, \lambda_n$ be a set of scalars, and let $p_1, \ldots, p_n$ be a set of vectors in $\mathbb{R}^n$. We define $P := [p_1, \ldots, p_n]$ and let $D \in \mathbb{R}^{n \times n}$ be a diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$. Then we can show that
$$
AP = PD \tag{4.50}
$$
if and only if $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$ and $p_1, \ldots, p_n$ are corresponding eigenvectors of $A$.
We can see that this statement holds because
$$
AP = A[p_1, \ldots, p_n] = [Ap_1, \ldots, Ap_n], \tag{4.51}
$$
$$
PD = [p_1, \ldots, p_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = [\lambda_1 p_1, \ldots, \lambda_n p_n]. \tag{4.52}
$$
Thus, (4.50) implies that
$$
Ap_1 = \lambda_1 p_1 \tag{4.53}
$$
$$
\vdots
$$
$$
Ap_n = \lambda_n p_n. \tag{4.54}
$$
Therefore, the columns of $P$ must be eigenvectors of $A$.
Our definition of diagonalization requires that $P \in \mathbb{R}^{n \times n}$ is invertible, i.e., $P$ has full rank (Theorem 4.3). This requires us to have $n$ linearly independent eigenvectors $p_1, \ldots, p_n$, i.e., the $p_i$ form a basis of $\mathbb{R}^n$.

Theorem 4.20 (Eigendecomposition). A square matrix $A \in \mathbb{R}^{n \times n}$ can be factored into
$$
A = PDP^{-1}, \tag{4.55}
$$
where $P \in \mathbb{R}^{n \times n}$ and $D$ is a diagonal matrix whose diagonal entries are the eigenvalues of $A$, if and only if the eigenvectors of $A$ form a basis of $\mathbb{R}^n$.
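Numerically, the condition of Theorem 4.20 can be checked by testing whether the eigenvector matrix $P$ has full rank. A NumPy sketch with an illustrative helper name; the second test matrix is a Jordan block, which has too few independent eigenvectors:

```python
import numpy as np

def is_diagonalizable(A, tol=1e-10):
    """Check whether the eigenvectors of A span the whole space,
    i.e., whether the eigenvector matrix P is invertible."""
    eigvals, P = np.linalg.eig(A)
    return np.linalg.matrix_rank(P, tol=tol) == A.shape[0]

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])          # distinct eigenvalues: diagonalizable
J = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # Jordan block: defective
print(is_diagonalizable(A))
print(is_diagonalizable(J))
```

For the Jordan block, `np.linalg.eig` returns two (numerically) parallel eigenvectors, so $P$ is rank deficient and no invertible $P$ exists.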
Figure 4.7: Intuition behind the eigendecomposition as sequential transformations. Top-left to bottom-left: $P^{-1}$ performs a basis change (here drawn in $\mathbb{R}^2$ and depicted as a rotation-like operation) from the standard basis into the eigenbasis. Bottom-left to bottom-right: $D$ performs a scaling along the remapped orthogonal eigenvectors, depicted here by a circle being stretched to an ellipse. Bottom-right to top-right: $P$ undoes the basis change (depicted as a reverse rotation) and restores the original coordinate frame.
Theorem 4.20 implies that only non-defective matrices can be diagonalized and that the columns of $P$ are the $n$ eigenvectors of $A$. For symmetric matrices we can obtain even stronger outcomes for the eigenvalue decomposition.

Theorem 4.21. A symmetric matrix $S \in \mathbb{R}^{n \times n}$ can always be diagonalized.

Theorem 4.21 follows directly from the spectral theorem (Theorem 4.15). Moreover, the spectral theorem states that we can find an ONB of $\mathbb{R}^n$ consisting of eigenvectors. This makes $P$ an orthogonal matrix, so that $D = P^\top A P$.
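For symmetric matrices, NumPy's `np.linalg.eigh` returns exactly such an orthonormal eigenvector matrix. A short sketch with a randomly generated symmetric matrix (construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
S = B + B.T                        # symmetric by construction

# eigh returns real eigenvalues and an orthonormal eigenvector matrix P.
eigvals, P = np.linalg.eigh(S)

print(np.allclose(P.T @ P, np.eye(4)))             # P is orthogonal
print(np.allclose(P.T @ S @ P, np.diag(eigvals)))  # D = P^T S P
```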
Remark. The Jordan normal form of a matrix offers a decomposition that works for defective matrices (Lang, 1987) but is beyond the scope of this book. ♢
Geometric Intuition for the Eigendecomposition
We can interpret the eigendecomposition of a matrix as follows (see also Figure 4.7): Let $A$ be the transformation matrix of a linear mapping with respect to the standard basis $e_i$ (blue arrows). $P^{-1}$ performs a basis change from the standard basis into the eigenbasis. Then, the diagonal $D$ scales the vectors along these axes by the eigenvalues $\lambda_i$. Finally, $P$ transforms these scaled vectors back into the standard/canonical coordinates, yielding $\lambda_i p_i$.
Example 4.11 (Eigendecomposition)

Let us compute the eigendecomposition of $A = \frac{1}{2}\begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$.

Step 1: Compute eigenvalues and eigenvectors. The characteristic polynomial of $A$ is
$$
\det(A - \lambda I) = \det\begin{bmatrix} \frac{5}{2} - \lambda & -1 \\ -1 & \frac{5}{2} - \lambda \end{bmatrix} \tag{4.56a}
$$
$$
= \left(\tfrac{5}{2} - \lambda\right)^2 - 1 = \lambda^2 - 5\lambda + \tfrac{21}{4} = \left(\lambda - \tfrac{7}{2}\right)\left(\lambda - \tfrac{3}{2}\right). \tag{4.56b}
$$
Therefore, the eigenvalues of $A$ are $\lambda_1 = \frac{7}{2}$ and $\lambda_2 = \frac{3}{2}$ (the roots of the characteristic polynomial), and the associated (normalized) eigenvectors are obtained via
$$
Ap_1 = \tfrac{7}{2}p_1, \quad Ap_2 = \tfrac{3}{2}p_2. \tag{4.57}
$$
This yields
$$
p_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad p_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{4.58}
$$

Step 2: Check for existence. The eigenvectors $p_1, p_2$ form a basis of $\mathbb{R}^2$. Therefore, $A$ can be diagonalized.

Step 3: Construct the matrix $P$ to diagonalize $A$. We collect the eigenvectors of $A$ in $P$ so that
$$
P = [p_1, p_2] = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}. \tag{4.59}
$$
We then obtain
$$
P^{-1}AP = \begin{bmatrix} \frac{7}{2} & 0 \\ 0 & \frac{3}{2} \end{bmatrix} = D. \tag{4.60}
$$
Equivalently, we get (exploiting that $P^{-1} = P^\top$ since the eigenvectors $p_1$ and $p_2$ in this example form an ONB)
$$
\underbrace{\frac{1}{2}\begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}}_{A}
= \underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}}_{P}
\underbrace{\begin{bmatrix} \frac{7}{2} & 0 \\ 0 & \frac{3}{2} \end{bmatrix}}_{D}
\underbrace{\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}}_{P^{-1}}. \tag{4.61}
$$
Figure 4.7 visualizes the eigendecomposition of $A = \frac{1}{2}\begin{bmatrix} 5 & -2 \\ -2 & 5 \end{bmatrix}$ as a sequence of linear transformations.
Diagonal matrices $D$ can efficiently be raised to a power. Therefore, we can find a matrix power for a matrix $A \in \mathbb{R}^{n \times n}$ via the eigenvalue decomposition (if it exists), so that
$$
A^k = (PDP^{-1})^k = PD^kP^{-1}. \tag{4.62}
$$
Computing $D^k$ is efficient because we apply this operation individually to each diagonal element.
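A sketch of (4.62) in NumPy, compared against `np.linalg.matrix_power`; the matrix and exponent are illustrative:

```python
import numpy as np

A = 0.5 * np.array([[5.0, -2.0],
                    [-2.0, 5.0]])
eigvals, P = np.linalg.eigh(A)   # A is symmetric

k = 5
# D^k: raise each eigenvalue to the k-th power, then map back via P.
Ak = P @ np.diag(eigvals**k) @ np.linalg.inv(P)

print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
```

Only $n$ scalar powers are needed for $D^k$, instead of $k-1$ matrix multiplications; the two matrix products with $P$ and $P^{-1}$ are paid once, regardless of $k$.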
Assume that the eigendecomposition $A = PDP^{-1}$ exists. Then,
$$
\det(A) = \det(PDP^{-1}) = \det(P)\det(D)\det(P^{-1}) \tag{4.63a}
$$