Large Scale Data Analysis Using Deep Learning
Linear Algebra
U Kang
Seoul National University
In This Lecture
Overview of linear algebra (not a comprehensive survey)
Focused on the subset most relevant to deep learning
For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang
Scalar
A scalar is a single number
Integers, real numbers, rational numbers, etc.
We denote it with an italic font: a, n, x
Vectors
A vector is a 1-D array of numbers
Can be real, binary, integer, etc.
Notation for type and size: x ∈ R^n
Matrices
A matrix is a 2-D array of numbers
Example notation for type and shape: A ∈ R^{m x n}
Tensors
A tensor is an array of numbers that may have:
Zero dimensions, and be a scalar
One dimension, and be a vector
Two dimensions, and be a matrix
Three dimensions or more
Examples of 3-D tensors: a matrix over time, a knowledge base of (subject, verb, object) triples
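As a minimal illustration (my addition using NumPy, not part of the original slides), scalars, vectors, matrices, and higher-dimensional tensors differ only in the number of array dimensions:

```python
import numpy as np

s = np.float64(3.0)              # scalar: 0 dimensions
v = np.array([1.0, 2.0, 3.0])    # vector: 1-D array, shape (3,)
M = np.array([[1.0, 2.0],
              [3.0, 4.0]])       # matrix: 2-D array, shape (2, 2)
T = np.zeros((4, 2, 3))          # 3-D tensor, e.g., 4 time steps of a 2 x 3 matrix

print(v.ndim, M.ndim, T.ndim)    # 1 2 3
```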
Matrix Transpose
(A^T)_{i,j} = A_{j,i}
(AB)^T = B^T A^T
Matrix Product
C = AB, where C_{i,j} = Σ_k A_{i,k} B_{k,j}
Matrix Product
Matrix product as a sum of outer products: AB = Σ_k A_{:,k} B_{k,:}
Product with a diagonal matrix scales each coordinate: diag(v) x = v ⊙ x (element-wise product)
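A small NumPy sketch (my illustration, not from the slides) that checks the product formula, the transpose rule (AB)^T = B^T A^T, and the sum-of-outer-products view:

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])

C = A @ B                                   # C[i, j] = sum_k A[i, k] * B[k, j]
assert np.allclose((A @ B).T, B.T @ A.T)    # (AB)^T = B^T A^T

# AB as a sum of outer products of columns of A and rows of B
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(A.shape[1]))
assert np.allclose(C, outer_sum)

# Multiplying by a diagonal matrix scales coordinates element-wise
v = np.array([2., 3.])
x = np.array([1., 4.])
assert np.allclose(np.diag(v) @ x, v * x)
```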
Identity Matrix
Example identity matrix: I_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Systems of Equations
Ax = b expands to the set of linear equations A_{1,1}x_1 + ... + A_{1,n}x_n = b_1, ..., A_{m,1}x_1 + ... + A_{m,n}x_n = b_m
Solving Systems of Equations
A linear system of equations can have:
No solution
3x + 2y = 6, 3x + 2y = 12
Many solutions
3x + 2y = 6, 6x + 4y = 12
Exactly one solution: this means multiplication by the matrix is an invertible function
Ax = b  =>  x = A^{-1} b
Matrix Inversion
Matrix inverse: the inverse A^{-1} of an n x n matrix A satisfies A A^{-1} = A^{-1} A = I_n
Solving a system using an inverse: Ax = b  =>  x = A^{-1} b
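A minimal NumPy sketch (illustrative, not from the slides) of solving Ax = b, both with an explicit inverse and with the preferred solver:

```python
import numpy as np

A = np.array([[3., 2.],
              [1., 4.]])
b = np.array([6., 8.])

x_inv   = np.linalg.inv(A) @ b   # x = A^{-1} b (fine for tiny examples)
x_solve = np.linalg.solve(A, b)  # numerically preferable: no explicit inverse

assert np.allclose(x_inv, x_solve)
assert np.allclose(A @ x_solve, b)
```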
Invertibility
Matrix A cannot be inverted if
More rows than columns
More columns than rows
Redundant rows/columns (“linearly dependent”, “low rank”, i.e., not full rank)
The number 0 is an eigenvalue of A
A non-invertible matrix is called a singular matrix
An invertible matrix is called non-singular
Linear Dependence and Span
Linear combination of vectors {v^(1), ..., v^(n)}: Σ_i c_i v^(i)
The span of a set of vectors is the set of all points
obtainable by linear combination of the original vectors
Matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: Ax = Σ_i x_i A_{:,i}
The span of columns of A is called column space or range of A
A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors; otherwise, the set is linearly dependent
Rank of a Matrix
Rank of a matrix A: Number of linearly independent columns (or rows) of A
Example: determining the rank of a specific matrix and explaining why (the example matrix is not preserved in this text; see the sketch below)
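A small NumPy sketch (an illustration I added, not the slide's original example) showing how a redundant column lowers the rank:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 9.],
              [7., 8., 15.]])    # third column = first column + second column

print(np.linalg.matrix_rank(A))  # 2: only two linearly independent columns
```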
Norms
Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
Formally, a norm is any function f that satisfies the following:
f(x) = 0 implies x = 0
f(x + y) ≤ f(x) + f(y) (triangle inequality)
f(αx) = |α| f(x) for all scalars α
Norms
L^p norm: ||x||_p = (Σ_i |x_i|^p)^{1/p}, for p ≥ 1
Most popular norm: L^2 norm, p = 2
||x||_2 = sqrt(Σ_i x_i^2): Euclidean distance from the origin
For matrices/tensors, the Frobenius norm does the same thing: ||A||_F = sqrt(Σ_{i,j} A_{i,j}^2)
L^1 norm, p = 1: ||x||_1 = Σ_i |x_i|
Called 'Manhattan' distance
Max norm, p = ∞: ||x||_∞ = max_i |x_i|
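A quick NumPy check of these norms (my illustration, not from the slides):

```python
import numpy as np

x = np.array([3., -4.])
print(np.linalg.norm(x, 1))       # L1 norm: |3| + |-4| = 7
print(np.linalg.norm(x, 2))       # L2 norm: sqrt(9 + 16) = 5
print(np.linalg.norm(x, np.inf))  # max norm: 4

A = np.array([[1., 2.], [3., 4.]])
print(np.linalg.norm(A, 'fro'))   # Frobenius norm: sqrt(1 + 4 + 9 + 16)
```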
Special Matrices and Vectors
Diagonal matrix
2 by 2 example: diag([a, b]) = [[a, 0], [0, b]]
We use diag(v) to denote the diagonal matrix where v is the vector containing the diagonal elements
The inverse of a diagonal matrix is computed easily: diag(v)^{-1} = diag([1/v_1, ..., 1/v_n])
Special Matrices and Vectors
Unit vector: a vector with unit norm, ||x||_2 = 1
Symmetric matrix: a matrix equal to its own transpose, A = A^T
Orthogonal matrix: a square matrix whose rows and columns are mutually orthonormal, so A^T A = A A^T = I and A^{-1} = A^T
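A short NumPy sketch (added illustration) of these special matrices:

```python
import numpy as np

# Diagonal matrix and its easy inverse
v = np.array([2., 4., 5.])
D = np.diag(v)
assert np.allclose(np.linalg.inv(D), np.diag(1.0 / v))

# Orthogonal matrix (a 2-D rotation): Q^T Q = I, so Q^{-1} = Q^T
theta = 0.3
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
assert np.allclose(Q.T @ Q, np.eye(2))
```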
Eigenvector and Eigenvalue
Eigenvector and eigenvalue of A: a nonzero vector v is an eigenvector of A if Av = λv; the scalar λ is the corresponding eigenvalue
Eigenvalue/eigenvector pairs are defined only for a square matrix A ∈ R^{n x n}
# of eigenvalues: n
There may be duplicates
Eigenvalues and eigenvectors can contain real or imaginary numbers
Intuition
A as a vector transformation: multiplying by A maps a vector x to a new vector x' = Ax
Example: A = [[2, 1], [1, 3]], x = [1, 0]^T  =>  x' = Ax = [2, 1]^T
Intuition
By definition, eigenvectors remain parallel to themselves ('fixed points' of the transformation's direction)
Example: for A = [[2, 1], [1, 3]], the eigenvector v1 = [0.52, 0.85]^T satisfies A v1 = λ1 v1 with λ1 = 3.62
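The same example, checked with NumPy (my addition):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])

eigvals, eigvecs = np.linalg.eig(A)
i = np.argmax(eigvals)               # index of the largest eigenvalue
lam, v = eigvals[i], eigvecs[:, i]

print(lam)                           # about 3.618
print(v)                             # about [0.53, 0.85] (up to sign)
assert np.allclose(A @ v, lam * v)   # A v = lambda v
```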
Eigendecomposition
Eigendecomposition of a matrix
Let A be a square (n x n) matrix with n linearly independent eigenvectors (i.e., diagonalizable)
Then A can be factorized as A = V diag(λ) V^{-1}
where V is an (n x n) matrix whose i-th column is the i-th eigenvector of A, and diag(λ) is the diagonal matrix of the corresponding eigenvalues
Eigendecomposition
Every real symmetric matrix has a real, orthogonal eigendecomposition: A = Q Λ Q^T, where Q is orthogonal and Λ is diagonal
A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors)
Q^T Q = Q Q^T = I
Q^T = Q^{-1}
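A minimal NumPy check (added) of the symmetric eigendecomposition A = Q Λ Q^T:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])         # real symmetric

lam, Q = np.linalg.eigh(A)       # eigh: for symmetric/Hermitian matrices
Lam = np.diag(lam)

assert np.allclose(Q.T @ Q, np.eye(2))    # Q is orthogonal
assert np.allclose(Q @ Lam @ Q.T, A)      # A = Q Λ Q^T
```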
Eigendecomposition
Interpreting matrix-vector multiplication Ax using the eigendecomposition: Ax = V diag(λ) V^{-1} x changes x to the eigenvector basis, scales each coordinate by its eigenvalue, and changes back
Eigendecomposition
Understanding the optimization of f(x) = x^T A x subject to ||x||_2 = 1 using eigenvalues and eigenvectors: for symmetric A, the maximum of f over unit vectors is the largest eigenvalue (attained at its eigenvector), and the minimum is the smallest eigenvalue
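A quick numeric check (my addition) that the constrained maximum of x^T A x equals the largest eigenvalue:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])                  # symmetric

lam, Q = np.linalg.eigh(A)                # eigenvalues in ascending order
v_max = Q[:, -1]                          # unit eigenvector of the largest eigenvalue

print(v_max @ A @ v_max)                  # about 3.618 = largest eigenvalue

# A random unit vector never exceeds the largest eigenvalue
rng = np.random.default_rng(0)
x = rng.normal(size=2)
x /= np.linalg.norm(x)
assert x @ A @ x <= lam[-1] + 1e-9
```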
Eigendecomposition
Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive
Positive semidefinite matrix: all eigenvalues are ≥ 0
Negative definite matrix: all eigenvalues are negative
Negative semidefinite matrix: all eigenvalues are ≤ 0
For a positive semidefinite matrix A, x^T A x ≥ 0 for all x
For a positive definite matrix A, x^T A x > 0 for all x ≠ 0
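A small NumPy sketch (added) for checking definiteness through the eigenvalues of a symmetric matrix:

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Check definiteness of a real symmetric matrix via its eigenvalues."""
    eigvals = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
    return np.all(eigvals > tol)

A = np.array([[2., 1.], [1., 3.]])       # eigenvalues ~1.38 and ~3.62: positive definite
B = np.array([[1., 2.], [2., 1.]])       # eigenvalues -1 and 3: indefinite

print(is_positive_definite(A))  # True
print(is_positive_definite(B))  # False
```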
Singular Value Decomposition (SVD)
Similar to eigendecomposition
More general; matrix need not be square
A = U Λ V^T
SVD - Definition
A [n x m] = U [n x r] Λ [r x r] (V [m x r])^T
A: n x m matrix (e.g., n documents, m terms)
U : n x r matrix (n documents, r concepts)
Left singular vectors
Λ: r x r diagonal matrix (strength of each 'concept'), where r is the rank of the matrix
V : m x r matrix (m terms, r concepts)
SVD - Definition
A [n x m] = U [n x r] x Λ [r x r] x V^T [r x m]
(diagram: the n x m matrix A drawn as the product of the three factor matrices)
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Λ V^T, where
U, Λ, V: unique (*)
U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
U^T U = I; V^T V = I (I: identity matrix)
Λ: diagonal entries (singular values) are positive and sorted in decreasing order
A ≈ U Λ V^T = Σ_i σ_i u_i v_i^T
Truncating the sum to the k largest singular values gives the best rank-k approximation of A in the Frobenius norm
(diagram: the n x m matrix A approximated by the product U Λ V^T)
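A NumPy sketch (my illustration) of the best rank-k approximation by a truncated SVD:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep the k largest singular values

# Error of the best rank-k approximation in the Frobenius norm:
# sqrt of the sum of the squared discarded singular values
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```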
SVD - Properties
SVD and eigendecomposition
The left-singular vectors of A are the eigenvectors of AAT
The right-singular vectors of A are the eigenvectors of ATA
The non-zero singular values of A are the square roots of the eigenvalues of A^T A (and of A A^T)
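A quick NumPy check (added) of the relation between singular values and the eigenvalues of A^T A:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))

s = np.linalg.svd(A, compute_uv=False)          # singular values, descending
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]     # eigenvalues of A^T A, descending

assert np.allclose(s, np.sqrt(eig_AtA))
```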
Moore-Penrose Pseudoinverse
Assume we want to solve A x = y for x
What if A is not invertible?
What if A is not square?
Still, we can find the 'best' x by using the pseudoinverse
For an (n x m) matrix A, the pseudoinverse A+ is an (m x n) matrix
Moore-Penrose Pseudoinverse
If the equation has:
Exactly one solution: the pseudoinverse acts the same as the inverse
No solution: it gives the x with the smallest error ||Ax − y||_2 (over-specified case)
Many solutions: it gives the solution x with the smallest norm ||x||_2 (under-specified case)
Pseudoinverse: Over-specified case
No solution: the pseudoinverse gives the x with the smallest error ||Ax − y||_2
[3 2]^T [x] = [1 2]^T (i.e., 3x = 1, 2x = 2) has no exact solution
x = ([3 2]^T)^+ [1 2]^T = ... = 7/13
This method is called 'least squares'
(figure: the line of reachable points (3x, 2x) and the desired point y = (1, 2))
Pseudoinverse: Under-specified case
Many solutions: the pseudoinverse gives the solution with the smallest norm ||x||_2
[1 2] [w z]^T = 4 (i.e., w + 2z = 4) has infinitely many solutions
(figure: the line of all solutions (w, z) and the shortest-length solution)
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: A^+ = V Λ^+ U^T, where Λ^+ is obtained by taking the reciprocal of each non-zero singular value and transposing the resulting matrix
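A NumPy sketch (added) computing the pseudoinverse via the SVD and applying it to the two slide examples:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse A^+ = V Lambda^+ U^T computed from the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_plus = np.where(s > tol, 1.0 / s, 0.0)      # invert only non-zero singular values
    return Vt.T @ np.diag(s_plus) @ U.T

# Over-specified: 3x = 1, 2x = 2 has no exact solution; least-squares answer is 7/13
A = np.array([[3.], [2.]])
y = np.array([1., 2.])
print(pinv_via_svd(A) @ y)                 # [0.538...] = 7/13

# Under-specified: w + 2z = 4 has many solutions; the minimum-norm one is chosen
B = np.array([[1., 2.]])
print(pinv_via_svd(B) @ np.array([4.]))    # [0.8, 1.6]

assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```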
Trace
Tr(A) = Σ_i A_{i,i}
Tr(ABC) = Tr(CAB) = Tr(BCA) (invariance under cyclic permutation)
Tr(A) = Tr(A^T)
||A||_F = sqrt(Tr(A A^T))
For a scalar a, Tr(a) = a
The trace of a matrix is also computed by the sum of all eigenvalues
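A quick NumPy verification (added) of the trace identities:

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))

assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))   # cyclic permutation
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))
```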
What you need to know
Linear algebra is crucial for understanding the notation and mechanisms of many ML/deep learning methods
Important concepts
Matrix product, identity, invertibility, norm, eigendecomposition, singular value decomposition, pseudoinverse, and trace
Questions?