
Large Scale Data Analysis Using Deep Learning


(1)


Large Scale Data Analysis Using Deep Learning

Linear Algebra

U Kang

Seoul National University

(2)

In This Lecture

Overview of linear algebra (but not a comprehensive survey)

Focused on the subset most relevant to deep learning

For details on linear algebra, refer to Linear Algebra and Its Applications by Gilbert Strang

(3)


Scalar

A scalar is a single number

Integers, real numbers, rational numbers, etc.

We denote it with italic font

a, n, x

(4)

Vectors

A vector is a 1-D array of numbers

Can be real, binary, integer, etc.

Notation for type and size: e.g., a vector of n real numbers is written x ∈ ℝⁿ

(5)


Matrices

A matrix is a 2-D array of numbers

Example notation for type and shape: A ∈ ℝᵐˣⁿ is a real-valued matrix with m rows and n columns

(6)

Tensors

A tensor is an array of numbers that may have:

Zero dimensions, and be a scalar

One dimension, and be a vector

Two dimensions, and be a matrix

Three dimensions or more

Matrix over time

Knowledge base (subject, verb, object)

(7)


Matrix Transpose

(Aᵀ)ᵢ,ⱼ = Aⱼ,ᵢ

(AB)ᵀ = BᵀAᵀ

(8)

Matrix Product

C = AB

(9)


Matrix Product

Matrix product as sum of outer product

Product with a diagonal matrix
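A minimal numpy sketch (an illustration, not part of the original slides) of the two facts above: AB as a sum of outer products of A's columns with B's rows, and left-multiplication by a diagonal matrix as row-wise scaling.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# C = AB equals the sum of outer products of A's columns with B's rows
outer_sum = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))
assert np.allclose(A @ B, outer_sum)

# Left-multiplying by diag(v) scales the i-th row of B by v[i]
v = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(np.diag(v) @ B, v[:, None] * B)
```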

(10)

Identity Matrix

Example identity matrix: I₃ = [1 0 0; 0 1 0; 0 0 1]

(11)


Systems of Equations

Ax = b expands to the scalar equations Aᵢ,₁x₁ + Aᵢ,₂x₂ + … + Aᵢ,ₙxₙ = bᵢ, one for each row i

(12)

Solving Systems of Equations

A linear system of equations can have:

No solution

3x + 2y = 6, 3x + 2y = 12

Many solutions

3x + 2y = 6, 6x + 4y = 12

Exactly one solution: this means multiplication by the matrix is an invertible function

Ax = b  ⇒  x = A⁻¹b
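A minimal numpy sketch (illustrative only; the matrix below is an assumed example, not from the slides) of the exactly-one-solution case, where x = A⁻¹b.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [1.0, -1.0]])    # invertible 2 x 2 matrix (assumed example)
b = np.array([6.0, 1.0])

x = np.linalg.solve(A, b)      # solves Ax = b without forming A^{-1} explicitly
assert np.allclose(A @ x, b)
assert np.allclose(x, np.linalg.inv(A) @ b)
```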

(13)


Matrix Inversion

Matrix inverse: an inverse A⁻¹ of an n × n matrix A satisfies

A A⁻¹ = A⁻¹ A = Iₙ

Solving a system using an inverse

(14)

Invertibility

Matrix A cannot be inverted if

More rows than columns

More columns than rows

Redundant rows/columns (“linearly dependent”, “low rank”, i.e., not full rank)

The number 0 is an eigenvalue of A

A non-invertible matrix is called a singular matrix

An invertible matrix is called non-singular

(15)


Linear Dependence and Span

Linear combination of vectors {v⁽¹⁾, …, v⁽ⁿ⁾}: ∑ᵢ cᵢ v⁽ⁱ⁾

The span of a set of vectors is the set of all points

obtainable by linear combination of the original vectors

Matrix-vector product Ax can be viewed as a linear combination of the column vectors of A: Ax = ∑ᵢ xᵢ A:,ᵢ

The span of columns of A is called column space or range of A

A set of vectors is linearly independent if no vector in the set is a linear combination of the other vectors

Otherwise, the set is linearly dependent
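A quick numpy check (illustrative only) that Ax is a linear combination of the columns of A.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
x = np.array([2.0, -1.0])

# Ax = x_1 * A[:, 0] + x_2 * A[:, 1]
assert np.allclose(A @ x, x[0] * A[:, 0] + x[1] * A[:, 1])
```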

(16)

Rank of a Matrix

Rank of a matrix A: Number of linearly independent columns (or rows) of A

For example:

What is the rank of the example matrix A? Why?
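Since the slide's own example matrix is not reproduced here, a hypothetical matrix illustrates the idea: a redundant (linearly dependent) row lowers the rank.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 x the first row -> linearly dependent
              [1.0, 0.0, 1.0]])

print(np.linalg.matrix_rank(A))  # 2, not 3, because of the redundant row
```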

(17)


Norms

Functions that measure how “large” a vector is

Similar to a distance between zero and the point represented by the vector

Formally, a norm is any function f that satisfies: f(x) = 0 ⇒ x = 0; f(x + y) ≤ f(x) + f(y) (the triangle inequality); and f(αx) = |α| f(x) for any scalar α

(18)

Norms

Lp norm: ||x||ₚ = (∑ᵢ |xᵢ|ᵖ)^(1/p)

Most popular norm: L2 norm, p = 2

Euclidean distance: ||x||₂ = √(∑ᵢ xᵢ²)

For a matrix/tensor, the Frobenius norm (L_F) plays the same role

L1 norm, p = 1: ||x||₁ = ∑ᵢ |xᵢ|

Called the ‘Manhattan’ distance

Max norm, infinite p: ||x||∞ = maxᵢ |xᵢ|
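A minimal numpy sketch (illustrative only) of the norms listed above.

```python
import numpy as np

x = np.array([3.0, -4.0])
print(np.linalg.norm(x, 2))       # L2 (Euclidean) norm: 5.0
print(np.linalg.norm(x, 1))       # L1 ('Manhattan') norm: 7.0
print(np.linalg.norm(x, np.inf))  # max norm: 4.0

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.norm(A, 'fro'))   # Frobenius norm for a matrix
```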

(19)


Special Matrices and Vectors

Diagonal matrix

2 × 2 example: a diagonal matrix with diagonal entries a and b, i.e., [a 0; 0 b]

We use diag(v) to denote the diagonal matrix where v is the vector containing the diagonal elements

Inverse of a diagonal matrix is computed easily: diag(v)⁻¹ = diag([1/v₁, …, 1/vₙ])
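A small numpy check (illustrative only) that the inverse of diag(v) is the diagonal matrix of elementwise reciprocals.

```python
import numpy as np

v = np.array([2.0, 4.0, 5.0])
D = np.diag(v)
assert np.allclose(np.linalg.inv(D), np.diag(1.0 / v))
```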

(20)

Special Matrices and Vectors

Unit vector: a vector with unit norm, ||x||₂ = 1

Symmetric matrix: A = Aᵀ

Orthogonal matrix: a square matrix whose rows and columns are orthonormal, so AᵀA = AAᵀ = I and A⁻¹ = Aᵀ

(21)


Eigenvector and Eigenvalue

Eigenvector and eigenvalue of A: a nonzero vector v and a scalar λ such that Av = λv

Eigenvalue/eigenvector pairs are defined only for a square matrix A ∈ ℝⁿˣⁿ

# of eigenvalues: n

There may be duplicates

Eigenvalues and eigenvectors can contain real or complex numbers

(22)

Intuition

A as a vector transformation: for A = [2 1; 1 3] and x = [1 0]ᵀ, the product Ax = [2 1]ᵀ = x′, i.e., A maps x to a new vector x′

(23)


Intuition

By definition, eigenvectors remain parallel to themselves (‘fixed points’): for A = [2 1; 1 3] and eigenvector v₁ = [0.52 0.85]ᵀ, A v₁ = 3.62 · [0.52 0.85]ᵀ = λ₁ v₁ with λ₁ = 3.62
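A numpy check of the example above (the slide rounds the values to 0.52, 0.85, and 3.62).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
vals, vecs = np.linalg.eig(A)

i = np.argmax(vals)              # index of the largest eigenvalue
print(vals[i])                   # ~3.618 (slide: 3.62)
print(vecs[:, i])                # ~[0.526, 0.851] up to sign (slide: [0.52, 0.85])
assert np.allclose(A @ vecs[:, i], vals[i] * vecs[:, i])
```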

(24)

Eigendecomposition

Eigendecomposition of a matrix

Let A be a square (n × n) matrix with n linearly independent eigenvectors (= diagonalizable)

Then A can be factorized as A = V diag(λ) V⁻¹, where V is an (n × n) matrix whose i-th column is the i-th eigenvector of A, and λ is the vector of the corresponding eigenvalues

(25)


Eigendecomposition

Every real symmetric matrix has a real, orthogonal eigendecomposition

A real orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors (orthonormal vectors)

QᵀQ = QQᵀ = I

Qᵀ = Q⁻¹
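A minimal numpy sketch (illustrative only): for a real symmetric matrix, the eigendecomposition A = Q diag(λ) Qᵀ holds with an orthogonal Q.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])               # real symmetric
lam, Q = np.linalg.eigh(A)               # symmetric eigendecomposition

assert np.allclose(Q @ np.diag(lam) @ Q.T, A)
assert np.allclose(Q.T @ Q, np.eye(2))   # Q^T Q = I, so Q^T = Q^{-1}
```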

(26)

Eigendecomposition

Interpreting matrix-vector multiplication Ax using eigendecomposition

(27)


Eigendecomposition

Understanding optimization of f(x) = xᵀAx subject to ||x||₂ = 1 using eigenvalues and eigenvectors: for a symmetric A, the maximum of f over unit vectors is the largest eigenvalue (attained at the corresponding eigenvector), and the minimum is the smallest eigenvalue
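A small numerical check (illustrative only) of that claim for a symmetric A.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, Q = np.linalg.eigh(A)       # eigenvalues returned in ascending order

v_max = Q[:, -1]                 # unit-norm eigenvector of the largest eigenvalue
assert np.isclose(v_max @ A @ v_max, lam[-1])   # f attains the largest eigenvalue
```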

(28)

Eigendecomposition

Positive definite matrix: a real symmetric matrix whose eigenvalues are all positive

Positive semidefinite matrix: all eigenvalues are ≥ 0

Negative definite matrix: all eigenvalues are negative

Negative semidefinite matrix: all eigenvalues are ≤ 0

For a positive semidefinite matrix A, ∀x, xᵀAx ≥ 0.

For a positive definite matrix A, ∀x ≠ 0, xᵀAx > 0.
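A quick numerical sanity check (illustrative only, not a proof) that a symmetric matrix with positive eigenvalues gives xᵀAx > 0 for random nonzero x.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])       # eigenvalues ~1.38 and ~3.62, both positive
rng = np.random.default_rng(0)

for _ in range(1000):
    x = rng.standard_normal(2)   # nonzero with probability 1
    assert x @ A @ x > 0
```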

(29)


Singular Value Decomposition (SVD)

Similar to eigendecomposition

More general; matrix need not be square

A = U Λ Vᵀ
(30)

SVD - Definition

A [n × m] = U [n × r] Λ [r × r] (V [m × r])ᵀ

A: n × m matrix (e.g., n documents, m terms)

U: n × r matrix (n documents, r concepts); its columns are the left singular vectors

Λ: r × r diagonal matrix (strength of each ‘concept’; r is the rank of the matrix)

V: m × r matrix (m terms, r concepts); its columns are the right singular vectors

(31)


SVD - Definition

[Figure: shapes in the decomposition A = U Λ Vᵀ, where A is n × m, U is n × r, Λ is r × r, and Vᵀ is r × m]

(32)

SVD - Properties

THEOREM [Press+92]: it is always possible to decompose matrix A into A = U Λ Vᵀ, where

U, Λ, V: unique (*)

U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)

UᵀU = I; VᵀV = I (I: identity matrix)

Λ: singular values are positive, and sorted in decreasing order
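A minimal numpy sketch (illustrative only) checking these properties on a random matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))                    # n x m, need not be square

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # U: 5x3, s: (3,), Vt: 3x3
assert np.allclose(U @ np.diag(s) @ Vt, A)         # A = U Lambda V^T
assert np.allclose(U.T @ U, np.eye(3))             # U column-orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3))           # V column-orthonormal
assert np.all(s[:-1] >= s[1:]) and np.all(s >= 0)  # sorted, non-negative
```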

(33)


SVD - Properties

A ≈ U Λ Vᵀ = ∑ᵢ σᵢ uᵢ vᵢᵀ

Best rank-k approximation in L_F (Frobenius norm): keep only the k largest singular values σᵢ and the corresponding singular vectors uᵢ, vᵢ
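A sketch (illustrative only) of the rank-k truncation; the Frobenius-norm error equals the square root of the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]          # best rank-k approximation
err = np.linalg.norm(A - A_k, 'fro')
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))  # error from discarded sigma_i
```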

(34)

SVD and eigendecomposition

The left-singular vectors of A are the eigenvectors of AAᵀ

The right-singular vectors of A are the eigenvectors of AᵀA

The non-zero singular values of A are the square roots of the eigenvalues of AᵀA
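A numerical check (illustrative only) of the relation between the singular values of A and the eigenvalues of AᵀA.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

eig = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, descending
assert np.allclose(s ** 2, eig)                   # sigma_i = sqrt(eigenvalue_i)
```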

(35)


Moore-Penrose Pseudoinverse

Assume we want to solve A x = y for x

What if A is not invertible?

What if A is not square?

Still, we can find the ‘best’ x by using the pseudoinverse

For an (n × m) matrix A, the pseudoinverse A⁺ is an (m × n) matrix

(36)

Moore-Penrose Pseudoinverse

If the equation has:

Exactly one solution: the pseudoinverse gives the same result as the inverse

No solution (the over-specified case): the pseudoinverse gives the x with the smallest error ||Ax − y||₂

(37)


Pseudoinverse: Over-specified case

No solution: the pseudoinverse gives the x with the smallest error ||Ax − y||₂

[3 2]ᵀ [x] = [1 2]ᵀ (i.e., 3x = 1, 2x = 2)

x = ([3 2]ᵀ)⁺ [1 2]ᵀ = … = 7/13

This method is called ‘least squares’

[Figure: the reachable points (3x, 2x) form a line; the desired point y = (1, 2) lies off that line]

(38)

Pseudoinverse: Under-specified case

Many solutions: this gives us the solution with the smallest norm of x

[1 2] [w z]ᵀ = 4 (i.e., w + 2z = 4)

[Figure: the line w + 2z = 4 of all possible solutions; the pseudoinverse returns the shortest-length (minimum-norm) solution]
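A numpy check (illustrative only) of both cases above using np.linalg.pinv.

```python
import numpy as np

# Over-specified: 3x = 1 and 2x = 2 have no common solution; least squares gives 7/13
A = np.array([[3.0], [2.0]])
y = np.array([1.0, 2.0])
x = np.linalg.pinv(A) @ y
assert np.isclose(x[0], 7.0 / 13.0)

# Under-specified: w + 2z = 4 has many solutions; pinv returns the minimum-norm one
B = np.array([[1.0, 2.0]])
x2 = np.linalg.pinv(B) @ np.array([4.0])
assert np.allclose(B @ x2, 4.0)
print(x2)                        # [0.8, 1.6], the solution closest to the origin
```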

(39)


Computing the Pseudoinverse

The SVD allows the computation of the pseudoinverse: A⁺ = V Λ⁺ Uᵀ, where Λ⁺ replaces each nonzero singular value σᵢ on the diagonal of Λ with 1/σᵢ
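A sketch (illustrative only) of computing A⁺ from the SVD, assuming all kept singular values are nonzero.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # full column rank with probability 1

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T   # A^+ = V Lambda^+ U^T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```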

(40)

Trace

Tr(A) = ∑ᵢ Aᵢ,ᵢ

Cyclic property: Tr(ABC) = Tr(CAB) = Tr(BCA)

Tr(A) = Tr(Aᵀ)

||A||_F = √(Tr(AAᵀ))

For a scalar a, Tr(a) = a

The trace of a matrix also equals the sum of its eigenvalues
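A small numpy check (illustrative only) of the trace identities above.

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))        # cyclic property
assert np.isclose(np.trace(A), np.trace(A.T))
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T)))
assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)  # sum of eigenvalues
```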

(41)


What you need to know

Linear algebra is crucial for understanding notations and mechanisms of many ML/deep learning methods

Important concepts

Matrix product, identity, invertibility, norm, eigendecomposition, singular value decomposition, pseudoinverse, and trace

(42)

Questions?
