Linear Algebra for Computer Science
Lecture 23
Introduction to Machine Learning
and learning from data
Machine Learning
input → Model → output

Classification
input image → Classifier → Apple
input image → Classifier → Orange

Object Detection
input image → Detector → detected objects
Speech Recognition
audio input → Model → "Once upon a time, ..." (transcribed text)
Segmentation
input image → Model → segmentation map

Stock Market Prediction
historical data → Predictor → predicted prices
Learning from data
https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/
Supervised Learning
http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/
Training data: (X1, y1), (X2, y2), (X3, y3), …, (XN, yN)
Training data (labels as class names): Apple, Apple, Orange, …, Orange
Training data (labels as numbers): 0, 0, 1, …, 1
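(Side note, not from the slides: a minimal Python sketch of mapping the class names above to the numeric labels 0/1; the lists are made up.)

# Minimal sketch: encoding class names as integer labels.
class_names = ["Apple", "Apple", "Orange", "Orange"]   # hypothetical label list
label_of = {"Apple": 0, "Orange": 1}                    # class name -> class index
y = [label_of[name] for name in class_names]
print(y)   # [0, 0, 1, 1]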
Supervised Learning
input → Classifier/Regressor → output
Classification
input features → Classifier → y ∈ {Class1, Class2, …, Classn}
example: input image → Classifier → Apple
example: input image → Classifier → Orange
Regression
input features → Regressor → y ∈ R
input features → Regressor → y ∈ Rn
Learnable Models
input → Classifier/Regressor → output
Learnable Models: Example
input image → Classifier → 0
input image → Classifier → 1

Learnable Models: Input-output map
x ∈ Rm → f → y ∈ Rn
y = f(x),   f: Rm → Rn
Learnable Models: Example
image I,  x = I.flatten() → f → y = 0
y = f(x),   f: Rm → Rn

Learnable Models: Example
image I,  x = features(I) → f → y = 0
y = f(x),   f: Rm → Rn
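(Side note: `I.flatten()` on the slide suggests a NumPy-style image array. Below is a minimal sketch with a made-up 28×28 image and a hypothetical `features` function standing in for hand-crafted features.)

import numpy as np

# Sketch, assuming I is a NumPy image array (e.g. 28x28 grayscale).
I = np.zeros((28, 28))          # placeholder image
x = I.flatten()                 # x in R^m with m = 28*28 = 784

# A hypothetical hand-crafted alternative to raw pixels:
def features(I):
    # a few simple summary statistics of the image
    return np.array([I.mean(), I.std(), I.max(), I.min()])

x_feat = features(I)            # x in R^4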
Learnable Models: parameters
x ∈ Rm → fθ → y ∈ Rn
y = f(θ, x)
θ: model parameters
Learnable Models: parameters
x ∈ Rm → fθ → y = f(θ, x)
● Parameter Learning:
○ Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),
○ choose θ such that y = f(θ, x) is a reasonable output for any input x.
Learning from data
x ∈ Rm → fθ → y = f(θ, x)
● Parameter Learning:
○ Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),
○ choose θ such that y = f(θ, x) is a reasonable output
■ for the training data (x1, y1), (x2, y2), …, (xN, yN)
■ for unseen data (generalization)
Learning from data
x ∈ Rm → fθ → y = f(θ, x)
● Training data (x1, y1), (x2, y2), …, (xN, yN)
○ choose θ such that f(θ, xi) is close to yi
Learning from data: Cost function
x ∈ Rm → fθ → y = f(θ, x)
● Training data (x1, y1), (x2, y2), …, (xN, yN)
○ choose θ such that f(θ, xi) is close to yi
○ cost function:
C(θ) = 𝚺i=1..N d( f(θ, xi), yi )
■ yi: data output
■ f(θ, xi): model output given xi
■ d: distance
Learning from data: Cost function
x ∈ Rm → fθ → y = f(θ, x)
● Training data (x1, y1), (x2, y2), …, (xN, yN)
○ choose θ such that f(θ, xi) is close to yi
○ cost function (with squared distance):
C(θ) = 𝚺i=1..N ǁ f(θ, xi) - yi ǁ2
Learning from data: Cost function
x ∈ Rm → fθ → y = f(θ, x)
● Training data (x1, y1), (x2, y2), …, (xN, yN)
○ choose θ such that f(θ, xi) is close to yi
○ cost function:
C(θ) = 𝚺i=1..N d( f(θ, xi), yi )
○ choose θ such that the cost function C(θ) is small:
θ* = argminθ C(θ)
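(Side note: a minimal Python sketch of evaluating C(θ) with the squared distance as d; the toy model f and the data points are made up for illustration.)

import numpy as np

def cost(theta, f, xs, ys):
    # C(theta) = sum_i || f(theta, x_i) - y_i ||^2  (squared distance as d)
    return sum(np.sum((f(theta, x) - y) ** 2) for x, y in zip(xs, ys))

# Example with a toy 1-D model f(theta, x) = theta * x:
f = lambda theta, x: theta * x
xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.1, 3.9, 6.2])
print(cost(2.0, f, xs, ys))   # small cost: theta = 2 fits this data well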
Example: Linear Regression
x ∈ Rm → fθ → y = A x + b ∈ Rn
A: n by m matrix
b: n-D vector
y = f(θ, x),   θ = (A, b)
Example: Linear Regression
Maps of the form x ↦ A x + b are affine maps.
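(Side note: a minimal NumPy sketch of the affine model y = A x + b with θ = (A, b); the dimensions m = 3, n = 2 are just illustrative.)

import numpy as np

m, n = 3, 2                       # illustrative input/output dimensions
A = np.random.randn(n, m)         # n-by-m matrix
b = np.random.randn(n)            # n-D vector
theta = (A, b)                    # model parameters

def f(theta, x):
    A, b = theta
    return A @ x + b              # affine map R^m -> R^n

x = np.random.randn(m)
y = f(theta, x)                   # y in R^n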
Example: Linear Regression (scalar case)
x ∈ R → fθ → y = a x + b ∈ R
y = f(θ, x),   θ = (a, b)
Example: Linear Regression
x ∈ R → fθ → y = a x + b ∈ R
● Training data (x1, y1), (x2, y2), …, (xN, yN)
● cost function:
C(θ) = 𝚺i=1..N d( f(θ, xi), yi )
C(a,b) = 𝚺i=1..N d( f(a, b, xi), yi ) = 𝚺i=1..N d( a xi + b, yi )
● cost function (sum of squared errors):
C(a,b) = 𝚺i=1..N ( a xi + b - yi )2
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
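(Side note: a sketch that evaluates the sum-of-squared-errors cost C(a, b) on a few made-up data points.)

import numpy as np

# Made-up training data (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.8, 5.1, 7.1])

def C(a, b):
    # Sum of squared errors over the training data.
    return np.sum((a * x + b - y) ** 2)

print(C(2.0, 1.0))   # cost of the guess a = 2, b = 1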
Example: Linear Regression
cost function (sum of squared errors):
C(a,b) = 𝚺i=1..N ( a xi + b - yi )2
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
How to find a*, b*?
Solution 1: Least squares
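(Side note: the least-squares slides are figures, so this is a sketch of the standard formulation: stack each xi with a constant 1 into a matrix X so that X [a, b]ᵀ ≈ y, then solve with np.linalg.lstsq.)

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.8, 5.1, 7.1])

# Design matrix with a column of ones so that X @ [a, b] = a*x + b.
X = np.column_stack([x, np.ones_like(x)])
(a_star, b_star), *_ = np.linalg.lstsq(X, y, rcond=None)
print(a_star, b_star)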
Example: Linear Regression
cost function (sum of squared errors):
C(a,b) = 𝚺i=1..N ( a xi + b - yi )2
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
How to find a*, b*?
Solution 2: partial derivatives
Solution 2: partial derivatives
cost function (sum of squared errors):
C(a,b) = 𝚺i=1..N ( a xi + b - yi )2
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
∂ C(a,b) / ∂ a = 2 𝚺i=1..N xi ( a xi + b - yi ) = 0
∂ C(a,b) / ∂ b = 2 𝚺i=1..N ( a xi + b - yi ) = 0
equivalently:
𝚺i=1..N xi ( a xi + b - yi ) = 0,   𝚺i=1..N ( a xi + b - yi ) = 0
Solution 2: partial derivatives
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
𝚺i=1..N xi ( a xi + b - yi ) = a 𝚺i=1..N xi2 + b 𝚺i=1..N xi - 𝚺i=1..N xi yi = 0
𝚺i=1..N ( a xi + b - yi ) = a 𝚺i=1..N xi + b N - 𝚺i=1..N yi = 0
This is a system of linear equations in a and b:
( 𝚺i=1..N xi2 ) a + ( 𝚺i=1..N xi ) b = 𝚺i=1..N xi yi
( 𝚺i=1..N xi ) a + N b = 𝚺i=1..N yi
a*, b* ⇐ solve the system of linear equations
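(Side note: a sketch of Solution 2 on the same made-up data: build the 2-by-2 system above and solve it with np.linalg.solve; it returns the same a*, b* as least squares.)

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.8, 5.1, 7.1])
N = len(x)

# (sum x_i^2) a + (sum x_i) b = sum x_i y_i
# (sum x_i)   a +     N     b = sum y_i
M = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    N        ]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a_star, b_star = np.linalg.solve(M, rhs)
print(a_star, b_star)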
Example: Linear Regression
a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
a*, b* ⇐ solve the system of linear equations
x → fa*,b* → y = a* x + b*
Evaluation
x → fθ* → y = a* x + b*
● Find good parameters θ
○ θ* = argminθ 𝚺i=1..N ( f(θ, xi) - yi )2
○ or another method
● How good is θ*?
● How well does the regressor work?
● Given training data (x1, y1), (x2, y2), …, (xN, yN):
Error = C(θ*) = 𝚺i=1..N ( f(θ*, xi) - yi )2
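(Side note: a sketch of the training error C(θ*) for the fitted line, continuing the made-up data from above.)

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.8, 5.1, 7.1])

# a_star, b_star from either solution above (recomputed here for self-containment).
X = np.column_stack([x, np.ones_like(x)])
(a_star, b_star), *_ = np.linalg.lstsq(X, y, rcond=None)

train_error = np.sum((a_star * x + b_star - y) ** 2)   # C(theta*)
print(train_error)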
Learning from data
x ∈ Rm → fθ → y = f(θ, x)
● Parameter Learning:
○ Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),
○ choose θ such that y = f(θ, x) is a reasonable output
■ for the training data (x1, y1), (x2, y2), …, (xN, yN)
■ for unseen data
○ Generalization: how well the model works on unseen data
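(Side note: a sketch of estimating generalization by holding out part of the data as an unseen test set; the data and the 80/20 split are made up.)

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)   # noisy line, made up

# Hold out 20% of the pairs as unseen (test) data.
idx = rng.permutation(len(x))
train, test = idx[:40], idx[40:]

# Fit on the training split only.
X_train = np.column_stack([x[train], np.ones(len(train))])
(a_star, b_star), *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

train_err = np.sum((a_star * x[train] + b_star - y[train]) ** 2)
test_err = np.sum((a_star * x[test] + b_star - y[test]) ** 2)
print(train_err, test_err)   # the test error estimates generalization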