Linear Algebra - Lecture 23


(1)

Linear Algebra for Computer Science

Lecture 23

Introduction to Machine Learning

and learning from data

(2)

Machine Learning

input → Model → output

(3)

Classification

input image → Classifier → "Apple"

(4)

Classification

input image → Classifier → "Orange"

(5)

Object detection

Detector

(6)

Speech Recognition

audio → Model → "یکی بود یکی نبود" (Persian: "once upon a time")

(7)

Segmentation

Model

(8)

Stock Market Prediction

Predictor

(9)

Learning from data

https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/

(10)

Supervised Learning

http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/

(11)

Supervised Learning

http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/

Training data:

(x1, y1), (x2, y2), (x3, y3), …, (xN, yN)

(12)

Supervised Learning

http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/

Training data:

(example images labeled: Apple, Apple, Orange, Orange)

(13)

Supervised Learning

http://seansoleyman.com/effect-of-dataset-size-on-image-classification-accuracy/

Training data:

(the same images with numeric labels: 0, 0, 1, 1; Apple → 0, Orange → 1)
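As a concrete sketch of how such a labeled training set might be stored in code (a hypothetical toy example using NumPy; the feature values are made up for illustration):

import numpy as np

# Hypothetical training set: each row of X is one feature vector xi,
# y holds the labels with the slide's encoding Apple -> 0, Orange -> 1.
X = np.array([[0.9, 0.1],   # x1: an apple
              [0.8, 0.2],   # x2: another apple
              [0.2, 0.9],   # x3: an orange
              [0.1, 0.8]])  # x4: another orange
y = np.array([0, 0, 1, 1])  # y1, y2, y3, y4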

(14)

Supervised Learning

input → Classifier/Regressor → output

(15)

Classification

input features → Classifier → y ∈ {Class1, Class2, …, Classn}

(16)

Classification

input image → Classifier → "Apple"

(17)

Classification

input image → Classifier → "Orange"

(18)

Regression

input features → Regressor → y ∈ R

(19)

Regression

input features → Regressor → y ∈ Rn

(20)

Regression

(21)

Learnable Models

input → Classifier/Regressor → output

(22)

Learnable Models: Example

input image → Classifier → 0

(23)

Learnable Models: Example

input image → Classifier → 1

(24)

Learnable Models: Input-output map

x ∈ Rm → f → y ∈ Rn

y = f(x),    f: Rm → Rn

(25)

Learnable Models: Example

x = I.flatten() → f → y = 0

y = f(x),    f: Rm → Rn    (I: input image)
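A minimal sketch of this flattening step, assuming NumPy (the image size is arbitrary):

import numpy as np

I = np.zeros((28, 28))   # a hypothetical 28 x 28 grayscale image
x = I.flatten()          # x in Rm with m = 28 * 28 = 784
print(x.shape)           # (784,)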

(26)

Learnable Models: Example

x = features(I) → f → y = 0

y = f(x),    f: Rm → Rn    (I: input image)

(27)

Learnable Models: parameters

x ∈ Rm → fθ → y ∈ Rn

y = f(θ, x)

θ: model parameters

(28)

Learnable Models: parameters

(29)

Learnable Models: parameters

x ∈ Rm → fθ → y = f(θ, x)

Parameter Learning:

Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),

choose θ such that y = f(θ, x) is a reasonable output for any input x.

(30)

Learning from data

x ∈ Rm → fθ → y = f(θ, x)

Parameter Learning:

Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),

choose θ such that y = f(θ, x) is a reasonable output

for training data (x1, y1), (x2, y2), …, (xN, yN)

for unseen data (generalization)

(31)

Learning from data

x ∈ Rm → fθ → y = f(θ, x)

Parameter Learning:

Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),

choose θ such that y = f(θ, x) is a reasonable output

for training data (x1, y1), (x2, y2), …, (xN, yN)

for unseen data (generalization)

(32)

Learning from data

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

(33)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

Cost function

(34)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )

(35)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )    (yi: data output)

(36)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )    (f(θ, xi): model output given xi)

(37)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )    (d: distance)

(38)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N ǁ f(θ, xi) - yi ǁ2    (distance: squared norm)

(39)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )

(40)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )    choose θ such that C(θ) is small

(41)

Learning from data: Cost function

x ∈ Rm → fθ → y = f(θ, x)

Training data (x1, y1), (x2, y2), …, (xN, yN)

choose θ such that f(θ, xi) is close to yi

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )    θ* = argminθ C(θ)
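A sketch of this cost in code, for the squared-distance choice d(u, v) = ǁ u - v ǁ2 and an arbitrary model f (assuming NumPy; f, theta, xs, ys are placeholders for the model and training data):

import numpy as np

def cost(f, theta, xs, ys):
    # C(theta) = sum over i of || f(theta, xi) - yi ||^2
    return sum(np.sum((f(theta, xi) - yi) ** 2) for xi, yi in zip(xs, ys))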

(42)

Cost function

(43)

Example: Linear Regression

x ∈ Rm → fθ → y = A x + b ∈ Rn

(44)

Example: Linear Regression

x ∈ Rm → fθ → y = A x + b ∈ Rn

A: ? by ? matrix

b: ?-D vector

(45)

Example: Linear Regression

x ∈ Rm → fθ → y = A x + b ∈ Rn

A: n by m matrix

b: n-D vector

(46)

Example: Linear Regression

x ∈ Rm → fθ → y = A x + b ∈ Rn

y = f(θ, x),    θ = ?

(47)

Example: Linear Regression

x ∈ Rm → fθ → y = A x + b ∈ Rn

y = f(θ, x),    θ = (A, b)
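As a sketch, the affine model with θ = (A, b) is a few lines of NumPy (sizes follow the slide: A is n by m, b is n-dimensional; the numeric values are illustrative):

import numpy as np

def f(theta, x):
    A, b = theta          # A: n x m matrix, b: n-vector
    return A @ x + b      # y = A x + b, in Rn

m, n = 3, 2
theta = (np.ones((n, m)), np.zeros(n))    # some parameter values
y = f(theta, np.array([1.0, 2.0, 3.0]))   # y in R2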

(48)

Example: Linear Regression

(49)

Affine maps

(50)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

y = f(θ, x),    θ = ?

(51)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

y = f(θ, x),    θ = (a, b)

(52)

Example: Linear Regression

(53)

Example: Linear Regression

Training data (x1, y1), (x2, y2), …, (xN, yN)

(54)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

(55)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

cost function:

C(θ) = 𝚺i=1..N d( f(θ, xi), yi )

(56)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

cost function:

C(a,b) = 𝚺i=1..N d( f(a, b, xi), yi )

(57)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

cost function:

C(a,b) = 𝚺i=1..N d( f(a, b, xi), yi ) = 𝚺i=1..N d( a xi + b, yi )

(58)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

(59)

Example: Linear Regression

x ∈ R → fθ → y = a x + b ∈ R

Training data (x1, y1), (x2, y2), …, (xN, yN)

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2
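Written directly in code, the sum of squared errors is (a sketch assuming NumPy arrays xs, ys holding the training inputs and outputs):

import numpy as np

def C(a, b, xs, ys):
    # sum over i of (a * xi + b - yi)^2
    return np.sum((a * xs + b - ys) ** 2)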

(60)

Example: Linear Regression

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

How to find a*, b*?

(61)

Solution 1: Least squares

(62)

Solution 1: Least squares

(63)

Solution 1: Least squares
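One standard way to realize the least-squares route in code (a sketch, assuming NumPy; the data values are made up): stack each xi with a 1 so the model reads M θ with rows (xi, 1), then hand the problem to a standard least-squares solver.

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical training inputs
ys = np.array([1.1, 2.9, 5.2, 6.8])   # hypothetical training outputs

M = np.column_stack([xs, np.ones_like(xs)])        # row i: (xi, 1)
(a_star, b_star), *_ = np.linalg.lstsq(M, ys, rcond=None)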

(64)

Example: Linear Regression

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

How to find a*, b*?

(65)

Solution 2: partial derivatives

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

∂ C(a,b) / ∂ a = 0

∂ C(a,b) / ∂ b = 0

(66)

Solution 2: partial derivatives

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

∂ C(a,b) / ∂ a = 2 𝚺i=1..N xi ( a xi + b - yi ) = 0

∂ C(a,b) / ∂ b = 2 𝚺i=1..N ( a xi + b - yi ) = 0

(67)

Solution 2: partial derivatives

cost function (sum of squared errors):

C(a,b) = 𝚺i=1..N ( a xi + b - yi )2

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

𝚺i=1..N xi ( a xi + b - yi ) = 0    𝚺i=1..N ( a xi + b - yi ) = 0

(68)

Solution 2: partial derivatives

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

𝚺i=1..N xi ( a xi + b - yi ) = a 𝚺i=1..N xi2 + b 𝚺i=1..N xi - 𝚺i=1..N xi yi = 0

𝚺i=1..N ( a xi + b - yi ) = a 𝚺i=1..N xi + b N - 𝚺i=1..N yi = 0

(69)

Solution 2: partial derivatives

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

( 𝚺i=1..N xi2 ) a + ( 𝚺i=1..N xi ) b = 𝚺i=1..N xi yi

( 𝚺i=1..N xi ) a + N b = 𝚺i=1..N yi
(70)

Solution 2: partial derivatives

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

( 𝚺i=1..N xi2 ) a + ( 𝚺i=1..N xi ) b = 𝚺i=1..N xi yi

( 𝚺i=1..N xi ) a + N b = 𝚺i=1..N yi

a*, b* solve this system of linear equations
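Because the system is linear in (a, b), it can be solved directly. A minimal sketch assuming NumPy, with xs and ys as arrays of training data:

import numpy as np

def fit_line(xs, ys):
    N = len(xs)
    # coefficient matrix and right-hand side of the system above
    S = np.array([[np.sum(xs ** 2), np.sum(xs)],
                  [np.sum(xs),      N]])
    t = np.array([np.sum(xs * ys), np.sum(ys)])
    a_star, b_star = np.linalg.solve(S, t)
    return a_star, b_star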

(71)

Solution 2: partial derivatives

(72)

Example: Linear Regression

a*, b* = argmina,b 𝚺i=1..N ( a xi + b - yi )2

a*, b* solve the system of linear equations

x → fθ* → y = a* x + b*    (θ* = (a*, b*))

(73)

Evaluation

Find good parameters θ:    θ* = argminθ 𝚺i=1..N ( f(θ, xi) - yi )2    (or by another method)

How good is θ*? How well does the regressor work?

x → fθ* → y = a* x + b*

(74)

Evaluation

Find good parameters θ:    θ* = argminθ 𝚺i=1..N ( f(θ, xi) - yi )2    (or by another method)

How good is θ*? How well does the regressor work?

Given training data (x1, y1), (x2, y2), …, (xN, yN):

Error = C(θ*) = 𝚺i=1..N ( f(θ*, xi) - yi )2

y = a* x + b*
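Putting the pieces together, the training error is the cost evaluated at the fitted parameters; a self-contained sketch with made-up data, assuming NumPy:

import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical training data
ys = np.array([1.1, 2.9, 5.2, 6.8])

M = np.column_stack([xs, np.ones_like(xs)])
(a_star, b_star), *_ = np.linalg.lstsq(M, ys, rcond=None)

train_error = np.sum((a_star * xs + b_star - ys) ** 2)   # C(theta*)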

(75)

Learning from data

x ∈ Rm → fθ → y = f(θ, x)

Parameter Learning:

Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),

choose θ such that y = f(θ, x) is a reasonable output

for training data (x1, y1), (x2, y2), …, (xN, yN)

for unseen data

(76)

Learning from data

x ∈ Rm → fθ → y = f(θ, x)

Parameter Learning:

Given a collection of input-output pairs (x1, y1), (x2, y2), …, (xN, yN),

choose θ such that y = f(θ, x) is a reasonable output

for training data (x1, y1), (x2, y2), …, (xN, yN)

for unseen data

Generalization: how well the model works on unseen data.
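A common way to estimate generalization (standard practice, not something the slides prescribe) is to hold out part of the data and measure the same error on pairs the model never saw during fitting. A sketch with synthetic data, assuming NumPy:

import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, size=100)
ys = 2.0 * xs + 1.0 + rng.normal(0, 0.5, size=100)   # noisy line

# hold out the last 20 pairs as "unseen" data
x_tr, y_tr = xs[:80], ys[:80]
x_te, y_te = xs[80:], ys[80:]

M = np.column_stack([x_tr, np.ones_like(x_tr)])
(a, b), *_ = np.linalg.lstsq(M, y_tr, rcond=None)

train_err = np.sum((a * x_tr + b - y_tr) ** 2)
test_err = np.sum((a * x_te + b - y_te) ** 2)   # proxy for generalization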
