(1)

PATTERN RECOGNITION

JIN YOUNG CHOI

ECE, SEOUL NATIONAL UNIVERSITY

(2)

Notice

Lecture Notes: Slides

References:

Pattern Classification by Richard O. Duda, et al.

Pattern Recognition and Machine Learning by Christopher M. Bishop

Assistant: λ°•μŠ¬κΈ°, 133(ASRI)-412, [email protected]

Evaluation: Quiz 40%, Midterm 30%, Final 30%

Video: Every week two videos are uploaded.

Video 1: upload Sun. 09:00, Quiz due Tue. 24:00
Video 2: upload Wed. 09:00, Quiz due Fri. 24:00

Class web: etl.snu.ac.kr

Office: 133(ASRI)-406, [email protected]

(3)

INTRODUCTION TO

AI: ARTIFICIAL INTELLIGENCE
ML: MACHINE LEARNING
DL: DEEP LEARNING

JIN YOUNG CHOI

ECE, SEOUL NATIONAL UNIVERSITY

(4)

Artificial Intelligence

Diagram: the roots of AI — Symbolism (Minsky; rules, database, search and inference, decision trees; cognitive science), Connectionism (Rosenblatt; neural networks, backpropagation, deep learning; neuroscience), and Bayesian Theory (statistical machine learning; filtering, estimation, optimization)

(5)

Artificial Intelligence

Learning from Experience (Observations, Examples)

Inference (Reasoning) for a Question

(6)

Artificial Intelligence

Learning from Experience (Observations, Examples)

If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.

If cancer patients are given, then we can observe their symptoms via diagnosis

…

Inference (Reasoning) for a Question

If features of something are given, then we can recognize what it is.

If symptoms of a patient are given, then we can infer what his disease is.

…

(7)

Artificial Intelligence

Learning from Experience (Observations, Examples)

If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.

If cancer patients are given, then we can record their symptoms via diagnosis.

If $y = y_1$, then $x = x_1$
If $y = y_2$, then $x = x_2$

Inference (Reasoning) for a Question

If features of something are given, then we can recognize what it is.

If symptoms of a patient are given, then we can infer what his disease is.

If $x = x_1$, then $y = y_1$
If $x = x_2$, then $y = y_2$

DB:

$y$     $x$
$y_1$   $x_{11}\ x_{12}\ x_{13}\ x_{14}\ x_{15}\ x_{16}$
$y_2$   $x_{21}\ x_{22}\ x_{23}\ x_{24}\ x_{25}\ x_{26}$

Decision Tree

Search-based Inference Engine

Symbolism

(8)

Artificial Intelligence

Learning from Experience (Observations, Examples)

If $y = y_1$, then $x = x_1$
If $y = y_2$, then $x = x_2$

Estimate $p(x = x_i \mid y = y_j)$ and $p(y = y_j)$

Inference (Reasoning) for a Question

If $x = x_1$, then $y = y_1$
If $x = x_2$, then $y = y_2$

$$p(y = y_j \mid x = x_i) = \frac{p(x = x_i \mid y = y_j)\, p(y = y_j)}{p(x = x_i)}, \qquad p(x = x_i) = \sum_j p(x = x_i \mid y = y_j)\, p(y = y_j)$$

Bayesian Theory

Density Estimation
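As a concrete illustration of this Bayesian route from learned densities to an inferred label, here is a minimal sketch (not from the slides) that estimates $p(x \mid y)$ and $p(y)$ by counting over a toy table of (symptom, disease) pairs and then applies Bayes' theorem for inference; the data and variable names are hypothetical.

```python
from collections import Counter

# Toy "experience": observed (symptom x, disease y) pairs — hypothetical data.
data = [("fever", "flu"), ("fever", "flu"), ("cough", "flu"),
        ("rash", "measles"), ("fever", "measles"), ("rash", "measles")]

# Learning: estimate the prior p(y) and the likelihood p(x | y) by counting.
y_counts = Counter(y for _, y in data)
xy_counts = Counter(data)
p_y = {y: c / len(data) for y, c in y_counts.items()}
p_x_given_y = {(x, y): c / y_counts[y] for (x, y), c in xy_counts.items()}

def posterior(x):
    """Inference: p(y | x) = p(x | y) p(y) / p(x) via Bayes' theorem."""
    joint = {y: p_x_given_y.get((x, y), 0.0) * p_y[y] for y in p_y}
    p_x = sum(joint.values())          # p(x) = sum_j p(x | y_j) p(y_j)
    return {y: v / p_x for y, v in joint.items()}

print(posterior("fever"))              # e.g. {'flu': 0.67, 'measles': 0.33}
```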

(9)

Artificial Intelligence

Deep Neural Networks for Learning and Inference

$o = f(W, x)$, e.g., $o_j = p(y = y_j \mid x = x_i)$

Training (learning)

Find $W$ to minimize the errors between $o_j$ and $t_j$ for given training data $\{(x_p, l_p)\}$

Inference (Reasoning)

Calculate $o_j$ via the deep network

Connectionism

Network Training

Inference (feedforward)
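A minimal sketch of the "find $W$ to minimize the error" idea, using plain gradient descent on a one-layer network with a squared-error loss; the data, layer size, and learning rate are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # hypothetical inputs x_p
T = X @ np.array([[1.0], [-2.0], [0.5]]) + 0.1 * rng.normal(size=(100, 1))  # targets l_p

W = np.zeros((3, 1))                           # parameters to learn
lr = 0.1
for _ in range(200):                           # training: gradient descent on the error
    O = X @ W                                  # forward pass o = f(W, x)
    grad = X.T @ (O - T) / len(X)              # gradient of the mean squared error
    W -= lr * grad

# Inference: a feedforward pass through the trained network
o_new = np.array([[0.5, 1.0, -1.0]]) @ W
print(W.ravel(), o_new)
```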

(10)

Learning and Inference

Figure: feature space with Weight and Height axes

(11)

Learning and Inference

Figure: feature space with Weight and Height axes

DB:

$y$     $x$
$y_1$   $x_{11}\ x_{12}\ x_{13}\ x_{14}\ x_{15}\ x_{16}$
$y_2$   $x_{21}\ x_{22}\ x_{23}\ x_{24}\ x_{25}\ x_{26}$

Decision Tree

Figure: a general tree structure with a root node, internal (split) nodes, and terminal (leaf) nodes

Symbolism
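As a sketch of this symbolic learn-then-infer loop on such a feature table, the following uses scikit-learn's DecisionTreeClassifier on hypothetical (weight, height) features; the data and class names are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical DB: (weight, height) feature vectors with class labels y1 / y2.
X = [[60, 165], [55, 160], [58, 162], [80, 180], [85, 178], [90, 185]]
y = ["y1", "y1", "y1", "y2", "y2", "y2"]

# Learning: build the split (internal) nodes of the tree from the examples.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Inference: route a new feature vector from the root node down to a leaf node.
print(tree.predict([[70, 170]]))
```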

(12)

Learning and Inference

Figure: feature space with Weight and Height axes

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$$

Bayesian Theory
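A minimal numpy sketch of evaluating this multivariate Gaussian density for a 2-D (weight, height) feature vector; the mean and covariance values are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Multivariate normal density p(x) with mean mu and covariance sigma."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([65.0, 170.0])                  # hypothetical mean (weight, height)
sigma = np.array([[100.0, 30.0],
                  [30.0, 80.0]])              # hypothetical covariance matrix
print(gaussian_pdf(np.array([70.0, 175.0]), mu, sigma))
```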

(13)

Learning and Inference

Figure: feature space with Weight and Height axes

Network Training

Inference (feedforward)

Connectionism

$$g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0}$$

(14)

Convolutional Neural Networks

π‘œ = 𝑓 π‘Š, π‘₯

(15)

Supervised/Unsupervised Learning

Figure: a mixture of class-conditional densities $P(x \mid \omega_1)$, $P(x \mid \omega_2)$, $P(x \mid \omega_3)$ with class priors $P(\omega_i)$

Unsupervised: the class labels are hidden
Supervised: the class labels are observed

(16)

Generative/Discriminative Model

Generating images

Latent variables $z_1, z_2, z_3$ and image $x$; infer $p(z_1, z_2, z_3 \mid x)$

Generative approach:

Model $p(z, x) = p(x \mid z)\, p(z)$ and use Bayes' theorem:

$$p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$$

Discriminative approach:

Model $p(z \mid x)$ directly

(17)

Unsupervised Learning

Clustering: K-means, etc.

Variational Auto-Encoder

(18)

Statistical Learning

Loss

$$L(t, f(W, x)) = \| t - f(W, x) \|_2^2$$

Total Loss

$$\mathcal{L}(W) = \int L(t, f(W, x)) \, dp(x, t)$$

where $p(x, t)$ is the joint PDF of $x$ and $t$, but unknown

Empirical Total Loss

$$\mathcal{L}(W) = \frac{1}{N} \sum_{n=1}^{N} L(t_n, f(W, x_n))$$
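A small sketch of the empirical total loss for the squared-error case, averaged over a hypothetical batch of $(x_n, t_n)$ pairs; the model $f$ here is an arbitrary linear stand-in.

```python
import numpy as np

def f(W, x):
    # Stand-in model o = f(W, x); a simple linear map for illustration.
    return W @ x

def empirical_loss(W, xs, ts):
    """Empirical total loss: (1/N) * sum_n || t_n - f(W, x_n) ||_2^2."""
    return np.mean([np.sum((t - f(W, x)) ** 2) for x, t in zip(xs, ts)])

W = np.array([[1.0, 0.0], [0.0, 2.0]])
xs = [np.array([1.0, 1.0]), np.array([0.0, 2.0])]   # hypothetical inputs x_n
ts = [np.array([1.0, 2.5]), np.array([0.5, 4.0])]   # hypothetical targets t_n
print(empirical_loss(W, xs, ts))
```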
(19)

Statistical Learning

𝐼 π‘₯

π‘˜

= log( 1

𝑝

π‘˜

) β†’ α‰Š π‘π‘Žπ‘ π‘’ 2 β†’ 𝑏𝑖𝑑𝑠 π‘π‘Žπ‘ π‘’ 𝑒 β†’ π‘›π‘Žπ‘‘π‘ 

32 bits β†’ 𝑝

π‘˜

= 1/2

32

for uniform distribution β†’ 𝐼

π‘˜

= 32 bits

β‘  𝐼 π‘₯

π‘˜

= 0 for 𝑝

π‘˜

= 1

β‘‘ 𝐼 π‘₯

π‘˜

β‰₯ 0 for 0 ≀ 𝑝

π‘˜

≀ 1

β‘’ 𝐼 π‘₯

π‘˜

β‰₯ 𝐼 π‘₯

𝑗

for 𝑝

π‘˜

≀ 𝑝

𝑗

Entropy : a measure of the average amount of information conveyed per message, i.e., expectation of Information 𝐻 π‘₯ = 𝐸 𝐼 π‘₯

π‘˜

= ෍

π‘˜

𝑝

π‘˜

𝐼 π‘₯

π‘˜

= βˆ’ ෍

π‘˜

𝑝

π‘˜

log 𝑝

π‘˜
(20)

Statistical Learning

Entropy becomes maximum when the $p_k$ are equiprobable.

32 bits β†’ $p_k = 1/2^{32}$ for a uniform distribution β†’ $I(x_k) = 32$ bits

$$0 \le H(X) \le -\sum_{k=1}^{2^{32}} \frac{1}{2^{32}} \log_2 \frac{1}{2^{32}} = 32$$

$H(X) = 0$ for a certain event: $p_k = 1$ and $p_{j \ne k} = 0$

Cross Entropy Loss:

$$\mathcal{L}(W) = -\sum_k^K t_k \log f_k(W, x), \qquad f_k(W, x) = \frac{e^{a_k}}{\sum_j e^{a_j}} \quad \text{(softmax)}$$

$t_k$: target label (one-hot: 0000100)

Figure: two-layer network with inputs $x_i$, hidden units $h_j$, and outputs $o_k$, where $net_j = \sum_i w_{ji} x_i$ and $net_k = \sum_j w_{kj} h_j$

(21)

Statistical Learning

Cross Entropy Loss:

$$\mathcal{L}(W) = -\sum_k^K \left[\, t_k \log f_k(W, x) + (1 - t_k) \log\big(1 - f_k(W, x)\big) \right], \qquad f_k(W, x) = \frac{1}{1 + e^{-a_k}} \quad \text{(sigmoid)}$$

$t_k$: target label (multi-hot: 00110100)

Figure: two-layer network with inputs $x_i$, hidden units $h_j$, and outputs $o_k$, where $net_j = \sum_i w_{ji} x_i$ and $net_k = \sum_j w_{kj} h_j$

(22)

Statistical Learning

Theorem (Gray 1990)

$$\sum_k p_k \log \frac{p_k}{q_k} \ge 0$$

Relative entropy (or Kullback–Leibler divergence):

$$D_{KL}(p \parallel q) = \sum_k p_k \log \frac{p_k}{q_k}, \qquad D_{KL}(p \parallel q) = 0 \;\text{ for } p \equiv q$$

$p_k$: probability mass function
$q_k$: reference probability mass function
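A short numerical sketch of the KL divergence between two hypothetical probability mass functions, checking that it is non-negative and zero when they coincide.

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_k p_k log(p_k / q_k) for probability mass functions p, q."""
    p, q = np.asarray(p), np.asarray(q)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.25, 0.25]               # hypothetical pmf
q = [0.4, 0.4, 0.2]                 # hypothetical reference pmf
print(kl_divergence(p, q))          # >= 0, by the theorem above
print(kl_divergence(p, p))          # 0 for p == q
```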

(23)

Scene and Object Generation

Pose Transformer

(24)
(25)

Motion Retargeting

(26)

Outline of ML techniques

Diagram: overview map of ML techniques — Bayes rule (likelihood, posterior, prior) and Bayes decision learning; generative models (GMM, Bayesian networks, HMM, Boltzmann machine, latent DA) learned with ML(P)E / Bayesian learning, EM, MCMC, and VI; discriminative models (linear classifier, SVM, kernel SVM, kernel SVDD, random forest, deep NN) learned with least squares, convex optimization, entropy criteria, and BP (GD) / SA / GA / NM; non-parametric density estimation (histogram, Parzen window, K-NN); dimensionality reduction (PCA, ICA, linear DA) via max. scatter, max. separation, and K-L divergence

(27)

Course Outline

Intro. AI, ML, and DL
Intro. Linear Algebra
Intro. Prob. & Information
Bayesian Decision Theory
Dim. Reduction: PCA & LDA
Learning Rules
Support Vector Machine
Deep Convolutional Networks
Bayesian Networks
Parametric pdf Estimation
Non-Parametric pdf Estimation
Boltzmann Machine
Markov Chain Monte Carlo
Inference of Bayesian Net, MCMC
Inference of Bayesian Net, VI
Traffic Pattern Analysis, VI
Recent Papers
- Active Learning
- Imbalanced Data Learning
- Out-of-Distribution
- Weakly Supervised Learning
- Etc.

(28)

Questions

1. Describe the commonalities and differences among symbolism, connectionism, and the Bayesian approach.

2. Explain supervised, weakly-supervised, and unsupervised learning.

3. What is the difference between discriminative and generative models?
