PATTERN RECOGNITION
JIN YOUNG CHOI
ECE, SEOUL NATIONAL UNIVERSITY
Notice
Lecture Notes: Slides
References:
- Pattern Classification by Richard O. Duda, et al.
- Pattern Recognition and Machine Learning by Christopher M. Bishop
Assistant: λ°μ¬κΈ°, 133(ASRI)-412, [email protected]
Evaluation: Quiz 40%, Midterm 30%, Final 30%
Video: two videos are uploaded every week.
- Video 1: uploaded Sun. 09:00, quiz due Tue. 24:00
- Video 2: uploaded Wed. 09:00, quiz due Fri. 24:00
Class web: etl.snu.ac.kr
Office: 133(ASRI)-406, [email protected]
INTRODUCTION TO
AI: ARTIFICIAL INTELLIGENCE
ML: MACHINE LEARNING
DL: DEEP LEARNING
JIN YOUNG CHOI
ECE, SEOUL NATIONAL UNIVERSITY
Artificial Intelligence
[Concept map] Three streams of AI:
- Symbolism (Minsky): cognitive science, database, search, inference, decision tree
- Connectionism (Rosenblatt): neuroscience, neural networks, backpropagation rule, deep learning
- Bayesian Theory: statistical machine learning, filtering, estimation, optimization
Artificial Intelligence
Learning from Experience (Observations, Examples)
?
Inference (Reasoning) for a Question
?
Artificial Intelligence
Learning from Experience (Observations, Examples)
If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.
If cancer patients are given, then we can observe their symptoms via diagnosis.
…
Inference (Reasoning) for a Question
If the features of something are given, then we can recognize what it is.
If the symptoms of a patient are given, then we can infer what his disease is.
…
Artificial Intelligence
Learning from Experience (Observations, Examples)
If birds are given, then we can learn their features such as # of legs, shape of mouth, etc.
If cancer patients are given, then we can record their symptoms via diagnosis.
If y = y1, then x = x1
If y = y2, then x = x2
Inference (Reasoning) for a Question
If the features of something are given, then we can recognize what it is.
If the symptoms of a patient are given, then we can infer what his disease is.
If x = x1, then y = y1
If x = x2, then y = y2
DB:
  y1: x11, x12, x13, x14, x15, x16
  y2: x21, x22, x23, x24, x25, x26
Decision Tree
Search-based Inference Engine → Symbolism
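To make the search-based route concrete, here is a minimal Python sketch (with hypothetical feature names and values standing in for the x's above) that stores the DB as labeled examples and answers a query by matching it against them:

```python
# Minimal sketch of search-based (symbolic) inference over a stored DB.
# The feature names/values are hypothetical placeholders for the x's above.
db = {
    "y1": [{"legs": 2, "mouth": "beak"}],   # examples observed for class y1
    "y2": [{"legs": 4, "mouth": "snout"}],  # examples observed for class y2
}

def infer(query):
    """Search the DB for a stored example whose features match the query."""
    for label, examples in db.items():
        for ex in examples:
            if all(ex.get(k) == v for k, v in query.items()):
                return label
    return None  # no matching example/rule found

print(infer({"legs": 2, "mouth": "beak"}))  # -> y1
```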
Artificial Intelligence
Learning from Experience (Observations, Examples)
If y = y1, then x = x1
If y = y2, then x = x2
Estimate p(x = x_i | y = y_j) and p(y = y_j)
Inference (Reasoning) for a Question
If x = x1, then y = y1
If x = x2, then y = y2
p(y = y_j | x = x_i) = p(x = x_i | y = y_j) p(y = y_j) / p(x = x_i),
where p(x = x_i) = Σ_j p(x = x_i | y = y_j) p(y = y_j)
Bayesian Theory
Density Estimation
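A minimal numerical sketch of the Bayesian route above, with made-up priors and likelihoods: learning estimates p(x | y) and p(y), and inference inverts them with Bayes' theorem:

```python
# Bayes-rule learning/inference sketch with hypothetical discrete estimates.
p_y = {"y1": 0.3, "y2": 0.7}                 # learned prior p(y = y_j)
p_x_given_y = {                              # learned likelihood p(x = x_i | y = y_j)
    "y1": {"x1": 0.8, "x2": 0.2},
    "y2": {"x1": 0.1, "x2": 0.9},
}

def posterior(x):
    """p(y | x) = p(x | y) p(y) / sum_j p(x | y_j) p(y_j)."""
    p_x = sum(p_x_given_y[y][x] * p_y[y] for y in p_y)   # evidence p(x)
    return {y: p_x_given_y[y][x] * p_y[y] / p_x for y in p_y}

print(posterior("x1"))   # {'y1': ~0.77, 'y2': ~0.23}
```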
Artificial Intelligence
Deep Neural Networks for Learning and Inference
  o = f(W, x), e.g., o_j = p(y = y_j | x = x_i)
Training (learning)
  Find W to minimize the errors between o_i and t_i for the given training data {(x_i, t_i)}
Inference (Reasoning)
  Calculate o_i via the deep network
Connectionism
Network Training
Inference (feedforward)
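A minimal sketch of the connectionist recipe, with a linear model standing in for the deep network and toy data as assumptions: find W by gradient descent so that the outputs o = f(W, x) approach the targets t:

```python
import numpy as np

# Toy training set {(x_i, t_i)}; a linear model f(W, x) = W @ x stands in for a deep net.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                   # inputs x_i
W_true = np.array([1.0, -2.0, 0.5])
T = X @ W_true + 0.01 * rng.normal(size=100)    # targets t_i

W = np.zeros(3)
lr = 0.1
for _ in range(200):                            # training (learning)
    O = X @ W                                   # inference: feedforward pass
    grad = 2 * X.T @ (O - T) / len(T)           # gradient of the mean squared error
    W -= lr * grad                              # gradient-descent update of W

print(W)  # ends up close to W_true
```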
Learning and Inference
[Figure: feature space with axes Height and Weight]
Learning and Inference
[Figure: feature space with axes Height and Weight]
DB:
  y1: x11, x12, x13, x14, x15, x16
  y2: x21, x22, x23, x24, x25, x26
Decision Tree: a general tree structure with a root node (0), internal (split) nodes, and terminal (leaf) nodes
Symbolism
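As a sketch of the decision-tree route, the same kind of labeled table can be fit with a small tree; this example uses scikit-learn's DecisionTreeClassifier on hypothetical (height, weight) data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical (height, weight) examples for two classes y1 / y2.
X = np.array([[150, 45], [155, 50], [160, 52],    # class y1
              [175, 80], [180, 85], [185, 90]])   # class y2
y = np.array([0, 0, 0, 1, 1, 1])

# Learning: grow a tree of internal (split) nodes and terminal (leaf) nodes.
tree = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Inference: follow the splits from the root node down to a leaf.
print(tree.predict([[158, 51], [182, 88]]))  # -> [0 1]
```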
Learning and Inference
[Figure: feature space with axes Height and Weight]
p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp[ −(1/2) (x − μ)^T Σ^{-1} (x − μ) ]
Bayesian Theory
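A direct sketch of evaluating the multivariate normal density above; the mean and covariance below are made-up values for the height/weight feature space:

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density from the formula above."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# Hypothetical class model in the (height, weight) feature space.
mu = np.array([170.0, 70.0])
Sigma = np.array([[60.0, 20.0],
                  [20.0, 40.0]])
print(gaussian_pdf(np.array([172.0, 68.0]), mu, Sigma))
```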
Learning and Inference
[Figure: feature space with axes Height and Weight]
Network Training
Inference (feedforward)
Connectionism
g_i(x) = w_i^T x + w_{i0}
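A minimal sketch of classification with linear discriminants g_i(x) = w_i^T x + w_i0 (weights chosen arbitrarily for illustration): compute one score per class and decide the class whose discriminant is largest:

```python
import numpy as np

# Linear discriminants g_i(x) = w_i^T x + w_i0; the weights here are arbitrary.
W = np.array([[0.4, -0.3],    # w_1
              [-0.2, 0.5]])   # w_2
w0 = np.array([1.0, -0.5])    # biases w_10, w_20

def classify(x):
    g = W @ x + w0            # one score g_i(x) per class
    return int(np.argmax(g))  # decide the class with the largest discriminant

print(classify(np.array([1.0, 2.0])))  # -> 0
```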
Convolutional Neural Networks
o = f(W, x)
Supervised/Unsupervised Learning
[Figure: class prior P(ω_i) and class-conditional densities p(x | ω_1), p(x | ω_2), p(x | ω_3)]
Supervised: class labels are observed; Unsupervised: class labels are hidden
Generative/Discriminative Model
Generating images
[Figure: latent variables z_1, z_2, z_3 and observation x; infer p(z_1, z_2, z_3 | x)]
Generative approach:
  Model p(z, x) = p(x | z) p(z); use Bayes' theorem: p(z | x) = p(x | z) p(z) / p(x)
Discriminative approach:
  Model p(z | x) directly
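A small numerical sketch contrasting the two approaches on a binary toy problem (all probabilities are made up): the generative route models p(x | z) and p(z) and applies Bayes' theorem, while the discriminative route represents p(z | x) directly:

```python
# Contrast of the two approaches on a binary toy problem (all numbers hypothetical).
# Generative: model p(x | z) and p(z), then obtain p(z | x) with Bayes' theorem.
p_z = {0: 0.6, 1: 0.4}
p_x_given_z = {0: {"a": 0.9, "b": 0.1},
               1: {"a": 0.2, "b": 0.8}}

def generative_posterior(x):
    p_x = sum(p_x_given_z[z][x] * p_z[z] for z in p_z)        # evidence p(x)
    return {z: p_x_given_z[z][x] * p_z[z] / p_x for z in p_z}

# Discriminative: represent p(z | x) directly (here as a simple table),
# without modelling how x itself is generated.
p_z_given_x = {"a": {0: 0.87, 1: 0.13},
               "b": {0: 0.16, 1: 0.84}}

print(generative_posterior("a"))   # about {0: 0.87, 1: 0.13}
print(p_z_given_x["a"])
```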
Unsupervised Learning: Clustering (K-means, etc.)
Variational Auto-Encoder
Statistical Learning
L2 Loss:
  L(t, f(W, x)) = (t − f(W, x))^2
Total Loss:
  R(W) = ∫ L(t, f(W, x)) dP(x, t)
  where P(x, t) is the joint PDF of x and t, but unknown
Empirical Total Loss:
  R_emp(W) = (1/N) Σ_{n=1}^{N} L(t_n, f(W, x_n))
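A minimal sketch of the empirical total loss as a finite-sample estimate of R(W): samples (x_n, t_n) are drawn from an assumed joint distribution and the L2 loss of a simple model f(w, x) = w·x is averaged:

```python
import numpy as np

# Empirical total loss as a finite-sample estimate of R(W) under an unknown P(x, t).
# The data distribution and the model f(w, x) = w * x are assumptions for illustration.
rng = np.random.default_rng(1)
N = 1000
x = rng.normal(size=N)
t = 2.0 * x + rng.normal(scale=0.5, size=N)   # samples (x_n, t_n) ~ P(x, t)

def f(w, x):
    return w * x

w = 1.8
R_emp = np.mean((t - f(w, x)) ** 2)           # (1/N) sum_n L(t_n, f(w, x_n))
print(R_emp)
```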
Statistical Learning
Information: I(x_i) = log(1 / p_i)   (log base 2 ⇒ bits; log base e ⇒ nats)
  e.g., 32-bit messages: p_i = 1/2^32 for the uniform distribution ⇒ I(x_i) = 32 bits
① I(x_i) = 0 for p_i = 1
② I(x_i) ≥ 0 for 0 ≤ p_i ≤ 1
③ I(x_i) ≥ I(x_j) for p_i ≤ p_j
Entropy: a measure of the average amount of information conveyed per message, i.e., the expectation of information
  H(x) = E[I(x)] = Σ_i p_i I(x_i) = −Σ_i p_i log p_i
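A minimal sketch of computing entropy in bits for a few discrete distributions, matching the properties listed above:

```python
import numpy as np

def entropy(p):
    """H = -sum_i p_i log2 p_i (in bits); terms with p_i = 0 contribute nothing."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

print(entropy([0.5, 0.5]))        # 1 bit: two equiprobable messages
print(entropy([1.0, 0.0]))        # 0 bits: a certain message carries no information
print(entropy(np.full(8, 1/8)))   # 3 bits: uniform over 2^3 messages (maximum for 8 outcomes)
```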
Statistical Learning
Entropy becomes maximum when the p_i are equiprobable
  e.g., 32-bit messages: p_i = 1/2^32 for the uniform distribution ⇒ I_i = 32 bits
  0 ≤ H(p) ≤ −Σ_{i=1}^{2^32} (1/2^32) log_2(1/2^32) = 32
  H(p) = 0 for a sure event: p_i = 1 and p_{j≠i} = 0
Cross Entropy Loss:
  L(W) = −Σ_{k=1}^{K} t_k log o_k(W, x),  o_k(W, x) = e^{net_k} / Σ_{k'} e^{net_{k'}}  (softmax)
  t_k: target label (one-hot, e.g., 0000100)
  [Figure: two-layer network with inputs x_i (i = 1..I), hidden units h_j with net_j = Σ_i w_ji x_i, and outputs o_k (k = 1..K) with net_k = Σ_j w_kj h_j]
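A minimal sketch of the softmax cross-entropy loss above for a single example (the output-layer activations net_k are made-up numbers):

```python
import numpy as np

def softmax_cross_entropy(net, t):
    """L = -sum_k t_k log o_k, with o_k = softmax(net)_k and t a one-hot target."""
    net = net - np.max(net)                   # stabilize the exponentials
    o = np.exp(net) / np.sum(np.exp(net))     # softmax outputs o_k(W, x)
    return -np.sum(t * np.log(o))

net = np.array([2.0, 0.5, -1.0])              # hypothetical output activations net_k
t = np.array([1.0, 0.0, 0.0])                 # one-hot target label
print(softmax_cross_entropy(net, t))
```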
Statistical Learning
Cross Entropy Loss (multi-label):
  L(W) = −Σ_{k=1}^{K} [ t_k log o_k(W, x) + (1 − t_k) log(1 − o_k(W, x)) ],  o_k(W, x) = 1 / (1 + e^{−net_k})  (sigmoid)
  t_k: target label (multi-hot, e.g., 00110100)
  [Figure: same two-layer network; net_j = Σ_i w_ji x_i, net_k = Σ_j w_kj h_j]
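A minimal sketch of the sigmoid (multi-label) cross-entropy loss above for a single example, again with made-up activations:

```python
import numpy as np

def sigmoid_cross_entropy(net, t):
    """L = -sum_k [t_k log o_k + (1 - t_k) log(1 - o_k)], with o_k = sigmoid(net_k)."""
    o = 1.0 / (1.0 + np.exp(-net))            # sigmoid outputs o_k(W, x)
    return -np.sum(t * np.log(o) + (1 - t) * np.log(1 - o))

net = np.array([1.5, -0.3, 2.0, -2.0])        # hypothetical output activations net_k
t = np.array([1.0, 1.0, 0.0, 0.0])            # multi-hot target: several labels can be 1
print(sigmoid_cross_entropy(net, t))
```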
Statistical Learning
Theorem (Gray 1990):  Σ_i p_i log(p_i / q_i) ≥ 0
Relative entropy (Kullback-Leibler divergence):
  D_KL(p ∥ q) = Σ_i p_i log(p_i / q_i)
  D_KL(p ∥ q) = 0 for p ≡ q
  p_i: probability mass function;  q_i: reference probability mass function
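A minimal sketch of the KL divergence above for discrete distributions, illustrating that it is nonnegative and zero when p ≡ q:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i); q is the reference distribution."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

p = np.array([0.4, 0.4, 0.2])
q = np.array([1/3, 1/3, 1/3])
print(kl_divergence(p, q))   # > 0
print(kl_divergence(p, p))   # = 0 when the two distributions coincide
```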
Scene and Object Generation
Pose Transformer
Motion Retargeting
Outline of ML techniques
[Concept map of ML techniques]
- Bayes rule (likelihood, posterior, prior) and Bayes decision learning
- Generative models: histogram, Parzen windows, K-NN, GMM, HMM, Bayesian networks, Boltzmann machine, Latent DA; learning via ML(P)E, Bayesian learning, EM, MCMC, VI
- Discriminative models: linear classifier, SVM, kernel SVM / K-SVDD, random forest, deep NN; learning via LS, convex optimization, BP (GD), SA, GA, NM
- Dimensionality reduction: PCA (max. scatter), Linear DA (max. separation), ICA; criteria such as entropy and K-L divergence
Course Outline
- Intro. AI, ML, and DL
- Intro. Linear Algebra
- Intro. Probability & Information
- Bayesian Decision Theory
- Dimension Reduction: PCA & LDA
- Learning Rules
- Support Vector Machine
- Deep Convolutional Networks
- Bayesian Networks
- Parametric pdf Estimation
- Non-Parametric pdf Estimation
- Boltzmann Machine
- Markov Chain Monte Carlo
- Inference of Bayesian Net, MCMC
- Inference of Bayesian Net, VI
- Traffic Pattern Analysis, VI
- Recent Papers
  - Active Learning
  - Imbalanced Data Learning
  - Out of Distribution
  - Weakly Supervised Learning
  - Etc.
Questions
1. Describe the commonalities and differences among symbolism, connectionism, and the Bayesian approach.
2. Explain supervised, weakly-supervised, and unsupervised learning.
3. What is the difference between discriminative and generative models?