PSZ 19:16 (Pind. 1/07)
DECLARATION OF THESIS / UNDERGRADUATE PROJECT PAPER AND COPYRIGHT
Author’s full name :
Date of birth :
Title : CONVOLUTIONAL NEURAL NETWORKS FOR FACE RECOGNITION
AND FINGER-VEIN BIOMETRIC IDENTIFICATION
Academic Session:
I declare that this thesis is classified as :
I acknowledge that Universiti Teknologi Malaysia reserves the right as follows :
1. The thesis is the property of Universiti Teknologi Malaysia.
2. The Library of Universiti Teknologi Malaysia has the right to make copies for the purpose of research only.
3. The Library has the right to make copies of the thesis for academic exchange.
Certified by :
NOTES : * If the thesis is CONFIDENTIAL or RESTRICTED, please attach with the letter from the organisation with period and reasons for confidentiality or restriction.
UNIVERSITI TEKNOLOGI MALAYSIA
√
CONFIDENTIAL (Contains confidential information under the Official Secret
Act 1972)*
RESTRICTED (Contains restricted information as specified by the
organisation where research was done)*
OPEN ACCESS I agree that my thesis to be published as online open access (full text)
2013/2014/2
SYAFEEZA BINTI AHMAD RADZI
810324-71-5024 MOHAMED KHALIL BIN MOHD HANI
“I hereby declare that I have read this thesis and in my
opinion this thesis is sufficient in terms of scope and quality for the award of the degree of Doctor of Philosophy (Electrical Engineering)”
Signature :
ACKNOWLEDGEMENT
I feel very blessed to have completed this thesis. I would like to take this
opportunity to express my gratitude to the people who were involved, directly or
indirectly, in its completion.
Firstly, I would like to express my sincere appreciation to my supervisor,
Professor Dr. Mohamed Khalil bin Mohd Hani, for his useful critiques, guidance and
advice throughout this journey. I gained valuable knowledge under his supervision.
It was also a privilege to work closely with the members of VeCAD Lab, namely Rabia,
Vishnu, Lee Yee Hui, Sia Chen Wei, Moganesh and Liew Shan Sung. I am very
fortunate to have had these people with me through the ups and downs of this research. I
am grateful for their constructive ideas and their concern.
Most importantly, I would like to thank my family, especially my parents,
husband and daughter. Thank you for the continuous encouragement, patience and
boundless support in all my endeavours.
Last but not least, I would like to acknowledge Universiti Teknikal Malaysia
ABSTRACT
ABSTRAK
The Convolutional Neural Network (CNN), a variant of the Multilayer Perceptron (MLP),
has demonstrated its capability in complex recognition tasks, particularly visual
pattern recognition. However, the classical CNN architecture, LeNet-5, which is the basis of most solutions, is computationally intensive. This CNN also suffers from excessively long training time because it has between six and eight layers. In this study, a CNN model of lower complexity is proposed for face recognition and finger-vein biometric identification. A simpler architecture is obtained by fusing the convolution and subsampling layers into a single layer, combined with a partial connection scheme between the first two layers of the network. As a result, the total number of layers is reduced to four. The number of feature maps in each layer is optimized according to the type of database used. Consequently, the number of network parameters (including neurons, trainable parameters and connections) is markedly reduced, and in particular the generalization ability is improved. The Stochastic Diagonal Levenberg-Marquardt algorithm
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENT iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xii
LIST OF FIGURES xv
LIST OF ABBREVIATIONS xviii
1 INTRODUCTION 1
1.1 Overview of Pattern Recognition 1
1.2 Neural Networks in Pattern Recognition Problems 2
1.3 Convolutional Neural Network 3
1.4 Biometric Pattern Recognition 4
1.4.1 Summary of Existing Face Recognizers 7
1.4.2 Summary of Work on Finger-vein Pattern
Recognition 8
1.5 Problem Statement 8
1.6 Research Objectives 13
1.7 Scope of Work 14
1.8 Research Contributions 14
1.9 Thesis Organization 16
2 LITERATURE REVIEW 17
2.1 Basic CNN Architectures 17
2.2 Previous Work applying CNN in Face Recognition 19
2.3 Related Previous Work on Finger-vein Biometric
2.4 Other State-of-the-Art CNN Architectures 27
2.4.1 Siamese CNN 27
2.4.2 Space Displacement Neural Networks 31
2.4.3 Shunting Inhibitory Convolutional Neural
Networks 32
2.4.4 Sparse Convolutional Neural Networks 34
2.4.5 Multiscale Convolutional Neural
Networks 34
2.4.6 Multiple Circular Path Convolutional
Neural Networks 35
2.4.7 Multi-column Deep Neural Networks 36
2.4.8 3D Convolutional Neural Networks 37
2.4.9 Recurrent Convolutional Neural
Networks 39
2.5 Summary 39
3 CONVOLUTIONAL NEURAL NETWORK -
ALGORITHMS AND ARCHITECTURE 40
3.1 Multilayer Perceptron Neural Network (MLPNN) 40
3.1.1 Activation Functions 42
3.1.2 Weight Initialization Methods 42
3.1.3 Normalization Method 44
3.2 Neural Network Training 45
3.2.1 Gradient Descent Backpropagation
Algorithm 46
3.2.2 Resilient Backpropagation 49
3.3 LeNet-5 Convolutional Neural Networks 49
3.3.1 Algorithms in LeNet-5 50
3.3.2 Convolutional Neural Network Training 54
3.4 Summary 55
4 PROPOSED CONVOLUTIONAL NEURAL
NETWORK: MODELLING, ARCHITECTURE AND
ALGORITHM 57
4.1 Methodology 57
4.1.1 Cross-validation Technique to optimize
4.2.2 CNN with Fused
Convolution/Subsampling 63
4.2.3 Connection Schemes 67
4.3 Proposed Training Algorithm 69
4.3.1 Stochastic Diagonal
Levenberg-Marquardt (SDLM) 69
4.3.2 Proposed Modifications to SDLM
Algorithm 71
4.4 Proposed Training Scheme to recognize New
Categories without Full CNN Training 73
4.5 Generalizing the CNN Architecture for New
Problem Domain 76
4.6 Summary 77
5 EXPERIMENTAL WORK AND RESULTS FOR CNN IN
FACE RECOGNITION 78
5.1 Performance Evaluation Procedures 78
5.2 Data Preparation 82
5.2.1 Face databases 82
5.3 Experimental Work and Results of the Proposed
Baseline CNN Model 85
5.3.1 Results of Cross-validation 86
5.3.2 Best Connection Scheme 87
5.3.3 The Most Suitable Normalization and
Weight Initialization Methods 88
5.3.4 The Optimum Input Image Size 89
5.3.5 Benchmarking Results 90
5.3.5.1 Benchmarking of Accuracy 90
5.3.5.2 Benchmarking of Execution
and Convergence Time 92
5.3.5.3 Analysis and Benchmarking of
Model Complexity 93
5.3.5.4 Comparison between the
proposed CNN Model and LeNet-5 95
5.4 Experimental Work and Results of Proposed CNN
5.4.1 Results of Cross-validation 96
5.4.2 Best Connection Scheme 96
5.4.3 The Most Suitable Normalization and
Weight Initialization Methods 97
5.4.4 The Optimum Input Image Size 99
5.4.5 Benchmarking Results 100
5.5 Experimental Results of CNN Model Targeted for
Face Images of the “Extreme Challenge” Type 101
5.5.1 Results of Cross-validation 101
5.5.2 Best Connection Scheme 103
5.5.3 The Most Suitable Normalization and
Weight Initialization Methods 104
5.5.4 The Optimum Input Image Size 105
5.5.5 Benchmarking Results 107
5.6 Analysis of the Proposed Modification on SDLM
Algorithm 107
5.6.1 Comparison between the Proposed
Modification on SDLM Algorithm and
Standard Backpropagation 109
5.7 Analysis of Training Scheme to recognize New
Categories without Full CNN Training 111
5.8 Conclusion 112
6 EXPERIMENTAL WORK AND RESULTS FOR CNN IN
FINGER-VEIN BIOMETRIC IDENTIFICATION 113
6.1 Generation of Finger-vein Database for Biometric
Identification 113
6.2 Performance Evaluation of Finger-vein Biometric
Identification 118
6.2.1 VeCAD-UTM database 118
6.2.2 Results of Cross-validation 118
6.2.3 Best Connection Scheme 120
6.2.4 The Most Suitable Normalization and
Weight Initialization Methods 120
6.2.5 The Optimum Input Image Size 123
6.2.6 Benchmarking Results 124
REFERENCES 132
LIST OF TABLES
TABLE NO. TITLE PAGE
1.1 Characteristic comparison of biometric application [1] 6
1.2 History of some biometric development [2] 6
1.3 Previous work on face recognition based on CNN 7
1.4 Previous work on face recognition applying non-CNN
methods 7
1.5 Accuracy achieved by Computational Intelligence (CI)
approach for finger-vein identification system 8
1.6 Detailed information and accuracy achieved by conventional
approach for finger-vein identification system 8
2.1 Face Recognition System using CNN 22
2.2 Conventional methods of finger-vein identification 25
2.3 Algorithms applied by CI approach for finger-vein
identification system 26
2.4 Accuracy achieved by CI-based for finger-vein identification 27
2.5 Related previous work on finger-vein biometric system and
its recognition speed 27
3.1 Input/output relation and symbol for each activation function 43
3.2 Each row indicates which feature maps in S2 are combined
by the particular feature maps of C3 [3] 53
4.1 Variant type of connections between C1 and C2 layer 68
4.2 Connection scheme between C1 and C2 layer for AT&T
database 68
4.3 Variant type of connections between C1 and C2 layer 69
4.4 Connection scheme between C1 and C2 layer for AR Purdue
database 69
4.5 The values of ηg as given in [3] 73
5.1 Pose variations of FERET database 85
5.2 Performance of the model tested for AT&T database 87
5.6 Accuracy for different combinations of weight initialization
algorithm and normalization methods for AT&T database 88
5.7 Accuracy of each input size for AT&T database 89
5.8 Benchmarking of accuracy of other CNN face recognizers on
AT&T database 91
5.9 Benchmarking with non-CNN approach 91
5.10 Benchmarking of execution time and convergence learning
time against other face recognizers 92
5.11 Benchmarking of model complexity against previous work
that apply CNN (not restricted to face recognition problems) 95
5.12 Comparison between fused architecture and LeNet-5 96
5.13 Performance of the model tested for AR database 97
5.14 Variant type of connections between C1 and C2 layer 98
5.15 Connection scheme between C1 and C2 layer for AR Purdue
database 98
5.16 Parameter Setup for AR Purdue database 99
5.17 Accuracy for different combinations of weight initialization
algorithm and normalization methods for AR Purdue database 99
5.18 Parameter setup for each input size for AR Purdue database 100
5.19 Accuracy of each input size for AR Purdue database 100
5.20 Benchmarking of face identification system using AR Purdue
database with existing works 102
5.21 Performance of the model tested for FERET database 103
5.22 Variant type of connections between C1 and C2 layer for
FERET database 103
5.23 Connection scheme between C1 and C2 layer for FERET
database 104
5.24 Parameter Setup for FERET database 104
5.25 Accuracy for different combinations of weight initialization
algorithm and normalization methods for FERET database 105
5.26 Parameter setup for each input size for FERET database 105
5.27 Accuracy of each input size for FERET database 105
5.28 Benchmarking of face identification system using FERET
database with existing works 107
5.30 Result of BP training using combination of different initial
learning rate and learning rate fading 110
5.31 Result of BP training using combination of different initial
learning rate and learning rate fading 111
5.32 Result of accepting new subject(s) from JAFFE database into
the system 112
6.1 Performance of the model tested for VeCAD-UTM
finger-vein database 120
6.2 Variant type of connections between C1 and C2 layer 120
6.3 Connection scheme between C1 and C2 layer for
VeCAD-UTM database 120
6.4 Parameter Setup for Case A and Case B 121
6.5 Accuracy for different combinations of weight initialization algorithm and normalization methods for Case A
VeCAD-UTM finger-vein database 121
6.6 Accuracy for different combinations of weight initialization
algorithm and normalization methods for Case B
VeCAD-UTM finger-vein database 122
6.7 Accuracy of each input size for VeCAD-UTM database 123
6.8 Accuracy achieved by different number of subjects 124
6.9 Benchmarking of accuracy achieved by the proposed method
and other CI methods for finger-vein identification system 125
6.10 Benchmarking of accuracy achieved by the proposed method
and conventional methods for finger-vein identification
system 125
6.11 Benchmarking on recognition speed for finger-vein biometric
LIST OF FIGURES
FIGURE NO. TITLE PAGE
1.1 Typical pattern recognition flow 1
1.2 Example of CNN Architecture 4
1.3 Taxonomy of biometric methods 5
1.4 Challenges faced by biometric face recognition system 12
1.5 Inappropriate lighting in finger-vein samples of
VeCAD-UTM finger-vein database 13
2.1 Handwriting zip code application using CNN [4] 17
2.2 LeNet-5 CNN architecture for handwriting recognition 18
2.3 CNN architecture by Y. Simard et al. [5] with fused
convolution/subsampling 18
2.4 CNN architecture for fingerprint verification process [6] 28
2.5 Example of genuine and impostor pairs 29
2.6 Illustration of SCNN framework 30
2.7 A sample of ZIP code image and its possible segmentation 32
2.8 INSEG and OUTSEG segmentation for word recognition 32
2.9 Three schemes of SICoNNets. (a) Full-Connection. (b)
Toeplitz-Connection. (c) Binary-connection [7] 33
2.10 Shunting neuron model [7] 33
2.11 Sketch of the SPCNN connectivity [8]. 34
2.12 Multiscale Convolutional Neural Networks (MSCNN)
architecture for traffic sign recognition [9]. 35
2.13 Three types of network paths connecting the input and the hidden layers in the Multiple Circular Path Convolutional
Neural Network (MCPCNN) (a) Full-connection path
(b) Self-correlation path (c) Neighbourhood-correlation
2.14 Multi-column Deep Neural Networks (MCDNN) and its basic
building block (a) Deep Neural Network (DNN) architecture
(b) The dataset is preprocessed (P) before the training starts; during training all original or pre-processed images are randomly distorted before each epoch (D). (c) MCDNN architecture: the input image is pre-processed by n different preprocessors P0 − Pn−1 and final predictions are obtained by
averaging individual predictions of each DNN [11]. 37
2.15 Comparison between Two-Dimensional (2D) and
Three-Dimensional (3D) convolutions 38
2.16 3D CNN architecture for human action recognition 38
2.17 Recurrent Convolutional Neural Networks (RCNN)
architecture for dialog act classification 39
3.1 (a) Biological neuron (b) Modeling in NN 41
3.2 (a) Training phase (b) Test phase 48
3.3 An example of a LeNet-5 CNN architecture for handwriting
recognition [3] 51
3.4 Process in convolutional layer of CNN 52
3.5 Process in subsampling layer of CNN 52
3.6 LeNet-5 CNN architecture 54
4.1 Prediction of early stopping rule through cross-validation
technique [12]. 59
4.2 CNN architecture by Y. Simard et al. [5] with fused
convolution/subsampling 64
4.3 The proposed four-layers CNN architecture 64
4.4 (a) convolution and subsampling of LeNet-5 CNN
architecture and (b) fused convolution/subsampling in proposed
CNN 66
4.5 Conceptual view of generalizing the feature extraction layer
to other database in the system 76
5.1 Four different types of weight initialization methods 81
5.2 Sample of images from AT&T database 83
5.3 Sample of images from AR Purdue database 83
5.4 Preprocessing stages for AR Purdue database 84
5.5 Sample of images from FERET database 85
5.6 Preprocessing stages for FERET database 85
5.7 Validation error rates in cross-validation of the six tested
5.10 Validation error rates in cross-validation of the seven tested
CNN models for AR database 97
5.11 Misclassified samples for AR database 99
5.12 The proposed architecture for AR Purdue database (referred
here as the 15-45-130 model) 100
5.13 Feature maps at each layer after training for AR database 101
5.14 Validation error rates in cross-validation of the six tested
CNN models for FERET 102
5.15 The proposed architecture for FERET database (referred here
as the 15-47-130 model) 106
5.16 Feature maps at each layer after training for FERET database 106
5.17 Error rate on the training dataset 108
5.18 Error rate on the test dataset 108
5.19 Classification accuracy performance. (Note: Case A is for
the proposed learning with repeating scheme and Case B is
the original SDLM) 109
6.1 Non-ideal vein samples of SDUMLA-HMT
finger-vein database 113
6.2 Sample of images from VeCAD-UTM database 114
6.3 Example of vertical cut 115
6.4 Example of horizontal cut 116
6.5 Image samples for 5 subjects (a) Case A and (b) Case B 117
6.6 Preprocessing stages and segmentation process for Case A 117
6.7 Preprocessing stages for Case B 117
6.8 Validation error rates in cross-validation of the six tested
CNN models for VeCAD-UTM finger-vein database 119
6.9 Training error for Case A and Case B 122
6.10 Test error for Case A and Case B 122
6.11 Rate of accuracy during training for Case A and Case B 122
6.12 The proposed model for VeCAD-UTM database (referred
here as the 5-13-50 model) 123
LIST OF ABBREVIATIONS
AI – Artificial Intelligence
ANN – Artificial Neural Network
BP – Backpropagation
CI – Computational Intelligence
CNNs – Convolutional Neural Networks
CMC – Cumulative Match Characteristic
DNN – Deep Neural Network
EBGM – Elastic Bunch Graph Matching
EER – Equal Error Rate
FERET – Face Recognition Technology
FAR – False Acceptance Rate
FRR – False Rejection Rate
FPGA – Field Programmable Gate Array
GA – Genetic Algorithm
GPU – Graphics Processing Unit
HD – Hausdorff Distance
HMM – Hidden Markov Model
LED – Light-Emitting Diode
LDA – Linear Discriminant Analysis
MATLAB – Matrix Laboratory
MSE – Mean Square Error
MHD – Modified Hausdorff Distance
MCDNN – Multi-column Deep Neural Networks
MLP – Multilayer Perceptron
MCPCNN – Multiple Circular Path Convolutional Neural Network
MSCNN – Multiscale Convolutional Neural Networks
NIR – Near Infrared
PC – Personal Computer
PIN – Personal Identification Number
PCA – Principal Component Analysis
PNN – Probabilistic Neural Network
RAM – Random Access Memory
ROC – Receiver Operating Characteristic
RCNN – Recurrent Convolutional Neural Networks
RNN – Recurrent Neural Networks
ROI – Region of Interest
RBF – Radial Basis Function
RF – Receptive Field
RPROP – Resilient Backpropagation
SOM – Self-Organizing Map
SICoNNets – Shunting Inhibitory Convolutional Neural Networks
SCNN – Siamese Convolutional Neural Network
SDNN – Space Displacement Neural Network
SPCNN – Sparse Convolutional Neural Network
SDLM – Stochastic Diagonal Levenberg-Marquardt
SVM – Support Vector Machine
3D – Three-Dimensional
TDNN – Time Delay Neural Networks
2D – Two-Dimensional
US – United States
CHAPTER 1
INTRODUCTION
1.1 Overview of Pattern Recognition
Pattern recognition has been an active area of research for the past half
century. The basic approach in pattern recognition is to transform raw images through a series of image processing algorithms before applying the final stage of
classification. Examples of applications of pattern recognition include speech
recognition, handwriting recognition, object recognition, etc. Figure 1.1 shows a
common pattern recognition flow. The choice of sensors, preprocessing techniques
and decision making techniques depend on the characteristics of the problem domain.
Figure 1.1: Typical pattern recognition flow
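As a minimal sketch of the flow in Figure 1.1, the stages can be viewed as functions chained in order. Everything in the block below is an illustrative assumption rather than part of this work: the function names, the hard-coded 2x2 image that stands in for a sensor, the min-max contrast stretch used as the preprocessing step, the explicit feature extraction stage, and the threshold rule used for decision making.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def acquire() -> Image:
    """Data acquisition: a hard-coded 2x2 image stands in for a sensor."""
    return [[0.0, 64.0], [128.0, 255.0]]


def preprocess(image: Image) -> Image:
    """Preprocessing: min-max contrast stretch of all pixels to [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat image
    return [[(p - lo) / span for p in row] for row in image]


def extract_features(image: Image) -> List[float]:
    """Feature extraction (assumed stage): one mean intensity per row."""
    return [sum(row) / len(row) for row in image]


def classify(features: Sequence[float]) -> str:
    """Decision making: toy threshold rule on the average feature value."""
    return "bright" if sum(features) / len(features) > 0.5 else "dark"


def run_pipeline(data, stages: Sequence[Callable]):
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        data = stage(data)
    return data


label = run_pipeline(acquire(), [preprocess, extract_features, classify])
print(label)
```

Because each stage is an ordinary function, a different preprocessing technique or decision rule can be substituted by changing one entry in the list, which mirrors the point that the choice of techniques depends on the problem domain.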
The first stage of a pattern recognition system is data acquisition. In this
stage, raw data or images are collected from sensors or capture devices. Image
preprocessing is then performed, where transformations such as image enhancement,
image restoration, compression and morphological processing are applied [13]. Image
enhancement is applied to highlight certain features of interest in an image by
applying contrast transformation, Region of Interest (ROI) processing and noise
filtering. Image restoration then improves the appearance of the image. This involves
recovering the original image from its degraded version by using a priori knowledge of the degradation phenomenon and applying the inverse process. Image restoration
typically applies deblurring and noise reduction algorithms. Compression is then performed
to remove redundant data in the image. Typically, the image is converted to image
file formats such as JPEG (Joint Photographic Experts Group) image compression