Convolutional neural networks for face recognition and finger-vein biometric identification.


PSZ 19:16 (Pind. 1/07)

DECLARATION OF THESIS / UNDERGRADUATE PROJECT PAPER AND COPYRIGHT

Author’s full name :

Date of birth :

Title : CONVOLUTIONAL NEURAL NETWORKS FOR FACE RECOGNITION AND FINGER-VEIN BIOMETRIC IDENTIFICATION

Academic Session:

I declare that this thesis is classified as :

I acknowledge that Universiti Teknologi Malaysia reserves the following rights :

1. The thesis is the property of Universiti Teknologi Malaysia.

2. The Library of Universiti Teknologi Malaysia has the right to make copies for the purpose of research only.

3. The Library has the right to make copies of the thesis for academic exchange.

Certified by :

NOTES : * If the thesis is CONFIDENTIAL or RESTRICTED, please attach the letter from the organisation stating the period of, and reasons for, confidentiality or restriction.

UNIVERSITI TEKNOLOGI MALAYSIA

√

CONFIDENTIAL (Contains confidential information under the Official Secrets Act 1972)*

RESTRICTED (Contains restricted information as specified by the organisation where research was done)*

OPEN ACCESS I agree that my thesis may be published as online open access (full text)

2013/2014/2

SYAFEEZA BINTI AHMAD RADZI

810324-71-5024

MOHAMED KHALIL BIN MOHD HANI


“I hereby declare that I have read this thesis and in my opinion this thesis is sufficient in terms of scope and quality for the award of the degree of Doctor of Philosophy (Electrical Engineering)”

Signature :


ACKNOWLEDGEMENT

I am very blessed to have completed this thesis. I would like to take this opportunity to express my gratitude to the people who were involved, directly or indirectly, in its completion.

Firstly, I would like to express my sincere appreciation to my supervisor, Professor Dr. Mohamed Khalil bin Mohd Hani, for his useful criticism, guidance and advice throughout this journey. I gained valuable knowledge under his supervision. It was also a privilege to work closely with the members of the VeCAD Lab, namely Rabia, Vishnu, Lee Yee Hui, Sia Chen Wei, Moganesh and Liew Shan Sung. I am very fortunate to have had them with me through the ups and downs of this research, and I am grateful for their constructive ideas and their concern.

Most importantly, I would like to thank my family, especially my parents, husband and daughter, for their continuous encouragement, patience and boundless support in all my endeavours.

Last but not least, I would like to acknowledge Universiti Teknikal Malaysia


ABSTRACT


ABSTRAK

The Convolutional Neural Network (CNN), a variant of the Multilayer Perceptron (MLP), has demonstrated its capability in complex recognition tasks, particularly visual pattern recognition. However, the classical CNN architecture, LeNet-5, on which most solutions are based, is highly computationally intensive. It also suffers from excessively long training times because of its number of layers, which ranges from six to eight. In this study, a CNN model of reduced complexity is proposed for face recognition and finger-vein biometric identification. A simpler architecture is obtained by fusing the convolution and subsampling layers into a single layer, combined with a partial connection scheme between the first two layers of the network. As a result, the number of layers is reduced to four. The number of feature maps in each layer is optimized according to the type of database used. Consequently, the number of network parameters (including neurons, trainable parameters and connections) is significantly reduced and, in particular, generalization ability is improved. The Stochastic Diagonal Levenberg-Marquardt algorithm


TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENT iv

ABSTRACT v

ABSTRAK vi

TABLE OF CONTENTS vii

LIST OF TABLES xii

LIST OF FIGURES xv

LIST OF ABBREVIATIONS xviii

1 INTRODUCTION 1

1.1 Overview of Pattern Recognition 1

1.2 Neural Networks in Pattern Recognition Problems 2

1.3 Convolutional Neural Network 3

1.4 Biometric Pattern Recognition 4

1.4.1 Summary of Existing Face Recognizers 7

1.4.2 Summary of Work on Finger-vein Pattern Recognition 8

1.5 Problem Statement 8

1.6 Research Objectives 13

1.7 Scope of Work 14

1.8 Research Contributions 14

1.9 Thesis Organization 16

2 LITERATURE REVIEW 17

2.1 Basic CNN Architectures 17

2.2 Previous Work applying CNN in Face Recognition 19

2.3 Related Previous Work on Finger-vein Biometric


2.4 Other State-of-the-Art CNN Architectures 27

2.4.1 Siamese CNN 27

2.4.2 Space Displacement Neural Networks 31

2.4.3 Shunting Inhibitory Convolutional Neural Networks 32

2.4.4 Sparse Convolutional Neural Networks 34

2.4.5 Multiscale Convolutional Neural Networks 34

2.4.6 Multiple Circular Path Convolutional Neural Networks 35

2.4.7 Multi-column Deep Neural Networks 36

2.4.8 3D Convolutional Neural Networks 37

2.4.9 Recurrent Convolutional Neural Networks 39

2.5 Summary 39

3 CONVOLUTIONAL NEURAL NETWORK - ALGORITHMS AND ARCHITECTURE 40

3.1 Multilayer Perceptron Neural Network (MLPNN) 40

3.1.1 Activation Functions 42

3.1.2 Weight Initialization Methods 42

3.1.3 Normalization Method 44

3.2 Neural Network Training 45

3.2.1 Gradient Descent Backpropagation Algorithm 46

3.2.2 Resilient Backpropagation 49

3.3 LeNet-5 Convolutional Neural Networks 49

3.3.1 Algorithms in LeNet-5 50

3.3.2 Convolutional Neural Network Training 54

3.4 Summary 55

4 PROPOSED CONVOLUTIONAL NEURAL NETWORK: MODELLING, ARCHITECTURE AND ALGORITHM 57

4.1 Methodology 57

4.1.1 Cross-validation Technique to optimize


4.2.2 CNN with Fused Convolution/Subsampling 63

4.2.3 Connection Schemes 67

4.3 Proposed Training Algorithm 69

4.3.1 Stochastic Diagonal Levenberg-Marquardt (SDLM) 69

4.3.2 Proposed Modifications to SDLM Algorithm 71

4.4 Proposed Training Scheme to recognize New Categories without Full CNN Training 73

4.5 Generalizing the CNN Architecture for New Problem Domain 76

4.6 Summary 77

5 EXPERIMENTAL WORK AND RESULTS FOR CNN IN FACE RECOGNITION 78

5.1 Performance Evaluation Procedures 78

5.2 Data Preparation 82

5.2.1 Face databases 82

5.3 Experimental Work and Results of the Proposed Baseline CNN Model 85

5.3.1 Results of Cross-validation 86

5.3.2 Best Connection Scheme 87

5.3.3 The Most Suitable Normalization and Weight Initialization Methods 88

5.3.4 The Optimum Input Image Size 89

5.3.5 Benchmarking Results 90

5.3.5.1 Benchmarking of Accuracy 90

5.3.5.2 Benchmarking of Execution and Convergence Time 92

5.3.5.3 Analysis and Benchmarking of Model Complexity 93

5.3.5.4 Comparison between the proposed CNN Model and LeNet-5 95

5.4 Experimental Work and Results of Proposed CNN


Weight Initialization Methods 97

5.4.4 The Optimum Input Image Size 99

5.4.5 Benchmarking Results 100

5.5 Experimental Results of CNN Model Targeted for Face Images of the “Extreme Challenge” Type 101

5.5.4 The Optimum Input Image Size 105

5.5.5 Benchmarking Results 107

5.6 Analysis of the Proposed Modification on SDLM Algorithm 107

5.6.1 Comparison between the Proposed Modification on SDLM Algorithm and Standard Backpropagation 109

5.7 Analysis of Training Scheme to recognize New Categories without Full CNN Training 111

5.8 Conclusion 112

6 EXPERIMENTAL WORK AND RESULTS FOR CNN IN FINGER-VEIN BIOMETRIC IDENTIFICATION 113

6.1 Generation of Finger-vein Database for Biometric Identification 113

6.2 Performance Evaluation of Finger-vein Biometric Identification 118

6.2.1 VeCAD-UTM database 118

6.2.5 The Optimum Input Image Size 123

6.2.6 Benchmarking Results 124


REFERENCES 132


LIST OF TABLES

TABLE NO. TITLE PAGE

1.1 Characteristic comparison of biometric application [1] 6

1.2 History of some biometric development [2] 6

1.3 Previous work on face recognition based on CNN 7

1.4 Previous work on face recognition applying non-CNN methods 7

1.5 Accuracy achieved by Computational Intelligence (CI) approach for finger-vein identification system 8

1.6 Detailed information and accuracy achieved by conventional approach for finger-vein identification system 8

2.1 Face Recognition System using CNN 22

2.2 Conventional methods of finger-vein identification 25

2.3 Algorithms applied by CI approach for finger-vein identification system 26

2.4 Accuracy achieved by CI-based methods for finger-vein identification 27

2.5 Related previous work on finger-vein biometric system and its recognition speed 27

3.1 Input/output relation and symbol for each activation function 43

3.2 Each row indicates which feature maps in S2 are combined by the particular feature maps of C3 [3] 53

4.1 Variant type of connections between C1 and C2 layer 68

4.2 Connection scheme between C1 and C2 layer for AT&T database 68

4.4 Connection scheme between C1 and C2 layer for AR Purdue database 69

4.5 The values of η_g as given in [3] 73

5.1 Pose variations of FERET database 85

5.2 Performance of the model tested for AT&T database 87


5.6 Accuracy for different combinations of weight initialization algorithm and normalization methods for AT&T database 88

5.7 Accuracy of each input size for AT&T database 89

5.8 Benchmarking of accuracy of other CNN face recognizers on AT&T database 91

5.9 Benchmarking with non-CNN approach 91

5.10 Benchmarking of execution time and convergence learning time against other face recognizers 92

5.11 Benchmarking of model complexity against previous work that applies CNN (not restricted to face recognition problems) 95

5.12 Comparison between fused architecture and LeNet-5 96

5.13 Performance of the model tested for AR database 97

5.15 Connection scheme between C1 and C2 layer for AR Purdue database 98

5.16 Parameter Setup for AR Purdue database 99

algorithm and normalization methods for AR Purdue database 99

5.18 Parameter setup for each input size for AR Purdue database 100

5.19 Accuracy of each input size for AR Purdue database 100

5.20 Benchmarking of face identification system using AR Purdue database with existing works 102

5.21 Performance of the model tested for FERET database 103

5.22 Variant type of connections between C1 and C2 layer for FERET database 103

5.23 Connection scheme between C1 and C2 layer for FERET database 104

5.24 Parameter Setup for FERET database 104

algorithm and normalization methods for FERET database 105

5.26 Parameter setup for each input size for FERET database 105

5.27 Accuracy of each input size for FERET database 105

5.28 Benchmarking of face identification system using FERET database with existing works 107


5.30 Result of BP training using combination of different initial learning rate and learning rate fading 110

5.31 Result of BP training using combination of different initial learning rate and learning rate fading 111

5.32 Result of accepting new subject(s) from JAFFE database into the system 112

6.1 Performance of the model tested for VeCAD-UTM finger-vein database 120

6.3 Connection scheme between C1 and C2 layer for VeCAD-UTM database 120

6.4 Parameter Setup for Case A and Case B 121

6.5 Accuracy for different combinations of weight initialization algorithm and normalization methods for Case A VeCAD-UTM finger-vein database 121

algorithm and normalization methods for Case B

6.7 Accuracy of each input size for VeCAD-UTM database 123

6.8 Accuracy achieved by different number of subjects 124

6.9 Benchmarking of accuracy achieved by the proposed method and other CI methods for finger-vein identification system 125

6.10 Benchmarking of accuracy achieved by the proposed method and conventional methods for finger-vein identification system 125

6.11 Benchmarking on recognition speed for finger-vein biometric


LIST OF FIGURES

FIGURE NO. TITLE PAGE

1.1 Typical pattern recognition flow 1

1.2 Example of CNN Architecture 4

1.3 Taxonomy of biometric methods 5

1.4 Challenges faced by biometric face recognition system 12

1.5 Inappropriate lighting in finger-vein samples of

2.1 Handwriting zip code application using CNN [4] 17

2.2 LeNet-5 CNN architecture for handwriting recognition 18

2.3 CNN architecture by Y. Simard et al. [5] with fused convolution/subsampling 18

2.4 CNN architecture for fingerprint verification process [6] 28

2.5 Example of genuine and impostor pairs 29

2.6 Illustration of SCNN framework 30

2.7 A sample of ZIP code image and its possible segmentation 32

2.8 INSEG and OUTSEG segmentation for word recognition 32

2.9 Three schemes of SICoNNets. (a) Full-Connection. (b) Toeplitz-Connection. (c) Binary-connection [7] 33

2.10 Shunting neuron model [7] 33

2.11 Sketch of the SPCNN connectivity [8]. 34

2.12 Multiscale Convolutional Neural Networks (MSCNN) architecture for traffic sign recognition [9]. 35

2.13 Three types of network paths connecting the input and the hidden layers in the Multiple Circular Path Convolutional Neural Network (MCPCNN) (a) Full-connection path (b) Self-correlation path (c) Neighbourhood-correlation


2.14 Multi-column Deep Neural Networks (MCDNN) and its basic building block (a) Deep Neural Network (DNN) architecture (b) The dataset is preprocessed (P) before the training starts; during training all original or pre-processed images are randomly distorted before each epoch (D). (c) MCDNN architecture: the input image is pre-processed by n different preprocessors P_0 to P_{n−1} and final predictions are obtained by averaging individual predictions of each DNN [11]. 37

2.15 Comparison between Two-Dimensional (2D) and Three-Dimensional (3D) convolutions 38

2.16 3D CNN architecture for human action recognition 38

2.17 Recurrent Convolutional Neural Networks (RCNN) architecture for dialog act classification 39

3.1 (a) Biological neuron (b) Modeling in NN 41

3.2 (a) Training phase (b) Test phase 48

3.3 An example of a LeNet-5 CNN architecture for handwriting recognition [3] 51

3.4 Process in convolutional layer of CNN 52

3.5 Process in subsampling layer of CNN 52

3.6 LeNet-5 CNN architecture 54

4.1 Prediction of early stopping rule through cross-validation technique [12]. 59

4.2 CNN architecture by Y. Simard et al. [5] with fused convolution/subsampling 64

4.3 The proposed four-layer CNN architecture 64

4.4 (a) convolution and subsampling of LeNet-5 CNN architecture and (b) fused convolution/subsampling in proposed CNN 66

4.5 Conceptual view of generalizing the feature extraction layer to other database in the system 76

5.1 Four different types of weight initialization methods 81

5.2 Sample of images from AT&T database 83

5.3 Sample of images from AR Purdue database 83

5.4 Preprocessing stages for AR Purdue database 84

5.5 Sample of images from FERET database 85

5.6 Preprocessing stages for FERET database 85

5.7 Validation error rates in cross-validation of the six tested


5.10 Validation error rates in cross-validation of the seven tested CNN models for AR database 97

5.11 Misclassified samples for AR database 99

5.12 The proposed architecture for AR Purdue database (referred to here as the 15-45-130 model) 100

5.13 Feature maps at each layer after training for AR database 101

CNN models for FERET 102

5.15 The proposed architecture for FERET database (referred to here as the 15-47-130 model) 106

5.16 Feature maps at each layer after training for FERET database 106

5.17 Error rate on the training dataset 108

5.18 Error rate on the test dataset 108

5.19 Classification accuracy performance. (Note: Case A is for the proposed learning with repeating scheme and Case B is the original SDLM) 109

6.1 Non-ideal vein samples of SDUMLA-HMT finger-vein database 113

6.2 Sample of images from VeCAD-UTM database 114

6.3 Example of vertical cut 115

6.4 Example of horizontal cut 116

6.5 Image samples for 5 subjects (a) Case A and (b) Case B 117

6.6 Preprocessing stages and segmentation process for Case A 117

6.7 Preprocessing stages for Case B 117

CNN models for VeCAD-UTM finger-vein database 119

6.9 Training error for Case A and Case B 122

6.10 Test error for Case A and Case B 122

6.11 Rate of accuracy during training for Case A and Case B 122

6.12 The proposed model for VeCAD-UTM database (referred to here as the 5-13-50 model) 123


LIST OF ABBREVIATIONS

AI – Artificial Intelligence

ANN – Artificial Neural Network

BP – Backpropagation

CI – Computational Intelligence

CNNs – Convolutional Neural Networks

CMC – Cumulative Match Characteristic

DNN – Deep Neural Network

EBGM – Elastic Bunch Graph Matching

EER – Equal Error Rate

FERET – Face Recognition Technology

FAR – False Acceptance Rate

FRR – False Rejection Rate

FPGA – Field Programmable Gate Array

GA – Genetic Algorithm

GPU – Graphics Processing Unit

HD – Hausdorff Distance

HMM – Hidden Markov Model

LED – Light-Emitting Diode

LDA – Linear Discriminant Analysis

MATLAB – Matrix Laboratory

MSE – Mean Square Error

MHD – Modified Hausdorff Distance

MCDNN – Multi-column Deep Neural Networks

MLP – Multilayer Perceptron

MCPCNN – Multiple Circular Path Convolutional Neural Network

MSCNN – Multiscale Convolutional Neural Networks

NIR – Near Infrared


PC – Personal Computer

PIN – Personal Identification Number

PCA – Principal Component Analysis

PNN – Probabilistic Neural Network

RAM – Random Access Memory

ROC – Receiver Operating Characteristic

RCNN – Recurrent Convolutional Neural Networks

RNN – Recurrent Neural Networks

ROI – Region of Interest

RBF – Radial Basis Function

RF – Receptive Field

RPROP – Resilient Backpropagation

SOM – Self-Organizing Map

SICoNNets – Shunting Inhibitory Convolutional Neural Networks

SCNN – Siamese Convolutional Neural Network

SDNN – Space Displacement Neural Network

SPCNN – Sparse Convolutional Neural Network

SDLM – Stochastic Diagonal Levenberg-Marquardt

SVM – Support Vector Machine

3D – Three-Dimensional

TDNN – Time Delay Neural Networks

2D – Two-Dimensional

US – United States


CHAPTER 1

INTRODUCTION

1.1 Overview of Pattern Recognition

Pattern recognition has been an active area of research for half a century. The basic approach in pattern recognition is to transform raw images through a series of image processing algorithms before applying the final stage of classification. Examples of pattern recognition applications include speech recognition, handwriting recognition and object recognition. Figure 1.1 shows a common pattern recognition flow. The choice of sensors, preprocessing techniques and decision-making techniques depends on the characteristics of the problem domain.

Figure 1.1: Typical pattern recognition flow
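As a rough illustration of the flow described above, the following minimal pure-Python sketch chains a preprocessing stage (a 3×3 median filter for noise reduction, followed by linear contrast stretching as a simple enhancement step) into a decision stage (nearest-neighbour matching against stored templates). The function names, the choice of filters and the Euclidean nearest-neighbour classifier are hypothetical illustrations, not the methods used in this thesis, and images are assumed to be small grey-level arrays stored as lists of rows.

```python
from statistics import median

# Hypothetical helpers illustrating a generic preprocessing -> classification flow.
# Not the thesis's method; images are assumed to be grey-level lists of rows.


def median_filter3(img):
    """Noise filtering: replace each pixel by the median of its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Edge padding: clamp coordinates so border pixels reuse their nearest neighbours.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[median(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)]
            for y in range(h)]


def contrast_stretch(img):
    """Enhancement: linearly map the intensity range [min, max] onto [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # flat image: there is no contrast to stretch
        return [[0.0] * len(row) for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]


def preprocess(img):
    """Hypothetical preprocessing chain: denoise first, then enhance contrast."""
    return contrast_stretch(median_filter3(img))


def distance(a, b):
    """Euclidean distance between two equally sized images."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) ** 0.5


def classify(img, templates, labels):
    """Decision stage: label of the stored template closest to the preprocessed input."""
    x = preprocess(img)
    dists = [distance(x, preprocess(t)) for t in templates]
    return labels[dists.index(min(dists))]


# Usage: a low-contrast, noisy square should match the "square" template.
square = [[1.0 if 2 <= y < 6 and 2 <= x < 6 else 0.0 for x in range(8)] for y in range(8)]
stripe = [[1.0 if x < 4 else 0.0 for x in range(8)] for y in range(8)]
query = [[120.0 if 2 <= y < 6 and 2 <= x < 6 else 100.0 for x in range(8)] for y in range(8)]
query[0][7] = 255.0  # a single impulse-noise pixel
print(classify(query, [square, stripe], ["square", "stripe"]))  # square
```

Replacing the preprocessing chain or the classifier only changes `preprocess` or `classify`, which reflects the point above that these choices depend on the problem domain.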

The first stage of a pattern recognition system is data acquisition, in which raw data or images are collected from sensors or capture devices. Image preprocessing is then performed, in which transformations such as image enhancement, image restoration, compression and morphological processing are applied [13]. Image enhancement highlights certain features of interest in an image by applying contrast transformation, Region of Interest (ROI) processing and noise filtering. The image is then restored to improve its appearance. Restoration attempts to recover an original image that has been degraded, using a priori knowledge of the degradation phenomenon and applying the inverse process; typical restoration algorithms perform deblurring and noise reduction. Compression is then performed

to remove redundant data in the image. Typically, the image is converted to image

file formats such as JPEG (Joint Photographic Experts Group) image compression