
Feature selection for EEG Based biometrics


Academic year: 2023


This thesis aims to evaluate electroencephalography (EEG) features and channels for biometrics and to propose a methodology that identifies individuals. In addition, to evaluate intra-individual variability, we recorded the EEG ten times for each subject, with each recording performed on a different day to reduce within-day effects. After acquiring the data, we calculated eight features for each channel: alpha/beta power ratio, alpha/theta power ratio, beta/theta power ratio, median frequency, PSD entropy, permutation entropy, sample entropy, and maximum Lyapunov exponent.

Furthermore, according to the scores calculated by feature selection, EEG channels over the occipital and right temporal areas contributed the most to identifying individuals.

Introduction

  • Introduction to biometrics
  • Introduction to EEG
  • EEG based biometrics
  • Research aim

Depending on the purpose, personal identification problems can be categorized into two types: verification and recognition (Jain et al. 2006). In this study, since we do not consider the recognition problem, personal identification will only represent the verification problem. Existing personal authentication techniques are based on three types of methods: what you have, what you know, and biometric characteristics (Ratha et al. 2001).

Personal identification systems based on a subject's possessions identify individuals by checking keys, such as a car key, an ID card, or even a credit card. Some personal identification systems, such as general access systems, rely on what you know, using case-specific personal identification numbers (PINs) (Miller 1994). A biometric system identifies individuals based on their physical, chemical, or behavioral characteristics (Jain et al. 2007).

Biometric features, also called biometric identifiers, must satisfy four requirements: (i) universality, (ii) distinctiveness, (iii) permanence, and (iv) collectability (Jain et al. 2007). The performance of a biometric system can be represented by two types of error rates: the false acceptance rate (FAR) and the false rejection rate (FRR). For biometric systems using a multiple-retry scenario, performance is generally reported as the false match rate (FMR) and the false non-match rate (FNMR).

However, in this study, we only consider biometric systems that do not allow multiple attempts. To measure the electromagnetic field generated by neurons, electroencephalography measures the electrical potential between a ground and a target electrode on the scalp (Nunez et al. 2006). Because the skull and scalp act as a capacitor, only low-frequency signals are observed on the scalp (Ramon et al. 2009).

Because EEG is a relatively inexpensive method for observing the human brain, a number of studies have developed EEG-based biometric systems (Del Pozo-Banos et al. 2014). Using multiple features, including AR coefficients, linear complexity, power spectral density, and phase synchronization, one biometric system identified three subjects with 82% accuracy (Bao et al. 2009). In another study, a 2-channel resting-state, eyes-closed EEG signal was used to verify 23 subjects with a 30% equal error rate (EER) (Miyamoto et al. 2009).

Figure 1. Example of ROC curve.
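As a minimal sketch of how the error rates and the ROC curve mentioned above relate, the following Python snippet computes FAR and FRR at one decision threshold and then sweeps the threshold to obtain ROC points. The function names, the "score at or above threshold means accept" rule, and the similarity scores are illustrative assumptions, not values or code from this study.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR), assuming a score >= threshold means 'accept'."""
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


def roc_points(genuine_scores, impostor_scores):
    """Sweep the threshold over all observed scores.

    Each point is (FAR, 1 - FRR), i.e. false acceptance rate against
    the rate at which genuine attempts are correctly accepted.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    points = []
    for t in thresholds:
        far, frr = far_frr(genuine_scores, impostor_scores, t)
        points.append((far, 1.0 - frr))
    return points


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.35, 0.62, 0.20]

far, frr = far_frr(genuine, impostor, threshold=0.5)
print(f"FAR = {far:.2f}, FRR = {frr:.2f}")
for point in roc_points(genuine, impostor):
    print(point)
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is the trade-off an ROC curve visualizes.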

Experimental Design

Participants and ethics approval

Data acquisition

Experimental procedure

Data Analysis

Preprocessing

Before calculation, the EEG signals were filtered with a finite impulse response (FIR) band-pass filter with a passband from 2 to 50 Hz. The power spectral density was computed from an autoregressive model whose coefficients a_m were obtained using the Yule-Walker method. Sample entropy is defined as the negative natural logarithm of the conditional probability that two sequences that are similar for m points remain similar at the next point (Richman & Moorman 2000).
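The band-pass filtering step described above can be sketched as follows. This is one possible design (a Hamming-windowed sinc filter) written with the Python standard library only; the text does not state the design method, filter length, or sampling rate, so `NUM_TAPS`, `FS_HZ`, and all function names here are assumptions for illustration.

```python
import cmath
import math


def fir_bandpass_taps(low_hz, high_hz, fs_hz, num_taps):
    """Design linear-phase band-pass taps with the windowed-sinc method."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for a symmetric design")
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * (high_hz - low_hz) / fs_hz
        else:
            # Ideal band-pass = ideal low-pass(high) - ideal low-pass(low).
            ideal = (math.sin(2 * math.pi * high_hz * k / fs_hz)
                     - math.sin(2 * math.pi * low_hz * k / fs_hz)) / (math.pi * k)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps


def fir_filter(signal, taps):
    """Causal FIR filtering, y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out.append(acc)
    return out


def gain_at(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    return abs(sum(h * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
                   for n, h in enumerate(taps)))


FS_HZ = 128.0   # assumed sampling rate; not stated in the text
NUM_TAPS = 257  # assumed filter length; not stated in the text
taps = fir_bandpass_taps(2.0, 50.0, FS_HZ, NUM_TAPS)

# Inspect the response below, inside, and above the 2-50 Hz passband.
for f in (0.0, 10.0, 60.0):
    print(f, gain_at(taps, f, FS_HZ))
```

A longer filter gives a sharper transition at the 2 Hz edge but delays the output by more samples, so the length is a trade-off rather than a fixed choice.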

To determine the conditional probability, each possible pair of sub-vectors is checked to see whether their distance falls within the threshold. If the time series consists of a single repeating pattern, the conditional probability becomes one and the sample entropy becomes zero. If a time series is completely irregular, the ordinal patterns (permutations) are evenly distributed, and permutation entropy is maximized.
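The two entropy measures described above can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used in the study: the Chebyshev distance, the "distance <= r" matching rule, the handling of undefined cases as infinity, and the parameter names `m`, `r`, `order`, and `delay` are choices made here for the example.

```python
import math
from collections import Counter


def sample_entropy(x, m, r):
    """SampEn = -ln(A / B), where B counts template pairs matching for m
    points and A counts pairs matching for m + 1 points (no self-matches)."""
    n = len(x)

    def count_matches(length):
        # Use the same n - m starting points for both template lengths.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined; reported here as infinity (assumption)
    return -math.log(a / b)


def permutation_entropy(x, order, delay=1, normalize=True):
    """Shannon entropy of the distribution of ordinal patterns of length `order`."""
    counts = Counter()
    for i in range(len(x) - (order - 1) * delay):
        window = [x[i + k * delay] for k in range(order)]
        # The ordinal pattern is the index order that sorts the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] += 1
    total = sum(counts.values())
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    if normalize:
        h /= math.log(math.factorial(order))  # maximum is ln(order!)
    return h


# A constant signal has a single pattern: the conditional probability is one.
print(sample_entropy([1.0] * 20, m=2, r=0.1))
# A monotonic ramp produces only one ordinal pattern.
print(permutation_entropy(list(range(50)), order=3))
```

With normalization, permutation entropy lies between zero (one pattern only) and one (all patterns equally likely), which makes values comparable across pattern lengths.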

If 𝛿𝑖(𝑡) is the distance between two trajectories in the ith dimension of the state space at time t, the Lyapunov exponent is defined as the average exponential rate at which this distance grows. Since the Lyapunov exponent is related to expansion or contraction in phase space, it is widely used to quantify chaos. For the calculation of the maximum Lyapunov exponent, we embedded the EEG signal in three dimensions with a time delay of one second.
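The three-dimensional embedding mentioned above is a time-delay embedding, which can be sketched as follows. The function name, the placeholder signal, and the sampling rate `FS_HZ` (needed to convert the one-second delay into samples) are assumptions for illustration; the sampling rate is not stated in the text.

```python
def delay_embed(x, dim, delay_samples):
    """Return delay vectors (x[i], x[i + d], ..., x[i + (dim - 1) * d])."""
    n_vectors = len(x) - (dim - 1) * delay_samples
    if n_vectors <= 0:
        raise ValueError("signal too short for this embedding")
    return [tuple(x[i + k * delay_samples] for k in range(dim))
            for i in range(n_vectors)]


FS_HZ = 128                 # assumed sampling rate; not stated in the text
signal = list(range(1000))  # placeholder samples standing in for one EEG channel

# Three dimensions with a one-second delay, as described in the text.
vectors = delay_embed(signal, dim=3, delay_samples=1 * FS_HZ)
print(len(vectors), vectors[0])
```

Each tuple is one point of the reconstructed trajectory; a maximum Lyapunov exponent estimate would then track how quickly initially close points of this trajectory separate over time.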

Feature selection

For the multi-class classification problem, the information gain 𝐺𝑎𝑖𝑛(𝑆, 𝐴) of an attribute 𝐴 relative to a collection of examples 𝑆 is the expected reduction in entropy obtained by knowing the feature 𝐴 (Mitchell 1997). The initial training data contained the first three recordings of the genuine subject and the complete data of the other subjects, excluding the impostors.
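As a small worked example of information gain, the sketch below uses the standard textbook form, Gain(S, A) = Entropy(S) - sum over values v of A of (|S_v| / |S|) * Entropy(S_v). It assumes the attribute has already been discretized into categories; how the continuous EEG features were binned is not described here, so the labels and attribute values are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy(S) minus the weighted entropy of the subsets S_v split by A."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Made-up example: two subjects, two recordings each.
subjects = ["s1", "s1", "s2", "s2"]
informative = ["low", "low", "high", "high"]  # separates the subjects perfectly
uninformative = ["a", "b", "a", "b"]          # tells nothing about the subject

print(information_gain(informative, subjects))
print(information_gain(uninformative, subjects))
```

An attribute that splits the examples into pure subsets removes all of the class entropy, while one that is independent of the class removes none, so ranking features by this score favors those that separate individuals.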

The number of genuine recordings in the initial training data was set to the minimum amount of data needed to build the model. Since the number of recordings for two of the subjects was too small, we excluded them from the test and assigned them to the impostors. For statistical analysis of the feature selection results, Wilcoxon's signed-rank test was performed between groups of features.

In general, the right-hemisphere scores were much higher than the left-hemisphere scores (Figure 5). The differences between the right and left hemisphere scores were significant for all three feature selection scores (𝑝 < 0.01). To determine the optimal number of features, we performed validation while varying the number of features (Figure 6).

As a result, the half total error rate (HTER) of the feature selection algorithms was significantly lower than that of randomly ordered features when the number of features was small. The area under the ROC curve of the feature selection algorithms remained significantly higher than that of the random order until the p-value first exceeded 0.05 at a certain number of features n for each of the Fisher score, ReliefF, and information gain algorithms. When each recording was tested, the best four features selected by information gain were 'Alpha/Beta'/'O1', 'Entropy'/'O1', 'Entropy'/'O2', and 'Median'/'P8'.
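The two summary measures reported above can be sketched as follows. The half total error rate is taken here as the mean of FAR and FRR, and the area under the ROC curve is computed through its rank interpretation (the probability that a genuine score exceeds an impostor score, with ties counted as one half). The function names and scores are illustrative assumptions, not results from this study.

```python
def half_total_error_rate(far, frr):
    """HTER: the average of the false acceptance and false rejection rates."""
    return (far + frr) / 2.0


def roc_auc(genuine_scores, impostor_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for g in genuine_scores:
        for i in impostor_scores:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine_scores) * len(impostor_scores))


# Made-up similarity scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.35, 0.62, 0.20]

print(half_total_error_rate(far=0.25, frr=0.25))
print(roc_auc(genuine, impostor))
```

HTER depends on the chosen decision threshold, whereas the area under the ROC curve summarizes performance over all thresholds, which is why the two measures can rank feature sets differently.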

Compared to the randomly generated order, our feature selection algorithms performed better when the number of features did not exceed 40. In the adaptive scenario, the information gain algorithm performed much better than the other algorithms, especially when the number of features was four. Although the results of the three feature selection algorithms were slightly different, we still found points worth comparing.

Even though sample entropy and permutation entropy are relatively robust to noise (Aboy et al. 2007; Ramdani et al. 2009), these types of features can yield high complexity values with an EEG device that has a low signal-to-noise ratio.

Figure 4. (A) Fisher scores in fourteen EEG channels, (B) Fisher scores in eight feature groups, (C) ReliefF scores in fourteen EEG channels, (D) ReliefF scores in eight feature groups, (E) Information gain in fourteen EEG channels, (F) Inf

Figures

Figure 2. Names and positions of the international 10-20 system (Oostenveld & Praamstra 2001)
Figure 3. (A) Experimental environment, (B) Emotiv EPOC headset on a subject, (C) Names and positions of Emotiv EPOC electrodes,
