
3.3 Classifiers and Pattern Recognition Methods

Figure 3.8: Delta-delta (Δ²) MFCC extraction over time.

Computations in each output and hidden layer are given below:

O(x) = G(b^(2) + W^(2) h(x))  (3.2)

h(x) = s(b^(1) + W^(1) x)  (3.3)
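
As a minimal sketch of how these two equations could be evaluated, assuming a tanh hidden activation for s and a softmax output activation for G (the activation choices are not fixed by the text above), the forward pass might look like:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP, following Eqs. (3.2)-(3.3).

    h(x) = s(b1 + W1 x)    with s = tanh (assumed)
    O(x) = G(b2 + W2 h(x)) with G = softmax (assumed)
    """
    h = np.tanh(b1 + W1 @ x)           # Eq. (3.3): hidden-layer activations
    z = b2 + W2 @ h                    # pre-activation of the output layer
    z = z - z.max()                    # numerical stability for softmax
    o = np.exp(z) / np.exp(z).sum()    # Eq. (3.2): output class probabilities
    return o, h

# Example with random weights: 13 input features, 32 hidden units, 5 classes
rng = np.random.default_rng(0)
x = rng.normal(size=13)
W1, b1 = rng.normal(size=(32, 13)), np.zeros(32)
W2, b2 = rng.normal(size=(5, 32)), np.zeros(5)
probs, _ = mlp_forward(x, W1, b1, W2, b2)
print(probs, probs.sum())  # probabilities sum to ~1.0
```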

3.3.2 RBFN

A Radial Basis Function Network (RBFN) is structurally similar to an MLP. We used an RBFN in our research to identify speakers by analyzing features extracted from speech samples. One of the biggest advantages of an RBFN is that it is applicable in almost any number of dimensions, because it places few restrictions on how the data are handled. The method works in the Euclidean space R^n, in which the function is:

s(x) = Σ_{j=1}^{m} λ_j φ(||x − x_j||),  x ∈ R^n  (3.4)

Here, the x_j are data points, φ is a univariate function, and the λ_j are scalar parameters [20]. Figure 3.10 shows the structure of an RBFN.

Figure 3.10: Representation of the Radial Basis Function Network (RBFN) model.
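
The following is a small illustrative sketch of Eq. (3.4), assuming a Gaussian basis function φ(r) = exp(−r²/2σ²) and centres x_j taken directly from the training points; both choices are assumptions rather than specifications from the text above:

```python
import numpy as np

def rbf_interpolate(X_train, y_train, sigma=1.0):
    """Fit the weights lambda_j of Eq. (3.4) by solving the linear system
    Phi @ lam = y, where Phi[i, j] = phi(||x_i - x_j||)."""
    def phi(r):
        return np.exp(-r**2 / (2 * sigma**2))   # assumed Gaussian basis function
    dists = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
    Phi = phi(dists)
    lam = np.linalg.solve(Phi + 1e-8 * np.eye(len(X_train)), y_train)

    def s(x):
        # s(x) = sum_j lambda_j * phi(||x - x_j||), Eq. (3.4)
        return lam @ phi(np.linalg.norm(X_train - x, axis=1))
    return s

# Toy usage: interpolate a 1-D function through five sample points
X = np.linspace(0, 1, 5).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel()
s = rbf_interpolate(X, y, sigma=0.3)
print(s(np.array([0.5])))
```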

3.3.3 Decision Tree

Decision trees are a decision-making tool that uses a tree-like model of options and their possible outcomes. The simplest way to describe the algorithm is that it is built entirely from conditional control statements: each internal node tests a feature, and each leaf assigns a class.

Figure 3.11: Decision Tree.
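
As an illustrative sketch (not taken from the thesis), a decision tree can be trained with scikit-learn and printed as the set of conditional statements it has learned:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

# The learned tree is literally a hierarchy of if/else conditions on the features.
print(export_text(clf, feature_names=list(iris.feature_names)))
```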

3.3.4 Gradient Boosting Classifier

Gradient boosting classifiers are a family of machine learning algorithms that combine a number of weak learning models, typically decision trees, into a single strong predictive model. The implementation process of gradient boosting is:

• First we have to fit the model

• Then we have to tune the model’s parameters and hyperparameters

• Next, we have to make predictions

• Finally, on the basis of those predictions we can interpret the results

Figure 3.12: Gradient Boosting Classifier.
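
A hedged sketch of these four steps, using scikit-learn’s GradientBoostingClassifier and synthetic stand-in data (the actual library and feature set are not stated in this section):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in for speech-feature vectors (hypothetical data)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Steps 1-2: fit the model and tune its hyperparameters
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 200],
                "learning_rate": [0.05, 0.1],
                "max_depth": [2, 3]},
    cv=3,
)
grid.fit(X_tr, y_tr)

# Step 3: make predictions
y_pred = grid.best_estimator_.predict(X_te)

# Step 4: interpret the results on the basis of those predictions
print(grid.best_params_)
print(classification_report(y_te, y_pred))
```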

3.3.5 Random Forest

Random forests work by training a large number of decision trees. The output of a random forest is the class chosen by the majority of the trees. Random decision forests address the problem of individual decision trees over-fitting their training set. As a result, they give better results than single decision trees in most cases, although they are often less accurate than gradient-boosted trees.

Figure 3.13: Random Forest.
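
As an illustrative sketch with scikit-learn on synthetic stand-in data, the individual trees’ votes can be inspected alongside the forest’s aggregated prediction:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for feature vectors (hypothetical data)
X, y = make_classification(n_samples=300, n_features=10, random_state=1)
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

# Each tree makes its own prediction for the first sample; scikit-learn's forest
# aggregates the trees (by averaging their class probabilities, which behaves
# like a majority vote when the trees output hard class labels).
tree_preds = np.array([tree.predict(X[:1])[0] for tree in forest.estimators_])
print("votes per class:", np.bincount(tree_preds.astype(int)))
print("forest prediction:", forest.predict(X[:1]))
```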

3.3.6 Naive Bayes

Naive Bayes is a classification technique based on Bayes’ theorem together with an assumption of predictor independence. In simple terms, a Naive Bayes classifier assumes that the presence of one feature in a class is unrelated to the presence of any other feature.
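
A brief sketch of this assumption using scikit-learn’s GaussianNB, which additionally assumes Gaussian feature likelihoods (an assumption not made explicit above):

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Hypothetical feature vectors; each feature is modelled independently per class.
X, y = make_classification(n_samples=200, n_features=8, random_state=2)
nb = GaussianNB().fit(X, y)

# P(class | x) is proportional to P(class) * product_i P(x_i | class),
# i.e. the features are treated as conditionally independent given the class.
print(nb.predict(X[:3]))
print(nb.predict_proba(X[:3]).round(3))
```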

3.3.7 Logistic Regression

Logistic regression predicts an output and assigns it to a category, so the result must be a categorical or discrete value, for example Yes or No, 0 or 1, True or False. In logistic regression we fit an ”S”-shaped logistic function that maps the input to a value between the two extremes 0 and 1, which is then used to predict the desired output.

Figure 3.14: Logistic Regression Classifier.
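
The ”S”-shaped function referred to here is the sigmoid σ(z) = 1 / (1 + e^(−z)). A small illustrative sketch with scikit-learn, showing that the predicted probability is the sigmoid of the linear score and is thresholded to give the discrete class:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=3)
lr = LogisticRegression().fit(X, y)

# The model outputs a probability between 0 and 1 via the sigmoid ...
z = lr.decision_function(X[:3])            # linear score w.x + b
p = 1.0 / (1.0 + np.exp(-z))               # S-shaped logistic function
print(p.round(3), lr.predict_proba(X[:3])[:, 1].round(3))  # identical values

# ... which is thresholded at 0.5 to give the discrete class (0 or 1).
print(lr.predict(X[:3]))
```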

3.3.8 Support-vector-machine Classifier

SVM is a classifier algorithm that is used for regression, classification, and outlier detection [26]. The fundamental goal of SVM is to split the n-dimensional feature space into classes using a line or decision boundary so that data points can be easily assigned to the appropriate class. This decision boundary is also called the hyperplane [25].

Support vector machines have the advantage of being effective in high-dimensional spaces, and they remain effective even when the number of dimensions exceeds the number of samples. Because the decision boundary is constructed from only a subset of the training points (the support vectors), they are also memory-efficient.

For instance, the SVM method can be used to classify male and female voices. To begin, we must train our model with a variety of male and female voices so that the SVM can learn to distinguish between male and female speakers based on their characteristics. The support vectors are the extreme cases of the female and male data, and the decision boundary between these two sets is established using those extreme cases.

Figure 3.15: Support-Vector-Machine (SVM).
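
A hedged sketch of the male/female voice example with scikit-learn’s SVC, assuming the voices have already been reduced to fixed-length feature vectors (the 13-dimensional, MFCC-like features below are hypothetical stand-ins):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical training data: rows are per-utterance feature vectors
# (e.g. averaged MFCCs); labels are 0 = male, 1 = female.
rng = np.random.default_rng(4)
X_male = rng.normal(loc=-1.0, size=(50, 13))
X_female = rng.normal(loc=+1.0, size=(50, 13))
X = np.vstack([X_male, X_female])
y = np.array([0] * 50 + [1] * 50)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)

# The decision boundary is defined only by the extreme cases (support vectors).
print("support vectors per class:", svm.named_steps["svc"].n_support_)
print("predicted label for a new utterance:",
      svm.predict(rng.normal(loc=1.0, size=(1, 13))))
```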

3.3.9 K-Nearest Neighbor

In K-Nearest Neighbor, the symbol ’K’ denotes the number of nearest neighbors used to predict or categorize a new, unknown data point. As an analogy, we often share many characteristics with our nearest peers, and we tend to form friendships with the people with whom we have the most in common. The KNN algorithm uses the same approach.

Its goal is to find all of the nearest neighbors of a new, unknown data point in order to determine what class it belongs to. It is a distance-based method. To classify the results correctly, we must first choose the value of K, which should always be an odd number.

If K is set to an even number, a situation can arise in which both classes receive an equal number of votes. Choosing an odd value for K is preferable because such a tie between the two classes can then never occur: one of the two classes will always be in the majority [35]. The following points summarize the effect of selecting a smaller or larger K value on the model:

Increased K value: when the value of K increases, underfitting arises; in this situation the model cannot learn adequately from the training data.

Smaller K value: when the value of K is smaller, overfitting develops; the model captures all of the training data, including the noise, and will therefore perform poorly on the test data.

Figure 3.16: K-Nearest Neighbor.
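
A minimal sketch with scikit-learn’s KNeighborsClassifier on synthetic stand-in data, trying only odd values of K so that ties between the two classes cannot occur:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical two-class data standing in for speech-feature vectors
X, y = make_classification(n_samples=300, n_features=10, random_state=5)

# Odd K avoids ties; very small K tends to overfit, very large K to underfit.
for k in [1, 3, 5, 9, 21, 51]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"K={k:2d}  CV accuracy={score:.3f}")
```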

3.3.10 Gaussian Mixture Model

Gaussian mixture models are effective clustering models for classifying audio by their ability to distinguish between sounds, for example differentiating between the instruments in a song, or separating a speaker’s voice from the background noise when he speaks to his voice assistant. A Gaussian mixture model is a probability distribution formed as a weighted sum of Gaussian components.

Figure 3.17: Gaussian Mixture Model.

In the figure above, the clusters are clearly defined because the data are grouped into three clusters; however, two of the cluster groups intersect. Thus, there are three clusters: one made of blue points, one made of yellow points, and another made of both blue and yellow points. GMM partitions the dataset into distinct clusters, with the advantage that no relationship needs to be specified between the data points contained within a cluster. Because the model is able to assign data points to clusters and then learn from them, these clusters are created. By calculating the log-likelihood of a voice vector and comparing it with previously gathered data, the Gaussian mixture model can recognize the speech: if the log-likelihood of the voice vector matches the recorded data, the individual can be identified [27].
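
A hedged sketch of the log-likelihood comparison described above, assuming one GMM is trained per enrolled speaker on that speaker’s feature frames (the MFCC-like data below are synthetic stand-ins):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(6)

# Hypothetical enrollment data: one matrix of MFCC-like frames per speaker.
enroll = {
    "speaker_A": rng.normal(loc=0.0, size=(500, 13)),
    "speaker_B": rng.normal(loc=2.0, size=(500, 13)),
}

# Train one Gaussian mixture model per enrolled speaker.
models = {name: GaussianMixture(n_components=4, random_state=0).fit(frames)
          for name, frames in enroll.items()}

# Identification: score an unknown utterance against every speaker model and
# pick the model with the highest average log-likelihood per frame.
unknown = rng.normal(loc=2.0, size=(200, 13))          # frames of the test voice
scores = {name: gmm.score(unknown) for name, gmm in models.items()}
print(scores)
print("identified as:", max(scores, key=scores.get))
```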

Chapter 4
