
chapter five

Prediction and classification

readings or image quantization to facilitate the computation of modeling for prediction and classification. Each of the classification indices is usually assigned a binary number; for example, a good apple sample is labeled 1, and a bad apple is labeled 0. Attribute values can come from surveys or experiments. For example, beef sensory attributes may include values for tenderness, juiciness, flavor, and so on.

As discussed previously, the function f( ) may be linear or nonlinear. It can be built by linear and nonlinear statistical analysis or ANNs for prediction or classification. In the area of classification, the function f( ) is usually called a classifier, a term adopted from pattern recognition. In the area of prediction, the function f( ) is called a predictor, a term adopted from dynamic process modeling.
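To make the two roles concrete, here is a minimal, hedged sketch of f( ) as a classifier: a linear model and a small ANN fitted to invented apple-feature data labeled good = 1 and bad = 0. The feature values, feature names, and network size are all assumptions for illustration, not data from the text.

```python
# Minimal sketch: f() as a linear classifier vs. an ANN classifier.
# The apple feature data below are hypothetical, invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

# Each row: [firmness, color score]; label 1 = good apple, 0 = bad apple
X = np.array([[8.1, 0.9], [7.5, 0.8], [6.9, 0.7],
              [3.2, 0.4], [2.8, 0.3], [4.0, 0.5]])
y = np.array([1, 1, 1, 0, 0, 0])

linear_f = LogisticRegression().fit(X, y)          # a linear f()
nonlinear_f = MLPClassifier(hidden_layer_sizes=(4,),
                            max_iter=2000, random_state=0).fit(X, y)  # an ANN f()

print(linear_f.predict([[7.0, 0.8]]))    # expected: [1] (good)
print(nonlinear_f.predict([[3.0, 0.4]])) # expected: [0] (bad)
```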

5.1.1 Example: Sample classification for beef grading based on linear statistical and ANN models

As described in the last chapter, two ANN approaches, with supervised and unsupervised training algorithms, were developed along with statistical analysis for beef classification in quality grading based on ultrasonic A-mode signals. Table 5.1 presents a summary of the results obtained with BP training. The outputs were processed in a winner-take-all fashion, that is, the node with the largest value was declared the winner. The accuracy was calculated simply as the number of correct classifications divided by the total number of samples in the validation data set.
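In code form, the winner-take-all decoding and the accuracy computation reduce to an argmax and a mean; the network outputs and true classes below are hypothetical.

```python
import numpy as np

# Hypothetical network outputs for 5 validation samples, 3 marbling classes each
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.6, 0.3],
                    [0.2, 0.3, 0.5],
                    [0.4, 0.4, 0.2],
                    [0.1, 0.2, 0.7]])
true_classes = np.array([0, 1, 2, 1, 2])

winners = outputs.argmax(axis=1)             # winner-take-all: largest node wins
accuracy = (winners == true_classes).mean()  # correct / total validation samples
print(f"Classification accuracy: {accuracy:.1%}")  # 80.0% for these made-up values
```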

Table 5.2 gives the classification results for the adaptive logic network.

Increasing the quantization level increased the accuracy of encoding but also slightly increased the mean error; lower quantization levels likewise slightly increased the mean error. The number of training pairs varied from 93 to 97. A total of 24 samples were used for classification.

In the unsupervised-training experiments, different combinations of the seven ultrasonic A-mode signal features were used, and the samples were divided into three, four, and eight classes, as shown in Table 5.3.
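The chapter's unsupervised classifier was a Kohonen self-organizing feature map; as a rough, hedged stand-in, the sketch below implements a minimal one-dimensional competitive layer (a Kohonen map with the neighborhood radius shrunk to zero, i.e., online k-means) that partitions feature vectors into k classes. The data, sample count, and learning parameters are all invented for illustration.

```python
import numpy as np

def kohonen_1d(X, k, epochs=50, lr=0.5, seed=0):
    """Minimal 1-D competitive layer: k weight vectors compete for each sample;
    the winner (closest weights) is pulled toward the sample."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), k, replace=False)].astype(float)  # init from data
    for epoch in range(epochs):
        eta = lr * (1 - epoch / epochs)                  # decaying learning rate
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(((W - x) ** 2).sum(axis=1))
            W[winner] += eta * (x - W[winner])           # move winner toward sample
    return W

# Hypothetical 7-feature ultrasonic signal vectors (rows = samples)
X = np.random.default_rng(1).normal(size=(74, 7))
W = kohonen_1d(X, k=3)                                   # 3 output classes
classes = np.argmin(((X[:, None, :] - W[None]) ** 2).sum(axis=2), axis=1)
print(np.bincount(classes))                              # samples assigned per class
```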

Table 5.1 Classification of Marbling Levels with Back Propagation Training*

Probe          Number of        Number of            Classification   CPU (min) for Training
               Training Pairs   Validation Samples   Accuracy         in Sun Workstation 4/490
-----------------------------------------------------------------------------------------------
Shear          9                24                   54.2%            214
Longitudinal   9                24                   70.8%            9.3
Shear          9                61                   41.0%            150
Longitudinal   9                58                   41.4%            6.1
Longitudinal   9                100                  57.0%            28.5

* Adapted from Whittaker et al. (1991). With permission.

A benchmark study was performed to compare the statistical, supervised, and unsupervised ANN training approaches on the basis of independent experiments. In the study, 97 samples with all seven 2.25-MHz shear-probe frequency parameters were used for training, and 24 samples extracted from the same data set before training were used for classification.

Accuracy was determined in the following ranges: <3% fat, 3 to 7% fat, and >7% fat. The number of near misses (NM) was recorded, where a NM was defined as a misclassification by less than ±0.5% fat. Table 5.4 shows the results of the benchmark study.
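Table 5.4 is easier to read with the accuracy definitions spelled out. Assuming the usual interpretation (not stated explicitly in the source), Type II accuracy is computed per actual class (row-wise, recall-like) and Type I accuracy per predicted class (column-wise, precision-like); the sketch below reproduces the per-class values of panel (a).

```python
import numpy as np

# Confusion matrix from Table 5.4(a): rows = actual, cols = predicted
# Classes in order: <3% fat, 3 to 7% fat, >7% fat
cm = np.array([[4, 0, 0],
               [4, 8, 1],
               [0, 4, 3]])

type2 = cm.diagonal() / cm.sum(axis=1)  # per actual class (row-wise)
type1 = cm.diagonal() / cm.sum(axis=0)  # per predicted class (column-wise)
print("Type II:", np.round(100 * type2, 1))  # [100.  61.5  42.9] (42.8 in the table)
print("Type I: ", np.round(100 * type1, 1))  # [ 50.  66.7  75. ]
```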

Table 5.2 Classification of Marbling Levels with Adaptive Logic Networks*

                                      Classification Accuracy (%)
Probe and Input Parameters            <3% Fat   3%–7% Fat   >7% Fat   Overall   Mean Error   CPU (min) for Training
                                                                                             in Sun Sparcstation 1
--------------------------------------------------------------------------------------------------------------------
Longitudinal with all seven
  frequency parameters                50.0      61.5        57.1      58.3      1.93         11.9
Shear with all seven
  frequency parameters                100       53.8        57.1      62.5      1.81         20.0
Longitudinal with the frequency
  parameters fc, fsk, and Lm          75.0      61.5        42.0      58.3      2.05         15.7
Longitudinal with the frequency
  parameters fc, fsk, and Lm          100       61.5        57.1      58.3      3.85         5.1

* Adapted from Whittaker et al. (1991). With permission.

Table 5.3 Accuracies of Classification of Beef Ultrasonic A-Mode Signals with Unsupervised Training*

Input Parameters                    3-Class Accuracy   4-Class Accuracy   8-Class Accuracy
--------------------------------------------------------------------------------------------
fa, fb, fp, fc, B*, fsk, and Lm     68.9%              63.5%              31.1%
fb, fp, fc, B*, fsk, and Lm         67.6%              54.1%              29.7%
fp, fc, B*, fsk, and Lm             67.6%              54.1%              29.7%
fc, B*, fsk, and Lm                 66.2%              54.1%              29.7%
B*, fsk, and Lm                     67.9%              58.1%              29.7%
fsk and Lm                          40.5%              23.0%              16.2%

* Adapted from Whittaker et al. (1991). With permission.

5.1.2 Example: Electronic nose data classification for food odor pattern recognition

Recalling Eq. (4.29), the classification modeling of food odor by an electronic nose is clearly a multivariate problem of high dimensionality, because usually n >> 1 (for example, n = 32 for the AromaScan instrument). In a high-dimensional multivariate problem, the variables are usually partly correlated, so a technique is needed to reduce the dimensionality while retaining the information in far fewer dimensions. Principal component analysis (PCA) (Hotelling, 1933) is such a linear statistical technique. Through PCA, a high-dimensional data set, such as $X_i$ ($i = 1, 2, \ldots, n$), can be converted into a new data set, such as $\tilde{X}_i$ ($i = 1, 2, \ldots, n$), whose variables are uncorrelated. From the new data set, the first two or three variables are usually enough to build a good model, that is,

$$c = \hat{\beta}_0^p + \hat{\beta}_1^p \tilde{X}_1 + \hat{\beta}_2^p \tilde{X}_2$$

or

$$c = \hat{\beta}_0^p + \hat{\beta}_1^p \tilde{X}_1 + \hat{\beta}_2^p \tilde{X}_2 + \hat{\beta}_3^p \tilde{X}_3$$

Table 5.4 Benchmark Comparison of Statistical, Supervised, and Unsupervised ANN Approaches*

(a) Statistical Regression
                       Predicted Classes
Actual Classes     <3%     3 to 7%   >7%     NM    Type II Accuracy
--------------------------------------------------------------------
<3%                4       0         0       0     100.0%
3 to 7%            4       8         1       1     61.5%
>7%                0       4         3       3     42.8%
Type I accuracy    50.0%   66.7%     75.0%         63.9%

(b) Supervised Training (Adaptive Logic Neural Network)
                       Predicted Classes
Actual Classes     <3%     3 to 7%   >7%     NM    Type II Accuracy
--------------------------------------------------------------------
<3%                4       0         0       0     100.0%
3 to 7%            6       6         1       2     46.1%
>7%                2       2         3       0     42.9%
Type I accuracy    33.0%   75.0%     75.0%         54.2%

(c) Unsupervised Training (Kohonen Self-Organizing Feature Maps Neural Network)
                       Predicted Classes
Actual Classes     <3%     3 to 7%   >7%     NM    Type II Accuracy
--------------------------------------------------------------------
<3%                3       1         0       0     75.0%
3 to 7%            2       8         3       1     61.5%
>7%                1       4         2       0     28.6%
Type I accuracy    50.0%   61.5%     40.0%         55.5%

* Adapted from Whittaker et al. (1991). With permission.


where $\hat{\beta}_i^p$ ($i = 0, 1, 2, \ldots$) are the coefficient estimates of the new model. These equations significantly simplify Eq. (4.29). Figure 5.1 shows generically the relationship between three groups of data after principal component processing; before the processing, the three groups of data may be mixed together. After the processing, the first two principal components may be sufficient to differentiate the data. When the groups of data can be differentiated linearly, that is, they are "linearly separable," the discriminant function can be fitted statistically or with single-layer perceptron neural networks (Rosenblatt, 1959); otherwise, advanced ANN architectures are needed to handle nonlinearly separable or nonlinear cases. Similarly, if the first three principal components are used to differentiate the data, three-dimensional coordinates are needed to visualize them, with the linearly separable and nonlinearly separable cases handled by statistical and ANN methods in the same way.
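A hedged sketch of that pipeline, with all data sizes and contents invented: project hypothetical n = 32 electronic-nose readings onto the first two principal components, then fit a single-layer perceptron on the reduced, linearly separable data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
# Hypothetical electronic-nose readings: 60 samples x 32 sensors, two odor groups
group_a = rng.normal(0.0, 1.0, size=(30, 32))
group_b = rng.normal(2.0, 1.0, size=(30, 32))
X = np.vstack([group_a, group_b])
y = np.array([0] * 30 + [1] * 30)

pca = PCA(n_components=2)             # keep the first two principal components
X_reduced = pca.fit_transform(X)      # 60 x 2, uncorrelated new variables
print(pca.explained_variance_ratio_)  # information retained by the two PCs

clf = Perceptron().fit(X_reduced, y)  # single-layer perceptron on the 2-D data
print("training accuracy:", clf.score(X_reduced, y))
```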

PCA is the most widely used method for electronic nose data classification. There are also partial least squares (PLS) (Wold, 1966), cluster analysis (Gardner and Bartlett, 1992), discriminant function analysis (Gardner and Bartlett, 1992), and so on. PLS is especially effective for small-sample problems (Yan et al., 1998). Cluster analysis is an unsupervised, self-organized pattern recognition method; it is often used together with PCA to identify groups or clusters of points in configuration space (Gardner and Bartlett, 1992). Discriminant function analysis assumes that the data are normally distributed, which limits its use (Gardner and Bartlett, 1992).
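For comparison, a minimal PLS sketch on invented small-sample data; unlike PCA, PLS chooses its components using the response variable, which is what makes it effective when samples are few. All sizes and values here are assumptions for illustration.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 32))                  # small sample: 20 odors x 32 sensors
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=20)  # invented response

pls = PLSRegression(n_components=2).fit(X, y)  # components chosen to predict y
print("R^2 on training data:", pls.score(X, y))
```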

Figure 5.1 Generic plot of electronic nose data differentiation for three groups of data by the first two principal components. [Plot: Principal Component 1 vs. Principal Component 2, with Groups 1, 2, and 3 forming separate clusters.]


5.1.3 Example: Snack food classification for eating quality evaluation based on linear statistical and ANN models

The classification performance of the back-propagation-trained neural network was judged by defining the classification rate as

$$\text{Classification rate (\%)} = \frac{N_C}{N} \times 100 \qquad (5.2)$$

where $N_C$ is the number of correctly classified samples and $N$ is the total number of samples.
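Eq. (5.2) transcribes directly into code:

```python
def classification_rate(n_correct: int, n_total: int) -> float:
    """Eq. (5.2): percentage of correctly classified samples."""
    return 100.0 * n_correct / n_total

print(classification_rate(88, 100))  # 88.0
```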

The classification performance of the network varies with the number of features used. Tables 5.5 and 5.6 show the classification rates of the network on validation and training samples, respectively, using all 22 textural, size, and shape features. They indicate that the classification rate was very high on the training samples and acceptably good on the corresponding validation samples. However, classification with the network using the reduced sets of 11 and 8 features, shown in Tables 5.7 and 5.8, respectively, was not as efficient as with the full 22 features. This supports the assumption of nonlinearity between the textural and morphological features and the sensory panel scores.

Table 5.5 Performance (% Classification Rate) of Neural Networks with 9 Hidden Nodes on Validation Samples with All 22 Features*

Machine Wear/Raw             Quality/Sensory Attributes
Material Conditions   Bubble   Rough   Cell   Firm   Crisp   Tooth   Grit
---------------------------------------------------------------------------
A                     92       90      83     88     85      84      94
B                     90       98      98     98     94      94      92
C                     90       94      86     78     82      82      90
D                     90       93      78     95     87      88      90

* Adapted from Sayeed et al. (1995). With permission.

Table 5.6 Performance (% Classification Rate) of Neural Network with 9 Hidden Nodes on Training Samples with All 22 Features*

Machine Wear/Raw             Quality/Sensory Attributes
Material Conditions   Bubble   Rough   Cell   Firm   Crisp   Tooth   Grit
---------------------------------------------------------------------------
A                     96       98      90     94     93      94      97
B                     91       91      98     97     96      89      93
C                     95       95      94     88     92      91      97
D                     99       100     96     96     99      97      98

* Adapted from Sayeed et al. (1995). With permission.


Stepwise regression can only find a compact linear relationship between input and output variables, and the reduced feature sets were derived by stepwise regression, which could not identify the nonlinearity. This confirms that ANNs were able to model the nonlinear relationship between the image textural and morphological features and the sensory attributes.

The results of this work indicate that the combination of textural and morphological image features can be employed to quantify the sensory attributes of snack quality with a high degree of accuracy using the ANN classifier, as judged against human experts.

5.1.4 Example: Meat attribute prediction based on linear statistical and ANN models

Wavelet decomposition, a promising alternative for textural feature extraction from beef elastograms, performed much better than Haralick's statistical method for extraction of textural features in the statistical modeling.

This conclusion was based on the prediction ability of the models in terms of the feature parameters extracted by each of the two methods.

Table 5.7 Performance (% Classification Rate) of Neural Networks with Reduced 11 Features*

Machine Wear/Raw             Quality/Sensory Attributes
Material Conditions   Bubble   Rough   Cell   Firm   Crisp   Tooth   Grit
---------------------------------------------------------------------------
A                     81       86      78     65     59      69      88
B                     82       94      94     82     82      74      80
C                     72       78      60     56     54      48      80
D                     82       94      62     75     75      46      75

* Adapted from Sayeed et al. (1995). With permission.

Table 5.8 Performance (% Classification Rate) of Neural Networks with Reduced 8 Features*

Machine Wear/Raw             Quality/Sensory Attributes
Material Conditions   Bubble   Rough   Cell   Firm   Crisp   Tooth   Grit
---------------------------------------------------------------------------
A                     84       90      76     73     71      71      87
B                     96       100     96     96     88      96      92
C                     68       84      50     54     68      62      74
D                     84       92      73     92     76      80      86

* Adapted from Sayeed et al. (1995). With permission.

Huang et al. (1997) showed that the relationship between Haralick's statistical textural feature parameters from beef elastograms and the beef attribute parameters was not significant in the sense of linear statistics. Wavelet textural features from beef elastograms were more informative, consistent, and compact, and they were used to build models able to predict the attribute parameters acceptably.

Further, Huang et al. (1998) explored the relationship between the wavelet textural features from beef elastograms and the attribute parameters of the beef samples. Prediction used one-hidden-layer feedforward neural networks trained by different implementations of the BP process. Compared with regular BP using the gradient descent method, adding a momentum term improved training efficiency, reducing the number of training epochs (by 0.03 to 7.22 times). The Levenberg–Marquardt algorithm was less efficient than the gradient descent algorithm in cases where training epochs were reduced by 100 times or less, but more efficient in cases where training epochs were reduced by more than a few hundred times. In the case of difficult convergence in the SARC model under the gradient descent algorithm, the Levenberg–Marquardt algorithm converged much more efficiently. In all cases, the Levenberg–Marquardt algorithm achieved better model output variation accounting and network generalization for attribute prediction; if training efficiency were not a consideration, the Levenberg–Marquardt algorithm would be a good choice. Further, incorporating weight decay, versus implementing the Levenberg–Marquardt algorithm alone, was effective in improving network generalization, resulting in higher R² and lower validation MSE values.
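The training variants compared above can be summarized as update rules. Below is a minimal sketch (not the authors' code) of one gradient-descent step with a momentum term and weight decay, the two modifications discussed; the learning rate, momentum coefficient, and decay strength are illustrative assumptions, and the Levenberg–Marquardt update is omitted for brevity.

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One gradient-descent update with a momentum term and weight decay.
    Momentum reuses part of the previous step to speed convergence; weight
    decay shrinks weights toward zero, which aids generalization."""
    grad = grad + weight_decay * w           # weight decay: penalize large weights
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Illustrative use on a single weight vector
w = np.array([0.5, -0.3])
v = np.zeros_like(w)
grad = np.array([0.2, -0.1])                 # gradient from backpropagation
w, v = sgd_step(w, grad, v)
print(w)
```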

This study concluded that ANNs were effective in the prediction of beef quality using wavelet textural feature parameters of the ultrasonic elastograms. ANNs can capture unknown nonlinear relations between the process inputs and outputs and effectively model the variation in the textural feature space.
