Fault Feature Extraction and Selection - Phase I Experimentation

CHAPTER 4 Preliminary Time, Frequency Domain based Centrifugal

4.2 Phase I Experimentation – Time Domain Analysis

4.2.1 Fault Feature Extraction and Selection

From the literature survey summary presented in Table 1.8, six features (the mean, standard deviation, skewness, kurtosis, crest factor, and entropy) are selected to perform fault diagnosis in the time domain. The definitions of the features and their physical significance are presented below.

Mean (μ): The mean is simply the average of all the data points in the time-domain data. With a change in fault severity or damage in the system, the average load on the CP system is expected to increase or decrease depending upon the fault characteristics. Therefore, the mean is expected to be an essential feature for the CP fault study. It is given as,

$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \tag{4.1}$$

where N is the number of points in the sample and x_i is the amplitude of the i-th data point in the sample.

Standard deviation (σ): Standard deviation is a dimensional quantity that measures the dispersion of the distribution about the mean value. It is a measure of the active energy or power content of the signal. A low value of standard deviation implies that the data lie close to the mean, and a higher value implies that the data are spread out. It is given as,

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N - 1}} \tag{4.2}$$

Skewness (χ): The skewness is a non-dimensional feature that measures the degree of asymmetry of the probability distribution of a real-valued variable about its mean. Its value can be positive, negative, or undefined. Physically, a negative skewness represents a longer tail on the left side of the probability density function than on the right, with the bulk of the values (typically including the median) lying to the right of the mean. Similarly, a positive skewness indicates a longer tail on the right side, with the bulk of the values lying to the left of the mean. If the skewness is zero, the distribution is evenly balanced about the mean. It is expressed as,

$$\chi = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^{3} \tag{4.3}$$

Kurtosis (κ): The kurtosis is also a non-dimensional feature that reflects the flatness or spikiness of the probability distribution of the signal. It measures the weight of the tails of the distribution and is used as an indicator of significant peaks in a data set. Its value is generally expected to increase as the severity of the faults increases. It is expressed as,

$$\kappa = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{x_i - \mu}{\sigma}\right)^{4} \tag{4.4}$$
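The sketch below assumes that skewness and kurtosis are evaluated as the third and fourth standardized moments, which is consistent with their description as non-dimensional features. Taking σ with the N - 1 divisor of Eq. (4.2) is a further assumption, and the function names and test signals are hypothetical.

```python
import math


def standardized_moment(x, order):
    """Mean of ((x_i - mu) / sigma) ** order over the sample."""
    n = len(x)
    mu = sum(x) / n
    # Assumption: sigma uses the (N - 1) divisor of Eq. (4.2)
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / (n - 1))
    return sum(((xi - mu) / sigma) ** order for xi in x) / n


def skewness(x):
    """Third standardized moment (asymmetry about the mean)."""
    return standardized_moment(x, 3)


def kurtosis(x):
    """Fourth standardized moment (tail weight, spikiness)."""
    return standardized_moment(x, 4)


# Hypothetical signals: one symmetric, one with a single large peak
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
spiky = [0.0] * 9 + [10.0]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(spiky), kurtosis(spiky))
```

The symmetric sample gives a skewness of essentially zero, while the sample with a single positive peak gives a positive skewness and a noticeably larger kurtosis.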

Crest factor (P): The crest factor is defined as the ratio of the peak (maximum) value to the RMS of the signal. It is a non-dimensional feature that measures the spikiness of the signal, that is, how far impulses stand out above the lower-level continuous part of the signal. It increases if peaks appear in the time-domain signal. It is given by,

$$P = \frac{\max_i x_i}{\mathrm{RMS}} \tag{4.5}$$

where the RMS value of the signal is the square root of the arithmetic mean of the squares of the data values (that is, the square root of its second raw moment), defined by,

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2}} \tag{4.6}$$
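A minimal sketch of Eqs. (4.5) and (4.6) is given below. Taking the peak as the largest absolute amplitude is an assumption (a common convention, not stated in the text), and the function names and test signals are hypothetical.

```python
import math


def rms(x):
    """Square root of the mean of the squared data values, Eq. (4.6)."""
    return math.sqrt(sum(xi ** 2 for xi in x) / len(x))


def crest_factor(x):
    """Ratio of the peak value to the RMS of the signal, Eq. (4.5)."""
    # Assumption: the peak is the largest absolute amplitude
    peak = max(abs(xi) for xi in x)
    return peak / rms(x)


# One full period of a unit sine wave (hypothetical test signal)
n = 1000
sine = [math.sin(2.0 * math.pi * k / n) for k in range(n)]

# The same signal with one impulsive peak added
impulsive = list(sine)
impulsive[250] = 5.0

print(crest_factor(sine), crest_factor(impulsive))
```

For the pure sine wave the crest factor is close to the square root of 2, and it rises sharply once a single impulse is present, which is the behaviour described above.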

Entropy (S): The entropy is a generic measure of system disorganization and is mathematically presented as,

$$S = -\sum_{j=1}^{N} p(x_j)\,\log_{10} p(x_j) \tag{4.7}$$

where x = {x_1, x_2, …, x_N} is a set of random phenomena and p(x_j) is the probability of the random phenomenon x_j. Aperiodic signals have the highest entropy, and periodic ones have significantly lower values. This characteristic of entropy in the time domain can thus be used to distinguish the normal operating condition of the pump from a faulty condition.

On the whole, six statistical features are extracted from each of the three orthogonal directions of the vibration data acquired from two accelerometers. Figure 4.1 shows the six features plotted for the HP and C0 conditions at 65 Hz.
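The text does not state how the probabilities p(x_j) in Eq. (4.7) are estimated from a measured signal. One possibility, assumed here purely for illustration, is to bin the amplitudes into an equal-width histogram and use the relative bin counts as probabilities. The function name, the number of bins, and the sample data are all hypothetical.

```python
import math


def histogram_entropy(x, bins=10):
    """Entropy S = -sum(p * log10(p)), with p taken from an
    equal-width amplitude histogram (an assumed estimator)."""
    lo, hi = min(x), max(x)
    if hi == lo:
        return 0.0  # all samples share one amplitude: no disorder
    width = (hi - lo) / bins
    counts = [0] * bins
    for xi in x:
        # Clamp so that the maximum value falls in the last bin
        idx = min(int((xi - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(x)
    return -sum((c / n) * math.log10(c / n) for c in counts if c > 0)


# Amplitudes spread evenly over the range
spread = [float(k) for k in range(100)]
# Amplitudes concentrated at one level with a single outlier
concentrated = [0.0] * 99 + [1.0]

s_spread = histogram_entropy(spread)
s_concentrated = histogram_entropy(concentrated)
print(s_spread, s_concentrated)
```

With ten bins, the evenly spread sample reaches the maximum value log10(10) = 1, whereas the concentrated sample gives a value close to zero. The result depends on the chosen number of bins, so the same setting would have to be used for every condition being compared.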

[Figure 4.1 panels (plots not reproduced): each feature is plotted against data set number (0 to 350) in separate X-, Y- and Z-axis panels; the plot titles label the two conditions B0 and C0.]

(b) Variation of standard deviation with dataset for (left) HP and (right) C0

(d) Variation of kurtosis with dataset for (left) HP and (right) C0

(e) Variation of crest factor with dataset for (left) HP and (right) C0

(f) Variation of entropy with dataset for (left) HP and (right) C0

Figure 4.1: Time domain features with dataset number for (a) mean (b) standard deviation (c) skewness (d) kurtosis (e) crest factor and (f) entropy

From Figure 4.1 it can be inferred that some of the statistical features (for example, standard deviation) vary over a significant range with the fault condition of the CP. These features may therefore be useful in identifying CP faults when fed to the machine learning algorithm. After extracting the features, the best one, or the best combination of them, needs to be selected systematically. For this purpose, a binary classification (SB1 versus HP at 40 Hz) and a multi-class classification (SB1, SB2, SB3, SB4, and HP at 40 Hz) are performed. Classification accuracy is defined to quantify the performance of the classifier. It is the ratio of the number of feature points correctly predicted by the classifier for a specific condition to the total number of feature points tested for that condition, expressed as a percentage. For example, in the binary classification of HP and SB1, the classification accuracy is given as,

$$\text{Classification accuracy of HP} = \frac{\text{No. of accurately predicted test feature points of HP}}{\text{Total no. of test feature points of HP}} \times 100 \tag{4.8}$$

$$\text{Classification accuracy of SB1} = \frac{\text{No. of accurately predicted test feature points of SB1}}{\text{Total no. of test feature points of SB1}} \times 100 \tag{4.9}$$

From the analyses tabulated in Table 4.1, it is observed that the standard deviation feature gives the highest average accuracy (70.85%) among the extracted features in multi-class classification. Therefore, standard deviation is used for all the subsequent classification analyses in this chapter. Initially, binary fault classification is performed, in which each blockage state is compared with the HP condition. Later, a multi-class classification involving all the blockage severities at once is considered.
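The per-condition accuracy of Eqs. (4.8) and (4.9) can be sketched as follows. The function name and the label lists are hypothetical and do not come from the experiments; they only illustrate the counting.

```python
def class_accuracy(y_true, y_pred, label):
    """Percentage of test feature points of `label` that the
    classifier also predicted as `label`, as in Eqs. (4.8), (4.9)."""
    total = sum(1 for t in y_true if t == label)
    if total == 0:
        raise ValueError("no test feature points for " + label)
    correct = sum(
        1 for t, p in zip(y_true, y_pred) if t == label and p == label
    )
    return 100.0 * correct / total


# Hypothetical true and predicted conditions of nine test points
y_true = ["HP", "HP", "HP", "HP", "SB1", "SB1", "SB1", "SB1", "SB1"]
y_pred = ["HP", "HP", "SB1", "HP", "SB1", "SB1", "HP", "SB1", "SB1"]

acc_hp = class_accuracy(y_true, y_pred, "HP")    # 3 of 4 correct
acc_sb1 = class_accuracy(y_true, y_pred, "SB1")  # 4 of 5 correct
print(acc_hp, acc_sb1)
```

Because the denominator counts only the test points of the condition in question, each condition receives its own accuracy, which is how the per-condition columns of a multi-class result can be filled in.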

Table 4.1: Multiclass fault classification at 40 Hz for feature finalization

Feature                         Classification accuracy (%)                           γ        C
                                HP       SB1      SB2      SB3      SB4      Average
Mean                            50       50       50       50       100      60       0.5      100
Standard deviation              47.14    58.57    65.71    82.86    100      70.85    0.2      1000
Skewness                        18.57    34.29    8.57     31.43    35.71    25.71    50       1
Kurtosis                        31.43    22.86    14.29    14.29    55.71    27.71    0.05     1
Crest factor                    34.29    10.00    30.00    5.71     51.43    26.28    1        1
Entropy                         27.14    34.29    35.71    50       90       47.42    0.1      1000
Standard deviation and entropy  42.86    52.86    70.00    75.71    100.00   68.28    0.005    10000
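As a quick arithmetic check, the Average column of Table 4.1 can be recomputed from the five per-condition accuracies, and the feature with the highest average identified. The variable names below are hypothetical; the numbers are copied from the table.

```python
# Per-condition accuracies (HP, SB1, SB2, SB3, SB4) from Table 4.1
table_4_1 = {
    "Mean": [50, 50, 50, 50, 100],
    "Standard deviation": [47.14, 58.57, 65.71, 82.86, 100],
    "Skewness": [18.57, 34.29, 8.57, 31.43, 35.71],
    "Kurtosis": [31.43, 22.86, 14.29, 14.29, 55.71],
    "Crest factor": [34.29, 10.00, 30.00, 5.71, 51.43],
    "Entropy": [27.14, 34.29, 35.71, 50, 90],
    "Standard deviation and entropy": [42.86, 52.86, 70.00, 75.71, 100.00],
}

# Average column as reported in Table 4.1
reported = {
    "Mean": 60, "Standard deviation": 70.85, "Skewness": 25.71,
    "Kurtosis": 27.71, "Crest factor": 26.28, "Entropy": 47.42,
    "Standard deviation and entropy": 68.28,
}

# Recompute each average and pick the best-performing feature
averages = {f: sum(v) / len(v) for f, v in table_4_1.items()}
best = max(averages, key=averages.get)

for feature, avg in averages.items():
    print(f"{feature}: {avg:.3f} (reported {reported[feature]})")
print("Highest average accuracy:", best)
```

The recomputed averages agree with the reported column to within 0.01 (the table appears to truncate rather than round to two decimals), and standard deviation has the highest average.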

In document: Diagnosis of multiple independent and coexisting mechanical and hydraulic faults in centrifugal pumps using support vector machine based algorithms (pages 160-167)