CHAPTER 4 MULTIVARIATE TIME SERIES DISCRETIZATION
4.2 LABEL DEFINITION
Finally, by interpreting the signals' behaviors in both fast and slow time scales, we will express the system's state for a DSV or a sub-matrix of DSVs as an event code, denoted by $e_j$ ($j = 1, 2, \ldots$, the number of distinct event codes). In the next sections, we give a more detailed description of the proposed discretization procedure.
4.2.1 Step A1: Distribution model estimation
In this step, we estimate the Probability Density Function (PDF) of each sensor signal by (i) selecting the most appropriate distribution model from a set of pre-defined PDF models, the so-called "PDF library", and (ii) optimizing its distribution parameters by maximum likelihood estimation (I. J. Myung, 2003).
In other words, given a PDF library containing 'z' PDF models, $PDL = \{PDF_1, PDF_2, \ldots, PDF_k, \ldots, PDF_z\}$, and measured sensor data, we first compute the optimum likelihood value between the measured sensor data and every $PDF_k$, so that optimum likelihood values for all $PDF_k$ are estimated. In this paper, 17 continuous PDFs are included in the PDF library: Beta, Birnbaum-Saunders, Exponential, Extreme Value, Gamma, Generalized Extreme Value, Generalized Pareto, Inverse Gaussian, Logistic, Log-Logistic, Lognormal, Nakagami, Normal, Rayleigh, T-Location-Scale, and Weibull. We then select the best PDF, $PDF_{opt}$, that has the maximum likelihood value for representing the measured sensor data (line 2 in Algorithm 4.1) (A. Fischer & C. Igel, 2012).
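As an illustration of this selection step, the following is a minimal sketch in Python, assuming scipy.stats as the fitting toolkit; the reduced PDF subset and the fit_best_pdf helper are illustrative and do not reproduce Algorithm 4.1 exactly.

```python
import numpy as np
from scipy import stats

# Illustrative subset of the PDF library; the full library contains 17 models.
PDF_LIBRARY = [stats.norm, stats.gamma, stats.lognorm, stats.genextreme,
               stats.rayleigh, stats.weibull_min, stats.t]

def fit_best_pdf(x):
    """Fit every model in the library by maximum likelihood and
    return the one with the largest log-likelihood."""
    best_model, best_params, best_ll = None, None, -np.inf
    for model in PDF_LIBRARY:
        try:
            params = model.fit(x)                  # MLE of the distribution parameters
            ll = np.sum(model.logpdf(x, *params))  # log-likelihood of the data
        except Exception:
            continue                               # skip models that fail to converge
        if ll > best_ll:
            best_model, best_params, best_ll = model, params, ll
    return best_model, best_params, best_ll
```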
4.2.2 Step A2: Cut-point determination
To transform a time series $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ into the discretized time series $D(X_i)$ that is expressed as a series of labels, it is necessary to first define a set of contiguous labels $\mathcal{L}_i = (l_{i1}, l_{i2}, \ldots, l_{iL})$ for all sensor data ($i = 1, 2, \ldots, m$), where the number of labels L must be carefully pre-defined. The prerequisite for defining the labels that will symbolize the system's states is to determine the cut-points for classification (S. Ramírez-Gallego et al., 2016).
The corresponding discretization parameters are the number of bins 'b' and the bin width threshold 'bw', which are closely correlated with the number of labels 'L'. It is worth noting that the number of bins plays a role similar to that of the alphabet size in the SAX approach of J. Lin et al. (2007), in determining how many bins are required to classify sensor values. The bin width threshold is the probability that a sensor value falls in the center bin, which includes the centroid of a distribution; if the number of bins is odd, the remaining bins then have the same probability, $(1 - bw)/(b - 1)$, where $b > 1$. For this reason, in this study, we use odd bin numbers (e.g., 3, 5, or 7) so that the centroid of a distribution lies in the center bin. For example, if b and bw are set to 3 and 33.3% respectively, it is equivalent to equal frequency binning (H. Liu et al., 2002). Figure 4.2 shows an example of cut-point determination for partitioning the original sensor data, where b and bw are set to 3 and 80%, respectively.
In practice, the bin width threshold must be controlled as a very critical discretization parameter, and it is usually recommended to be larger than $100/b$ (%) in order to ensure the discrimination of abnormal sensor values from normal ones. For the sake of simplicity, we use the same parameter values b, bw, and L for all sensors. Once these values are specified, a set of cut-points for the ith sensor, $CP_i$, can be determined and denoted as $CP_i = [cp_{i1}\ cp_{i2}\ \ldots\ cp_{i(b-1)}]^T$.
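Continuing the Python sketch above, the cut-points can be obtained from the inverse CDF of the selected distribution; the cut_points helper below is illustrative and assumes the (model, params) pair returned by the earlier fit_best_pdf sketch.

```python
def cut_points(model, params, b=3, bw=0.8):
    """Cut-points for an odd number of bins 'b', where the center bin
    (containing the distribution centroid) holds probability 'bw' and the
    remaining (b - 1) bins each hold (1 - bw) / (b - 1)."""
    assert b % 2 == 1 and b > 1
    side = (b - 1) // 2                       # bins on each side of the center bin
    p_out = (1.0 - bw) / (b - 1)              # probability of each outer bin
    # cumulative probabilities at the bin boundaries, from left to right
    lower = [p_out * j for j in range(1, side + 1)]
    upper = [0.5 + bw / 2 + p_out * j for j in range(side)]
    return model.ppf(lower + upper, *params)  # inverse CDF gives the cut-points

# e.g. cut_points(stats.norm, (0.0, 1.0), b=3, bw=0.8) -> approx. [-1.2816, 1.2816]
```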
The performance of cut-point determination using the estimated PDF was examined in automotive gasoline engine knocking detection and compared to conventional data discretization methods such as equal frequency binning and entropy discretization (S. Baek, Y. J. Lee, & D.-Y. Kim, 2012). Equal frequency binning is an unsupervised data discretization method that divides the bins so that all bins contain the same number of measurements, whereas entropy discretization partitions the bins so that the entropy of every bin is minimized. The proposed method requires a higher computation cost than equal frequency binning, but one similar to that of entropy binning. In terms of detection results, its performance is better than that of the other two methods regardless of the parameter setting, even though it does not use any class information obtained from the supervised problem.
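For reference, a minimal sketch of the equal frequency binning baseline (assuming NumPy; the helper name is illustrative):

```python
import numpy as np

def equal_frequency_cut_points(x, b=3):
    """Cut-points that split the measured data into 'b' bins,
    each containing (approximately) the same number of measurements."""
    probs = [j / b for j in range(1, b)]
    return np.quantile(x, probs)
```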
Figure 4.2 An example of distribution model estimation of sensor data and the original sensor data over time: the PDFs of three sensor signals (crank position, manifold absolute pressure, and throttle position) are estimated as generalized extreme value, t-location-scale, and normal distributions, respectively, where b = 3 bins and bw = 80% are given
4.2.3 Step A3: Consideration of the linear trend in the time segment
In general, a representative quantity of the sensor data in a bin (e.g., a mean), or its coded value, can be used as a label for the bin. In order to further take into account the linear trend of the sensor data, we refer to the slope of the regression line in a time segment. In other words, a label for a bin can be further divided into three sub-labels depending on the sign of the slope, so that we can constitute a more detailed set of labels considering the linear trend in a time segment.
Sub-labels are obtained according to whether the sign of the slope is positive or negative and whether the slope is statistically significant (e.g., at a significance level of 0.05). We use a Boolean, linearT, to indicate whether the linear trend is included in the label definition; it is also considered a discretization parameter.
If more trend information is necessary to convert the original time series into a discretized one, this step can be modified to compute the 1st and 2nd slopes of a 2nd-order regression line, or higher.
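A minimal sketch of the trend test for a single time segment, assuming scipy.stats.linregress and the 0.05 significance level mentioned above (the trend_sign helper name is illustrative):

```python
import numpy as np
from scipy import stats

def trend_sign(segment, alpha=0.05):
    """Return +1, -1, or 0 for a statistically significant positive slope,
    a significant negative slope, or a zero/insignificant slope."""
    t = np.arange(len(segment))
    result = stats.linregress(t, segment)
    if result.pvalue >= alpha:        # slope not significantly different from zero
        return 0
    return 1 if result.slope > 0 else -1
```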
4.2.4 Step A4: Generation of a set of labels
The number of labels L is determined by the set of cut-points and by whether the linear trend is considered. If we do not include the linear trend in the label definition (i.e., linearT = false), then L is equal to the number of bins b. For example, suppose that b is 3, linearT = true, and the corresponding cut-points are computed as $CP_i = [cp_{i1}\ cp_{i2}]^T$; then a set of nine labels for the ith sensor, $L_i$, can be generated as follows.
$$
L_i =
\begin{cases}
l_{i9} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and positive slope} \\
l_{i8} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and zero or insignificant slope} \\
l_{i7} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and negative slope} \\
l_{i6} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and positive slope} \\
l_{i5} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and zero or insignificant slope} \\
l_{i4} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and negative slope} \\
l_{i3} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and positive slope} \\
l_{i2} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and zero or insignificant slope} \\
l_{i1} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and negative slope}
\end{cases}
$$

where $\bar{x}_{ik}$ is the mean value of the ith sensor data in the kth time segment $X_i(w \times (k-1) + 1 : w \times k)$, given w as the length of the time segment.
On the other hand, suppose that b is 3, linearT = false, and the corresponding cut-points are identically computed as $CP_i = [cp_{i1}\ cp_{i2}]^T$; then a set of three labels for the ith sensor, $L_i$, can be generated directly from the same set of cut-points, as in the following equation.
$$
L_i =
\begin{cases}
l_{i3} & \text{if } \bar{x}_{ik} > cp_{i2} \\
l_{i2} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \\
l_{i1} & \text{if } \bar{x}_{ik} \le cp_{i1}
\end{cases}
$$
We use contiguous integers (i.e., coded values of the representative quantity of the sensor data in each bin) for the label definition in order to adopt a Euclidean distance-based similarity measure between discretized state vectors.
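Putting the steps together, the following sketch maps one time segment to such an integer label; it assumes NumPy and the trend_sign helper from the earlier sketch, and the assign_label name and the 1-based integer coding (1 to 3b, or 1 to b when linearT = false) are illustrative rather than the exact implementation.

```python
import numpy as np

def assign_label(segment, cp, linearT=True, alpha=0.05):
    """Map a time segment of one sensor to an integer label.
    cp: ascending cut-points [cp_i1, ..., cp_i(b-1)] for this sensor."""
    x_bar = np.mean(segment)                   # mean value of the segment
    bin_idx = int(np.searchsorted(cp, x_bar))  # 0 .. b-1, bins closed on the right
    if not linearT:
        return bin_idx + 1                     # labels 1 .. b
    # three sub-labels per bin: negative, zero/insignificant, positive slope
    return 3 * bin_idx + trend_sign(segment, alpha) + 2   # labels 1 .. 3b

# e.g. for b = 3 and linearT = true, the returned integers 1 .. 9
# correspond to l_i1 .. l_i9 in the nine-label case above.
```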