
CHAPTER 4 MULTIVARIATE TIME SERIES DISCRETIZATION

4.2 LABEL DEFINITION

Finally, by interpreting the signals' behaviors in both fast and slow time scales, we express the system's state for a DSV or a sub-matrix of DSVs as an event code, denoted by e_r (r = 1, 2, …, the number of distinct event codes). In the next sections, we give a more detailed description of the proposed discretization procedure.

4.2.1 Step A1: Distribution model estimation

In this step, we estimate the probability density function (PDF) of each sensor signal by (i) selecting the most appropriate distribution model from a set of pre-defined PDF models, the so-called "PDF library", and (ii) optimizing its distribution parameters by maximum likelihood estimation (I. J. Myung, 2003).

In other words, given a PDF library containing z PDF models, PDF = {PDF_1, PDF_2, …, PDF_p, …, PDF_z}, and measured sensor data, we first compute the optimum likelihood value between each sensor's data and every PDF_p, so that the optimum likelihood values for all PDF_p are estimated. In this paper, 17 continuous PDFs are included in the PDF library: Beta, Birnbaum-Saunders, Exponential, Extreme Value, Gamma, Generalized Extreme Value, Generalized Pareto, Inverse Gaussian, Logistic, Log-Logistic, Lognormal, Nakagami, Normal, Rayleigh, T-Location-Scale, and Weibull. We then select the best PDF, PDF_opt, that has the maximum likelihood value for representing the measured sensor data (line 2 in Algorithm 4.1) (A. Fischer & C. Igel, 2012).
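The model-selection step can be sketched as follows. This is a minimal illustration, assuming scipy.stats as the fitting backend (the original implementation is not specified); only a subset of the 17 library models is shown, and the dictionary keys are illustrative names.

```python
# Step A1 sketch: fit every model in a small PDF library by MLE and
# keep the one with the maximum total log-likelihood on the data.
import numpy as np
from scipy import stats

PDF_LIBRARY = {
    "normal": stats.norm,
    "gamma": stats.gamma,
    "lognormal": stats.lognorm,
    "weibull": stats.weibull_min,
    "gen_extreme_value": stats.genextreme,
    "t_location_scale": stats.t,   # Student's t with loc/scale parameters
}

def select_best_pdf(x):
    """Return (name, distribution, MLE parameters) of the library model
    with the highest total log-likelihood on the sample x."""
    best = (None, None, None)
    best_ll = -np.inf
    for name, dist in PDF_LIBRARY.items():
        try:
            params = dist.fit(x)                    # MLE parameter estimates
            ll = np.sum(dist.logpdf(x, *params))    # total log-likelihood
        except Exception:
            continue                                # skip models that fail to fit
        if np.isfinite(ll) and ll > best_ll:
            best, best_ll = (name, dist, params), ll
    return best

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=2000)       # synthetic sensor data
name, dist, params = select_best_pdf(x)
```

In practice each candidate fit can be wrapped with goodness-of-fit diagnostics as well, but maximum likelihood alone matches the selection rule described above.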

4.2.2 Step A2: Cut-point determination

To transform a time series X_i = (x_i1, x_i2, …, x_in) into the discretized time series D(X_i), expressed as a series of labels, it is necessary to first define a set of contiguous labels L_i = (l_i1, l_i2, …, l_iL) for all sensor data (i = 1, 2, …, m), where the number of labels L must be carefully pre-defined. The prerequisite for defining the labels that symbolize the system's states is to determine the cut-points for classification (S. Ramírez-Gallego et al., 2016).

The corresponding discretization parameters are the number of bins b and the bin width threshold bw, which are closely correlated with the number of labels L. It is worth noting that the number of bins plays a role similar to the alphabet size in the SAX approach of J. Lin et al. (2007), in that both determine how many bins are required to classify sensor values. The bin width threshold specifies the probability that a sensor value falls in the center bin, i.e., the bin that includes the centroid of the distribution; if the number of bins is odd, the remaining bins then share the same probability, (1 − bw)/(b − 1), where b > 1. For this reason, in this study we use odd bin numbers (e.g., 3, 5, or 7) so that the centroid of the distribution lies in the center bin. For example, if b and bw are set to 3 and 33.3% respectively, the scheme is equivalent to equal frequency binning (H. Liu et al., 2002). Figure 4.2 shows an example of cut-point determination for partitioning the original sensor data where b and bw are set to 3 and 80% respectively.

In practice, the bin width threshold is a critical discretization parameter that must be controlled carefully; it is usually recommended to be larger than 100/b (%) in order to ensure the discrimination of abnormal sensor values from normal ones. For the sake of simplicity, we use the same parameter values b, bw, and L for all sensors. Once these values are specified, a set of cut-points for the ith sensor, CP_i, can be determined and denoted as CP_i = [cp_i1 cp_i2 … cp_i(b−1)]^T.
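Given the fitted distribution, the cut-points follow from its inverse CDF. The sketch below assumes a scipy.stats distribution object; the function name `cut_points` is illustrative. It assigns probability mass bw to the center bin and (1 − bw)/(b − 1) to each remaining bin, as described above.

```python
# Step A2 sketch: derive the b-1 cut-points CP_i from a fitted PDF,
# given an odd bin number b and a bin width threshold bw.
import numpy as np
from scipy import stats

def cut_points(dist, params, b=3, bw=0.8):
    assert b > 1 and b % 2 == 1, "odd bin number expected"
    p_out = (1.0 - bw) / (b - 1)         # probability of each non-center bin
    half = (b - 1) // 2                  # number of bins on each side of center
    # cumulative probabilities at the bin boundaries:
    lower = [k * p_out for k in range(1, half + 1)]
    upper = [half * p_out + bw + k * p_out for k in range(half)]
    return dist.ppf(lower + upper, *params)   # inverse CDF -> cut-point values

# Example: standard normal with b = 3 and bw = 80% puts the two
# cut-points at the 10th and 90th percentiles.
cps = cut_points(stats.norm, (0.0, 1.0), b=3, bw=0.8)
```

With b = 3 and bw = 33.3% the same routine reproduces equal frequency binning, consistent with the remark above.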

The performance of cut-point determination using the estimated PDF was examined for automotive gasoline engine knocking detection and compared to conventional data discretization methods, such as equal frequency binning and entropy discretization (S. Baek, Y. J. Lee, & D.-Y. Kim, 2012). Equal frequency binning is an unsupervised data discretization method that divides the bins so that all bins contain the same number of measurements, whereas entropy discretization partitions the bins so that the entropy of every bin is minimized. The proposed method requires a higher computation cost than equal frequency binning, but one similar to entropy binning. In terms of detection results, its performance is better than that of the other two methods regardless of the parameter setting, even though it does not use any class information from the supervised problem.

Figure 4.2 An example of distribution model estimation of sensor data and the original sensor data over time: the PDFs of three sensor signals (crank position, manifold absolute pressure, and throttle position) are estimated as generalized extreme value, t-location-scale, and normal distributions respectively, where b = 3 bins and bw = 80% are given

4.2.3 Step A3: Consideration of the linear trend in the time segment

In general, a representative quantity of the sensor data in a bin (e.g., its mean), or its coded value, can be used as a label for the bin. In order to further take into account the linear trend of the sensor data, we refer to the slope of the regression line in a time segment. In other words, a label for a bin can be further divided into three sub-labels depending on the sign of the slope, so that we can constitute a more detailed set of labels considering the linear trend in a time segment.

Sub-labels are obtained according to whether the sign of the slope is positive or negative and whether the slope is statistically significant (e.g., at a significance level of 0.05). We use a Boolean, linearT, to indicate whether the linear trend is included in the label definition; it is also considered a discretization parameter.

If more trend information is necessary to convert the original time series into a discretized one, this step can be modified to compute the 1st and 2nd slopes of a 2nd-order regression curve, or higher-order trends.
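The slope test above can be sketched as follows, assuming scipy.stats.linregress for the significance test (the original implementation is not specified); the function name `trend_sublabel` is illustrative.

```python
# Step A3 sketch: decide the trend sub-label for one time segment from the
# sign and statistical significance of the regression-line slope, using the
# 0.05 significance level mentioned in the text.
import numpy as np
from scipy import stats

def trend_sublabel(segment, alpha=0.05):
    """Return +1 (significant positive slope), -1 (significant negative
    slope), or 0 (slope not significantly different from zero)."""
    t = np.arange(len(segment))
    res = stats.linregress(t, segment)   # least-squares line over the segment
    if res.pvalue < alpha:
        return 1 if res.slope > 0 else -1
    return 0                             # zero or statistically undefined slope

rng = np.random.default_rng(1)
rising = 0.5 * np.arange(50) + rng.normal(0.0, 0.1, 50)  # clear upward trend
```

A noisy flat segment would usually (though not always, given the 5% false-positive rate) map to sub-label 0 under the same test.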

4.2.4 Step A4: Generation of a set of labels

The number of labels L is calculated according to the set of cut-points and the linear trend consideration. If we do not include the linear trend in the label definition (i.e., linearT = false), then L is equal to the number of bins b; otherwise each bin yields three sub-labels, so L = 3b. For example, suppose that b is 3, linearT = true, and the corresponding cut-points are computed as CP_i = [cp_i1 cp_i2]^T; then a set of nine labels for the ith sensor, L_i, can be generated as follows.

L_i = {
  l_i9  if X̄_ik > cp_i2 and positive slope
  l_i8  if X̄_ik > cp_i2 and zero or undefined slope
  l_i7  if X̄_ik > cp_i2 and negative slope
  l_i6  if cp_i1 < X̄_ik ≤ cp_i2 and positive slope
  l_i5  if cp_i1 < X̄_ik ≤ cp_i2 and zero or undefined slope
  l_i4  if cp_i1 < X̄_ik ≤ cp_i2 and negative slope
  l_i3  if X̄_ik ≤ cp_i1 and positive slope
  l_i2  if X̄_ik ≤ cp_i1 and zero or undefined slope
  l_i1  if X̄_ik ≤ cp_i1 and negative slope
}

where π‘‹Μ…π‘–π‘˜ is the mean value of ith sensor data in a kth time segment 𝑋𝑖(𝑀×(π‘˜βˆ’1)+1:π‘€Γ—π‘˜), given w as the length of time segment.

On the other hand, suppose that b is 3, linearT = false, and the corresponding cut-points are identically computed as CP_i = [cp_i1 cp_i2]^T; then a set of three labels for the ith sensor, L_i, can be generated directly from the set of cut-points, as in the following equation.

L_i = {
  l_i3  if X̄_ik > cp_i2
  l_i2  if cp_i1 < X̄_ik ≤ cp_i2
  l_i1  if X̄_ik ≤ cp_i1
}

We use contiguous integers (i.e., coded values of the representative quantity of the sensor data in each bin) for the label definition in order to adopt a Euclidean distance-based similarity measure between discretized state vectors.
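Step A4 can be sketched as a single mapping from a segment's mean and trend sub-label to one of these contiguous integers. The function name `label` and the encoding (label = 3 × bin index + trend offset for linearT = true) are one possible realization consistent with the nine-label scheme above.

```python
# Step A4 sketch: map a segment mean X̄_ik and a trend sub-label (+1/0/-1)
# to a contiguous integer label, for b = 3 and either linearT setting.
import numpy as np

def label(mean_value, cut_points, trend=0, linearT=True):
    """cut_points is the sorted array [cp_1, ..., cp_(b-1)]; trend is
    +1/0/-1 from the slope significance test. Labels start at 1."""
    # searchsorted with 'left' keeps values equal to a cut-point in the
    # lower bin, matching the "<= cp" conditions of the label equations.
    bin_idx = int(np.searchsorted(cut_points, mean_value))  # 0 .. b-1
    if not linearT:
        return bin_idx + 1                 # L = b labels
    return 3 * bin_idx + (trend + 2)       # L = 3b labels: -1->1, 0->2, +1->3

cps = np.array([-1.0, 1.0])                # example cut-points for b = 3
```

Because the labels are contiguous integers, the distance between two discretized state vectors can be computed directly with the Euclidean norm, as intended above.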