CHAPTER 4 MULTIVARIATE TIME SERIES DISCRETIZATION
4.2 LABEL DEFINITION
Finally, by interpreting the signals' behaviors in both fast and slow time scales, we will express the system's state for a DSV or a sub-matrix of DSVs as an event code, denoted by $e_j$ ($j = 1, 2, \ldots$, the number of distinct event codes). In the next sections, we give a more detailed description of the proposed discretization procedure.
4.2.1 Step A1: Distribution model estimation
In this step, we estimate the Probability Density Function (PDF) of each sensor signal by (i) selecting the most appropriate distribution model from a set of pre-defined PDF models, the so-called "PDF library", and (ii) optimizing its distribution parameters by maximum likelihood estimation (I. J. Myung, 2003).
In other words, given a PDF library containing 'z' PDF models, $PDL = \{PDF_1, PDF_2, \ldots, PDF_k, \ldots, PDF_z\}$, and measured sensor data, we first compute the optimum likelihood value between the measured sensor data and every $PDF_k$, so that optimum likelihood values for all $PDF_k$ are estimated. In this paper, 17 continuous PDFs are included in the PDF library: Beta, Birnbaum-Saunders, Exponential, Extreme Value, Gamma, Generalized Extreme Value, Generalized Pareto, Inverse Gaussian, Logistic, Log-Logistic, Lognormal, Nakagami, Normal, Rayleigh, T-Location-Scale, and Weibull. We then select the best PDF, $PDF_{opt}$, that has the maximum likelihood value for representing the measured sensor data (line 2 in Algorithm 4.1) (A. Fischer & C. Igel, 2012).
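As an illustration of this selection step, the following is a minimal sketch in Python, assuming scipy.stats as the fitting toolkit; the reduced PDF subset and the fit_best_pdf helper are illustrative and do not reproduce Algorithm 4.1 exactly.

```python
import numpy as np
from scipy import stats

# Illustrative subset of the PDF library; the full library contains 17 models.
PDF_LIBRARY = [stats.norm, stats.gamma, stats.lognorm, stats.genextreme,
               stats.rayleigh, stats.weibull_min, stats.t]

def fit_best_pdf(x):
    """Fit every model in the library by maximum likelihood and
    return the one with the largest log-likelihood."""
    best_model, best_params, best_ll = None, None, -np.inf
    for model in PDF_LIBRARY:
        try:
            params = model.fit(x)                  # MLE of the distribution parameters
            ll = np.sum(model.logpdf(x, *params))  # log-likelihood of the data
        except Exception:
            continue                               # skip models that fail to converge
        if ll > best_ll:
            best_model, best_params, best_ll = model, params, ll
    return best_model, best_params, best_ll
```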
4.2.2 Step A2: Cut-point determination
To transform a time series $X_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ into the discretized time series $D(X_i)$ that is expressed as a series of labels, it is necessary to first define a set of contiguous labels $\mathcal{L}_i = (l_{i1}, l_{i2}, \ldots, l_{iL})$ for all sensor data ($i = 1, 2, \ldots, m$), where the number of labels L must be carefully pre-defined. The prerequisite for defining the labels that will symbolize the system's states is to determine the cut-points for classification (S. Ramírez-Gallego et al., 2016).
The corresponding discretization parameters are the number of bins 'b' and the bin width threshold 'bw', which are closely correlated with the number of labels 'L'. It is worth noting that the number of bins plays a role similar to that of the alphabet size in the SAX approach of J. Lin et al. (2007), in determining how many bins are required to classify sensor values. The bin width threshold is the probability that a sensor value falls in the center bin, which includes the centroid of a distribution; if the number of bins is odd, the remaining bins then have the same probability, $(1 - bw)/(b - 1)$, where $b > 1$. For this reason, in this study, we use odd bin numbers (e.g., 3, 5, or 7) so that the centroid of a distribution lies in the center bin. For example, if b and bw are set to 3 and 33.3% respectively, it is equivalent to equal frequency binning (H. Liu et al., 2002). Figure 4.2 shows an example of cut-point determination for partitioning the original sensor data, where b and bw are set to 3 and 80%, respectively.
In practice, the bin width threshold must be controlled as a very critical discretization parameter, and it is usually recommended to be larger than $100/b$ (%) in order to ensure the discrimination of abnormal sensor values from normal ones. For the sake of simplicity, we use the same parameter values b, bw, and L for all sensors. Once these values are specified, a set of cut-points for the ith sensor, $CP_i$, can be determined and denoted as $CP_i = [cp_{i1}\ cp_{i2}\ \ldots\ cp_{i(b-1)}]^T$.
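Continuing the Python sketch above, the cut-points can be obtained from the inverse CDF of the selected distribution; the cut_points helper below is illustrative and assumes the (model, params) pair returned by the earlier fit_best_pdf sketch.

```python
def cut_points(model, params, b=3, bw=0.8):
    """Cut-points for an odd number of bins 'b', where the center bin
    (containing the distribution centroid) holds probability 'bw' and the
    remaining (b - 1) bins each hold (1 - bw) / (b - 1)."""
    assert b % 2 == 1 and b > 1
    side = (b - 1) // 2                       # bins on each side of the center bin
    p_out = (1.0 - bw) / (b - 1)              # probability of each outer bin
    # cumulative probabilities at the bin boundaries, from left to right
    lower = [p_out * j for j in range(1, side + 1)]
    upper = [0.5 + bw / 2 + p_out * j for j in range(side)]
    return model.ppf(lower + upper, *params)  # inverse CDF gives the cut-points

# e.g. cut_points(stats.norm, (0.0, 1.0), b=3, bw=0.8) -> approx. [-1.2816, 1.2816]
```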
The performance of cut-point determination using the estimated PDF was examined in automotive gasoline engine knocking detection and compared to conventional data discretization methods such as equal frequency binning and entropy discretization (S. Baek, Y. J. Lee, & D.-Y. Kim, 2012). Equal frequency binning is an unsupervised data discretization method that divides the bins so that all bins contain the same number of measurements, whereas entropy discretization partitions the bins so that the entropy of every bin is minimized. The proposed method requires a higher computation cost than equal frequency binning, but one similar to that of entropy binning. In terms of detection results, its performance is better than that of the other two methods regardless of the parameter setting, even though it does not use any class information obtained from the supervised problem.
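For reference, a minimal sketch of the equal frequency binning baseline (assuming NumPy; the helper name is illustrative):

```python
import numpy as np

def equal_frequency_cut_points(x, b=3):
    """Cut-points that split the measured data into 'b' bins,
    each containing (approximately) the same number of measurements."""
    probs = [j / b for j in range(1, b)]
    return np.quantile(x, probs)
```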
Figure 4.2 An example of distribution model estimation of sensor data and the original sensor data over time: the PDFs of three sensor signals (crank position, manifold absolute pressure, and throttle position) are estimated as generalized extreme value, t-location-scale, and normal distributions, respectively, where b = 3 bins and bw = 80% are given
4.2.3 Step A3: Consideration of the linear trend in the time segment
In general, a representative quantity of the sensor data in a bin (e.g., a mean), or its coded value, can be used as a label for the bin. In order to further take into account the linear trend of the sensor data, we refer to the slope of the regression line in a time segment. In other words, a label for a bin can be further divided into three sub-labels depending on the sign of the slope, so that we can constitute a more detailed set of labels considering the linear trend in a time segment.
Sub-labels are obtained according to whether the sign of the slope is positive or negative and whether the slope is statistically significant (e.g., at a significance level of 0.05). We use a Boolean, linearT, to indicate whether the linear trend is included in the label definition; it is also considered a discretization parameter.
If more trend information is necessary to convert the original time series into a discretized one, this step can be modified to compute the 1st and 2nd slopes of a 2nd-order regression line, or higher.
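A minimal sketch of the trend test for a single time segment, assuming scipy.stats.linregress and the 0.05 significance level mentioned above (the trend_sign helper name is illustrative):

```python
import numpy as np
from scipy import stats

def trend_sign(segment, alpha=0.05):
    """Return +1, -1, or 0 for a statistically significant positive slope,
    a significant negative slope, or a zero/insignificant slope."""
    t = np.arange(len(segment))
    result = stats.linregress(t, segment)
    if result.pvalue >= alpha:        # slope not significantly different from zero
        return 0
    return 1 if result.slope > 0 else -1
```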
4.2.4 Step A4: Generation of a set of labels
The number of labels L is determined by the set of cut-points and by whether the linear trend is considered. If we do not include the linear trend in the label definition (i.e., linearT = false), then L is equal to the number of bins b. For example, suppose that b is 3, linearT = true, and the corresponding cut-points are computed as $CP_i = [cp_{i1}\ cp_{i2}]^T$; then a set of nine labels for the ith sensor, $L_i$, can be generated as follows.
$$
L_i =
\begin{cases}
l_{i9} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and positive slope} \\
l_{i8} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and zero or insignificant slope} \\
l_{i7} & \text{if } \bar{x}_{ik} > cp_{i2} \text{ and negative slope} \\
l_{i6} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and positive slope} \\
l_{i5} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and zero or insignificant slope} \\
l_{i4} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \text{ and negative slope} \\
l_{i3} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and positive slope} \\
l_{i2} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and zero or insignificant slope} \\
l_{i1} & \text{if } \bar{x}_{ik} \le cp_{i1} \text{ and negative slope}
\end{cases}
$$

where $\bar{x}_{ik}$ is the mean value of the ith sensor data in the kth time segment $X_i(w \times (k-1) + 1 : w \times k)$, given w as the length of the time segment.
On the other hand, suppose that b is 3, linearT = false, and the corresponding cut-points are identically computed as $CP_i = [cp_{i1}\ cp_{i2}]^T$; then a set of three labels for the ith sensor, $L_i$, can be generated directly from the same set of cut-points, as in the following equation.
$$
L_i =
\begin{cases}
l_{i3} & \text{if } \bar{x}_{ik} > cp_{i2} \\
l_{i2} & \text{if } cp_{i1} < \bar{x}_{ik} \le cp_{i2} \\
l_{i1} & \text{if } \bar{x}_{ik} \le cp_{i1}
\end{cases}
$$
We use contiguous integers (i.e., coded values of the representative quantity of the sensor data in each bin) for the label definition in order to adopt a Euclidean distance-based similarity measure between discretized state vectors.
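Putting the steps together, the following sketch maps one time segment to such an integer label; it assumes NumPy and the trend_sign helper from the earlier sketch, and the assign_label name and the 1-based integer coding (1 to 3b, or 1 to b when linearT = false) are illustrative rather than the exact implementation.

```python
import numpy as np

def assign_label(segment, cp, linearT=True, alpha=0.05):
    """Map a time segment of one sensor to an integer label.
    cp: ascending cut-points [cp_i1, ..., cp_i(b-1)] for this sensor."""
    x_bar = np.mean(segment)                   # mean value of the segment
    bin_idx = int(np.searchsorted(cp, x_bar))  # 0 .. b-1, bins closed on the right
    if not linearT:
        return bin_idx + 1                     # labels 1 .. b
    # three sub-labels per bin: negative, zero/insignificant, positive slope
    return 3 * bin_idx + trend_sign(segment, alpha) + 2   # labels 1 .. 3b

# e.g. for b = 3 and linearT = true, the returned integers 1 .. 9
# correspond to l_i1 .. l_i9 in the nine-label case above.
```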