
…learning-enabled CPS. Typical OOD detection techniques may result in a large number of false alarms due to the dynamic nature of CPS. Chapter 3 proposes a method that improves the robustness of detection by using multiple examples sampled from a Variational AutoEncoder (VAE) model. The method is based on Inductive Conformal Anomaly Detection (ICAD) [63] and is efficient enough to be used online. This chapter follows a similar approach but utilizes the VAE for classification and regression model. The main benefit of such models is that they take into account both the input and the output of the LEC, which enables the detection of the different types of OOD data present in CPS.

Another contribution of the chapter is a comprehensive evaluation using several datasets for classification and regression tasks. We design experiments for various types of OOD data and use the same model for OOD detection. The experimental results show that the proposed approach can detect different types of OOD data with a very small number of false alarms. The execution time is comparable with the sampling period of typical CPS, which enables real-time detection.

The outline of this chapter is as follows. Section 4.2 formulates the problem of detection of OOD data and discusses different types of OOD data in learning-enabled CPSs. Section 4.3 introduces the VAE for classification and regression model and presents the detection algorithm based on this model. Section 4.4 shows the evaluation results, and Section 4.5 concludes the chapter.

4.2 Out-of-distribution Data in Learning-enabled Cyber-physical Systems

4.2.1.2 OOD data caused by label shift

Label shift describes the case where the distribution over the output variable, P(y), changes, but the output-conditional distribution, P(x|y), remains unchanged. OOD data caused by label shift can be defined as data where the output variable y is not sampled from the training distribution P_train(y), whereas the conditional probability of x given y remains the same, i.e., P_train(x|y) = P_test(x|y).
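To make this definition concrete, the following sketch (an illustration, not part of the dissertation's method) simulates label shift on a toy Gaussian-mixture dataset: the class-conditional distributions P(x|y) are held fixed, and only the class prior P(y) is changed between training and test. The class means and priors are arbitrary choices for illustration.

```python
# Illustrative sketch of label shift: P(x|y) fixed, P(y) changed at test time.
import numpy as np

rng = np.random.default_rng(0)

def sample(prior, n):
    """Draw (x, y) pairs: y ~ prior, x | y ~ N(mu_y, 1). P(x|y) is fixed."""
    y = rng.choice(len(prior), size=n, p=prior)
    mu = np.array([-2.0, 0.0, 2.0])          # one mean per class (arbitrary choice)
    x = rng.normal(loc=mu[y], scale=1.0)
    return x, y

x_train, y_train = sample(prior=[1/3, 1/3, 1/3], n=5000)  # uniform P_train(y)
x_test,  y_test  = sample(prior=[0.8, 0.1, 0.1], n=5000)  # shifted  P_test(y)

# The marginal over x changes as a consequence of the shifted prior,
# even though each class-conditional P(x|y) is untouched.
print(np.bincount(y_train) / len(y_train), np.bincount(y_test) / len(y_test))
```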

4.2.1.3 OOD data caused by concept shift

A concept shift is a contextual shift where the underlying relationship between the input and the output changes while the distribution over the input is preserved [112]. Using this definition, for OOD data caused by concept shift, we assume that the input variable x is from the same distribution as the training dataset, P_train(x), while the relationship between input and output changes, i.e., P_train(y|x) ≠ P_test(y|x).
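A minimal sketch of concept shift, under the same illustrative assumptions as above: the inputs are drawn from an identical distribution P(x) at training and test time, but the labeling rule P(y|x) changes (here, the decision boundary is moved; the boundary values are arbitrary).

```python
# Illustrative sketch of concept shift: P(x) fixed, P(y|x) changed at test time.
import numpy as np

rng = np.random.default_rng(0)

x_train = rng.uniform(-1.0, 1.0, size=5000)   # P_train(x)
x_test  = rng.uniform(-1.0, 1.0, size=5000)   # P_test(x) identical by construction

y_train = (x_train > 0.0).astype(int)         # training concept: y = 1 iff x > 0
y_test  = (x_test  > 0.5).astype(int)         # test concept: decision boundary moved

# A classifier fit on (x_train, y_train) will systematically mislabel
# inputs in (0, 0.5] at test time, even though no input looks unusual.
```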

4.2.1.4 OOD data caused by label concept shift

For the case of label concept shift, the distribution over the output stays the same, but the conditional probability of x given y changes. OOD data caused by label concept shift can be defined as data where the output variable y is sampled from the same distribution as the training dataset, P_train(y), whereas the conditional probability of x given y changes such that P_train(x|y) ≠ P_test(x|y).
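Analogously, label concept shift can be simulated by keeping the class prior P(y) fixed while changing the class-conditional distribution P(x|y). In the sketch below (illustrative, with arbitrarily chosen class means), the class means are moved at test time, which mirrors the gearbox example discussed in Section 4.2.2.1.

```python
# Illustrative sketch of label concept shift: P(y) fixed, P(x|y) changed at test time.
import numpy as np

rng = np.random.default_rng(0)
prior = [0.5, 0.5]                              # P_train(y) = P_test(y)

def sample(means, n):
    """Draw (x, y) pairs with fixed prior and class means given by `means`."""
    y = rng.choice(len(prior), size=n, p=prior)
    x = rng.normal(loc=np.asarray(means)[y], scale=1.0)
    return x, y

x_train, y_train = sample(means=[-2.0, 2.0], n=5000)   # training P(x|y)
x_test,  y_test  = sample(means=[-1.0, 3.0], n=5000)   # shifted  P(x|y), same P(y)

# Labels occur with the same frequencies, but the inputs associated with each
# label now come from different distributions (e.g., the gearbox under higher load).
```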

4.2.2 Examples of OOD Data

4.2.2.1 Classification

Consider the well-known digit recognition problem for the MNIST dataset [113]. The classification model is trained on the MNIST dataset, which contains only black-and-white handwritten digits. However, if a colorful handwritten digit or a handwritten digit with a different background is used as a test input, the classification model is very likely to make erroneous predictions. In this case, the test images are not from the same distribution as the training dataset. However, the classification results should be independent of the color or the background of the digits, and therefore the underlying relationship P(y|x) should not change. Such test examples can be defined as OOD data caused by covariate shift. Further, the classification model can be influenced by OOD data caused by label shift, for example, when the test digits follow a class distribution different from P_train(y) because P_train(y) is not uniform or because some classes of digits are not present in the training dataset.
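The colored-digit scenario can be reproduced with a few lines of code. The sketch below is illustrative and assumes torchvision is available merely as a convenient loader for the MNIST test images; the tinting scheme is an arbitrary choice. The digit identity, and hence P(y|x), is unchanged, while the input distribution differs from that of the grayscale training data.

```python
# Illustrative construction of covariate-shifted (colored) MNIST test inputs.
import numpy as np
from torchvision import datasets

mnist = datasets.MNIST(root="./data", train=False, download=True)
images = mnist.data.numpy()                  # (N, 28, 28), grayscale in [0, 255]

rng = np.random.default_rng(0)

def colorize(batch):
    """Replicate each grayscale digit into RGB and scale channels by a random tint."""
    tint = rng.uniform(0.3, 1.0, size=(len(batch), 1, 1, 3))   # per-image RGB tint
    rgb = np.repeat(batch[..., None], 3, axis=-1) / 255.0      # (N, 28, 28, 3)
    return rgb * tint

ood_test = colorize(images)    # colored digits: covariate-shifted test inputs
```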

OOD data caused by label concept shift arise in fault diagnosis and identification, where a classification model is used to predict the type of fault based on sensor measurements. For example, consider the fault diagnosis model for a gearbox [114], which aims at classifying the type of damage that may occur. Typically, the model is trained using data obtained under specific load conditions and tested under similar conditions, resulting in satisfactory accuracy. However, if the model is tested under a higher load condition, its performance will be degraded. In this case, although the damage types in the test examples are still the same, the underlying relationship P(x|y) changes due to the additional load.

4.2.2.2 Regression

Covariate shifts occur in perception LECs used in autonomous vehicles. Consider, for example, the Advanced Emergency Braking System (AEBS) described in Chapter 3, which is designed to detect obstacles in front of an automobile. In this case, the perception LEC performs regression, and its performance can be degraded by OOD data caused by covariate shift, which arises when the environmental conditions for the test data differ from the conditions considered during training. Such components may also be susceptible to OOD data caused by label shift. Similar to the classification problem, it is typically assumed that the probability distribution of the output, e.g., the distance to the obstacle, P_train(y), is uniform. However, in real-life situations, unique traffic patterns may impose a distribution P_test(y) that does not match this assumption. Further, it is usually assumed that vehicles conform to typical specifications (e.g., size and shape). However, such specifications may change, for example, in response to autonomous vehicle technologies, and the regression model may not be able to correctly estimate the distance to a vehicle of a type or size not used during training.

In this case, the conditional probability P(x|y) changes, and such data can be regarded as OOD data caused by label concept shift. Additional examples and datasets relating to applications in industrial informatics are evaluated in Section 4.4.

4.2.3 Problem Formulation

Consider an LEC f : X → Y that is well trained to perform classification or regression using a training dataset D_train = {(x_i, y_i)}_{i=1}^{l}, where each example pair (x_i, y_i) contains the input x_i ∈ X and the corresponding label y_i ∈ Y. During system operation, the LEC receives a sequence of inputs {x_1, . . . , x_t, . . .} one by one and predicts the targets {y_1, . . . , y_t, . . .}. Such models are deployed with the assumption that the training and test examples are drawn from the same distribution. However, when a test data pair (x_t, y_t) is not sampled from the same distribution as the training dataset, the LEC f can become ineffective, make erroneous predictions, and undermine the safety of the system. Therefore, it is crucial to compute a measure quantifying the degree to which OOD data are present in the input sequence. OOD detection should consider all the various types of OOD data that may be present. Further, online detection algorithms must be robust, with a small number of false alarms, and computationally efficient so that they can be executed in real time.
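The online detection requirement can be summarized by the following minimal interface sketch. This is an assumption-level illustration, not the detection algorithm of Section 4.3: the names score_fn, vae_nonconformity, and trigger_safe_mode, as well as the threshold value, are hypothetical placeholders; the concrete detector presented in this chapter computes its score using the VAE for classification and regression model within the ICAD framework.

```python
# Minimal sketch of an online OOD monitor: consume inputs one by one, compute a
# score quantifying how unusual the current input is, and raise an alarm above a threshold.
from typing import Callable, Iterable

def online_ood_monitor(inputs: Iterable,
                       score_fn: Callable[[object], float],
                       threshold: float):
    """Yield (x_t, score_t, alarm_t) for each input in the stream."""
    for x_t in inputs:
        score_t = score_fn(x_t)          # degree to which x_t looks out-of-distribution
        yield x_t, score_t, score_t > threshold

# Hypothetical usage (placeholder names and threshold):
# for x, s, alarm in online_ood_monitor(stream, score_fn=vae_nonconformity, threshold=20.0):
#     if alarm:
#         trigger_safe_mode()
```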

4.3 Detection of Out-of-distribution Data in Learning-enabled Cyber-physical Systems
