Pimentel - Novelty Detection - 2014.pdf

Introduction

Novelty detection as one-class classification

Overview of reviews on novelty detection

Methods of novelty detection

Organisation of the survey

Probabilistic novelty detection

Parametric approaches

Mixture models
State-space models

The two algorithms developed by the authors gradually 'discount' the effect of past examples in the online process; that is, recent examples are weighted more heavily in the update algorithm than older examples. Also based on a dynamic model of normal time series data, the Multidimensional Probability Evolution method [73,74] characterizes normal data using a nonlinear state space model; that is, the pdf within a multidimensional state space is calculated for each window of the time-varying signal.
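The discounting idea can be illustrated with a minimal sketch. This is not the authors' algorithms: it is a one-dimensional Gaussian model updated with an exponential forgetting factor, and the class name, the `discount` parameter and the scoring rule are all assumptions made for illustration.

```python
class DiscountedGaussian:
    """Online 1-D Gaussian estimate that exponentially discounts old samples."""

    def __init__(self, discount=0.05):
        # `discount` is a hypothetical forgetting rate in (0, 1];
        # larger values make the model forget old samples faster.
        self.discount = discount
        self.mean = None
        self.var = 1.0  # assumed initial variance

    def update(self, x):
        """Blend a new sample into the running mean and variance."""
        if self.mean is None:
            self.mean = float(x)
            return
        r = self.discount
        delta = x - self.mean
        self.mean += r * delta
        # Exponentially weighted variance update.
        self.var = (1.0 - r) * (self.var + r * delta * delta)

    def score(self, x):
        """Squared standardized distance from the current model (higher = more novel).

        Call only after at least one update.
        """
        return (x - self.mean) ** 2 / self.var
```

With `discount=0.5`, feeding 0 and then 10 moves the mean to 5, whereas feeding 0, 0, 0, 10 into a plain average would give 2.5: the most recent sample dominates.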

Non-parametric approaches

Kernel density estimators
Negative selection

The authors propose an online approximation of the data distribution by considering the values in a sliding window. The variance of the kernel for the values in the sliding window is calculated using a histogram along the time axis. In [90] the authors use a prior mean with a smaller value than the positive class labels (e.g. y = 1), such as a zero mean.
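A sliding-window kernel density estimate can be sketched as follows. The sketch departs from the description above in one respect that should be read as an assumption: the kernel bandwidth is set with a normal-reference rule of thumb on the window contents rather than from a histogram along the time axis. The class name and its parameters are hypothetical.

```python
import math
from collections import deque


class SlidingWindowKDE:
    """Gaussian kernel density estimate over the most recent values only."""

    def __init__(self, window_size=50, min_bandwidth=1e-3):
        # deque(maxlen=...) silently drops the oldest value when full.
        self.window = deque(maxlen=window_size)
        self.min_bandwidth = min_bandwidth

    def add(self, x):
        self.window.append(float(x))

    def bandwidth(self):
        """Rule-of-thumb bandwidth (assumption): 1.06 * std * n^(-1/5)."""
        n = len(self.window)
        if n < 2:
            return self.min_bandwidth
        mean = sum(self.window) / n
        var = sum((v - mean) ** 2 for v in self.window) / (n - 1)
        return max(1.06 * math.sqrt(var) * n ** (-0.2), self.min_bandwidth)

    def density(self, x):
        """Estimated density at x; low values suggest novelty."""
        n = len(self.window)
        if n == 0:
            return 0.0
        h = self.bandwidth()
        norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
        return norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in self.window)
```

A test value would be flagged as novel when `density(x)` falls below a threshold chosen on normal data.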

Kemmler et al. investigate a heuristic measure: the predictive mean divided by the standard deviation, which was proposed as a combined measure for describing the uncertainty of the estimate. The nature of the negative selection algorithm is inspired by the properties of the immune system.

Method evaluation

The job of the immune system is to distinguish between antigens and the body itself, a process known as self/non-self discrimination, which is achieved when an antigen is "recognized" by a specific antibody called a T-cell receptor. The detector (antibody) defines a hypersphere, i.e. an n-dimensional vector corresponding to the center of the sphere and a scalar value giving its radius. The matching rule is expressed by a membership function, which is a function of the detector-antigen Euclidean distance and the detector radius.
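The hypersphere detector and its matching rule can be written down in a few lines. The sketch below shows a crisp rule (inside the sphere or not) and one possible graded membership function; the linear fall-off in `membership` is an assumption, since the text only says that the function depends on the distance and the radius. All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Detector:
    """A hypersphere detector: a center vector and a scalar radius."""
    center: Sequence[float]
    radius: float

    def distance(self, antigen):
        # Euclidean distance between detector center and antigen (Python 3.8+).
        return math.dist(self.center, antigen)

    def matches(self, antigen):
        """Crisp matching rule: the antigen lies inside the hypersphere."""
        return self.distance(antigen) <= self.radius

    def membership(self, antigen):
        """Graded rule (assumed form): 1 at the center, 0 at or beyond the surface."""
        return max(0.0, 1.0 - self.distance(antigen) / self.radius)


def is_non_self(antigen, detectors):
    """A point is reported as non-self if any detector matches it."""
    return any(d.matches(antigen) for d in detectors)
```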

The algorithm attempts to develop a set of points (antibodies or detectors) covering non-self space using an iterative process that updates the position of the detector. Taylor and Corne [102] demonstrate the feasibility of the above approaches in fault detection in refrigeration systems.

Distance-based novelty detection

Nearest neighbour-based approaches

This preprocessing step enabled faster determination of the nearest neighbors compared to ORCA. The number of attribute-value pairs in common indicates the strength of the associated link between these two points. The LOF of a point is based on the ratios of the local density of the area around the point and the local densities of its neighbors.

The size of a point's neighborhood is determined by the area containing a user-supplied minimum number of points. Several other variants of the LOF method have been proposed to handle different data types [199] and applied to detect spatial outliers or anomalies in climate data, protein sequences [201], network intrusion and video sensor data [169].
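The density-ratio idea behind LOF can be made concrete with a small brute-force sketch. It makes two simplifying assumptions that a full implementation would not: each neighborhood holds exactly `k` points (ties at the k-th distance are not expanded), and all points are distinct so that no density is infinite. Function names are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Return sorted (distance, index) pairs for the k nearest neighbors of point i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return dists[:k]


def local_outlier_factors(points, k=3):
    """LOF score per point; values near 1 are inliers, larger values are outliers."""
    n = len(points)
    neighbors = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neighbors]  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for _, j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]
```

For four points on a unit square plus one distant point, the four square corners score 1 and the distant point scores well above 1.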

Clustering-based approaches

This use of similarity graphs overcomes the disadvantage of the traditional similarity measure (which assumes that outliers are far away from the “normal” points in the data space) and is easily applicable to both categorical and numerical data. Using a shared definition of nearest neighbor similarity alleviates problems with varying densities and high dimensionality, and using keypoints solves problems with the shape and size of the distribution. Another novel aspect of the shared nearest neighbor clustering algorithm is that the resulting clusters do not contain all points, but only those points that lie in regions of relatively uniform density.
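Shared nearest neighbor similarity can be sketched as a count of common neighbors. The version below assumes the common convention that two points receive a non-zero similarity only when each appears in the other's k-nearest-neighbor list; that convention and the function names are assumptions, not details taken from the text.

```python
import math


def knn_sets(points, k):
    """For each point, the set of indices of its k nearest neighbors."""
    sets = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        sets.append({j for _, j in order[:k]})
    return sets


def snn_similarity(points, k):
    """sim[i][j] = number of neighbors shared by i and j, if they are mutual neighbors."""
    nn = knn_sets(points, k)
    n = len(points)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if j in nn[i] and i in nn[j]:
                sim[i][j] = sim[j][i] = len(nn[i] & nn[j])
    return sim
```

A point that appears in nobody's neighbor list has zero similarity to every other point, which is one way such a measure exposes outliers without relying on raw distance.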

The distances to the centroid for all points in the same scene are calculated and a threshold is determined based on the mean and standard deviation of the distances. Finally, in the third step, the local novelty scores are calculated using the set of local model results from the previous step.
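The centroid-distance threshold can be sketched directly. The multiplier `n_std` on the standard deviation is an assumed parameter, since the text does not say how the mean and standard deviation are combined; the function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def fit_threshold(points, n_std=3.0):
    """Return (centroid, threshold) with threshold = mean + n_std * std of distances."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return c, mean + n_std * std


def is_novel(point, c, threshold):
    """Flag a point whose distance to the centroid exceeds the threshold."""
    return math.dist(point, c) > threshold
```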

Method evaluation

Their approach is based on the principle of neighborhood preservation: when the system is operating normally, the neighborhood graph for each sensor is virtually invariant with respect to fluctuations arising from experimental conditions. Using this idea of a stochastic neighborhood, the proposed method was able to calculate novelty scores for each sensor. The authors apply a graph-based method to recommend items that are not within the user's current interests, but are in adjacent areas of interest.

This ensures novelty and offers variety in the recommendations, which is viewed favorably by users.

Reconstruction-based novelty detection

Neural network-based approaches

The Euclidean distance between a test point and nodes in the SOM is evaluated and used to determine novelty. An important characteristic of these types of neural networks is that they are topology-preserving; that is, the network maintains neighborhood relationships among the data by mapping neighboring inputs to adjacent nodes on the map. The latter include the 'growing cell structures' model, the 'incremental grid growing' model, the 'growing neural gas', the 'growing SOM' and the 'evolving SOM'.

In this method, each node is associated with a subset of the input space, and the network is initialized with a small number of nodes randomly placed in the input space and not connected to each other. The proposed change aims to satisfy real-time temporal constraints when adapting the network.

Subspace-based approaches

A new node is added when the "activity" of the best matching node (the node that best matches the input) is not high enough. Shyu et al. [245] propose a PCA-based novelty detection approach, which can be viewed as a powerful estimator of the correlation matrix of normal data. A strategy to improve the convergence of the kernel algorithm for iterative kernel PCA is described in [238].
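The node-insertion rule in the first sentence above can be sketched as follows. Several details are assumptions made for illustration: activity is taken to be exp(-distance), a new node is placed midway between the best matching node and the input, and the thresholds and learning rate are arbitrary. Edge handling and any habituation mechanism are omitted.

```python
import math


def activity(node, x):
    """Assumed activity: 1 when x coincides with the node, decaying towards 0 with distance."""
    return math.exp(-math.dist(node, x))


def present(nodes, x, activity_threshold=0.8, learning_rate=0.1):
    """Present input x to the network; return True if a new node was inserted."""
    best = max(range(len(nodes)), key=lambda i: activity(nodes[i], x))
    if activity(nodes[best], x) < activity_threshold:
        # Best match is too weak: grow the network between the match and the input.
        nodes.append([(w + xi) / 2.0 for w, xi in zip(nodes[best], x)])
        return True
    # Otherwise move the best matching node a little towards the input.
    nodes[best] = [w + learning_rate * (xi - w) for w, xi in zip(nodes[best], x)]
    return False
```

An insertion event can itself serve as a novelty signal, since it means no existing node represented the input well.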

The proposed method was used for novelty detection and benefited from the robustness of the L1 norm to outliers. The network continuously integrates information regarding the distribution of training data and the dependence on their co-occurrence.

Method evaluation

The method is similar to multiple discriminant analysis in that it attempts to find a subspace that maximizes the difference between the average distance of the “normal” class and the average distance of the “abnormal” class. The effect of the reduced subspace on the classification was found to be better than that obtained from other dimensionality reduction methods (such as PCA and kernel PCA), for machine monitoring data. Similar to Kohonen's SOM, the CCA aims to reproduce the topology of the original data in a projection subspace, but without capturing the configuration of the topology.

Since the topology cannot be completely reproduced in the projection subspace, which has a lower dimension than the original subspace, the local topology is favored over the global topology. It places a set of points from the original space into a randomly chosen subspace whose dimension is logarithmic with respect to the dimension of the original space, such that the pairwise distances between the points before and after the projection differ by only a small factor.
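The distance-preserving property of a random projection can be checked numerically. The sketch below assumes a dense Gaussian projection matrix scaled by 1/sqrt(d_out), which is one common construction and not necessarily the one used in the surveyed work; the output dimension is chosen by hand rather than derived from a bound.

```python
import math
import random


def random_projection_matrix(d_in, d_out, seed=0):
    """d_out x d_in matrix of N(0, 1/d_out) entries (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)] for _ in range(d_out)]


def project(matrix, x):
    """Map x into the lower-dimensional space by a matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def distortion(matrix, a, b):
    """Ratio of projected to original distance; values near 1 mean little distortion."""
    return math.dist(project(matrix, a), project(matrix, b)) / math.dist(a, b)
```

Calling `distortion` on pairs of points before running a distance-based detector in the projected space gives a quick check on whether the chosen `d_out` is large enough.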

Domain-based novelty detection

Support vector data description approaches

Some extensions to the SVDD approach have recently been proposed to improve the margins of the hypersphere. One such extension is given in [273], whose method aims to maximize (i) the margin between the surface of the hypersphere and abnormal data, and (ii) the margin between this surface and the normal data, while minimizing the volume of the hypersphere. However, the method is heuristic and no demonstration is given that the multi-hypersphere approach can provide a better description of the data.

The decision function in this case contains only one kernel term, and thus the decision boundary of the fast SVDD is only spherical in the original space. Consequently, the runtime complexity of the fast SVDD decision function is no longer linear in the support vectors, but is a constant, independent of the size of the training set.
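The difference in runtime cost can be seen by writing both decision functions side by side. This is a sketch under stated assumptions: an RBF kernel (so k(z, z) = 1), a standard SVDD center expressed as a weighted sum of support-vector images, and a fast SVDD center approximated as `gamma` times the image of a single preimage point. The names `gamma` and `preimage` and this exact form of the approximation are assumptions, not details confirmed by the text.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian RBF kernel."""
    return math.exp(-math.dist(x, y) ** 2 / (2.0 * sigma ** 2))


def svdd_distance_sq(z, support_vectors, alphas, sigma=1.0):
    """Squared feature-space distance to the SVDD center.

    Needs one kernel evaluation against z per support vector.
    """
    cross = sum(a * rbf(z, s, sigma) for a, s in zip(alphas, support_vectors))
    # This term does not depend on z and would be precomputed in practice.
    const = sum(
        ai * aj * rbf(si, sj, sigma)
        for ai, si in zip(alphas, support_vectors)
        for aj, sj in zip(alphas, support_vectors)
    )
    return 1.0 - 2.0 * cross + const


def fast_svdd_distance_sq(z, preimage, gamma, sigma=1.0):
    """Same distance with the center replaced by gamma * phi(preimage).

    Needs a single kernel evaluation, regardless of training set size.
    """
    return 1.0 - 2.0 * gamma * rbf(z, preimage, sigma) + gamma ** 2
```

Because the fast form depends on z only through its distance to `preimage`, thresholding it gives a spherical boundary in the input space.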

One-class support vector machine approaches

Unlike traditional methods that try to compress the kernel expansion into one with fewer terms, the proposed fast SVDD directly finds the preimage of the feature vector and then uses a simple relationship between this feature vector and the center of the SVDD to update the center position. An efficient SVDD first finds the critical points using a fuzzy c-means kernel clustering technique [293] and then uses the images of these points to re-express the center of the SVDD. Novelty scores computed using a one-class SVM approach are obtained from each of the input time series, and different classifier combination strategies are also examined.

The authors propose a framework to overcome this problem, which involves exploring subspaces of the data, training a separate model for each subspace, and then concatenating the decision variables produced by the test data for each subspace using fuzzy logic adders. The performance of the proposed method, however, was not compared with that of other current approaches.

Method evaluation

Clifton et al. [280,281] investigate the use of one-class SVM using multivariate combustion data to predict combustion instability. Lee and Cho [288] compare the performance of a one-class SVM with that of an autoassociative neural network, and the results obtained by analyzing six benchmark data sets show that the former performs consistently better than the latter. One-class SVM has been used to detect novelty in: functional magnetic resonance imaging data [284]; sound recordings [291]; text data [292].

The authors of [282] illustrate the impact of high dimensionality on kernel methods and, specifically, on the one-class SVM. Numerical experiments showed that the proposed method consistently outperformed the one-class SVM in estimating minimum volume sets.

Information-theoretic novelty detection

Method evaluation

Information-theoretic approaches to novelty detection typically make no assumptions about the underlying distribution of the data. They require a measure that is sensitive enough to detect the effects of new points in the dataset. The main disadvantage of these types of techniques is the choice of the information-theoretic measure.

The performance of such techniques is thus highly dependent on the choice of the information-theoretic measure. Finally, it can be difficult to link a novelty score to a test item using an information theory-based method.
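One simple way to see how an information-theoretic measure responds to individual points is to use Shannon entropy on categorical data and ask how much the entropy of the dataset falls when a point is removed. Entropy is only one possible choice of measure, and the leave-one-out scoring below is an assumption made for illustration; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_drop(values, index):
    """Decrease in dataset entropy when values[index] is removed.

    A large positive drop means the point adds a lot of irregularity.
    """
    rest = values[:index] + values[index + 1:]
    return entropy(values) - entropy(rest)
```

For nine copies of one symbol and a single rare symbol, removing the rare symbol drops the entropy to zero, while removing one of the common symbols slightly raises it.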

Application domains

Electronic IT security
Healthcare informatics/medical diagnostics and monitoring
Industrial monitoring and damage detection
Image processing/video surveillance
Text mining
Sensor networks

Novelty detection has generated much research in the field of electronic IT security systems, in which the goals include network intrusion detection and fraud detection. Healthcare informatics and medical diagnostics are an important application area of novelty detection approaches. Novelty detection has been extensively applied to recognize new objects in images and video streams.

Examples of novelty detection methods in the six main application domains covered in this review. Novelty detection is applied to sensor networks to capture sensor faults and/or malicious attacks on the network.

Conclusion

Chow, Parzen-window network intrusion detectors, in: Proceedings of the 16th International Conference on Pattern Recognition, vol. Pizzuti, Fast outlier detection in high dimensional spaces, in: Proceedings of the 6th European Conference on. Proceedings of the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM, 2003, pp.

Chawla, On local spatial outliers, in: Proceedings of the 4th International IEEE Conference on Data Mining, IEEE, 2004, pp. Ballard, Novelty detection using the growing neural gas for visuospatial memory, in: Proceedings of the IEEE/. Cook, Graph-based anomaly detection, in: Proceedings of the 9th International Conference on Knowledge Discovery and Data Mining (SIGKDD), ACM, 2003, pp.

Vemuri, Robust anomaly detection using support vector machines, in: Proceedings of the International Conference on Machine Learning, 2003, pp.