Here we propose Labeled Anomaly-Informed Neural Transformations (NTL (with LOE+)), in which labeled anomaly information can be taken into account. The key idea is to modify the objective function of Latent Outlier Exposure (LOE), a joint loss function, so that NTL (with LOE) can use labeled anomalies. The training dataset is therefore assumed to consist of large-scale unlabeled data and a small number of labeled anomalies.
We modify the objective function of Latent Outlier Exposure (LOE) [5], which can account for contaminated scenarios, so that NTL (with LOE) can use the labeled anomalies. We propose Labeled Anomaly-Informed Neural Transformations (NTL (with LOE+)), a semi-supervised contrastive learning method that takes labeled anomaly information into account by assigning a guaranteed label (y = 1) to each labeled anomaly. We demonstrate the superiority of the proposed NTL (with LOE+) for anomaly detection compared to a closely related existing method.
Section IV describes the proposed Labeled Anomaly-Informed Neural Transformations (NTL (with LOE+)) for anomaly detection.
Deep anomaly detection
Self-supervised anomaly detection
As part of this thesis, we learn effective representations using transformation methods suitable for tabular and time-series data, for which such methods have only recently been developed. Self-supervised anomaly detection improves detection accuracy dramatically in scenarios where the training dataset is contaminated, and falls into two categories: "self-predictive" and "contrastive". Self-supervised anomaly detection based on self-prediction predicts one part of a sample from another.
The method of [36] likewise masks part of an image and predicts the missing region from the unmasked part. Self-supervised anomaly detection based on contrastive learning instead predicts relationships between samples, e.g., determining whether two views originate from the same sample, and trains a classifier with a contrastive loss. The contrastive learning methods, which are the focus of this thesis, outperform self-predictive methods [37].
Our thesis lies in the domain of deep self-supervised anomaly detection and enables robust anomaly detection by exploiting the information from labeled anomalies.
Neural Transformation Learning for Deep Anomaly Detection (NeuTraL AD)
This section introduces the NeuTraL AD (NTL) [31] and LOE [5] methods and analyzes each in detail. NeuTraL AD has two types of parameters, θ = [φ, θ1:K]: the parameters φ of the encoder and the parameters θ1:K of the learnable transformations. A data sample x is transformed by the neural transformations T_k and embedded in a semantic space by the encoder f_φ.
The deterministic contrastive loss (DCL) is used both to jointly train the transformations and the encoder and to score anomalies after training.
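For illustration, a minimal PyTorch-style sketch of NTL with a DCL-style objective is shown below. This is not the implementation of [31]; the transformation and encoder architectures, the cosine similarity, and the temperature are placeholder choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuTraLAD(nn.Module):
    """Sketch of NTL: K learnable transformations T_k and a shared encoder f_phi.
    Architectures and hyperparameters here are illustrative placeholders."""
    def __init__(self, in_dim, emb_dim=32, num_transforms=4, temperature=0.1):
        super().__init__()
        self.temperature = temperature
        # Learnable neural transformations T_1..T_K (parameters theta_{1:K}).
        self.transforms = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, in_dim), nn.ReLU(), nn.Linear(in_dim, in_dim))
             for _ in range(num_transforms)])
        # Shared encoder f_phi (parameters phi) mapping views into the semantic space.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def score(self, x):
        """DCL-style per-sample loss; the same quantity serves as the anomaly score."""
        z = F.normalize(self.encoder(x), dim=-1)                                  # original view
        zk = torch.stack([F.normalize(self.encoder(t(x)), dim=-1)
                          for t in self.transforms], dim=1)                       # (batch, K, emb)
        sim_orig = (zk * z.unsqueeze(1)).sum(-1) / self.temperature               # (batch, K)
        sim_views = torch.einsum('bkd,bld->bkl', zk, zk) / self.temperature       # (batch, K, K)
        loss, K = 0.0, zk.shape[1]
        for k in range(K):
            # Positive pair: (T_k(x), x); negatives: the other transformed views of the same sample.
            neg = torch.cat([sim_views[:, k, :k], sim_views[:, k, k + 1:]], dim=-1)
            logits = torch.cat([sim_orig[:, k:k + 1], neg], dim=-1)
            loss = loss - F.log_softmax(logits, dim=-1)[:, 0]
        return loss                                                               # higher = more anomalous
```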
Latent Outlier Exposure (LOE)
By assuming that these losses are optimized over θ, an anomaly can be identified via Eq. For this purpose, a sequence of parameters θ^t and labels y^t is considered and updated in alternation. Based on these scores, the effect of each y_i on the minimization of Eq. can be quantified.
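As a reading aid, the joint objective over θ and the latent labels y can be sketched as follows; this is a paraphrase written to be consistent with the equal 0.5(L_n + L_a) combination discussed below, and the exact formulation and constraint set are those of [5]:

\[
\min_{\theta}\,\min_{y \in \{0,1\}^N}\; \sum_{i=1}^{N} \Big[(1 - y_i)\,L_n^{\theta}(x_i) + y_i\,L_a^{\theta}(x_i)\Big]
\qquad \text{s.t.}\qquad \sum_{i=1}^{N} y_i = \alpha N,
\]

where α is the assumed contamination ratio; in the soft variant (LOE_S), the flagged samples receive y_i = 0.5 instead of 1, which yields the equal combination 0.5(L_n(x_i) + L_a(x_i)) mentioned below.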
Since each update improves the loss, block coordinate descent converges to a local optimum when all losses are bounded from below. For simplicity and memory efficiency, the label constraint (Eq. 5) is imposed on each mini-batch, and θ and y are optimized in alternation. In practice, block coordinate descent procedures are often overconfident in their label assignments, resulting in suboptimal training.
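To make the alternating mini-batch procedure concrete, here is a minimal sketch of one training step; `model.loss_normal`, `model.loss_anomaly`, and the contamination ratio `alpha` are illustrative placeholders rather than the actual implementation:

```python
import torch

def loe_train_step(model, optimizer, x_batch, alpha=0.1, hard=True):
    """One block-coordinate step of LOE on a mini-batch (illustrative sketch).
    1) Fix theta and assign latent labels y under the constraint sum(y) ~= alpha * batch size.
    2) Fix y and take a gradient step on theta."""
    loss_n = model.loss_normal(x_batch)    # placeholder: loss assuming the sample is normal
    loss_a = model.loss_anomaly(x_batch)   # placeholder: loss assuming the sample is anomalous
    # Step 1: label assignment. Samples whose "normal" loss exceeds the "anomalous" loss
    # the most are the best anomaly candidates; flag the top alpha fraction.
    with torch.no_grad():
        score = loss_n - loss_a
        n_anom = int(alpha * x_batch.shape[0])
        y = torch.zeros_like(score)
        if n_anom > 0:
            idx = torch.topk(score, n_anom).indices
            y[idx] = 1.0 if hard else 0.5   # LOE_H assigns 1, LOE_S assigns 0.5
    # Step 2: gradient update of theta with y held fixed.
    loss = ((1.0 - y) * loss_n + y * loss_a).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```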
NTL (with LOE_S) is obtained simply by modifying the constraint set (Eq. 8); the model's training and testing scheme is unchanged. The effect of an identified anomaly with y_i = 0.5 is to minimize an equal combination of both losses, 0.5(L_n(x_i) + L_a(x_i)). As described in Section 3.2, the same approach used during training can be applied to detect anomalies in a test set.
However, at test time we may not want to assume that we will see the same types of anomalies that were seen during training. This holds for both NTL (with LOE_S) and NTL (with LOE_H). In Section IV, we discuss how to modify the LOE objective, a joint loss function, so that NTL (with LOE) can use the labeled anomalies.
Training Dataset Composition
Proposed Method: Labeled Anomaly-Informed Neural Transformations (NTL (with LOE+))
Just as in Section 3.2, the effect of an identified anomaly with y_i = 0.5 is to minimize an equal combination of both losses, 0.5(L_n(x_i) + L_a(x_i)); this indicates that the algorithm is uncertain about whether the sample belongs to the normal or the abnormal data, and treats both cases as equally likely. As described in Section 3.2, our approach can be used for detecting anomalies in a test set in the same way as during training. At test time, however, we may not want to assume that we will see the same types of anomalies that were seen during training.
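As an illustration of the modification (not the exact implementation), the assignment step could be adapted as sketched below; the tensor `known_anomaly`, the placeholder per-sample losses, and the contamination ratio `alpha` are assumptions made for this sketch. Labeled anomalies keep the guaranteed label y_i = 1, and the LOE-style assignment is applied only to the unlabeled samples:

```python
import torch

def assign_latent_labels_plus(loss_n, loss_a, known_anomaly, alpha=0.1, hard=True):
    """Sketch of the label assignment in NTL (with LOE+): samples with a ground-truth
    anomaly label keep y_i = 1, and the LOE-style assignment is applied only to the
    remaining unlabeled samples. Names and the exact constraint are illustrative."""
    y = known_anomaly.float().clone()             # guaranteed label y_i = 1 for labeled anomalies
    unlabeled = ~known_anomaly.bool()
    score = (loss_n - loss_a).detach()
    n_anom = int(alpha * unlabeled.sum().item())  # assumed contamination among unlabeled data
    if n_anom > 0:
        idx = torch.nonzero(unlabeled, as_tuple=True)[0]
        top = torch.topk(score[idx], n_anom).indices
        y[idx[top]] = 1.0 if hard else 0.5
    return y   # plug into (1 - y) * loss_n + y * loss_a as before
```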
A Theory of Neural Transformation Learning
Proposition 1. The 'constant' edge case f_φ(T_k(x)) = C·c_k, where c_k is a one-hot vector encoding the k-th position (i.e., c_kk = 1), approaches the minimum of L_P (Eq. 4) as the constant C goes to infinity. Intuitively, in the first proposition, transformations that only encode which transformation was applied make the prediction task easy, and in the second proposition, the identity makes any two views of a sample identical, so L_C can easily recognize them as a positive pair. As a consequence of these propositions, neither L_P nor L_C alone is suitable for transformation learning and anomaly detection.
No matter whether the input is normal or abnormal, L_P and L_C suffer the same loss if the optimization reaches the edge cases of Propositions 1 and 2. Proposition 3, Part 1. The 'constant' edge case f_φ(T_k(x)) = C·c_k, where c_k is a one-hot vector encoding the k-th position (i.e., c_kk = 1), does not minimize L (DCL, Eq. ). Using the chain rule and assuming that the partial derivative of the loss with respect to the k-th entry of z_n is equal to zero at z_{1:K} = C·c_{1:K}, we obtain a condition on the k-th entry of z_0 (Eq. 16). Likewise, assuming that the partial derivative of L_{NTL(with LOE+)}(x) with respect to the k-th entry of z_m is equal to zero at z_{1:K} = C·c_{1:K}, with m ≠ n and m ≠ k, we obtain a second condition; the two conditions cannot hold simultaneously, so the constant edge case is not a minimizer.
Proposition 3, Part 2. The 'identity' edge case T_k(x) = x does not minimize L (Eq. 21). Because its loss equals K times the cross-entropy of the uniform distribution, the identity transformation corresponds to random guessing on DCL's task, which is to predict which sample is the original based on a transformed view. Unlike most contrastive losses, the "negative samples" are derived deterministically from the data points x rather than drawn from a noise distribution. In Section V, the same architecture of learnable transformations and encoders is used for both NTL (with LOE) and the proposed method.
The difference between NTL (with LOE) and the proposed method is that NTL (with LOE) treats the labeled anomalies as normal during training, whereas the proposed method treats them as abnormal. We expect that as the number of labeled anomalies increases, performance will improve relative to NTL (with LOE). The purpose is to compare performance when training without knowledge of the labeled anomalies and when training with knowledge of them.
Experiment Setup
Performance Evaluation
Simulation study
We define the simulation model as f(r1, r2), where r1 is the true anomaly ratio among the total number of samples and r2 is the ratio of labeled anomalies to true anomalies. That is, out of N samples in total, the number of true normals is N × (1 − r1), the number of true anomalies is N × r1, and among them the number of labeled anomalies is N × r1 × r2. For each sample x_i, let y_i = 0 if the sample is an unlabeled normal or an unlabeled anomaly, and y_i = 1 if it is a labeled anomaly.
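As a concrete illustration, the label composition of such a simulated training set could be generated as follows; this is a minimal sketch, and `simulate_labels` with its arguments is a hypothetical helper, not part of the actual experimental code:

```python
import numpy as np

def simulate_labels(N, r1, r2, seed=0):
    """Sketch of the simulation setting f(r1, r2): among N samples,
    N*(1 - r1) are true normals, N*r1 are true anomalies, and a fraction r2
    of the true anomalies carries a label (y_i = 1); everything else is y_i = 0."""
    rng = np.random.default_rng(seed)
    n_anom = int(N * r1)                       # number of true anomalies
    n_labeled = int(n_anom * r2)               # labeled anomalies among them
    is_anomaly = np.zeros(N, dtype=bool)
    is_anomaly[:n_anom] = True
    y = np.zeros(N, dtype=int)                 # y_i = 0 for unlabeled normal and unlabeled anomaly
    labeled_idx = rng.choice(np.flatnonzero(is_anomaly), size=n_labeled, replace=False)
    y[labeled_idx] = 1                         # y_i = 1 for labeled anomalies
    return is_anomaly, y
```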
Here we evaluate the performance degradation of NTL (with LOE) and the proposed method (NTL (with LOE+)) under different misspecification scenarios with varying true anomaly ratios r1 and labeled anomaly ratios r2. First, we measure the performance degradation when r2 is misspecified while r1 is fixed, and second, when r1 is misspecified while r2 is fixed. The proposed method shows considerable robustness: it consistently outperforms NTL (with LOE).
By adjusting r2, we construct the training set, train each method, and finally evaluate each method's anomaly detection performance. When r2 is less than 1, the scenario is contaminated, and the detection accuracy of NTL (with LOE) in this contaminated scenario is lower than that of the proposed method. This indicates that training with the labeled anomalies in contaminated scenarios benefits anomaly detection performance and that the proposed method is robust in contaminated scenarios.
Real-world study
Piazza, "Acoustic novelty detection with adversarial autoencoders," in 2017 International Joint Conference on Neural Networks (IJCNN). Paffenroth, "Anomaly detection with robust deep autoencoders," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017. Langs, "Unsupervised anomaly detection with generative adversarial networks to guide marker discovery," in International Conference on Information Processing in Medical Imaging.
Kloft, "Image anomaly detection with generative adversarial networks," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Bischl, "Robust image anomaly detection using adversarial autoencoders," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Sun, "Learning discriminative reconstructions for unsupervised outlier removal," in Proceedings of the IEEE International Conference on Computer Vision, 2015.
Girshick, "Momentum contrast for unsupervised visual representation learning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. Efros, "Unsupervised visual representation learning by context prediction," in Proceedings of the IEEE International Conference on Computer Vision, 2015. Favaro, "Unsupervised learning of visual representations by solving jigsaw puzzles," in European Conference on Computer Vision.
Efros, "Split-brain autoencoders: Unsupervised learning by cross-channel prediction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017. Hebert, "Shuffle and learn: Unsupervised learning using temporal order verification," in European Conference on Computer Vision. Rudolph, "Neural transformation learning for deep anomaly detection beyond images," in International Conference on Machine Learning.
Hu, "SimMIM: A simple framework for masked image modeling," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022. Maaten, "Self-supervised learning of pretext-invariant representations," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.