
Subcomponent Design: Anomaly Detection Model

In the PDF document Presented by: University (Pages 97-100)

[Figure: UML class diagram of the FeatureExtractor class. Methods: +extract_features(), +extract_log_key_sequence_by_session(), +extract_log_key_sequence(), +create_features_add_labels(), +fit_transform(), +transform(), +_transform(). Attributes: +sample_by_session, +window_size, +verbose, +training_mode, +data_transformation, +unique_keys, +output_dir, +name.]

Figure 5.3: UML Class Diagram of the FeatureExtractor class.

5. verbose - a flag specifying whether the Feature Extractor will output information regarding its operation

6. output_dir - a string specifying a path to a directory where all Feature Extractor outputs are to be stored

7. name - a unique name for identifying a particular instance of the Feature Extractor

An additional class attribute, unique_keys, is used to store the set of unique log event keys present in the log dataset that is to undergo Feature Extraction. The methods of the class implement the various processes as previously described.
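The sliding-window behaviour implied by the window_size attribute can be sketched as follows. This is an illustrative sketch only: the function name, return format, and session data are placeholders, not the thesis's actual implementation.

```python
def extract_windows(log_keys, window_size):
    """Hypothetical sketch: slide a window of length window_size over a
    session's log event key sequence, pairing each window (the features)
    with the key that immediately follows it (the label)."""
    samples = []
    for i in range(len(log_keys) - window_size):
        window = log_keys[i:i + window_size]   # feature sequence
        label = log_keys[i + window_size]      # next log event key
        samples.append((window, label))
    return samples

# Illustrative session of log event keys
session = ["k1", "k2", "k3", "k4", "k5"]
pairs = extract_windows(session, window_size=3)
# pairs[0] == (["k1", "k2", "k3"], "k4")
```

Each (window, label) pair is one training sample for the next-key prediction task described in this chapter.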

[Figure: LSTM network unrolled in time. The Input Layer accepts a sequence x_1, x_2, x_3, ..., x_n of variable size (sequence length = window_size); this feeds LSTM Layer 1 through LSTM Layer n, each with a configurable number of hidden units; a final Fully Connected / Linear layer produces an output vector of size output_size.]

Figure 5.4: Architecture of the Long Short-Term Memory Recurrent Neural Network implemented by the Anomaly Detection Model. Configurable hyperparameters may be used to change the model architecture.

Figure 5.4 shows a generic LSTM neural network architecture, unrolled in time, that consists of multiple LSTM layers and that may be bidirectional. Research into the use of LSTM neural networks for anomaly detection in log file analysis has illustrated a variety of LSTM architectures, differing in the number of LSTM layers, in whether or not the LSTM is bidirectional, and in the number of hidden units per layer. These architectures yield varying degrees of performance across different datasets. Since log files generated by different systems can differ substantially in structure and content, proposing a single, fixed LSTM architecture for the Anomaly Detection Model would be inefficient and would lead to less than favourable performance in most cases. For the best results, a unique LSTM architecture is to be modelled and trained for each unique dataset. To facilitate this, the following hyperparameters, also shown in Figure 5.4, are tunable for the implementation of the Anomaly Detection Model:

1. input_size - the number of features describing each input. For the features extracted by the Feature Extractor, each input is described by only a single feature, i.e. the log event key

2. hidden_size - the number of hidden units each LSTM layer of the network should have

3. output_size - the size of the output of the overall model. For the Inference Engine, this is equal to the number of unique log event keys for a given system

4. num_layers - the number of LSTM layers that may be implemented in the network architecture

5. bidirectional - a flag indicating whether or not a bidirectional LSTM should be implemented

These hyperparameters are provided to the Anomaly Detection Model and are used to define the architecture of the LSTM model that is to be implemented. When the Inference Engine is to be used on a new system, these hyperparameters must be tuned and selected to achieve optimal performance for that particular system. The impact of these hyperparameters on Inference Engine performance, and their tuning, shall be investigated and detailed in Chapter 7.

It should be noted that while the architecture of the LSTM model may vary, design decisions are made regarding the structure of the input and output layers of the network. As shown in Figure 5.4, the LSTM network is designed to accept an input sequence of log event keys, and the size of this sequence may vary. Each element in the sequence, corresponding to a single log key, is fed into the LSTM network sequentially. Since the problem is being modelled as a multi-class classification task, only the final output of the LSTM network, after the entire input sequence has been considered, is of interest. This final output of the LSTM layers is then fed into a Fully Connected Linear Neural Network Layer to convert the output into a vector containing probability scores for each of the classes. Each index in this vector represents one of the classes, and the score represents the likelihood of that class being the target class for the given input sequence.
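The final-output design described above can be illustrated with a short PyTorch sketch. The layer sizes here are arbitrary placeholders, not values from this work; only the pattern of keeping the last time step and converting it into class probabilities is the point.

```python
import torch
import torch.nn as nn

# Placeholder sizes for illustration only
hidden_size, output_size, window_size = 8, 5, 4

lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
linear = nn.Linear(hidden_size, output_size)

x = torch.randn(1, window_size, 1)    # one input sequence of log event keys
out, _ = lstm(x)                      # out: (1, window_size, hidden_size)
logits = linear(out[:, -1, :])        # keep only the final time-step output
probs = torch.softmax(logits, dim=1)  # one probability score per class
```

The index of the largest entry in probs corresponds to the class (log event key) the model considers most likely for the given input sequence.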

The Anomaly Detection Model is implemented as a Python class, named AnomalyDetectorLSTM, using the PyTorch Framework. The UML Class Diagram for this class is shown in Figure 5.5.

[Figure: UML class diagram. AnomalyDetectorLSTM inherits from torch.nn.Module. Methods: +forward(), +initialise_hidden_and_cell_state(). Attributes: +input_size, +hidden_size, +output_size, +num_layers, +num_directions, +lstm_layer, +linear_output_layer.]

Figure 5.5: UML Class Diagram for the AnomalyDetectorLSTM class that implements the Anomaly Detection Model.

The AnomalyDetectorLSTM class inherits from PyTorch's nn.Module class, which is the base class for PyTorch neural networks. When an instance of AnomalyDetectorLSTM is initialised, it is provided with the hyperparameters, as previously discussed, as parameters. During initialisation, the LSTM and Linear layers are created, with the provided hyperparameters, using the nn.LSTM and nn.Linear classes of PyTorch. Two methods are defined for the AnomalyDetectorLSTM class:

1. forward - implements the forward pass function, in which the input data is propagated in a forward direction through the layers of the network model to predict the output values

2. initialise_hidden_and_cell_state - initialises the hidden and cell states of the LSTM layers. This is performed for every new input sequence that is fed into the model

The use of PyTorch enables the model to be run, for both training and inference, on either CPU or GPU hardware, if available, without any changes to the model implementation. After initialising an instance of the AnomalyDetectorLSTM class to implement the Anomaly Detection Model, the forward method is used to run the model on a given set of input data to generate the desired output.
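A minimal sketch consistent with the class just described is given below. The layer wiring follows Figure 5.5, but the argument defaults, internal details, and the example sizes in the usage lines are assumptions rather than the thesis's exact implementation.

```python
import torch
import torch.nn as nn

class AnomalyDetectorLSTM(nn.Module):
    """Sketch of the Anomaly Detection Model class; details are
    illustrative, not the definitive implementation."""

    def __init__(self, input_size, hidden_size, output_size,
                 num_layers, bidirectional=False):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.num_directions = 2 if bidirectional else 1
        # LSTM and Linear layers built from the supplied hyperparameters
        self.lstm_layer = nn.LSTM(input_size, hidden_size, num_layers,
                                  batch_first=True,
                                  bidirectional=bidirectional)
        self.linear_output_layer = nn.Linear(
            hidden_size * self.num_directions, output_size)

    def initialise_hidden_and_cell_state(self, batch_size):
        # Fresh zero hidden and cell states for each new input sequence
        shape = (self.num_layers * self.num_directions,
                 batch_size, self.hidden_size)
        return torch.zeros(shape), torch.zeros(shape)

    def forward(self, x):
        # x: (batch, window_size, input_size)
        hidden = self.initialise_hidden_and_cell_state(x.size(0))
        out, _ = self.lstm_layer(x, hidden)
        # Only the final time-step output feeds the linear layer
        return self.linear_output_layer(out[:, -1, :])

# Example usage with placeholder hyperparameter values
model = AnomalyDetectorLSTM(input_size=1, hidden_size=16,
                            output_size=10, num_layers=2,
                            bidirectional=True)
scores = model(torch.randn(4, 8, 1))   # batch of 4, window_size of 8
```

Because the class inherits from nn.Module, the same code runs unchanged on CPU or GPU once the model and its inputs are moved to the desired device.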
