This literature review provided the necessary background and context for the problem of debugging complex system failure. It also presented and explored the MeerKAT Correlator Beamformer as the case study system. The motivation for using log file analysis to debug such complex systems was presented, and the challenges and shortcomings associated with these methods were critiqued.
A need for more efficient, effective and robust methods to debug complex system failure steered the investigation toward automated log file analysis techniques. More specifically, this review considered data-driven machine learning and deep learning based techniques, as these methods have gained popularity in recent years. Research has shown that various machine and deep learning techniques may be used to perform log file analysis with the goal of detecting anomalies in system log files, with LSTM networks showing particular promise in modelling expected behaviour from the data contained in log files. Similarly, research has also highlighted the importance of preprocessing and parsing log files before applying automated log file analysis techniques.
With the necessary foundation laid, this research project now moves toward the development of the Automated Log File Analysis Framework (ALFAF) described in Chapter 1. This framework shall consist of two major components: a Data Miner and an Inference Engine.
The Data Miner will be responsible for pre-processing and parsing the log files obtained from the system of interest. It will implement log parsing as described in Section 3.2.2. The design and implementation of the Data Miner is presented in Chapter 4. The Inference Engine will be responsible for performing anomaly detection, using the features generated by the Data Miner, to detect and flag failures that occur within a given system from the information contained in its log files. It will implement anomaly detection using an LSTM Neural Network as detailed in Section 3.2.4. The design and implementation of the Inference Engine is detailed in Chapter 5. With the context of the case study system as presented in Section 3.1.4, these components are finally integrated to realise the complete, end-to-end ALFAF in Chapter 6.
4 Design and Development: Data Miner
With the Concept Exploration Phase of the project completed, sufficient information and detail have been acquired to inform the System Design Phase. This chapter describes the design and development of the Data Miner component of the complete end-to-end automated log file analysis framework described in Chapter 1. The process followed during this phase is detailed in Section 2.2.
Through a high-level overview of the Data Miner and detailed designs of all the sub-components making up the complete Data Miner, this chapter fully describes the Data Miner component of the framework. Testing and verification elements are considered during the design but are formally detailed in Chapter 7.
Informed by the research questions and project objectives, a set of guiding design considerations and constraints for the Data Miner is identified and collated. These considerations are used to guide the design and development process of the Data Miner and are described in Section 4.1.
To initiate the design process, and with reference to the design considerations and system requirements, a high-level design of the Data Miner is presented in Section 4.2. This high-level design identifies the major sub-components of the Data Miner, identifies the interfaces between the Data Miner, the other framework components and the external system, and describes the high-level functionality of the Data Miner.
Following this, the design of the identified sub-components is detailed in Sections 4.3, 4.4 and 4.5, which cover the Pre-Parser, Pre-Processor and Log Parser sub-components respectively.
Where applicable, the architecture of the software designed and developed for the various components and sub-components is described using UML Class and Activity diagrams.
With the design of all sub-components sufficiently detailed, Section 4.6 describes how these sub-components are integrated to realise the complete Data Miner and presents the final design of the component.
Lastly, the development of a tool for tuning the various log parsing algorithms is discussed in Section 4.7.
4.1 Design Considerations and System Requirements
The Data Miner is one of the components of the Automated Log File Analysis Framework (ALFAF) described in Chapter 1. The function of the Data Miner is to parse and process the raw, unstructured log files produced by the system under investigation into a structured dataset consisting of a sequence of events and the extracted runtime variable parameters.
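To illustrate this transformation, the following minimal Python sketch maps the content of a single log message onto an event template and its runtime parameters. The variable-masking rules and the example message are illustrative assumptions, not the actual rules used by the Data Miner or the actual MeerKAT log format:

import re

def parse_log_message(content: str) -> tuple[str, list[str]]:
    """Split a log message's content into an event template and its
    runtime parameters by masking variable tokens (illustrative rules)."""
    # Mask common variable fields: IP addresses, hex values and numbers.
    variable_pattern = re.compile(r"(\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|\d+)")
    parameters = variable_pattern.findall(content)
    template = variable_pattern.sub("<*>", content)
    return template, parameters

template, params = parse_log_message("Connection from 10.0.0.5 timed out after 30 s")
# template -> "Connection from <*> timed out after <*> s"
# params   -> ["10.0.0.5", "30"]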
As discussed in Section 3.2.2, log parsing is an important pre-processing function in performing automated log file analysis.
Guided by the research questions outlined in Section 1.2 and the literature reviewed in Chapter 3, the following design considerations are put forward to influence the design of the Data Miner:
Input Log Files The Data Miner shall be capable of parsing and processing log files, stored and available as raw text, containing log messages produced by any system. The Data Miner shall accept raw text files, containing log messages, as input. The Data Miner shall not be limited in terms of the size of log files it can accept as input. Each log message contained in a log file shall have a discernible content component, containing the actual log message, that is distinguishable from the rest of the log message.
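As an illustration of such a distinguishable content component, the sketch below separates a log line into header fields and content. The "<timestamp> <level> <content>" layout is an assumption for illustration; the actual layout is system-specific and would be configured per input dataset:

import re

# Assumed log line layout: "<timestamp> <level> <content>". The actual
# layout is system-specific and would be configured per dataset.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<content>.*)$"
)

def split_log_line(line: str) -> dict:
    """Separate the header fields from the content component of a log line."""
    match = LOG_LINE.match(line.rstrip())
    if match is None:
        raise ValueError(f"Line does not match the expected format: {line!r}")
    return match.groupdict()

fields = split_log_line("2021-03-14 09:26:53 ERROR sensor device-status unreachable")
# fields["content"] -> "sensor device-status unreachable"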
Output Data The Data Miner shall, after processing input log files, generate a structured dataset consisting of a sequence of event templates corresponding to the sequence of log messages in the input file, together with the extracted variable runtime parameters of each log message. This dataset may be used to build behavioural models of the system to determine whether the correct sequence of events was followed during system operation. The availability of runtime parameters also facilitates anomaly detection. The Data Miner shall also output an event occurrence matrix recording the number of occurrences of each event identified in the log file.
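As an illustration of how such an event occurrence matrix might be assembled from the parsed event sequence, consider the following Python sketch. The fixed-size windowing scheme and the event identifiers used here are illustrative assumptions, not the final design, which is detailed later in this chapter:

from collections import Counter

import numpy as np

def event_occurrence_matrix(event_sequence, window_size=10):
    """Count occurrences of each event ID per fixed-size window of the
    parsed event sequence (the windowing scheme is an illustrative choice)."""
    event_ids = sorted(set(event_sequence))
    index = {event: col for col, event in enumerate(event_ids)}
    windows = [event_sequence[i:i + window_size]
               for i in range(0, len(event_sequence), window_size)]
    matrix = np.zeros((len(windows), len(event_ids)), dtype=int)
    for row, window in enumerate(windows):
        for event, count in Counter(window).items():
            matrix[row, index[event]] = count
    return event_ids, matrix

ids, matrix = event_occurrence_matrix(["E1", "E2", "E1", "E3", "E1"], window_size=5)
# ids    -> ["E1", "E2", "E3"]
# matrix -> [[3, 1, 1]]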
Interfaces The Data Miner shall have an interface to the Inference Engine. This interface shall be described by the format of the data passed from the Data Miner to the Inference Engine.
The format of the data shall be machine learning model-ready, e.g. a comma-separated values (CSV) file that can be loaded directly into a matrix structure.
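A minimal sketch of this interface contract, under the assumption of a CSV-based exchange format (the file name and example contents below are hypothetical):

import numpy as np

# Hypothetical file name and contents; in practice the Data Miner would
# write out the event occurrence matrix it produced during parsing.
matrix = np.array([[3, 1, 1], [0, 2, 4]])
np.savetxt("event_occurrence_matrix.csv", matrix, fmt="%d", delimiter=",")

# Inference Engine side: the CSV loads directly into a matrix structure.
features = np.loadtxt("event_occurrence_matrix.csv", delimiter=",", ndmin=2)
assert features.shape == (2, 3)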
The Data Miner shall also have an interface to the system under test to access the log files.
Depending on how the Data Miner is used, this may be a direct interface, or the interface may be realised through the encapsulating ALFAF.
Parsing Algorithm As evidenced by Section 3.2.2, log parsing algorithms typically vary in performance across datasets. The Data Miner shall therefore implement various state-of-the-art log parsing algorithms to process and parse log files. The availability of multiple algorithms enables all of them to be evaluated and compared in the context of the Data Miner. Many log parsing algorithms have parameters that may be tuned to affect their performance. The Data Miner shall expose an interface for configuring the parameters of the various log parsing algorithms that are implemented.
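Such a configuration interface could take the form of a simple per-algorithm parameter mapping, as sketched below in Python. The Drain parameters (tree depth and similarity threshold) follow the literature reviewed in Section 3.2.2; the exact parameter sets exposed for each algorithm are assumptions at this point:

# Illustrative configuration interface: each implemented parsing algorithm
# exposes its tunable parameters through a single mapping. The parameter
# names are placeholders for whichever algorithms are ultimately implemented.
PARSER_CONFIG = {
    "Drain": {"depth": 4, "similarity_threshold": 0.5},
    "Spell": {"tau": 0.5},
    "IPLoM": {"ct": 0.35, "lower_bound": 0.25},
}

def configure_parser(name: str, **overrides):
    """Return the parameter set for a parsing algorithm, with overrides applied."""
    params = dict(PARSER_CONFIG[name])
    params.update(overrides)
    return params

drain_params = configure_parser("Drain", similarity_threshold=0.4)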
Usage The Data Miner shall be a modular, stand-alone tool that may be run independently of the ALFAF. This facilitates independent development and testing of the Data Miner before it is integrated into the framework.
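A minimal sketch of what such stand-alone invocation could look like is given below; the command-line flags and defaults are hypothetical and serve only to illustrate the intended usage:

import argparse

def main() -> None:
    # Hypothetical command-line interface for running the Data Miner on
    # its own, independently of the encapsulating ALFAF.
    parser = argparse.ArgumentParser(
        description="Parse raw log files into a structured dataset.")
    parser.add_argument("logfile", help="Path to the raw input log file")
    parser.add_argument("--algorithm", default="Drain",
                        help="Log parsing algorithm to use")
    parser.add_argument("--output-dir", default=".",
                        help="Directory for the structured output")
    args = parser.parse_args()
    print(f"Parsing {args.logfile} with {args.algorithm} -> {args.output_dir}")

if __name__ == "__main__":
    main()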
Performance Metrics The Data Miner shall output various performance metrics after processing, such as parsing accuracy and the time taken to parse. This will aid in evaluating the performance of the Data Miner across various datasets. These performance metrics are further detailed in Chapter 7.
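As a sketch of how these two metrics might be captured: the per-message definition of parsing accuracy below is a simple illustrative choice (benchmark studies often use a stricter grouping-based variant, as discussed in Chapter 7), and ground-truth templates are assumed to be available for evaluation datasets:

import time

def parsing_accuracy(predicted_templates, ground_truth_templates):
    """Fraction of log messages assigned the same event template as the
    ground truth (simple per-message definition, for illustration)."""
    correct = sum(p == g for p, g in zip(predicted_templates, ground_truth_templates))
    return correct / len(ground_truth_templates)

start = time.perf_counter()
# ... run the chosen parsing algorithm over the input log file ...
elapsed_seconds = time.perf_counter() - start

acc = parsing_accuracy(["E1", "E2", "E1"], ["E1", "E2", "E3"])  # -> 0.666...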
These design considerations and guiding system requirements inform the design and development of the Data Miner in subsequent sections of this chapter.