
Log Parsing

In the document PDF Presented by: University (pages 53-57)

3.2 Automated Log File Analysis

3.2.2 Log Parsing

Figure 3.11: Figure illustrating the process of log parsing

Log parsing is an important pre-processing step in the effort to automate log file analysis. It is employed in various automated log file analysis methods, not only in machine-learning-based methods, which typically require data pre-processing [39]. Log parsing involves extracting information from log files and transforming it into a sequence of structured events [35].

Log parsing is required because the log messages generated by many systems are commonly unstructured: developers record them as free-form text because it is more flexible [35]. This allows developers to log various events, debug information, warnings and errors without having to conform to a strict, predefined structure. However, without any form of structure, both automated analysis methods and manual inspection struggle to extract meaningful information from the log files, as there is seemingly no inherent pattern to the log messages. Log parsing addresses this by converting the logs into a structured data format that is more efficient to work with and easier to extract patterns from. While the content of log messages varies greatly, they typically contain two major components: a static or constant event template, and the variable runtime parameters associated with each instance of an event. Consider the example log messages in Figure 3.12.

Figure 3.12: Figure showing raw logs being parsed into event templates and arranged as a sequence of event templates. Image source: [35]

From the log message example in Figure 3.12, an example event template is Receiving block * src: * dest: * and the associated runtime parameters for the event are blk 9047918154..., /10.251.43.219:55700 and /10.251.43.210:50010. Since the same event may occur frequently during program execution with different runtime parameters, the resulting log messages appear unstructured. The goal of log parsing, as illustrated in Figure 3.11, is to extract the event template and the associated runtime parameters for each log message and to present the messages as a sequence of structured events. An example of this is shown in Figure 3.13.

Figure 3.13: Figure showing log parsing. The source code generating the log message is shown, followed by the actual log message. Finally, the parsed log split into log event template and runtime parameters is shown. Image source: [40]

As can be seen from Figure 3.13 above, the log messages are all parsed to extract their event templates and runtime parameters. The log event templates are then arranged as a sequence of structured events as shown in Figure 3.12. Having the log files represented in this way makes patterns in the log file more readily detectable by data-driven analysis methods such as machine learning.
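The relationship between event templates, runtime parameters and the resulting event sequence can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited parser: it assumes the templates are already known, that messages are split on whitespace, and that `*` stands for exactly one token. The block identifiers (`blk_1`, `blk_2`), the second template and the names `match_template` and `to_event_sequence` are hypothetical.

```python
from typing import List, Optional, Tuple

WILDCARD = "*"


def match_template(template: str, message: str) -> Optional[List[str]]:
    """Return the runtime parameters if `message` fits `template`, else None."""
    t_tokens = template.split()
    m_tokens = message.split()
    if len(t_tokens) != len(m_tokens):
        return None
    params = []
    for t_tok, m_tok in zip(t_tokens, m_tokens):
        if t_tok == WILDCARD:
            params.append(m_tok)      # variable part: a runtime parameter
        elif t_tok != m_tok:
            return None               # constant part does not match
    return params


def to_event_sequence(
    templates: List[str], messages: List[str]
) -> List[Tuple[Optional[int], List[str]]]:
    """Map each message to (template index, parameters); None if unmatched."""
    events = []
    for message in messages:
        for event_id, template in enumerate(templates):
            params = match_template(template, message)
            if params is not None:
                events.append((event_id, params))
                break
        else:
            events.append((None, []))
    return events


# The first template is taken from the text; the second is hypothetical.
TEMPLATES = [
    "Receiving block * src: * dest: *",
    "Deleting block * file *",
]

# Hypothetical messages; block identifiers and file path are made up.
MESSAGES = [
    "Receiving block blk_1 src: /10.251.43.219:55700 dest: /10.251.43.210:50010",
    "Deleting block blk_1 file /tmp/blk_1",
    "Receiving block blk_2 src: /10.251.43.210:50010 dest: /10.251.43.219:55700",
]

for event_id, params in to_event_sequence(TEMPLATES, MESSAGES):
    print(event_id, params)
```

The sequence of template indices produced here is the kind of structured representation that downstream analysis methods would consume.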

Early implementations of automated log parsing used regular expressions to define search patterns that match specific log events [41]. This yielded some success, but the regular expressions had to be created manually for each event type expected in the logs. This proved to be quite challenging as the number of event templates for a given system increased. Additionally, if the system in question was ever upgraded or modified, and the event templates changed, the regular expression rules would need to be manually updated as well. Other methods considered extracting the event templates directly from the source code [4]. Most log messages are implemented using printf, print, or other similar statements, which print a string consisting of the event template and variables corresponding to the runtime parameters associated with that event. Therefore, extracting these from the source code is plausible. However, source code is not always available, and such methods often need to be tailored to a specific programming language. As a result, these methods are neither scalable nor robust.
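The maintenance burden of the regular-expression approach can be seen in a small sketch. It assumes one hand-written rule per event type; the event identifiers (`E1`, `E2`), the second rule and the "upgraded" message format are hypothetical and only serve to show how a changed template silently stops matching.

```python
import re
from typing import Optional, Tuple

# One manually written rule per expected event type.
RULES = {
    "E1": re.compile(r"^Receiving block (\S+) src: (\S+) dest: (\S+)$"),
    "E2": re.compile(r"^Connection from (\S+) closed$"),  # hypothetical event
}


def parse_line(line: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (event id, runtime parameters) or None if no rule matches."""
    for event_id, pattern in RULES.items():
        match = pattern.match(line)
        if match:
            return event_id, match.groups()
    return None


# Hypothetical message in the format the rule was written for.
old_format = "Receiving block blk_1 src: /10.251.43.219:55700 dest: /10.251.43.210:50010"
# Hypothetical format after a system upgrade changed the wording.
new_format = "Receiving block blk_1 from /10.251.43.219:55700 to /10.251.43.210:50010"

print(parse_line(old_format))
print(parse_line(new_format))
```

The second call finds no matching rule, so the rule for `E1` would have to be rewritten by hand, and the same applies to every other event type affected by the change.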

To fully automate the process of log parsing, research has moved toward data-driven approaches that seek to automatically learn patterns from the data [40]. Such approaches commonly include Clustering [39], Frequent Pattern Mining [18] and Heuristic-based approaches [42] [43] [44].

Clustering

This approach refers to techniques that cluster log messages with similar patterns into groups, and then extract log templates from the components common to all the log messages in each group or cluster. Research has resulted in a number of log parsing techniques based on clustering methods, including LogMine [45], LogSig [46], and LKE [39]. LKE uses a hierarchical clustering algorithm based on the weighted edit distance between pairs of log messages [39]. LogSig uses a message-signature-based algorithm to group log messages into a pre-defined number of clusters [46]. LogMine also employs hierarchical clustering and clusters log messages into groups more efficiently by exploiting various optimisation techniques for clustering.
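The general idea of clustering followed by template extraction can be sketched as follows. This is not an implementation of LKE, LogSig or LogMine: it uses an unweighted token-level edit distance and a greedy single-pass grouping as a simple stand-in for hierarchical clustering. The threshold `max_ratio`, the function names and the example messages are assumptions made for illustration.

```python
from typing import List


def token_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance computed over tokens instead of characters."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cluster_messages(messages: List[str], max_ratio: float = 0.5) -> List[List[List[str]]]:
    """Greedily assign each message to the first cluster whose first
    member is close enough; otherwise start a new cluster."""
    clusters: List[List[List[str]]] = []
    for message in messages:
        tokens = message.split()
        for cluster in clusters:
            representative = cluster[0]
            distance = token_edit_distance(tokens, representative)
            if distance / max(len(tokens), len(representative), 1) <= max_ratio:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def extract_template(cluster: List[List[str]]) -> str:
    """Keep tokens shared by all members, replace the rest with '*'.
    Simplification: members with a different token count are ignored."""
    template = list(cluster[0])
    for tokens in cluster[1:]:
        if len(tokens) != len(template):
            continue
        template = [t if t == u else "*" for t, u in zip(template, tokens)]
    return " ".join(template)


# Hypothetical log messages.
MESSAGES = [
    "Receiving block blk_1 src: /10.0.0.1:5000 dest: /10.0.0.2:5001",
    "Receiving block blk_2 src: /10.0.0.3:5000 dest: /10.0.0.4:5001",
    "Verification succeeded for blk_1",
    "Verification succeeded for blk_2",
]

for cluster in cluster_messages(MESSAGES):
    print(extract_template(cluster))
```

The quality of the result depends heavily on the distance measure and the threshold, which is one reason the published techniques differ in how they weight tokens and merge clusters.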

Frequent Pattern Mining

Frequent Pattern Mining techniques identify patterns that occur frequently in a given dataset. The log message event templates are assumed to be patterns that occur frequently in log files. As a result, frequent pattern mining can extract these event templates from logs. Early examples of frequent pattern mining techniques are SLCT [18] and its extension, LogCluster [47].

Heuristic-based

Heuristic-based methods are a group of methods that exploit the unique characteristics or features of log files [40]. One such method, AEL, compares the occurrences of the constant parts of log messages, i.e. event templates, with those of the variable parts, and then groups the messages into clusters based on this comparison [48][49]. Another method, IPLoM, uses iterative partitioning to group messages by the length of the message and the position of various tokens occurring in the message [43][42]. Drain is a log parsing technique that uses a parsing tree to represent the log messages and extract common event templates [44].
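To make the tree-based heuristic more concrete, the sketch below groups messages first by token count and then by their first token, and merges a message into an existing group when enough tokens agree position by position. It is only inspired by the description of Drain above and is not the published algorithm or its implementation. The class name `TreeParser`, the similarity threshold and the example messages are assumptions.

```python
from typing import Dict, List, Tuple


def has_digit(token: str) -> bool:
    return any(ch.isdigit() for ch in token)


class TreeParser:
    """Two-level lookup (token count, first token) leading to template groups."""

    def __init__(self, sim_threshold: float = 0.5) -> None:
        self.sim_threshold = sim_threshold
        self.leaves: Dict[Tuple[int, str], List[List[str]]] = {}

    @staticmethod
    def _similarity(template: List[str], tokens: List[str]) -> float:
        # Fraction of positions where template and message agree exactly.
        same = sum(1 for t, u in zip(template, tokens) if t == u)
        return same / len(tokens)

    def add(self, message: str) -> str:
        """Insert a message and return the template it was assigned to."""
        tokens = message.split()
        if not tokens:
            return ""
        # Heuristic: a first token containing digits is likely a parameter.
        first = "*" if has_digit(tokens[0]) else tokens[0]
        groups = self.leaves.setdefault((len(tokens), first), [])
        best = max(groups, key=lambda g: self._similarity(g, tokens), default=None)
        if best is not None and self._similarity(best, tokens) >= self.sim_threshold:
            # Merge: positions that differ become wildcards.
            best[:] = [t if t == u else "*" for t, u in zip(best, tokens)]
            return " ".join(best)
        groups.append(list(tokens))
        return " ".join(tokens)

    def templates(self) -> List[str]:
        return [" ".join(g) for groups in self.leaves.values() for g in groups]


# Hypothetical log messages.
MESSAGES = [
    "Receiving block blk_1 src: /10.0.0.1:5000 dest: /10.0.0.2:5001",
    "Receiving block blk_2 src: /10.0.0.3:5000 dest: /10.0.0.4:5001",
    "Verification succeeded for blk_1",
    "Verification succeeded for blk_2",
]

parser = TreeParser()
for message in MESSAGES:
    parser.add(message)
print(parser.templates())
```

Because each message is handled as it arrives, a structure of this kind can process logs in a streaming fashion, at the cost of depending on heuristics such as "messages of the same event have the same token count".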

Other methods

There are also other log parsing techniques that do not fall into the aforementioned categories. One such method is Spell, which uses the longest common subsequence algorithm to parse log messages and extract event templates [50].
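The role of the longest common subsequence (LCS) can be shown with a short sketch. It computes the token-level LCS of two messages and turns it into a template by replacing the non-shared stretches with `*`. This illustrates only the core idea and is not the Spell algorithm itself, which also decides when messages should be merged. The function names and example messages are hypothetical.

```python
from typing import List


def lcs(a: List[str], b: List[str]) -> List[str]:
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Walk back through the table to recover the subsequence.
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return result[::-1]


def template_from(tokens: List[str], common: List[str]) -> str:
    """Keep tokens that belong to `common`; collapse everything else into '*'."""
    template: List[str] = []
    k = 0
    for tok in tokens:
        if k < len(common) and tok == common[k]:
            template.append(tok)
            k += 1
        elif not template or template[-1] != "*":
            template.append("*")
    return " ".join(template)


# Hypothetical messages; the second pair differs in token count.
first = "Receiving block blk_1 src: /10.0.0.1:5000 dest: /10.0.0.2:5001".split()
second = "Receiving block blk_2 src: /10.0.0.3:5000 dest: /10.0.0.4:5001".split()
print(template_from(first, lcs(first, second)))

short = "Temperature 41 exceeds limit".split()
longer = "Temperature 39 C exceeds limit".split()
print(template_from(longer, lcs(short, longer)))
```

Unlike position-by-position comparison, the LCS still aligns messages of the same event when a parameter spans a different number of tokens, as in the second pair.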

Logparser: A log parsing toolkit for benchmarking and evaluation

As can be seen, there are many log parsing techniques based on various methods. All these techniques share the common goal of extracting log message event templates such that the log files can be represented as a structured sequence of events. The authors of [40] and [37] have studied and evaluated various log parsing techniques that have been implemented in recent years. Their work has resulted in an open-source toolkit, Logparser, that implements and packages many of the most prominent log parsing techniques of recent years into a single tool that may be used for evaluation and benchmarking [7]. This tool, alongside the associated research studies, will be used to initially identify and evaluate log parsing techniques that may be considered and implemented in this research project. This is detailed in Chapter 4.
