


7.1.3 Classification Model Generation

Having captured and filtered samples of malicious and benign traces, the next step is to combine these traces in an FSM. Given a set Σ of tokens and two sets B, M ⊆ Σ* of benign and malicious token traces, an FSM G is constructed with positive and negative languages

L⁺(G) = M Σ* ;  (7.1)

L⁻(G) = Pre(B) \ M Σ* .  (7.2)

Every malicious trace and every possible continuation of a malicious trace is classified as positive (7.1). This represents the idea that, once malicious behaviour in M has occurred, damage has been done and cannot be reverted by further action. Furthermore, all prefixes of benign traces are classified as negative, so traces that can be continued to an observed benign trace are not considered as malicious (7.2). To ensure the requirement from Section 2.5.2 that the positive and negative languages are disjoint, traces that occur both in the malicious and benign sets are classified only as positive by removing the continuations of malicious traces from the negative language (7.2). That is, ambiguous traces encountered as both malicious and benign lead to false positives rather than false negatives. This situation did not arise during the experiments carried out for this work.
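The classification rule in (7.1) and (7.2) can be sketched directly on finite trace sets. The following Python functions are illustrative helpers, not the thesis implementation; the trace sets follow Example 7.3 later in this section.

```python
# Sketch of membership in the positive and negative languages (7.1) and (7.2).
# Function names are illustrative assumptions, not part of the thesis code.

def is_positive(trace, malicious):
    """Positive language M Sigma*: some malicious trace is a prefix of the trace."""
    return any(trace[:len(m)] == m for m in malicious)

def is_negative(trace, benign, malicious):
    """Negative language Pre(B) \\ M Sigma*: a prefix of a benign trace
    that is not also positive (positive classification takes precedence)."""
    is_prefix_of_benign = any(b[:len(trace)] == trace for b in benign)
    return is_prefix_of_benign and not is_positive(trace, malicious)

B = [("ORF", "ORF", "OCWF"), ("ORF", "OCWF"), ("OD", "ORF", "ORF", "ORF")]
M = [("OD", "ORF", "OCWF"), ("OD", "U", "U")]

# A continuation of a malicious trace remains positive:
print(is_positive(("OD", "U", "U", "ORF"), M))   # True
# A prefix of a benign trace is negative:
print(is_negative(("OD", "ORF"), B, M))          # True
```

Because `is_negative` checks `is_positive` first, a trace occurring in both sets is reported only as positive, matching the disjointness requirement above.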

To construct the FSM G, first the regular expressions representing the reduced token traces according to Section 7.1.2 are converted to nondeterministic FSMs using a standard algorithm (Hopcroft et al., 2001). These FSMs are then combined into a single nondeterministic FSM with several initial states that includes the positive and negative languages (7.1) and (7.2). This nondeterministic FSM is converted to a minimal deterministic FSM with the same positive and negative languages using subset construction and Hopcroft’s minimisation algorithm (Hopcroft et al., 2001). If the deterministic FSM from subset construction contains any states that are marked as both positive and negative, then the negative marking is removed from these states to ensure (7.2).
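The construction can be sketched as follows: one chain of states per trace forms the nondeterministic FSM, and subset construction yields the deterministic FSM. This is a minimal sketch under assumed data structures; Hopcroft minimisation is omitted for brevity, and none of the names below are the thesis code.

```python
# Illustrative sketch: build a nondeterministic FSM from the trace sets,
# then determinise it by subset construction. Hopcroft minimisation omitted.

SIGMA = ("OD", "OCWF", "ORF", "U")

def build_nfa(benign, malicious):
    """One chain of fresh states per trace, with one initial state per chain.
    Benign chains are negative throughout; malicious end states are positive
    and carry a Sigma self-loop, so continuations stay positive. Intermediate
    states of malicious chains are left unmarked ("don't care")."""
    trans, marks, inits, counter = {}, {}, set(), 0
    for traces, mark in ((benign, "-"), (malicious, "+")):
        for trace in traces:
            state, counter = counter, counter + 1
            inits.add(state)
            if mark == "-":
                marks[state] = "-"
            for token in trace:
                nxt, counter = counter, counter + 1
                trans.setdefault((state, token), set()).add(nxt)
                if mark == "-":
                    marks[nxt] = "-"
                state = nxt
            if mark == "+":
                marks[state] = "+"
                for token in SIGMA:  # Sigma self-loop on the positive end state
                    trans.setdefault((state, token), set()).add(state)
    return trans, marks, inits

def determinise(trans, marks, inits):
    """Subset construction; '+' overrides '-' so the languages stay disjoint."""
    start = frozenset(inits)
    dtrans, dmarks, stack = {}, {}, [start]
    while stack:
        subset = stack.pop()
        if subset in dmarks:
            continue
        labels = {marks.get(q) for q in subset}
        dmarks[subset] = "+" if "+" in labels else ("-" if "-" in labels else None)
        for token in SIGMA:
            target = frozenset(t for q in subset for t in trans.get((q, token), ()))
            if target:
                dtrans[(subset, token)] = target
                stack.append(target)
    return dtrans, dmarks, start

B = [("ORF", "ORF", "OCWF"), ("ORF", "OCWF"), ("OD", "ORF", "ORF", "ORF")]
M = [("OD", "ORF", "OCWF"), ("OD", "U", "U")]
dtrans, dmarks, start = determinise(*build_nfa(B, M))
```

The `"+" if "+" in labels` line implements the rule that a deterministic state marked both positive and negative keeps only the positive marking.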


[Figure 7.2 appears here: state diagrams of G, det(G), and S, with transitions labelled by the tokens OD, OCWF, ORF, U, selfloops labelled Σ, and positive states marked +.]

Figure 7.2: Constructing a classification model. The nondeterministic FSM G is constructed from the malicious and benign traces, converted to a deterministic FSM det(G), and minimised by supervisor reduction to obtain the classification model S.

Finally, the minimal deterministic FSM is passed to supervisor reduction (Section 2.5.3) to obtain the classification model. The classification model is a deterministic FSM that can then be used to classify new observed behaviour as malicious or benign as follows. Starting from the initial state, system calls are observed and converted to tokens, and each token is used to update the classification model’s state. If a positive state is reached at some point, this means that malicious behaviour has been detected. It is also possible that a token is generated for which no transition is defined from the current state of the classification model; in this case, no further tokens are processed and the behaviour is considered as benign.
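This runtime classification loop can be sketched as follows. The transition table here is a small illustrative model chosen by hand, not a model produced by the construction or the model S of Figure 7.2; the function and variable names are assumptions.

```python
# Sketch of the runtime classification loop over a deterministic model.
# The transition table below is a hand-made illustrative example.

def classify(transitions, positive, initial, tokens):
    """Feed tokens into the model: 'malicious' on reaching a positive state,
    'benign' if the trace ends or a token has no defined transition."""
    state = initial
    for token in tokens:
        nxt = transitions.get((state, token))
        if nxt is None:
            return "benign"      # undefined transition: stop processing
        state = nxt
        if state in positive:
            return "malicious"   # positive state reached: detection
    return "benign"

# Illustrative model: after OD, any token other than further ORFs is malicious.
transitions = {(0, "OD"): 1, (1, "ORF"): 1, (1, "OCWF"): 2, (1, "U"): 2,
               (2, "OD"): 2, (2, "OCWF"): 2, (2, "ORF"): 2, (2, "U"): 2}
positive = {2}

print(classify(transitions, positive, 0, ["OD", "ORF", "OCWF"]))   # malicious
print(classify(transitions, positive, 0, ["ORF", "ORF", "OCWF"]))  # benign
```

Note the asymmetry in the loop: a positive state ends classification immediately, while an undefined transition defaults to benign, mirroring the description above.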

Example 7.3. Consider the token set Σ = {OD, OCWF, ORF, U} and the benign and malicious languages

B = {ORF ORF OCWF, ORF OCWF, OD ORF ORF ORF} ;  (7.3)

M = {OD ORF OCWF, OD U U} .  (7.4)

Figure 7.2 shows the nondeterministic FSM G constructed from these traces (without loop detection). The three benign traces appear on the left with all their states negative. Thus, all prefixes of benign traces lead to negative states. The two malicious traces on the right have positive end states, and the selfloops labelled Σ indicate transitions with all events in Σ to represent that continuations of malicious traces remain malicious. Figure 7.2 also shows the minimal deterministic FSM det(G) equivalent to G and a classification model S resulting from supervisor reduction. Here, comma-separated tokens on a transition indicate parallel transitions with each listed token.

It is clear that the benign trace ORF ORF OCWF takes S to a negative state, indicating classification as benign, while the malicious trace OD ORF OCWF takes S to its positive state and is classified as malicious. The trace OD U, which takes det(G) to a “don’t care” state, takes S to its positive state and is classified as malicious. The trace U U is not accepted by S from the first step onward and is considered as benign.

As shown in the example, supervisor reduction not only reduces the number of states to produce a more manageable classification model, it also attempts to predict and generate a best-fit model based on the “don’t care” states. The hope is that this way of minimising the number of states also leads to improved classification accuracy.


FSMs with positive and negative states as shown in Figure 7.2 cannot be submitted directly to supervisor reduction in a supervisory control tool such as Waters/Supremica. Therefore, the FSM model is modified to convert the classification problem to a supervisory control problem as follows. The nondeterministic FSM G is augmented with two states b and m, and with an event µ to represent the detection of malicious behaviour. For each negative state x, a transition x →µ b is created, and for each positive state y, a transition y →µ m is created, and all states except b are declared to be accepting states. This ensures that, if a maximally permissive nonblocking supervisor (Ramadge and Wonham, 1989) is synthesised from the resulting deterministic FSM, the event µ is enabled in positive states and disabled in negative states that are not also positive. That is, the event µ is enabled exactly to signal the detection of malicious behaviour.
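The augmentation described above can be sketched as follows. The state names "b" and "m" and the event name MU follow the text; the function signature and the encoding of transitions as a dictionary are assumptions for illustration, not the Waters/Supremica input format.

```python
# Sketch of the conversion to a supervisory control problem: add states b
# and m and an event mu, route negative states to b and positive states to m,
# and make every state except b accepting.

MU = "mu"

def augment(transitions, states, positive, negative):
    """Return (augmented transitions, accepting states) for synthesis input."""
    aug = dict(transitions)
    for x in negative:
        aug[(x, MU)] = "b"       # mu from a negative state leads to blocking b
    for y in positive:
        aug[(y, MU)] = "m"       # applied second: positive overrides a state
                                 # that is marked both positive and negative
    accepting = set(states) | {"m"}   # all states except b are accepting
    return aug, accepting

transitions = {(0, "OD"): 1, (1, "OCWF"): 2}
aug, accepting = augment(transitions, states={0, 1, 2},
                         positive={2}, negative={0, 1})
print(aug[(2, MU)], (0, MU) in aug)   # m True
```

Since b is the only non-accepting state, a nonblocking supervisor must disable µ wherever it leads to b, which is exactly the negative states, while keeping it enabled in the positive states leading to m.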