
3.3 Z-Score Method

There are many methods for classifying data objects. The z-score method is simple and easy to implement for finding outliers. In this method, we first calculate the mean and standard deviation of each attribute A_i. Using these values, a z-score can be computed for each object [10]. It is defined as

$$ z = \frac{v - \bar{A}}{\sigma_A} $$


Table 3. Z-score values for the attribute src_bytes

protocol  service   flag  src_bytes  class          z-score
tcp       http      SF    349        normal.       -0.14754
tcp       http      SF    349        normal.       -0.14754
tcp       ftp       SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    350        normal.       -0.14737
tcp       http      SF    351        normal.       -0.14721
tcp       http      SF    352        normal.       -0.14704
tcp       http      SF    352        normal.       -0.14704
tcp       http      SF    352        normal.       -0.14704
tcp       private   REJ   0          ipsweep.      -0.20559
tcp       smtp      SF    0          ipsweep.      -0.20559
tcp       private   REJ   0          ipsweep.      -0.20559
tcp       finger    S0    0          land.         -0.20559
tcp       finger    S0    0          land.         -0.20559
tcp       finger    S0    0          land.         -0.20559
tcp       private   S0    0          neptune.      -0.20559
tcp       private   S0    0          neptune.      -0.20559
tcp       private   S0    0          neptune.      -0.20559
udp       private   SF    28         teardrop.     -0.20093
udp       private   SF    28         teardrop.     -0.20093
udp       private   SF    28         teardrop.     -0.20093
udp       private   SF    28         teardrop.     -0.20093
icmp      ecr_i     SF    1032       smurf.        -0.03393
icmp      ecr_i     SF    1032       smurf.        -0.03393
icmp      ecr_i     SF    1032       smurf.        -0.03393
icmp      ecr_i     SF    1480       pod.           0.040585
icmp      ecr_i     SF    1480       pod.           0.040585
icmp      ecr_i     SF    1480       pod.           0.040585
icmp      ecr_i     SF    1480       pod.           0.040585
tcp       ftp_data  SF    7045       warezclient.   0.966234
tcp       http      RSTR  47612      back.          7.713906
tcp       http      RSTR  53168      back.          8.638057
tcp       http      RSTR  53168      back.          8.638057


where v is the value of A, and \bar{A} and \sigma_A are the mean and standard deviation, respectively, of the attribute A. A z-score is computed for each attribute of an object; by considering the majority of votes across the attributes, that object is determined to be an outlier or not with respect to a threshold value. The threshold value is determined from the training samples, and each z-score value is compared with it to detect outliers. Z-score values for a sample of the KDD Cup dataset are shown in Table 3.

By setting a threshold value against which the z-scores are compared, we can separate intrusions from normal objects.
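A minimal sketch of this test on a single numeric attribute is given below; the function name, the sample values, and the threshold of 1.5 are illustrative assumptions for the example, not values taken from the experiments above.

```python
import numpy as np

def zscore_outliers(values, threshold):
    """Return the z-score of each object and a flag for |z| above the threshold."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return z, np.abs(z) > threshold

# A few src_bytes values loosely taken from Table 3; with this toy threshold
# only the two large back. connections (47612 and 53168 bytes) are flagged.
src_bytes = [349, 350, 350, 0, 0, 28, 1032, 1480, 7045, 47612, 53168]
z, flags = zscore_outliers(src_bytes, threshold=1.5)
for v, s, f in zip(src_bytes, z, flags):
    print(f"src_bytes={v:6d}  z={s:7.3f}  outlier={f}")
```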

3.4 Multiclass Classifier

3.4.1 Bayesian Classifier

One of the most effective classifiers in terms of predictive performance is the so-called naïve Bayesian classifier [11]. This classifier learns from training data the conditional probability of each attribute Ai given the class label C. Classification is then done by applying Bayes' rule to compute the probability of C given the particular instance of A1, . . . , An, and predicting the class with the highest posterior probability. This computation is rendered feasible by a strong independence assumption: all the attributes Ai are conditionally independent given the value of the class C. Since this assumption is often unrealistic, a better method is the Bayesian network.
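As a rough illustration of this rule, the following sketch estimates P(C) and P(Ai | C) from a handful of categorical records and predicts the class with the highest posterior. The helper names, the Laplace-style smoothing, and the toy records are assumptions made for the example, not part of the paper.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate the class prior P(C) and per-attribute counts for P(Ai | C)."""
    n = len(labels)
    prior = {c: cnt / n for c, cnt in Counter(labels).items()}
    cond = defaultdict(Counter)          # (attribute index, class) -> value counts
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, c)][v] += 1
    return prior, cond

def predict(row, prior, cond):
    """Return the class with the highest posterior, assuming the attributes
    are conditionally independent given the class (the naive assumption)."""
    best_class, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            total = sum(counts.values())
            # Laplace-style smoothing so unseen values do not zero out the product.
            score *= (counts[v] + 1) / (total + len(counts) + 1)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Toy usage with attributes (protocol, service, flag) similar to Table 3.
X = [("tcp", "http", "SF"), ("tcp", "private", "S0"), ("icmp", "ecr_i", "SF")]
y = ["normal.", "neptune.", "smurf."]
prior, cond = train_naive_bayes(X, y)
print(predict(("tcp", "http", "SF"), prior, cond))   # -> normal.
```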

3.4.2 Bayesian Network Classifier

These networks [12] are directed acyclic graphs that allow an efficient and effective representation of the joint probability distribution over a set of random variables. Each vertex represents a random variable, and edges represent direct correlations between the variables. More precisely, the network encodes the following conditional independence statements: each variable is independent of its non-descendants in the graph given the state of its parents. These independencies are then exploited to reduce the number of parameters needed to characterize a probability distribution, and to efficiently compute posterior probabilities given evidence. Probabilistic parameters are encoded in a set of tables, one for each variable, in the form of local conditional distributions of a variable given its parents. Using the independence statements encoded in the network, the joint distribution is uniquely determined by these local conditional distributions.

3.4.3 Learning Bayesian Networks

Consider a finite set U = { X1, X2, . . . , Xn } of discrete random variables, where each variable Xi may take on values from a finite set, denoted by Val(Xi). We use capital letters such as X, Y, Z for variable names, and lower-case letters such as x, y, z to denote specific values taken by those variables. Sets of variables are denoted by boldface capital letters such as X, Y, Z, and assignments of values to the variables in these sets are denoted by boldface lowercase letters x, y, z. Finally, let P be a joint probability distribution over the variables in U, and let X, Y, Z be subsets of U. We say that X and Y are conditionally independent given Z if, for all assignments x, y, z, P(x | z, y) = P(x | z) whenever P(y, z) > 0.

A Bayesian network [13] is an annotated directed acyclic graph that encodes a joint probability distribution over a set of random variables U. Formally, a Bayesian network for U is a pair B = (G, Θ). The first component, G, is a directed acyclic graph whose vertices correspond to the random variables X1, X2, . . . , Xn, and whose edges represent direct dependencies between the variables. The graph G encodes independence assumptions: each variable Xi is independent of its non-descendants given its parents in G. The second component of the pair, namely Θ, represents the set of parameters that quantifies the network. A Bayesian network B defines a unique joint probability distribution over U given by [13]

$$ P_B(X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P_B(X_i \mid \Pi_{X_i}) = \prod_{i=1}^{n} \theta_{X_i \mid \Pi_{X_i}} \qquad (3) $$
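To make the factorization in Eq. (3) concrete, the following sketch evaluates the joint probability of a hypothetical two-variable network (Protocol → Attack); the structure and the probability values are invented purely for illustration.

```python
# Each variable has a local conditional distribution given its parents;
# the joint is the product of these local terms, as in Eq. (3).
p_protocol = {"tcp": 0.7, "icmp": 0.3}                 # P(Protocol), no parents
p_attack = {                                           # P(Attack | Protocol)
    "tcp":  {"normal.": 0.8, "neptune.": 0.2},
    "icmp": {"normal.": 0.1, "smurf.":   0.9},
}

def joint(protocol, attack):
    """P(Protocol = protocol, Attack = attack) via the factorization of Eq. (3)."""
    return p_protocol[protocol] * p_attack[protocol].get(attack, 0.0)

print(joint("icmp", "smurf."))   # 0.3 * 0.9 = 0.27
```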

3.5 K-Nearest Neighbor (k-NN) Method

There are many distance-based methods for finding outliers, and k-NN is one such distance-based approach for the classification of objects. The k-nearest neighbors (k-NN) model [14] is a very simple but powerful tool. It has been used in many different applications, particularly in classification and clustering tasks. The k-nearest neighbor method is based on learning by analogy, that is, by comparing a given test tuple with training tuples that are similar to it. When given an unknown tuple, a k-NN method searches the pattern space for the k training tuples that are closest to the unknown tuple. Closeness is defined in terms of a distance metric, such as the Euclidean distance.

The Euclidean distance between two points or tuples, say X1 = (x11, x12, . . . , x1n) and X2 = (x21, x22, . . . , x2n), is defined as

$$ d(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2} $$
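Under the definitions above, a small sketch of k-NN classification with the Euclidean distance might look as follows; the function names, the choice k = 3, and the toy training tuples (single-attribute src_bytes values loosely based on Table 3) are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

def euclidean(x1, x2):
    """Euclidean distance between two numeric tuples, as defined above."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def knn_classify(query, train_points, train_labels, k=3):
    """Label the query with the majority class among its k nearest training tuples."""
    neighbours = sorted(zip(train_points, train_labels),
                        key=lambda pair: euclidean(query, pair[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy usage on the single attribute src_bytes.
train = [(349,), (350,), (0,), (1032,), (47612,), (53168,)]
labels = ["normal.", "normal.", "neptune.", "smurf.", "back.", "back."]
print(knn_classify((352,), train, labels, k=3))   # -> normal.
```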