VNU Journal of Science, Natural Sciences and Technology 24 (2008) 71-81
A program anomaly intrusion detection scheme based on fuzzy inference
Dau Xuan Hoang'*, Minh Ngoc Nguyen^
^Department of Computer Science, Faculty of Information Technology, The Posts and Telecommunications Institute of Technology (PTIT),
122 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam
'Vietnam Posts and Telecommunications (VNPT), Floor, Ocean Park Building, No. 1 Dao Duy Anh, Dong Da, Hanoi, Vietnam
Received 31 October 2007
Abstract. A major problem of existing anomaly intrusion detection approaches is that they tend to produce excessive false alarms. One reason for this is that the normal and abnormal behaviour of a monitored object can overlap or be very close to each other, which makes it difficult to define a clear boundary between the two. In this paper, we present a fuzzy-based scheme for program anomaly intrusion detection using system calls. Instead of using crisp conditions or fixed thresholds, fuzzy sets are used to represent the parameter space of the program sequences of system calls. In addition, fuzzy rules are used to combine multiple parameters of each sequence, using fuzzy reasoning, in order to determine the sequence status. Experimental results showed that the proposed fuzzy-based detection scheme reduced false positive alarms by 48%, compared to the normal database scheme.
Keywords: anomaly intrusion detection, fuzzy logic, hidden Markov model, program-based anomaly intrusion detection.
* Corresponding author. E-mail: [email protected]
1. Introduction
One of the most difficult tasks in anomaly intrusion detection is to determine the boundaries between the normal and abnormal behavior of a monitored object. A well-defined boundary helps an anomaly detection system correctly label the current behavior as normal or abnormal. Unfortunately, the border between the normal and abnormal behavior may not always be precisely defined, since the normal and abnormal behavior can overlap or be very close to each other [1-3]. This leads to an increase in false alarms and a decrease in the detection rate. This paper proposes a fuzzy-based solution to reduce false alarms for program anomaly detection using system calls.
The boundaries between the normal and abnormal behavior of a monitored object can be divided into two types: hard boundaries and soft boundaries. A hard boundary is usually represented in the form of crisp conditions or fixed thresholds. For example, in the normal database detection scheme [4], a short sequence of system calls is labeled as normal if it is seen in the training set. Otherwise, it is classified as abnormal. In the second test of the hidden Markov model-based (HMM) two-layer detection scheme [5], a probability threshold is used to determine the status of short sequences. If a sequence's probability P, generated by the HMM model, is equal to or greater than the probability threshold, the sequence is considered normal. Otherwise (P below the threshold), it is considered abnormal.
In contrast, a soft boundary is usually represented by fuzzy sets and rules instead of crisp conditions or fixed thresholds. In [1], five fuzzy sets are used to represent the space of each input network data source. In addition, a set of fuzzy rules is defined to combine a set of inputs in order to produce an output, which is the status of the current network activity. In [6], a set of fuzzy association rules is generated to represent the normal behavior of network traffic.
Anomaly detection approaches based on soft boundaries in general, or on fuzzy sets and rules in particular, can produce better detection results than those based on hard boundaries, for the following reasons:
• Since normalcy and abnormalcy are not truly crisp concepts, it is difficult to define a hard boundary that creates a sharp distinction between the normal and the abnormal. Therefore, it is natural to use fuzzy sets to define a "soft" border between them [1,3]. In fuzzy logic terms, the normal is represented by the degree of normalcy. Similarly, the abnormal is represented by the degree of abnormalcy.
• Anomaly detection systems based on fuzzy inference can combine inputs from multiple sources, which leads to better detection performance [1].
Although the application of fuzzy inference in anomaly intrusion detection is still at an early stage, promising results have been reported by several fuzzy-based anomaly detection approaches. Cho [7] reported a high detection rate and a significant reduction in the false positive rate when using fuzzy inference to combine inputs from three separate HMM models deployed in a user anomaly detection system. Luo et al. [8] presented a real-time anomaly intrusion detection system in which a set of fuzzy frequent episode rules was mined from the training data to represent the abnormality. The proposed approach [8] reportedly had lower false positive rates than those based on non-fuzzy frequent episode rules. Good detection results were also reported in [1,3,6].
In this paper, we propose a fuzzy-based detection scheme which is based on the HMM-based two-layer detection scheme proposed in our previous work [5] and on the normal database detection scheme [4]. The proposed detection scheme aims at reducing false alarms and increasing the detection rate. We employ fuzzy inference to evaluate each short sequence of system calls by combining the sequence's multiple parameters. Each short sequence is represented by three parameters: the sequence probability generated by the HMM model, and the sequence distance and the sequence frequency produced by the normal database [4]. Instead of using crisp conditions or fixed thresholds, a group of fuzzy sets is defined to represent each parameter's space. A set of fuzzy rules is created to combine these input sequence parameters in order to produce an output, which is the sequence status. Experimental results showed that our fuzzy-based detection scheme reduced false alarms significantly, compared to the two-layer detection scheme [5] and the normal database detection scheme [4].
The rest of this paper is organized as follows: Section 2 gives a brief introduction to fuzzy logic, fuzzy sets and fuzzy rules. Section 3 describes the proposed fuzzy-based scheme for program anomaly intrusion detection using system calls. Section 4 presents some experimental results and discussion. Section 5 is our conclusion.
2. Fuzzy logic
Fuzzy logic is an extension of Boolean logic which specifically deals with the concept of partial truth. The mathematical principles of fuzzy sets and fuzzy logic were first presented in 1965 by Professor L.A. Zadeh [9]. Since then, fuzzy logic has rapidly become one of the most successful technologies in the development of control systems. Applications of fuzzy logic range from simple, small and embedded micro-controllers to large data acquisition and control systems [9].
While a truth value in classical logic can always be expressed in binary terms (0 or 1, True or False, Yes or No), a truth value in fuzzy logic is represented by the degree of truth. The degree of truth can be any value in the range [0.0, 1.0], with 0.0 representing absolute Falseness and 1.0 representing absolute Truth.
2.1. Fuzzy sets
Mathematically, a fuzzy set A is defined as follows:
$$A = \{ (x, \mu_A(x)) \mid x \in U \}$$
where $\mu_A$ is the membership function of the fuzzy set A, and U is the Universe of Discourse. A Universe of Discourse, or Universe for short, is the range of all possible values for an input to a fuzzy system.
2.2. Fuzzy rules
Rules in fuzzy logic are used to combine and interpret inputs in order to produce an output. Fuzzy rules are usually expressed in the IF/THEN form as:
IF <variable> IS <fuzzy set> THEN <output>
A rule is said to fire if its truth value is greater than 0. Note that there is no "ELSE" clause in a fuzzy rule. All available rules in a fuzzy control system are evaluated, because an input can belong to more than one fuzzy set.
Like classical logic, fuzzy logic also supports AND, OR and NOT operators, which can be used to create more complex fuzzy rules.
Let x and y be two fuzzy variables, and let $\mu(x)$ and $\mu(y)$ be the degrees of membership of x and y, respectively. The AND, OR and NOT operators are defined as:
$$x \ \mathrm{AND} \ y = \min(\mu(x), \mu(y))$$
$$x \ \mathrm{OR} \ y = \max(\mu(x), \mu(y))$$
$$\mathrm{NOT} \ x = 1 - \mu(x)$$
For more on fuzzy logic, interested readers are referred to [9], [10].
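The three operators can be sketched in a few lines, assuming the standard min, max and complement forms. The function names are illustrative and do not come from the paper.

```python
def fuzzy_and(mu_x, mu_y):
    """Degree of truth of 'x AND y': the smaller of the two memberships."""
    return min(mu_x, mu_y)

def fuzzy_or(mu_x, mu_y):
    """Degree of truth of 'x OR y': the larger of the two memberships."""
    return max(mu_x, mu_y)

def fuzzy_not(mu_x):
    """Degree of truth of 'NOT x': the complement of the membership."""
    return 1.0 - mu_x

# Example degrees of membership (arbitrary values for illustration).
mu_x, mu_y = 0.7, 0.25
both = fuzzy_and(mu_x, mu_y)
either = fuzzy_or(mu_x, mu_y)
negated = fuzzy_not(mu_y)
```

When the memberships are restricted to 0.0 and 1.0, these definitions reduce to the ordinary Boolean operators.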
3. The proposed fuzzy-based program anomaly detection scheme
3.1. The proposed fuzzy-based detection scheme
Fig. 1 shows the proposed fuzzy-based detection scheme, which is developed in two stages: (a) training stage and (b) testing stage.
In the training stage, the detection model is constructed from the training data, which consists of normal traces of system calls of a program. In the testing stage, the constructed detection model is used to evaluate test traces of system calls in order to find possible intrusions. The two stages of the proposed scheme can be described as follows:
[Figure 1: block diagram. (a) Training stage: training data feed HMM training, fuzzy sets and rules building, and database building, which produce the HMM, the fuzzy sets and rules, and the normal database. (b) Testing stage: each test sequence formed from the test traces is evaluated by the normal database (sequence distance D, sequence frequency F) and by the HMM (sequence probability P), then classified by the fuzzy inference engine as a normal or abnormal sequence.]
Fig. 1. The proposed fuzzy-based detection scheme: (a) Training stage and (b) Testing stage.
• Training stage: a normal database, an HMM model and fuzzy sets are built from training data.
- Normal database: The database is an ordered list of all unique short sequences of system calls found in the training data. The database is created from normal traces of system calls using the method given in [4]. Each short sequence in the normal database has k system calls. In addition, the occurrence frequency of each short sequence in the training data is also recorded in the normal database.
- HMM model: The HMM model is trained using normal traces of system calls, based on the HMM incremental training scheme given in [11].
- Fuzzy sets: The fuzzy sets are created as discussed in Section 3.2.
• Testing stage: First, short sequences are formed from test traces of system calls using the sliding window method [4]. The sequence length is k system calls. Then, each short sequence is evaluated in two steps as follows:
- Evaluation of the short sequence by the normal database and by the HMM model: In this step, the normal database and the HMM model are used to compute the input parameters for the fuzzy inference engine.
- Classification of the test sequence by the fuzzy inference engine: In this step, the fuzzy inference engine applies the fuzzy sets and rules to interpret the input parameters in order to produce the output, which is the status of the short sequence: normal or abnormal.
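The database side of these two stages can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the window advances one system call at a time, that the frequency is a raw occurrence count, and that the sequence distance is the minimum Hamming distance to any stored sequence. All function names and the toy trace are hypothetical.

```python
from collections import Counter

def sliding_windows(trace, k):
    """Return all length-k windows of a trace, advancing one system call at a time."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]

def build_normal_database(normal_traces, k):
    """Map each unique short sequence to its occurrence count in the training data."""
    db = Counter()
    for trace in normal_traces:
        db.update(sliding_windows(trace, k))
    return db

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def evaluate(seq, db):
    """Return (distance D, frequency F) of a test sequence against the database."""
    if seq in db:
        return 0, db[seq]          # matched sequence: zero distance
    return min(hamming(seq, s) for s in db), 0

# Toy training trace with made-up system call numbers.
normal = [[4, 2, 66, 66, 4, 138, 66, 4, 2, 66]]
db = build_normal_database(normal, k=3)

matched = evaluate((4, 2, 66), db)     # seen twice in training
mismatched = evaluate((4, 2, 5), db)   # differs from (4, 2, 66) in one position
```

The pair returned by `evaluate` corresponds to the distance and frequency inputs of the fuzzy inference engine; the sequence probability would come from the trained HMM, which is outside the scope of this sketch.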
3.2. Fuzzy inference for sequence classification
As discussed in Section 3.1, the fuzzy inference engine is used to evaluate each short sequence to find anomalies by combining multiple sequence parameters. Fig. 2 shows the fuzzy inference engine for the classification of short sequences of system calls. The engine accepts the sequence's parameters as the input, and then applies the fuzzy sets and rules to produce the sequence's status as the output. The sequence parameters include the sequence probability P generated by the HMM model, and the sequence distance D and frequency F produced by the normal database.
Creation of fuzzy sets and rules
As shown in Fig. 2, fuzzy sets and rules are used by the fuzzy inference engine to interpret the input and generate the output.
Creation of fuzzy sets
We empirically created fuzzy sets to represent the space of each sequence parameter as follows:
• Four fuzzy sets, namely Very Low, Low, High and Very High, are created for the sequence probability P, to represent very low, low, high and very high sequence probabilities, respectively.
• Four fuzzy sets, namely Zero, Small, Medium and Large, are created to represent zero (for matched sequences), small, medium and large sequence distances, respectively.
• Three fuzzy sets, namely Low, Medium and High, are created to represent low, medium and high sequence frequencies, respectively.
• Two fuzzy sets, namely Normal and Abnormal, are created to represent the space of the output sequence anomaly score parameter. The anomaly score fuzzy sets are used in the defuzzification process to convert the output fuzzy set to the actual anomaly score of the sequence.
Creation of fuzzy rules
Since the input sequence parameters of the fuzzy rules, which include probability P, distance D and frequency F, are generated by the HMM model and the normal database, our fuzzy rules inherit the assumptions used by the normal database and the HMM-based detection schemes. These assumptions are as follows:
• A sequence which is produced with a likely probability by the HMM model is considered to be normal.
• A sequence which is produced with an unlikely probability by the HMM model is considered to be abnormal.
• A mismatched sequence is more suspicious than a matched sequence. The larger the distance between a test sequence and the normal sequences, the more likely the test sequence is abnormal.
• A matched sequence with a low occurrence frequency is more suspicious than a sequence with a high occurrence frequency.
[Figure 2: block diagram. The fuzzy inference engine takes the sequence parameters as input and produces the sequence status as output.]
Fig. 2. The fuzzy inference engine for the classification of short sequences of system calls.
Based on the above assumptions, we manually devised a set of 17 fuzzy rules for the sequence classification. An example of such a rule reads "IF probability IS Low AND distance IS Zero AND frequency IS Low THEN the test sequence IS abnormal". We do not present all rules in this paper due to space limitations.
Sequence classification using fuzzy reasoning
The fuzzy reasoning process, as shown in Fig. 2, evaluates each sequence of system calls in three phases: fuzzification, fuzzy inference and defuzzification.
Fuzzification is the process of transforming crisp input values into linguistic values, which are usually fuzzy sets. Two tasks are performed in the fuzzification process: input values are converted into linguistic values represented by fuzzy sets, and membership functions are applied to compute the degree of truth for each matched fuzzy set.
In the fuzzy inference phase, all rules in the fuzzy rule base are applied to the input parameters in order to produce an output. For each rule, each premise is first evaluated, and then all premises connected by AND are combined by taking the smallest of their degrees of membership as the rule's truth value. The final output fuzzy set of the fuzzy rule base is the OR combination of the results of all individual rules that fire, that is, of all rules whose truth value is non-zero.
Defuzzification is the process of transforming a fuzzy value into a crisp value. In our fuzzy inference engine, the output anomaly score fuzzy set is defuzzified to produce the sequence's anomaly score. Many defuzzification techniques are available, such as the centroid method, the max-membership method and the weighted average method. We used the max-membership method to compute the crisp output from the output fuzzy set.
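The three phases can be sketched end to end as below. This is only an illustration under stated assumptions: the membership functions, their breakpoints, the reduced number of fuzzy sets, the second and third rules and the representative scores 0.0 and 1.0 are all hypothetical. Only the first rule is the one quoted in the text, and the paper's 17 rules and empirical fuzzy sets are not reproduced.

```python
def ramp_down(lo, hi):
    """Membership 1.0 at or below lo, falling linearly to 0.0 at hi."""
    def mu(x):
        if x <= lo:
            return 1.0
        if x >= hi:
            return 0.0
        return (hi - x) / (hi - lo)
    return mu

def ramp_up(lo, hi):
    """Membership 0.0 at or below lo, rising linearly to 1.0 at hi."""
    down = ramp_down(lo, hi)
    return lambda x: 1.0 - down(x)

# Hypothetical fuzzy sets (a reduced subset; breakpoints are assumptions).
SETS = {
    "probability": {"Low": ramp_down(0.2, 0.6), "High": ramp_up(0.2, 0.6)},
    "distance": {"Zero": lambda d: 1.0 if d == 0 else 0.0, "Large": ramp_up(0, 4)},
    "frequency": {"Low": ramp_down(2, 10), "High": ramp_up(2, 10)},
}

RULES = [
    # Rule quoted in the text.
    ({"probability": "Low", "distance": "Zero", "frequency": "Low"}, "Abnormal"),
    # Hypothetical rules added so that both output sets can be reached.
    ({"probability": "High", "distance": "Zero", "frequency": "High"}, "Normal"),
    ({"probability": "Low", "distance": "Large"}, "Abnormal"),
]

# Assumed representative anomaly score of each output fuzzy set.
PEAKS = {"Normal": 0.0, "Abnormal": 1.0}

def classify(inputs):
    # Fuzzification: degree of membership of each input in each of its fuzzy sets.
    degrees = {var: {name: mu(inputs[var]) for name, mu in sets.items()}
               for var, sets in SETS.items()}
    # Inference: AND = min over premises; OR = max over rules that fire.
    output = {"Normal": 0.0, "Abnormal": 0.0}
    for premises, consequent in RULES:
        truth = min(degrees[var][name] for var, name in premises.items())
        if truth > 0.0:
            output[consequent] = max(output[consequent], truth)
    # Max-membership defuzzification: pick the output set with the highest degree.
    # Ties (including the case where no rule fires) resolve to "Normal" here,
    # which is an arbitrary choice of this sketch.
    label = max(output, key=output.get)
    return label, PEAKS[label], output

result = classify({"probability": 0.1, "distance": 0, "frequency": 1})
```

With these assumed sets, a matched sequence that has both a low probability and a low frequency fires only the quoted rule and is labeled abnormal.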
4. Experimental results and discussion
4.1. Data set
We used sendmail traces of system calls collected in a synthetic environment, as given in [12]. The format of the system call traces and the data collection procedures were discussed in [4]. The data sets include:
• Normal traces are those collected during the program's normal activity. Normal traces of the sendmail program include 2 traces with a total of 1,595,612 system calls.
• Abnormal traces are those that come from a program's abnormal runs generated by known intrusions. The sendmail abnormal traces consist of 1 trace of the sm5x intrusion, 1 trace of the sm565a intrusion, 2 traces of the syslog-local intrusion, and 2 traces of the syslog-remote intrusion.
4.2. Experimental design
In order to measure the detection rate and the false alarm rate of our fuzzy-based detection model, our experiments were designed as follows:
• Measurement of the false positive rate: In this test, we use the proposed fuzzy-based detection scheme to classify normal traces of system calls which were not used in the construction of the normal database, the HMM model and the fuzzy sets. Since the normal traces do not contain any intrusions, any reported alarms are considered false positives. This experiment was set up as follows:
- Select the first 1,000,000 system calls of the sendmail normal traces as the full training set.
- Form 4 training sets which account for 30%, 50%, 80% and 100% of the size of the full training set.
- Construct normal databases and HMM models from these training sets. The chosen values for the sequence length are k = 5, 11 and 15 system calls.
- For each training set and each selected sequence length, construct membership functions for the fuzzy sets of the three sequence parameters, as discussed in Section 3.2.
- Select three test traces, each of 50,000 system calls, from the sendmail normal traces which are not used in the training process, to test for false positive alarms of our scheme, the normal database scheme [4] and the two-layer scheme [5]. Reported abnormal short sequences are counted for each test trace.
• Measurement of anomaly signals and the detection rate: In this test, we use the proposed fuzzy-based scheme to classify abnormal traces of system calls to find possible intrusions. Since the abnormal traces have been collected from the program's abnormal runs generated by known intrusions, reported alarms in this case can be considered true alarms or detected intrusions. This experiment was designed as follows:
- Construct a normal database and an HMM model for the sendmail program from normal traces of 1,000,000 system calls. We choose the sequence length k = 11 to construct the normal database from normal traces, and to form short sequences from abnormal traces for testing.
- Construct membership functions for the fuzzy sets of the three sequence parameters, as discussed in Section 3.2.
- Use the proposed fuzzy-based detection scheme to evaluate abnormal traces to find abnormal sequences.
- Use temporally local regions to group individual abnormal sequences to measure the anomaly signals. The selected region length is r = 20.
4.3. Experimental results
False positive rate
Table 1 shows the overall false positive rate for three test traces with a total of 150,000 system calls (each trace consisting of 50,000 system calls), as reported by the normal database scheme [4], by the two-layer detection scheme [5] and by the fuzzy-based detection scheme, on different training sets with the sequence length k = 5, 11 and 15. The total number of short sequences in the test traces depends on the sequence length and is also given in Table 1.
Table 1. Overall false positive rate of the normal database scheme, the two-layer detection scheme and the fuzzy-based detection scheme with the short sequence length k = 5, 11 and 15
Training data set       Normal database   Two-layer        Fuzzy-based
(% of full data set)    scheme [4] (%)    scheme [5] (%)   scheme (%)
Sequence length k = 5; 3 test traces with a total of 149,988 sequences
30%                     0.131             0.112            0.067
50%                     0.099             0.079            0.057
80%                     0.094             0.069            0.049
100%                    0.094             0.069            0.049
Sequence length k = 11; 3 test traces with a total of 149,970 sequences
30%                     0.194             0.170            0.099
50%                     0.155             0.115            0.081
80%                     0.150             0.107            0.077
100%                    0.147             0.107            0.077
Sequence length k = 15; 3 test traces with a total of 149,958 sequences
30%                     0.225             0.164            0.107
50%                     0.176             0.121            0.091
80%                     0.174             0.116            0.085
100%                    0.171             0.116            0.085
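The sequence totals listed in Table 1 are consistent with sliding a window of length k one system call at a time over each 50,000-call test trace, which yields 50,000 - k + 1 sequences per trace. The short check below makes this arithmetic explicit; the one-call step and the helper name `num_sequences` are assumptions of this sketch.

```python
def num_sequences(trace_length, k):
    """Number of length-k windows in a trace when the window moves one call at a time."""
    return max(trace_length - k + 1, 0)

# Three test traces of 50,000 system calls each, as in the experiment.
totals = {k: 3 * num_sequences(50_000, k) for k in (5, 11, 15)}

for k, total in totals.items():
    print(f"k = {k}: {total} sequences")
```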
It can be seen from Table 1 that the false positive rate of the fuzzy-based detection scheme is much lower than that of the normal database scheme [4]. For example, the fuzzy-based detection scheme produced 48.23%, 48.89% and 50.96% fewer false positive alarms than the normal database scheme for the training set of 80% of the full set, with the sequence length k = 5, k = 11 and k = 15, respectively.
There is also a significant reduction in the false positive rate of the fuzzy-based detection scheme compared to that of the two-layer detection scheme [5]. For example, the fuzzy-based detection scheme produced 29.81%, 28.13% and 26.44% fewer false positive alarms than the two-layer detection scheme for the training set of 80% of the full set, with the sequence length k = 5, k = 11 and k = 15, respectively (see Table 1).
Fig. 3 shows the dependence of the false positive rate on the size of the training sets with the sequence length k = 11. When the size of the training set increases, the false positive rate of the normal database scheme [4] and the two-layer scheme [5] decreases considerably, especially from the training set of 30% of the full set to the set of 50% of the full set. Since the fuzzy-based scheme has already achieved a low false positive rate at the set of 30% of the full set, there is only a small reduction in the false positive rate when the size of the training set increases.
[Figure 3: line plot of the false positive rate (0.00% to 0.25%) against the size of the training set (30%, 50%, 80% and 100% of the full set) for the normal database, two-layer and fuzzy-based schemes.]
Fig. 3. The relationship between the size of training sets and the false positive rate with k = 11.
Anomaly signals and the detection rate
Table 2 shows a summary of the detection results of the two-layer scheme and the fuzzy-based scheme for some abnormal traces which were generated by some known intrusions. The detection performance results of the normal database scheme are taken from Table 3 of [4].
Similar to the anomaly signal measurement method described in [5], we measure anomaly signals based on temporally local regions. The anomaly score A of a region is computed as the ratio of the number of detected abnormal short sequences in the region to the length of the region r. The average of anomaly scores is computed over abnormal regions, that is, regions whose anomaly score A is greater than the region score threshold $\lambda$ ($A > \lambda$), where $\lambda = 40.0\%$.
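The region-based measurement can be sketched as follows, assuming that regions are consecutive, non-overlapping blocks of r sequences and that a trailing partial block is ignored. The paper does not state either detail, so both are assumptions, as are the function names and the toy input.

```python
def region_scores(flags, r=20):
    """Split per-sequence flags (True = abnormal) into consecutive regions of
    length r and return the fraction of abnormal sequences in each full region."""
    return [sum(flags[i:i + r]) / r for i in range(0, len(flags) - r + 1, r)]

def summarize(flags, r=20, threshold=0.40):
    """Return all region scores, the scores of abnormal regions (score > threshold)
    and the average score over the abnormal regions."""
    scores = region_scores(flags, r)
    abnormal = [a for a in scores if a > threshold]
    average = sum(abnormal) / len(abnormal) if abnormal else 0.0
    return scores, abnormal, average

# Toy classification output for 60 consecutive short sequences.
flags = [True] * 10 + [False] * 10 + [False] * 20 + [True] * 9 + [False] * 11
scores, abnormal, average = summarize(flags)
```

In this toy input the first and third regions exceed the 40% threshold and count as abnormal regions, while the second contains no abnormal sequences.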
Table 2. Detection results produced by the normal database scheme, by the two-layer scheme and by the fuzzy-based scheme for some abnormal traces
Name of test           % detected abnormal   % of detected abnormal regions    Average of scores of abnormal regions
abnormal traces        sequences by [4]      Two-layer (%)   Fuzzy-based (%)   Two-layer (%)   Fuzzy-based (%)
sm565a                 0.60                  38.46           76.92             68.00           88.00
sm5x                   2.70                  31.58           67.11             60.42           72.55
syslog-local No. 1     5.10                  12.00           60.00             73.33           84.67
syslog-local No. 2     1.70                  16.67           60.26             71.54           86.49
syslog-remote No. 1    4.00                  28.26           67.39             72.31           86.53
syslog-remote No. 2    5.30                  24.68           61.04             74.74           83.40
It can be seen from Table 2 that the fuzzy-based scheme produced significantly better detection results than the two-layer scheme [5], in terms of the number of detected abnormal regions and the generated anomaly signal level.
For the "sm5x" intrusion trace, the rates of detected abnormal regions are 31.58% and 67.11% for the two-layer scheme and the fuzzy-based scheme, respectively. Also for this test trace, the fuzzy-based scheme generated an average anomaly score of 72.55%, compared to the average anomaly score of 60.42% produced by the two-layer scheme.
Fig. 4 and Fig. 5 show the anomaly signals produced by the two-layer scheme [5] and the fuzzy-based scheme for the syslog-local No. 1 and syslog-remote No. 1 abnormal traces, respectively, with the sequence length k = 11. Anomaly signals are measured based on temporally local regions for both schemes. It can be seen from these figures that the proposed fuzzy-based scheme generated much stronger and clearer anomaly signals than the two-layer scheme [5].
Fig. 4. Anomaly signal generated for syslog-local abnormal trace No. 1 by the two-layer and fuzzy-based schemes.
[Figure 5: line plot with a vertical axis from 0.0 to 1.0 against the region number, with one curve for the two-layer scheme and one for the fuzzy-based scheme.]
Fig. 5. Anomaly signal generated for syslog-remote abnormal trace No. 1 by the two-layer and fuzzy-based schemes.
4.4. Discussion
The proposed fuzzy-based detection scheme generated far fewer false positive alarms than the normal database scheme [4], as shown in Table 1. For example, the false positive rate of the normal database scheme is 0.174%, as opposed to 0.085% for the proposed scheme, a reduction of 50.96%, when using the training set of 80% of the full set with k = 15.
The proposed detection scheme also achieved a much lower false positive rate on small training sets than the normal database scheme [4]. On the training set of 30% of the full set, the false positive rate of the proposed detection model is lower than that of the normal database scheme on the full training set. This means that the proposed detection model requires significantly less training data to achieve a better false positive rate than the normal database scheme [4].
According to the experimental results presented in Table 2, our detection scheme correctly detected all intrusions embedded in all abnormal traces tested. In contrast, the normal database scheme [4] missed the sm565a intrusion, with only 0.6% of abnormal sequences detected. This scheme [4] possibly also missed the syslog-local intrusion embedded in syslog-local trace No. 2, with just 1.7% of abnormal sequences detected.
The fuzzy inference engine plays an important role in the reduction of false positive alarms and the increase of the detection rate. By incorporating multiple pieces of sequence information, generated by the normal database and by the HMM model, the fuzzy inference engine classifies each sequence more accurately. This reduces the false alarms and increases the detection rate.
5. Conclusion
In this paper, we presented a fuzzy-based scheme for program anomaly intrusion detection using system calls. The proposed fuzzy-based detection scheme is based on the two-layer detection scheme [5] and the normal database detection scheme [4]. Instead of using crisp conditions or fixed thresholds, fuzzy sets are created to represent the space of each sequence parameter. A set of fuzzy rules is created to combine multiple sequence parameters in order to determine the sequence status through a fuzzy reasoning process. Experimental results showed that the proposed detection scheme reduced false positive alarms by about 48% and 28%, compared to the normal database scheme [4] and the two-layer scheme [5], respectively. The proposed detection scheme also generated much stronger anomaly signals, compared to the normal database scheme [4] and the two-layer scheme [5].
References
[1] J.E. Dickerson, J. Juslin, O. Koukousoula, J.A. Dickerson, "Fuzzy Intrusion Detection," in the Proceedings of North American Fuzzy Information Processing Society, Vancouver, Canada, July 25 (2001) 1506.
[2] J. Gomez, D. Dasgupta, "Evolving Fuzzy Classifiers for Intrusion Detection," in the Third Annual IEEE Workshop on Information Assurance, New Orleans, Louisiana, USA, June 17-19, 2002.
[3] J. Gomez, F. Gonzalez, D. Dasgupta, "An Immuno-Fuzzy Approach to Anomaly Detection," in the IEEE International Conference on Fuzzy Systems, Vol. 2, May 25-28 (2003) 1219.
[4] S. Forrest, S.A. Hofmeyr, A. Somayaji, T.A. Longstaff, "A sense of self for Unix processes," in the Proceedings of 1996 IEEE Symposium on Computer Security and Privacy, 1996.
[5] X.D. Hoang, J. Hu, P. Bertok, "A multi-layer model for anomaly intrusion detection using program sequences of system calls," in IEEE International Conference on Network - IEEE ICON2003, Sydney, Australia, September (2003) 531.
[6] G. Florez, S.M. Bridges, R.B. Vaughn, "An Improved Algorithm for Fuzzy Data Mining for Intrusion Detection," in the 2002 Annual Meeting of the North American Fuzzy Information Processing Society, June 27-29 (2002) 457.
[7] S. Cho, "Incorporating Soft Computing Techniques into a Probabilistic Intrusion Detection System," in IEEE Transactions on Systems, Man, and Cybernetics, Vol. 32, No. 2, May 2002.
[8] J. Luo, S.M. Bridges, R.B. Vaughn, "Fuzzy Frequent Episodes for Real-time Intrusion Detection," in IEEE International Conference on Fuzzy Systems, Melbourne, Australia, December 2-5 (2001).
[9] L.A. Zadeh, "Fuzzy sets," in the Information and Control Journal, Vol. 8 (1965) 338.
[10] E. Cox, "Fuzzy fundamentals," in IEEE Spectrum, Vol. 29, No. 10, October (1992) 58.
[11] X.D. Hoang, J. Hu, "An Efficient Hidden Markov Model Training Scheme for Anomaly Intrusion Detection of Server Applications Based on System Calls," in IEEE International Conference on Network - IEEE ICON2004, Vol. 2, Singapore, November (2004) 470.
[12] University of New Mexico's Computer Immune Systems Project web page: http://www.cs.unm.edu/~immsec/systemcalls.htm.