Nguyen Van Truong et al. Tạp chí KHOA HỌC & CÔNG NGHỆ 106(06): 41-47
COMBINING NEGATIVE SELECTION AND POSITIVE SELECTION IN ARTIFICIAL IMMUNE SYSTEMS
Nguyen Van Truong¹*, Vu Thi Nguyet Thu¹, Trinh Van Ha²
¹College of Education - TNU; ²College of Information and Communication Technology - TNU
SUMMARY
The Artificial Immune System (AIS) is a diverse research area that combines the disciplines of immunology and computation. The Negative Selection Algorithm (NSA) and the Positive Selection Algorithm (PSA) are two well-known AIS models designed for anomaly detection. Both consist of two stages: generating a set D of detectors from a given set S of self strings, and detecting whether a given cell is self or non-self using the generated detectors. In this paper, we propose an improvement of the r-chunk detector-based NSA that combines negative selection and positive selection to reduce runtime complexity and memory complexity.
Keywords: Artificial immune system, negative selection algorithm, positive selection algorithm, computer security, r-chunk detector.
INTRODUCTION
The biological immune system is able to recognize which cells are its own (self) and which are foreign (non-self, such as bacteria or viruses). The representative immune cell is the T cell, which has a self-recognition component and an antigen receptor for locating and eliminating infected cells. A computer system that models these characteristics of the biological immune system in order to protect itself from damage by external attacks and to eliminate intruders is called an artificial immune system [3].
The biological immune system is a complex, self-organizing and highly distributed system; it has no centralized control and uses learning and memory when solving particular tasks [11]. The learning process does not require negative examples, and the acquired knowledge is represented in an explicit form: T cells are generated randomly and in large numbers, in the hope that every pathogen that infects the host is detected by at least some of these cells.
However, the host must ensure that no generated cell turns against the host itself; many severe diseases are caused by such autoimmune reactions. Hence, newborn T cells undergo a process of negative selection in a special organ, the thymus, where they are shown self proteins, which belong to the host. If a T cell detects any self protein, it is destroyed. In contrast, in positive selection, T cells are tested for recognition of Major Histocompatibility Complex molecules expressed on the cortical epithelial cells. If a T cell fails to recognize any of the Major Histocompatibility Complex molecules, it is discarded; otherwise, it is kept.
*Tel: 0915016063, Email: nvtruongtn@gmail.com
Forrest et al. [2, 3] analyzed the biological immune system and found that the problem it faces is similar to one that today's computer systems face: it is difficult to defend a system against various unknown dangers, such as an exploit of a new security hole. The only reliable knowledge we have is the normal behavior of the system, the equivalent of self. The idea of the negative selection classification scheme is to mimic the T cells of the biological immune system: generate a set of detectors that do not match anything in self, then use these detectors to monitor the system for unusual behavior. An algorithmic abstraction of this biological process, called NSA, has found interesting applications: computer virus detection, monitoring UNIX processes, anomaly detection in time series, fault analysis, process diagnosis, numerical optimization, recognizing promoters in DNA sequences, and scheduling problems [4, 8, 9, 11, 12].
The outline of a typical NSA contains two stages: generation and detection [2]. In the generation stage (Fig. 1.a), candidate detectors are generated by some random process and censored by trying to match them against given self samples taken from the set S. Candidates that match are eliminated, and the rest are kept as detectors in the set D. In the detection stage (Fig. 2.a), the collection of detectors (the detector set) is used to verify whether an incoming data instance is self or non-self. If the instance matches any detector, it is claimed to be non-self, i.e., an anomaly. Each negative detector matches a subset of the non-self set. By generating a sufficient number of independent detectors, good coverage of the non-self set can be obtained.
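The two NSA stages can be illustrated with a short Python sketch. It is not the authors' implementation: the matching rule is passed in as a function, and the toy rule used here (a 2-bit detector matches a cell whose first two bits equal it) is a hypothetical placeholder chosen only to keep the example small, as are all function names.

```python
import random
from typing import Callable, Iterable, List

# match(detector, cell) -> True if the detector matches the cell.
Match = Callable[[str, str], bool]


def generate_negative_detectors(self_set: Iterable[str], det_len: int,
                                n_detectors: int, match: Match,
                                seed: int = 0, max_tries: int = 10_000) -> List[str]:
    """Generation stage: random candidates are censored against the self samples."""
    rng = random.Random(seed)
    samples = list(self_set)
    detectors: List[str] = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:  # enough detectors?
            break
        candidate = "".join(rng.choice("01") for _ in range(det_len))
        if any(match(candidate, s) for s in samples):
            continue  # matches a self sample: eliminated
        if candidate not in detectors:
            detectors.append(candidate)  # accept as a new detector
    return detectors


def is_non_self(cell: str, detectors: Iterable[str], match: Match) -> bool:
    """Detection stage: an instance matching any detector is claimed non-self."""
    return any(match(d, cell) for d in detectors)


def prefix_match(detector: str, cell: str) -> bool:
    """Hypothetical toy rule: the detector equals the first bits of the cell."""
    return cell.startswith(detector)


if __name__ == "__main__":
    S = ["00000", "00010", "10110"]
    D = generate_negative_detectors(S, det_len=2, n_detectors=2, match=prefix_match)
    print(sorted(D))
    print(is_non_self("01000", D, prefix_match), is_non_self("10110", D, prefix_match))
```

With this toy rule the self samples have the prefixes 00 and 10, so the only candidates that survive censoring are 01 and 11; a cell starting with 01 is then reported as non-self, while every self sample is reported as self.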
Figure 1. Models of detector generation (a. Negative detector generation; b. Positive detector generation).
Figure 2. Detection of new instances (a. Negative detection; b. Positive detection).
In positive selection, positive detectors are those that match some self samples, and an instance is claimed to be self if it matches any detector. The generation and detection stages are illustrated in Fig. 1.b and Fig. 2.b, respectively. Each positive detector covers a subset of the self set. Several studies have used the concept of positive selection to model their systems [1, 4, 8, 13].
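A minimal Python sketch of the two PSA stages follows. The function names and the toy prefix-matching rule are hypothetical placeholders, not taken from the paper; only the keep-if-it-matches-self generation and the self-if-it-matches-any detection follow the text.

```python
import random
from typing import Callable, Iterable, List

# match(detector, cell) -> True if the detector matches the cell.
Match = Callable[[str, str], bool]


def generate_positive_detectors(self_set: Iterable[str], det_len: int,
                                n_detectors: int, match: Match,
                                seed: int = 0, max_tries: int = 10_000) -> List[str]:
    """PSA generation: keep random candidates that match some self sample."""
    rng = random.Random(seed)
    samples = list(self_set)
    detectors: List[str] = []
    for _ in range(max_tries):
        if len(detectors) == n_detectors:
            break
        candidate = "".join(rng.choice("01") for _ in range(det_len))
        if candidate not in detectors and any(match(candidate, s) for s in samples):
            detectors.append(candidate)
    return detectors


def is_self(cell: str, detectors: Iterable[str], match: Match) -> bool:
    """PSA detection: an instance matching any positive detector is claimed self."""
    return any(match(d, cell) for d in detectors)


def prefix_match(detector: str, cell: str) -> bool:
    """Hypothetical toy rule: the detector equals the first bits of the cell."""
    return cell.startswith(detector)


if __name__ == "__main__":
    S = ["00000", "00010", "10110"]
    D = generate_positive_detectors(S, det_len=2, n_detectors=2, match=prefix_match)
    print(sorted(D), is_self("10111", D, prefix_match), is_self("01000", D, prefix_match))
```

Here the positive detectors are exactly the prefixes 00 and 10 that occur in S, so 10111 is claimed self even though it is not in S, which already hints at the coverage issue discussed with Example 1.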
Negative r-chunk and r-contiguous detectors, considered here, are among the most common detector types in the AIS literature. Many authors originally studied negative r-contiguous detectors; negative r-chunk detectors were introduced later to achieve better results on data where adjacent regions of the input strings are not necessarily semantically correlated, such as network data packets [8, 13, 14].
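The two matching rules can be contrasted in a few lines of Python. The r-contiguous rule below follows the usual definition in the AIS literature (a detector of full length ℓ matches a string if both agree in at least r contiguous positions), which this paper does not restate; the r-chunk rule follows Definition 1 in the next section. Function names are hypothetical, and positions are 1-based as in the paper.

```python
def r_contiguous_match(detector: str, cell: str, r: int) -> bool:
    """Usual r-contiguous rule: a full-length detector matches a cell of the
    same length if they agree in at least r contiguous positions."""
    if len(detector) != len(cell):
        raise ValueError("detector and cell must have the same length")
    run = 0
    for a, b in zip(detector, cell):
        run = run + 1 if a == b else 0  # length of the current agreeing run
        if run >= r:
            return True
    return False


def r_chunk_match(chunk: str, pos: int, cell: str) -> bool:
    """r-chunk rule: detector (chunk, pos) matches if the r bits of the cell
    starting at 1-based position pos equal chunk; other bits are ignored."""
    r = len(chunk)
    return cell[pos - 1:pos - 1 + r] == chunk


if __name__ == "__main__":
    cell = "10110"
    # "00111" agrees with the cell at positions 2-4 (a run of 3).
    print(r_contiguous_match("00111", cell, 3))
    # The chunk "011" matches at position 2 but not at position 1.
    print(r_chunk_match("011", 2, cell), r_chunk_match("011", 1, cell))
```

An r-contiguous detector is checked against every window of the string, whereas an r-chunk detector constrains a single window at a fixed position.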
Zhou Ji et al. (2007) [14] showed that there are at least 16 representations of NSAs. All existing NSAs suffer from a worst-case size of D that is exponential in the total size of the input, which limits their practical applicability [7]. Our contribution is an r-chunk detector-based selection algorithm that combines negative selection and positive selection and effectively reduces both runtime and memory complexity.
The remainder of the paper is organized as follows. In the next section, we present r-chunk detector types and discuss some modifications of positive detection that give it the same false negative rate as negative selection. The subsequent section shows our efficient approach in detail. In the last section, we summarize our approach and discuss future work.
NEGATIVE AND POSITIVE STRING-BASED DETECTORS
In this paper, we consider NSA and PSA as classifiers operating on the binary string space Σ^ℓ, where Σ = {0, 1}. The binary alphabet Σ is used only to make the approach easier to follow; our algorithm can readily be adapted to real-world datasets over arbitrary alphabets. We also use the following notation:
Let s ∈ Σ^ℓ be a binary string. Then ℓ = |s| is the length of s, and s[i, ..., j] is the substring of s that starts at position i and has length j - i + 1.
Definition 1 (Chunk detectors). An r-chunk detector (d, i) is a tuple of a string d ∈ Σ^r and an integer i ∈ {1, ..., ℓ - r + 1}. It matches a string s ∈ Σ^ℓ if s[i, ..., i + r - 1] = d; we also say that s matches detector (d, i) at position i.
Definition 2 (Positive chunk detectors). Given a self set S, an r-chunk detector (d, i) is a positive chunk detector if it matches the substring s[i, ..., i + r - 1] of some s ∈ S.
Definition 3 (Negative chunk detectors). Given a self set S, an r-chunk detector (d, i) is a negative chunk detector if it does not match the substring s[i, ..., i + r - 1] of any s ∈ S.
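Definitions 2 and 3 translate into a direct enumeration. The sketch below (hypothetical function name; 1-based positions as in the text) collects, for each position i, the chunks that occur in self strings (Dpᵢ) and the remaining chunks of {0,1}^r (Dnᵢ). Enumerating all 2^r chunks is only practical for small r; applied to the self set of the following Example 1, it reproduces the sets listed there.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


def chunk_detector_sets(self_set: List[str], r: int) -> Tuple[Dict[int, Set[str]], Dict[int, Set[str]]]:
    """Return (Dp, Dn): for each 1-based position i, the positive chunks
    (Definition 2) and the negative chunks (Definition 3)."""
    length = len(self_set[0])
    all_chunks = {"".join(bits) for bits in product("01", repeat=r)}
    dp: Dict[int, Set[str]] = {}
    dn: Dict[int, Set[str]] = {}
    for i in range(1, length - r + 2):
        # s[i, ..., i + r - 1] in 1-based notation is s[i - 1:i - 1 + r] in Python.
        dp[i] = {s[i - 1:i - 1 + r] for s in self_set}
        dn[i] = all_chunks - dp[i]  # Dn_i is the complement of Dp_i in {0,1}^r
    return dp, dn


if __name__ == "__main__":
    S = ["00000", "00010", "10110", "10111", "11000", "11010"]
    dp, dn = chunk_detector_sets(S, 3)
    for i in dp:
        print(i, sorted(dp[i]), sorted(dn[i]))
```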
Example 1. Given a self set S of 6 binary strings with ℓ = 5 and r = 3:
S = {s₁ = 00000; s₂ = 00010; s₃ = 10110; s₄ = 10111; s₅ = 11000; s₆ = 11010}.
The set Dn of all negative 3-chunk detectors, which consists of ℓ - r + 1 subsets, is Dn = Dn₁ ∪ Dn₂ ∪ Dn₃, where Dn₁ = {(001,1); (010,1); (011,1); (100,1); (111,1)}, Dn₂ = {(010,2); (110,2); (111,2)} and Dn₃ = {(001,3); (011,3); (100,3); (101,3)}. The non-self space covered by Dn, called Nn, contains 26 elements of {0,1}^5: {00001; 00011; 00100; 00101; 00110; 00111; 01000; 01001; 01010; 01011; 01100; 01101; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 11001; 11011; 11100; 11101; 11110; 11111}.
The set Dp of all positive 3-chunk detectors is Dp = Dp₁ ∪ Dp₂ ∪ Dp₃, where Dp₁ = {(000,1); (101,1); (110,1)}, Dp₂ = {(000,2); (001,2); (011,2); (100,2); (101,2)} and Dp₃ = {(000,3); (010,3); (110,3); (111,3)}. It can be seen that Dnᵢ and Dpᵢ are complements of each other in the space {0,1}^3, i = 1, 2, 3. The self space detected by Dp, called Sp, contains 26 elements of {0,1}^5: {00000; 00001; 00010; 00011; 00110; 00111; 01000; 01001; 01010; 01011; 01110; 01111; 10000; 10001; 10010; 10011; 10100; 10101; 10110; 10111; 11000; 11001; 11010; 11011; 11110; 11111}.
This example shows that the intersection of the two spaces Nn and Sp is not empty (20 of the 32 strings lie in both). In other words, the spaces detected by the two types of 3-chunk detectors are not complements of each other. A solution to this ambiguity can be found in the NSA literature: if a given cell s matches any negative r-chunk detector, it is non-self. Our approach adopts the dual rule for positive detection: if, at some position i ∈ {1, 2, ..., ℓ - r + 1}, a given cell s does not match any positive r-chunk detector, it is non-self; that is, s is self only if it matches a positive r-chunk detector at every position i.
Using this dual selection rule, the set Sp now contains only the 6 elements of S, each of which matches ℓ - r + 1 positive 3-chunk detectors, one at every position i = 1, 2, ..., ℓ - r + 1. We now have two disjoint sets Sp and Nn with Sp ∪ Nn = {0,1}^5.
This means that NSA and PSA have the same false negative rate. This simple modification of positive detection leads to our approach, described in the following section.
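The ambiguity and its resolution can be checked exhaustively on Example 1. The sketch below uses hypothetical names and expresses "matches a negative detector at position i" as "the chunk at position i is not in Dpᵢ", which is equivalent because Dnᵢ is the complement of Dpᵢ. It classifies all 32 strings of {0,1}^5 under the NSA rule, the "any positive detector" PSA rule, and the dual "every position" rule.

```python
from itertools import product
from typing import Dict, List, Set

ChunkSets = Dict[int, Set[str]]  # 1-based position i -> Dp_i


def positive_chunks(self_set: List[str], r: int) -> ChunkSets:
    """Dp_i for every position i (Definition 2)."""
    length = len(self_set[0])
    return {i: {s[i - 1:i - 1 + r] for s in self_set} for i in range(1, length - r + 2)}


def nsa_non_self(cell: str, dp: ChunkSets, r: int) -> bool:
    """NSA rule: non-self if the chunk at some position i lies in Dn_i."""
    return any(cell[i - 1:i - 1 + r] not in chunks for i, chunks in dp.items())


def psa_self_any(cell: str, dp: ChunkSets, r: int) -> bool:
    """Plain PSA rule: self if the cell matches ANY positive detector."""
    return any(cell[i - 1:i - 1 + r] in chunks for i, chunks in dp.items())


def psa_self_all(cell: str, dp: ChunkSets, r: int) -> bool:
    """Dual rule: self only if it matches a positive detector at EVERY position."""
    return all(cell[i - 1:i - 1 + r] in chunks for i, chunks in dp.items())


S = ["00000", "00010", "10110", "10111", "11000", "11010"]  # Example 1
R = 3
DP = positive_chunks(S, R)
SPACE = ["".join(bits) for bits in product("01", repeat=5)]
Nn = {x for x in SPACE if nsa_non_self(x, DP, R)}
Sp_any = {x for x in SPACE if psa_self_any(x, DP, R)}
Sp_all = {x for x in SPACE if psa_self_all(x, DP, R)}

print(len(Nn), len(Sp_any), len(Nn & Sp_any))  # 26 26 20
print(Sp_all == set(S), Sp_all | Nn == set(SPACE), Sp_all & Nn == set())  # True True True
```

Under the dual rule, a string is self exactly when the NSA rule does not flag it, which is why Sp and Nn become complementary.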
COMBINING NEGATIVE SELECTION AND POSITIVE SELECTION
Our approach is derived from Truong et al.'s work [8], in which only negative selection is used. In our approach, binary trees are used as the data structure for combining positive selection and negative selection, which reduces memory complexity and therefore the time complexity of the detection phase. Our algorithm first constructs ℓ - r + 1 binary self trees corresponding to the ℓ - r + 1 sets Dpᵢ. Then all complete sub-trees of these trees are deleted to obtain a compact representation of the positive r-chunk detector set. Finally, for every i, if the i-th self tree is optimal in memory (that is, it has no more nodes than the pruned non-self tree of Dnᵢ), it is selected; otherwise, it is replaced by the non-self tree constructed from Dnᵢ, which is also denoted Tᵢ, i = 1, ..., ℓ - r + 1. There are therefore two types of binary trees: self trees and non-self trees. The detection phase is carried out by traversing the selected trees one by one.
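The tree construction and pruning can be sketched in Python under stated assumptions: a tree is a nested dict keyed by the bit labels, and "deleting a complete sub-tree" is read as removing the descendants of its root, which stays as a leaf. This reading reproduces the node counts reported below for Figure 3. The non-self trees are built here by enumerating all 2^r chunks, which is only practical for small r. All names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

# A tree node maps a bit label ("0" = left, "1" = right) to its child;
# an empty dict is a leaf.
Node = Dict[str, "Node"]


def build_tree(chunks: Iterable[str]) -> Node:
    """Insert every r-bit chunk as a root-to-leaf path."""
    root: Node = {}
    for chunk in chunks:
        node = root
        for bit in chunk:
            node = node.setdefault(bit, {})
    return root


def prune(node: Node, depth_left: int) -> bool:
    """Bottom-up: if the sub-tree rooted at node is complete (every path of
    length depth_left is present), delete its descendants so that node
    becomes a leaf. Returns True if the sub-tree was complete."""
    if depth_left == 0:
        return True  # a full-length leaf
    complete = [bit in node and prune(node[bit], depth_left - 1) for bit in "01"]
    if all(complete):
        node.clear()
        return True
    return False


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(child) for child in node.values())


def tree_sizes(self_set: List[str], r: int) -> List[Tuple[int, int]]:
    """(self tree size, non-self tree size) after pruning, for each position."""
    length = len(self_set[0])
    all_chunks = {"".join(bits) for bits in product("01", repeat=r)}
    sizes: List[Tuple[int, int]] = []
    for i in range(length - r + 1):  # 0-based start of s[i+1, ..., i+r]
        dp = {s[i:i + r] for s in self_set}
        self_tree, non_self_tree = build_tree(dp), build_tree(all_chunks - dp)
        prune(self_tree, r)
        prune(non_self_tree, r)
        sizes.append((count_nodes(self_tree), count_nodes(non_self_tree)))
    return sizes


if __name__ == "__main__":
    S = ["00000", "00010", "10110", "10111", "11000", "11010"]  # Example 1
    for i, (n_self, n_non_self) in enumerate(tree_sizes(S, 3), start=1):
        choice = "self" if n_self <= n_non_self else "non-self"
        print(i, n_self, n_non_self, choice)
```

For Example 1 this yields 9/10, 7/6 and 8/8 nodes for positions 1, 2 and 3. Breaking ties in favor of the self tree is an arbitrary choice made in this sketch.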
In Example 1, the binary self trees built from Dp₁, Dp₂ and Dp₃ are illustrated in Figures 3a, 3c and 3e, respectively, and the non-self trees built from Dn₁, Dn₂ and Dn₃ in Figures 3b, 3d and 3f. In these figures, the dashed arrows represent sub-trees that will be deleted. The left child is implicitly labeled 0 and the right child 1.
Figure 3. Binary trees constructed from Dnᵢ and Dpᵢ, i = 1, 2, 3 (a. Self tree of Dp₁; b. Non-self tree of Dn₁; c. Self tree of Dp₂; d. Non-self tree of Dn₂; e. Self tree of Dp₃; f. Non-self tree of Dn₃).
The numbers of nodes of the trees in Figures 3a-3f (after deleting redundant nodes) are 9, 10, 7, 6, 8 and 8, respectively. The three selected optimal trees are those in Figure 3a (9 nodes), Figure 3d (6 nodes) and Figure 3e or 3f (8 nodes).
Our algorithm on r-chunk detectors, called PNSA (Positive-Negative Selection Algorithm), is presented as follows.
Procedure PNSA;
Input: a self set S, an integer r ∈ {1, ..., ℓ} and a cell string s* to be detected.
Output: detection of s* as self or non-self.
begin
  for i = 1 to ℓ - r + 1 do begin
    initialize an empty binary self tree Tᵢ;
    for each s ∈ S do insert s[i, ..., i + r - 1] into Tᵢ;
    for every non-leaf node n ∈ Tᵢ (visited bottom-up) do
      if n is the root of a complete binary sub-tree then delete the descendants of n (n becomes a leaf);
    if this self tree is not optimal then replace Tᵢ by the non-self tree built from Dnᵢ;
  end;
  flag = true; i = 1;
  while (i <= ℓ - r + 1) and flag do begin
    if (Tᵢ is a self tree) and (no root-to-leaf path of Tᵢ is labeled by a prefix of s*[i, ..., i + r - 1]) then flag = false;
    if (Tᵢ is a non-self tree) and (some root-to-leaf path of Tᵢ is labeled by a prefix of s*[i, ..., i + r - 1]) then flag = false;
    i = i + 1;
  end;
  if flag = false then output "s* is non-self" else output "s* is self";
end;
The outer "for" loop generates a compact representation of the complete set of r-chunk detectors. For each i = 1, ..., ℓ - r + 1, the binary tree Tᵢ is constructed in the first inner loop, and its complete sub-trees are deleted in the second. Detecting whether a given cell string s* is self or non-self is done by the "while" loop, and the result is reported by the final "if ... then ... else" statement.
For example, let S and r be as in Example 1, and let s* = 10100 be the input. The three optimal binary trees of Figure 3 are constructed. The algorithm outputs "s* is non-self", because the substring s*[2, ..., 4] = 010 is a non-self chunk at position 2 (010 ∈ Dn₂): it matches a path of the selected non-self tree T₂ (Figure 3d), or equivalently, no path of the self tree of Dp₂ contains it.
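The whole procedure can be sketched in Python as follows. This is an illustrative reading, not the authors' code. Assumptions: trees are nested dicts keyed by bit labels; the non-self tree is obtained by complementing the pruned self tree (the paper does not say how it is built; this avoids enumerating all 2^r chunks); an empty tree is stored as None and counted as 0 nodes; and ties go to the self tree. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Node: bit label ("0" = left, "1" = right) -> child; an empty dict is a leaf.
Node = Dict[str, "Node"]


def _collapse(node: Node, depth_left: int) -> bool:
    """Bottom-up deletion of complete sub-trees; their roots become leaves."""
    if depth_left == 0:
        return True
    complete = [bit in node and _collapse(node[bit], depth_left - 1) for bit in "01"]
    if all(complete):
        node.clear()
        return True
    return False


def build_self_tree(chunks: Set[str], r: int) -> Node:
    root: Node = {}
    for chunk in chunks:
        node = root
        for bit in chunk:
            node = node.setdefault(bit, {})
    _collapse(root, r)
    return root


def complement(node: Node) -> Optional[Node]:
    """Pruned tree of the chunks NOT covered by the pruned tree `node`
    (None if there are none). Assumes `node` is already collapsed."""
    if not node:
        return None  # a leaf covers everything below it
    result: Node = {}
    for bit in "01":
        if bit not in node:
            result[bit] = {}  # no self chunk below: the whole sub-tree is non-self
        else:
            sub = complement(node[bit])
            if sub is not None:
                result[bit] = sub
    return result


def size(tree: Optional[Node]) -> int:
    if tree is None:
        return 0  # assumption: an empty tree costs no nodes
    return 1 + sum(size(child) for child in tree.values())


def reaches_leaf(tree: Optional[Node], chunk: str) -> bool:
    """True if some root-to-leaf path of the tree spells a prefix of chunk."""
    if tree is None:
        return False
    node = tree
    for bit in chunk:
        if not node:
            return True  # collapsed leaf: every extension matches
        if bit not in node:
            return False
        node = node[bit]
    return True  # all r bits consumed: a full-depth leaf


def pnsa_train(self_set: List[str], r: int) -> List[Tuple[str, Optional[Node]]]:
    trees: List[Tuple[str, Optional[Node]]] = []
    for i in range(len(self_set[0]) - r + 1):  # 0-based start of s[i+1, ..., i+r]
        self_tree = build_self_tree({s[i:i + r] for s in self_set}, r)
        non_self_tree = complement(self_tree)
        if size(self_tree) <= size(non_self_tree):  # keep the smaller tree
            trees.append(("self", self_tree))
        else:
            trees.append(("non-self", non_self_tree))
    return trees


def pnsa_detect(trees: List[Tuple[str, Optional[Node]]], cell: str, r: int) -> str:
    for i, (kind, tree) in enumerate(trees):
        hit = reaches_leaf(tree, cell[i:i + r])
        if (kind == "self" and not hit) or (kind == "non-self" and hit):
            return "non-self"  # flag = false
    return "self"


if __name__ == "__main__":
    S = ["00000", "00010", "10110", "10111", "11000", "11010"]  # Example 1
    trees = pnsa_train(S, r=3)
    print([kind for kind, _ in trees], [size(t) for _, t in trees])
    print(pnsa_detect(trees, "10100", r=3))
```

On Example 1 this sketch keeps the trees of Figures 3a, 3d and 3e (9, 6 and 8 nodes) and reports 10100 as non-self at position 2, as in the worked example.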
Figure 4. Detection time of NSA and PNSA versus r (vertical axis: t (ms)).
We use binary trees constructed from the self set S as the main data structure, and this choice determines the time complexity. It is easy to prove that, in the worst case, it takes time O(|S|(ℓ - r + 1)r) to generate all necessary trees and O((ℓ - r + 1)r) to verify whether a cell string is self or non-self.
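Written out, and reading the two bounds above as worst-case step counts (one step per tree level for each inserted or checked chunk, which is our interpretation), they evaluate as follows for the largest configuration of Table 1:

```latex
\[
T_{\mathrm{gen}} = O\bigl(|S|\,(\ell - r + 1)\,r\bigr), \qquad
T_{\mathrm{det}} = O\bigl((\ell - r + 1)\,r\bigr)
\]
\[
|S| = 20000,\ \ell = 50,\ r = 20:\quad
|S|\,(\ell - r + 1)\,r = 20000 \cdot 31 \cdot 20 = 12\,400\,000, \qquad
(\ell - r + 1)\,r = 31 \cdot 20 = 620
\]
```

Detection cost is independent of |S|, which is what makes the per-instance check cheap once the trees are built.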
The following table compares our results with the run times of the NSAs published in 2009 [7] and 2012 [10] on some inputs.
Table 1. Comparison of detection time
|S|      ℓ    r    Reduced memory    NSA (ms)    PNSA (ms)
1000     50   12   0.0000            91          92
20000    30   15   0.02583           315         299
20000    40   17   0.25885           1096        629
20000    50   20   0.36288           2371        1189
"Reduced memory" is the fraction of the binary-tree nodes built by NSA that is saved by using our PNSA. The training data S is created randomly. The table shows that when ℓ and r are large enough, the detection time is reduced by almost half.
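To make the comparison concrete, the sketch below recomputes the NSA/PNSA speed-up for each row of Table 1 and applies one possible reading of "reduced memory" to the node counts of Example 1. The formula (nodes saved divided by NSA's node count) and the assumption that NSA stores one pruned non-self tree per position are our interpretation, not definitions given by the authors.

```python
# Detection times (ms) from Table 1: (|S|, ell, r, NSA, PNSA).
TABLE_1 = [
    (1000, 50, 12, 91, 92),
    (20000, 30, 15, 315, 299),
    (20000, 40, 17, 1096, 629),
    (20000, 50, 20, 2371, 1189),
]


def speedup(nsa_ms: float, pnsa_ms: float) -> float:
    """How many times faster PNSA detects than NSA."""
    return nsa_ms / pnsa_ms


def reduced_memory(nsa_nodes: int, pnsa_nodes: int) -> float:
    """ASSUMED definition: fraction of NSA's tree nodes that PNSA saves."""
    return (nsa_nodes - pnsa_nodes) / nsa_nodes


for size_s, ell, r, nsa, pnsa in TABLE_1:
    print(size_s, ell, r, round(speedup(nsa, pnsa), 2))  # 0.99, 1.05, 1.74, 1.99

# Example 1 (Figure 3), ASSUMING NSA stores the three pruned non-self trees
# (10 + 6 + 8 nodes) while PNSA stores the smaller tree per position (9 + 6 + 8).
print(round(reduced_memory(10 + 6 + 8, 9 + 6 + 8), 4))  # 0.0417
```

The speed-ups of 1.74 and 1.99 in the last two rows are what "reduced by almost half" refers to; for the smallest input PNSA is marginally slower.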
We have conducted another experiment with ℓ = 40, |S| = 20000 and r varying from 15 to 40. Fig. 4 shows that the detection time of PNSA is reduced substantially: for example, when r varies from 20 to 34, the detection time of NSA is 4.46 times that of PNSA.
CONCLUSIONS
Our efficient approach can reduce the time complexity of the detection phase and the memory needed for the detectors. This enables AISs to cope with harmful intrusions more quickly. In the future, we plan to report more detailed experimental data on the algorithm for virus and spam detection [7] [10] and on standard databases of network attacks, e.g., the KDD CUP'99 data set.
ACKNOWLEDGMENT
This work was funded by Vietnam's National Foundation for Science and Technology Development (NAFOSTED) via a research grant for fundamental sciences, grant number 102.01-2010.09, by Thai Nguyen University's university-level research program, code number DH2011-04-26, and by Ha Noi University's research program. We would like to thank the Management Boards of these projects.
REFERENCES
[1] Fernando Esponda, Stephanie Forrest, and Paul Helman, A Formal Framework for Positive and Negative Detection Schemes, IEEE Transactions on Systems, Man, and Cybernetics, 34 (2004), 357-372.
[2] Forrest et al., Self-Nonself Discrimination in a Computer, Proceedings of the IEEE Symposium on Research in Security and Privacy, Oakland, CA, 1994.
[3] Forrest, S., Hofmeyr, S. and Somayaji, A., Computer Immunology, Communications of the ACM, 40 (1997), 88-96.
[4] Fuyong Zhang, Deyu Qi, A Positive Selection Algorithm for Classification, Journal of Computational Information Systems, 8 (2012), 207-215.
[5] Jamie Twycross et al., Detecting Anomalous Process Behavior using Second Generation Artificial Immune Systems, Journal of Unconventional Computing, 1 (2010), 1-26.
[6] L. N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence Approach, Springer-Verlag, 2002.
[7] M. Elberfeld, J. Textor, Efficient Algorithms for String-Based Negative Selection, Proceedings of the 8th International Conference on Artificial Immune Systems, York, UK, 2009.
[8] Nguyen Van Truong, Vu Duc Quang, Trinh Van Ha, A Fast r-Chunk Detector-Based Negative Selection Algorithm, Journal of Science and Technology, Thai Nguyen University, 2(90), 2012, 55-58.
[9] R. Murugesan et al., A Fast Algorithm for Solving JSSP, European Journal of Scientific Research, 64 (2011), 579-586.
[10] Somayaji, A. and Forrest, S., Automated Response Using System-Call Delays, Proceedings of the 9th USENIX Security Symposium, Berkeley, CA, 2000.
[11] Sławomir T. Wierzchoń, Generating Optimal Repertoire of Antibody Strings in an Artificial Immune System, Advances in Soft Computing, Springer-Verlag, 2000.
[12] T. Pourhabibi and R. Azmi, Anomaly Based IDS Using Variable Size Detector Generation in AIS: A Hybrid Approach, International Journal of Machine Learning and Computing, 2 (2012), 200-203.
[13] T. Stibor et al., An Investigation of r-Chunk Detector Generation on Higher Alphabets, Proceedings of the Genetic and Evolutionary Computation Conference, Seattle, WA, USA, 2004.
[14] Zhou Ji et al., Revisiting Negative Selection Algorithms, Evolutionary Computation, 15 (2007).
ABSTRACT
COMBINING POSITIVE SELECTION AND NEGATIVE SELECTION IN ARTIFICIAL IMMUNE SYSTEMS
Nguyen Van Truong¹*, Vu Thi Nguyet Thu¹, Trinh Van Ha²
¹University of Education - Thai Nguyen University
²University of Information and Communication Technology - Thai Nguyen University
An artificial immune system is a rich research area that combines the principles of immunology with computation. The negative selection algorithm and the positive selection algorithm are two well-known computational models of artificial immune systems designed for anomaly detection. Both consist of two stages: generating a set D of detectors from a given set S of cells, and then using these detectors to detect whether a given cell is self or non-self. We propose an improvement of the r-chunk detector-based negative selection algorithm that combines positive selection and negative selection to reduce time complexity and memory complexity.
Keywords: artificial immune system, negative selection algorithm, positive selection algorithm, computer security, r-chunk detector.
Received: 10/4/2013; Reviewed: 22/4/2013; Accepted: 26/7/2013