Accepted Manuscript
Distributed attack detection scheme using deep learning approach for Internet of Things
Abebe Abeshu Diro, Naveen Chilamkurti
PII: S0167-739X(17)30848-8
DOI: http://dx.doi.org/10.1016/j.future.2017.08.043 Reference: FUTURE 3638
To appear in: Future Generation Computer Systems Received date : 1 May 2017
Revised date : 12 July 2017 Accepted date : 23 August 2017
Please cite this article as: A.A. Diro, N. Chilamkurti, Distributed attack detection scheme using deep learning approach for Internet of Things,Future Generation Computer Systems(2017), http://dx.doi.org/10.1016/j.future.2017.08.043
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form.
Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Abebe Abeshu Diro, Naveen Chilamkurti
Abebe Abeshu, Department of Computer Science and IT, La Trobe University, Melbourne, Australia. (email: a.diro@latrobe.edu.au)
Naveen Chilamkurti, Department of Computer Science and IT, La Trobe University, Melbourne, Australia. (email: n.chilamkurti@latrobe.edu.au)
Distributed Attack Detection Scheme using Deep Learning Approach for
Internet of Things
Abstract. Cybersecurity continues to be a serious issue for any sector in the cyberspace as the number of security breaches is increasing from time to time. It is known that thousands of zero- day attacks are continuously emerging because of the addition of various protocols mainly from Internet of Things (IoT). Most of these attacks are small variants of previously known cyber- attacks. This indicates that even advanced mechanisms such as traditional machine learning systems face difficulty of detecting these small mutants of attacks over time. On the other hand, the success of deep learning (DL) in various big data fields has drawn several interests in cybersecurity fields. The application of DL has been practical because of the improvement in CPU and neural network algorithms aspects. The use of DL for attack detection in the cyberspace could be a resilient mechanism to small mutations or novel attacks because of its high-level feature extraction capability. The self-taught and compression capabilities of deep learning architectures are key mechanisms for hidden pattern discovery from the training data so that attacks are discriminated from benign traffic. This research is aimed at adopting a new approach, deep learning, to cybersecurity to enable the detection of attacks in social internet of things. The performance of the deep model is compared against traditional machine learning approach, and distributed attack detection is evaluated against the centralized detection system. The experiments have shown that our distributed attack detection system is superior to centralized detection systems using deep learning model. It has also been demonstrated that the deep model is more effective in attack detection than its shallow counter parts.
Keywords: Cyber Security, Deep Learning, Internet of things, Fog Networks, Smart Cities
1. INTRODUCTION
As an emerging technology breakthroughs, IoT has enabled the collection, processing and communication of data in smart applications [1]. These novel features have attracted city designers and health professionals as IoT is gaining a massive application in the edge of networks for real time applications such as eHealth and smart cities [2]. However, the growth in the number, and sophistication of unknown cyber-attacks have cast a shadow on the adoption of these smart services. This emanates from the fact that the distribution and heterogeneity of IoT applications/services make the security of IoT complex and challenging [1],[3]. In addition, attack detections in IoT is radically different from the existing mechanisms because of the special service requirements of IoT which cannot be satisfied by the centralized cloud: low latency, resource limitations, distribution, scalability and mobility, to mention a few [4]. This means that neither cloud nor standalone attack detection solutions solve the security problems of IoT.
Because of this, a currently emerged novel distributed intelligence, known as fog computing, should be investigated for bridging the gap. Fog computing is the extension of cloud
computing towards the network edge to enable cloud-things service continuum. It is based on the principle that data processing and communication should be served closer to the data sources [5]. The principle helps in alleviating the problem of resource scarcity in IoT as costly storage, computation and control, and networking might be offloaded to nearby fog nodes. This in turn increases the effectiveness and efficiency of smart applications. Like any services, security mechanisms in IoT could be implemented and deployed at fog layer level, having fog nodes as a proxy, to offload expensive storage and computations from IoT devices. Thus, fog nodes provide a unique opportunity for IoT in deploying distributed and collaborative security mechanisms.
Though fog computing architecture can offer the necessary service requirements and distributed resources, robust security mechanisms are also needed resources to protect IoT devices.
As preventive security schemes are always with the shortcomings design and implementation flaws, detective mechanisms such as attack detection are inevitable [6]. Attack detections can be either signature based or anomaly based schemes. The signature based solution matches the incoming traffic against the already known attack types in the database while anomaly based scheme caters for attack detection as a behavioral deviation from normal traffic. The former approach has been used widely because of its high accuracy of detection and low false alarm rate, but criticized for its incapability to capture novel attacks. Anomaly detection, on the other hand, detects new attacks though it lacks high accuracy. In both approaches, classical machine learning has been used extensively [7]. With the ever increasing in the attacker’s power and resources, traditional machine learning algorithms are incapable of detecting complex cyber breaches. Most of these attacks are the small variants of previously known cyber- attacks (around 99% mutations). It is evident that even the so called novel attacks (1%) depend on the previous logics and concepts [8]. This means that traditional machine learning systems fail to recognize this small mutation as it cannot extract abstract features to distinguish novel attacks or mutants from benign. The success of deep learning in big data areas can be adopted to combat cyber threats because mutations of attacks are like small changes in, for instance, image pixels. It means that deep learning in security learns the true face (attack or legitimate) of cyber data on even small variations or changes, indicating the resiliency of deep learning to small changes in network data by creating high level invariant representations of the training data. Though the application of DL has been mainly confined to big data areas, the recent results obtained on traffic classification, and intrusion detection systems in [9],[10],[11] indicate that it could have a novel application in identification of cyber security attacks.
Deep learning (DL) has been the breakthroughs of artificial intelligence tasks in the fields of image processing, pattern recognition and computer vision. Deep networks have obtained a momentum of unprecedented improvement in accuracy of classification and predictions in these complex tasks. Deep learning is inspired by the human brain’s ability to learn from experience instinctively. Like our brain’s capability of processing raw data derived from our neuron inputs and learning the high-level features on its own, deep learning enables raw data to be fed into deep neural network, which learns to classify the instances on which it has been trained [12],[13]. DL has been improved over classical machine learning usually due to the current development in both hardware resources such as GPU, and powerful algorithms like deep neural networks. The massive generation of training data has also a tremendous contribution for the current success of deep learning as it has been witnessed in giant companies such as Google and Facebook [14],[15]. The main benefit of deep learning is the absence of manual feature engineering, unsupervised pre-training and compression capabilities which enable the application of deep learning feasible even in resource constraint networks [16]. It means that the capability of DL to self-learning results in higher accuracy and faster processing. This research is aimed at adopting a novel distributed attack detection using deep learning to enable the detection of existing or novel attacks in IoT.
The contributions of our research area:
To design and implement deep learning based distributed attack detection mechanism, which reflects the underlying distribution features of IoT
To demonstrate the effectiveness of deep learning in attack detection systems in comparison to traditional machine learning in distributed IoT applications
To compare the performance of parallel and distributed network attack detection scheme using parameters sharing with a centralized approach without parameters sharing in IoT.
2. RELATED WORK
Though research works in the application of deep learning have currently flourished in domains like pattern recognition, image processing and text processing, there are a few promising researches works around cybersecurity using deep learning approach.
One of the applications of deep learning in cybersecurity is the work of [9] on NSL-KDD dataset. This work has used self- taught deep learning scheme in which unsupervised feature
learning has been employed on training data using sparse-auto encoder. The learnt features were applied to the labelled test dataset for classification into attack and normal. The authors used n-fold cross-validation technique for performance evaluation, and the obtained result seems reasonable. This research work is like ours in terms of feature learning though it considers centralized system while our approach is distributed and parallel detection system used for fog-to-things computing. The other relevant work is the application of autoencoders for anomaly detection in [17], in which normal network profile has been learned by autoencoders through nonlinear feature reduction. In their study, the authors demonstrated that normal records in the test dataset have small reconstruction error while it produced a large reconstruction error for anomalous records in the same dataset. Intrusion detection using a deep learning approach has also been applied in vehicular security in [10]. The research has demonstrated that deep belief networks (DBN) based unsupervised pre- training could enhance the intrusion detection accuracy.
Although the work is novel in its approach, the artificial data used, and its centralized approach might limit its practicality in fog networks. Another research of this category has been conducted by [18]. The authors have proposed IDS which adaptively detect anomalies using AutoEncoders on artificial data. Both anomaly detection papers considered artificial data cases which do not reflect the malicious and normal behaviors of real time networks. Apart from that, they adopted a centralized approach which is impractical for distributed applications such as social internet of things in smart city networks.
Deep learning approach has also been applied by [11] for malicious code detection by using AutoEncoders for feature extraction and Deep Belief Networks (DBN) as a classifier for detection. The article has shown that the hybrid mechanism is more accurate and efficient in time than using a sing DBN. It strengthens that deep networks are better than shallow ones in cyber-attack detection. The main limitation of this research is that the dataset should have been more recent to come up with the conclusion. Nevertheless, the research is significantly different from ours since it does not handle the distributed training and sharing of updated parameters. The other paper which has investigated deep learning scheme for malicious code detection is [19]. It has applied denoising AutoEncoder for deeper features learning to identify malicious javascript code from normal code. The result has produced promising accuracy in the best-case scenario. Although the approach is effective in web applications, it can be hardly applied to distributed IoT/Fog systems. Our model is novel as it enables parallel training and parameters sharing by local fog nodes, and detects network attacks in distributed fog-to-things networks using deep learning approach.
3.
Th m Sm af th ef so inn ap pl co Ho co cit ca
Fi Th In Io Do wi sm of su se se ho ca att no (D
CYBER SEC
he advanceme massive number
mart city app ffected areas o is technologic ffectively infra o on. Typicall
novative, sm pproach to pub
anning. In omponents in
owever, this ollection and d
ties could bri ause the loss o
ig.1: Social Int hough the att nternet, the sca oT with limite oS attacks ar ith IoT/Fog n mart cities. Th
f smart things uch as smart c ervices via IoT ervices. The i
owever, makes an exhaust the tack causes t odes by a sing DDoS attack) [
CURITY IN SOC
ents in technol r of IoT devic plications are of public servi
cal breakthrou astructures suc ly, the integra mart city desi blic service d general, the
the smart c massive conn distribution pla ing about nov f multi-million
ternet of Thin acks in IoT s ale and simplic d protection.
e the most fr networks in s he IoT ecosyst s distributed a city. Millions T, taking adva interconnectiv s a fertile targ eir resources the denial of gle host (DoS [21],[22],[23].
CIAL IOT logies of hardw
es to be conne by far the q ces by social ugh is helpin ch as water, po ation of socia
ign is to cr delivery, infras
e social IoT city are depic nection of IoT atform in the vel or variant n dollars and h
gs: Componen seems the sam city of attack t
As the recen requently asso social internet tem consists o across a given of users are antage of it fo vity of these get for malicio
and launch D f service for S attack) or co
.
ware have ena ected to the In quickest and d
internet of thi ng cities to m
ower, transpor al IoTs and IC reate a data-
structure and T application cted in the F T devices as
emergence of t attacks whic human life [20
nts of smart ci me as in trad targets are larg nt survey [7] s
ociated attack t of things su of a massive n n geographica connected to r private and numbers of t ous adversarie DoS attacks. A
legitimate us oordinated att
abled a nternet.
deeply ings as manage rt, and CT for driven public ns in Fig. 1.
a data f smart ch can 0].
ity ditional
ger for shows, k types
uch as number
l area, social public things, es who A DoS sers or tackers
How maj The netw algo ben ove show Tab Cate Prob
R2L
U2R
DoS
wever, remote or threats as ese attacks lea
work traffic, orithms will b
ign. For inst rwhelming a ws the commo ble 1: Attack c
egory Attac be S
k
Ip th
P s
N
L W
u
g
w (w p
im u
ft F
M b
p c m
s to
R b
c s
r
lo I
p s
S s
N
B b
T u
P r
L s
e access attac s they can es ave a certain which mig be able to dist tance, DoS a single node f on attack categ
ategories of F ck Type and D Satan- probing known weaknes psweep- pingi he target’s IP Portsweep- sca services on a ho Nmap- various Warezclient- d uploaded previo guess_passwd- warezmaster-
warez) on FTP permissions
map- illegal a using vulnarabi ftp_write- creat FTP to obtain lo Multihop- mul breaks into a sy phf- CGI script commands o misconfigured w spy- breaking i
o discover imp buffer_overflow command caus shell
rootkit- enables oadmodule- ga FS
perl- creating r sets the user id smurf- flooding Neptune- flood Back- requesti backslashes fro Teardrop- caus using mis-fragm Pod- pinging w reboot or crash Land- sending source and dest
cks using ba scalate to roo
patterns and ght affect th
tinguish betw attacks are u from multiple gories [24].
Fog ecosystem Description
of a network sses
ng of multiple anning of po ost
means of netw downloading i ously by the w
guessing passw uploading il P server exploit access of loca ilities
ting .rhost fil ocal login.
lti-day scenari ystem
t enabling to e on a mach
web server.
into system vi portant informa w- the ffbconfi ses buffer flow s to access adm
aining root sh root shell by p
to root g of ICMP echo
ing of SYN on ing of a URL om a webserver
sing system r mented UDP pa with malformed
UDP packet h tination addres
ckdoors could ot access atta characteristic he way lear ween an attack usually known
e sources. Tab m
k for some wel e hosts to reve orts to discov work mapping
illegal softwa arezmaster word over telne llegal softwa ting wrong wri al user accou e in anonymou o where a us execute arbitra hine with ia vulnerabiliti ation
ig UNIX syste w leads to ro min level
hell by resettin erl attack whic o reply n port(s)
L having man r
reboot or cra ackets d packets causin
having the sam s to remote hos
d be acks.
cs in rning k and n by ble 1
ll- eal ver
are et are ite unt us ser ary a es em oot
ng ch
ny sh ng me st
4.
De sta sc hi bu cla pr ea ac lay fu tra us Th fe 1, re er ter fit J(W
∑ W of Ta
Th a s b us ar sta us gr (b (sa
OVERVIEW
eep Learning ability and calability on b erarchical fea uild a model assification).
revalent forms ach previous l ctivation funct yer n of a neu unctions are li
aining data { sually set outp he cost functio ature extractio loss functio construction e rror terms for
rm is a regula tting problem
W,b)
k
k=0
∑ ∑ ∑
Where nl repres f nodes in each able 2: Activat
he mechanism stochastic grad
| J) is a sta sing constant α re obtained by andard gradien sing sample i radient descen batch) is not e ampling subse
W OF DEEP LEA
g has been th generalizatio big data. It atures of train l which tran
Multi-layer s of deep lea layer and a b tion f to form
ural network, isted in the ta {x(1), x(2),x(3), put values to b ons to be optim on are general on is shown error specified k instances arization term
in training.
sents the numb h layer
tion functions
m of minimizin dient descent andard gradien
α as a learning averaging. E nt descent on i until the co nt of paramete
efficient, com et) simplifies t
≔ ≔
ARNING
he state of t on, and ac extracts comp ning data of nsforms input
deep netwo arning algorith
ias are compu weighted inp i.e an=f(Wnak
able 2. Given
…}, deep l be either equal mized in deep lly loss functi in which the d using the me
of training da which is used
ber of layers a
s
ng the loss fun (SGD, where nt computed v g rate. The fin Equation (2) sh updates of w onvergence is ers over the w mputing gradie the learning pr
α∂L W
∂ α∂L W, b
∂
the art for tr chieved sign plex and non
high dimens ts to outputs rks are the hms. The out uted by a non puts Wn for th
k-1) [25]. Acti n a set of unl learning algo l or less than i models during ons. In the eq e first term an of sum-of- ata, and the s d for avoiding
1 and sl is the n
nction L(W, B the gradient via backpropa nal parameters hows the iterat weight W and s obtained. A whole availabl
ent over mini rocess [15].
W, b|j b|j
raining nificant nlinear ion to s (e.g.
most tput of nlinear he next ivation abeled rithms inputs.
g deep quation is the square second g over-
number
B | j) is L(W, agation
s W, B tion of bias b As the
le data i-batch
2 In t feat acti abst chal lear the thro prob set to d non repe opti For end clas assu p=1 vect the
hө
whe and dist
5.
The atta netw supp nod and atta opti of d para para resu
training proc tures increase vation functi traction natur llenging to en rn training par
deep network ough gradient
blem. Gradien of deep netwo decrease the nlinear functio
etition yields imum.
a known num d of neural n
sses in the da umes that the 1,…,P could b
tor summed t following form
x ⋮
ere θ(1),θ(2),…
d the term tribution. The
ө
OUR APPROA
e fog nodes ar ck detection work since th
ported by soci de should be i d optimization.
ck detection imization, the data training ameters from ameters of e ulting update
ess, as the l es towards th ion and we re at every la
able the deep rameters that k. The paramet descent, whi nt descent is ork parameter gradient by c on being op
the optimizat
mber of class networks as ataset and de e probability t be estimated to 1
.
In otherm [26]
:
1 ;
⋮2 ;
| ; ∑
…,θ(K)∈Rn a
∑
cost function 1
ACH
e responsible systems at th hey are close ial internet of in place for c . In addition t n using loc
benefits of th near to the s m neighbors.
ach cooperati back to the
layer increase he answers of eight matrice
ayer of the n learning mode meet the accu ters of training ich is a nonli initiated rand rs, but it is up computing gr ptimized. The tion of the alg
es, softmax is activation fun eep network i
that P(y=p|x) as it outputs words, our e
1
1exp are the parame
is of softmax wi
exp
∑ e
for training m he edge of th er to the sm
things. The co collaborative to giving the a cal training his approach ar
ource and the The master n ive node, an
worker node
es, abstraction f the model.
es determine network, but
el to automatic uracy objectiv g are usually le inear optimiza domly by setti pdated at each
radient descen e output of gorithm to a l
s employed a nction. Havin inputs x, soft for each valu s a K-dimensi estimate hө(x)
exp 1 exp 2
⋮ exp eters of our mo
normalization ill be given by p
exp
models and ho he distributed art infrastruct oordinating m
parameter sha autonomy of l and param re the accelera e gain of upd node updates nd propagates es. This pla
n of The
the it is cally ve of earnt ation ing a
step nt of
this local
at the ng p ftmax ue of
ional take
odel, n of y:
sting d fog
tures master aring local meter ation dated s the s the ays a
sig ov pr op ne ar sy
Fi th
Th att pa fo sc pa
KD co W fo in in rep tab [1
gnificant role verheads of m rovides fast r ptimization a etworks by d rchitecture of ystem in fog-to
ig.2: Distribute ings networks
he outputs of tack detection arameters. The og node for gl cheme results arameters and
6.1. DAT DDCUP99 [2 ommonly used We used NSL-K ormat for mod
trusion datase trusions, but producible. T ble 1, and id -5]. Table 3 sh
e in offloadi models, data an response time approach cou distributing SG
our distribut o-things comp
ed attack detec s
f model traini n models and
ese local para lobal update a in better lear avoids local o 6. EV
TASET,ALGOR
7], ISCX [28]
d datasets in KDD intrusion del validation et not only ref are also it The dataset co dentified as a k
hows sample r
ing storage nd parameters e. The centra uld be exten
GD. Figure 2 ted and parall puting.
ction architect
ing on distrib d their associ ameters are s and re-propag rning as it en overfitting.
VALUATION
RITHM AND ME
and NSL-KD the intrusion n dataset whic
and evaluatio flects the traffi is modifiab omposes of th
key attack in records of NS
and computa s from IoTs w alized trainin ded to distr 2 shows a g lel attack det
ture for Fog-to
buted fog nod iated local le sent to coordi gation. This s nables to shar
ETRICS
DD [24] are the detection res ch is available ons. The NSL fic composition
le, extensible he attacks sho IoT/Fog com L-KDD datase
ational while it ng and
ributed general tection
o-
des are arning inating haring re best
e most search.
in csv L-KDD ns and e, and own in mputing
et.
Tab 0 0 0 0 0 151 0 315 240 0 0 560 The 22,5 prot traff tabl Tab Traf Nor Atta Tot Tab Traf Nor DoS Prob R2L U2R Tot
Bef enco tech feat exp atta min R2L Tab
ble 3: snapshot tcp udp icmp tcp tcp 59 tcp
tcp udp 0 tcp
tcp tcp 07 udp e original data
544 records of tocol, service, ffic distributio les 4(a) and 4(
ble 4 (a): traffi ffic
rmal ack tal
ble 4 (b): traffi ffic
rmal S
be L R tal
fore training t oded into d hnique. Becau tures and 1
eriment, we h ck) and 4-cla nority class U
L.U2R.
ble 5: the enco
t of records in ftp_data other ecr_i http http ftp private other http private private other aset consists o
f test, each wi flag, source b on of NSL-KD
(b).
c distribution Training 67343 58630 125973 ic distribution
Training 67343 45927 11656 995 52 125973
the network, discrete feat use of encod
label, as sho have taken th ass (normal, U2R is merge
ded form of o
NSL-KDD da SF 491 SF 146 SF 1480 SF 232 SF 199 SF 350 S0 0 SF 146 SF 328 S0 0 REJ 0 SF 147 of 125,973 rec ith 41 features bytes, destinat DD dataset is
of NSL-KDD Te 97 128 225 of NSL-KDD
Te 97 745 275 242 200 225
categorical fe tures using ding, we ob own in the e dataset in 2
DoS, Probe, ed to R2L to
ur dataset
ataset
0 …
0 …
0 0 …
8153 …
420 …
1185 …
0 …
105 …
275 …
0 …
0 …
105 …
cords of train s such as dura tion bytes, etc.
s shown as in
D in 2-class st
11 833 544
D in multi-clas st
11 58 54 21 0 544
eatures have 1-to-n enco tained 123 i table 5. For 2-class (norma
R2L.U2R).
o form a clas
…
…
…
…
…
…
…
…
…
…
…
… n and ation, . The n the
s
been oding input our al vs The ss of
Th tra tes ec ea ite pa wi th di up no
Al
I us as em m ra co pe ha de de m tru
he system use aining and test
sting. Suppos cosystem, and ach node havin eration. Each arallel way, bu ith the coordin e update re stributed train pdated parame ode.
lgorithm 1: lo 1. Recieve workers 2. For nod
3. Compu 4. Repeat
In the training sed for dist synchronously mployed for d metrics for atta ate (DR) and omparison bet erformance me as been added eep learning m etected by the misclassified no ue detection o
s the same tec t. At this step, se Dn are da
Wn,bn are pa ng local data node runs da ut asynchronou
nating node. T egularly. Th ning on each eters with neig
cal training an e initial or upd
s
de n in the netw Get local tr local Dn
Execute SGD Wn, biases bj
≔
≔ Compute ute , =
(1)
and testing ph tributed and y, while Kera
deep learning.
ack detection d false alarm tween deep etrics such as
a comparison model DR den
e model, whi ormal instance over total data
chnique of pre , the data is re ata across n arameters thei
Dna subset of ata training on usly exchange The coordinati he following
local fog nod ghbor nodes v
nd parameter e date of work, do in paral
raining traffic s D on local traf
ji ∈ bn
α∂L W, b|j
∂ α∂L W, b|j
∂ and and s
,
hase, Apache parallel pro as on Theano . The most im such as accu m rate (FAR) and shallow
precision, rec n between indiv
notes ratio of ile FAR repr es. Accuracy a instances. R
eprocessing fo eady for trainin
nodes in th r local param f Dn as sampl n deep netwo e learned param
ing node broa g algorithm de while excha
via coordinatin
exchange from ma llel:
sample i ∈ Dn ffic, and update j 1 j 2 send to master n
Spark [29] ha ocessing of
package [30 mportant eval uracy, the det ) were chose models. How call and F1 M
vidual classes f intrusion ins resents the ra is the percent Recall indicate
or both ng and he fog meters ,
les per orks in meters adcasts shows anging ng fog
aster to
na from e wji∈
node
as been SGD 0] was luation tection en for
wever, Measure s in the stances atio of tage of es how
man repr Mea [31]
deri
Actu L Actu
whe FN:
As mod (nor mea atta one with con nod for para sche the is t shal deep used laye laye has to a
ny of the atta resents how m asure provides ]. The mathem ived from con
tual:NORMA tual:ATTACK
ere, TP: true p : false negativ
6.2. EXPER a first attemp del, we used rmal, DoS, P asure, unseen
ck detections.
is to compare h a centrali
ducted by dep de for centrali
distributed at allelism and d eme, we varie network as a f to evaluate t llow learning p learning sys d 123 input f er neurons, 5 er with neuron
various batch avoid the overh
acks does the many of the r s the harmonic matical repres
fusion matrix Predicted:N L
TN K FN
,
, 1
positive, TN: t e
RIMENTAL ENV
pt towards exp the 2-class (n robe, R2L.U2 test data are Our experime e the result of ized system.
ploying the de zed system, a ttack detectio distribution, as ed the number function of tra the effectiven algorithms fo stem, after hyp features, 150 f 0 third layer ns equal to the h sizes in 50 ep
heating proble
model return returned attac c average of p sentation of th
as:
NORMA P
K F T
2 2
true negative,
VIRONMENT
ploring the pe normal and a 2R) categories e chosen to r
ent has two ob f our distribute
This exper eep learning m and multiple c on. To test th
s a benchmark r of machines aining accurac ness of deep
or attack dete per-parameter first layer neu
neurons and e number of cl pochs, and tra em.
n, while preci ks are correct recision and r ese metrics ca
Predicted:ATT K
FP TP ,
,
FP: false posi
erformance of attack) and 4- s. In perform represent zero bjectives. The ed attack detec riment has
model on a si coordinated n he performanc k for our detec s used for trai cy. The second learning ag ection in IoT.
optimizations urons, 120 se d the last soft
lasses. The m ained with dro
ision t. F1 recall an be
TAC
1
2
itive,
f our class mance o-day e first ction been ingle nodes ce of ction ining d one gainst The s, has cond ftmax model opout
6.3. RESULTS AND DISCUSSIONS
In the evaluation process, classification accuracy and other metrics were used to show the effectiveness of our scheme compared to shallow models in distributed IoT at fog level.
The comparison of distributed training to centralized approach in accuracy is also one of our evaluation criteria. Table 5 compares the accuracy of the deep and shallow models, while fig. 3 shows the accuracy difference between centralization and distribution.
Fig.3: Accuracy comparison of distributed and centralized models
Table 6: Accuracy of deep model (DM) and shallow model (SM)
2-class 4-class
Model Type
Accuracy (%)
DR (%)
FAR (%)
Accuracy (%)
DR (%)
FAR (%)
DM 99.20 99.27 0.85 98.27 96.5 2.57%
SM 95.22 97.50 6.57 96.75 93.66 4.97
Fig.4: comparison between DL and SL in training time
Fig.5: comparison between DL and SL in detection time Table 7 (a): Performance of 2-class
Model Type Class Precision (%) Recall (%) F1 Measure (%) Deep Model Normal 99.36 99.15 99.26
Attack 99.02 99.27 99.14 Shallow
Model
Normal 97.95 93.43 95.65 Attack 92.1 97.50 94.72 Table 7 (b): Performances of 4-class
Model Type
Class Precision (%)
Recall (%)
F1 Measure (%)
Deep Model
Normal 99.52 97.43 98.47
DoS 97 99.5 98.22
Probe 98.56 99 98.78
R2L.U2R 71 91 80
Shallow Model
Normal 99.35 95 97
DoS 96.55 99 97.77
Probe 87.44 99.48 93
R2L.U2R 42 82.49 55.55 90
92 94 96 98 100 102
0 10 20 30 40 50 60
ACCURACY
NUMBER OF FOG NODES A C C U R A C Y O F D E E P V S S H A L L O W
L E A R N I N G
Distributed Model Centralized Model
0 20 40 60 80 100 120 140
0 10 20 30 40
training time (ms)
Number of fog nodes DL run time (ms) SL run time ( ms)
0 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009
0 10 20 30 40
DETECTION TIME (MS)
NUMBER OF FOG NODES DL run time (ms) SL run time ( ms)
The experiment result has demonstrated double standards. The first one is that the distributed model has a better performance than the centralized model. As it can be seen from fig. 3, with the number of increased nodes in the distributed network of Fog systems, the overall accuracy of detection increased from around 96% to over 99%. The detection rate in table 6 also exhibits that deep learning is better than classic machine learning for both binary and multi-classes. This shows that distributing attack detection functions across worker fog nodes is a key mechanism for attack detection in social IoT systems such as a smart city which needs real time detection. The increase in accuracy on distributed scheme could be because of collaborative sharing of learning parameters which avoids overfitting of local parameters, and hence, contributes to the accuracies of each other. On the other hand, the accuracy of the deep model is greater than that of shallow model, as shown in the table 6. In addition, table 6 shows the false alarm rate of the deep model, 0.85% is much less than that of machine learning model (6.57%). As shown in table 7 (a) and (b), the performance of deep learning is better than the normal machine learning model for each class of attack. For instance, the recall of deep model is 99.27%, while the traditional model has a recall of 97.50% for a binary classification.
Similarly, the average recall of DM is 96.5% whereas SM has scored average recall of 93.66% in multi-classification.
However, fig.4 shows that deep learning takes longer learning time than traditional machine learning algorithms while the detection rates (fig.5) of the both algorithms are significantly the same. It is expected that deep networks consume larger time in training because of the size of parameters used in learning. The main issue for attack detection systems focuses more on the detection speed than the learning speed. Thus, this indicates that deep learning has a huge potential to transform the direction of cybersecurity as attack detection in distributed environments such as IoT/Fog systems has indicated a promising result.
7. CONCLUSION AND FUTURE WORK
We proposed a distributed deep learning based IoT/Fog network attack detection system. The experiment has shown the successful adoption of artificial intelligence to cybersecurity, and designed and implemented the system for attack detection in distributed architecture of IoT applications such as smart cities. The evaluation process has employed accuracy, the detection rate, false alarm rate, etc as performance metrics to show the effectiveness of deep models over shallow models. The experiment has demonstrated that distributed attack detection can better detect cyber-attacks than centralized algorithms because of the sharing of parameters which can avoid local minima in training. It has also been
demonstrated that our deep model has excelled the traditional machine learning systems such as softmax for the network data classification into normal/attack when evaluated on already unseen test data. In the future, we will compare distributed deep learning IDS for on another dataset and different traditional machine learning algorithms such as SVM, decision trees and other neural networks. Additionally, network payload data, will be investigated to detect intrusion as it might provide a crucial pattern for differentiation.
ACKNOWLEDGEMENTS
This work was supported by La Trobe University’s research enhancement scheme fund.
REFERENCES
[1.] Securing the Internet of Things: A Proposed Framework, http://www.cisco.com/c/en/us/about/security-center/secure-iot- proposed-framework.html , 2016
[2.] M.Ibrahim, “Octopus: An Edge-Fog Mutual Authentication Scheme”, Journal of Network Security, 18(6), 2016
[3.] I.Stojemovic and S.Wen, “ The fog computing paradigm:
Scenarios and security issues”, IEEE federated conference on Computer Science and Information Systems,2014
[4.] A. Alrawais, A. Alhothaily, C. Hu and X. Cheng, "Fog Computing for the Internet of Things: Security and Privacy Issues," in IEEE Internet Computing, vol. 21, no. 2, pp. 34-42, Mar.-Apr. 2017.
[5.] S. Yi, Z. Qin and Q. Li, “Security and privacy issues of fog computing: A survey”, International Conference on Wireless Algorithms, Systems and Applications (WASA), 2015
[6.] V. L. L. Thing, "IEEE 802.11 Network Anomaly Detection and Attack Classification: A Deep Learning Approach," 2017 IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, CA, 2017, pp. 1-6
[7.] C. Kolias, G. Kambourakis, A. Stavrou, and S. Gritzalis,
“Intrusion detection in 802.11 networks: Empirical evaluation of threats and a public dataset,” in IEEE Communications Surveys and Tutorials, Vol. 18, No. 1, pp. 184-208, 2016.
[8.] Guy Caspi, Introducing Deep Learning: Boosting Cybersecurity With An Artificial Brain, http://www.darkreading.com/analytics/introducing-deep- learning-boosting-cybersecurity-with-an-artificial- brain/a/d-id/1326824, last accessed on July 1, 2017.
[9.] Quamar Niyaz, Weiqing Sun, Ahmad Y Javaid, and Mansoor Alam, Deep Learning Approach for Network Intrusion Detection System, ACM 9th EAI International Conference on Bio-inspired Information and Communications Technologies, New York,2016
[1
[1
[12
[1
[14
[1
[1
[1
[1
[1
[2
[2
[22
[2
[24
0.] Kang M-J, K Deep Neura ONE 11(6):
1.] Y. Li, R. Ma method base Security and 2.] BENGIO, Y training of d 1, S. 153–16 3.] DENG, LI:
applications Signal and I e2 — ISBN 4.] LECUN, YA
HAFFNER, Document R (1998), Nr. 1 5.] J. Dean, G.S
MZ. Mao, M Ng. Large Sc 6.] VINCENT,
ISABELLE ANTOINE:
Representati Criterion. In (2010), Nr. 3 7.] Mayu Sakur Using Autoe In Proceedin Learning fo Rahman, Jer York, NY DOI=http://d 8.] Chao W AdaptiveAno COGNITIVE Advanced C 2015 9.] Wang, Y., C
detecting Networks, 9 0.] Diro, Abebe
"Lightweigh Cryptograph Networks an 1.] http://www.c journal/back access on 14 2.] Charalampos
Distributed Journal, 7(4) 3.] Costa Gond Nascimento Approach fo of Service on 4.] M. Tavallaee Analysis of IEEE Symp
Kang J-W (201 al Network for
e0155781. doi:
a, and R. Jiao. A ed on deep learn d Its Application YOSHUA; LAM deep networks. I 60 — ISBN 026 A tutorial surv for deep lear nformation Pro 2048-7703 ANN ; BOTTO
PATRICK: G Recognition. In 11, S. 2278–232 S. Corrado, R.
M’A. Ranzato, A cale Distributed PASCAL ; L
; BENGIO, Y Stacked Denoi ions in a Deep n: Journal of M 3, S. 3371–3408 rada and Takeh encoders with N ngs of the MLS or Sensory Da remiah Deng, Y, USA, dx.doi.org/10.11 Wu, Yike
omaliesDetectio E 2015 : The S Cognitive Tech Cai, W., and We malicious Ja : 1520–1534, 20 e Abeshu, Nave ht Cybersecurit hy in Publish- nd Applications cisco.com/c/en/u k-issues/table-co 4 Nov, 2016
s Patrikakis, M Denial of Serv ),2004 dim JJ, de Oliv
A, García Villa or Assessing Am n the Internet of e, E. Bagheri, W
the KDD CUP posium on Com
6) Intrusion De In-Vehicle Net 10.1371/journa A hybrid malicio ning. In Internat ns, volume 9, pa MBLIN, PASCA In: Advances in 62195682 vey of architect
rning. In: APS ocessing Bd. 3 ( OU, LEON ; B radient Based n: Proceedings 24 — ISBN 001 Monga, K. Che A. Senior, P. Tuc d Deep Network LAROCHELLE YOSHUA ; MA
ising Autoencod p Network wit Machine Learn 8 — ISBN 1532 hisa Yairi. 2014 Nonlinear Dime DA 2014 2nd W ata Analysis (M
and Jiuyong L , Pages 145/2689746.26
Guo an onwithDeepNet Seventh Interna hnologies and A ei, P. , A deep avaScript cod
016. doi: 10.10 een Chilamkurt ty Schemes U -Subscribe fog
(2017): 1-11 us/about/press/i ontents-30/dos-a ichalis Masikos vice Attacks - veira Albuquerq
alba LJ, Kim T mplified Reflect f Things, 16(11 W. Lu, and A. G P 99 Data Set,”
mputational Inte
etection System twork Security l.pone.0155781 ous code detecti
tional Journal of ages 205–216, 2 AL: Greedy laye n neural … (200 tures, algorithm SIPA Transactio (2014), Nr. Janu BENGIO, YOS
Learning App of the IEEE B 18-9219
en, M. Devin, Q cker, K. Yang, a ks
, HUGO ; LA ANZAGOL, PIE ders: Learning th a Local Den ning Research B 2-4435
4. Anomaly De ensionality Red Workshop on M MLSDA'14), As Li (Eds.). ACM 4 , 8 689747
nd Yajie twork,
ational Confere Applications, I learning approa de. Security C 002/sec.1441
ti, and Neeraj K Using Elliptic Computing." M internet-protoco attacks.html , la s, and Olga Zou The Internet P que R, Clayton TH., A Methodo ion Distributed ), 2016 Ghorbani, “A D
” Submitted to S elligence for S
m Using y. PLoS
1 on f 2015
er-wise 07), Nr.
ms, and ons on uary, S.
HUA ; plied to Bd. 86 QV. Le, and AY.
AJOIE, ERRE-
Useful noising Bd. 11 etection duction.
Machine shfaqur M, New pages.
Ma, on ence on
IARIA, ach for Comm.
Kumar.
Curve Mobile ol- ast
uraraki, Protocol n Alves ological Denial Detailed Second Security
[25.
[26.
[27.
[28.
[29.
[30.
[31.
rese Thin and
area com secu secu
and Defense A ] SALAKHUTD
Deep Boltzm International C (2009), Nr. 3, ] http://ufldl.stan
, last accessed ] Knowledge
archive, htt ddcup99.htm ] Ali Shiravi, Ghorbani, Tow benchmark d Security, Volu 0167-4048, 10 ] Matei Zaharia
Shenker, and working sets.
Hot topics Association, B ] Keras deep
(2013), https:/
on 30, Nov, 20 ] Tang, TA, Mh (2016) Deep Detection UNSPECIFIE Networks and Oct 2016, Fez Abeb the D Trobe degre Univ Univ Deve arch interests i ngs, Cybersecu
Big Data.
Navee Progra Inform Melbou degree as include int mputing, vehicu
urity, wireless m urity.
Applications (CI DINOV, RUSL mann Machines Conference on S. 448–455 — nford.edu/tutori
on 31, May, 20 discovery tp://kdd.ics.u ml
Hadi Shirav ward developing datasets for in ume 31, Issue 3 0.1016/j.cose.20 a, Mosharaf Cho Ion Stoica. 20 In Proceedings in cloud co Berkeley, CA, U learning, //github.com/cha 016
hamdi, L, McL Learning Ap in Softwar D The Interna d Mobile Comm
, Morocco. IEE be Abeshu Dir Department of e University, A ee in Comput
ersity, Ethiopia ersity from 20 elopment, and L
nclude Softwar urity, Advanced
n Chilamkurt am Coordinat mation Techno urne, VIC, A e from La Trob telligent transp ular communic multimedia, wir
ISDA), 2009.
LAN ; HINTON s. In: Proceed
Artificial Inte ISBN 97814244 ial/supervised/S 017
in data uci.edu/datab
vi, Mahbod g a systematic a ntrusion detecti 3, May 2012, Pa 011.12.012.
owdhury, Micha 10. Spark: clus s of the 2nd USE omputing (HotC USA, 10-10.
P.W.D. Charl arlespwd/projec Lernon, D et al
pproach for re Defined ational Confer munications (W EE . (In Press)
ro is currently IT Computer Australia. He
ter Science fr a in 2010. He 007 to 2013 as Lecturer in Com re Defined Netw d Networking,
ti is currently tor, Compute ology, La Australia. He o
be University. H port systems
cations, Vehic reless sensor ne
N, GEOFFREY dings of The
lligence and S 439911 SoftmaxRegress
abases DA ases/kddcup9 Tavallaee, Ali approach to gen ion, Computer ages 357-374, I ael J. Franklin, ster computing SENIX conferenc loud'10). USE les, Project ct-title, last acce l. (2 more aut
Network Intru Networkin rence on Wir WINCOM'16), 2
a PhD candida Science and IT
received his M from Addis A
worked at Wo s a Director of mputer Science working, Intern Machine Lear
y the Cybersec er Science Trobe Unive obtained his P His current res
(ITS), Smart cular cloud, C etworks, and M
Y E.:
12th tatics sion/
ARPA 99/k i A.
nerate rs &
ISSN Scott with ce on ENIX Title, essed thors) usion ng. In:
reless 26-29
ate in T, La M.Sc.
Ababa ollega f ICT e. His net of rning,
curity and ersity, Ph.D.
earch grid Cyber Mobile
Highlights
Deep learning has been proposed for cyber-attack detection in IoT using fog ecosystem
We demonstrated that distributed attack detection at fog level is more scalable than centralized cloud for IoT applications
It has also been shown that deep models have excelled shallow machine learning models in cyber-attack detection in accuracy.
In the future, other datasets and algorithms as well as network payload data will be investigated for comparisons and further enhancements.