
Accepted Manuscript

Distributed attack detection scheme using deep learning approach for Internet of Things

Abebe Abeshu Diro, Naveen Chilamkurti

PII: S0167-739X(17)30848-8

DOI: http://dx.doi.org/10.1016/j.future.2017.08.043

Reference: FUTURE 3638

To appear in: Future Generation Computer Systems

Received date: 1 May 2017

Revised date: 12 July 2017

Accepted date: 23 August 2017

Please cite this article as: A.A. Diro, N. Chilamkurti, Distributed attack detection scheme using deep learning approach for Internet of Things, Future Generation Computer Systems (2017), http://dx.doi.org/10.1016/j.future.2017.08.043

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form.

Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Abebe Abeshu Diro, Naveen Chilamkurti



Abebe Abeshu, Department of Computer Science and IT, La Trobe University, Melbourne, Australia. (email: a.diro@latrobe.edu.au)

Naveen Chilamkurti, Department of Computer Science and IT, La Trobe University, Melbourne, Australia. (email: n.chilamkurti@latrobe.edu.au)


Abstract. Cybersecurity continues to be a serious issue for every sector in cyberspace as the number of security breaches keeps increasing. Thousands of zero-day attacks are continuously emerging because of the addition of various protocols, mainly from the Internet of Things (IoT). Most of these attacks are small variants of previously known cyber-attacks. This indicates that even advanced mechanisms such as traditional machine learning systems have difficulty detecting these small mutants of attacks over time. On the other hand, the success of deep learning (DL) in various big data fields has drawn considerable interest in cybersecurity. The application of DL has become practical because of improvements in CPUs and neural network algorithms. The use of DL for attack detection in cyberspace could be a mechanism resilient to small mutations or novel attacks because of its high-level feature extraction capability. The self-taught and compression capabilities of deep learning architectures are key mechanisms for discovering hidden patterns in the training data so that attacks are discriminated from benign traffic. This research adopts a new approach, deep learning, to cybersecurity to enable the detection of attacks in the social internet of things. The performance of the deep model is compared against a traditional machine learning approach, and distributed attack detection is evaluated against a centralized detection system. The experiments have shown that our distributed attack detection system is superior to centralized detection systems using a deep learning model. It has also been demonstrated that the deep model is more effective in attack detection than its shallow counterparts.

Keywords: Cyber Security, Deep Learning, Internet of things, Fog Networks, Smart Cities

1. INTRODUCTION

As an emerging technological breakthrough, IoT has enabled the collection, processing and communication of data in smart applications [1]. These novel features have attracted city designers and health professionals, as IoT is gaining massive application at the edge of networks for real-time applications such as eHealth and smart cities [2]. However, the growth in the number and sophistication of unknown cyber-attacks has cast a shadow on the adoption of these smart services. This stems from the fact that the distribution and heterogeneity of IoT applications and services make the security of IoT complex and challenging [1],[3]. In addition, attack detection in IoT is radically different from existing mechanisms because of the special service requirements of IoT which cannot be satisfied by the centralized cloud: low latency, resource limitations, distribution, scalability and mobility, to mention a few [4]. This means that neither cloud nor standalone attack detection solutions solve the security problems of IoT.

Because of this, a recently emerged form of distributed intelligence, known as fog computing, should be investigated for bridging the gap. Fog computing is the extension of cloud computing towards the network edge to enable a cloud-to-things service continuum. It is based on the principle that data processing and communication should be served closer to the data sources [5]. The principle helps alleviate the problem of resource scarcity in IoT, as costly storage, computation, control and networking can be offloaded to nearby fog nodes. This in turn increases the effectiveness and efficiency of smart applications. Like any service, security mechanisms in IoT could be implemented and deployed at the fog layer, with fog nodes acting as proxies, to offload expensive storage and computation from IoT devices. Thus, fog nodes provide a unique opportunity for IoT to deploy distributed and collaborative security mechanisms.

Though fog computing architecture can offer the necessary service requirements and distributed resources, robust security mechanisms are also needed to protect IoT devices.

As preventive security schemes always suffer from design and implementation flaws, detective mechanisms such as attack detection are indispensable [6]. Attack detection can be either signature based or anomaly based. The signature based solution matches incoming traffic against already known attack types in a database, while the anomaly based scheme treats an attack as a behavioral deviation from normal traffic. The former approach has been used widely because of its high detection accuracy and low false alarm rate, but it is criticized for its inability to capture novel attacks. Anomaly detection, on the other hand, detects new attacks, though it lacks high accuracy. In both approaches, classical machine learning has been used extensively [7]. With the ever-increasing power and resources of attackers, traditional machine learning algorithms are incapable of detecting complex cyber breaches. Most of these attacks are small variants of previously known cyber-attacks (around 99% mutations). Even the so-called novel attacks (1%) depend on previous logic and concepts [8]. This means that traditional machine learning systems fail to recognize these small mutations because they cannot extract abstract features to distinguish novel attacks or mutants from benign traffic. The success of deep learning in big data areas can be carried over to combat cyber threats because mutations of attacks are like small changes in, for instance, image pixels. Deep learning in security learns the true face (attack or legitimate) of cyber data even under small variations, which indicates its resilience to small changes in network data through high-level invariant representations of the training data. Though the application of DL has been mainly confined to big data areas, the recent results obtained on traffic classification and intrusion detection systems in [9],[10],[11] indicate that it could have a novel application in the identification of cyber security attacks.

Deep learning (DL) has been the breakthrough behind artificial intelligence tasks in the fields of image processing, pattern recognition and computer vision. Deep networks have gained momentum through unprecedented improvements in the accuracy of classification and prediction in these complex tasks. Deep learning is inspired by the human brain's ability to learn from experience instinctively. Just as our brain processes raw data derived from neuron inputs and learns high-level features on its own, deep learning enables raw data to be fed into a deep neural network, which learns to classify the instances on which it has been trained [12],[13]. DL has improved over classical machine learning mainly due to recent developments in both hardware resources such as GPUs and powerful algorithms like deep neural networks. The massive generation of training data has also contributed tremendously to the current success of deep learning, as witnessed in giant companies such as Google and Facebook [14],[15]. The main benefits of deep learning are the absence of manual feature engineering, unsupervised pre-training, and compression capabilities, which make the application of deep learning feasible even in resource-constrained networks [16]. It means that the capability of DL to self-learn results in higher accuracy and faster processing. This research is aimed at adopting a novel distributed attack detection scheme using deep learning to enable the detection of existing or novel attacks in IoT.

The contributions of our research are:

 To design and implement a deep learning based distributed attack detection mechanism, which reflects the underlying distribution features of IoT

 To demonstrate the effectiveness of deep learning in attack detection systems in comparison to traditional machine learning in distributed IoT applications

 To compare the performance of a parallel and distributed network attack detection scheme using parameter sharing with a centralized approach without parameter sharing in IoT.

2. RELATED WORK

Though research on the application of deep learning has flourished in domains like pattern recognition, image processing and text processing, there are only a few promising research works on cybersecurity using a deep learning approach.

One of the applications of deep learning in cybersecurity is the work of [9] on the NSL-KDD dataset. This work used a self-taught deep learning scheme in which unsupervised feature learning was employed on training data using a sparse autoencoder. The learnt features were applied to the labelled test dataset for classification into attack and normal. The authors used n-fold cross-validation for performance evaluation, and the obtained result seems reasonable. This research is like ours in terms of feature learning, though it considers a centralized system while our approach is a distributed and parallel detection system for fog-to-things computing. The other relevant work is the application of autoencoders for anomaly detection in [17], in which the normal network profile was learned by autoencoders through nonlinear feature reduction. In their study, the authors demonstrated that normal records in the test dataset have a small reconstruction error, while anomalous records in the same dataset produce a large reconstruction error. Intrusion detection using a deep learning approach has also been applied to vehicular security in [10]. The research demonstrated that unsupervised pre-training based on deep belief networks (DBN) could enhance intrusion detection accuracy.

Although the work is novel in its approach, the artificial data used and its centralized approach might limit its practicality in fog networks. Another study in this category has been conducted by [18]. The authors proposed an IDS which adaptively detects anomalies using autoencoders on artificial data. Both anomaly detection papers considered artificial data, which do not reflect the malicious and normal behaviors of real-time networks. Apart from that, they adopted a centralized approach, which is impractical for distributed applications such as the social internet of things in smart city networks.

Deep learning has also been applied by [11] to malicious code detection, using autoencoders for feature extraction and a Deep Belief Network (DBN) as a classifier. The article showed that the hybrid mechanism is more accurate and more time-efficient than a single DBN. This supports the view that deep networks are better than shallow ones in cyber-attack detection. The main limitation of this research is that the dataset should have been more recent to support the conclusion. Nevertheless, the research is significantly different from ours since it does not handle distributed training and sharing of updated parameters. The other paper which has investigated a deep learning scheme for malicious code detection is [19]. It applied a denoising autoencoder for deeper feature learning to distinguish malicious JavaScript code from normal code. The result showed promising accuracy in the best-case scenario. Although the approach is effective in web applications, it can hardly be applied to distributed IoT/Fog systems. Our model is novel as it enables parallel training and parameter sharing by local fog nodes, and detects network attacks in distributed fog-to-things networks using a deep learning approach.

3. CYBER SECURITY IN SOCIAL IOT

The advancements in hardware technologies have enabled a massive number of IoT devices to be connected to the Internet. Smart city applications are by far the quickest and most deeply affected areas of public services by the social internet of things, as this technological breakthrough is helping cities to effectively manage infrastructures such as water, power, transport, and so on. Typically, the integration of social IoT and ICT for innovative smart city design is meant to create a data-driven approach to public service delivery, infrastructure and public planning. In general, the social IoT application components in the smart city are depicted in Fig. 1. However, this massive connection of IoT devices as a data collection and distribution platform in the emergence of smart cities could bring about novel or variant attacks which can cause the loss of millions of dollars and of human life [20].

Fig.1: Social Internet of Things: Components of smart city

Though the attacks in IoT seem the same as in the traditional Internet, the scale and simplicity of attack targets are larger for IoT with limited protection. As the recent survey [7] shows, DoS attacks are the attack types most frequently associated with IoT/Fog networks in the social internet of things such as smart cities. The IoT ecosystem consists of a massive number of smart things distributed across a given geographical area, such as a smart city. Millions of users are connected to social services via IoT, taking advantage of it for private and public services. The interconnectivity of these numbers of things, however, makes a fertile target for malicious adversaries who can exhaust their resources and launch DoS attacks. A DoS attack causes the denial of service for legitimate users or nodes by a single host (DoS attack) or coordinated attackers (DDoS attack) [21],[22],[23].

However, remote access attacks using backdoors could be major threats as they can escalate to root access attacks. These attacks leave certain patterns and characteristics in network traffic, which might affect the way learning algorithms will be able to distinguish between attack and benign traffic. For instance, DoS attacks are usually recognized by the overwhelming of a single node from multiple sources. Table 1 shows the common attack categories [24].

Table 1: Attack categories of Fog ecosystem

Category: Probe

 Satan: probing of a network for some well-known weaknesses
 Ipsweep: pinging of multiple hosts to reveal the target's IP
 Portsweep: scanning of ports to discover services on a host
 Nmap: various means of network mapping

Category: R2L

 Warezclient: downloading illegal software uploaded previously by the warezmaster
 guess_passwd: guessing a password over telnet
 warezmaster: uploading illegal software (warez) on an FTP server by exploiting wrong write permissions
 imap: illegal access of a local user account using vulnerabilities
 ftp_write: creating a .rhost file in anonymous FTP to obtain local login
 Multihop: multi-day scenario where a user breaks into a system
 phf: CGI script enabling the execution of arbitrary commands on a machine with a misconfigured web server
 spy: breaking into a system via vulnerabilities to discover important information

Category: U2R

 buffer_overflow: the ffbconfig UNIX system command causes a buffer overflow that leads to a root shell
 rootkit: enables access at admin level
 loadmodule: gaining a root shell by resetting IFS
 perl: creating a root shell by a perl attack which sets the user id to root

Category: DoS

 smurf: flooding of ICMP echo replies
 Neptune: flooding of SYN on port(s)
 Back: requesting a URL having many backslashes from a webserver
 Teardrop: causing system reboot or crash using mis-fragmented UDP packets
 Pod: pinging with malformed packets, causing reboot or crash
 Land: sending a UDP packet having the same source and destination address to a remote host
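The grouping in Table 1 can be expressed as a lookup table that maps a raw attack label to its category. The following is a minimal sketch; the lowercase label spellings and the helper name `categorize` are assumptions made for illustration, not part of the original work.

```python
# Attack names grouped by category, transcribed from Table 1.
# The lowercase spellings are an assumption about how labels appear in the data.
ATTACKS_BY_CATEGORY = {
    "Probe": ["satan", "ipsweep", "portsweep", "nmap"],
    "R2L": ["warezclient", "guess_passwd", "warezmaster", "imap",
            "ftp_write", "multihop", "phf", "spy"],
    "U2R": ["buffer_overflow", "rootkit", "loadmodule", "perl"],
    "DoS": ["smurf", "neptune", "back", "teardrop", "pod", "land"],
}

# Inverted index: attack name -> category.
CATEGORY_OF = {
    attack: category
    for category, attacks in ATTACKS_BY_CATEGORY.items()
    for attack in attacks
}


def categorize(label: str) -> str:
    """Map a raw traffic label to Normal, one of the four categories, or Unknown."""
    name = label.strip().lower()
    if name == "normal":
        return "Normal"
    return CATEGORY_OF.get(name, "Unknown")


if __name__ == "__main__":
    for raw in ["normal", "neptune", "perl", "satan", "guess_passwd"]:
        print(raw, "->", categorize(raw))
```

Labels that are not listed in Table 1 fall through to "Unknown" rather than being silently assigned to a category.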

4. OVERVIEW OF DEEP LEARNING

Deep Learning has been the state of the art for training stability and generalization, and has achieved significant scalability on big data. It extracts complex and nonlinear hierarchical features of high-dimensional training data to build a model which transforms inputs to outputs (e.g. classification). Multi-layer deep networks are the most prevalent forms of deep learning algorithms. The output of each previous layer and a bias are passed through a nonlinear activation function f to form the weighted inputs for the next layer n of a neural network, i.e. $a_n = f(W_n a_{n-1} + b_n)$ [25]. Activation functions are listed in Table 2. Given a set of unlabeled training data {x⁽¹⁾, x⁽²⁾, x⁽³⁾, …}, deep learning algorithms usually set the output values to be either equal to or less than the inputs. The cost functions to be optimized in deep models during deep feature extraction are generally loss functions. Equation (1) shows a loss function in which the first term is the reconstruction error, specified as the mean of sum-of-squares error terms over k instances of training data, and the second term is a regularization term which is used to avoid over-fitting in training.

$$J(W,b) = \frac{1}{k}\sum_{i=1}^{k}\frac{1}{2}\left\| h_{W,b}\left(x^{(i)}\right) - y^{(i)} \right\|^{2} + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}}\left(W_{ji}^{(l)}\right)^{2} \qquad (1)$$

where $h_{W,b}(x^{(i)})$ is the network output for instance $x^{(i)}$, $y^{(i)}$ is its target, $\lambda$ is the regularization weight, $n_l$ represents the number of layers and $s_l$ is the number of nodes in each layer.

Table 2: Activation functions

The loss function L(W, b | j) is minimized by stochastic gradient descent (SGD), where the gradient of L(W, b | j) is a standard gradient computed via backpropagation, using a constant α as the learning rate. The final parameters W, b are obtained by averaging. Equation (2) shows the iteration of standard gradient descent, updating weight W and bias b using sample i until convergence is reached. As gradient descent over the whole available data (batch) is not efficient, computing the gradient over a mini-batch (a sampled subset) simplifies the learning process [15].

$$w_{ji} := w_{ji} - \alpha\,\frac{\partial L(W,b \mid j)}{\partial w_{ji}}, \qquad b_{ji} := b_{ji} - \alpha\,\frac{\partial L(W,b \mid j)}{\partial b_{ji}} \qquad (2)$$
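The mini-batch update described above can be illustrated on the smallest possible model. The following is a minimal sketch, assuming a single linear unit, a squared-error loss with an L2 penalty on the weights, and a synthetic noise-free dataset; the function names, the toy data and all hyper-parameter values are hypothetical and not taken from the experiments.

```python
import random


def predict(w, b, x):
    """Linear unit: weighted sum of the inputs plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss(w, b, batch, lam=0.0):
    """Mean half squared error plus an L2 penalty on the weights (not the bias)."""
    error = sum(0.5 * (predict(w, b, x) - y) ** 2 for x, y in batch) / len(batch)
    return error + 0.5 * lam * sum(wi * wi for wi in w)


def sgd_step(w, b, batch, alpha, lam=0.0):
    """One update w := w - alpha * dL/dw, b := b - alpha * dL/db on a mini-batch."""
    grad_w = [lam * wi for wi in w]
    grad_b = 0.0
    for x, y in batch:
        diff = predict(w, b, x) - y
        for i, xi in enumerate(x):
            grad_w[i] += diff * xi / len(batch)
        grad_b += diff / len(batch)
    new_w = [wi - alpha * gi for wi, gi in zip(w, grad_w)]
    return new_w, b - alpha * grad_b


def train(data, alpha=0.1, lam=0.0, batch_size=4, epochs=200, seed=0):
    """Shuffle the data every epoch and apply one SGD step per mini-batch."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            w, b = sgd_step(w, b, samples[start:start + batch_size], alpha, lam)
    return w, b


def make_toy_data(n=32, seed=1):
    """Synthetic targets from y = 2*x1 - x2 + 0.5 (purely illustrative)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        data.append((x, 2.0 * x[0] - 1.0 * x[1] + 0.5))
    return data


if __name__ == "__main__":
    data = make_toy_data()
    w, b = train(data)
    print("weights:", w, "bias:", b, "loss:", loss(w, b, data))
```

A deep network replaces `predict` with a stack of nonlinear layers and obtains the gradients by backpropagation, but the update rule applied per mini-batch keeps the same form.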

In the training process, as the number of layers increases, the abstraction of features increases towards the answers of the model. The activation function and weight matrices determine the nature of abstraction at every layer of the network, but it is challenging to enable the deep learning model to automatically learn training parameters that meet the accuracy objective of the deep network. The parameters of training are usually learnt through gradient descent, which is a nonlinear optimization problem. Gradient descent is initiated by randomly setting a set of deep network parameters, which are then updated at each step by computing the gradient of the nonlinear function being optimized. Repeating this step drives the algorithm to a local optimum.

For a known number of classes, softmax is employed at the end of the neural network as the activation function. Having K classes in the dataset and deep network inputs x, softmax assumes that the probability P(y=k|x) for each value of k=1,…,K can be estimated, as it outputs a K-dimensional vector whose elements sum to 1. In other words, our estimate $h_\theta(x)$ takes the following form [26]:

$$h_\theta(x) = \begin{bmatrix} P(y=1 \mid x;\theta) \\ P(y=2 \mid x;\theta) \\ \vdots \\ P(y=K \mid x;\theta) \end{bmatrix} = \frac{1}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x\right)} \begin{bmatrix} \exp\left(\theta^{(1)\top}x\right) \\ \exp\left(\theta^{(2)\top}x\right) \\ \vdots \\ \exp\left(\theta^{(K)\top}x\right) \end{bmatrix}$$

where θ(1),θ(2),…,θ(K)∈Rⁿ are the parameters of our model, and the term $1/\sum_{j=1}^{K}\exp(\theta^{(j)\top}x)$ is the normalization of the distribution. The cost function of softmax is given by:

$$J(\theta) = -\sum_{i=1}^{m}\sum_{k=1}^{K} 1\left\{y^{(i)}=k\right\}\,\log\frac{\exp\left(\theta^{(k)\top}x^{(i)}\right)}{\sum_{j=1}^{K}\exp\left(\theta^{(j)\top}x^{(i)}\right)}$$

where m is the number of training instances and 1{·} is the indicator function.

5. OUR APPROACH

The fog nodes are responsible for training models and hosting attack detection systems at the edge of the distributed fog network, since they are closer to the smart infrastructures supported by the social internet of things. A coordinating master node should be in place for collaborative parameter sharing and optimization. In addition to giving the autonomy of local attack detection using local training and parameter optimization, the benefits of this approach are the acceleration of data training near to the source and the gain of updated parameters from neighbors. The master node updates the parameters of each cooperative node, and propagates the resulting update back to the worker nodes.
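The master and worker roles described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: workers run sequentially in synchronous rounds, the model is a single linear unit, and the master combines the returned parameters by plain averaging. The scheme in this paper exchanges parameters asynchronously, so the names `local_sgd`, `average` and `train_distributed` and all settings below are hypothetical.

```python
import random


def local_sgd(w, b, shard, alpha, steps, rng):
    """Worker: start from the broadcast (w, b), run SGD on the local shard only."""
    w = list(w)
    for _ in range(steps):
        x, y = rng.choice(shard)
        diff = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        w = [wi - alpha * diff * xi for wi, xi in zip(w, x)]
        b = b - alpha * diff
    return w, b


def average(results):
    """Master: combine the workers' parameters by element-wise averaging."""
    n = len(results)
    dim = len(results[0][0])
    w = [sum(r[0][i] for r in results) / n for i in range(dim)]
    b = sum(r[1] for r in results) / n
    return w, b


def train_distributed(shards, rounds=100, alpha=0.1, steps=20, seed=0):
    """Alternate local training on every shard with a global update and broadcast."""
    rng = random.Random(seed)
    w, b = [0.0] * len(shards[0][0][0]), 0.0
    for _ in range(rounds):
        # Sequential stand-in for the parallel workers.
        results = [local_sgd(w, b, shard, alpha, steps, rng) for shard in shards]
        w, b = average(results)  # the averaged parameters are re-broadcast next round
    return w, b


def make_shards(num_shards=3, per_shard=10, seed=2):
    """Synthetic local datasets drawn from y = 2*x1 - x2 + 0.5 (illustrative only)."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_shards):
        shard = []
        for _ in range(per_shard):
            x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
            shard.append((x, 2.0 * x[0] - 1.0 * x[1] + 0.5))
        shards.append(shard)
    return shards


if __name__ == "__main__":
    w, b = train_distributed(make_shards())
    print("global weights:", w, "global bias:", b)
```

No worker ever sees another worker's data; only parameters travel between the nodes and the master.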

This coordination plays a significant role in offloading the storage and computational overheads of models, data and parameters from IoT devices, while it provides fast response time. The centralized training and optimization approach could be extended to distributed networks by distributing SGD. Figure 2 shows a general architecture of our distributed and parallel attack detection system in fog-to-things computing.

Fig.2: Distributed attack detection architecture for Fog-to-things networks

The outputs of model training on distributed fog nodes are attack detection models and their associated local learning parameters. These local parameters are sent to the coordinating fog node for global update and re-propagation. This sharing scheme results in better learning as it enables the sharing of the best parameters and avoids local overfitting.

6. EVALUATION

6.1. DATASET, ALGORITHM AND METRICS

KDDCUP99 [27], ISCX [28] and NSL-KDD [24] are the most commonly used datasets in intrusion detection research. We used the NSL-KDD intrusion dataset, which is available in CSV format, for model validation and evaluation. The NSL-KDD intrusion dataset not only reflects traffic compositions and intrusions, but is also modifiable, extensible, and reproducible. The dataset comprises the attacks shown in Table 1, which have been identified as key attacks in IoT/Fog computing [1-5]. Table 3 shows sample records of the NSL-KDD dataset.

Table 3: snapshot of records in NSL-KDD dataset

| Duration | Protocol | Service | Flag | Source bytes | Destination bytes | … |
|---|---|---|---|---|---|---|
| 0 | tcp | ftp_data | SF | 491 | 0 | … |
| 0 | udp | other | SF | 146 | 0 | … |
| 0 | icmp | ecr_i | SF | 1480 | 0 | … |
| 0 | tcp | http | SF | 232 | 8153 | … |
| 0 | tcp | http | SF | 199 | 420 | … |
| [illegible] | tcp | ftp | SF | 350 | 1185 | … |
| 0 | tcp | private | S0 | 0 | 0 | … |
| 315 | udp | other | SF | 146 | 105 | … |
| 240 | tcp | http | SF | 328 | 275 | … |
| 0 | tcp | private | S0 | 0 | 0 | … |
| 0 | tcp | private | REJ | 0 | [illegible] | … |
| 5607 | udp | other | SF | 147 | 105 | … |

The original dataset consists of 125,973 training records and 22,544 test records, each with 41 features such as duration, protocol, service, flag, source bytes, destination bytes, etc. The traffic distribution of the NSL-KDD dataset is shown in Tables 4(a) and 4(b).

Table 4 (a): traffic distribution of NSL-KDD in 2-class

| Traffic | Training | Test |
|---|---|---|
| Normal | 67343 | 9711 |
| Attack | 58630 | 12833 |
| Total | 125973 | 22544 |

Table 4 (b): traffic distribution of NSL-KDD in multi-class

| Traffic | Training | Test |
|---|---|---|
| Normal | 67343 | 9711 |
| DoS | 45927 | 7458 |
| Probe | 11656 | 2754 |
| R2L | 995 | 2421 |
| U2R | 52 | 200 |
| Total | 125973 | 22544 |

Before training the network, categorical features have been encoded into discrete features using the 1-to-n encoding technique. After encoding, we obtained 123 input features and 1 label, as shown in Table 5. For our experiment, we have taken the dataset in 2-class (normal vs attack) and 4-class (normal, DoS, Probe, R2L.U2R) forms. The minority class U2R is merged into R2L to form a class of R2L.U2R.

Table 5: the encoded form of our dataset
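The 1-to-n encoding and the class merging described above can be sketched as follows. This is an illustrative example on five sample records with two numeric and three categorical fields; the field names (`protocol_type`, `service`, `flag`, `src_bytes`, `dst_bytes`) and the helper functions are assumptions, and the resulting vector length applies to this toy sample only, not to the full dataset.

```python
CATEGORICAL = ["protocol_type", "service", "flag"]
NUMERIC = ["src_bytes", "dst_bytes"]

# Five sample records; field names are assumed for illustration.
RECORDS = [
    {"protocol_type": "tcp", "service": "ftp_data", "flag": "SF", "src_bytes": 491, "dst_bytes": 0},
    {"protocol_type": "udp", "service": "other", "flag": "SF", "src_bytes": 146, "dst_bytes": 0},
    {"protocol_type": "icmp", "service": "ecr_i", "flag": "SF", "src_bytes": 1480, "dst_bytes": 0},
    {"protocol_type": "tcp", "service": "http", "flag": "SF", "src_bytes": 232, "dst_bytes": 8153},
    {"protocol_type": "tcp", "service": "private", "flag": "S0", "src_bytes": 0, "dst_bytes": 0},
]


def fit_vocabulary(records, columns):
    """Collect the sorted distinct values of every categorical column."""
    return {c: sorted({r[c] for r in records}) for c in columns}


def encode(record, vocabulary, numeric_columns):
    """Numeric fields first, then one 0/1 indicator per categorical value."""
    vector = [float(record[c]) for c in numeric_columns]
    for column, values in vocabulary.items():
        vector.extend(1.0 if record[column] == v else 0.0 for v in values)
    return vector


def to_four_class(category):
    """Merge the minority class U2R into R2L, keeping the other classes."""
    return "R2L.U2R" if category in ("R2L", "U2R") else category


def to_two_class(category):
    """Collapse every attack category into a single Attack class."""
    return "Normal" if category == "Normal" else "Attack"


if __name__ == "__main__":
    vocabulary = fit_vocabulary(RECORDS, CATEGORICAL)
    for record in RECORDS:
        print(encode(record, vocabulary, NUMERIC))
```

Each categorical field expands into as many indicator columns as it has distinct values, which is why the encoded input is much wider than the 41 raw features.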

The system uses the same preprocessing technique for both training and test data. At this step, the data is ready for training and testing. Suppose D_n are the data across n nodes in the fog ecosystem and W_n, b_n are their local parameters, with each node having local data D_na, a subset of D_n, as samples per iteration. Each node trains deep networks on its data in parallel, but asynchronously exchanges learned parameters with the coordinating node. The coordinating node broadcasts the update regularly. The following algorithm shows distributed training on each local fog node while exchanging updated parameters with neighbor nodes via the coordinating fog node.

Algorithm 1: local training and parameter exchange

1. Receive the initial or updated parameters from the master at the workers
2. For node n in the network, do in parallel:
    Get local training traffic sample i ∈ D_na from local D_n
    Execute SGD on local traffic, and update weights w_ji ∈ W_n and biases b_ji ∈ b_n:
     w_ji := w_ji - α ∂L(W, b | j)/∂w_ji
     b_ji := b_ji - α ∂L(W, b | j)/∂b_ji
    Compute the parameter updates and send them to the master node
3. Compute the updated parameters at the master node
4. Repeat

In the training and testing phases, Apache Spark [29] has been used for distributed and parallel processing of SGD asynchronously, while Keras on the Theano package [30] was employed for deep learning. The most important evaluation metrics for attack detection, namely accuracy, detection rate (DR) and false alarm rate (FAR), were chosen for the comparison between deep and shallow models. In addition, performance metrics such as precision, recall and F1 Measure have been added for a comparison between individual classes in the deep learning model. DR denotes the ratio of intrusion instances detected by the model, while FAR represents the ratio of misclassified normal instances. Accuracy is the percentage of true detections over total data instances.

Recall indicates how many of the attacks the model returns, while precision represents how many of the returned attacks are correct. F1 Measure provides the harmonic average of precision and recall [31]. The mathematical representation of these metrics can be derived from the confusion matrix as:

| | Predicted: NORMAL | Predicted: ATTACK |
|---|---|---|
| Actual: NORMAL | TN | FP |
| Actual: ATTACK | FN | TP |

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{DR} = \frac{TP}{TP + FN}, \qquad \text{FAR} = \frac{FP}{FP + TN}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad \text{F1 Measure} = \frac{2\,TP}{2\,TP + FP + FN}$$

where TP: true positive, TN: true negative, FP: false positive, FN: false negative.

6.2. EXPERIMENTAL ENVIRONMENT

As a first attempt towards exploring the performance of our model, we used the 2-class (normal and attack) and 4-class (normal, DoS, Probe, R2L.U2R) categories. In the performance measurement, unseen test data are chosen to represent zero-day attack detection. Our experiment has two objectives. The first is to compare the result of our distributed attack detection with a centralized system. This experiment has been conducted by deploying the deep learning model on a single node for the centralized system, and on multiple coordinated nodes for distributed attack detection. To test the performance of parallelism and distribution, as a benchmark for our detection scheme, we measured training accuracy as a function of the number of machines used for training the network. The second is to evaluate the effectiveness of deep learning against shallow learning algorithms for attack detection in IoT. The deep learning system, after hyper-parameter optimization, used 123 input features, 150 first-layer neurons, 120 second-layer neurons, 50 third-layer neurons and a final softmax layer with as many neurons as there are classes. The model was trained with various batch sizes for 50 epochs, with dropout to avoid overfitting.

6.3. RESULTS AND DISCUSSIONS

In the evaluation process, classification accuracy and other metrics were used to show the effectiveness of our scheme compared to shallow models in distributed IoT at the fog level.

The comparison of distributed training with the centralized approach in accuracy is also one of our evaluation criteria. Table 6 compares the accuracy of the deep and shallow models, while Fig. 3 shows the accuracy difference between centralization and distribution.

Fig.3: Accuracy comparison of distributed and centralized models

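Table 6: Accuracy of deep model (DM) and shallow model (SM)

| Model Type | 2-class Accuracy (%) | 2-class DR (%) | 2-class FAR (%) | 4-class Accuracy (%) | 4-class DR (%) | 4-class FAR (%) |
|---|---|---|---|---|---|---|
| DM | 99.20 | 99.27 | 0.85 | 98.27 | 96.5 | 2.57 |
| SM | 95.22 | 97.50 | 6.57 | 96.75 | 93.66 | 4.97 |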

Fig.4: comparison between DL and SL in training time

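Fig.5: comparison between DL and SL in detection time

Table 7 (a): Performance of 2-class

| Model Type | Class | Precision (%) | Recall (%) | F1 Measure (%) |
|---|---|---|---|---|
| Deep Model | Normal | 99.36 | 99.15 | 99.26 |
| Deep Model | Attack | 99.02 | 99.27 | 99.14 |
| Shallow Model | Normal | 97.95 | 93.43 | 95.65 |
| Shallow Model | Attack | 92.1 | 97.50 | 94.72 |

Table 7 (b): Performances of 4-class

| Model Type | Class | Precision (%) | Recall (%) | F1 Measure (%) |
|---|---|---|---|---|
| Deep Model | Normal | 99.52 | 97.43 | 98.47 |
| Deep Model | DoS | 97 | 99.5 | 98.22 |
| Deep Model | Probe | 98.56 | 99 | 98.78 |
| Deep Model | R2L.U2R | 71 | 91 | 80 |
| Shallow Model | Normal | 99.35 | 95 | 97 |
| Shallow Model | DoS | 96.55 | 99 | 97.77 |
| Shallow Model | Probe | 87.44 | 99.48 | 93 |
| Shallow Model | R2L.U2R | 42 | 82.49 | 55.55 |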

[Fig. 3 plot: accuracy versus number of fog nodes; series: Distributed Model, Centralized Model]

[Fig. 4 plot: training time (ms) versus number of fog nodes; series: DL run time (ms), SL run time (ms)]

[Fig. 5 plot: detection time (ms) versus number of fog nodes; series: DL run time (ms), SL run time (ms)]

The experimental results support two findings. First, the distributed model performs better than the centralized model. As fig. 3 shows, overall detection accuracy rose from around 96% to over 99% as the number of nodes in the distributed network of fog systems increased. This indicates that distributing attack detection functions across worker fog nodes is a key mechanism for attack detection in social IoT systems such as a smart city, which need real-time detection. The gain in accuracy under the distributed scheme may come from the collaborative sharing of learning parameters, which avoids overfitting of local parameters so that the nodes contribute to one another's accuracy. Second, the deep model is more accurate than the shallow model. The detection rates in table 6 show that deep learning is better than classic machine learning for both binary and multi-class detection, and the false alarm rate of the deep model (0.85%) is much lower than that of the shallow machine learning model (6.57%). As shown in table 7 (a) and (b), deep learning also performs better than the traditional machine learning model for each class of attack. For instance, in binary classification the recall of the deep model is 99.27%, while the traditional model has a recall of 97.50%. Similarly, in multi-class classification the average recall of the deep model (DM) is 96.5%, whereas the shallow model (SM) scored an average recall of 93.66%.
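The metrics compared above can be made concrete with a small sketch. It assumes the usual confusion-matrix definitions of accuracy, detection rate (recall), precision, false alarm rate, and F1; the function names and the example counts are hypothetical and are not taken from the experiments. It also assumes that the three numeric columns of table 7 are precision, recall, and F1, which the reported values are consistent with but which is not stated in this section.

```python
def detection_metrics(tp, fp, tn, fn):
    """Return common detection metrics as percentages.

    tp/fn count attacks that were detected/missed,
    fp/tn count benign records that were flagged/passed.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "detection_rate": 100 * recall,            # also called recall
        "precision": 100 * precision,
        "false_alarm_rate": 100 * fp / (fp + tn),  # benign traffic flagged as attack
        "f1": 100 * 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = detection_metrics(tp=90, fp=5, tn=95, fn=10)

# Consistency check against the deep-model DoS row (97, 99.5, 98.22),
# under the assumed column order precision, recall, F1.
dos_f1 = f1_from(97, 99.5)  # approximately 98.23
```

The small gap between the recomputed F1 and the tabulated 98.22 is what one would expect if the tabulated precision and recall were rounded before publication.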

However, fig. 4 shows that deep learning takes longer to train than traditional machine learning algorithms, while the detection times (fig. 5) of the two algorithms are nearly the same. Deep networks are expected to take longer to train because of the number of parameters they learn. For attack detection systems, detection speed matters more than training speed. Together, these results indicate that deep learning has considerable potential for cybersecurity, since attack detection in distributed environments such as IoT/Fog systems has shown promising results.
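The collaborative sharing of learning parameters discussed in this section can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes synchronous parameter averaging across worker fog nodes, uses logistic regression as a stand-in for the deep model, and trains on synthetic data. All names (`local_step`, `average`, `train_distributed`) and the four-node split are hypothetical.

```python
import math
import random


def local_step(w, data, lr):
    """One gradient step of logistic regression on a single node's local data."""
    grad = [0.0] * len(w)
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(x):
            grad[j] += (p - y) * xj
    return [wi - lr * g / len(data) for wi, g in zip(w, grad)]


def average(weight_sets):
    """Coordinator step: element-wise mean of the workers' parameters."""
    n = len(weight_sets)
    return [sum(ws[j] for ws in weight_sets) / n
            for j in range(len(weight_sets[0]))]


def train_distributed(shards, dim, rounds=100, lr=0.5):
    """Each round, every node updates a copy of the shared weights,
    then the coordinator averages the results and redistributes them."""
    w = [0.0] * dim
    for _ in range(rounds):
        w = average([local_step(w, shard, lr) for shard in shards])
    return w


# Synthetic, linearly separable records split across four hypothetical fog nodes.
random.seed(0)
records = []
for _ in range(200):
    x = [random.gauss(0, 1) for _ in range(3)]
    records.append((x, 1.0 if x[0] - x[1] > 0 else 0.0))
shards = [records[i::4] for i in range(4)]

w = train_distributed(shards, dim=3)
hits = sum((sum(wi * xi for wi, xi in zip(w, x)) > 0) == (y == 1.0)
           for x, y in records)
accuracy = hits / len(records)
```

Averaging after every step keeps the nodes' parameters identical at the start of each round, which is the simplest form of sharing; less frequent or asynchronous exchange would trade communication cost against how far the local models drift apart.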

7. CONCLUSION AND FUTURE WORK

We proposed a distributed, deep learning based attack detection system for IoT/Fog networks. We designed and implemented the system for attack detection in the distributed architecture of IoT applications such as smart cities, and the experiments show that artificial intelligence can be adopted successfully for cybersecurity. The evaluation used accuracy, detection rate, false alarm rate, and related performance metrics to show the effectiveness of deep models over shallow models. The experiments demonstrated that distributed attack detection detects cyber-attacks better than centralized algorithms because of parameter sharing, which can avoid local minima in training. They also demonstrated that our deep model outperforms traditional machine learning systems such as softmax in classifying network data as normal or attack when evaluated on previously unseen test data. In the future, we will evaluate the distributed deep learning IDS on another dataset and compare it with different traditional machine learning algorithms such as SVM, decision trees, and other neural networks. Additionally, network payload data will be investigated for intrusion detection, as it might provide crucial patterns for differentiation.

ACKNOWLEDGEMENTS

This work was supported by La Trobe University’s research enhancement scheme fund.

Abebe Abeshu Diro is currently a PhD candidate in the Department of Computer Science and IT, La Trobe University, Australia. He received his M.Sc. degree in Computer Science from Addis Ababa University, Ethiopia in 2010. He worked at Wollega University from 2007 to 2013 as a Director of ICT Development, and Lecturer in Computer Science. His research interests include Software Defined Networking, Internet of Things, Cybersecurity, Advanced Networking, Machine Learning, and Big Data.

Naveen Chilamkurti is currently the Cybersecurity Program Coordinator, Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia. He obtained his Ph.D. degree from La Trobe University. His current research areas include intelligent transport systems (ITS), Smart grid computing, vehicular communications, Vehicular cloud, Cyber security, wireless multimedia, wireless sensor networks, and Mobile security.


Highlights

• Deep learning is proposed for cyber-attack detection in IoT using the fog ecosystem.

• We demonstrated that distributed attack detection at the fog level is more scalable than a centralized cloud for IoT applications.

• Deep models were also shown to outperform shallow machine learning models in cyber-attack detection accuracy.

• In the future, other datasets and algorithms, as well as network payload data, will be investigated for comparison and further enhancement.