International Journal on Advanced Computer Theory and Engineering(IJACTE)
SURVEY OF DETECTION OF MALICIOUS NODES IN PEER TO PEER
1Nishant Kumar Jha, 2D.D.Gatade Department of Computer Engineering,
Sinhgad College of Engineering, Vadgoan Bk, Pune 41.
Email : 1[email protected], 2[email protected]
Abstract-- Peer-to-peer streaming has witnessed great success, thanks to the possibility of aggregating resources from all participants. However, the performance of the entire system may be severely degraded by malicious peers that deliberately share bogus data. In a peer-to-peer (P2P) streaming system, each peer participates in uploading data to other peers, so malicious peers may choose to upload bogus data in order to damage playback and degrade the performance of normal peers in the system. This is known as a pollution attack in P2P networks, and it can severely impact the performance of P2P streaming systems. This paper discusses different detection approaches, including a statistical inference technique and a general, fully distributed detection framework that can be executed by each legitimate node in a P2P network to identify its malicious neighbors.
Keywords --- Peer to Peer, Malicious nodes, Pollution attack, Detection of malicious node, recommendation algorithm, MIS scheme.
I. INTRODUCTION
In recent years, Peer-to-Peer (P2P) file sharing systems have gained rapidly increasing popularity on the Internet; more than 70% of the traffic in the Internet backbone is generated by such systems. The major advantages of P2P mechanisms over traditional client/server mechanisms lie in the increased scalability for supporting millions of users, as well as greater dependability through decentralization. The P2P concept is not restricted to file sharing: it may also be deployed for decentralized coordination and communication in a variety of other distributed applications.
A P2P network is vulnerable to various attacks due to the existence of malicious nodes that do not comply with the network protocol in order to achieve their own purposes. In a P2P streaming system, since each peer needs to participate in uploading data to other peers, malicious peers may choose to upload bogus data so as to damage the playback and degrade the watching experience of normal peers. This is known as a pollution attack in P2P networks, and it can severely impact the performance of P2P streaming systems. The success of several commercial P2P streaming products has demonstrated that P2P streaming is a promising solution for efficiently distributing live video streams at large scale. The problem of malicious peers is therefore motivating researchers to find methodologies and frameworks to overcome it.[6]
This paper is organized as follows. Section II introduces the background and the detection of malicious actions. Section III describes the MIS scheme. Section IV presents techniques for detecting malicious peers. Section V compares different algorithms, and Section VI concludes.
II. BACKGROUND AND DETECTION
A. Background of Peer to Peer(P2P)
A P2P computer network is one in which each computer can act as a client or a server for the other computers in the network, allowing shared access to various resources such as files, peripherals, and sensors without the need for a central server. P2P networks can be set up within a home, a business, or over the Internet. Each network type requires all computers in the network to use the same or a compatible program to connect to each other and access files and other resources found on the other computers.
P2P networks can be used for sharing content such as audio, video, data, or anything in digital format. P2P is a distributed application architecture that partitions tasks or workloads among peers, which are equally privileged participants in the application. Each computer in the network is referred to as a node. The owner of each computer on a P2P network sets aside a portion of its resources, such as processing power, disk storage, or network bandwidth, to be made directly available to other network participants, without central coordination by servers or stable hosts. In this model, peers are both suppliers and consumers of resources, in contrast to the traditional client–server model where only the server supplies (sends) and clients consume (receive). Emerging collaborative P2P systems are going beyond the era of peers doing similar things while sharing resources; they seek diverse peers that can bring unique resources and capabilities to a virtual community, empowering it to engage in greater tasks than individual peers could accomplish alone, yet beneficial to all peers.[8]
B. Detecting Malicious Nodes in Peer-to-Peer Networks
Many network services consist of a large set of independent nodes, and these nodes are required to follow some rules to cooperate so as to achieve a given network functionality. To realize such network service, every node must communicate or provide local services with a subset of other nodes which are called neighbors, e.g., upload packets to neighbors, download packets from neighbors and forward packets for neighbors and so on. To guarantee the correct functionality of the network service such that every node can get service with desired performance, nodes must follow the predefined protocols when they participate in the communication with their neighbors.
There are number of scenarios that nodes may not follow the protocols:
(1) Some nodes may violate the rules and behave maliciously so as to obtain personal gain.
(2) A third party may inject misbehaving nodes so as to disrupt the network service and gain a competitive edge in a commercial market.
In P2P networks, peers must cooperate with others and participate in uploading and downloading.
Specifically, when a peer wants to playback a piece of video file, it must first download all required sub-pieces from its neighbors who have these data. On the other hand, if its neighbors request some sub-pieces that it possesses, it also needs to upload the sub-pieces to its neighbors. Therefore, every node follows this rule to participate in uploading and downloading so that it can get the desired data for playback. However, some nodes may not follow these cooperative rules, e.g., they may upload polluted sub-pieces to their neighbors instead of uploading correct sub-pieces, and this is the well known pollution attack in P2P networks. Many P2P streaming systems focus on improving streaming quality and assume that all nodes in the system cooperate as desired.
However, this may not be true in the open environment of the Internet. Some nodes in the system may be selfish and unwilling to upload data to others. Some may exhibit abnormal actions such as frequent rebooting, which adversely affect their neighbors. Worse, some nodes may cheat their neighbors, launch attacks, or distribute viruses to disrupt the service. This type of uncooperative, abnormal, or attacking behavior is known as malicious action, and the corresponding nodes are known as malicious nodes.
Pollution attack is known to have a disastrous effect on existing P2P infrastructures: it can reduce the number of legitimate P2P users by as much as 85%, and it generates abundant bogus data which may deplete the communication bandwidth.[8]
Reputation Computing
Each node generates monitoring messages to report the performance of its neighbors in each submission period. This section studies how to compute the reputation of a streaming node A, REP(A), from such monitoring messages. For ease of illustration, take one of the malicious actions discussed above: to detect an eclipse attack, nodes need to report their connections with parents. If one message reports a connection between two nodes while another claims no connection between them, both messages are suspicious.
Define Ns(A) and Nn(A) as the number of suspicious and non-suspicious messages generated by A. The credibility level of A, Credit(A), is given by Nn(A)/(Ns(A) + Nn(A)). It indicates to what extent the messages generated by A should be believed. Since dishonest nodes often have low credit values, their opinions in the evaluation of others should be reduced accordingly. Hence REP(A) is computed as:
REP(A) = Average{ Credit(X) × (1 − Corrupt(A, X, t)/Total(A, X, t)) },   (1)
where Corrupt(A, X, t) and Total(A, X, t) are the amounts of corrupt data and total data received by X from A in period t, as reported by X. The valid periods exclude outdated messages. The Average function computes the average value over all available X and valid periods t.
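As an illustration, the credibility and reputation computations above can be sketched in Python. This is a minimal sketch, assuming a simple report record; the class and field names are ours, not from [8]:

```python
from dataclasses import dataclass

@dataclass
class Report:
    """One monitoring message from reporter X about subject A in a period t
    (illustrative structure, not specified in [8])."""
    reporter: str   # X
    corrupt: int    # Corrupt(A, X, t)
    total: int      # Total(A, X, t)

def credit(ns: int, nn: int) -> float:
    """Credit(X) = Nn(X) / (Ns(X) + Nn(X)); 0 when X has sent no messages."""
    return nn / (ns + nn) if (ns + nn) > 0 else 0.0

def reputation(reports, credits) -> float:
    """REP(A): average of Credit(X) * (1 - Corrupt/Total) over valid reports."""
    terms = [credits[r.reporter] * (1 - r.corrupt / r.total)
             for r in reports if r.total > 0]
    return sum(terms) / len(terms) if terms else 0.0
```

For example, a fully honest reporter with no corrupt data contributes 1.0, while a reporter with credit 0.5 claiming half-corrupt data contributes only 0.25 to the average.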
Consider again how to detect an eclipse attack. Legitimate nodes choose their neighbors from overlay nodes whose indegree is lower than a certain threshold. In addition, attacking nodes may consume the indegree of legitimate nodes and prevent other legitimate nodes from pointing to them; thus it is also necessary to bound node outdegree. In order to detect an eclipse attack, each node is therefore required to periodically report its connections to its parents, and the degree reputations of a node can be computed as follows:
REPindegree(A) = Average_t{ Σ_X F(A, X, t) },   (2)
where F(A, X, t) = 1 if X reports A as its child during period t, and 0 otherwise; REPoutdegree(A) is computed analogously from the reported parent connections.
If a node’s REPindegree is higher than a certain threshold, or its REPoutdegree is higher than another threshold, then the node is classified as an attacking node.[8]
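The thresholding rule above can be expressed directly; the threshold values themselves are deployment choices not specified in [8]:

```python
def is_attacking(rep_indegree: float, rep_outdegree: float,
                 in_threshold: float, out_threshold: float) -> bool:
    """Classify a node as attacking when either of its degree-based
    reputations exceeds its bound (thresholds are assumed inputs)."""
    return rep_indegree > in_threshold or rep_outdegree > out_threshold
```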
Reputation-Based Peer Selection
Reputation values are used to guide peer selection during streaming. A higher reputation value indicates that a node is more trustworthy in terms of the collective evaluation by peers that have exchanged data with it. There are various ways to use reputation values. For example, a peer that issues downloading requests may receive several responses; it can compare the reputations of the responding peers and choose the one with the highest reputation to download data from. This reduces the risk of receiving abnormal service from malicious nodes. In another case, if a node is considered malicious, say because its reputation is lower than a certain threshold, its attached monitoring node can broadcast this information to other peers so that they may block it.[8]
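The selection policy just described can be sketched as follows; the blocking threshold is an assumption, not a value given in [8]:

```python
def select_peer(responders, block_threshold):
    """Among peers that answered a download request (peer -> reputation),
    pick the one with the highest reputation; peers at or below the
    blocking threshold are excluded entirely."""
    candidates = {p: rep for p, rep in responders.items()
                  if rep > block_threshold}
    return max(candidates, key=candidates.get) if candidates else None
```

Returning `None` when no responder clears the threshold models the case where all candidates are considered malicious and blocked.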
III. MIS: MALICIOUS NODE IDENTIFICATION SCHEME
In network-coding pollution attacks, malicious nodes send bogus blocks to their downstream neighbors. An innocent node that receives a bogus block is infected, and the blocks it produces with this bogus block are also corrupted and may further infect its downstream peers.
The goal of this scheme is to track the origin of corrupted blocks.
MIS Description
The first step of this scheme is to detect the existence of malicious nodes. Each decoding node detects corrupted blocks by checking whether the decoding result matches the specific format of video streams, and any node with an inconsistent decoding result sends an alert to the server to trigger the process of identifying malicious nodes. The alert contains the sequence number of the polluted segment, a time stamp, and the reporting node's ID. To prevent malicious injection or modification of transmitted alerts, each alert is appended with an HMAC computed with a secret key shared between the reporting node and the server (each node registers a secret key at the server when it joins the overlay). After receiving an alert, the server computes a checksum based on the original blocks in the polluted segment. With this checksum, infected nodes can identify which neighbor sent them a corrupted block.
The checksum is signed by the server and disseminated to the overlay. The correctness of the above process of identifying malicious nodes relies on the condition that no one can lie when reporting a suspicious node.
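The HMAC-protected alert can be sketched with Python's standard `hmac` module; the field names and JSON encoding are our assumptions, not part of the MIS specification:

```python
import hashlib
import hmac
import json
import time

def make_alert(node_id, segment_seq, key):
    """Build a pollution alert carrying the polluted segment's sequence
    number, a time stamp, and the node ID, authenticated with an HMAC
    keyed by the node/server shared secret."""
    body = {"node": node_id, "segment": segment_seq, "ts": int(time.time())}
    msg = json.dumps(body, sort_keys=True).encode()
    body["mac"] = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return body

def verify_alert(alert, key):
    """Server-side check that the alert was neither injected nor modified."""
    body = {k: v for k, v in alert.items() if k != "mac"}
    msg = json.dumps(body, sort_keys=True).encode()
    good = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(alert["mac"], good)
```

Tampering with any field after the MAC has been computed makes verification fail, which is the property the scheme relies on to reject forged alerts.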
Fig 1 An example of identifying malicious nodes in MIS schemes.[5]
To achieve these requirements, a lightweight non-repudiation transmission protocol is designed based on efficient one-way hash functions, which satisfies the above requirements with significantly higher efficiency in terms of both computational cost and communication overhead.
Non-Repudiation Transmission Protocol
Let X be the suspicious node and Y the reporting node. Let e denote the block that X transmits to Y, and Φ(e) the evidence (referred to as the evidence code) associated with e and Seq(e). Then Φ(e) should convince the server that e was indeed sent by X and not forged by Y.[5]
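The actual protocol in [5] builds Φ(e) from one-way hash chains for efficiency; as a simplified stand-in for the property it provides, one can use a keyed tag over (e, Seq(e)) that only X, who shares a key with the server, could have produced:

```python
import hashlib
import hmac

def evidence_code(block, seq, sender_key):
    """Simplified stand-in for Phi(e): a tag over (e, Seq(e)) that the
    reporter Y cannot forge, since only the sender X and the server hold
    sender_key. The real scheme in [5] uses hash chains instead."""
    return hmac.new(sender_key, seq.to_bytes(8, "big") + block,
                    hashlib.sha256).hexdigest()

def server_check(block, seq, phi, sender_key):
    """The server recomputes the tag to confirm X indeed sent block e."""
    return hmac.compare_digest(phi, evidence_code(block, seq, sender_key))
```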
IV. TECHNIQUES FOR DETECTING MALICIOUS PEERS
As noted earlier, many P2P streaming systems focus on improving streaming quality and assume that all nodes cooperate as desired, which may not hold in the open environment of the Internet: nodes may be selfish, exhibit abnormal behavior such as frequent rebooting, or actively cheat, attack, and distribute viruses to disrupt the service. This section discusses how to detect malicious nodes in a P2P streaming system, using several frameworks and algorithms.[3]
Attack model and proposed enriched architecture
Malicious peers may misbehave in several ways:
• They can modify the payload of a block.
• They can lie when sending checks to the monitor node.
• They can avoid sending checks to their monitor.
• They can churn by alternating between connection and disconnection.
The system consists of N_P malicious peers: these peers join the swarm and follow the application-level protocol for overlay organization and data exchange, but they can intentionally modify the payload of blocks forwarded to their neighbors. The effect of this action is to invalidate chunk reconstruction at the receiving peers, preventing them from reproducing the original content. The architecture also comprises a set of N_M trusted monitor nodes. The tracker assigns each peer to exactly one monitor node based on the peer's overlay-wide unique identifier: assuming the peer identifier is an integer i_p, the tracker assigns it to the monitor whose identifier is equal to i_p mod N_M. It follows that a peer reports its checks to the same monitor node for the whole duration of the video stream.
As soon as a peer reconstructs a chunk, it is assumed to be able to detect whether the chunk is polluted. If the chunk is polluted, it is not shared with the peer's neighbors and a positive check is sent to the monitor assigned by the tracker; a negative check is sent upon reconstructing a valid chunk. A check contains the list of identifiers of the peers that uploaded blocks of that chunk and a binary flag indicating a positive or negative outcome.[4]
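The monitor assignment rule and the check message can be sketched as follows; the `Check` field names are ours, only the mod-N_M rule comes from [4]:

```python
from dataclasses import dataclass

def assign_monitor(peer_id: int, num_monitors: int) -> int:
    """Tracker rule from [4]: peer i_p reports to monitor i_p mod N_M for
    the whole stream, so all of a peer's checks land at one monitor."""
    return peer_id % num_monitors

@dataclass
class Check:
    """A check sent to the monitor after reconstructing a chunk."""
    chunk_id: int
    uploaders: list    # identifiers of peers that uploaded blocks of the chunk
    polluted: bool     # True = positive check (chunk failed validation)
```

Because the assignment is a pure function of the peer identifier, a peer cannot shop around for a more lenient monitor during the stream.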
General Detection Framework
The above attack model and architecture suit a small class of networks; for a larger class of networks, a general and fully distributed detection framework is applied to identify malicious nodes and defend against malicious attacks. This framework can be executed by every legitimate node to identify its malicious neighbors.
Consider a network consisting of a set of independent nodes, where each node interacts with a subset of other nodes called its neighbors. Nodes are required to obey some predefined rules for cooperation so as to achieve some network functionality. However, some nodes may not follow the rules; instead, they carry out disruptive operations when interacting with their neighbors, and such nodes are called malicious nodes. Correspondingly, nodes that follow the rules are called good nodes or legitimate nodes. The goal is to identify the malicious nodes and block them from further communication so as to guarantee the functionality of the network. To achieve this, a fully distributed detection algorithm is designed that can be executed by every good node to identify its malicious neighbors.[10]
Focus on a particular good node, say node i. Define Ni as the neighboring set of node i, i.e., node i only interacts with nodes in Ni. Since only two types of nodes exist in the network, good nodes and malicious nodes, every node j in Ni must be either good or malicious; tj denotes its type, with tj ∈ {good, malicious}.
The interaction between node i and its neighbors is a repeated process and time proceeds in rounds. At each round, node i may interact with some of its neighbors in Ni to achieve some objective.
The set of such neighbors is called the active set, and A(t) denotes the active set at round t. Intuitively, every node in A(t) contributes to node i at round t.
However, if a malicious node exists in the active set, node i may not achieve its objective due to the disruptive operations carried out by that node.
The payoff of node i for each interaction with its neighbor j is defined as follows:
Uij = { 1, if tj = good; −∞, if tj = malicious.   (3)
In a realistic system, nodes do not have global information. Specifically, node i can only observe the behaviors of its neighbors but not their types, i.e., node i does not know the payoff gained from each individual neighbor. In this work, assume that node i can measure the total payoff gained at each round, which is the sum of the payoff gained from each individual neighbor in the active set. U(t) is defined as the total payoff at round t.
U(t) = Σj∈A(t) Uij.   (4)
Since Uij is either 1 or −∞ by Equation (3), U(t) can be either |A(t)| or −∞. If U(t) equals |A(t)|, then all Uij are equal to one; in other words, all nodes in the active set at round t must be good nodes. On the other hand, if U(t) equals −∞, then at least one Uij equals −∞, i.e., the active set A(t) contains at least one malicious node. The secure set is defined as the set of nodes identified as good nodes, and S(t) denotes it at round t.
S(t) = { A(t), if U(t) = |A(t)|; ∅, if U(t) = −∞.   (5)
Now, the general framework of detection algorithms can be described. Initially, there may be several malicious nodes in the neighboring set Ni, and node i does not know which neighbors are malicious and which are good. In other words, every node in the neighboring set is potentially malicious. Define M(t) as the set of nodes which are potentially malicious up to round t, i.e., all neighboring nodes not in M(t) are considered good by node i. Initially, M(0) = Ni; M(t) is called the malicious set at round t. To identify the malicious nodes in Ni, observe that at each round, node i is sure that all nodes in the secure set S(t) are good and can be removed from the malicious set. If the algorithm is executed for a sufficient number of rounds, one can expect that all good nodes are removed from the malicious set and M(t) contains only malicious nodes.
M(t) = M(t − 1) − S(t).   (6)
The implementation of the above idea is stated as follows[10]:
Alg. A: General Detection Framework for Node i
1. t ← 0;
2. M(0) ← Ni;
3. while (M(t) contains good nodes) {
     /* shrink the malicious set */
4.   t ← t + 1;
5.   M(t) ← M(t − 1) − S(t);
   }
6. remove all nodes in M(t);
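Alg. A can be sketched as a small simulation. Drawing each neighbor into the active set independently with probability alpha mirrors the uploading model used in the evaluation of Section V; the ground-truth set of attackers is used only to simulate the observed payoff:

```python
import random

def detect_malicious(neighbors, truly_malicious, alpha=0.1, rounds=500, seed=42):
    """Sketch of Alg. A. Each round an active set A(t) is drawn; if the
    observed total payoff equals |A(t)| (i.e., A(t) contains no malicious
    node), then S(t) = A(t) and the whole active set is removed from the
    malicious set M(t). Returns the surviving suspects."""
    rng = random.Random(seed)
    M = set(neighbors)                       # M(0) = Ni
    for _ in range(rounds):
        A = {j for j in neighbors if rng.random() < alpha}
        if A and not (A & truly_malicious):  # U(t) = |A(t)|  =>  S(t) = A(t)
            M -= A                           # M(t) = M(t-1) - S(t)
    return M
```

With 50 neighbors, 5 attackers, and alpha = 0.1 (the setting used in Section V), the surviving set converges to exactly the malicious neighbors after a few hundred rounds, since attackers are never removed and each good node is eventually cleared by some clean active set.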
The detection algorithm is fully distributed: every good node can execute it to identify its malicious neighbors. Therefore, all malicious nodes in the whole network can be identified.
There are three performance measures that quantify the quality of the detection framework:
• Pfn(t), probability of false negative until round t,
• Pfp(t), probability of false positive until round t, and
• E[R], expected number of rounds needed for detection.
The first two performance measures quantify the accuracy of the detection framework, while the third one quantifies the efficiency. Formally, Pfn(t) is defined as the probability that a malicious node is wrongly regarded as a good node, which is actually the probability that a malicious node is wrongly removed from the malicious set M(t) since only nodes in the malicious set are regarded as malicious nodes. On the other hand, Pfp(t) is defined as the probability that a good node is wrongly regarded as a malicious node.
Since all nodes in the malicious set M(t) are regarded as malicious nodes, Pfp(t) is the probability that a good node remains in the malicious set M(t). Lastly, the random variable (r.v) R denotes the number of detection rounds until no good node exists in M(t).[10]
GET RECOMMENDATIONS
The reputation metric measures a stranger's trustworthiness based on recommendations.[1] The listing below preserves the structure of the original algorithm; the loop conditions are paraphrased:
Alg. B: GetRecommendations
1. rset ← ∅;
2. while (more recommendations are needed) and (|rset| < maximum) do
3.   for all pk do
4.     if (pk can recommend the stranger) then
5.       rec ← RequestRecommendation(pk);
6.       rset ← rset ∪ {rec};
7.     end if
8.   end for
9. end while
10. return rset
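The recommendation-gathering loop can be sketched as follows. Only `rset` and `RequestRecommendation` appear in the original listing from [1]; the selection predicate and the cap on the number of recommendations are our assumptions:

```python
def get_recommendations(peers, can_recommend, request_recommendation, max_recs):
    """Query peers able to recommend a stranger until max_recs
    recommendations are collected; returns the recommendation set."""
    rset = []
    for pk in peers:
        if len(rset) >= max_recs:
            break
        if can_recommend(pk):          # pk has interacted with the stranger
            rset.append(request_recommendation(pk))
    return rset
```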
V. COMPARISON OF DIFFERENT ALGORITHMS
A. Performance Evaluation
The performance of different algorithms can be evaluated on the basis of the following parameters:
(1) Probability of false positive
(2) Probability of false negative
(3) Complexity
Performance with different lie ratios
The false positive rate (FPR) is defined as the number of non-malicious nodes evaluated as malicious divided by the total number of non-malicious nodes, and the false negative rate (FNR) as the number of malicious nodes evaluated as non-malicious divided by the total number of malicious nodes.
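These two definitions translate directly into code, operating on the set of flagged nodes versus the ground-truth set of malicious nodes:

```python
def fpr_fnr(flagged, truly_malicious, all_nodes):
    """FPR = non-malicious nodes evaluated as malicious / total non-malicious;
    FNR = malicious nodes evaluated as non-malicious / total malicious."""
    good = all_nodes - truly_malicious
    fp = len(flagged & good)
    fn = len(truly_malicious - flagged)
    return fp / len(good), fn / len(truly_malicious)
```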
Figure 2 shows FPR and FNR values versus monitoring time (in the unit of submission period) given different lie ratios, where the group size (the number of streaming nodes) is set to 5000.
Fig 2. Performance with different lie ratios. (a) FPR. (b) FNR.[8]
The monitoring duration does not need to be long (about 30 periods) to achieve sufficient accuracy. The larger the lie ratio, the larger the FPR and FNR. In the best case of β = 0, FPR is always 0; FNR values, on the other hand, are much larger, as shown in Figure 2(b).[8]
Performance of the detection algorithm
In this simulation, it is assumed that peer i has 50 neighbors, i.e., |Ni| = 50, five of which are malicious attackers, i.e., k = 5. Moreover, at each round, each neighbor of peer i is chosen to upload with probability 0.1, i.e., the uploading probability α is set to 0.1. Figure 3a shows the probability of false positive: the curve with stars shows the theoretical result and the curve with circles the simulation result. The probability of false positive eventually converges to 0, which means that the malicious set contains only malicious attackers as long as the detection time is long enough. Moreover, the probability of false positive decreases to a reasonably small value, say 0.1, even after the algorithm has run only a small number of rounds, say 50. The distribution of the number of detection rounds is shown in Figure 3b. Together, these results show the effectiveness and efficiency of the detection algorithm.[8]
Fig 3. Performance of the detection algorithm[8]
Robustness to colluding attacks
Fig. 4: h(t) as a function of time in case of a colluding attack[6].
Fig. 5: h(t) as a function of time in case of a colluding attack for increasing number of malicious nodes.[6]
Fig. 5 shows that accuracy is high for all considered settings. Furthermore, reactivity slightly increases as the number of malicious nodes increases, as summarized in Table 1. Reactivity is only slightly affected: TSR(1) = 23.3, CI [15.9, 30.6], in the colluding attack, while TSR(1) = 20.6, CI [14.1, 27.1], in the reference scenario.[6]
TABLE 1: Reaction times in case of a colluding attack for increasing number of malicious nodes.[6]
Malicious nodes | TSR(1) (CI)
90              | 23.3 ([15.9, 30.6])
180             | 23.7 ([16.2, 31.2])
270             | 25.9 ([17.7, 34.1])
VI. CONCLUSION
Many P2P streaming systems assume that nodes cooperate to cache and relay data. However, this may not be true in the open Internet. Due to the intrinsic property that nodes need to interact with each other, P2P streaming systems are vulnerable to various attacks launched by malicious nodes. This paper has surveyed how to identify malicious nodes attempting to pollute a P2P application, i.e., how to detect malicious nodes in a P2P streaming system. To identify malicious nodes, each node is required to keep monitoring its neighbors and to periodically generate monitoring messages. The messages are collected and analyzed at trustworthy nodes in a monitoring overlay. There are several key components in this framework, including reputation computing in the presence of node lying and an efficient structure of the monitoring overlay. These algorithms and frameworks allow each node to independently perform detection and identify its malicious neighbors.
ACKNOWLEDGMENT
We take this opportunity to express our deep sense of gratitude towards those, who have helped us in various ways, for preparing this paper. We would like to thank the reviewers of this paper for their constructive comments.
REFERENCES
[1] A. B. Can and B. Bhargava, "SORT: A Self-ORganizing Trust Model for Peer-to-Peer Systems," IEEE Transactions on Dependable and Secure Computing, vol. 10, no. 1, Jan./Feb. 2013.
[2] H. Shen, G. Liu, J. Gemmill, and L. Ward, "A P2P-based Infrastructure for Adaptive Trustworthy and Efficient Communication in Wide-Area Distributed Systems," IEEE, 2013.
[3] H. Shen, Y. Lin, and Z. Li, "Refining Reputation to Truly Select High-QoS Servers in Peer-to-Peer Networks," IEEE, 2012.
[4] M.-C. Huang, "Introduction to Two P2P Network Based Reputation Systems," IEEE, 2012, DOI 10.1109/ISBAST.2012.12.
[5] Q. Wang, L. Vu, K. Nahrstedt, and H. Khurana, "MIS: Malicious nodes identification scheme in network-coding-based peer-to-peer streaming," in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–5.
[6] R. Gaeta and M. Grangetto, "Identification of malicious nodes in peer-to-peer streaming: a belief propagation based technique," IEEE Transactions on Parallel and Distributed Systems, 2012.
[7] S. Xu and M. Li, "A Cooperation Scheme based on Reputation for Opportunistic Networks," IEEE, 2013.
[8] X. Jin and S.-H. G. Chan, "Detecting malicious nodes in peer-to-peer streaming by peer-based monitoring," ACM Trans. Multimedia Comput. Commun. Appl., vol. 6, pp. 9:1–9:18, Mar. 2010.
[9] Y. Li and J. C. S. Lui, "Stochastic analysis of a randomized detection algorithm for pollution attack in P2P live streaming systems," Performance Evaluation, vol. 67, no. 11, pp. 1273–1288, 2010.
[10] Y. Li and J. C. S. Lui, "On Detecting Malicious Behaviors in Interactive Networks: Algorithms and Analysis," IEEE, 2012.
[11] Z. Li, H. Shen, and K. Sapra, "Leveraging Social Networks to Combat Collusion in Reputation Systems for Peer-to-Peer Networks," IEEE Transactions on Computers, vol. 62, no. 9, Sep. 2013.