SURVEY “PEER-TO-PEER COMPUTING NETWORKS”

Dr. S.P Singh1, Ram Kumar Sharma2

1 Department of Computer Science, Government Post Graduate College, Noida, GB Nagar, UP

2 Department of Computer Science, Raj Kumar Goel Institute of Technology, Ghaziabad, UP

Abstract - In today's world most Internet applications are based on the client/server model. The advantage of this architecture is that the clients can be low computing power machines with small memory. Most of the client machines on the Internet today have more than 100 times the computing capability of a supercomputer of 1990. It has been observed that 95% of the computing power of these client machines remains unused [1]. Thus huge amounts of computing resources are available on the Internet for further use. On the other side, large volumes of data generated by various scientific applications require supercomputers for analysis, and supercomputers are very costly.

At present the scientific computing community has been focusing on lowering computational costs. Advances in modern architectures along this trend have resulted in the Peer-to-Peer (P2P) model.

The P2P model is emerging as a new model because of its ability to harvest the computing and storage power of sites connected to the network and make their underutilized resources available to others. In this paper we survey different frameworks of P2P networks in which idle cycles from desktops or home PCs connected to the Internet are used for solving scientific computing problems. We also study different issues such as load balancing, security, incentives and fairness in P2P networks.

Keywords: P2P Networks, computing networks, distributed computing, decentralized system, SETI@home, Napster.

1. INTRODUCTION

In the present scenario most Internet applications, such as the World Wide Web, email, e-commerce, and core banking, are based on the client/server model. The client/server model works in two distinct roles: the client connects to the server to request services, and the server, after receiving the request, provides the desired services to the client.

Normally, deploying and maintaining a server-side program is complex and expensive. Deploying a web server requires a machine with high computing power, large memory, server-class software and a high-speed Internet connection, all of which are costly. The advantage of the client/server architecture is that it places no additional requirements on the client side: clients can be low computing power machines with small memory and a web browser.

Looking at the current scenario, most of the client machines on the Internet today have more than 100 times the computing capability of a supercomputer of 1990. Statistics show there were more than 1,966,514,816 Internet users by the end of June 2010 [35]. Most research finds that 95% of the computing power of these client machines remains unused [1]. Imagine that at any one time 10 million 400 MHz machines are connected to the Internet and only 10% of their resources remain unused; these idle resources alone represent 10^7 × 400 MHz × 0.10 = 4 × 10^8 MHz of computing power, along with a large amount of memory space. These figures show that a huge amount of computing resources is available on the Internet that should be used, otherwise it is wasted [1]-[3].

Currently most clients have considerable computing or processing power, but this power is not fully utilized. Very large volumes of jobs and data from various scientific applications, such as astronomical measurements, multimedia streaming, bio-medical studies, academic research, weather forecasting and data mining, are available for processing. These applications require high-end computing machines, which are expensive.

At present the scientific computing community has been focusing on lowering computational costs. Advances in modern architectures along this trend have resulted in the Peer-to-Peer (P2P) model.

P2P systems and layouts are characterized by direct access between peer computers, rather than access through a centralized server.

The P2P model is emerging as a new distributed model because of its ability to harvest the computing and storage power of hosts connected to the Internet and make their underutilized resources available to others. The P2P paradigm decreases the need for expensive infrastructure by allowing direct communication between peers and resource harvesting. It also improves scalability and reliability by removing the need for a centralized host (server).

Applications based on P2P networking are the best solution for using the spare computing resources of Internet clients [15]. The P2P network architecture allows peers to share their computing resources directly with each other. To improve scalability, a true P2P network architecture divides the computational resources into groups.

These groups can be built on the basis of different parameters such as functionality, qualitative and quantitative characteristics, distance, and the geographic location of peers [16]-[17].

Choosing among these parameters is also an issue in the design of a P2P computing system. Qualitative and quantitative criteria improve the performance of the system, whereas groups based on the geographic location of peers increase its reliability.
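As an illustration only, the following minimal sketch shows how qualitative, quantitative and geographic criteria might be combined into a single grouping key; the field names, capacity thresholds and example values are hypothetical and are not taken from any of the surveyed frameworks.

```python
from dataclasses import dataclass

@dataclass
class Peer:
    # Illustrative peer descriptor; real frameworks (e.g. JNGI) define their own.
    peer_id: str
    os_type: str        # qualitative criterion, e.g. "linux", "windows"
    jvm_version: str    # qualitative criterion
    cpu_mhz: int        # quantitative criterion
    ram_mb: int         # quantitative criterion
    region: str         # geographic criterion, e.g. "asia", "europe"

def group_key(peer: Peer) -> tuple:
    """Derive a group key from qualitative, quantitative and geographic criteria.

    Peers with the same key are placed in the same group. The capacity buckets
    below are arbitrary thresholds chosen only for illustration.
    """
    capacity = "high" if peer.cpu_mhz >= 2000 and peer.ram_mb >= 2048 else "low"
    return (peer.os_type, peer.jvm_version, capacity, peer.region)

def build_groups(peers):
    groups = {}
    for p in peers:
        groups.setdefault(group_key(p), []).append(p)
    return groups

if __name__ == "__main__":
    peers = [
        Peer("p1", "linux", "1.6", 2400, 4096, "asia"),
        Peer("p2", "linux", "1.6", 2600, 4096, "asia"),
        Peer("p3", "windows", "1.5", 800, 512, "europe"),
    ]
    for key, members in build_groups(peers).items():
        print(key, [p.peer_id for p in members])
```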

The decentralized nature of a pure P2P network has advantages such as reliability and scalability, but it also poses challenges related to job distribution or load balancing and the fair allocation of resources [18]-[20]. Security is an additional challenge in a pure P2P network because there is no single point of control in the network where classical approaches such as firewalls can be applied [21]. New security mechanisms are needed so that users have confidence in P2P systems. The issue of fair allocation of resources also exists in P2P systems: a user utilizing computing resources should also be ready to share its own resources with other peers on the network.

In this paper we survey different frameworks of P2P networks in which idle cycles from desktops or home PCs connected to the Internet are used for solving scientific computing problems. We also study different issues such as load balancing, security, incentives and fairness in P2P networks.

2. RELATED WORK

In the last few years, a number of P2P networking systems have been developed for various applications such as file sharing and distributed computing. Napster, Gnutella and FreeNet are examples of file sharing systems, while SETI@home is an example of a P2P computing system, used to process astronomical data only. SETI@home is the most successful application, with a large number of contributors, because of its exciting application theme "search for extraterrestrial intelligence". The initial SETI project used special supercomputers for the analysis of the bulk data received from telescopes. In 1995, David Gedye proposed the concept of a virtual supercomputer formed by a large number of Internet-connected computers for the analysis of this data. The idea was implemented as the SETI@home project, which was launched in May 1999 and is running successfully. In this project the data received from telescopes are distributed by a centralized server to many PCs for processing, and after completion the results are sent back to the centralized server [7]-[8]. The online file sharing system Napster is another example of a P2P network. It is an online music file sharing system developed by Shawn Fanning that allows its users to upload and download MP3 files without any restriction. Napster maintains a directory of shared files at a central location; to download a file, a peer issues a query to the directory server to find which peers hold the desired file. Napster's search mechanism is centralized but its file sharing mechanism is decentralized: files are downloaded directly between the peers. Napster was launched in July 1999 and closed by court order in July 2001 due to violation of copyright rules [9]-[12].
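The following minimal sketch illustrates the hybrid pattern described above, namely a centralized index combined with direct transfers between peers; the class and method names are illustrative only and do not reflect Napster's actual protocol.

```python
# Centralized search, decentralized transfer: the directory only answers
# "who has this file?", while the copy itself happens peer to peer.

class DirectoryServer:
    def __init__(self):
        self.index = {}              # file name -> set of peer ids sharing it

    def register(self, peer_id, files):
        for name in files:
            self.index.setdefault(name, set()).add(peer_id)

    def lookup(self, name):
        return sorted(self.index.get(name, set()))

class Peer:
    def __init__(self, peer_id, files, directory):
        self.peer_id = peer_id
        self.files = dict(files)     # file name -> content
        self.directory = directory
        directory.register(peer_id, files)

    def download(self, name, peers):
        # Search is centralized (directory lookup) ...
        for holder_id in self.directory.lookup(name):
            holder = peers[holder_id]
            if holder is not self:
                # ... but the transfer is a direct peer-to-peer copy.
                self.files[name] = holder.files[name]
                return holder_id
        return None

if __name__ == "__main__":
    directory = DirectoryServer()
    peers = {}
    peers["a"] = Peer("a", {"song.mp3": b"...bytes..."}, directory)
    peers["b"] = Peer("b", {}, directory)
    print(peers["b"].download("song.mp3", peers))   # -> 'a'
```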

Much research has been carried out to address operational and functional issues in P2P computing networks. A brief survey is presented below.

Jerome Verbeke, Neelakanth Nadgir et al. in [16] presented a decentralized P2P computing framework for large-scale computation problems, named JNGI. In this model the computational resources are divided into groups according to their functionality. They suggest three peer groups: the monitor group, the worker group, and the task dispatcher group. The framework confines communication to small peer groups, which enables the architecture to scale to a very large number of peers. Jean-Baptiste Ernst-Desmulier et al. in [17] added a new type of group, called similarity groups, to the JNGI project.

These new groups were built on the basis of two criteria, qualitative (structural) or quantitative (performance).

The qualitative criteria include OS type or JVM version, whereas the quantitative criteria include physical characteristics such as CPU speed, bandwidth, and RAM size. However, peer grouping based on the geographic location parameter also needs to be considered to improve reliability.

Thomas Fischer, Stephan Fudeus and Peter Merz in [22] presented a middleware for job distribution in P2P networks. This middleware includes a hybrid job distribution scheme for load balancing over P2P networks; however, it treats the P2P network as a flat organization without hierarchy or node grouping. Zhiming Dai, Zhiyi Fang et al. in [23] further proposed a JXTA-based distributed computing system in which load balancing and dynamic task allocation mechanisms are combined. This framework distributes jobs over the network for processing according to the processing efficiency and response speed of the peers.
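As a simplified illustration of capability-based job distribution, the sketch below scores peers by CPU speed and response time and splits a batch of equally sized tasks in proportion to these scores; the scoring formula is an assumption made for illustration and is not the actual mechanism of [23].

```python
def peer_score(cpu_mhz, response_ms):
    # Faster CPUs and quicker responses earn a larger share of the work.
    return cpu_mhz / (1.0 + response_ms)

def distribute_tasks(num_tasks, peers):
    """peers: dict peer_id -> (cpu_mhz, response_ms). Returns peer_id -> task count."""
    scores = {pid: peer_score(cpu, rtt) for pid, (cpu, rtt) in peers.items()}
    total = sum(scores.values())
    shares = {pid: int(num_tasks * s / total) for pid, s in scores.items()}
    # Hand out any remainder caused by rounding down to the strongest peers.
    remainder = num_tasks - sum(shares.values())
    for pid in sorted(scores, key=scores.get, reverse=True)[:remainder]:
        shares[pid] += 1
    return shares

if __name__ == "__main__":
    peers = {"p1": (3000, 20), "p2": (1500, 80), "p3": (800, 200)}
    print(distribute_tasks(100, peers))
```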

A distributed packet processing algorithm on a P2P network was proposed by Jingnan Yao and Laxmi Bhuyan in [24]. This algorithm is known as the resource sharing distributed load processing (RSDLP) algorithm. In this algorithm the workload of a host is distributed to other nodes by structuring the nodes into an efficient resource tree. The authors also proposed an efficient open P2P cycle sharing framework known as Maximum Efficient Tree (MET) in [25]. This framework is designed so that real-time applications can use it to process their data.

A dynamic multi-level resource tree is built in the framework, having sufficient power to process the job before its deadline. In the above studies the authors form a tree structure to distribute the load; however, they do not consider the peer groups of pure P2P networks.
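The rough sketch below organizes peers into a capacity-ordered tree and splits a workload along it, loosely in the spirit of the tree-based schemes above; the tree construction and splitting rules are assumptions for illustration and are not the RSDLP or MET algorithms.

```python
class Node:
    def __init__(self, peer_id, capacity):
        self.peer_id = peer_id
        self.capacity = capacity      # e.g. available MHz
        self.children = []

def build_resource_tree(peers, fanout=2):
    """Place the most capable peer at the root and attach the rest level by level."""
    nodes = [Node(pid, cap) for pid, cap in sorted(peers.items(), key=lambda kv: -kv[1])]
    root, queue = nodes[0], [nodes[0]]
    for node in nodes[1:]:
        parent = queue[0]
        parent.children.append(node)
        queue.append(node)
        if len(parent.children) == fanout:
            queue.pop(0)
    return root

def total_capacity(node):
    return node.capacity + sum(total_capacity(c) for c in node.children)

def assign_load(node, load):
    """Split 'load' over a subtree in proportion to each branch's total capacity."""
    subtree_cap = node.capacity + sum(total_capacity(c) for c in node.children)
    assignment = {node.peer_id: load * node.capacity / subtree_cap}
    for child in node.children:
        child_load = load * total_capacity(child) / subtree_cap
        assignment.update(assign_load(child, child_load))
    return assignment

if __name__ == "__main__":
    peers = {"a": 3000, "b": 2000, "c": 1500, "d": 800, "e": 400}
    root = build_resource_tree(peers)
    print({pid: round(share, 1) for pid, share in assign_load(root, 1000).items()})
```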

Edi E., Kechadi T. and Mcnulty R. in [26], [27] further presented a hierarchical virtual network topology and developed a distributed computing environment in the form of a computing grid. Distributed resources on the Internet are added to the grid by using P2P technology. In this research the authors combined the P2P computing and Grid computing paradigms, thereby not considering the pure P2P architecture. Kolja Eger and Ulrich Killat in [28]-[29] proposed a distributed resource allocation algorithm.

This algorithm uses the congestion pricing principle of IP networks: service providers allocate their services according to the price offers of the requesting customers, which ensures some form of fairness in P2P networks.
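A minimal illustration of this price-based style of allocation is sketched below, where a provider divides its capacity among requesters in proportion to their offered prices; this toy model is an assumption for illustration and is not the actual algorithm of [28]-[29].

```python
def allocate(capacity, bids):
    """bids: dict peer_id -> offered price per unit. Returns peer_id -> allocated units."""
    total_bid = sum(bids.values())
    if total_bid == 0:
        return {pid: 0.0 for pid in bids}
    # Each requester receives a share proportional to what it is willing to pay.
    return {pid: capacity * bid / total_bid for pid, bid in bids.items()}

if __name__ == "__main__":
    # A provider with 100 units of CPU time and three customers offering prices.
    print(allocate(100.0, {"p1": 5.0, "p2": 3.0, "p3": 2.0}))
    # -> {'p1': 50.0, 'p2': 30.0, 'p3': 20.0}
```

Running the example splits the 100 units as 50/30/20, mirroring the ratio of the offers.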

Lidong Zhou, Lintao Zhang et al. in [30] investigate the feasibility of a self-defense infrastructure inside a P2P network to control worms. Balfe S., Lakhani A.D. and K.G. Paterson in [31] show the use of trusted computing in securing P2P networks. They use Trusted Computing Group (TCG) specifications to design a pseudonymous authentication scheme for P2P networks that establishes secure communication links between participating peers. Vasileios Vlachos, Stephanos Androutsellis-Theotokis et al. in [32] proposed a security mechanism for open networks which uses a P2P architecture. In this network the peers are required to communicate with each other and exchange information about the security threats they have encountered; on the basis of this information the system can adapt its security measures. Anna Satsiou and Leandros Tassiulas in [33] proposed a distributed trust-based exchange framework for P2P networks, with reputation metrics that reflect the satisfaction values for the services provided by a peer. Esther Palomar, Juan M. Estevez-Tapiador et al. in [34] presented certificate-based access control in pure P2P networks. Content providers restrict who can access their content: they define different security levels for their contents, and a user who wants to access a content item must hold an authentication credential of the same level as the content.
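The sketch below illustrates such a level-based access check; the data structures, the trust-anchor list and the omitted signature verification are simplified placeholders rather than the scheme of [34].

```python
from dataclasses import dataclass

@dataclass
class Certificate:
    peer_id: str
    level: int          # clearance level granted to the peer
    issuer: str         # who vouches for this peer

@dataclass
class Content:
    name: str
    level: int          # minimum clearance required to access it
    data: bytes

TRUSTED_ISSUERS = {"provider-A"}    # illustrative trust anchor list

def may_access(cert: Certificate, content: Content) -> bool:
    # A real scheme would verify a cryptographic signature here; this sketch
    # only checks the issuer name and compares clearance levels.
    return cert.issuer in TRUSTED_ISSUERS and cert.level >= content.level

if __name__ == "__main__":
    doc = Content("report.pdf", level=2, data=b"...")
    print(may_access(Certificate("p1", level=3, issuer="provider-A"), doc))  # True
    print(may_access(Certificate("p2", level=1, issuer="provider-A"), doc))  # False
```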

It has been seen that most of the research work in the area of P2P computing is based on hybrid architectures; very few works consider the pure P2P architecture. Most of the load balancing algorithms and security mechanisms developed so far rely on a centralized system for indexing and do not consider the decentralized nature of pure P2P network systems.

3. ISSUES IN P2P COMPUTING NETWORKS

In the previous couple of years, the main focus of research in the area of P2P computing has been to develop communication protocols and frameworks to accomplish data sharing and data exchange. Most of the protocols and frameworks have been developed to support information sharing within P2P file sharing systems. There are only a few mechanisms available that use the computing power of remote systems for computing purposes. One of the P2P computing network systems, SETI@home, has a centralized site for the distribution of jobs for processing and the collection of results, and it deals with astronomical data only.

SETI@home is based on a hybrid P2P network architecture. To increase reliability and scalability, decentralized systems are required. The pure P2P architecture is decentralized in nature; in such a network, scalability is improved by dividing the computational resources into groups. Selecting the criteria for grouping the computing resources is the first step in designing a pure P2P computing system. Various criteria can be used for grouping peers, such as the physical location of peers, qualitative criteria like operating platform and technology, and quantitative criteria like CPU speed and memory size. The problem is that if all the peers in a group belong to the same geographic location, then in case of a natural disaster or equipment failure in that area the whole group collapses, which affects the working of the system; grouping criteria based on physical location therefore also need to be considered.

Pure P2P systems also require mechanisms for choosing resources from the large number of Internet-connected nodes. In a P2P system some peers may be heavily loaded while others remain idle most of the time. To avoid this, decentralized load balancing mechanisms are needed. If real-time applications want to use P2P computing networks to process data, they require predictable performance because tasks in these applications have deadlines to be met. To better utilize the idle resources in a pure P2P computing network, efficient job partitioning, resource identification and load balancing mechanisms are required. Providing security in P2P networks is a challenge. If a user wants to participate in a P2P distributed processing application, it is required to download and run an executable file on its own machine, and the users do not trust one another. In client/server systems, security mechanisms such as firewalls are employed to protect data and systems from intruders and attacks. These kinds of mechanisms are not successful in P2P systems because there is no centralized point where a firewall can be deployed. In decentralized systems there is no central authority that can verify the security and reliability of the files shared or the jobs distributed; any user on the network can share or distribute a file or code which may contain harmful content such as viruses. Therefore, new security concepts and mechanisms are required for P2P computing systems.

P2P computing networks also face the problem of free riding: peers only consume resources without contributing anything to the network.

Most users are free riders, who appear only as service consumers and do not share their available resources.

One important question is: why should peers provide their CPU cycles for computation to other peers? To motivate the peers, some incentive or pricing mechanisms have to be developed.
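As a toy illustration of such an incentive mechanism, the sketch below keeps a credit balance per peer: credits are earned by donating CPU time and spent when submitting jobs, so pure free riders quickly exhaust their budget. The accounting rules and values are assumptions for illustration, not a mechanism taken from the surveyed papers.

```python
class CreditBank:
    def __init__(self, initial_credit=10.0):
        self.balances = {}
        self.initial_credit = initial_credit

    def balance(self, peer_id):
        return self.balances.setdefault(peer_id, self.initial_credit)

    def record_contribution(self, peer_id, cpu_seconds, rate=1.0):
        """Credit a peer for CPU time it donated to the network."""
        self.balances[peer_id] = self.balance(peer_id) + cpu_seconds * rate

    def charge_job(self, peer_id, cpu_seconds, rate=1.0):
        """Debit a peer submitting a job; reject it if the balance is too low."""
        cost = cpu_seconds * rate
        if self.balance(peer_id) < cost:
            return False                 # free rider: not enough credit
        self.balances[peer_id] -= cost
        return True

if __name__ == "__main__":
    bank = CreditBank()
    bank.record_contribution("worker", 50)
    print(bank.charge_job("worker", 40))       # True, it has earned enough
    print(bank.charge_job("free-rider", 40))   # False, only the initial credit
```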


We identify the following issues which require attention for further research:

(a) A new framework for pure P2P computing networks is required to process large computation jobs, along with a grouping mechanism for the framework based on geographic location together with qualitative and quantitative criteria.

(b) An effective load sharing mechanism is required for the pure P2P computing framework to reduce the overall response time, so that the framework is also useful for processing real-time jobs.

(c) A security mechanism is required for the pure P2P computing framework to protect it against threats and malicious attacks, and to reduce the risk of data loss.

(d) A mechanism is required for the fair allocation and pricing of resources within the framework.

4. CONCLUSION

The majority of research in the area of P2P networking has been dedicated to the development of file sharing networks.

There are only a few mechanisms available for the sharing of CPU cycles and the use of those shared cycles for large computing problems. Most of the P2P cycle sharing systems are partially decentralized in nature and need a centralized system for the indexing and resource allocation process. To enhance the reliability and scalability of such systems, complete decentralization is required.

Napster is used for file sharing; however, it is based on a hybrid P2P architecture [13]-[14]. SETI@home is another hybrid P2P network application, used for distributed computing.

The SETI@home system strongly relies on its server to distribute jobs to every participating peer and to collect the results after processing is done. Due to this centralization, the reliability and scalability of the system are affected.

The SETI@home system is dedicated to a particular task only, i.e., processing data from astronomical measurements.

General users are not allowed to submit their own jobs for processing over the SETI@home network. New systems are required in which general users can submit their own jobs for processing and take advantage of the spare resources available on the Internet.

Existing work on issues in P2P networks such as load sharing, security, and fair allocation of resources mostly considers hybrid or partially centralized P2P networks; however, to enhance the reliability and scalability of P2P network systems, complete decentralization is required.

REFERENCES

1. Ian J. Taylor, From P2P to Web Services and Grids: Peers in a Client/Server World, pp. 23-41, Springer, London, 2005.

2. Harwood A., Balsys R.J., "Service Networks - Distributed P2P Middleware," In Proceedings of the APAC Conference and Exhibition on Advanced Computing, Grid Applications and eResearch, 2003.

3. Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and Replication in Unstructured P2P Networks," In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 258-259, 2002.

4. Dejan S. Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne, Bruno Richard, Sami Rollins, Zhichen Xu, "P2P Computing," HP Labs Technical Report HPL-2002-57R1, March 2002.

5. INTEL, "P2P-Enabled Distributed Computing," Intel White Paper, 2001.

6. V. Lo, D. Zappala, D. Zhou, Y. Liu, and S. Zhao, "Cluster Computing on the Fly: P2P Scheduling of Idle Cycles in the Internet," In Proceedings of the 4th IEEE International Conference on P2P Systems (P2P'04), 2004.

7. David P. Anderson, Jeff Cobb, Eric Korpela, Matt Lebofsky, Dan Werthimer, "SETI@home: An Experiment in Public-Resource Computing," Communications of the ACM, Volume 45, Issue 11, pp. 56-61, November 2002.

8. Korpela, E., Werthimer, D., Anderson, D., Cobb, J., and Lebofsky, M., "SETI@home: Massively Distributed Computing for SETI," Computing in Science and Engineering, Volume 3, Issue 1, pp. 78-83, 2001.

9. Siu Man Lui, Sai Ho Kwok, "Interoperability of P2P File Sharing Protocols," ACM SIGecom Exchanges, Volume 3, No. 3, pp. 25-33, 2002.

10. S. Androutsellis-Theotokis and D. Spinellis, "A Survey of P2P File Sharing Technologies," White Paper, Electronic Trading Research Unit (ELTRUN), Athens University of Economics and Business, 2002.

11. Ulrike Lechner, "P2P Beyond File Sharing," In Proceedings of the 2nd International Workshop on Innovative Internet Computing Systems (IICS), pp. 229-249, 2002.

12. Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble, "A Measurement Study of P2P File Sharing Systems," In Proceedings of ACM/SPIE Multimedia Computing and Networking (MMCN '02), 2002.

13. B. Pourebrahimi, K. Bertels, S. Vassiliadis, "A Survey of P2P Networks," In Proceedings of the 16th Annual Workshop on Circuits, Systems and Signal Processing (ProRisc 2005), November 2005.

14. Beverly Yang and Hector Garcia-Molina, "Designing a Super-Peer Network," In Proceedings of the 19th International Conference on Data Engineering, p. 49, 2003.

15. Talia D., Trunfio P., "Toward a Synergy Between P2P and Grids," IEEE Internet Computing, Volume 7, Issue 4, pp. 94-96, 2003.

16. Jerome Verbeke, Neelakanth Nadgir, Greg Ruetsch, Ilya Sharapov, "Framework for P2P Distributed Computing in a Heterogeneous, Decentralized Environment," In Proceedings of the 3rd International Workshop on Grid Computing, pp. 1-12, 2002.

17. Jean-Baptiste Ernst-Desmulier, Julien Bourgeois, Francois Spies, Jerome Verbeke, "Adding New Features in a P2P Distributed Computing Framework," In Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (Euromicro-PDP'05), pp. 34-41, 2005.

18. David R. Karger and Matthias Ruhl, "Simple Efficient Load Balancing Algorithms for P2P Systems," In Proceedings of the 16th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '04), pp. 36-43, 2004.

19. Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard M. Karp, and Ion Stoica, "Load Balancing in Structured P2P Systems," In International Workshop on P2P Systems (IPTPS), pp. 68-79, 2003.

20. Yingwu Zhu and Yiming Hu, "Efficient, Proximity-Aware Load Balancing for Structured P2P Systems," In Proceedings of the 3rd International Conference on P2P Computing (P2P '03), p. 220, 2003.

21. Dan S. Wallach, "A Survey of P2P Security Issues," In International Symposium on Software Security (ISSS), pp. 42-57, 2002.

22. Thomas Fischer, Stephan Fudeus, and Peter Merz, "A Middleware for Job Distribution in P2P Networks," In Applied Parallel Computing - State of the Art in Scientific Computing, LNCS Volume 4699, pp. 1147-1157, 2007.

23. Zhiming Dai, Zhiyi Fang, Xiao Han, Fengyuan Xu and Hongjun Yang, "Performance Evaluation of JXTA Based P2P Distributed Computing System," In Proceedings of the 15th International Conference on Computing (CIC'06), pp. 391-398, 2006.

24. Jingnan Yao and Laxmi Bhuyan, "Distributed Packet Processing in P2P Networks," In Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), pp. 142-147, 2005.

25. Jingnan Yao, Jian Zhou, Laxmi Bhuyan, "Computing Real Time Jobs in P2P Networks," In Proceedings of the 31st IEEE Conference on Local Computer Networks, pp. 107-114, 2006.

26. Edi E., Kechadi T., Mcnulty R., "Virtual Structured P2P Network Topology for Distributed Computing," In Proceedings of the IEEE International Conference on Cluster Computing, pp. 1-9, 2006.

27. B. Hudzia, M. Kechadi, and A. Ottewill, "Treep: A Tree Based P2P Network Architecture," In Proceedings of the IEEE International Conference on Cluster Computing (Cluster 2005), pp. 1-13, 2005.

28. Kolja Eger and Ulrich Killat, "Fair Resource Allocation in P2P Networks," In Proceedings of the International Symposium on Performance Evaluation of Computer & Telecommunication Systems (SPECTS'06), pp. 39-45, 2006.

29. Kolja Eger and Ulrich Killat, "Resource Pricing in P2P Networks," IEEE Communications Letters, Volume 11, Issue 1, pp. 82-84, January 2007.

30. Lidong Zhou, Lintao Zhang, Frank McSherry, Nicole Immorlica, Manuel Costa, Steve Chien, "A First Look at P2P Worms: Threats and Defenses," In International Symposium on Peer-to-Peer Systems (IPTPS), pp. 24-35, 2005.

31. Balfe S., Lakhani A.D., Paterson K.G., "Trusted Computing: Providing Security for Peer-to-Peer Networks," In Proceedings of the 5th IEEE International Conference on Peer-to-Peer Computing (P2P 2005), pp. 117-124, 2005.

32. Vasileios Vlachos, Stephanos Androutsellis-Theotokis, Diomidis Spinellis, "Security Applications of P2P Networks," Computer Networks, Volume 45, Issue 2, pp. 195-205, June 2004.

33. Anna Satsiou, Leandros Tassiulas, "A Trust-Based Exchange Framework for Multiple Services in P2P Systems," In Proceedings of the 7th IEEE International Conference on Peer-to-Peer Computing (P2P 2007), pp. v-vii, 2007.

34. Esther Palomar, Juan M. Estevez-Tapiador, Julio C. Hernandez-Castro, Arturo Ribagorda, "Certificate Based Access Control in Pure P2P Networks," In Proceedings of the 6th IEEE International Conference on Peer-to-Peer Computing (P2P'06), pp. 177-184, 2006.

35. http://www.internetworldstats.com/stats.htm
