
A COMPREHENSIVE SURVEY OF SOCIAL NETWORK ANALYSIS-BASED ANOMALY DETECTION TECHNIQUES WITH SOFT COMPUTING

Govind Singh Mahara

(Research scholar), Department of Computer Science & Application, RKDF University, Bhopal, MP, India

Dr. Sharad Gangele

(Professor), Department of Computer Science & Application, RKDF University, Bhopal, MP, India

Abstract - Online social networks have seen an extreme rise in interest over the last decade owing to the evolution of the Internet. They have become major targets for malicious users who attempt to perform illegitimate activities and cause damage to other users. Their impact is great because of fundamental features such as social influence, means of communication, and exploration of the digital social space. The mass usage of Online Social Networks (OSNs) in different domains has given rise to critical threats such as vulnerabilities and mobile threats. Moreover, there are massive anomalies, for instance identity theft, hacked accounts, fake accounts, spam, and many other illegitimate activities; for this reason, there is a need for an approach to detect these anomalies. This paper presents a comprehensive survey of social network analysis-based anomaly detection techniques with soft computing techniques.

Keywords: Social Networking, Social Media, Social Network Analysis, Anomaly Detection, Soft Computing.

Corresponding Author: Dr. Sharad Gangele, Professor, Department of Computer Science & Application, RKDF University, Bhopal, MP, India

1. INTRODUCTION

Huge amounts of raw data, especially social interactions in large groups, are a major feature of today's Information Age. Online social networks give everyone a place to hang out and share their personal information, images, and videos with others. Many helpful patterns can be gleaned from this flow of data, and there are methods to extract relevant information about an individual or a group by studying the individual behaviour of its members or their links to peers in the network. As a result, criminals may also take advantage of this cutting-edge technology. Online social networks have grown tremendously owing to key characteristics such as personal space management, social connections, communication methods, and exploration of the digital social space. Social networking is a way of communicating with other people, exchanging images, and doing much else using a computer. This sort of cooperation is referred to as social media, and the web offers platforms known as social networking sites (SNS) that make it possible for everyone to work together. An SNS may also be seen as a virtual community or as a personal profile site for users. As a result of the enormous rise of social media and online culture, many social networks, including Facebook, Twitter, and other online social systems, have become primary targets for unscrupulous people, and a large number of harmful actions have been documented in the recent past. Social network analysis is a method used to evaluate the various behaviours of online communities, and it has a wide range of high-impact applications in security, banking, healthcare, law enforcement, and other fields. Many strategies are available for identifying suspicious accounts on online social networking sites, but every online social network has its own structure and character, which differ from one another.
This survey considers traditional and new approaches to identifying anomalous activity across online social networks, including Facebook, Twitter, Viber, WhatsApp, YouTube, and WeChat.

2. LITERATURE REVIEW

Social media is widely used today, and the data it produces arrive at a high rate and with great variety. As Kaur (2017) [1] notes, such data can be analysed using social network metrics, although these metrics have many disadvantages that are discussed in several papers. Kaur (2016) [13] discussed graph-based techniques for detecting anomalies in the context of identifying fake user profiles in online social networks. Sharma (2019) [12] highlighted issues with fake profiles, reporting that researchers have observed that 20% to 40% of profiles in online social networks such as Facebook are fake. Fake profiles are not only a problem for social media; they create problems for all digital platforms.

Gangele et al. (2017) [5] show how fake profiles affect online platforms other than social networks. Fake news is another problem that social networking sites push onto the world. Aldwairi (2018) [10] defines fake news as "fictitious articles deliberately fabricated to deceive readers" and describes how a group of people can influence the system; he used clickbait for profiling, and his results show an accuracy of 99.4%. Xiny broadly divides the study of fake news into four categories.

Bourgonje (2017) [11] defined a system that can be used to find fake news by using clickbait. As data around social networking always come in huge volume, velocity, and variety, they can be regarded as big data. Shu (2017) [14] likewise noted that social media produces data that are big, incomplete, unstructured, and noisy. Sadhasivam (2020) [15] defines anomalous activities, also known as malicious activities, as activities that spread all over the social network. Ianni (2021) [9] described how much of the data in social networking is user-generated (text, video, image, and audio), its heterogeneity and high generation rate, and how big data is involved. Many researchers have written about anomaly detection. Yu (2016) [3] typically divides anomalies into two categories, point anomalies and group anomalies. Mehdi (2016) [16] discussed structural analysis and graph mapping for calculating anomalies.

3. SOCIAL NETWORKING ANALYSIS

Social network analysis (SNA) is the act of mapping and quantifying the linkages and flows between individuals, groups, organisations, computers, and other entities that process information/knowledge [5]. Common measures include the following [6]:

● Connections: Size (number of nodes), Density (number of ties present / total number of ties possible), Out-degree (sum of connections from an actor to others), In-degree (sum of connections to an actor), Walk (a sequence of actors and relations that begins and ends with actors), Geodesic distance (the number of relations in the shortest possible walk from one actor to another), and Maximum flow (the number of different actors in the neighbourhood of a source that lead to pathways to a target).

● Power and prestige: Degree (sum of transitive weighted connections from or to an actor, e.g. authority, hub, and PageRank scores), Closeness centrality (based on the distance between an actor and all other actors in the network), and Betweenness centrality (how often an actor lies on the geodesic paths between other actors).
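As an illustration only, the following Python sketch computes several of the measures above (density, out-degree, in-degree, geodesic distance, and closeness centrality) on a small hypothetical directed network stored as an adjacency dictionary. The graph, the function names, and the particular closeness variant (number of reachable actors divided by the sum of their geodesic distances) are assumptions for this example; betweenness and maximum flow are omitted for brevity.

```python
from collections import deque

# Hypothetical directed network: each actor maps to the set of actors it sends ties to.
GRAPH = {
    "A": {"B", "C"},
    "B": {"C"},
    "C": {"A"},
    "D": {"C"},
}


def density(graph):
    """Ties present divided by ties possible, n * (n - 1) for a directed graph without self-loops."""
    n = len(graph)
    ties = sum(len(targets) for targets in graph.values())
    return ties / (n * (n - 1))


def out_degree(graph, actor):
    """Number of ties sent from an actor to others."""
    return len(graph[actor])


def in_degree(graph, actor):
    """Number of ties received by an actor."""
    return sum(actor in targets for targets in graph.values())


def geodesic_distances(graph, source):
    """Breadth-first search: length of the shortest walk from source to every reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(graph, actor):
    """Reachable actors divided by the sum of geodesic distances to them (0.0 if none is reachable)."""
    dist = geodesic_distances(graph, actor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0
```

For instance, actor D reaches C, A, and B at geodesic distances 1, 2, and 3, giving a closeness of 3 / 6 = 0.5, and the network has 5 of 12 possible ties, a density of about 0.42.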

Considering this worldwide user base, privacy is an apparent and crucial concern for SNSs. SNSs exacerbate numerous privacy concerns, such as surveillance, in which the social realm of SNSs turns into a commercial sphere and SNS service providers monitor user behaviour for market purposes. Threats include phishing, spam, cross-site scripting, modern threats, clickjacking, de-anonymization, fake profiles, identity clone attacks, inference attacks, information leakage, location leakage, cyberstalking, and user profiling [7, 8].

When contextual anomalies are detected using the information contained in the network/graph structure, each data object in the data set is defined by two kinds of properties, contextual attributes and behavioural attributes [7].

4. ANOMALIES IN SOCIAL NETWORKING

According to Anand (2017) [2], an anomaly in an Online Social Network is an observation that deviates from the bulk of observations; in other words, it is unusual or irregular conduct that deviates from that of most social network members. Malicious actions in online social networks have evolved beyond basic spamming into reasonably sophisticated attacks that tend to compromise users' privacy. Identifying malicious activity is critical to preventing user privacy breaches in OSNs, and it is a serious concern for online platforms as well as today's retail portals [5].

4.1. Based on Nature of Anomalies

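Depending on their nature and breadth, anomalies are grouped primarily into three groups: point, contextual, and collective anomalies. A fourth, more recent category, horizontal anomalies, arises when data from multiple sources are available.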


4.1.1 Point Anomalies

Point anomalies, sometimes referred to as global anomalies, are individual data objects that deviate markedly from the rest of the data set. Although point anomalies are the easiest kind of anomaly to detect, one of the most difficult problems in identifying them is choosing an appropriate measure of an object's divergence from other objects. Let us assume that in a regular network each node must have at least two neighbours connected to it. Group V2 contains nodes that form such a network and thus represents typical behaviour, while group V1 contains nodes that are isolated from one another. Because their connectivity differs from that of the other nodes, the nodes in V1 are expected to exhibit abnormal behaviour. We may also encounter local anomalies, which are examined in the context of their immediate surroundings. For example, if we form a group of individuals based on their friendship connections in the network and check their income (or some other parameter), a particular individual, say A, may have a very low income compared with his friends, indicating a local anomaly, while in the global context his income may be unremarkable because many other individuals have a comparable income, indicating normal behaviour.
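The two ideas above can be sketched in Python under stated assumptions: a node is flagged as a global point anomaly when it has fewer than two neighbours (the "regular network" rule assumed in the text), and as a local anomaly when its income falls below half the mean income of its friends. The friendship data, the incomes, and the 0.5 ratio are hypothetical choices for illustration, not values from the surveyed literature.

```python
from statistics import mean

# Hypothetical undirected friendship network and incomes (illustrative data only).
FRIENDS = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": set(),
}
INCOME = {"A": 10, "B": 90, "C": 100, "D": 95, "E": 50}


def global_point_anomalies(friends, min_neighbours=2):
    """Flag nodes with fewer neighbours than the assumed 'regular' minimum (group V1 in the text)."""
    return sorted(node for node, nbrs in friends.items() if len(nbrs) < min_neighbours)


def local_anomalies(friends, values, ratio=0.5):
    """Flag nodes whose value is below `ratio` times the mean value of their own friends."""
    flagged = []
    for node, nbrs in friends.items():
        if nbrs and values[node] < ratio * mean(values[f] for f in nbrs):
            flagged.append(node)
    return sorted(flagged)
```

Here D and E violate the two-neighbour rule, while A is locally anomalous because its income (10) is far below its friends' mean (95).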

4.1.2 Contextual Anomalies

A data object that deviates significantly from the rest of the data in a specific context is a contextual anomaly, sometimes referred to as a conditional anomaly. Temperature offers an example. Suppose the temperature today is 28 degrees Celsius. Time and place are important factors in determining whether this is abnormal: in a Toronto winter it is an anomalous occurrence, whereas in a Toronto summer this level of heat is considered usual. Accordingly, each data object is described by two kinds of attributes:

● Contextual attributes: These properties provide the object's context. For instance, date and location are contextual attributes in the temperature example.

● Behavioural attributes: These properties characterise the object itself and help identify its abnormal behaviour in relation to its context. Temperature, humidity, and so forth may be regarded as behavioural attributes in the temperature example.
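A minimal Python sketch of contextual detection, assuming season as the contextual attribute and temperature as the behavioural attribute: each reading is scored only against the other readings from the same context (a leave-one-out z-score). The readings and the threshold of 3 standard deviations are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical readings: (contextual attribute = season, behavioural attribute = temperature in °C).
READINGS = [
    ("winter", -5.0), ("winter", -8.0), ("winter", -3.0), ("winter", -6.0), ("winter", 28.0),
    ("summer", 27.0), ("summer", 29.0), ("summer", 26.0), ("summer", 28.0),
]


def contextual_anomalies(readings, threshold=3.0):
    """Score each behavioural value only against readings that share its context."""
    by_context = {}
    for context, value in readings:
        by_context.setdefault(context, []).append(value)
    flagged = []
    for context, value in readings:
        others = list(by_context[context])
        others.remove(value)  # leave this reading out so it cannot mask itself
        if len(others) < 2:
            continue  # not enough context to judge
        mu, sd = mean(others), stdev(others)
        if (sd == 0 and value != mu) or (sd > 0 and abs(value - mu) / sd > threshold):
            flagged.append((context, value))
    return flagged
```

The 28 °C winter reading is flagged, while the same value in summer is not, mirroring the Toronto example: the behavioural value alone is unremarkable, and only its context makes it anomalous.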

4.1.3 Collective Anomalies

Collective anomalies occur when a group of data objects behaves differently from the rest of the data, even though the individual objects are not unusual on their own. One of the fundamental ideas in detecting collective anomalies is to consider the behaviour of the group of objects together with background knowledge about the connections among them.
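One hedged way to make this concrete is to score candidate groups rather than individual nodes. The Python sketch below assumes the candidate groups are already given (for example, by a separate community detection step) and flags a group whose internal tie density is unusually high, such as a ring of accounts that all befriend one another, even though no member's degree is unusual on its own. The graph, groups, and threshold are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical undirected network stored as a set of edges.
EDGES = {frozenset(e) for e in [
    ("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a"),
    ("f", "g"), ("f", "h"), ("f", "i"), ("g", "h"), ("g", "i"), ("h", "i"),
    ("e", "f"),
]}
# Hypothetical candidate groups, e.g. produced by community detection.
GROUPS = [{"a", "b", "c", "d", "e"}, {"f", "g", "h", "i"}]


def group_density(edges, group):
    """Fraction of possible ties inside `group` that are actually present."""
    pairs = list(combinations(sorted(group), 2))
    present = sum(frozenset(p) in edges for p in pairs)
    return present / len(pairs) if pairs else 0.0


def collective_anomalies(edges, groups, threshold=0.9):
    """Flag groups whose internal density is unusually high as a whole."""
    return [g for g in groups if group_density(edges, g) >= threshold]
```

Every node here has degree 2 to 4, yet {f, g, h, i} is fully connected (density 1.0), whereas the ring {a, b, c, d, e} has density 0.5.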

4.1.4. Horizontal Anomalies

Recently, a new kind of anomaly called a horizontal anomaly has emerged in social networks: an entity behaves differently across the multiple data sources available about it. For instance, the same individual may exist in communities on multiple social media platforms. A person may have similar sorts of friends on several social networks (e.g., Facebook, Google+) but entirely different categories of friends on another (e.g., Twitter). Such inconsistent behaviour across sources might be classified as abnormal.
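As a sketch, assume that each of a person's platform profiles can be summarised as a set of friend categories; a platform whose set is dissimilar (Jaccard similarity below a threshold) to every other platform's set is then flagged as a horizontal anomaly. The categories, the platforms, and the 0.2 threshold below are hypothetical.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def horizontal_anomalies(profiles, threshold=0.2):
    """profiles maps platform -> set of friend categories for one person.
    Flag platforms whose categories are dissimilar to those on every other platform."""
    flagged = []
    for platform, cats in profiles.items():
        others = [c for p, c in profiles.items() if p != platform]
        if others and max(jaccard(cats, o) for o in others) < threshold:
            flagged.append(platform)
    return flagged


# Hypothetical friend categories for one person on three platforms.
PROFILE = {
    "Facebook": {"family", "school", "colleagues"},
    "Google+": {"family", "school", "colleagues", "neighbours"},
    "Twitter": {"promotional pages", "unknown accounts"},
}
```

Facebook and Google+ share three of four categories (similarity 0.75), while Twitter shares none with either, so only Twitter is flagged.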

4.2. Based on Static/Dynamic Nature of Network/Graph Structure

Anomalies are further classified according to whether the underlying network topology is static or dynamic. Static networks, such as bibliographic networks, change only gradually over time, whereas dynamic networks, such as those formed through mobile apps, involve faster communication and continual modification of the network.


4.2.1. Dynamic Anomalies

A dynamic anomaly is defined relative to the network's prior activity as the network evolves over time. For example, it might involve changes in the way interactions occur in the network.

4.2.2. Static Anomalies

Without regard for the time component, a static anomaly arises in relation to the rest of the network. Only the current behaviour of a node is analysed in relation to the behaviour of other nodes in the network. Anomalies may be labeled or unlabeled depending on the kind of information provided at a node or an edge.

4.2.3. Labeled/ Unlabeled Anomalies

Labelled anomalies are identified using both the network's topology and the information in vertex or edge properties. For example, labels on nodes may denote the characteristics of the persons participating in the communication flow, while labels on edges may indicate their interaction behaviour. Unlabelled anomalies relate only to the network structure; no node or edge property is considered. These anomalies are typically categorised as follows, and separate methodologies for detecting each sort have been devised and implemented.

4.2.4. Static labeled/unlabeled Anomalies

A static unlabelled anomaly arises when the network is treated as static and unlabelled: labels on nodes and edges are ignored, so properties such as the time at which people interacted, the type of interaction, and its length are disregarded, and only the manner in which the contact happened, that is, the structure, matters. When labels on the vertices and edges are considered in addition to the network structure, the anomalous substructures discovered are referred to as static labelled anomalies. Static labelled anomalies are used in spam detection, for example, to identify opinion spam (fake product reviews). Typically, hidden labels are assigned to the vertices and edges; honest users are expected to rate good products positively and bad products negatively, whereas fraudulent users are understood to do the opposite.

4.2.5 Dynamic unlabeled/labeled Anomalies

This kind of anomaly occurs in dynamic networks that evolve over time, where the behaviour of a data object is judged against earlier states of the network structure. For example, when only the pattern of interactions is examined, a maximum clique may develop in one of six ways: shrinking, growing, splitting, merging, appearing, or disappearing. Detection involves comparing the current network structure with the structure that prevailed earlier. When normal activity does not alter the network, any change in a region may itself signal abnormal behaviour. When anomalous behaviour is detected in a dynamic network while taking the labels of the vertices and edges into account, the observed anomalies are categorised as dynamic labelled anomalies. Dynamic networks are studied by evaluating the network's structure at regular time intervals and treating each snapshot as a static network.
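The snapshot approach described above can be sketched in Python as follows, assuming each snapshot is an adjacency dictionary taken at a regular interval. As one simple structural signal (an assumption of this sketch, not a method taken from the surveyed papers), each node's neighbourhood is compared between consecutive snapshots using the Jaccard distance, and abrupt changes are flagged.

```python
def neighbourhood_change(prev, curr, node):
    """Jaccard distance between a node's neighbour sets in two snapshots (0 = unchanged, 1 = disjoint)."""
    before, after = prev.get(node, set()), curr.get(node, set())
    union = before | after
    return 1 - len(before & after) / len(union) if union else 0.0


def dynamic_anomalies(snapshots, threshold=0.7):
    """Return (time step, node) pairs whose neighbourhood changed abruptly since the previous snapshot."""
    flagged = []
    for t in range(1, len(snapshots)):
        prev, curr = snapshots[t - 1], snapshots[t]
        for node in sorted(set(prev) | set(curr)):
            if neighbourhood_change(prev, curr, node) > threshold:
                flagged.append((t, node))
    return flagged


# Hypothetical snapshots of an undirected network taken at regular intervals.
SNAPSHOTS = [
    {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v"}, "x": {"y"}, "y": {"x"}},
    {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v"}, "x": {"y"}, "y": {"x"}},
    {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v", "x"}, "x": {"w"}, "y": set()},
]
```

Between the second and third snapshots, x and y lose their mutual tie (and x links to w), so both are flagged at step 2; w's neighbourhood changes only partly (distance 1/3) and is not.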

4.3. Based on Behavior

Two kinds of behaviour-based anomalies are presented here: "white crow" anomalies and "in-disguise" anomalies.

4.3.1 White crow: This occurs when a single data item deviates significantly from the bulk of the observations, making it a clear-cut anomaly. For example, if, while analysing student records, a student's height is found to be recorded as 56 ft, which is impossible, the record is classified as a white crow anomaly.
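A white crow is a gross outlier, so even a simple robust rule catches it. The sketch below swaps in the modified z-score of Iglewicz and Hoaglin (0.6745 times the deviation from the median, divided by the median absolute deviation, flagged above 3.5); the heights are invented for illustration.

```python
from statistics import median


def robust_outliers(values, threshold=3.5):
    """Flag values whose modified z-score 0.6745 * (x - median) / MAD exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return [v for v in values if v != med]  # degenerate case: any deviation is notable
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


# Hypothetical student heights in feet, including one impossible entry.
HEIGHTS_FT = [5.4, 5.9, 6.1, 5.7, 5.5, 56.0, 5.8]
```

Using the median rather than the mean keeps the 56 ft entry from inflating the reference values it is compared against.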

Generally, these abnormalities are discovered as specific nodes, edges, or subgraphs corresponding to the anomalous behaviour.


4.3.2 In-disguise

An in-disguise anomaly is a small deviation from the established pattern. For example, somebody seeking to spy on another person's social media account will not wish to be discovered and will therefore try to behave indistinguishably from a regular user. Such anomalies are typically detected by methods whose estimates are updated iteratively. In the product review framework, a bipartite network is constructed with one subset of vertices representing people and the other representing products, with the edges connecting the subsets representing product reviews. Both people and products are allocated hidden labels: for people, the label might be honest or dishonest, while for products it can be good or bad. A typical honest user provides correct ratings, i.e. responds positively to good products and negatively to bad ones. Anomalies are recognised via strange patterns, which may also involve uncommon nodes or entity modifications. They are difficult to detect because they are concealed inside the network.
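The iterative labelling idea can be illustrated with a simplified scheme that is an assumption of this sketch, not a specific published algorithm: product goodness is the honesty-weighted average of its ratings, user honesty is one minus half the mean gap between a user's ratings and product goodness, and the two are updated alternately. Ratings are +1 (positive) or -1 (negative); the review data are hypothetical.

```python
# Hypothetical reviews: (user, product, rating) with rating +1 (positive) or -1 (negative).
REVIEWS = [
    ("u1", "P1", 1), ("u1", "P2", -1),
    ("u2", "P1", 1), ("u2", "P2", -1),
    ("u3", "P1", 1), ("u3", "P2", -1),
    ("u4", "P1", -1), ("u4", "P2", 1),  # u4 rates against the consensus
]


def score_reviews(reviews, rounds=10):
    """Alternately update product goodness in [-1, 1] and user honesty in [0, 1]."""
    users = {u for u, _, _ in reviews}
    products = {p for _, p, _ in reviews}
    honesty = {u: 1.0 for u in users}  # start by trusting every user
    goodness = {}
    for _ in range(rounds):
        # Goodness: honesty-weighted average of the ratings a product received.
        for p in products:
            rated = [(honesty[u], r) for u, q, r in reviews if q == p]
            weight = sum(h for h, _ in rated)
            goodness[p] = sum(h * r for h, r in rated) / weight if weight else 0.0
        # Honesty: 1 minus half the mean gap between a user's ratings and product goodness.
        for u in users:
            gaps = [abs(r - goodness[q]) for v, q, r in reviews if v == u]
            honesty[u] = 1 - sum(gaps) / (2 * len(gaps))
    return honesty, goodness
```

After the first round, the three consistent users score 0.75 and u4 scores 0.25; further rounds push the two groups further apart, exposing u4 as the likely dishonest reviewer.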

5. BIG DATA IN SOCIAL NETWORKING

To better comprehend the connection between Big Data research and Social Networks (SN), it is worth recalling why SNs are critical suppliers of Big Data. Indeed, SNs continually create a large amount of diverse data capturing the most significant information: usage behaviours. This enormous quantity of data is exploited through the "Do ut Des" approach of large businesses (e.g., Amazon, Apple, Facebook, Google, or Microsoft), which provide several services for free in exchange for user data. In this regard, it is worth briefly recalling why SNs fit the Big Data definition perfectly [8]:

Volume: There are 4 billion active SN users;

Velocity: 5 billion items of content are posted every day, with 2-5 new users joining per second;

Variety: Posts contains texts, images, videos;

Variability: Post content is quite heterogeneous;

Veracity: Content has to be checked, as it comes mainly from unverified sources;

Value: The market value is 20 billion dollars per year, mainly spent on social media advertising.

Furthermore, other V's have been defined for SNs:

Virality: It refers to the wide use of re-posting that is an easy “cut and paste” strategy for sharing interesting information

Viscosity: It tries to evaluate how much the information diffusion triggers user reactions;

Visualization: The visual representation of information intuitively makes sense of a phenomenon and triggers (sometimes wrong) decisions.

6. DETECTION OF ANOMALY

Three distinct sorts of data mining techniques are employed to identify unusual people in online social networks [10].

6.1. Supervised Learning Techniques

Supervised anomaly detection techniques model both normal and aberrant behaviour.

These strategies need pre-labelled data classed as normal or abnormal for the purpose of detecting anomalies, and different training models are used to determine whether a dataset contains normal or aberrant data. Supervised procedures use two distinct strategies:

1. The dataset is compared with a training model of normal behaviour in order to identify records that resemble it and are therefore classed as normal data.

2. In contrast to the preceding procedure, the dataset is compared with a training model built from anomalous data in order to identify abnormal data in the dataset.
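A minimal supervised sketch, assuming two hypothetical per-account features (posts per day and friend requests per day) and a tiny labelled training set: a nearest-centroid classifier models both the normal and the anomalous class and assigns a new account to the closer class mean. Real systems would use richer features and models; this only illustrates the labelled-training idea.

```python
from math import dist
from statistics import fmean

# Hypothetical labelled training data: (posts per day, friend requests per day) -> class label.
TRAIN = [
    ((3.0, 1.0), "normal"), ((5.0, 2.0), "normal"), ((4.0, 1.5), "normal"),
    ((80.0, 60.0), "anomalous"), ((95.0, 70.0), "anomalous"),
]


def fit_centroids(samples):
    """Model each class by the mean of its labelled feature vectors (nearest-centroid model)."""
    groups = {}
    for features, label in samples:
        groups.setdefault(label, []).append(features)
    return {label: tuple(fmean(col) for col in zip(*vecs)) for label, vecs in groups.items()}


def predict(centroids, features):
    """Assign the label of the closest class centroid by Euclidean distance."""
    return min(centroids, key=lambda label: dist(features, centroids[label]))
```

For example, `predict(fit_centroids(TRAIN), (70.0, 50.0))` returns `"anomalous"`, because that account lies far closer to the anomalous centroid (87.5, 65.0) than to the normal one (4.0, 1.5).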

6.2. Unsupervised Learning Techniques

Unsupervised approaches rely on a technique known as clustering. These techniques do not require data that have been pre-labelled as normal or abnormal. They identify clusters of nodes that exhibit similar behaviour, on the assumption that normal nodes form such clusters while anomalies do not. Occasionally this assumption is incorrect, since several anomalies may also form a cluster with a similar pattern.

As a consequence, unsupervised procedures can be less accurate.
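The clustering idea, and its weakness, can be sketched with plain k-means (Lloyd's algorithm) in Python. After clustering, members of clusters smaller than a chosen fraction of the data are reported as anomalies; the points, the deterministic initial centroids, and the 20% cut-off are illustrative assumptions.

```python
from math import dist
from statistics import fmean


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm from the given initial centroids; returns the final clusters."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [tuple(fmean(c) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return clusters


def small_cluster_members(clusters, min_fraction=0.2):
    """Treat members of clusters holding less than `min_fraction` of all points as anomalies."""
    total = sum(len(c) for c in clusters)
    return [p for c in clusters if len(c) < min_fraction * total for p in c]


# Hypothetical 2-D behaviour features: eight similar accounts and one outlier.
POINTS = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8),
          (1.3, 1.0), (1.0, 1.2), (0.7, 0.9), (20.0, 20.0)]
```

Running `small_cluster_members(kmeans(POINTS, [POINTS[0], POINTS[-1]]))` reports only (20.0, 20.0). If anomalous accounts were numerous enough to form a large cluster of their own, this rule would miss them, which is the limitation noted above.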


6.3. Semi-Supervised Learning Techniques

In semi-supervised approaches, the training data carry a single label, typically that of the normal class. The trained model then automatically detects data that do not fit this class as aberrant [11].
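A sketch of the single-label setting, under the assumption that only normal accounts are labelled: the model learns each feature's mean and standard deviation from normal data alone and flags any new record that lies more than three standard deviations from the normal mean on some feature. The features and data are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical normal-only training data: (posts per day, friend requests per day).
NORMAL = [(3.0, 1.0), (5.0, 2.0), (4.0, 1.5), (4.5, 1.0), (3.5, 2.0)]


def fit_normal_profile(normal_samples):
    """Per-feature (mean, standard deviation) learned from normal data only."""
    return [(fmean(col), pstdev(col)) for col in zip(*normal_samples)]


def is_anomalous(profile, sample, threshold=3.0):
    """Flag a sample if any feature lies more than `threshold` standard deviations from the normal mean."""
    for value, (mu, sd) in zip(sample, profile):
        if sd == 0:
            if value != mu:
                return True
        elif abs(value - mu) / sd > threshold:
            return True
    return False
```

No anomalous examples are needed for training; anything that does not resemble the single labelled class is reported.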

7. SOFT COMPUTING

Soft computing is defined as a group of computational techniques based on artificial intelligence and natural selection that offers rapid and cost-effective solutions to very complex problems for which hard computing solutions do not exist. Soft computing is a consortium of methodologies that deal with real-life difficulties and offer flexible information processing capabilities for handling real-life, complex circumstances. The primary principle of soft computing is to exploit the tolerance for imprecision, uncertainty, and partial truth in order to achieve tractability, robustness, and low solution cost, which conventional hard computing cannot provide. Initially, soft computing comprised three main branches: fuzzy systems, evolutionary computation, and artificial neural computing. To date, many new methods have been proposed for handling imprecision, uncertainty, and partial truth, and these also belong to soft computing.

7.1. Approximate Reasoning

Approximate reasoning is the process, or set of processes, by which a possibly imprecise conclusion is deduced from a collection of imprecise premises.

7.2. Multivalued Fuzzy Logics

In this mode of approximate reasoning, the antecedents and consequents have fuzzy linguistic variables; the input-output relationship of a system is expressed as a collection of fuzzy IF-THEN rules. This reasoning is mainly used in control system analysis.
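The following Python sketch shows fuzzy IF-THEN rules applied to a hypothetical account-screening task. Inputs are a posting rate (posts per day) and an account age (days); membership functions are simple linear shoulders, AND is taken as the minimum, and, to keep the example short, rule consequents are crisp suspicion levels combined by a weighted average (zero-order Takagi-Sugeno inference) rather than fuzzy sets. All breakpoints and output levels are assumptions.

```python
def ramp_up(x, a, b):
    """Shoulder membership: 0 at or below a, 1 at or above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def ramp_down(x, a, b):
    """Complementary shoulder: 1 at or below a, 0 at or above b."""
    return 1.0 - ramp_up(x, a, b)


# Hypothetical rule base: (firing strength as a function of the inputs, crisp suspicion level).
RULES = [
    # IF posting rate is HIGH AND account age is NEW THEN suspicion is 0.9
    (lambda rate, age: min(ramp_up(rate, 10, 50), ramp_down(age, 7, 60)), 0.9),
    # IF posting rate is LOW THEN suspicion is 0.1
    (lambda rate, age: ramp_down(rate, 5, 20), 0.1),
    # IF account age is OLD THEN suspicion is 0.2
    (lambda rate, age: ramp_up(age, 30, 365), 0.2),
]


def suspicion(rate, age, rules=RULES):
    """Weighted average of rule outputs, weighted by each rule's firing strength."""
    weights = [(strength(rate, age), level) for strength, level in rules]
    total = sum(w for w, _ in weights)
    if total == 0:
        return None  # no rule fires for this input
    return sum(w * level for w, level in weights) / total
```

A new account posting 60 times a day fires only the first rule and scores 0.9, while an old, quiet account fires the second and third rules and scores (0.1 + 0.2) / 2 = 0.15.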

7.3. Neural Networks

A neural network is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. It is essentially a function approximator that transforms inputs into outputs as well as it can.
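The simplest instance of this idea is a single artificial neuron. The sketch below trains a Rosenblatt perceptron (a threshold unit with the update w <- w + lr * (target - prediction) * x) on the logical AND function; it is a textbook illustration rather than an anomaly detector from the surveyed work, and the integer learning rate of 1 is chosen only to keep the arithmetic exact.

```python
def step(w, b, x):
    """Threshold unit: output 1 if the weighted sum of inputs plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron learning rule; converges on linearly separable data such as AND."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    for _ in range(epochs):
        for x, target in samples:
            err = target - step(w, b, x)  # 0 when correct, +1 or -1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


# Truth table of logical AND: ((input1, input2), target).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
```

After training, `step(w, b, x)` reproduces the AND truth table; a single unit cannot learn functions that are not linearly separable (such as XOR), which is why practical networks stack many such units.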

7.4. Evolutionary Algorithms

Evolutionary algorithms are typically used to provide good approximate solutions to problems that cannot be solved easily using other techniques, and many optimization problems fall into this category. Finding an exact solution may be computationally intensive, whereas a near-optimal solution is sometimes sufficient.
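As a minimal sketch, the (1+1) evolutionary algorithm below maximises the toy OneMax function (the number of 1 bits in a bit string): it mutates the current solution by flipping each bit with probability 1/n and keeps the child whenever it is at least as fit. The problem, the parameters, and the fixed random seed are assumptions chosen for illustration; if the generation budget runs out, the best string found so far is returned as an approximate solution.

```python
import random


def one_max(bits):
    """Toy fitness function: the number of 1 bits."""
    return sum(bits)


def one_plus_one_ea(n=20, generations=2000, seed=0):
    """(1+1) EA: bit-flip mutation plus elitist selection. Returns (best string, initial fitness)."""
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n)]
    initial = one_max(parent)
    for _ in range(generations):
        # Mutation: flip each bit independently with probability 1/n.
        child = [1 - bit if rng.random() < 1 / n else bit for bit in parent]
        # Elitist selection: the child replaces the parent only if it is at least as fit.
        if one_max(child) >= one_max(parent):
            parent = child
        if one_max(parent) == n:
            break  # optimum reached
    return parent, initial
```

Because selection never accepts a worse child, fitness can only stay the same or improve from one generation to the next.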

8. CONCLUSION

This paper has reviewed the different ways in which the problem of anomaly detection has been formulated in the literature and has attempted to provide an overview of the large body of work on techniques associated with social networking. A comprehensive survey on anomaly detection should allow a researcher not only to understand the motivation behind using a particular anomaly detection technique, but also to compare the various techniques used in social networking. Many of the techniques discussed in this survey require the entire test data set before anomalies can be detected; recently, techniques have been proposed that can operate in an online fashion. Another emerging area in which anomaly detection is finding increasing applicability is complex systems.

REFERENCES

1. Kaur, S. & Kaur, P., "Review of different types of Anomalies and Anomaly detection techniques in Social Networks based on Graphs", International Journal of Computer Trends and Technology (IJCTT), Vol. 47, No. 2, pp. 116-121, 2017, DOI 10.14445/22312803/IJCTT-V47P116

2. Anand, K., Kumar, J. & Anand, K., "Anomaly Detection in Online Social Network: A Survey", 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), IEEE Xplore, pp. 456-459, 2017, DOI 10.1109/ICICCT.2017.7975239

3. Yu, R., Qiu, H., Wen, Z., Lin, C.-Y. & Liu, Y., ACM SIGKDD Explorations Newsletter, Vol. 18, Issue 1, pp. 1-14, 2016, DOI 10.1145/2980765.2980767


4. Gangele, S., Pathak, D. & Verma, D., "The Unified Resource Utilization Techniques and Analytical Model in E-Commerce", International Journal of Scientific Research in Computer Science, Vol. 2, Issue 5, pp. 1053-1057, 2017

5. Gangele, S., Pathak, D. & Verma, D., "The Analysis of Security Issues and Threat Prevention Model in E-Commerce", International Journal of Scientific Research in Science and Technology (IJSRST), Vol. 3, Issue 8, pp. 291-296, 2017

6. Raich, V., Verma, N. & Gangele, S., “A Fuzzification of Follicle Stimulating Hormone- Computerized Prediction”, International Journal of Computer Application, Vol. 7, No.5, pp.71-79, 2017.

7. Pathak, D. & Gangele, S., “An Enhanced Fault Tolerant Scheduling Algorithm for Grid Environment”, International Journal of Engineering Sciences & Management, Vol.7, No.2, pp. 189-195, 2017.

8. M.A., Doostari, R., Zeinali, Hamed L., and Mehrana, A., "Anomaly Detection in Cliques of Online Social Networks Using Fuzzy Node-Fuzzy Graph", Journal of Basic and Applied Scientific Research, Vol. 3, No. 8, pp. 614-626, 2013.

9. Ianni, M., Masciari, E. & Sperlì, G., "A survey of Big Data dimensions vs Social Networks analysis", Journal of Intelligent Information Systems, Vol. 57, pp. 73-100, 2021, DOI 10.1007/s10844-020-00629

10. Aldwairi, M. & Alwahedi, A., "Detecting Fake News in Social Media Networks", Procedia Computer Science, Vol. 141, pp. 215-222, 2018, DOI 10.1016/j.procs.2018.10.171

11. Bourgonje, P., Moreno Schneider, J. & Rehm, G., "From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles", Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism, pp. 84-89, 2017, DOI 10.18653/v1/W17-4215

12. Sharma S.k., Siva, K. P. bhavya K devi Manaswi "Fake Profile Identification in online Social Networks”, International Journal of Recent Technology and Engineering (IJRTE), Vol.-8, Issue-4, 2019.

13. Kaur, R., Kaur, M. & Singh, S., "A Novel Graph Centrality Based Approach to Analyze Anomalous Nodes with Negative Behavior", Procedia Computer Science, Vol. 78, pp. 556-562, 2016, DOI 10.1016/j.procs.2016.02.102

14. Shu, K., Sliva, A., Wang, S., Tang, J. & Liu, H., "Fake News Detection on Social Media: A Data Mining Perspective", ACM SIGKDD Explorations Newsletter, Vol. 19, Issue 1, pp. 22-36, 2017, DOI 10.1145/3137597.3137600

15. Sadhasivam, S., Valarmathie, P. & Dinakaran, K., "Discovering and Expansion the irregular Manners of users in Online social networks using data mining techniques", Journal of Critical Reviews, Vol. 7, Issue 4, pp. 331-333, 2020.

16. Mehdi, El. G., N. Zrira, Soufiana M., Imade Benelallam, El H., Bouyakhf, "Outlier and anomalous behavior detection in social networks using constraint programming", 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), pp. 1-8, DOI 10.1109/AICCSA.2016.7945699.