

International Journal on Advanced Computer Theory and Engineering (IJACTE)

Cybercrime Analysis and Data Mining Methodologies

1Deepti Gaur, 2Neha Aggarwal

Department of Computer Science & Information Technology ITM University, Gurgaon, Haryana, India.

Email: [email protected]

Abstract— In this paper the authors present crime data mining, an emerging area in the field of information security. The paper also includes a complete survey of the available mining methodologies, along with the data mining steps involved in crime analysis. Crime may be national or international, but it is always a destructive process in society.

Index Terms—Crime Data Mining, Precision, Recall, Hotspots, Techniques, CRISP-DM methodology.

I. INTRODUCTION

Crime is identified as an act which is punishable by legislation, according to Thakur [9]. However, an act that is considered a crime in one place and time may not be so in another place or time. According to Andargachew (1988), a criminal is an individual who has committed a legally forbidden act. In fact, several factors have to be taken into account to decide whether a person should be considered a criminal. Among these, the individual should be of competent age in light of the law, and there must be a well-predefined punishment for the particular act committed.

Crime has become as complex as human nature itself. Contemporary technological improvement and the huge development in communication have enabled criminals anywhere on the planet to commit a crime using advanced equipment in one place and then escape to another [9]. Nowadays the globe faces the proliferation of problems such as illicit drug trafficking, smuggling, hijacking, kidnapping, and terrorism.

The level of crime also depends upon the situation and varies from state to state.

Crime Prevention

The causes of the growing rate of crime include unemployment, economic backwardness, overpopulation, illiteracy, and inadequate equipment of the police force. The seriousness and scale of crime may depend on the form of a society, and thus its nature changes with the growth and development of the social system. Every generation has its own most critical, new, and special problems of crime, although the crime problem is as old as man himself. In addition, the techniques employed to commit crime are new in the sense that they make use of modern knowledge and technique. The rise in crime, both national and international, is generally thought to result from the interplay of socio-economic changes. The circumstances surrounding the individual offender, such as his personality, physical characteristics, intelligence, and family background, and environmental surroundings such as peer groups and neighbors, have been subjects of the study of crime (Andargachew, 1988). Understanding the attributes of criminals is therefore helpful in designing and implementing proper crime prevention strategies. Governments usually establish organizations such as courts, prosecutions, and police, which are responsible for the maintenance of law and order in their respective countries. These agencies and other related organizations are responsible for curbing the rate and occurrence of crimes. The crime prevention agencies need to issue and implement crime prevention strategies [8]:

• Prevention safeguards the life and property of the society that the authorities are duty-bound to protect.

• It spares the victim much hardship, both physical and mental.

• Prevention rules out the litigation that follows the detection of a crime.

• Prevention also saves the authorities the trouble of turning out at all odd hours of the day and night and taking immediate action for an investigation.

II. DATA MINING

Data mining is the computational process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. Besides the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining). This usually involves using database techniques such as spatial indices.

These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis, for instance, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting are part of the data mining step, but they do belong to the overall KDD process as additional steps [3][7]. This research paper contains the following sections: Data Generation, which describes the data set; Handling of Information; and techniques involved in data mining.

1. Data Generation

The research data was gleaned from multiple city agencies. Every data entry is a record of a crime or related event. Each record contains the type of crime, the location of the crime in longitude and latitude, and the time and date the incident happened. Before beginning data mining, preprocessing is required to make the data suitable for classification.

A. Data Grid

For the deployment of this crime prediction model, the police department's requirement is to forecast crimes such as residential burglary over space and time.

Accordingly, the model classifies burglaries monthly across a uniform grid. With the help of the grid, the city is divided into checkerboard-like cells. Each cell contains data combined into six categories: Arrest, Residential Burglary [4], Commercial Burglary, Motor Vehicle Larceny, Street Robbery, and Foreclosure. Each cell is populated on a monthly basis.

The researched data was of two resolutions. The first is a 24-by-20 grid of square cells and the other is 41-by-40. The cells in the 24-by-20 grid are one-half mile square; in the 41-by-40 grid, the cells are one-quarter mile square. In both cases, the data set is a monthly matrix over the six categories mentioned earlier. Of the two resolutions, the finer one allows the grid to be interrogated in more detail with respect to the inherent spatial information in the dataset; conversely, the lower resolution has the effect of generalizing the spatial knowledge.
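The monthly grid aggregation described above can be sketched as follows. The record schema (lat, lon, month, category keys) and the bounding-box normalization are illustrative assumptions, not details given in the paper:

```python
from collections import defaultdict

CATEGORIES = ["Arrest", "Residential Burglary", "Commercial Burglary",
              "Motor Vehicle Larceny", "Street Robbery", "Foreclosure"]

def cell_index(lat, lon, bounds, n_rows, n_cols):
    """Map a (lat, lon) point to a (row, col) cell in a uniform grid."""
    lat_min, lat_max, lon_min, lon_max = bounds
    row = int((lat - lat_min) / (lat_max - lat_min) * n_rows)
    col = int((lon - lon_min) / (lon_max - lon_min) * n_cols)
    # Points exactly on the upper boundary fall into the last cell.
    return min(row, n_rows - 1), min(col, n_cols - 1)

def monthly_grid(records, bounds, n_rows, n_cols):
    """Aggregate incident records into per-month, per-cell category counts.

    Each record is assumed to be a dict with 'lat', 'lon', 'month'
    (e.g. '2013-06') and 'category' keys; this schema is hypothetical.
    """
    grid = defaultdict(lambda: dict.fromkeys(CATEGORIES, 0))
    for r in records:
        row, col = cell_index(r["lat"], r["lon"], bounds, n_rows, n_cols)
        if r["category"] in CATEGORIES:
            grid[(r["month"], row, col)][r["category"]] += 1
    return grid
```

Running the same records through a 24-by-20 and a 41-by-40 grid only changes the `n_rows`/`n_cols` arguments, which is how the two resolutions above can be produced from one dataset.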

B. Empty Grid Cells

Empty grid cells need to be removed from the datasets because they have a detrimental yet counterintuitive effect: they inflate the apparent performance of the classifiers. It is trivial for any classifier to correctly predict that nothing will happen in an empty grid cell, so that "intelligence" is artificial. An empty grid cell is defined as one missing any count, in that cell, for any of the investigated categories over the entire timeline being analyzed. Most empty grid cells have two explanations: first, the boundaries of the city are not rectangular like the grid being used; second, there are many places within the city limits, such as airport runways, bodies of water, and public open spaces, where these events simply do not happen. The result is empty grid cells that have to be removed [5][6].
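A minimal sketch of the empty-cell filter, assuming the grid is stored as a mapping from (month, row, col) to per-category counts (an assumed layout, not the paper's exact format):

```python
def drop_empty_cells(grid):
    """Remove grid cells that are empty in every category over the entire
    timeline. `grid` maps (month, row, col) -> {category: count}."""
    # Total all counts per spatial cell across every month.
    totals = {}
    for (month, row, col), counts in grid.items():
        totals[(row, col)] = totals.get((row, col), 0) + sum(counts.values())
    # A cell survives if it has at least one count in ANY month/category.
    nonempty = {cell for cell, total in totals.items() if total > 0}
    return {key: counts for key, counts in grid.items()
            if (key[1], key[2]) in nonempty}
```

Note that a cell with zero counts in one month is kept as long as it is active in some other month; only cells empty across the whole timeline are dropped, matching the definition above.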

2. Handling Information

One challenge in crime prediction, as in other rare-event prediction, is that hotspots and cold spots are unbalanced: cold spots are far more widespread than hotspots. In our dataset, this is especially true with the higher-resolution 41-by-40 grid.

This has the effect of skewing the required measures of precision, recall, and F1. In particular, the F1 score for hotspots is far lower than the F1 score for cold spots because the classifiers are well trained on cold spots. The F1 score in our study is computed as follows:

F1 = (2 × Precision × Recall) / (Precision + Recall)

where

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

and

TP = number of true positives, i.e., correctly predicted hotspots
FP = number of false positives, i.e., cold spots predicted as hotspots
FN = number of false negatives, i.e., hotspots predicted as cold spots
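The precision, recall, and F1 measures above translate directly into code over binary hotspot labels; this is a straightforward transcription of the formulas, with division by zero guarded:

```python
def hotspot_f1(predicted, actual):
    """Precision, recall, and F1 for binary hotspot predictions.

    `predicted` and `actual` are equal-length sequences of 0/1 labels,
    where 1 marks a hotspot and 0 a cold spot.
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 1)
    fp = sum(1 for p, a in zip(predicted, actual) if p == 1 and a == 0)
    fn = sum(1 for p, a in zip(predicted, actual) if p == 0 and a == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```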

To address this imbalance, we adjust the weights of hotspots and cold spots. By raising the weight of hotspots based on the ratio between hotspots and cold spots, the data set can be balanced before the classification process. The weight function is defined in terms of:

C = total number of cold spots and H = total number of hotspots
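The paper's weight function itself is not reproduced in this text. A common balancing scheme consistent with the description (an assumption, not necessarily the authors' exact formula) is to weight each hotspot by the ratio C/H, so that both classes carry equal total weight:

```python
def balance_weights(labels):
    """Assign instance weights so hotspots and cold spots carry equal
    total weight. Labels are 0 for cold spots, 1 for hotspots.
    Weighting hotspots by C/H is one standard balancing choice; the
    paper's exact weight function is not reproduced here."""
    c = sum(1 for y in labels if y == 0)   # cold spots
    h = sum(1 for y in labels if y == 1)   # hotspots
    ratio = c / h if h else 1.0
    return [ratio if y == 1 else 1.0 for y in labels]
```

Most classifier implementations accept such per-instance weights, so the balancing can be applied without resampling the data.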


III. DATA MINING IN CRIME

Most law enforcement agencies today are faced with large volume of data that must be processed and transformed into useful information (Brown, 2003).

Data mining can greatly improve crime analysis and aid in reducing and preventing crime. Brown (2003) stated that

"no field is in greater need of data mining technology than law enforcement." One potential area of application is spatial data mining tools, which provide law enforcement agencies with significant capabilities to learn crime trends on where, how, and why crimes are committed (Veenendaal and Houweling, 2003). Brown (2003) developed a spatial data mining tool known as the Regional Crime Analysis Program (ReCAP), which is designed to aid local police forces (e.g. University of Virginia (UVA), City of Charlottesville, and Albemarle County) in the analysis and prevention of crime. This system provides crime analysts with the capability to sift through data to catch criminals. It provides spatial, temporal, and attribute matching techniques for pattern extraction [10].

Data mining is a powerful tool that enables criminal investigators, who may lack extensive training as data analysts, to explore large databases quickly and efficiently [1].

Table 1 lists some types of offense, such as traffic violations and arson, which primarily concern police at the town, district, and state levels.

Table 1: Crime data at the national and international level

IV. CRIME DATA MINING TECHNIQUES

By increasing efficiency and reducing errors, crime data mining techniques can assist police work and allow investigators to devote their time to other valuable tasks. Some of these techniques are standard and some are currently in use. The flow chart below shows the techniques involved in crime data mining:

FLOW CHART:

Entity extraction identifies particular patterns in data such as text, images, or audio materials [2]. It has been used to automatically identify persons, addresses, vehicles, and personal characteristics from police narrative reports. In computer forensics, the extraction of software metrics (such as the data structure, program flow, organization and amount of comments, and use of variable names) can facilitate further investigation by, for instance, grouping similar programs written by hackers and tracing their behavior. Entity extraction provides basic information for crime analysis, but its performance depends greatly on the availability of extensive amounts of clean input data.

Clustering techniques group data items into classes with similar characteristics, maximizing intraclass similarity and minimizing interclass similarity, for instance, to identify suspects who conduct crimes in similar ways or to distinguish among groups belonging to different gangs. These techniques do not have a set of predefined classes for assigning items. Some researchers use the statistics-based concept-space algorithm to automatically associate different objects such as persons, organizations, and vehicles in crime records. Using link analysis techniques to identify similar transactions, the Financial Crimes Enforcement Network AI System exploits Bank Secrecy Act data to support the detection and analysis of money laundering and other financial crimes. Clustering crime incidents can automate a major part of crime analysis but is limited by the high computational intensity typically required.

Association rule mining discovers frequently occurring item sets in a database and presents the patterns as rules. This technique has been applied in network intrusion detection to derive association rules from users' interaction histories. Investigators can also use it to profile network criminals and help detect potential future network attacks. Similar to association rule mining, sequential pattern mining finds frequently occurring sequences of items across transactions that happened at different times. In network intrusion detection, this approach can identify intrusion patterns in time-stamped data.
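As a toy illustration of the idea, restricted to item pairs and with invented intrusion-log item names, support and confidence can be computed like this:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions that
    contain both items) meets min_support: the first step of association
    rule mining, sketched for pairs only."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule {antecedent} -> {consequent}."""
    with_a = [t for t in transactions if antecedent in t]
    if not with_a:
        return 0.0
    return sum(1 for t in with_a if consequent in t) / len(with_a)
```

Full association rule miners (e.g. Apriori) extend the same support/confidence counting to item sets of arbitrary size; this sketch only conveys the mechanics.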

Revealing hidden patterns benefits crime analysis, but obtaining meaningful results requires rich and highly structured data.

Deviation detection uses specific measures to study data that differs markedly from the rest of the data. Also called outlier detection, this technique can be applied by investigators to fraud detection, network intrusion detection, and other crime analyses. However, such activities can sometimes appear to be normal, making it difficult to identify outliers.

Classification finds common properties among different crime entities and organizes them into predefined classes. This technique has been used to identify the source of e-mail spamming based on the sender's linguistic patterns and structural features. Often used to predict crime trends, classification can reduce the time required to identify crime entities. However, the technique requires a predefined classification scheme. Classification also requires reasonably complete training and testing data, because a high degree of missing data can limit prediction accuracy.

String comparator techniques compare the textual fields in pairs of database records and compute the similarity between the records. These techniques can detect deceptive information, such as name, address, and Social Security number, in criminal records. Investigators can use string comparators to analyze textual data, but the techniques often require intensive computation.
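A small sketch of field-level record comparison, using Python's difflib ratio as a stand-in for the string comparators the authors cite; the 0.8 matching threshold and the record field schema are illustrative assumptions:

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """Similarity in [0, 1] between two textual fields, after
    case-folding and trimming whitespace."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def record_match(rec1, rec2, fields, threshold=0.8):
    """Flag two records as a likely match if their average field
    similarity crosses a threshold (0.8 here is an assumed value)."""
    score = sum(field_similarity(rec1[f], rec2[f]) for f in fields) / len(fields)
    return score >= threshold, score
```

This captures why the technique can catch deceptive near-duplicate identities ("Jon Smith" vs. "John Smith") while also illustrating its cost: every candidate pair of records requires a pairwise string comparison.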

Social network analysis describes the roles of and interactions among nodes in a conceptual network. Investigators can use this technique to construct a network that shows criminals' roles, the flow of tangible and intangible goods and information, and associations among these entities. Further analysis can reveal critical roles, subgroups, and vulnerabilities in the network. This approach enables the visualization of criminal networks, but investigators still might not be able to uncover the network's true leaders if they keep a low profile.

Similarity Measures: Whether two entities are similar is semantically dependent on the application and is defined by the user [5]. There are different similarity measures for different types of data. For quantitative data, we can use Euclidean distance, Minkowski distance, and other measures. For qualitative attributes, a simple and commonly used approach is the binary similarity measure. Suppose ai and bi are the values of the i-th attributes of A and B respectively, and let si(A, B) denote the similarity on the i-th attribute between A and B. Then si(A, B) = 1 if ai = bi and 0 if ai ≠ bi. In this way, qualitative data can be converted into quantitative data, and similarity measures for quantitative data can be applied. If the attributes have a weighted structure, the similarity is defined by taking into account the values of the weights wi.
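The binary measure can be sketched as follows; normalizing the weighted sum by the total weight is an assumption, since the weighted formula referred to in the text is not reproduced in this extraction:

```python
def binary_similarity(a, b, weights=None):
    """Weighted binary similarity between records a and b over
    qualitative attributes: s_i = 1 if the i-th values agree, else 0,
    combined as a weighted average. Uniform weights when none given."""
    if weights is None:
        weights = [1.0] * len(a)
    s = [1.0 if x == y else 0.0 for x, y in zip(a, b)]
    return sum(w * si for w, si in zip(weights, s)) / sum(weights)
```

For example, two crime records agreeing on weapon type but not on vehicle color score 0.5 with uniform weights, and higher if the weapon attribute is weighted more heavily.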

We now look into current methodologies for crime data mining, as available in the current crime data mining literature. The CRISP-DM methodology (Cross-Industry Standard Process for Data Mining), like the SEMMA methodology (Sample, Explore, Modify, Model, Assess), refers to the more general process of data mining. The CIA intelligence methodology refers to the life cycle of converting data into intelligence and is also well known. Van der Hulst's methodology is developed specifically for criminal networks, including specific steps for identifying and analysing criminal networks. Last but not least, the AMPA (Actionable Mining and Predictive Analytics) methodology was developed by McCue for a better understanding of crime data mining. Table 2 includes details of the available methodologies.


V. CONCLUSION

In this paper the authors presented a systematic approach to crime detection at the national and international level. As crime data keeps increasing, controlling crime has again become a difficult task; to address this, the authors present a systematic way of mining and classifying crime data so that it becomes easier to tackle the crime problem throughout the world.

REFERENCES

[1] U.M. Fayyad and R. Uthurusamy, "Evolving Data Mining into Solutions for Insights," Comm. ACM, Aug. 2002, pp. 28-31.

[2] W. Chang et al., "An International Perspective on Fighting Cybercrime," Proc. 1st NSF/NIJ Symp. Intelligence and Security Informatics, LNCS 2665, Springer-Verlag, 2003, pp. 379-384.

[3] C. Morselli, Inside Criminal Networks, New York, USA: Springer Science+Business Media LLC, 2009.

[4] G. Wang, H. Chen, and H. Atabakhsh, "Automatically Detecting Deceptive Criminal Identities," Comm. ACM, Mar. 2004, pp. 70-76.

[5] D.E. Brown and S.C. Hagen, "Data association methods with applications to law enforcement," Decision Support Systems, 34(4): 369-378, 2003.

[6] Bao, H. (2003). "Knowledge Discovery and Data Mining Techniques and Practice". http://www.netnam.vn/unescocourse/knowlegde/3-1.htm

[7] George Kelling and Catherine Coles, Fixing Broken Windows: Restoring Order and Reducing Crime in Our Communities, ISBN: 0-684-83738-2.

[8] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, "CRISP-DM 1.0 Step-by-step Data Mining Guide", Technical report, The CRISP-DM Consortium, http://www.crisp-dm.org/CRISPWP-0800.pdf, August 2000.

[9] Thakur, C. (2003). "Crime Control", http://ncthakur.itgo.com/chand3c.htm

[10] S. Ruggieri, D. Pedreschi, and F. Turini, "Data mining for discrimination discovery," ACM Transactions on Knowledge Discovery from Data, 4(2), Article 9, ACM, 2010.


