International Journal On Advanced Computer Theory And Engineering (IJACTE)
ISSN (Print): 2319-2526, Volume 6, Issue 4, 2017
An Aggregation approach using R Language
1Anchit Biswas, 2Samrin Khan R, 3Deblina Sarkhel
1,2Dayananda Sagar College of Arts, Science and Commerce, Bangalore, India.
3Garden City College, Bangalore, India
Abstract— Analytics helps us make better decisions. It is needed for many purposes; within an organization, for example, analytics can identify which section is performing well. R has specific advantages that have made it one of the most widely used languages in large companies such as Google, Ford, Twitter and Facebook. One of its greatest strengths is that it can be integrated with other languages, such as C and C++, and can connect to most relational databases.
Keywords— Business Analytics, R, Statistical Analysis, graph.proto(), RMySQL, S3-S4-RC
I. INTRODUCTION
Analytics is the process of examining a large set of collected data in order to reach a decision. An organization can collect data and research its products, its marketing, or anything else that can help it grow. After performing the analysis, we can decide where effort is lacking or where exactly investment is needed for the betterment of the organization. A person from a non-analytic background may not understand a decision derived from raw data through analytics, so we use visualization to make it easier to understand. The human mind grasps graphics and images more easily than raw data. Several tools are now available on the market for producing visual graphs from a huge set of raw data, for example QlikView, Spark and R.
II. WHY R
R works both as a programming language and as a statistical language. It can be used for data visualization as well as data analytics, whereas many other tools on the market are used for either visualization or analytics. R is an open-source platform and is also easy to learn.
R has become very popular in both research and industry. It is a significant tool for analytics, used to compute statistics of data and to gain insight from them. R is a practical language, so we do not have to write thousands of lines of code. R is therefore generally preferred when a large set of data must be analysed quickly. As discussed earlier, users without an analytics background need visualization, through which the transformation from raw data into a decision becomes easy to understand. R provides that facility to all kinds of users.
R is very effective in statistical analysis, so it offers an effective solution where calculating overvalue is difficult. Economic forecasting, the process of measuring the movement of economic growth, is done using statistical models, so it too can be carried out in R. ROI stands for return on investment. Using R, we can estimate whether an organization's investment has produced a worthwhile return.
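The ROI estimate mentioned above can be sketched in a few lines of R. This is only an illustration: it assumes the common definition ROI = (gain - cost) / cost, and the helper name `roi` and the campaign figures are hypothetical.

```r
# Return on investment, assuming ROI = (gain - cost) / cost.
# `roi` is a hypothetical helper name; inputs may be vectors.
roi <- function(gain, cost) {
  stopifnot(is.numeric(gain), is.numeric(cost), all(cost > 0))
  (gain - cost) / cost
}

# Hypothetical figures for three campaigns (illustrative only).
campaigns <- data.frame(
  name = c("A", "B", "C"),
  cost = c(1000, 2000, 4000),
  gain = c(1500, 1800, 6000)
)
campaigns$roi <- roi(campaigns$gain, campaigns$cost)

# A positive ROI means the investment returned more than it cost.
campaigns$profitable <- campaigns$roi > 0
print(campaigns)
```

Because `roi` is vectorized, the same call works for one investment or for a whole column of them.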
Semantic clustering helps an organization understand the gaps or missing parts that must be filled to support its growth, and R can be used for this analysis as well. Data journalism is a field of journalism concerned with numerical data on the production and distribution of information; R makes it possible to analyse such data more effectively. R can also be used to compute weather statistics, analyse the progress of climate change, and even predict upcoming climate changes.
R is also applied in multiplayer matchmaking, the process of connecting different players on a single platform to play an online session. Game analysis generally considers a few parameters, such as which game is being played, its expected result, its goal and its advantages. These too can be analysed efficiently using R.
III. OTHER USES OF R
In financial analysis we assess the profit and loss of a business, its stability, its products, its manufacturing and so on. Using R, we can analyse the results and take action in a particular business accordingly. R can also be used to analyse the credit risk of a business, which depends on its borrowing and lending. From this we can determine whether a particular business is running at a profit or a loss.
This section introduces the graph.proto function, which can be used to draw graphs.
Such a graph is rendered with the Rgraphviz package and gives a visual representation of an ancestor tree. The Rgraphviz package is an interface to Graphviz. graph.proto takes three arguments, all of which are usually omitted.
The first argument is a proto object; all the proto objects it contains, together with their parents, are plotted in the graph. If this argument is omitted, the current proto object is assumed.
The second argument is a graph to which the nodes and edges are added. If this argument is omitted, an empty graph is assumed. The last argument is a logical variable that specifies the orientation of the arrows. If it is omitted, the arrows are directed from the child to the parent. The function is used as shown below:
> library(proto)
> library(Rgraphviz)
> g <- graph.proto()
> plot(g)
Figure 1: Basic diagram of R Application
IV. APPLICATIONS OF R LANGUAGE
R integrates with other languages such as C, C++ and Java. It can communicate with many data sources, such as ODBC-compliant databases. R is used to tackle difficult problems in fields ranging from computational science to marketing. Its applications span theoretical, computational and hard sciences, for example astronomy and chemistry.
R is well suited to statistics, data analysis and machine learning, and it lets users create their own objects, functions and packages. R is vector oriented, which means that objects are generally treated as a whole.
R is free and open-source software, so anyone can use it. It is mostly used on GNU/Linux and Microsoft Windows. R places no license restrictions on where or when it is run, and it may even be sold under the conditions of its license.
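To illustrate what vector orientation means in practice, the following small sketch applies arithmetic to whole vectors at once and compares the result with an explicit loop. The variable names and values are made up for illustration.

```r
# Vector-oriented arithmetic: operations apply to whole vectors at once.
prices   <- c(10, 20, 30)
quantity <- c(3, 1, 2)

revenue <- prices * quantity        # element-wise product, no explicit loop
total   <- sum(revenue)             # reduce the whole vector to one number

# The same element-wise result written as an explicit loop, for comparison.
revenue_loop <- numeric(length(prices))
for (i in seq_along(prices)) {
  revenue_loop[i] <- prices[i] * quantity[i]
}

# Both approaches give the same vector.
stopifnot(identical(revenue, revenue_loop))
```

The vectorized form is shorter and is usually the idiomatic choice in R.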
V. DATABASES IN R
R objects are created and managed within a single memory area. A data frame can be thought of as a two-dimensional array. R users usually bring data into memory for analysis using one of many functions, such as read.csv() for CSV files. In a relational database, data are stored in normalized form, and carrying out statistical computing there requires advanced and complex SQL queries. R, however, can connect easily to many relational databases, such as MySQL, Oracle and SQL Server. Once the data are available in the R environment, they become a normal R data set and can be manipulated using all of R's powerful packages and functions. The RMySQL package, which can be installed from within the R environment, provides connectivity between R and MySQL databases. Once it is installed, a connection object is created in R to connect to the database, taking a user name and password. We can then query tables in MySQL using the dbSendQuery() function. Rows are updated by passing an UPDATE query to dbSendQuery(). Data are inserted into MySQL tables, and new tables created, using the dbWriteTable() function. Tables can be dropped from the MySQL database by passing a DROP TABLE statement to dbSendQuery().
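The connect, write, update, query and drop steps described above might look roughly like the sketch below. It assumes the DBI and RMySQL packages are installed and that a MySQL server is reachable; the function name `mysql_demo`, the database name `testdb`, and the `sales` table and its columns are hypothetical. The steps are wrapped in a function so that nothing contacts a database until it is called with real credentials.

```r
# Sketch of the RMySQL workflow described above. Nothing here contacts a
# database until mysql_demo() is called with real credentials.
# The database, table, and column names are hypothetical.
mysql_demo <- function(user, password, dbname = "testdb", host = "localhost") {
  # Requires the DBI and RMySQL packages (install.packages("RMySQL")).
  con <- DBI::dbConnect(RMySQL::MySQL(), user = user, password = password,
                        dbname = dbname, host = host)
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Insert: write a data frame into a new MySQL table.
  sales <- data.frame(region = c("north", "south"), amount = c(120, 80))
  DBI::dbWriteTable(con, "sales", sales, overwrite = TRUE, row.names = FALSE)

  # Update: send an UPDATE statement, then release the result set.
  res <- DBI::dbSendQuery(con,
    "UPDATE sales SET amount = 100 WHERE region = 'south'")
  DBI::dbClearResult(res)

  # Query: send a SELECT and fetch the rows as an ordinary data frame.
  res <- DBI::dbSendQuery(con, "SELECT region, amount FROM sales")
  rows <- DBI::dbFetch(res)
  DBI::dbClearResult(res)

  # Drop: remove the table again.
  res <- DBI::dbSendQuery(con, "DROP TABLE sales")
  DBI::dbClearResult(res)

  rows
}
```

A call such as `mysql_demo("analyst", "secret")` (placeholder credentials) would return the queried rows as a normal R data frame, which can then be analysed like any other R data set.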
VI. FUTURE SCOPE OF R
R is more than a language: it is an interactive environment for statistical computing. The future of R programming looks strong, as it is one of the most in-demand scripting languages and was developed by and for statisticians. Many analysts, statisticians and data scientists regard R as the best thing that has happened to their field. Because of rising demand, competition to hire data scientists is intense, and companies that relied on legacy proprietary platforms for statistical analysis now have to adopt an open-source alternative, R. R has spread widely in academia as well as in the commercial sector. It has been ranked the 9th most popular language by IEEE Spectrum and is a popular language in data science. Many companies have adopted R for data science applications, such as:
GOOGLE -> To calculate the ROI of advertising campaigns.
FORD -> To improve the design of its vehicles.
TWITTER -> To monitor user experience.
US National Weather Service -> To predict severe flooding.
Human Rights Data Analysis Group -> To quantify the impact of war.
R is also frequently used by the New York Times to create infographics and interactive data journalism applications. It is now a full-fledged programming language.
VII. OBJECT ORIENTED PROGRAMMING USING R
R has three object-oriented systems: S3, S4 and RC (Reference Classes). S3 implements a style of object-oriented programming known as generic-function object orientation. This differs from most other programming languages, such as Java, C# and C++, which use message-passing object orientation. With message passing, messages (methods) are sent to objects, and the object determines which function to call. S3 is different: computations are still carried out through methods, but a special kind of function, called a generic function, determines which method is called.
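Generic-function dispatch in S3 can be illustrated with a minimal sketch. The generic `area` and the classes `circle` and `rectangle` are hypothetical names chosen for this example.

```r
# S3 generic-function dispatch: the generic `area` picks a method
# based on the class attribute of its first argument.
# The class and function names here are hypothetical.
area <- function(shape, ...) UseMethod("area")

area.circle    <- function(shape, ...) pi * shape$r^2
area.rectangle <- function(shape, ...) shape$w * shape$h
area.default   <- function(shape, ...) stop("area: unsupported shape")

# S3 objects are ordinary R values with a class attribute attached.
circ <- structure(list(r = 1), class = "circle")
rct  <- structure(list(w = 2, h = 3), class = "rectangle")

circle_area <- area(circ)   # dispatches to area.circle
rect_area   <- area(rct)    # dispatches to area.rectangle
```

Note that the caller always writes `area(x)`; it is the generic, not the object, that decides which method runs.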
S4 works much like S3, but there are two differences between them. S4 class definitions are more formal, describing the representation and inheritance of each class, and S4 has helper functions for defining generics and methods. S4 also has multiple dispatch, which means that generic functions can choose methods based on the classes of any number of arguments. RC is quite different from both S3 and S4. It uses message-passing object orientation, which means that methods belong to classes, not functions. The symbol “$” is used to separate objects from methods. RC objects are also mutable: they do not use R's usual copy-on-modify semantics, but are modified in place. This makes them harder to reason about, but allows them to solve problems that are difficult to solve with S3 or S4.
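The two points made above, multiple dispatch in S4 and in-place modification in RC, can be sketched as follows. The classes `Animal`, `Dog`, `Cat` and `Account` and the generic `meet` are hypothetical names used only for illustration.

```r
library(methods)  # S4 and RC live in the methods package shipped with R

# S4: formal class definitions with declared slots and inheritance.
setClass("Animal", slots = c(name = "character"))
setClass("Dog", contains = "Animal")
setClass("Cat", contains = "Animal")

# S4 multiple dispatch: the method is chosen from the classes of BOTH arguments.
setGeneric("meet", function(a, b) standardGeneric("meet"))
setMethod("meet", signature("Dog", "Cat"), function(a, b) "chase")
setMethod("meet", signature("Animal", "Animal"), function(a, b) "ignore")

dog  <- new("Dog", name = "Rex")
cat1 <- new("Cat", name = "Tom")
dog_meets_cat <- meet(dog, cat1)   # most specific method: Dog, Cat
cat_meets_dog <- meet(cat1, dog)   # falls back to Animal, Animal

# RC: methods belong to the class and are called with `$`.
Account <- setRefClass("Account",
  fields = list(balance = "numeric"),
  methods = list(
    deposit = function(x) {
      balance <<- balance + x   # modifies this object in place
      invisible(.self)
    }
  )
)

acct <- Account$new(balance = 100)
same_acct <- acct               # no copy is made: both names refer to one object
same_acct$deposit(50)
balance_after <- acct$balance   # the change is visible through `acct` too
```

With an ordinary R list, assigning to a second name and modifying it would leave the first untouched; with an RC object both names see the deposit, which is what modification in place means.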
VIII. DATA SECURITY USING R LANGUAGE
Everyone needs to be attentive to safety and security on the Web today. The reader is assumed to have some general familiarity with object-oriented programming concepts and with R. The discussion focuses on illustrating the proto package through demonstration and is organized as follows. It first explains how proto objects are created and illustrates the corresponding methods for setting and getting components. It then discusses how object-oriented delegation is handled, and lastly the internals of the package. Additional examples of prototype programming in action are also provided. One demonstrates the calculation of correlation confidence intervals using classical and modern methods. Another uses the solution of linear equations to illustrate program evolution from object-based to class-based programming, all within the proto framework. Finally, an appendix provides a reference card that summarizes the functionality of proto in terms of its constituent commands.
IX. CONCLUSION
R includes almost all the functionality needed for statistical analysis, such as distributions, chi-square tests and logistic regression. In addition, it provides many mathematical functions, matrix operations and control structures, and it is reasonably fast. The future scope of R is therefore large in both research and industry.
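The built-in facilities listed above can be tried with base R alone. The sketch below uses small made-up data sets; all variable names and values are hypothetical and serve only to show the calls.

```r
# Built-in statistical tools mentioned above, on small made-up data.
set.seed(1)  # make the random draws reproducible

# Distributions: cumulative probability and random draws.
p_below     <- pnorm(1.96)                      # P(Z <= 1.96), standard normal
draws       <- rnorm(100, mean = 50, sd = 10)
sample_mean <- mean(draws)

# Chi-square test of independence on a hypothetical 2x2 table of counts.
counts <- matrix(c(30, 20, 15, 35), nrow = 2,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))
chi <- chisq.test(counts)

# Logistic regression: model a binary outcome from one numeric predictor.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1)
fit <- glm(y ~ x, family = binomial)

# Matrix algebra: solve the linear system A %*% z = b.
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(3, 5)
z <- solve(A, b)
```

Calling `summary(fit)` or printing `chi` shows the usual test statistics and p-values for each analysis.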