How to process the data
2.7 R and data
2.7.3 How to load the text data
In essence, data which need to be processed could be of two kinds: text and binary.
To avoid unnecessary details, we will accept here that text data is something which you can read and edit in a simple text editor like Geany [6]. But if you want to edit binary data, you typically need the program which produced this file in the first place.
Without the specific software, the binary data is not easy to read.
Text data for statistical processing is usually text tables where every line corresponds to a table row, and columns are separated with delimiters, either invisible, like spaces or tab symbols, or visible, like commas or semicolons. If you want R to “ingest” this kind of data, it is necessary to make sure first that the data file is located within the directory which R regards as the working directory:
> getwd()
[1] "d:/programs/R/R-3.2.3"
If this is not the directory you want, you can change it with the command:
> setwd("e:\\wrk\\temp") # Windows only!
> getwd()
[1] "e:/wrk/temp"
Note how R works with backslashes under Windows. Instead of one backslash, you need to enter two. Only in that case will R under Windows understand it. It is also possible to use slashes under Windows, similar to Linux and macOS:
> setwd("e:/wrk/temp")
> getwd()
[1] "e:/wrk/temp"
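Since setwd() invisibly returns the previous working directory, it is easy to switch for a while and then come back. What follows is only a minimal sketch: the session's temporary directory (tempdir()) stands in for a real project directory, and the path pieces passed to file.path() are the ones from the example above.

```r
# Remember where we are, move to another directory, then come back.
# tempdir() is only a stand-in for a real project directory.
old <- setwd(tempdir())   # setwd() invisibly returns the previous directory
getwd()                   # now the temporary directory
setwd(old)                # restore the original working directory

# file.path() joins path pieces with forward slashes,
# which R accepts on Windows, Linux and macOS alike
p <- file.path("e:", "wrk", "temp")
p
```

Building paths with file.path() avoids the question of single versus double backslashes altogether.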
[5] By the way, if you want the Euler number, e, type exp(1).
[6] Or like the editor which is embedded into R for Windows or into the R macOS GUI, or the editor from the rite R package, but not office software like MS Word or Excel!
Please always start each of your R sessions by changing the working directory. Actually, it is not absolutely necessary to remember long paths. You can copy the path from your file manager into R. Also, graphical R under Windows and macOS has a rudimentary menu system, and it is sometimes easier to change the working directory through the menu. Finally, the package shipunov contains the function Files(), which is a textual file browser, so it is possible to run setwd(Files()) and then follow the screen instructions [7].
The next step, after you have made sure that the working directory is correct, is to check whether your data file is in place, with the dir() command:
> dir("data")
[1] "mydata.txt"
It is really handy to separate data from all other stuff. Therefore, we assumed above that you have a subdirectory data in your working directory, and that your data files (including mydata.txt) are in that subdirectory. Please create it (and of course, create the working directory) if you do not have it yet. You can create these with your file manager, or even with R itself:
> dir.create("data")
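Calling dir.create() on a directory that already exists produces a warning, so in scripts it can be convenient to check first. Here is a hedged sketch of that idea; it works inside a temporary directory (made with tempfile(), an assumption for this illustration) so that nothing in a real project is touched.

```r
# Work inside a temporary directory so nothing in a real project is touched
# (the location is an assumption made for this illustration).
wd <- tempfile("wrk")
dir.create(wd)

data_dir <- file.path(wd, "data")
# Create the subdirectory only if it does not exist yet;
# dir.create() on an existing directory would give a warning.
if (!dir.exists(data_dir)) dir.create(data_dir)

dir.exists(data_dir)   # check that the subdirectory is there
dir(wd)                # list the contents of the working directory
```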
* * *
Now you can load your data with the read.table() command. But wait a minute! You need to understand the structure of your file first.
The read.table() command is sophisticated, but it is not smart enough to determine the data structure on the fly [8]. This is why you need to check the data. You can open it in any available simple text editor, in your Web browser, or even from inside R with the file.show() or url.show() command. It outputs the data “as is”. This is what you will see:
> file.show("data/mydata.txt")
a;b;c
1;2;3
4;5;6
7;8;9
[7] Yet another possibility is to set the working directory in preferences (this is quite different between operating systems), but this is not the best solution because you might (and likely will) want different working directories for different tasks.
[8] There is the rio package which can determine the structure of data.
(By the way, if you type file.show("data/my and press Tab, completion will show you whether the file is really there. This will save both typing the file name and checking its presence with dir().)
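For larger files, it may be more practical to look only at the first few lines and to test a guess about the delimiter. The sketch below uses readLines() and count.fields() for that; it first recreates the small example file in a temporary location (a hypothetical path chosen for illustration, where normally you would use "data/mydata.txt").

```r
# Recreate the small example file in a temporary location
# (hypothetical path; normally this would be "data/mydata.txt").
f <- tempfile(fileext = ".txt")
writeLines(c("a;b;c", "1;2;3", "4;5;6", "7;8;9"), f)

# Look at the first lines only, which is safer for large files
first <- readLines(f, n = 2)
first

# Count fields per line for a candidate delimiter;
# equal counts on every line suggest that the guess is right
n_fields <- count.fields(f, sep = ";")
n_fields
```

If the counts differ between lines, the delimiter guess is probably wrong or the file has irregular rows.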
How did the file mydata.txt appear in your data subdirectory? We assume that you have already downloaded it from the repository mentioned in the foreword. If you have not, please do it now. This can be done with any browser, and even with R:
> download.file("http://ashipunov.info/data/mydata.txt",
+ "data/mydata.txt")
(Within the parentheses, the left part is the URL, whereas the right part tells R where to place the downloaded file and how to name it.)
Alternatively, you can check your file directly from the URL with url.show() and then use read.table() with the same URL.
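If a script is run repeatedly, it may be preferable not to download the same file every time. One possible approach is sketched below; the helper fetch_if_missing() is hypothetical (it is not part of R or of any package) and simply wraps file.exists() and download.file().

```r
# Hypothetical helper (not part of base R): download a file only when
# a local copy is missing, and report whether a download happened.
fetch_if_missing <- function(url, dest) {
  if (file.exists(dest)) return(invisible(FALSE))
  # make sure the target directory (e.g. "data") exists
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  download.file(url, dest)
  invisible(TRUE)
}

# Usage (needs a network connection, so it is left commented out):
# fetch_if_missing("http://ashipunov.info/data/mydata.txt", "data/mydata.txt")
```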
* * *
Now the time has finally come to load the data into R. We know that all columns have names, and therefore use head=TRUE; we also know that the delimiter is the semicolon, which is why we use sep=";":
> mydata <- read.table("data/mydata.txt", sep=";", head=TRUE)
Immediately after we loaded the data, we must check the new object. There are three ways:
> str(mydata)
'data.frame': 3 obs. of 3 variables:
$ a: int 1 4 7
$ b: int 2 5 8
$ c: int 3 6 9
> head(mydata)
  a b c
1 1 2 3
2 4 5 6
3 7 8 9
The third way is to simply type mydata, but this is not optimal: when the data is large, your screen will be flooded with its content. The commands head() and str() are much more efficient.
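Checking the new object matters because read.table() does not complain when the delimiter is wrong; it just returns a differently shaped table. The following sketch compares a load with the right delimiter to one that relies on the default (whitespace), using a temporary copy of the example data (the temporary path is an assumption for illustration).

```r
# Temporary copy of the example data
f <- tempfile(fileext = ".txt")
writeLines(c("a;b;c", "1;2;3", "4;5;6", "7;8;9"), f)

good <- read.table(f, sep = ";", header = TRUE)
bad  <- read.table(f, header = TRUE)   # default separator: whitespace

dim(good)   # 3 rows and 3 columns
dim(bad)    # 3 rows but only 1 column: each line was read as a single value
```

A quick look at dim() or str() right after loading reveals this kind of mistake immediately.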
To summarize, a local data file should be loaded into R in three steps:
1. Make sure that your data is in place, with the dir() command, Tab completion or through a Web browser;
2. Take a look at the data with the file.show() or url.show() command and determine its structure;
3. Load it with the read.table() command using appropriate options (see below).
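The three steps above could also be combined into one small function. This is only a sketch under stated assumptions: the name load_checked() and its arguments are invented for this illustration, and file.exists() and readLines() are used here as non-interactive substitutes for dir() and file.show().

```r
# Hypothetical wrapper around the three steps; the name load_checked()
# and its arguments are invented for this illustration.
load_checked <- function(path, sep, header = TRUE, peek = 3) {
  stopifnot(file.exists(path))                   # step 1: is the file in place?
  cat(readLines(path, n = peek), sep = "\n")     # step 2: look at the raw lines
  read.table(path, sep = sep, header = header)   # step 3: load with explicit options
}

# Try it on a temporary copy of the example data
f <- tempfile(fileext = ".txt")
writeLines(c("a;b;c", "1;2;3", "4;5;6", "7;8;9"), f)
mydata <- load_checked(f, sep = ";")
str(mydata)
```

The delimiter is deliberately left without a default value, so that the caller has to look at the file before loading it.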