Reading in other sources of data in R

Academic year: 2025

library( ) or require( ) functions

Loading and Listing of Packages

library( )

Lists all the installed packages.

library(package name) or require(package name)

Loads the specified package.

Example:

library("UsingR")
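The practical difference between the two loaders is how they fail: library() stops with an error when the package is missing, while require() returns FALSE with a warning. A minimal sketch of this, in which noSuchPackage123 is a made-up name assumed not to be installed:

```r
# require() returns TRUE or FALSE instead of stopping, so its result can be tested.
have_stats <- require("stats", character.only = TRUE, quietly = TRUE)

# "noSuchPackage123" is a hypothetical name, assumed not to be installed.
have_fake <- suppressWarnings(
  require("noSuchPackage123", character.only = TRUE, quietly = TRUE)
)

# library() signals an error for a missing package; tryCatch() captures it.
lib_result <- tryCatch(
  {
    library("noSuchPackage123", character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)

print(c(stats = have_stats, fake = have_fake))
print(lib_result)
```

This is why require() is often preferred inside functions or scripts that need to react to a missing package, while library() is the usual choice at the top of an interactive session.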

data( )

Loads the specified data sets, or lists the available data sets.

data( )

Lists all available data sets in the loaded packages.

data(name of data set)

Loads the specified data set.

Example:

data(survey,package="MASS") # loads the data set only; the help files and the rest of the package are not loaded

library("MASS")

data(survey) # better
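The listing that data() produces can also be captured and searched. The sketch below uses the built-in datasets package rather than MASS so that it runs on any installation; the variable names listing, items and e are illustrative only.

```r
# data(package = ...) returns an object whose $results matrix describes each data set.
listing <- data(package = "datasets")
items <- listing$results[, "Item"]      # the names of the data sets
print(head(items))
print("women" %in% items)

# Load one data set by name into a separate environment instead of the workspace.
e <- new.env()
data(list = "women", package = "datasets", envir = e)
print(ls(e))
```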

Accessing the variables in a data set:

names( )

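Gets or sets the names of an object, such as the variable names of a data set.

attach( ) and with( ):

attach( ) adds a set of R objects, such as a data frame, to the search path, so that its variables can be referred to by name. with( ) evaluates a single command with the variables of the data frame visible.

Important note:

The values in an attached data set cannot be changed by name. Assigning to an attached variable creates a new copy in the workspace and leaves the original data frame unchanged.

(Diagram: a library contains packages, and a package contains data sets.)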

Example

summary(women$height) # refers to variable 'height' in the data frame (data set women)

attach(women)

summary(height) # the same variable is now available by name

detach()

with(data.frame,command) # like attach followed by detach, for a single command
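The important note about attached data sets can be checked directly. The following is a small sketch using a copy of the built-in women data; the names w and unchanged are illustrative, and the conversion factor 2.54 (inches to centimetres) is only there to make a visible change.

```r
# Work on a copy so the built-in data set 'women' is left alone.
w <- women
attach(w)

# This creates a new variable 'height' in the workspace;
# the column w$height is not modified.
height <- height * 2.54

unchanged <- identical(w$height, women$height)  # TRUE: the data frame is untouched
detach(w)
rm(height)                                      # remove the workspace copy

# To change the data frame itself, assign to the column directly.
w$height <- w$height * 2.54
print(unchanged)
print(head(w$height, 3))
```

Because of this behaviour, with() or explicit w$height references are usually the safer way to work when values need to be modified.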

Example (p. 26):

names(Sitka) # Sitka is not available until the MASS package is loaded

Sitka

data(Sitka)

library(MASS)

data(Sitka)

names(Sitka)

length(tree) # fails: tree is only visible inside Sitka

length(Sitka$tree)

Sitka$size[tree>78]

Error: object "tree" not found

Sitka$size[Sitka$tree>78]

[1] 2.99 3.61 4.48 4.91 5.06

with(Sitka,list(a=range(tree),b=table(treat),c=max(Time)))

attach(Sitka)

Sitka$size[tree>78]

[1] 2.99 3.61 4.48 4.91 5.06

summary(Sitka)

detach(Sitka)

tree # no longer found after detach

An easy example of creating a data frame:

weight = c(150, 135, 210, 140)

height = c(65, 61, 70, 65)

gender = c("Fe","Fe","M","Fe")

study = data.frame(weight,height,gender) # make the data frame study

row.names(study)<-c("Mary","Alice","Bob","Judy")

study

rm(weight) # clean out the old copy

weight

Error: Object "weight" not found

attach(study)

weight

UsingR package

Write these commands in R:

> where="http://www.math.csi.cuny.edu/UsingR"

> install.packages("UsingR",contriburl=where)

OR

> install.packages("UsingR")

--- Please select a CRAN mirror for use in this session ---

trying URL 'http://cran.wustl.edu/bin/windows/contrib/2.3/UsingR_0.1-4.zip'

Content type 'application/zip' length 1419692 bytes

opened URL

downloaded 1386Kb

package 'UsingR' successfully unpacked and MD5 sums checked

The downloaded packages are in

C:\Documents and Settings\ÃæíÓ\Local Settings\Temp\RtmpQCZ8ub\downloaded_packages

updating HTML package descriptions

Import data into R:

Read data into a vector or list from the console or file.

Cut or copy and paste.

Using c

The c function combines values. One of its simplest uses is to combine a sequence of values into a vector. For example,

> x = c(1,2,3,4)

stores the values 1, 2, 3, 4 in x. This is the quickest way to enter a small amount of data, but it becomes tedious when the data set is long.

scan()

Using scan

The function scan, at its simplest, does the same job as c, but saves you from typing the commas.

Type the numbers after the prompt. Pressing the return key once continues the input on a new line; pressing it twice in a row (that is, entering a blank line) makes scan stop.

Example:

> w=scan()

1: 3 4 6 7 8 7 9 9

9: # press enter (blank line)

Read 8 items
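When the values are already available as text, for instance after copy and paste, scan() can read them without any typing at the prompt. A minimal sketch using the text argument of scan(); the character example with names is made up for illustration.

```r
# The 'text' argument makes scan() read from a character string instead of the keyboard.
w <- scan(text = "3 4 6 7 8 7 9 9")
print(length(w))   # number of items read

# scan() reads numbers by default; 'what' selects another type, here character strings.
people <- scan(text = "Mary Alice Bob", what = character())
print(people)
```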

dump()

The function dump() can be used to write the values of R objects to a text file.

This function takes a vector of names of R objects and produces text representations of the objects on a file or connection. A 'dump' file can usually be 'source'd into another R (or S) session.

Usage:

dump(list, file = "dumpdata.R",…)

Example:

dump("x","filename.txt") # or can write a vector of objects in one file

dump("w","infile.txt")

dump("s","filename.doc") # any extension

dump("z") # saved as dumpdata.R

dump("h") # overwrites dumpdata.R

dump(c("r","m","q"),"filename.txt") # save more than one object in one file

Get or set working directory of R

getwd() # returns the file path of the current working directory of the R process

setwd() # setwd(dir) sets the working directory to dir
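Relative file names, such as those passed to dump() or scan(), are resolved against the working directory. The sketch below changes the directory temporarily and then restores it; the file name wd_demo.txt is an arbitrary name chosen for the illustration.

```r
old_dir <- getwd()                      # remember where we started
setwd(tempdir())                        # tempdir() is R's per-session scratch directory

cat("1 2 3\n", file = "wd_demo.txt")    # relative name: written in the new working directory
found <- file.exists(file.path(tempdir(), "wd_demo.txt"))

setwd(old_dir)                          # restore the original directory
unlink(file.path(tempdir(), "wd_demo.txt"))
print(found)
```

Restoring the previous directory at the end keeps later relative paths in a script pointing where the reader expects.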

source

Reads R code (commands) from a file or a connection.

'source' causes R to accept its input from the named file or URL (the name must be quoted) or connection. Input is read and parsed from that file until the end of the file is reached; the parsed expressions are then evaluated sequentially in the chosen environment.

Examples:

source("infile.txt")

Examples:

whales=scan()

1: 74 122 235 111 292 111 211 133 156 79

11:

Read 10 items

dump("whales","f1.txt")

In a new session of R:

source("f1.txt") # opens the file f1.txt and runs it, restoring the data or any R commands stored in it

whales

whales
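The dump and source round trip can be tried without starting a second session. In this sketch a temporary file stands in for f1.txt and a fresh environment (session2, an illustrative name) stands in for the new R session.

```r
whales <- c(74, 122, 235, 111, 292, 111, 211, 133, 156, 79)

f <- tempfile(fileext = ".txt")   # temporary file used instead of "f1.txt"
dump("whales", file = f)          # writes R code that recreates 'whales'

# Stand-in for a new session: evaluate the file in a fresh environment.
session2 <- new.env()
source(f, local = session2)

print(readLines(f))               # the dump file is plain R code
print(identical(session2$whales, whales))
unlink(f)
```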

Reading data from a formatted data source:

The methods above (c, and scan with typed input) are practical for a few data points (say 10 to 40), but for more data it is better to read from a file.

If the data are stored in a text file with spaces between the values, they can be read as follows.

Using scan with a file:

If the numbers are stored in a text file, scan can read them in. You just need to give scan the name of the file.

scan(file="f2.txt")

The scan function has other options. One particularly useful option is the choice of separator.

scan(file="f2.txt",sep=",") # if the values are separated by commas
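To see the separator option at work, the sketch below first writes a small comma-separated file and then reads it back. The temporary file stands in for f2.txt and its contents are made up.

```r
f2 <- tempfile(fileext = ".txt")              # stands in for "f2.txt"
writeLines(c("74,122,235", "111,292"), f2)    # two lines of comma-separated numbers

vals <- scan(file = f2, sep = ",")            # line breaks also separate values
print(vals)
unlink(f2)
```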

Reading in tables of data:

If you want to enter multivariate sets of data, you can do any of the above for each variable. However, it may be more convenient to read in tables of data at once if your data is in tabular form.

read.table(file="filename",header=T) # reads a table, with column names, into a data frame

Note

header=T specifies that the first line of the file is a header containing the names of the variables.
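A short sketch of read.table with a header line follows. The file contents are invented for the illustration and a temporary file replaces "filename". The code spells out TRUE rather than T, since T is an ordinary variable that can be reassigned.

```r
tab_file <- tempfile(fileext = ".txt")
writeLines(c("weight height gender",
             "150 65 Fe",
             "135 61 Fe",
             "210 70 M"), tab_file)

# header = TRUE: the first line supplies the variable names.
study <- read.table(file = tab_file, header = TRUE)
print(names(study))
str(study)          # one column per variable, with its type
unlink(tab_file)
```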

Spreadsheet data

Alternatively, you may have data from a spreadsheet. The simplest way to get these data into R is through a file format that both applications understand. Typically, this is CSV (comma-separated values) format. First, save the data from the spreadsheet as a CSV file, say data.csv. Then the R command read.csv will read it in as follows:

> x=read.csv(file="data.csv")

If you use Windows, the package RExcel, which is under development, allows you to do much more with R and Excel. If you use Linux, there is a package for interfacing with the spreadsheet Gnumeric (http://www.gnome.org).

read.csv() # prepare the data matrix in a spreadsheet program and save it as a CSV file

read.table(file=file.choose()) # choose the file you need to read without typing its name
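The CSV route can be rehearsed entirely inside R, with write.csv playing the part of the spreadsheet's "save as CSV" step. In this sketch the data frame original and the temporary file are illustrative stand-ins for the spreadsheet and data.csv.

```r
csv_file <- tempfile(fileext = ".csv")   # stands in for "data.csv"
original <- data.frame(exam1 = c(9, 7, 6), final = c(38, 30, 36))

# row.names = FALSE avoids writing an extra, unnamed column of row labels.
write.csv(original, file = csv_file, row.names = FALSE)

x <- read.csv(file = csv_file)           # header = TRUE is the default for read.csv
print(x)
unlink(csv_file)
```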

Read a file from anywhere (see p. 29)

read.table cannot read every type of file directly.

It is best to use a plain text file.

site="site name"

read.table(file=site,header=T)

“Foreign” formats

The package foreign allows you to read in other file formats from popular statistics packages such as SAS, SPSS, and MINITAB. For example, to read MINITAB portable files the R command is read.mtp.

Importing from SPSS

The recommended package foreign provides import facilities for files produced by SPSS.

The function read.spss() can read files created by the 'save' and 'export' commands in SPSS.

It returns a list with one component for each variable in the saved data set. SPSS variables with value labels are optionally converted to R factors.

library(foreign)

read.spss(file="file")

Example:

library(foreign)

?read.spss

## place the SPSS file in the R working directory

read.spss("smoking.sav", use.value.labels = T, to.data.frame = T)

For more details, see

http://cran.r-project.org/doc/manuals/R-data.html

subset:

Returns the subset of a vector or data frame that meets specified conditions. For a data frame, subset can be used to select rows.

Example:

library(MASS)

data(Cars93)

attach(Cars93) # unnecessary in this case

Vans <- subset(Cars93,Type=="Van")

detach(Cars93)

Type # no longer available by name after detach

Vans <- subset(Cars93,Type=="Van") # still works, because subset looks up Type inside Cars93
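Besides selecting rows, subset() can also keep only some columns through its select argument. A small sketch on the built-in mtcars data, so that it runs without MASS; the names economy and high are illustrative.

```r
# Rows: four-cylinder cars with mpg above 30. Columns: only mpg and wt.
economy <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, wt))
print(economy)

# subset() also works on vectors: keep the values that satisfy the condition.
high <- subset(mtcars$mpg, mtcars$mpg > 30)
print(length(high))
```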

transform:

The transform function can be used to add new variables to a data frame using the old ones.

Example:

Cars93T <- transform(Cars93,WeightT=Weight/1000)

names(Cars93)

names(Cars93T)
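transform() can add several variables in one call and always returns a new data frame. The sketch below uses the built-in women data (height in inches, weight in pounds); the new column names height_cm and weight_kg are chosen for the illustration.

```r
# Add two derived columns; the original data frame 'women' is not modified.
womenT <- transform(women,
                    height_cm = height * 2.54,
                    weight_kg = weight * 0.45359237)
print(names(womenT))
print(ncol(women))   # still 2 columns
```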

Grouped data and data frames

Example:

attach(mtcars) # built-in data set

?mtcars

mtcars$mpg[mtcars$cyl==4]

# same as

mtcars$mpg[cyl==4]

Another way:

split(mtcars$mpg,mtcars$cyl) # same as split(mpg,cyl)

detach(mtcars)
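Once the data are split into groups, a summary can be computed for each group. A minimal sketch that combines split() with sapply(), and shows tapply() as a one-step alternative; the names groups, group_sizes and group_means are illustrative.

```r
groups <- split(mtcars$mpg, mtcars$cyl)   # list with one vector per cylinder count
print(names(groups))

group_sizes <- sapply(groups, length)     # number of cars in each group
group_means <- sapply(groups, mean)       # mean mpg in each group
print(group_sizes)
print(round(group_means, 2))

# tapply() does the split-and-apply in one call.
print(tapply(mtcars$mpg, mtcars$cyl, mean))
```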

class:

The class( ) function reports the class of an object, for example of a data set.

Example:

class(mtcars)

[1] "data.frame"

class(lynx)

[1] "ts" # time series

order() function:

The order( ) function tells you the position in the original vector of each element of the sorted vector. It can therefore be used to determine the permutation needed to sort a particular vector.

• The command sort( ) can be used to sort a single vector, but to sort several variables together it is best to use the order function, which returns an integer index vector that can be used to put all of them in sorted order.

Sort datasets

Normally we have a data frame with several columns (variables) and many rows (observations). The goal is to rearrange the rows so that they are ordered by the values of one or more columns. This is done with the order function.

order(…,decreasing = FALSE) # increasing order

The following is a short example of this.

Example:

x=c(4,2,8,1)

y=c(1,2,5,1)

sort(x)

[1] 1 2 4 8

order(x)

[1] 4 2 1 3

x[order(x)]

[1] 1 2 4 8

order(x,y) # order(x,y) will give an order based on x; ties are resolved according to the values of y.

[1] 4 2 1 3

order(y,x)

[1] 4 1 2 3

xy=data.frame(x,y)

xy[order(x,y),]

  x y
4 1 1
2 2 2
1 4 1
3 8 5

xy[order(y,x),]

  x y
4 1 1
1 4 1
2 2 2
3 8 5
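The decreasing argument reverses the direction, and the resulting index vector can reorder any parallel vector. A short sketch with the same x and y as in the example; desc is an illustrative name.

```r
x <- c(4, 2, 8, 1)
y <- c(1, 2, 5, 1)

desc <- order(x, decreasing = TRUE)   # indices from largest to smallest x
print(desc)
print(x[desc])

# The same index vector reorders a parallel vector, keeping the pairs together.
print(y[desc])
```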

Example:

exam1=c(9,7,6,9,10,8,5,5,4)

exam2=c(9,5,7,9,9,7,9,3,3)

final=c(38,30,36,39,39,35,38,29,29)

x=data.frame(exam1,exam2,final)

x

of=order(final)

of

[1] 8 9 2 6 3 1 7 4 5

o=order(final,exam2,exam1)

o

[1] 9 8 2 6 3 7 1 4 5

x[o,]

  exam1 exam2 final
9     4     3    29
8     5     3    29
2     7     5    30
6     8     7    35
3     6     7    36
7     5     9    38
1     9     9    38
4     9     9    39
5    10     9    39

Note

For a vector x, the command sort(x) is equivalent to x[order(x)].
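Sometimes one key should be sorted in decreasing order and another in increasing order. For numeric keys, one common approach is to negate the key that should run downwards. A sketch with the exam data, where o2 is an illustrative name:

```r
exam1 <- c(9, 7, 6, 9, 10, 8, 5, 5, 4)
final <- c(38, 30, 36, 39, 39, 35, 38, 29, 29)

# Negating a numeric key reverses its direction:
# final in decreasing order, ties broken by exam1 in increasing order.
o2 <- order(-final, exam1)
print(o2)
print(data.frame(exam1, final)[o2, ])
```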

Example Q. (4.7) P. 124

Use the data set mtcars (built-in data set)

?mtcars

names(mtcars)

[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"

[11] "carb"

1- sort the data set by weight, heaviest first.

attach(mtcars)

order(wt,decreasing=T)

[1] 16 17 15 12 25 24 14 13 7 31 22 6 5 10 11 23 4 8 29 9 2 32 30 1 21
[26] 3 18 27 26 20 19 28

mtcars[order(wt,decreasing=T),]

2- Which car gets the best mileage (largest mpg)? Which gets the worst?

sort(mpg) # same as sort(mtcars$mpg) if we don't use attach()

[1] 10.4 10.4 13.3 14.3 14.7 15.0 15.2 15.2 15.5 15.8 16.4 17.3 17.8 18.1 18.7
[16] 19.2 19.2 19.7 21.0 21.0 21.4 21.4 21.5 22.8 22.8 24.4 26.0 27.3 30.4 30.4
[31] 32.4 33.9

range(mpg)

[1] 10.4 33.9

# these are only the sorted values, but the question asks for the names of the cars

rownames(mtcars[mpg==range(mpg),]) # increasing order

# caution: == recycles the two range values along mpg, so a tied car can be missed; mpg %in% range(mpg) is safer

[1] "Cadillac Fleetwood" "Toyota Corolla"

# Another method:

rownames((mtcars[order(mpg),])[c(1,length(mpg)),])

[1] "Cadillac Fleetwood" "Toyota Corolla"

detach() # same as detach(mtcars)
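Two further ways to answer the mileage question, sketched without attach(). which.max() and which.min() give the position of the first extreme value, while %in% compares every car against both extremes and therefore also reports ties. The names best, worst and extremes are illustrative.

```r
# which.max() and which.min() return the position of the first extreme value.
best  <- rownames(mtcars)[which.max(mtcars$mpg)]
worst <- rownames(mtcars)[which.min(mtcars$mpg)]
print(c(best = best, worst = worst))

# %in% tests every element against both extremes, so ties are reported too.
extremes <- rownames(mtcars)[mtcars$mpg %in% range(mtcars$mpg)]
print(extremes)
```

In mtcars two cars share the lowest mpg of 10.4, which is why the %in% version returns three names rather than two.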

Sample function:

Random Samples and Permutations

sample(x, size, replace = FALSE, prob = NULL)

Arguments:

x: Either a (numeric, complex, character or logical) vector of more than one element from which to choose, or a positive integer.

size: non-negative integer giving the number of items to choose.

replace: Should sampling be with replacement?

prob: A vector of probability weights for obtaining the elements of the vector being sampled.

Examples:

sample(1:30,10,F) # without replacement

sample(1:30,10,T) # with replacement

sample(1:30,31,F)

Error in sample(length(x), size, replace, prob) : can't take a sample larger than the population when replace = FALSE

sample(10,5)

sample(10)

sample(1e6,40) # sample of 40 from 1,000,000

sample(c(1,3,7,9),10,T,prob=c(0.3,0.2,0.1,0.5))

sample(0:1,100,T,c(0.3,0.7)) # 100 Bernoulli(0.7) trials; their sum is Binomial(100,0.7)

sample(0:1,1,T,c(0.3,0.7)) # Bernoulli(0.7)

## To simulate rolling a fair die 30 times:

die <- 1:6

x <- sample(x=die,size=30,replace=T)

## To flip a fair coin 1000 times:

sample(c("H","T"), size = 1000, replace =TRUE)
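Because sample() is random, each run gives different values. Calling set.seed() first makes a run repeatable. The sketch below tabulates simulated die rolls and counts successes in Bernoulli trials; the seed value 1 is arbitrary, and the exact numbers drawn depend on the R version, so no output is shown.

```r
set.seed(1)   # fixes the random number stream so the run can be repeated

rolls <- sample(1:6, size = 30, replace = TRUE)   # 30 rolls of a fair die
print(table(rolls))

# 100 Bernoulli(0.7) trials; their sum is one Binomial(100, 0.7) draw.
trials <- sample(0:1, size = 100, replace = TRUE, prob = c(0.3, 0.7))
print(sum(trials))

# Resetting the seed reproduces the same sample.
set.seed(1); a <- sample(10)
set.seed(1); b <- sample(10)
print(identical(a, b))
```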
