Ding R intro cn

(1)

R

2.3.0 (2006-04-24) 0.1 (2006-06-15)

W. N. Venables, D. M. Smith

(2)

:

Copyright c°1990 W. N. Venables

Copyright c°1992 W. N. Venables & D. M. Smith Copyright c°1997 R. Gentleman & R. Ihaka Copyright c°1997, 1998 M. Maechler

Copyright c°1999 2006 R Development Core Team

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another

language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Development Core Team.

c

°1990 W. N. Venables c

°1992 W. N. Venables & D. M. Smith c

°1997 R. Gentleman & R. Ihaka c

°1997, 1998 M. Maechler c

°1999 2006 R Development Core Team

R R Development Core Team

GNU FDL GNU

(3)

0.2 ( 05 ) . . . viii

0.3 ( PDF ) . . . x

1 1 1.1 R . . . 1

1.2 . . . 1

1.3 R . . . 2

1.4 R . . . 2

1.5 R . . . 3

1.6 R . . . 4

1.7 . . . 4

1.8 R . . . 5

1.9 . . . 6

1.10 . . . 6

1.11 . . . 6

2 8 2.1 . . . 8

2.2 . . . 9

2.3 . . . 10

2.4 . . . 11

2.5 . . . 12

2.6 . . . 13

2.7 . . . 13

(4)

(5)

(6)

(7)

13 100

13.1 . . . 101

13.2 CRAN . . . 101

13.3 . . . 101

1 102

2 R 108

13.4 R. . . 108

13.5 Windows R. . . 112

13.6 Mac OS X R . . . 112

3 114

13.7 . . . 114

13.8 . . . 114

13.9 . . . 115

4 116

5 118

(8)

Bill Venables David Smith R

[email protected] [email protected]

0.1

R A session < 102>

R R sessions

R

( < 83> )

R

http://cran.r-project.org/mirrors.html

“Precompiled Binary Distributions”

Windows “Windows (95 and later)” “base”

“rwxxxx.exe” rw2010.exe Windows OK

A session < 102>

0.2 (

05 )

R R

(9)

R

ROTATION R

Stata MATLAB SPSS SAS R

R

• R

R MATLAB

• R

R R

•

• R / R

R R

R

http://www.r-project.org/

foundation/about.html

Texinfo Texinfo PDF

HTML ( UTF-8)

Texinfo LaTeX PDF

email

PDF R

R An Introduction to R

R Data Import/Export The R language definition Writing R

(10)

R

R for beginners XF Wang

R for beginners R : )

(11)

Roation R R LA

TEX R

Email

Email [email protected]

(12)

R

•

• ( ‘S’)

S

, “ ” environment R

, ,

R

packages R

1.2

R Bell Laboratories Rick Becker John Chambers

Allan Wilks S S SPLUS

S John Chambers

(13)

Wilks The New SLanguage: A Programming Environment for Data Analysis and Graphics John M. Chambers and Trevor J. Hastie Statistical Models in S

1991 S 3 1

methods method

class John M. Chambers Programming with Data

< 122> .

: John M. Chambers 1988 S3

2

(14)

R

1.5 R

R

> UNIX shell

R

UNIX shell $

UNIX R

1. work R

R

$ mkdir work $ cd work

2. R

$ R

3. R ( )

4. R

> q()

R

yes no,

cancel

R R

R

1. work

$ cd work $ R

(15)

Windows R

’ Search Engine & Keywords

(16)

help.search 5 ?help.search

> example(topic) # topic barplot

Windows R

6

> ?help

1.8 R

R expression language

A a R

R ( locale )

. ( ) .

.

expressions assignments

evaluate

(;) ({ })

compound expression

7

(#)

R

+

continuation prompt

5

?

6

: BioConductor Vignettes .

(17)

1.9

Emacs Speaks Statistics) R : R

1.10

Vi UltraEdit UltraEdit R http://

www.ultraedit.com/index.php?name=Content&pa=showpage&pid=40 R wordfile “R Scripting

- 02/18/2003” UltraEdit wordfile.txt

.R

(18)

R (

) R ( ls())

> objects()

R

workspace

rm

> rm(x, y, z, ink, junk, temp, foo, bar)

R R

R

.RData10 .Rhistory

R

x y

10

(19)

R data structure vector

x 10.4 5.6 3.1 6.4 21.7 R

> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

c() c()

1

R 1

(<-) <(“ ”) -(“ ”)

2

‘ ’

=

assign()

> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

<-> c(10.4, 5.6, 3.1, 6.4, 21.7) -<-> x

3

1

c() list <

35>

2

‘->’ ‘<-’ ’‘-<’ ‘>-’

3

(20)

> 1/x

( x )

> y <- c(x, 0, x)

11 y x 0

2.2

recycled ( )

> v <- 2*x + y + 1

11 v 2*x 2.2 y 1 11

+ - * / ^

log exp sin cos tan sqrt

max min range

2 c(min(x), max(x)) length(x) x sum(x)

x prod(x)

mean(x) sum(x)/length(x) var(x) var(x)

> sum((x-mean(x))^2)/(length(x)-1)

var() n p p

-p p

sort(x) x

( order() sort.list() )

max min

parallel

(21)

“ ” R

> sqrt(-17)

NaN

> sqrt(-17+0i)

2.3

R 1:30 c(1, 2, ..., 29, 30)

R 2*1:15 c(2, 4, ..., 28, 30)

n <- 10 1:n-1 1:(n-1)

30:1 construction

seq()

seq(2,10) 2:10

seq() R

from=value to=value seq(1,30) seq(from=1, to=30) seq(to=30, from=1) 1:30

seq() by=value length=value

by=1 1

> seq(-5, 5, by=.2) -> s3

c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0) s3

> s4 <- seq(length=51, from=-5, by=.2)

(22)

along=vector 4

1, 2, ..., length(vector) vector

rep()

> s5 <- rep(x, times=5)

x x s5

> s6 <- rep(x, each=5)

x

2.4

R

TRUE FALSE NA (“ ”, ) T F

T F TRUE FALSE reserved

word TRUE

FALSE

conditions

> temp <- x > 13

,temp x FALSE x

TRUE

R < <= > >= ==

!= c1 c2 c1 & c2

(“ ”) c1 | c2 (“ ”) !c1 c1

FALSE

0 TRUE 1

4

(23)

2.5

“

” not available “ ” missing value

NA5 NA NA

is.na(x) x TRUE x

NA

> z <- c(1:3,NA); ind <- is.na(z)

x == NA is.na(x) NA

, x == NA x

NA

“ ” Not a Number NaN

> 0/0

> Inf - Inf

NaN

NA NaN is.na(xx) TRUE is.nan(xx)

NaN TRUE

<NA> 6

> a<-c("a","b",NA) > a

[1] "a" "b" NA > print(a,quote=F) [1] a b <NA>

5

0.01β , PDF

(24)

2.6

R

"x-values" "New iteration results"

(") (’)

( ) C escape

sequences \ \ \ "

\" \n \t \b

c()

paste()

sep=string string

> labs <- paste(c("X","Y"), 1:10, sep="")

labs

c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")

c("X", "Y") 5

1:10 7

2.7

subset

1.

TRUE FALSE

> y <- x[!is.na(x)]

7

paste(..., collapse=ss) ss R

(25)

( ) x y x

y x

> (x+1)[(!is.na(x)) & x>0] -> z

z x+1 x

2. {1, 2, . . . ,length(x)}

x[6] x

> x[1:10]

x 10 ( length(x) 10)

> c("x","y")[rep(c(1,2,2,1), times=4)]

16 "x", "y", "y", "x" 4

3. 8

> y <- x[-(1:5)]

x y

4. names

> fruit <- c(5, 10, 1, 20)

> names(fruit) <- c("orange", "banana", "apple", "peach") > lunch <- fruit[c("apple","orange")]

name indices numeric indices

data frames

(26)

vector[indexvector]

> x[is.na(x)] <- 0

0 x

> y[y < 0] <- -y[y < 0]

> y <- abs(y)

2.8

R

• matrix array

< 23>

• factor < 20>.

• list general form

< 34>.

• data frame

‘ ’

< 36>

• function R R

(27)

(28)

> z <- 0:9

> digits <- as.character(z)

digits c("0", "1", "2", ..., "9")

> d <- as.integer(digits)

d z 4 as.something()

3.2

“ ”

> e <- numeric()

e character()

5

> e[3] <- 17

3 e( NA)

scan() (scan() < 40>.)

alpha

10

> alpha <- alpha[2 * 1:5]

5 (

)

> length(alpha) <- 3

4

roundoff

(29)

3.3

> attr(z, "dim") <- c(10,10)

(30)

(31)

( < 69>)

4.1

30 1

state

> state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")

“ ” 2

factor()

> statef <- factor(state)

print()

> statef

[1] tas sa qld nsw nsw nt wa wa qld vic nsw vic qld qld sa [16] tas sa nt wa vic qld nsw nsw wa sa act nsw vic vic act Levels: act nsw nt qld sa tas vic wa

levels() levels

> levels(statef)

[1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"

1

8 Australian Capital Territory, New South Wales, the Northern Territory, Queensland, South Australia, Tasmania, Victoria Western Australia

2

(32)

4.2 tapply()

> incmeans <- tapply(incomes, statef, mean)

act nsw nt qld sa tas vic wa

44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250

tapply() mean() 3 statef

incomes

standard error 4

R var()

> stderr <- function(x) sqrt(var(x)/length(x))

( < 54>

R sd()5)

> incster <- tapply(incomes, statef, stderr)

3

(33)

> incster

act nsw nt qld sa tas vic wa

1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575

95%

tapply() length() t

-qt()( R t- )

tapply()

( )

ragged array

4.3

factor

ordered() ordered() factor

(34)

R

dimension vector k

k- 2- 1

dim R z 1500

> dim(z) <- c(3,5,100)

dim 3 5 100

matrix() array() array()

< 26> .

data vector FORTRAN

“ ”

a c(3,4,2) a 3×4×2 = 24 a[1,1,1],

a[2,1,1], ..., a[2,4,2], a[3,4,2]

( )

5.2

(35)

c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1], a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])

a[,,] a

Z dim(Z) (

)

5.3

4 5 X

• X[1,3],X[2,2] X[3,1]

• X 0

(36)

> x <- array(1:20, dim=c(4,5)) # 4 5

> i <- array(c(1:3,3:1), dim=c(3,2))

> i # i 3 2

blocks(b ) varieties

(v ) n

> Xb <- matrix(0, n, b) > Xv <- matrix(0, n, v) > ib <- cbind(1:n, blocks) > iv <- cbind(1:n, varieties) > Xb[ib] <- 1

> Xv[iv] <- 1

> X <- cbind(Xb, Xv)

N

> N <- crossprod(Xb, Xv)

table()

(37)

5.4 array()

dim array

> Z <- array(data vector, dim vector)

h 24

> Z <- array(h, dim=c(3,4,2))

h Z 3 4 2 h 24

> dim(Z) <- c(3,4,2)

h 24 24 (

< 26>)

> Z <- array(0, c(3,4,2))

Z 0

dim(Z) c(3,4,2) Z[1:24] h

Z[] Z

dim A,B C

> D <- 2*A*B + C + 1

D

5.4.1

•

(38)

• dim

(39)

> d <- outer(0:9, 0:9)

> fr <- table(outer(d, d, "-"))

> plot(as.numeric(names(fr)), fr, type="h", xlab="Determinant", ylab="Frequency")

names “

” for < 52>

20 2

5.6

aperm(a, perm) a perm {1, . . . , k}

k a a

perm[j] j

A ( ) B

> B <- aperm(A, c(2,1))

A t()

B <- t(A)

5.7

R t(X)

nrow(A) ncol(A) A

5.7.1

%*% n 1 1 n

n

A B

> A * B

2

(40)

> A %*% B

x

> x %*% A %*% x

quadratic form 3

crossprod() “ ” crossproduct crossprod(X,

y) t(X) %*% y crossprod()

By=x squared length

“Even better would be to form a matrix square rootB withA=BB′ _{and find the squared length}

(41)

5.7.3

eigen(Sm) Sm

values vectors

> ev <- eigen(Sm)

ev ev$val Sm ev$vec

> evals <- eigen(Sm)$values

evals

> eigen(Sm)

> evals <- eigen(Sm, only.values = TRUE)$values

5.7.4

svd(M) M , M

M U M V

D M = U %*% D %*% t(V) D

svd(M) d,u v

M

> absdetM <- prod(svd(M)$d)

M

R

> absdet <- function(M) prod(svd(M)$d)

, absdet() R

trace tr() [ : ,

diag()]

(42)

5.7.5 QR

> b <- qr.coef(Xplus, y) > fit <- qr.fitted(Xplus, y) > res <- qr.resid(Xplus, y)

(43)

rbind()

X1 X2 X

1

> X <- cbind(1, X1, X2)

rbind() cbind() cbind(x) rbind(x)

x

5.9 c()

cbind() rbind() dim c()

dim dimnames

as.vector()

> vec <- as.vector(X)

c()

> vec <- c(X)

( )

5.10

table() k

k

-statef

> statefr <- table(statef)

statefr

(44)

incomef “ ”

cut()

> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef

> table(incomef,statef) statef

incomef act nsw nt qld sa tas vic wa

(35,45] 1 1 0 1 0 0 1 0

(45,55] 1 1 1 1 2 0 1 3

(55,65] 0 3 1 3 2 2 2 1

(45)

R list components

> Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))

numbered

Lst Lst[[1]],Lst[[2]],Lst[[3]] Lst[[4]]

Lst[[4]] Lst[[4]][1]

Lst length(Lst) (

)

> name$component name

Lst$name Lst[[1]] "Fred",

Lst$wife Lst[[2]] "Mary",

(46)

Lst$name Lst[["name"]]

> x <- "name"; Lst[[x]]

Lst[[1]] Lst[1] [[. . .]]

[. . .] Lst

, named list

1

Lst

2

Lst$coefficients Lst$coe Lst$covariance Lst$cov

(names)

6.2

list()

> Lst <- list(name 1=object 1, . . ., name m=object m)

m Lst object 1, . . . , object m

> Lst[5] <- list(matrix=Mat)

6.2.1

c()

> list.ABC <- c(list.A, list.B, list.C)

1

: ( ).

(47)

dim

6.3

data frame "data.frame"

• ( , , ) ;

• ;

•

.

6.3.1

data.frame ( )

> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)

as.data.frame()

read.table() < 39>

6.3.2 attach() detach()

$ accountants$statef

attach()

lentils lentils$u,lentils$v,lentils$w

(48)

(49)

•

x, y z

6.3.4

attach()

"list"

> attach(any.old.list)

detach

6.3.5

search

( )

> search()

[1] ".GlobalEnv" "Autoloads" "package:base"

.GlobalEnv 7

lentils

> search()

[1] ".GlobalEnv" "lentils" "Autoloads" "package:base" > ls(2)

[1] "u" "v" "w"

ls( objects)

> detach("lentils") > search()

[1] ".GlobalEnv" "Autoloads" "package:base"

7

(50)

Perl1 R

read.table() scan()2,

R R / 3

7.1 read.table()

•

4

1

UNIX Sed Awk

2

read.table() scan() Excel

3

R

4

(51)

Price Floor Area Rooms Age Cent.heat

01 52.00 111.0 830 5 6.2 no

02 54.75 128.0 710 5 7.5 no

03 57.50 101.0 1000 5 4.2 no

04 57.50 131.0 690 6 8.8 no

05 59.75 93.0 900 5 1.9 yes

...

( ) ,

Cent.heat read.table()

> HousePrice <- read.table("houses.data")

Price Floor Area Rooms Age Cent.heat

52.00 111.0 830 5 6.2 no

54.75 128.0 710 5 7.5 no

57.50 101.0 1000 5 4.2 no

57.50 131.0 690 6 8.8 no

59.75 93.0 900 5 1.9 yes

...

> HousePrice <- read.table("houses.data", header=TRUE)

header=TRUE

7.2 scan()

input.dat scan()

(52)

dummy list structure

inp

> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]

> inp <- scan("input.dat", list(id="", x=0, y=0))

> label <- inp$id; x <- inp$x; y <- inp$y

2 ( < 38>)

> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)

R

7.3

R 100 ( datasets ) ( R

)

data()

R 2.0.0 R

data R

data(infert)

( )

(53)

7.3.1 R

package

data(package="rpart")

data(Puromycin, package="datasets")

library R

7.4

edit R

> xnew <- edit(xold)

xold xnew

xold fix(xold) xold <- edit(xold)

(54)

R R P(X ≤

x) ( q P(X ≤x)> q x

)

R

beta shape1, shape2, ncp

binom size, prob

Cauchy cauchy location, scale

chisq df, ncp

exp rate

F f df1, df1, ncp

gamma shape, scale

geom prob

hyper m, n, k

lnorm meanlog, sdlog

logistic logis location, scale

nbinom size, prob

norm mean, sd

Poisson pois lambda

t t df, ncp

unif min, max

Weibull weibull shape, scale

Wilcoxon wilcox m, n

d p

(55)

(random deviates) dxxx x pxxx q qxxx

p rxxx n (rhyper rwilcox nn)

non-centrality parameter ncp

1

pxxx qxxx lower.tail log.p, dxxx

log

(“ ”) hazard H(t) =−log(1−F(t))

- pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)

(dxxx(..., log = TRUE))

ptukey qtukey studentized

range

> ## t p

> 2*pt(-2.43, df = 13) > ## F(2, 7) 1% > qf(0.99, 2, 7)

8.2

summary fivenum stem

(“ ” )

1

(56)

> attach(faithful) > summary(eruptions)

Min. 1st Qu. Median Mean 3rd Qu. Max.

1.600 2.163 4.000 3.488 4.454 5.100

> fivenum(eruptions)

[1] 1.6000 2.1585 4.0000 4.4585 5.1000 > stem(eruptions)

The decimal point is 1 digit(s) to the left of the |

16 | 070355555588

(57)

)

> long <- eruptions[eruptions > 3]

> plot(ecdf(long), do.points=FALSE, verticals=TRUE) > x <- seq(3, 5.4, 0.01)

> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)

(58)

3.0 3.5 4.0 4.5 5.0

0.0

0.2

0.4

0.6

0.8

1.0

ecdf(long)

x

Fn(x)

−2 −1 0 1 2

3.0

3.5

4.0

4.5

5.0

Normal Q−Q Plot

Theoretical Quantiles

Sample Quantiles

QQ ( )

Q-Q

qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn") qqline(x)

(59)

> shapiro.test(long)

Shapiro-Wilk normality test

data: long

W = 0.9793, p-value = 0.01052

Kolmogorov-Smirnov

> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))

One-sample Kolmogorov-Smirnov test

data: long

D = 0.0661, p-value = 0.4284 alternative hypothesis: two.sided

( distribution theory

)

8.3

R “ ” stats

latent heat (cal/gm) Rice (1995, p.490)

Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02

Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

boxplot

A <- scan()

79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02

B <- scan()

80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97

(60)

1 2

79.94

79.96

79.98

80.00

80.02

80.04

t

-> t.test(A, B)

Welch Two Sample t-test

data: A and B

t = 3.2499, df = 12.027, p-value = 0.00694

alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval:

0.01385526 0.07018320 sample estimates: mean of x mean of y

80.02077 79.97875

3

R

SPLUS t.test

F

3

(61)

> var.test(A, B)

F test to compare two variances

data: A and B

F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938

alternative hypothesis: true ratio of variances is not equal to 1 95 percent confidence interval:

alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval:

Wilcoxon rank sum test with continuity correction

data: A and B

W = 89, p-value = 0.007497

alternative hypothesis: true mu is not equal to 0

Warning message:

(62)

, ( )

> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B)) > plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)

qqplot Q-Q

Kolmogorov-Smirnov Kolmogorov-Smirnov

> ks.test(A, B)

Two-sample Kolmogorov-Smirnov test

data: A and B

D = 0.5962, p-value = 0.05919 alternative hypothesis: two.sided

Warning message:

(63)

R expression language

{expr

1; . . .; expr m}

9.2

9.2.1 if

R

> if (expr1) expr2 else expr3

expr1

“ ” short-circuit && || if

& | 1 && || 1

2

R if/else ifelse ifelse(condition,

a, b) condition[i]

a[i] b[i]

1

: .

2

(64)

9.2.2 for repeat while

R for

> for (name in expr 1) expr 2

name expr 1 ( 1:20 )

expr 2 name name expr 1

expr 2

ind vector of class indicators

y x coplot()3

> xc <- split(x, ind) > yc <- split(y, ind) > for (i in 1:length(yc)) {

> while (condition) expr

break repeat

(65)

R

R mean(), var(),

postscript() R

> name <- function(arg 1, arg 2, ...) expression

expression R ( ) argi

name(expr1, expr2, ...)

10.1

t- “

”

> twosam <- function(y1, y2) {

n1 <- length(y1); n2 <- length(y2) yb1 <- mean(y1); yb2 <- mean(y2) s1 <- var(y1); s2 <- var(y2) s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2) tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2)) tst

}

(66)

-> tstat <- twosam(data$male, data$female); tstat

Matlab y X

( ) qr()

n 1 y n p X X\y (X′_X₎−_X′_y_,

(X′_X₎− _X′_X _{generalized inverse}

> bslash <- function(X, y) { X <- qr(X)

qr.coef(X, y) }

> regcoeff <- bslash(Xmat, yvar)

R lsfit() 1

qr() qr.coef()

binary operator

10.2

bslash()

%anything%

binary operator

!

> "%!%" <- function(X, y) { ... }

( ) X %!% y (

)

%*% %o%

1

(67)

10.3

< 10> “name=object”

fun1

> fun1 <- function(data, data.frame, graph, limit) {

[ ]

}

> ans <- fun1(d, df, TRUE, 20)

> ans <- fun1(d, df, graph=TRUE, limit=20)

> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)

fun1

> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }

> ans <- fun1(d, df)

> ans <- fun1(d, df, limit=10)

10.4 . . .

par() plot()

par() ( par() < 90> par()

(68)

fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {

[ ]

if (graph)

par(pch="*", ...)

[ ]

}

10.5

X <- qr(X)

R Evalution frame

“ ” superassignment

<<- assign() help SPLUS

<<- R semantics <

60>

10.6

10.6.1

( < 24> )

block design blocks(b ) varieties (v

) R K v v b b replications block

size N b v incidence matrix

(69)

> bdeff <- function(blocks, varieties) {

blocks <- as.factor(blocks) # minor safety move b <- length(levels(blocks))

varieties <- as.factor(varieties) # minor safety move v <- length(levels(varieties))

K <- as.vector(table(blocks)) # dim

R <- as.vector(table(varieties)) # dim N <- table(blocks, varieties)

A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v)) sv <- svd(A)

list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v) }

block and variety canonical contrasts

10.6.2

dimnames R

dimnames X

> temp <- X

(70)

> no.dimnames(X)

pattern

10.6.3

evaluation frame 2

one-panel trapezium rule (two panel)

3

,

R

2

“that such functions, or indeed variables, are not inherited by called functions in higher evaluation frames as they would be if they were on the search path.”

3

(71)

(72)

cube <- function(n) {

(73)

open.account <- function(total) { list(

deposit = function(amount) { if(amount <= 0)

stop("Deposits must be positive!\n") total <<- total + amount

cat(amount, "deposited. Your balance is", total, "\n\n") },

withdraw = function(amount) { if(amount > total)

stop("You don’t have that much money!\n") total <<- total - amount

cat(amount, "withdrawn. Your balance is", total, "\n\n") },

balance = function() {

(74)

.Rprofile R

> .First <- function() {

options(prompt="$ ", continue="+\t") # $ options(digits=5, length=999) #

> .Last <- function() {

(75)

[ [[<- any as.matrix [<- mean plot summary

methods()

> methods(class="data.frame")

plot() "data.frame" "density" "factor"

methods()

> methods(plot)

> coef

function (object, ...) UseMethod("coef")

UseMethod

methods()

> methods(coef)

[1] coef.aov* coef.Arima* coef.default* coef.listof*

[5] coef.nls* coef.summary.nls*

Non-visible functions are asterisked

8

: ( coef.aov), R ‘‘Error: object

(76)

> getAnywhere("coef.aov")

A single object matching ’coef.aov’ was found It was found in the following places

registered S3 method for coef from namespace stats namespace:stats

with value

function (object, ...) {

z <- object$coef z[!is.na(z)] }

> getS3method("coef", "aov") function (object, ...) {

z <- object$coef z[!is.na(z)] }

(77)

1

R

11.1

yi= p

X

j=0

βjxij +ei, ei ∼NID(0, σ2), i= 1, . . . , n

y=Xβ+e

y X model matrix design

ma-trix X x0, x1, . . . , xp determining variable x0

1 intercept

y,x,x0,x1, x2, . . . X A,B,C, . . .

1

(78)

y ~ x

(79)

y ~ A*B + Error(C) A B C

formula operators Glim Genstat Wilkinson

& Rogers (notation) . R :,

R

( Chambers & Hastie, 1992, p.29):

Y ~ M Y M

M 1 + M 2 M 1 M 2

M 1 - M 2 M 1 M 2

M 1 : M 2 M 1 M 2 tensor product

“ ” (subclasses factor)4 3

: .

(80)

M 1 %in% M 2 M 1:M 2

M 1 * M 2 M 1 + M 2 + M 1:M 2.

M 1 / M 2 M 1 + M 2 %in% M1.

M^n M n “ ” 5

I(M) M M

I() identity function

11.1.1

(

1 )

k- A

2 . . . k k−1 (

) k−1

1, . . . , k orthogonal polynomial

k 6

options contrasts R

options(contrasts = c("contr.treatment", "contr.poly"))

R S S Helmert

SPLUS

options(contrasts = c("contr.helmert", "contr.poly"))

treatment contrast (R )

5

: “All terms inM together with “interactions” up to ordern”

6

(81)

contrasts C

7

R

11.2

multiple model lm()

> fitted.model <- lm(formula, data = data.frame)

> fm2 <- lm(y ~ x1 + x2, data = production)

y x1 x2 ( )

( ) data = production

production production

11.3

lm() "lm"

"lm"

add1 coef effects kappa predict residuals

alias deviance family labels print step

anova drop1 formula plot proj summary

7

(82)

anova(object 1, object 2)

coef(object) ( )

coefficients(object).

deviance(object) formula(object) plot(object)

8

predict(object, newdata=data.frame)

data.frame

predict.gam(object,

newdata=data.frame) predict.gam()

predict() lm, glm

gam

print(object)

9

residuals(object) ( )

resid(object) step(object)

AIC (Akaike )

summary(object)

8

9

(83)

11.4

response mean.formula + Error(strata.formula)

strata.formula strata.formula

> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)

v + n*p*k “

“ Note also that the analysis of variance table (or tables) are for a sequence of fitted models.”

11

(84)

11.5

update()

> new.model <- update(old.model, new.formula)

new.formula . “

”

> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production) > fm6 <- update(fm05, . ~ . + x6)

> smf6 <- update(fm6, sqrt(.) ~ .)

production

data=

update()

.

> fmfull <- lm(y ~ . , data = production)

y production

add1(), drop1() step()

11.6

12

• y stimulus variable x1, x2, . . .

12

(85)

• y

• µ smooth invertible function

µ=m(η), η=m−1

(µ) =ℓ(µ)

(inverse function)ℓ() link function

(estimation and inference)

(86)

Gamma identity,inverse,log

inverse.aussian 1/mu^2,identity,inverse,log

poisson identity,log,sqrt

quasi logit,probit,cloglog,identity,

inverse,log,1/mu^2,sqrt

family

11.6.2 glm()

R glm()

> fitted.model <- glm(formula, family=family.generator, data=data.frame)

lm() family.generator

13

< 74>

“ ”

quasi

gaussian

> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)

> fm <- lm(y ~ x1+x2, data=sales)

13

(87)

Silvey (1970) Kalythos Aegean

Age: 20 35 45 55 70

No.: tested: 50 50 50 50 50

No.: blind: 6 17 26 37 44

logistic probit

LD50 50%

y n x

y∼B(n, F(β0+β1x))

probit F(z) = Φ(z) logit (

) F(z) =ez/(1 +ez)

LD50 =−β0/β1

0

,

> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y = c(6,17,26,37,44))

glm()

• binary 0/1

•

(88)

> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)

> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos) > fml <- glm(Ymat ~ x, family = binomial, data = kalythos)

logit

> summary(fmp) > summary(fml)

LD50

> ld50 <- function(b) -b[1]/b[2]

> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)

43.663 43.601

Poisson

Poisson log

Poisson

-Poisson

Poisson

> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt), data = worm.counts)

multiplier poisson Var[y] =µ

(89)

y= θ1z1

optim() nlm() nlminb() R2.2.0 SPLUS ms()

nlminb() lack-of-fit

Bates & Watts (1988) 51

(90)

> plot(x, y)

> xfit <- seq(.02, 1.1, .05) > yfit <- 200 * xfit/(0.1 + xfit) > lines(spline(xfit, yfit))

200 0.1

> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

out$minimum SSE out$estimate

(SE)

> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))

2 95% ±1.96 SE

> plot(x, y)

> xfit <- seq(.02, 1.1, .05)

> yfit <- 212.68384222 * xfit/(0.06412146 + xfit) > lines(spline(xfit, yfit))

stats

(91)

> df <- data.frame(x=x, y=y)

> fit <- nls(y ~ SSmicmen(x, Vm, K), df) > fit

Nonlinear regression model model: y ~ SSmicmen(x, Vm, K)

data: df

Vm K

212.68370711 0.06412123

residual sum-of-squares: 1195.449 > summary(fit)

Formula: y ~ SSmicmen(x, Vm, K)

Parameters:

Estimate Std. Error t value Pr(>|t|) Vm 2.127e+02 6.947e+00 30.615 3.24e-11 K 6.412e-02 8.281e-03 7.743 1.57e-05

Residual standard error: 10.93 on 10 degrees of freedom

Correlation of Parameter Estimates: Vm

K 0.7651

11.7.2

Maximum likelihood

Dobson (1990), pp. : 108–111

logistic glm()

> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)

> y <- c( 6, 13, 18, 28, 52, 53, 61, 60) > n <- c(59, 60, 62, 56, 63, 59, 62, 60)

> fn <- function(p)

(92)

> out <- nlm(fn, p = c(-50,20), hessian = TRUE)

out$minimum out$estimate

> sqrt(diag(solve(out$hessian)))

95% ±1.96 SE

11.8

R

• Mixed models nlme lme() nlme()

mixed-effects models

• (Local approximating regressions) loess()

loess projection pursuit regression

stats

• (Robust regression)

MASS lqs

MASS rlm

• (Additive models)

smooth additive function

acepack avas ace mda

bruto mars gam mgcv

(93)

tree() plot() text()

(94)

R device driver graphics window

UNIX X11() Windows windows()

R

•

R

‘ ’ grid

grid

lattice S Trellis multi-panel

plot

12.1

(95)

12.1.1 plot()

R plot()

plot(x, y)

plot(xy) x y plot(x, y) y x

x y

( )

plot(x) x x

x

plot(f)

plot(f, y) f y f

y f plot(df)

plot(~ expr)

plot(y ~ expr) df y expr ‘+’

( a + b + c)

(

) y expr

12.1.2

R X

> pairs(X)

X pairwise scatterplot matrix

X X n(n−1)

coplot a b

c ( )

(96)

c a b c a

c b c

conditioning intervals c a b a

b coplot() given.values=

: “ This automatic simultaneous scaling feature is also useful when the xi’s are

(97)

(98)

main=string

> plot(x, y, type="n"); text(x, y, names)

(99)

legend(x, y, legend, ...) legend

legend

v ( legend

)

legend( , fill=v)

legend( , col=v)

legend( , lty=v)

legend( , lwd=v)

legend( , pch=v)

( )

title(main, sub) main

sub

axis(side, ...) (1 4

)

axes=FALSE plot()

( x y )

x y x y

locator()( )

12.2.1

R

expression text mtext axis

title

(100)

R

> help(plotmath) > example(plotmath) > demo(plotmath)

12.2.2 Hershey

text contour Hershey Hershey

• Hershey

• Hershey cyrillic

R Hershey

> help(Hershey) > demo(Hershey) > help(Japanese) > demo(Japanese)

12.3

R

locator() locator(n, type)

n ( 512)

type

locator()

x y

locator()

outlying point

(101)

( postscript locator() )

identify(x, y, labels) labels labels

x y( )

x y (x, y) identify()

> plot(x, y) > identify(x, y)

identify()

(

x/y ) identify()

labels ( ) plot = FALSE

identify()

x y

12.4

R

( ‘col’ ) ( )

12.4.1 : par()

par()

par(c("col", "lty")) (

(102)

par(col=4, lty=2) ( )

par()

“ ”

par() par()

R par()

2

> oldpar <- par(col=4, lty=2)

... ...

> par(oldpar)

3

> oldpar <- par(no.readonly=TRUE)

... ...

> par(oldpar)

12.4.2

( ) par()

> plot(x, y, pch="+")

par()

12.5

R par()

2

(103)

name=value name par()

> legend(locator(1), as.character(0:25), pch = 0:25)

(104)

(105)

(106)

mfg=c(3,2,3,2)

omi[1]

omi[4]

mfrow=c(3,2)

−−−−−−−−−−−−−−− −−−−−−−−−−−−−−− −−−−−−−−−−−−−−− −−−−−−−−−−−−−−− −−−−−−−−−−−−−−−

oma[3]

mfcol=c(3, 2) mfrow=c(2, 4)

mfcol mfrow

mfrow=c(3,2)

(par("cex") )

0.83 0.66

mfg=c(2, 2, 3, 2)

unequally-sized figures

fig=c(4, 9, 1, 4)/10

(107)

oma=c(2, 0, 3, 0)

omi=c(0, 0, 0.8, 0) mar mai

mtext() outer=TRUE oma omi

split.screen() layout()

4

grid lattice

12.6

R ( )

R

device driver R ( “draw a line,”)

help(Devices)

> postscript()

PostScript

X11() UNIX X11

windows() Windows

quartz() MacOS X

postscript() PostScript PostScript

pdf() PDF PDF

png() PNG ( )

jpeg() JPEG image (

)

> dev.off()

(108)

( )

> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)

(109)

...

dev.print(device, ..., which=k) k device

postscript

R xgobi XGobi UNIX Windows X

R

6

(110)

XGobi GGobi 7

http://www.ggobi.

org

7

ggobi Rggobi R R BioConductor

Ggobi Ggobi Rggobi http://www.ggobi.org/

downloads/ Ggobi R Rggobi Ggobi

(111)

)

R

> library()

( boot Davison

& Hinkley (1997))

> library(boot)

CRAN.packages() ( Windows RAqua

Packages )

> search()

( <

101>)

> help.start()

(112)

13.1

( ) R R

R

13.2 CRAN

R

(

) R CRAN (http://CRAN.R-project.

org/ ) Bioconductor (http://www.bioconductor.

org/) R

13.3

namespaces

datasets

t() R t

::

base::t base

::: R

getAnywhere()

(113)

1

R

$ R

R2 R

( R )

help.start()

HTML ( )

x <- rnorm(50) y <- rnorm(x)

x y

plot(x, y)

ls()

R

rm(x, y)

1

: Linux Windows

2

(114)

( )

x <- 1:20

x= (1,2, . . . ,20)

w <- 1 + sqrt(x)/2

‘ ’

dummy <- data.frame(x=x, y= x + rnorm(x)*w) dummy

x y

fm <- lm(y ~ x, data=dummy) summary(fm)

y x

fm1 <- lm(y ~ x, data=dummy, weight=1/w^2) summary(fm1)

attach(dummy)

lrf <- lowess(x, y)

plot(x, y)

lines(x, lrf$y)

abline(0, 1, lty=3)

(115)

abline(coef(fm))

abline(coef(fm1), col = "red")

detach()

plot(fitted(fm), resid(fm), xlab="Fitted values", ylab="Residuals",

main="Residuals vs Fitted")

heteroscedasticity

qqnorm(resid(fm), main="Residuals Rankit Plot")

skewness kurtosis outlier

rm(fm, fm1, lrf, x, dummy)

Michaelson Morley

morley read.table

filepath <- system.file("data", "morley.tab" , package="datasets") filepath

file.show(filepath)

(116)

Michaelson Morley (Expt

) 20 (Run ) sl

mm$Expt <- factor(mm$Expt) mm$Run <- factor(mm$Run)

Expt Run

attach(mm)

3 ( )

plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.")

fm <- aov(Speed ~ Run + Expt, data=mm) summary(fm)

‘runs’ ‘experiments’

fm0 <- update(fm, . ~ . - Run) anova(fm0, fm)

‘runs’

detach() rm(fm, fm0)

x <- seq(-pi, pi, len=50) y <- x

x −π≤x≤π 50 y

f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))

f x y cos(y)/(1 +x2)

(117)

“ ”

contour(x, y, f)

contour(x, y, f, nlevels=15, add=TRUE)

f

fa <- (f-t(f))/2

fa f “ ”(t() )

contour(x, y, fa, nlevels=15)

. . .

par(oldpar)

. . .

image(x, y, f) image(x, y, fa)

( ) . . .

objects(); rm(x, y, f, fa)

. . . R

th <- seq(-pi, pi, len=100) z <- exp(1i*th)

1i i

par(pty="s") plot(z, type="l")

w <- rnorm(100) + rnorm(100)*1i

. . .

w <- ifelse(Mod(w) > 1, 1/w, w)

(118)

plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+",xlab="x", ylab="y") lines(z)

w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)

plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y") lines(z)

rm(th, w, z)

q()

(119)

13.4 R

UNIX Windows R R

R [options] [<infile] [>outfile],

R CMD “ ” ( R

) R

UNIX TMPDIR

R

( Startup

Windows-)

• --no-environ R

R ENVIRON

$R HOME/etc/Renviron.site ( )

.Renviron name=value (help(Startup)

) R PAPERSIZE(

/ ) R PRINTCMD ( ) R LIBS ( R

)

• R --no-site-file

R PROFILE

$R HOME/etc/Rprofile.site

• --no-init-file R

(120)

• .RData ( --no-restore --no-restore-data)

--vanilla --no-save, --no-environ --no-site-file,

(121)

--no-readline ( UNIX) readline

Emacs ESS (“Emacs Speaks Statistics”) R

< 114>

--ess ( Windows) R-inferior-mode ESS

Rterm --min-vsize=N

--max-vsize=N “ ” vector heap N

N ‘Giga’

(2ˆ30) ‘Mega’ (2ˆ20) ‘Kilo’ (2ˆ10 ) ‘kilo’

(1000 ) G M K k

--min-nsize=N

--max-nsize=N “cons ” cons cells N

(122)

(123)

13.5 Windows

R

Windows R ( cmd.exe command.com

) R.exe Rterm.exe

C:\Documents and Settings\username\My Documents )

HOMEDRIVE HOMEPATH ( Windows NT/2000/XP

)

--debug Rgui “Break to debugger”

Windows R CMD *.bat *.exe

R HOME R VERSION R CMD R OSTYPE PATH PERL5LIB

TEXINPUTS latex.exe

R CMD latex.exe mydoc

mydoc.tex LA

TEX R share/texmf TEXINPUTS

13.6 Mac OS X

R

Mac OS X R Terminal.app R

(124)

Applications Mac OS X

(125)

13.7

UNIX GNU readline R UNIX

R

UNIX GNOME

--no-readline ( ESS 3

)

Windows R GUI Help

Console Rterm.exe README.Rterm

readline R

Meta character

Control-m CTRL m C-m Meta-b

META4 b M-b META

ESC M-b

ESC b ESC

13.8

R

Emacs-vi

M-i M-a ESC

RET

3

‘Emacs Speaks Statistics’ URLhttp://ESS.R-project.org 4

(126)

13.9

C-p ( )

C-n ( )

C-r text text

C-p C-n

C-a C-e M-b M-f C-b C-f

C-b C-f

text text

C-f text text

DEL ( )

C-d

M-d “ ”

C-k “ ”

C-y “ ”

C-t M-l M-c

RET R

(127)

(128)

(129)

(130)

(131)

(132)

wilcox.test . . . .50 windows . . . .96

(133)

D. M. Bates and D. G. Watts (1988), Nonlinear Regression Analysis and Its Applica-tions. John Wiley & Sons, New York.

Richard A. Becker, John M. Chambers and Allan R. Wilks (1988), The New S Lan-guage. Chapman & Hall, New York. This book is often called the “Blue Book”. John M. Chambers and Trevor J. Hastie eds. (1992),Statistical Models in S.Chapman & Hall, New York. This is also called the “White Book”.

John M. Chambers (1998)Programming with Data. Springer, New York. This is also called the “Green Book”.

A. C. Davison and D. V. Hinkley (1997), Bootstrap Methods and Their Applications, Cambridge University Press.

Annette J. Dobson (1990), An Introduction to Generalized Linear Models, Chapman and Hall, London.

Peter McCullagh and John A. Nelder (1989),Generalized Linear Models. Second edi-tion, Chapman and Hall, London.

John A. Rice (1995), Mathematical Statistics and Data Analysis. Second edition. Duxbury Press, Belmont, CA.

(134)