R
2.3.0 (2006-04-24) 0.1 (2006-06-15)
W. N. Venables, D. M. Smith
:
Copyright © 1990 W. N. Venables
Copyright © 1992 W. N. Venables & D. M. Smith
Copyright © 1997 R. Gentleman & R. Ihaka
Copyright © 1997, 1998 M. Maechler
Copyright © 1999–2006 R Development Core Team
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Permission is granted to copy and distribute translations of this manual into another
language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Development Core Team.
R R Development Core Team
GNU FDL GNU
Bill Venables David Smith R
0.1
R A session < 102>
R R sessions
R
R
( < 83> )
R
http://cran.r-project.org/mirrors.html
“Precompiled Binary Distributions”
Windows “Windows (95 and later)” “base”
“rwxxxx.exe” rw2010.exe Windows OK
A session < 102>
0.2
(
05
)
R R
R R
R
ROTATION R
Stata MATLAB SPSS SAS R
R
• R
R MATLAB
• R
R R
•
• R / R
R R
R
http://www.r-project.org/foundation/about.html
Texinfo Texinfo PDF
HTML ( UTF-8)
Texinfo LaTeX PDF
PDF R
R An Introduction to R
R Data Import/Export The R language definition Writing R
R
R for beginners XF Wang
R for beginners R : )
R
•
•
•
•
• ( ‘S’)
S
, “ ” environment R
, ,
R
packages R
1.2
R Bell Laboratories Rick Becker John Chambers
Allan Wilks S S SPLUS
S John Chambers
Wilks The New S Language: A Programming Environment for Data Analysis and Graphics John M. Chambers and Trevor J. Hastie Statistical Models in S
1991 S 3 1
methods method
class John M. Chambers Programming with Data
< 122> .
: John M. Chambers 1988 S3
2
R
1.5
R
R
> UNIX shell
R
UNIX shell $
UNIX R
1. work R
R
$ mkdir work
$ cd work
2. R
$ R
3. R ( )
4. R
> q()
R
yes no,
cancel
R R
R
1. work
$ cd work
$ R
Windows R
’ Search Engine & Keywords
help.search 5 ?help.search
> example(topic) # topic barplot
Windows R
6
> ?help
1.8
R
R expression language
A a R
R ( locale )
. ( ) .
.
expressions assignments
evaluate
(;) ({ })
compound expression
7
(#)
R
+
continuation prompt
5
?
6
: BioConductor Vignettes .
1.9
Emacs Speaks Statistics) R : R
1.10
Vi UltraEdit UltraEdit R http://www.ultraedit.com/index.php?name=Content&pa=showpage&pid=40 R wordfile “R Scripting - 02/18/2003” UltraEdit wordfile.txt
.R
R (
) R ( ls())
> objects()
R
workspace
rm
> rm(x, y, z, ink, junk, temp, foo, bar)
R R
R
.RData .Rhistory
R
R
x y
10
R data structure vector
x 10.4 5.6 3.1 6.4 21.7 R
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
c() c()
1
R 1
(<-) <(“ ”) -(“ ”)
2
‘ ’
=
assign()
> assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
3
1
c() list <
35>
2
‘->’ ‘<-’ ’‘-<’ ‘>-’
3
> 1/x
( x )
> y <- c(x, 0, x)
11 y x 0
2.2
recycled ( )
> v <- 2*x + y + 1
11 v 2*x 2.2 y 1 11
+ - * / ^
log exp sin cos tan sqrt
max min range
2 c(min(x), max(x)) length(x) x sum(x)
x prod(x)
mean(x) sum(x)/length(x) var(x) var(x)
> sum((x-mean(x))^2)/(length(x)-1)
var() n p p
-p p
sort(x) x
( order() sort.list() )
max min
parallel
“ ” R
> sqrt(-17)
NaN
> sqrt(-17+0i)
2.3
R 1:30 c(1, 2, ..., 29, 30)
R 2*1:15 c(2, 4, ..., 28, 30)
n <- 10 1:n-1 1:(n-1)
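The difference between `1:n-1` and `1:(n-1)` comes from the colon operator binding more tightly than subtraction. A minimal sketch, using the same `n <- 10` as the text (the names `a` and `b` are only illustrative):

```r
n <- 10

# The colon operator has higher precedence than "-",
# so this is (1:n) - 1, i.e. the values 0, 1, ..., 9.
a <- 1:n - 1

# Parentheses force the subtraction first: the sequence 1, 2, ..., 9.
b <- 1:(n - 1)

print(a)
print(b)
```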
30:1 construction
seq()
seq(2,10) 2:10
seq() R
from=value to=value seq(1,30) seq(from=1, to=30) seq(to=30, from=1) 1:30
seq() by=value length=value
by=1 1
> seq(-5, 5, by=.2) -> s3
c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0) s3
> s4 <- seq(length=51, from=-5, by=.2)
along=vector 4
1, 2, ..., length(vector) vector
rep()
> s5 <- rep(x, times=5)
x x s5
> s6 <- rep(x, each=5)
x
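The `seq()` and `rep()` calls above can be checked against each other. The sketch below assumes a short illustrative vector `x` (three of the values used earlier) and uses the full argument name `length.out` in place of the abbreviation `length`:

```r
x <- c(10.4, 5.6, 3.1)

# Two ways of building the same sequence -5.0, -4.8, ..., 5.0
s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(length.out = 51, from = -5, by = 0.2)

# times= repeats the whole vector end to end;
# each= repeats every element before moving to the next one.
s5 <- rep(x, times = 2)   # 10.4 5.6 3.1 10.4 5.6 3.1
s6 <- rep(x, each = 2)    # 10.4 10.4 5.6 5.6 3.1 3.1

print(isTRUE(all.equal(s3, s4)))
print(s5)
print(s6)
```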
2.4
R
TRUE FALSE NA (“ ”, ) T F
T F TRUE FALSE reserved
word TRUE
FALSE
conditions
> temp <- x > 13
,temp x FALSE x
TRUE
R < <= > >= ==
!= c1 c2 c1 & c2
(“ ”) c1 | c2 (“ ”) !c1 c1
FALSE
0 TRUE 1
4
2.5
“
” not available “ ” missing value
NA5 NA NA
is.na(x) x TRUE x
NA
> z <- c(1:3,NA); ind <- is.na(z)
x == NA is.na(x) NA
, x == NA x
NA
“ ” Not a Number NaN
> 0/0
> Inf - Inf
NaN
NA NaN is.na(xx) TRUE is.nan(xx)
NaN TRUE
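A short sketch of the distinction drawn above between `x == NA`, `is.na()` and `is.nan()`. The vector `xx` is an illustrative assumption; `z` is built as in the text:

```r
z <- c(1:3, NA)

# Comparing with NA yields NA for every element, so it cannot locate missing values.
cmp <- z == NA

# is.na() is the reliable test.
ind <- is.na(z)          # FALSE FALSE FALSE TRUE

xx <- c(0/0, NA, 1)      # a NaN, an NA and an ordinary number
na_flags  <- is.na(xx)   # TRUE for both NaN and NA
nan_flags <- is.nan(xx)  # TRUE only for NaN

print(cmp)
print(ind)
print(na_flags)
print(nan_flags)
```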
<NA> 6
> a<-c("a","b",NA)
> a
[1] "a" "b" NA
> print(a,quote=F)
[1] a b <NA>
5
0.01β , PDF
2.6
R
"x-values" "New iteration results"
(") (’)
( ) C escape
sequences \ \ \ "
\" \n \t \b
c()
paste()
sep=string string
> labs <- paste(c("X","Y"), 1:10, sep="")
labs
c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
c("X", "Y") 5
1:10 7
2.7
subset
1.
TRUE FALSE
> y <- x[!is.na(x)]
7
paste(..., collapse=ss) ss R
( ) x y x
y x
> (x+1)[(!is.na(x)) & x>0] -> z
z x+1 x
2. {1, 2, . . . ,length(x)}
x[6] x
> x[1:10]
x 10 ( length(x) 10)
> c("x","y")[rep(c(1,2,2,1), times=4)]
16 "x", "y", "y", "x" 4
3. 8
> y <- x[-(1:5)]
x y
4. names
> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> lunch <- fruit[c("apple","orange")]
name indices numeric indices
data frames
vector[indexvector]
> x[is.na(x)] <- 0
0 x
> y[y < 0] <- -y[y < 0]
> y <- abs(y)
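The four kinds of index vector, and assignment through an index, can be tried on a small vector. The data in `x` are made up for illustration; `fruit` follows the text:

```r
x <- c(4, NA, -2, 7, NA, 1)

# 1. Logical index: keep the non-missing values.
y <- x[!is.na(x)]                      # 4 -2 7 1

# Logical index on an expression: x+1 where x is non-missing and positive.
z <- (x + 1)[(!is.na(x)) & x > 0]      # 5 8 2

# 2. Positive integers select, 3. negative integers exclude.
first_two <- x[1:2]
rest      <- x[-(1:2)]

# 4. Character index on a named vector.
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)
lunch <- fruit[c("apple", "orange")]

# Assignment through an index vector.
x0 <- x
x0[is.na(x0)] <- 0                     # replace missing values by zero
y_abs <- y
y_abs[y_abs < 0] <- -y_abs[y_abs < 0]  # same effect as abs(y)

print(z)
print(lunch)
print(identical(y_abs, abs(y)))
```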
2.8
R
• matrix array
< 23>
• factor < 20>.
• list general form
< 34>.
• data frame
‘ ’
< 36>
• function R R
> z <- 0:9
> digits <- as.character(z)
digits c("0", "1", "2", ..., "9")
> d <- as.integer(digits)
d z 4 as.something()
3.2
“ ”
> e <- numeric()
e character()
5
> e[3] <- 17
3 e( NA)
scan() (scan() < 40>.)
alpha
10
> alpha <- alpha[2 * 1:5]
5 (
)
> length(alpha) <- 3
4
roundoff
3.3
> attr(z, "dim") <- c(10,10)
( < 69>)
4.1
30 1
state
> state <- c("tas", "sa", "qld", "nsw", "nsw", "nt", "wa", "wa", "qld", "vic", "nsw", "vic", "qld", "qld", "sa", "tas", "sa", "nt", "wa", "vic", "qld", "nsw", "nsw", "wa", "sa", "act", "nsw", "vic", "vic", "act")
“ ” 2
factor()
> statef <- factor(state)
print()
> statef
 [1] tas sa  qld nsw nsw nt  wa  wa  qld vic nsw vic qld qld sa
[16] tas sa  nt  wa  vic qld nsw nsw wa  sa  act nsw vic vic act
Levels: act nsw nt qld sa tas vic wa
levels() levels
> levels(statef)
[1] "act" "nsw" "nt" "qld" "sa" "tas" "vic" "wa"
1
8 Australian Capital Territory, New South Wales, the Northern Territory, Queensland, South Australia, Tasmania, Victoria Western Australia
2
4.2
tapply()
> incmeans <- tapply(incomes, statef, mean)
act nsw nt qld sa tas vic wa
44.500 57.333 55.500 53.600 55.000 60.500 56.000 52.250
tapply() mean() 3 statef
incomes
standard error 4
R var()
> stderr <- function(x) sqrt(var(x)/length(x))
( < 54>
R sd()5)
> incster <- tapply(incomes, statef, stderr)
3
> incster
act nsw nt qld sa tas vic wa
1.5 4.3102 4.5 4.1061 2.7386 0.5 5.244 2.6575
95%
tapply() length() t
-qt()( R t- )
tapply()
( )
ragged array
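To see what `tapply()` does with a ragged array, here is a minimal sketch on a tiny made-up data set. The vectors `incomes_demo` and `state_demo` are hypothetical, not the data of the text, and the helper is called `std_err` here so that it does not mask the base function `stderr()`:

```r
# Hypothetical incomes and the state each one belongs to.
incomes_demo <- c(60, 49, 40, 61, 64, 60)
state_demo   <- factor(c("tas", "sa", "qld", "sa", "qld", "tas"))

# Standard error of the mean of one group.
std_err <- function(x) sqrt(var(x) / length(x))

# Apply a function to each group of incomes defined by the factor levels.
group_means <- tapply(incomes_demo, state_demo, mean)
group_se    <- tapply(incomes_demo, state_demo, std_err)
group_n     <- tapply(incomes_demo, state_demo, length)

print(group_means)
print(group_se)
print(group_n)
```

The results are labelled by the levels of the factor, in level order (`qld`, `sa`, `tas`).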
4.3
factor
ordered() ordered() factor
R
dimension vector k
k- 2- 1
dim R z 1500
> dim(z) <- c(3,5,100)
dim 3 5 100
matrix() array() array()
< 26> .
data vector FORTRAN
“ ”
a c(3,4,2) a 3×4×2 = 24 a[1,1,1],
a[2,1,1], ..., a[2,4,2], a[3,4,2]
( )
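The ordering a[1,1,1], a[2,1,1], ..., a[3,4,2] means that the first subscript moves fastest. A small sketch, filling a 3×4×2 array with 1:24 so that each element shows its own position in the data vector:

```r
a <- array(1:24, dim = c(3, 4, 2))

# The first subscript varies fastest, then the second, then the third.
v1 <- a[2, 1, 1]   # 2nd element of the data vector
v2 <- a[1, 2, 1]   # 4th element
v3 <- a[3, 4, 2]   # 24th (last) element

# Leaving subscripts empty takes their full range: a 4 by 2 slice.
slice <- a[2, , ]

print(c(v1, v2, v3))
print(dim(slice))
print(as.vector(slice))
```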
5.2
c(a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1], a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2])
a[,,] a
Z dim(Z) (
)
5.3
4 5 X
• X[1,3],X[2,2] X[3,1]
• X 0
> x <- array(1:20, dim=c(4,5)) # generate a 4 by 5 array
> i <- array(c(1:3,3:1), dim=c(3,2))
> i # i is a 3 by 2 index array
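Continuing with the same `x` and `i`, the index matrix can be used both to extract X[1,3], X[2,2] and X[3,1] and to replace them by zeros. A sketch (the name `picked` is illustrative):

```r
x <- array(1:20, dim = c(4, 5))            # a 4 by 5 array
i <- array(c(1:3, 3:1), dim = c(3, 2))     # rows of i are (1,3), (2,2), (3,1)

# Each row of the index matrix addresses one element of x.
picked <- x[i]                             # x[1,3], x[2,2], x[3,1]

# The same index matrix works on the left of an assignment.
x[i] <- 0

print(picked)
print(x)
```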
blocks(b ) varieties
(v ) n
> Xb <- matrix(0, n, b)
> Xv <- matrix(0, n, v)
> ib <- cbind(1:n, blocks)
> iv <- cbind(1:n, varieties)
> Xb[ib] <- 1
> Xv[iv] <- 1
> X <- cbind(Xb, Xv)
N
> N <- crossprod(Xb, Xv)
table()
5.4
array()
dim array
> Z <- array(data vector, dim vector)
h 24
> Z <- array(h, dim=c(3,4,2))
h Z 3 4 2 h 24
> dim(Z) <- c(3,4,2)
h 24 24 (
< 26>)
> Z <- array(0, c(3,4,2))
Z 0
dim(Z) c(3,4,2) Z[1:24] h
Z[] Z
dim A,B C
> D <- 2*A*B + C + 1
D
5.4.1
•
• dim
> d <- outer(0:9, 0:9)
> fr <- table(outer(d, d, "-"))
> plot(as.numeric(names(fr)), fr, type="h", xlab="Determinant", ylab="Frequency")
names “
” for < 52>
20 2
5.6
aperm(a, perm) a perm {1, . . . , k}
k a a
perm[j] j
A ( ) B
> B <- aperm(A, c(2,1))
A t()
B <- t(A)
5.7
R t(X)
nrow(A) ncol(A) A
5.7.1
%*% n 1 1 n
n
A B
> A * B
2
> A %*% B
x
> x %*% A %*% x
quadratic form 3
crossprod() “ ” crossproduct crossprod(X,
y) t(X) %*% y crossprod()
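The operators discussed here can be compared on small matrices. The values of `A`, `B` and `x` below are arbitrary illustrations:

```r
A <- matrix(c(2, 0, 1, 3), nrow = 2)   # columns are (2, 0) and (1, 3)
B <- matrix(1:4, nrow = 2)
x <- c(1, 2)

elementwise <- A * B        # element by element products
product     <- A %*% B      # matrix product

# A vector is treated as a row or column vector as required,
# so this is the quadratic form x'Ax (returned as a 1 by 1 matrix).
quad <- x %*% A %*% x

# crossprod(A, x) computes t(A) %*% x without forming the transpose explicitly.
cp <- crossprod(A, x)

print(elementwise)
print(product)
print(quad)
print(cp)
```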
By=x squared length
“Even better would be to form a matrix square root B with A = BB′ and find the squared length
5.7.3
eigen(Sm) Sm
values vectors
> ev <- eigen(Sm)
ev ev$val Sm ev$vec
> evals <- eigen(Sm)$values
evals
> eigen(Sm)
> evals <- eigen(Sm, only.values = TRUE)$values
5.7.4
svd(M) M , M
M U M V
D M = U %*% D %*% t(V) D
svd(M) d,u v
M
> absdetM <- prod(svd(M)$d)
M
R
> absdet <- function(M) prod(svd(M)$d)
, absdet() R
trace tr() [ : ,
diag()]
5.7.5 QR
> b <- qr.coef(Xplus, y)
> fit <- qr.fitted(Xplus, y)
> res <- qr.resid(Xplus, y)
rbind()
X1 X2 X
1
> X <- cbind(1, X1, X2)
rbind() cbind() cbind(x) rbind(x)
x
5.9
c()
cbind() rbind() dim c()
dim dimnames
as.vector()
> vec <- as.vector(X)
c()
> vec <- c(X)
( )
5.10
table() k
k
-statef
> statefr <- table(statef)
statefr
incomef “ ”
cut()
> factor(cut(incomes, breaks = 35+10*(0:7))) -> incomef
> table(incomef,statef)
         statef
incomef   act nsw nt qld sa tas vic wa
  (35,45]   1   1  0   1  0   0   1  0
  (45,55]   1   1  1   1  2   0   1  3
  (55,65]   0   3  1   3  2   2   2  1
R list components
> Lst <- list(name="Fred", wife="Mary", no.children=3, child.ages=c(4,7,9))
numbered
Lst Lst[[1]],Lst[[2]],Lst[[3]] Lst[[4]]
Lst[[4]] Lst[[4]][1]
Lst length(Lst) (
)
> name$component name
Lst$name Lst[[1]] "Fred",
Lst$wife Lst[[2]] "Mary",
Lst$name Lst[["name"]]
> x <- "name"; Lst[[x]]
Lst[[1]] Lst[1] [[. . .]]
[. . .] Lst
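The difference between `[[...]]` and `[...]` is easiest to see by inspecting what each returns. A sketch using the `Lst` of the text (the names `component` and `sublist` are illustrative):

```r
Lst <- list(name = "Fred", wife = "Mary", no.children = 3,
            child.ages = c(4, 7, 9))

# [[...]] extracts a single component: here a character vector.
component <- Lst[[1]]

# [...] returns a sublist: still a list, with the component name retained.
sublist <- Lst[1]

# Components can be addressed by number, by name with $, or by a name held in a variable.
x <- "name"
by_variable <- Lst[[x]]
first_age   <- Lst[[4]][1]

print(class(component))   # "character"
print(class(sublist))     # "list"
print(identical(Lst$name, by_variable))
print(first_age)
```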
, named list
1
Lst
2
Lst$coefficients Lst$coe Lst$covariance Lst$cov
(names)
6.2
list()
> Lst <- list(name_1=object_1, ..., name_m=object_m)
m Lst object_1, ..., object_m
> Lst[5] <- list(matrix=Mat)
6.2.1
c()
> list.ABC <- c(list.A, list.B, list.C)
1
: ( ).
dim
6.3
data frame "data.frame"
• ( , , ) ;
• ;
• ;
•
.
6.3.1
data.frame ( )
> accountants <- data.frame(home=statef, loot=incomes, shot=incomef)
as.data.frame()
read.table() < 39>
6.3.2 attach() detach()
$ accountants$statef
attach()
lentils lentils$u,lentils$v,lentils$w
•
x, y z
6.3.4
attach()
"list"
> attach(any.old.list)
detach
6.3.5
search
( )
> search()
[1] ".GlobalEnv" "Autoloads" "package:base"
.GlobalEnv 7
lentils
> search()
[1] ".GlobalEnv" "lentils" "Autoloads" "package:base"
> ls(2)
[1] "u" "v" "w"
ls( objects)
> detach("lentils")
> search()
[1] ".GlobalEnv" "Autoloads" "package:base"
7
Perl1 R
read.table() scan()2,
R R / 3
7.1
read.table()
•
•
4
1
UNIX Sed Awk
2
read.table() scan() Excel
3
R
4
     Price    Floor    Area   Rooms    Age   Cent.heat
01   52.00    111.0     830       5    6.2          no
02   54.75    128.0     710       5    7.5          no
03   57.50    101.0    1000       5    4.2          no
04   57.50    131.0     690       6    8.8          no
05   59.75     93.0     900       5    1.9         yes
...
( ) ,
Cent.heat read.table()
> HousePrice <- read.table("houses.data")
Price    Floor    Area   Rooms    Age   Cent.heat
52.00    111.0     830       5    6.2          no
54.75    128.0     710       5    7.5          no
57.50    101.0    1000       5    4.2          no
57.50    131.0     690       6    8.8          no
59.75     93.0     900       5    1.9         yes
...
> HousePrice <- read.table("houses.data", header=TRUE)
header=TRUE
7.2
scan()
input.dat scan()
dummy list structure
inp
> inp <- scan("input.dat", list("",0,0))
> label <- inp[[1]]; x <- inp[[2]]; y <- inp[[3]]
> inp <- scan("input.dat", list(id="", x=0, y=0))
> label <- inp$id; x <- inp$x; y <- inp$y
2 ( < 38>)
> X <- matrix(scan("light.dat", 0), ncol=5, byrow=TRUE)
R
7.3
R 100 ( datasets ) ( R
)
data()
R 2.0.0 R
data R
data(infert)
( )
7.3.1 R
package
data(package="rpart")
data(Puromycin, package="datasets")
library R
7.4
edit R
> xnew <- edit(xold)
xold xnew
xold fix(xold) xold <- edit(xold)
R R P(X ≤
x) ( q P(X ≤x)> q x
)
R
Distribution         R name     Additional arguments
beta                 beta       shape1, shape2, ncp
binomial             binom      size, prob
Cauchy               cauchy     location, scale
chi-squared          chisq      df, ncp
exponential          exp        rate
F                    f          df1, df2, ncp
gamma                gamma      shape, scale
geometric            geom       prob
hypergeometric       hyper      m, n, k
log-normal           lnorm      meanlog, sdlog
logistic             logis      location, scale
negative binomial    nbinom     size, prob
normal               norm       mean, sd
Poisson              pois       lambda
Student's t          t          df, ncp
uniform              unif       min, max
Weibull              weibull    shape, scale
Wilcoxon             wilcox     m, n
d p
(random deviates) dxxx x pxxx q qxxx
p rxxx n (rhyper rwilcox nn)
non-centrality parameter ncp
1
pxxx qxxx lower.tail log.p, dxxx
log
(“ ”) hazard H(t) =−log(1−F(t))
- pxxx(t, ..., lower.tail = FALSE, log.p = TRUE)
(dxxx(..., log = TRUE))
ptukey qtukey studentized
range
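The `d`, `p`, `q` and `r` prefixes, and the cumulative hazard H(t) = -log(1 - F(t)) obtained through `lower.tail = FALSE, log.p = TRUE`, can be illustrated with the normal and exponential distributions. The particular arguments (0.975, `rate = 0.5`, `t0 = 2`) are arbitrary choices for the sketch:

```r
# d: density, p: cumulative probability, q: quantile, r: random deviates
dens <- dnorm(0)            # height of the standard normal density at 0
q975 <- qnorm(0.975)        # upper 2.5% point, about 1.96
back <- pnorm(q975)         # pnorm inverts qnorm

set.seed(1)                 # only to make the simulated values repeatable
draws <- rnorm(5)

# Cumulative hazard of an exponential distribution at t0:
# -log(1 - F(t0)), computed directly on the log scale.
t0 <- 2
H  <- -pexp(t0, rate = 0.5, lower.tail = FALSE, log.p = TRUE)

print(c(dens, q975, back))
print(draws)
print(H)                    # equals rate * t0 for the exponential
```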
> ## 2-tailed p-value for t distribution
> 2*pt(-2.43, df = 13)
> ## upper 1% point for an F(2, 7) distribution
> qf(0.99, 2, 7)
8.2
summary fivenum stem
(“ ” )
1
> attach(faithful)
> summary(eruptions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.600   2.163   4.000   3.488   4.454   5.100
> fivenum(eruptions)
[1] 1.6000 2.1585 4.0000 4.4585 5.1000
> stem(eruptions)
The decimal point is 1 digit(s) to the left of the |
16 | 070355555588
)
> long <- eruptions[eruptions > 3]
> plot(ecdf(long), do.points=FALSE, verticals=TRUE)
> x <- seq(3, 5.4, 0.01)
> lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
[Figure: ecdf(long), the empirical CDF Fn(x) plotted against x, with the fitted normal CDF superimposed]
[Figure: Normal Q-Q Plot, Sample Quantiles against Theoretical Quantiles]
QQ ( )
Q-Q
qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")
qqline(x)
> shapiro.test(long)
Shapiro-Wilk normality test
data: long
W = 0.9793, p-value = 0.01052
Kolmogorov-Smirnov
> ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))
One-sample Kolmogorov-Smirnov test
data: long
D = 0.0661, p-value = 0.4284
alternative hypothesis: two.sided
( distribution theory
)
8.3
R “ ” stats
latent heat (cal/gm) Rice (1995, p.490)
Method A: 79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02
Method B: 80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
boxplot
A <- scan()
79.98 80.04 80.02 80.04 80.03 80.03 80.04 79.97 80.05 80.03 80.02 80.00 80.02
B <- scan()
80.02 79.94 79.98 79.97 79.97 80.03 79.95 79.97
[Figure: boxplots of the two samples, labelled 1 (Method A) and 2 (Method B), on a vertical axis running from 79.94 to 80.04]
t
> t.test(A, B)

        Welch Two Sample t-test

data:  A and B
t = 3.2499, df = 12.027, p-value = 0.00694
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.01385526 0.07018320
sample estimates:
mean of x mean of y
 80.02077  79.97875
3
R
SPLUS t.test
F
3
> var.test(A, B)
F test to compare two variances
data: A and B
F = 0.5837, num df = 12, denom df = 7, p-value = 0.3938
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
Wilcoxon rank sum test with continuity correction
data: A and B
W = 89, p-value = 0.007497
alternative hypothesis: true mu is not equal to 0
Warning message:
, ( )
> plot(ecdf(A), do.points=FALSE, verticals=TRUE, xlim=range(A, B))
> plot(ecdf(B), do.points=FALSE, verticals=TRUE, add=TRUE)
qqplot Q-Q
Kolmogorov-Smirnov Kolmogorov-Smirnov
> ks.test(A, B)
Two-sample Kolmogorov-Smirnov test
data: A and B
D = 0.5962, p-value = 0.05919
alternative hypothesis: two.sided
Warning message:
R expression language
{expr_1; ...; expr_m}
9.2
9.2.1 if
R
> if (expr1) expr2 else expr3
expr1
“ ” short-circuit && || if
& | 1 && || 1
2
R if/else ifelse ifelse(condition,
a, b) condition[i]
a[i] b[i]
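A sketch contrasting the short-circuit operators used inside `if` with the element-wise operators and with the vectorized `ifelse()`. The vector `x` and the result names are illustrative:

```r
x <- c(-2, 0, 3)

# && evaluates its second operand only when the first is TRUE and
# yields a single logical value, which is what if() needs.
if (length(x) > 0 && x[1] < 0) {
  msg <- "first element is negative"
} else {
  msg <- "first element is not negative"
}

# & works element by element and returns a logical vector.
in_range <- x > -1 & x < 2

# ifelse() picks a[i] where condition[i] is TRUE and b[i] otherwise.
signs <- ifelse(x >= 0, "non-negative", "negative")

print(msg)
print(in_range)
print(signs)
```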
1
: .
2
9.2.2 for repeat while
R for
> for (name in expr 1) expr 2
name expr 1 ( 1:20 )
expr 2 name name expr 1
expr 2
ind vector of class indicators
y x coplot()3
> xc <- split(x, ind)
> yc <- split(y, ind)
> for (i in 1:length(yc)) {
    plot(xc[[i]], yc[[i]])
    abline(lsfit(xc[[i]], yc[[i]]))
  }
> while (condition) expr
break repeat
R
R mean(), var(),
postscript() R
> name <- function(arg 1, arg 2, ...) expression
expression R ( ) argi
name(expr1, expr2, ...)
10.1
t- “
”
> twosam <- function(y1, y2) {
    n1 <- length(y1); n2 <- length(y2)
    yb1 <- mean(y1); yb2 <- mean(y2)
    s1 <- var(y1); s2 <- var(y2)
    s <- ((n1-1)*s1 + (n2-1)*s2)/(n1+n2-2)
    tst <- (yb1 - yb2)/sqrt(s*(1/n1 + 1/n2))
    tst
  }
> tstat <- twosam(data$male, data$female); tstat
Matlab y X
( ) qr()
n 1 y n p X X\y (X′X)−X′y,
(X′X)− X′X generalized inverse
> bslash <- function(X, y) {
    X <- qr(X)
    qr.coef(X, y)
  }
> regcoeff <- bslash(Xmat, yvar)
R lsfit() 1
qr() qr.coef()
binary operator
10.2
bslash()
%anything%
binary operator
!
> "%!%" <- function(X, y) { ... }
( ) X %!% y (
)
%*% %o%
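Any function of two arguments can be bound to a name of the form `%anything%` and then used as an infix operator. The operator `%+-%` below is a made-up example, not something defined by R or by the text:

```r
# Hypothetical operator: the interval centre - halfwidth, centre + halfwidth.
"%+-%" <- function(centre, halfwidth) {
  c(lower = centre - halfwidth, upper = centre + halfwidth)
}

# Used between its operands, like the built-in %*% and %o%.
interval <- 10 %+-% 2

# The function can still be called by name if quoted.
same <- `%+-%`(10, 2)

print(interval)
print(identical(interval, same))
```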
1
10.3
< 10> “name=object”
fun1
> fun1 <- function(data, data.frame, graph, limit) {
[ ]
}
> ans <- fun1(d, df, TRUE, 20)
> ans <- fun1(d, df, graph=TRUE, limit=20)
> ans <- fun1(data=d, limit=20, graph=TRUE, data.frame=df)
fun1
> fun1 <- function(data, data.frame, graph=TRUE, limit=20) { ... }
> ans <- fun1(d, df)
> ans <- fun1(d, df, limit=10)
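Since the body of `fun1` is not shown, the same calling conventions can be tried on a small stand-in. The function `area` and its arguments are hypothetical, chosen only to show positional matching, named matching and defaults:

```r
# Hypothetical function with one required argument and two defaults.
area <- function(width, height = 1, units = "cm") {
  paste(width * height, units)
}

positional <- area(3, 2)                  # arguments matched by position
named      <- area(height = 2, width = 3) # named arguments may come in any order
defaulted  <- area(3)                     # height and units take their defaults
mixed      <- area(3, units = "m")        # positional first, then a named override

print(c(positional, named, defaulted, mixed))
```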
10.4
. . .
par() plot()
par() ( par() < 90> par()
fun1 <- function(data, data.frame, graph=TRUE, limit=20, ...) {
[ ]
if (graph)
par(pch="*", ...)
[ ]
}
10.5
X <- qr(X)
R Evaluation frame
“ ” superassignment
<<- assign() help SPLUS
<<- R semantics <
60>
10.6
10.6.1
( < 24> )
block design blocks(b ) varieties (v
) R K v v b b replications block
size N b v incidence matrix
> bdeff <- function(blocks, varieties) {
    blocks <- as.factor(blocks)             # minor safety move
    b <- length(levels(blocks))
    varieties <- as.factor(varieties)       # minor safety move
    v <- length(levels(varieties))
    K <- as.vector(table(blocks))           # remove dim attr
    R <- as.vector(table(varieties))        # remove dim attr
    N <- table(blocks, varieties)
    A <- 1/sqrt(K) * N * rep(1/sqrt(R), rep(b, v))
    sv <- svd(A)
    list(eff=1 - sv$d^2, blockcv=sv$u, varietycv=sv$v)
  }
block and variety canonical contrasts
10.6.2
dimnames R
dimnames X
> temp <- X
> no.dimnames(X)
pattern
10.6.3
evaluation frame 2
one-panel trapezium rule (two panel)
3
,
R
2
“that such functions, or indeed variables, are not inherited by called functions in higher evaluation frames as they would be if they were on the search path.”
3
cube <- function(n) {
open.account <- function(total) {
  list(
    deposit = function(amount) {
      if(amount <= 0)
        stop("Deposits must be positive!\n")
      total <<- total + amount
      cat(amount, "deposited. Your balance is", total, "\n\n")
    },
    withdraw = function(amount) {
      if(amount > total)
        stop("You don't have that much money!\n")
      total <<- total - amount
      cat(amount, "withdrawn. Your balance is", total, "\n\n")
    },
    balance = function() {
      cat("Your balance is", total, "\n\n")
    }
  )
}
.Rprofile R
> .First <- function() {
    options(prompt="$ ", continue="+\t")  # $ is the prompt
    options(digits=5, length=999)         # custom numbers and printout
> .Last <- function() {
[ [[<- any as.matrix [<- mean plot summary
methods()
> methods(class="data.frame")
plot() "data.frame" "density" "factor"
methods()
> methods(plot)
> coef
function (object, ...) UseMethod("coef")
UseMethod
methods()
> methods(coef)
[1] coef.aov* coef.Arima* coef.default* coef.listof*
[5] coef.nls* coef.summary.nls*
Non-visible functions are asterisked
8
8
: ( coef.aov), R ‘‘Error: object
> getAnywhere("coef.aov")
A single object matching 'coef.aov' was found
It was found in the following places
  registered S3 method for coef from namespace stats
  namespace:stats
with value

function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}

> getS3method("coef", "aov")
function (object, ...)
{
    z <- object$coef
    z[!is.na(z)]
}
1
R
R
11.1
$$y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \qquad e_i \sim \mathrm{NID}(0, \sigma^2), \quad i = 1, \ldots, n$$
$$y = X\beta + e$$
y X model matrix design matrix X x0, x1, . . . , xp determining variable x0
1 intercept
y,x,x0,x1, x2, . . . X A,B,C, . . .
1
y ~ x
y ~ A*B + Error(C) A B C
formula operators Glim Genstat Wilkinson
& Rogers (notation) . R :,
R
( Chambers & Hastie, 1992, p.29):
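Y ~ M           Y is modeled as M.
M_1 + M_2       Include M_1 and M_2.
M_1 - M_2       Include M_1 leaving out terms of M_2.
M_1 : M_2       The tensor product of M_1 and M_2. If both terms are factors, then the “subclasses” factor.
M_1 %in% M_2    Similar to M_1:M_2, but with a different coding.
M_1 * M_2       M_1 + M_2 + M_1:M_2.
M_1 / M_2       M_1 + M_2 %in% M_1.
M^n             All terms in M together with “interactions” up to order n.
I(M)            Insulate M. Inside M all operators have their normal arithmetic meaning, and that term appears in the model matrix. The function I() is an identity function.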
11.1.1
(
1 )
k- A
2 . . . k k−1 (
) k−1
1, . . . , k orthogonal polynomial
k 6
options contrasts R
options(contrasts = c("contr.treatment", "contr.poly"))
R S S Helmert
SPLUS
options(contrasts = c("contr.helmert", "contr.poly"))
treatment contrast (R )
5
: “All terms in M together with “interactions” up to order n”
6
contrasts C
7
R
R
11.2
multiple model lm()
> fitted.model <- lm(formula, data = data.frame)
> fm2 <- lm(y ~ x1 + x2, data = production)
y x1 x2 ( )
( ) data = production
production production
11.3
lm() "lm"
"lm"
add1 coef effects kappa predict residuals
alias deviance family labels print step
anova drop1 formula plot proj summary
7
anova(object 1, object 2)
coef(object) ( )
coefficients(object).
deviance(object) formula(object) plot(object)
8
predict(object, newdata=data.frame)
data.frame
predict.gam(object,
newdata=data.frame) predict.gam()
predict() lm, glm
gam
print(object)
9
residuals(object) ( )
resid(object) step(object)
AIC (Akaike )
summary(object)
8
9
11.4
response ~ mean.formula + Error(strata.formula)
strata.formula strata.formula
> fm <- aov(yield ~ v + n*p*k + Error(farms/blocks), data=farm.data)
v + n*p*k “
“ Note also that the analysis of variance table (or tables) are for a sequence of fitted models.”
11
11.5
update()
> new.model <- update(old.model, new.formula)
new.formula . “
”
> fm05 <- lm(y ~ x1 + x2 + x3 + x4 + x5, data = production)
> fm6 <- update(fm05, . ~ . + x6)
> smf6 <- update(fm6, sqrt(.) ~ .)
production
data=
update()
.
> fmfull <- lm(y ~ . , data = production)
y production
add1(), drop1() step()
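Because the `production` data frame is not available, the `update()` idiom can be tried on the built-in `mtcars` data set instead. This is a sketch under that substitution; the variables `mpg`, `wt` and `hp` are columns of `mtcars` and have nothing to do with the example of the text:

```r
# Initial model: fuel consumption against weight.
fm1 <- lm(mpg ~ wt, data = mtcars)

# "." on each side of the new formula stands for the
# corresponding part of the old formula.
fm2 <- update(fm1, . ~ . + hp)      # add a term
fm3 <- update(fm2, log(.) ~ .)      # transform the response, keep the terms

# A lone "." on the right-hand side of lm() means
# "all the other columns of the data frame".
fmfull <- lm(mpg ~ ., data = mtcars)

print(formula(fm2))
print(formula(fm3))
print(length(coef(fmfull)))
```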
11.6
12
• y stimulus variable x1, x2, . . .
12
• y
• µ smooth invertible function
$$\mu = m(\eta), \qquad \eta = m^{-1}(\mu) = \ell(\mu)$$
(inverse function)ℓ() link function
(estimation and inference)
Family name         Link functions
Gamma               identity, inverse, log
inverse.gaussian    1/mu^2, identity, inverse, log
poisson             identity, log, sqrt
quasi               logit, probit, cloglog, identity, inverse, log, 1/mu^2, sqrt
family
11.6.2 glm()
R glm()
> fitted.model <- glm(formula, family=family.generator, data=data.frame)
lm() family.generator
13
< 74>
“ ”
quasi
gaussian
> fm <- glm(y ~ x1 + x2, family = gaussian, data = sales)
> fm <- lm(y ~ x1+x2, data=sales)
13
Silvey (1970) Kalythos Aegean
Age:          20  35  45  55  70
No. tested:   50  50  50  50  50
No. blind:     6  17  26  37  44
logistic probit
LD50 50%
y n x
$$y \sim B(n, F(\beta_0 + \beta_1 x))$$
probit $F(z) = \Phi(z)$ logit (
) $F(z) = e^z/(1 + e^z)$
$$\mathrm{LD50} = -\beta_0/\beta_1$$
0
,
> kalythos <- data.frame(x = c(20,35,45,55,70), n = rep(50,5), y = c(6,17,26,37,44))
glm()
• binary 0/1
•
> kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
> fmp <- glm(Ymat ~ x, family = binomial(link=probit), data = kalythos)
> fml <- glm(Ymat ~ x, family = binomial, data = kalythos)
logit
> summary(fmp)
> summary(fml)
LD50
> ld50 <- function(b) -b[1]/b[2]
> ldp <- ld50(coef(fmp)); ldl <- ld50(coef(fml)); c(ldp, ldl)
43.663 43.601
Poisson
Poisson log
Poisson
-Poisson
Poisson
> fmod <- glm(y ~ A + B + x, family = poisson(link=sqrt), data = worm.counts)
multiplier poisson Var[y] =µ
y= θ1z1
optim() nlm() nlminb() R2.2.0 SPLUS ms()
nlminb() lack-of-fit
Bates & Watts (1988) 51
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 200 * xfit/(0.1 + xfit)
> lines(spline(xfit, yfit))
200 0.1
> out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
out$minimum SSE out$estimate
(SE)
> sqrt(diag(2*out$minimum/(length(y) - 2) * solve(out$hessian)))
2 95% ±1.96 SE
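The data vectors and the objective function `fn` used by the `nlm()` call are not shown in this section. The sketch below assumes that the "treated" rows of R's built-in `Puromycin` data set (documented as coming from Bates & Watts, 1988) are the data in question, and assumes a sum-of-squares objective for the model y = Vm x / (K + x); both are assumptions, not a transcription of the original example.

```r
# Assumption: the treated Puromycin runs are the data of this example.
treated <- subset(Puromycin, state == "treated")
x <- treated$conc
y <- treated$rate

# Assumed objective: residual sum of squares for y = p[1] * x / (p[2] + x).
fn <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)

# Minimize from the starting values read off the plot, asking for the Hessian.
out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)

# Approximate standard errors from the Hessian at the minimum.
se <- sqrt(diag(2 * out$minimum / (length(y) - 2) * solve(out$hessian)))

print(out$minimum)    # residual sum of squares
print(out$estimate)   # least squares estimates of the two parameters
print(se)
```

A rough 95% interval for each parameter is then the estimate plus or minus 1.96 standard errors.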
> plot(x, y)
> xfit <- seq(.02, 1.1, .05)
> yfit <- 212.68384222 * xfit/(0.06412146 + xfit)
> lines(spline(xfit, yfit))
stats
> df <- data.frame(x=x, y=y)
> fit <- nls(y ~ SSmicmen(x, Vm, K), df)
> fit
Nonlinear regression model
  model:  y ~ SSmicmen(x, Vm, K)
   data:  df
          Vm            K
212.68370711   0.06412123
 residual sum-of-squares:  1195.449
> summary(fit)

Formula: y ~ SSmicmen(x, Vm, K)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
Vm 2.127e+02  6.947e+00  30.615 3.24e-11
K  6.412e-02  8.281e-03   7.743 1.57e-05

Residual standard error: 10.93 on 10 degrees of freedom

Correlation of Parameter Estimates:
      Vm
K 0.7651
11.7.2
Maximum likelihood
Dobson (1990), pp. : 108–111
logistic glm()
> x <- c(1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839)
> y <- c( 6, 13, 18, 28, 52, 53, 61, 60)
> n <- c(59, 60, 62, 56, 63, 59, 62, 60)
> fn <- function(p)
    sum( - (y*(p[1]+p[2]*x) - n*log(1+exp(p[1]+p[2]*x))
           + log(choose(n, y)) ))
> out <- nlm(fn, p = c(-50,20), hessian = TRUE)
out$minimum out$estimate
> sqrt(diag(solve(out$hessian)))
95% ±1.96 SE
11.8
R
• Mixed models nlme lme() nlme()
mixed-effects models
• (Local approximating regressions) loess()
loess projection pursuit regression
stats
• (Robust regression)
MASS lqs
MASS rlm
• (Additive models)
smooth additive function
acepack avas ace mda
bruto mars gam mgcv
tree() plot() text()
R device driver graphics window
UNIX X11() Windows windows()
R
•
•
•
R
‘ ’ grid
grid
lattice S Trellis multi-panel
plot
12.1
12.1.1 plot()
R plot()
plot(x, y)
plot(xy) x y plot(x, y) y x
x y
( )
plot(x) x x
x
plot(f)
plot(f, y) f y f
y f plot(df)
plot(~ expr)
plot(y ~ expr) df y expr ‘+’
( a + b + c)
(
) y expr
12.1.2
R X
> pairs(X)
X pairwise scatterplot matrix
X X n(n−1)
coplot a b
c ( )
c a b c a
c b c
conditioning intervals c a b a
b coplot() given.values=
: “ This automatic simultaneous scaling feature is also useful when the xi’s are
main=string
> plot(x, y, type="n"); text(x, y, names)
legend(x, y, legend, ...) legend
legend
v ( legend
)
legend( , fill=v)
legend( , col=v)
legend( , lty=v)
legend( , lwd=v)
legend( , pch=v)
( )
title(main, sub) main
sub
axis(side, ...) (1 4
)
axes=FALSE plot()
( x y )
x y x y
locator()( )
12.2.1
R
expression text mtext axis
title
R
> help(plotmath)
> example(plotmath)
> demo(plotmath)
12.2.2 Hershey
text contour Hershey Hershey
• Hershey
• Hershey
• Hershey cyrillic
R Hershey
> help(Hershey)
> demo(Hershey)
> help(Japanese)
> demo(Japanese)
12.3
R
locator() locator(n, type)
n ( 512)
type
locator()
x y
locator()
outlying point
( postscript locator() )
identify(x, y, labels) labels labels
x y( )
x y (x, y) identify()
> plot(x, y)
> identify(x, y)
identify()
(
x/y ) identify()
labels ( ) plot = FALSE
identify()
x y
12.4
R
R
( ‘col’ ) ( )
12.4.1 : par()
par()
par()
par(c("col", "lty")) (
par(col=4, lty=2) ( )
par()
“ ”
par() par()
R par()
2
> oldpar <- par(col=4, lty=2)
... ...
> par(oldpar)
3
> oldpar <- par(no.readonly=TRUE)
... ...
> par(oldpar)
12.4.2
( ) par()
> plot(x, y, pch="+")
par()
12.5
R par()
2
name=value name par()
> legend(locator(1), as.character(0:25), pch = 0:25)
[Figure: a page laid out with mfrow=c(3,2), the current figure at mfg=c(3,2,3,2), and the outer margins oma[3], omi[1] and omi[4] marked]
mfcol=c(3, 2) mfrow=c(2, 4)
mfcol mfrow
mfrow=c(3,2)
(par("cex") )
0.83 0.66
mfg=c(2, 2, 3, 2)
unequally-sized figures
fig=c(4, 9, 1, 4)/10
oma=c(2, 0, 3, 0)
omi=c(0, 0, 0.8, 0) mar mai
mtext() outer=TRUE oma omi
split.screen() layout()
4
grid lattice
12.6
R ( )
R
device driver R ( “draw a line,”)
help(Devices)
> postscript()
PostScript
X11() UNIX X11
windows() Windows
quartz() MacOS X
postscript() PostScript PostScript
pdf() PDF PDF
png() PNG ( )
jpeg() JPEG image (
)
> dev.off()
( )
> postscript("file.ps", horizontal=FALSE, height=5, pointsize=10)
...
dev.print(device, ..., which=k) k device
postscript
R xgobi XGobi UNIX Windows X
R
6
XGobi GGobi 7
http://www.ggobi.org
7
ggobi Rggobi R R BioConductor
Ggobi Ggobi Rggobi http://www.ggobi.org/downloads/ Ggobi R Rggobi Ggobi
)
R
> library()
( boot Davison
& Hinkley (1997))
> library(boot)
CRAN.packages() ( Windows RAqua
Packages )
> search()
( <
101>)
> help.start()
13.1
( ) R R
R
R
13.2
CRAN
R
(
) R CRAN (http://CRAN.R-project.org/ ) Bioconductor (http://www.bioconductor.org/) R
13.3
namespaces
datasets
t() R t
::
base::t base
::: R
getAnywhere()
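A sketch of how the `::` operator reaches a function in a namespace even when a definition of the same name in the workspace masks it. The user-defined `t` and the matrix `m` are illustrative only:

```r
m <- matrix(1:6, nrow = 2)

# A user definition of t() in the workspace masks the transpose function.
t <- function(x) "my own t"
masked <- t(m)

# base::t always refers to the version exported by the base package.
real <- base::t(m)

# Remove the masking definition so that t() means transpose again.
rm(t)

print(masked)
print(dim(real))
```

For objects that a package does not export, `getAnywhere()` or the `:::` operator can be used in the same spirit.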
1
R
$ R
R2 R
( R )
help.start()
HTML ( )
x <- rnorm(50)
y <- rnorm(x)
x y
plot(x, y)
ls()
R
rm(x, y)
1
: Linux Windows
2
( )
x <- 1:20
x= (1,2, . . . ,20)
w <- 1 + sqrt(x)/2
‘ ’
dummy <- data.frame(x=x, y= x + rnorm(x)*w)
dummy
x y
fm <- lm(y ~ x, data=dummy)
summary(fm)
y x
fm1 <- lm(y ~ x, data=dummy, weight=1/w^2)
summary(fm1)
attach(dummy)
lrf <- lowess(x, y)
plot(x, y)
lines(x, lrf$y)
abline(0, 1, lty=3)
abline(coef(fm))
abline(coef(fm1), col = "red")
detach()
plot(fitted(fm), resid(fm), xlab="Fitted values", ylab="Residuals",
main="Residuals vs Fitted")
heteroscedasticity
qqnorm(resid(fm), main="Residuals Rankit Plot")
skewness kurtosis outlier
rm(fm, fm1, lrf, x, dummy)
Michaelson Morley
morley read.table
filepath <- system.file("data", "morley.tab" , package="datasets")
filepath
file.show(filepath)
mm <- read.table(filepath)
mm
Michaelson Morley (Expt
) 20 (Run ) sl
mm$Expt <- factor(mm$Expt)
mm$Run <- factor(mm$Run)
Expt Run
attach(mm)
3 ( )
plot(Expt, Speed, main="Speed of Light Data", xlab="Experiment No.")
fm <- aov(Speed ~ Run + Expt, data=mm)
summary(fm)
‘runs’ ‘experiments’
fm0 <- update(fm, . ~ . - Run)
anova(fm0, fm)
‘runs’
detach()
rm(fm, fm0)
x <- seq(-pi, pi, len=50)
y <- x
x −π≤x≤π 50 y
f <- outer(x, y, function(x, y) cos(y)/(1 + x^2))
f x y cos(y)/(1 +x2)
“ ”
oldpar <- par(no.readonly = TRUE)
par(pty="s")
contour(x, y, f)
contour(x, y, f, nlevels=15, add=TRUE)
f
fa <- (f-t(f))/2
fa f “ ”(t() )
contour(x, y, fa, nlevels=15)
. . .
par(oldpar)
. . .
image(x, y, f)
image(x, y, fa)
( ) . . .
objects(); rm(x, y, f, fa)
. . . R
th <- seq(-pi, pi, len=100)
z <- exp(1i*th)
1i i
par(pty="s")
plot(z, type="l")
w <- rnorm(100) + rnorm(100)*1i
. . .
w <- ifelse(Mod(w) > 1, 1/w, w)
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)
w <- sqrt(runif(100))*exp(2*pi*runif(100)*1i)
plot(w, xlim=c(-1,1), ylim=c(-1,1), pch="+", xlab="x", ylab="y")
lines(z)
rm(th, w, z)
q()
13.4
R
UNIX Windows R R
R [options] [<infile] [>outfile],
R CMD “ ” ( R
) R
UNIX TMPDIR
R
( Startup
Windows-)
• --no-environ R
R_ENVIRON
$R_HOME/etc/Renviron.site ( )
.Renviron name=value (help(Startup)
) R_PAPERSIZE(
/ ) R_PRINTCMD ( ) R_LIBS ( R
)
• R --no-site-file
R_PROFILE
$R_HOME/etc/Rprofile.site
• --no-init-file R
• .RData ( --no-restore --no-restore-data)
--vanilla --no-save, --no-environ --no-site-file,
--no-readline ( UNIX) readline
Emacs ESS (“Emacs Speaks Statistics”) R
< 114>
--ess ( Windows) R-inferior-mode ESS
Rterm --min-vsize=N
--max-vsize=N “ ” vector heap N
N ‘Giga’
(2^30) ‘Mega’ (2^20) ‘Kilo’ (2^10) ‘kilo’
(1000 ) G M K k
--min-nsize=N
--max-nsize=N “cons ” cons cells N
13.5
Windows
R
Windows R ( cmd.exe command.com
) R.exe Rterm.exe
C:\Documents and Settings\username\My Documents )
HOMEDRIVE HOMEPATH ( Windows NT/2000/XP
)
--debug Rgui “Break to debugger”
Windows R CMD *.bat *.exe
R_HOME R_VERSION R_CMD R_OSTYPE PATH PERL5LIB
TEXINPUTS latex.exe
R CMD latex.exe mydoc
mydoc.tex LaTeX R share/texmf TEXINPUTS
13.6
Mac OS X
R
Mac OS X R Terminal.app R
Applications Mac OS X
13.7
UNIX GNU readline R UNIX
R
UNIX GNOME
--no-readline ( ESS 3
)
Windows R GUI Help
Console Rterm.exe README.Rterm
readline R
Meta character
Control-m CTRL m C-m Meta-b
META4 b M-b META
ESC M-b
ESC b ESC
13.8
R
Emacs-vi
M-i M-a ESC
RET
3
‘Emacs Speaks Statistics’ URL http://ESS.R-project.org
4
13.9
C-p ( )
C-n ( )
C-r text text
C-p C-n
C-a C-e M-b M-f C-b C-f
C-b C-f
text text
C-f text text
DEL ( )
C-d
M-d “ ”
C-k “ ”
C-y “ ”
C-t M-l M-c
RET R
D. M. Bates and D. G. Watts (1988), Nonlinear Regression Analysis and Its Applications. John Wiley & Sons, New York.
Richard A. Becker, John M. Chambers and Allan R. Wilks (1988), The New S Language. Chapman & Hall, New York. This book is often called the “Blue Book”.
John M. Chambers and Trevor J. Hastie eds. (1992), Statistical Models in S. Chapman & Hall, New York. This is also called the “White Book”.
John M. Chambers (1998), Programming with Data. Springer, New York. This is also called the “Green Book”.
A. C. Davison and D. V. Hinkley (1997), Bootstrap Methods and Their Applications. Cambridge University Press.
Annette J. Dobson (1990), An Introduction to Generalized Linear Models. Chapman and Hall, London.
Peter McCullagh and John A. Nelder (1989), Generalized Linear Models. Second edition, Chapman and Hall, London.
John A. Rice (1995), Mathematical Statistics and Data Analysis. Second edition. Duxbury Press, Belmont, CA.