Permission is granted to make and distribute verbatim copies of this manual provided that the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute translations of this manual in another language under the conditions above for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.
The R environment
Related software and documentation
R and statistics
Thus, while SAS and SPSS will provide ample output from a regression or discriminant analysis, R will provide minimal output and store the results in an appropriate object for subsequent querying with additional R functions.
R and the window system
Using R interactively
An introductory session
Getting help with functions and features
R commands, case sensitivity, etc
Comments can be placed almost anywhere, starting with a hashmark ('#'); everything up to the end of the line is a comment. If a command is not complete at the end of a line, R will give a different prompt, by default.
Recall and correction of previous commands
Executing commands from or diverting output to a file
Data permanency and removing objects
All objects created during an R session can be permanently saved to a file for use in future R sessions. At the end of each R session, you have the option to save all currently available objects.
Vectors and assignment
Vector arithmetic
For most purposes, the user will not care whether the "numbers" in a numeric vector are integers, reals, or even complex. Internal calculations are performed as double-precision real numbers, or as double-precision complex numbers if the input data are complex.
Generating regular sequences
If the argument to var() is an n-by-p matrix, the value is a p-by-p sample covariance matrix obtained by treating the rows as independent p-variate sample vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their longest argument) containing in each element the largest (smallest) element at that position in any of the input vectors.
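As a small sketch of both behaviours, the following uses made-up numbers (the matrix X and the vectors a and b are illustrative, not taken from the manual):

```r
# Hypothetical data: 5 observations (rows) of 3 variables (columns).
X <- matrix(c(2, 4, 6, 8, 10,
              1, 3, 2, 5, 4,
              9, 7, 8, 6, 5), nrow = 5)
S <- var(X)          # 3-by-3 sample covariance matrix of the columns
dim(S)

a <- c(1, 7, 3)
b <- c(4, 2, 9)
hi <- pmax(a, b)     # element-wise maxima: 4 7 9
lo <- pmin(a, b)     # element-wise minima: 1 2 3
```

Note that max(a, b) would instead return the single largest value across both vectors.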
Logical vectors
Missing values
Character vectors
Index vectors; selecting and modifying subsets of a data set
The corresponding elements of the vector are selected and joined, in that order, in the result. The index vector can be of any length and the result is the same length as the index vector.
Other types of objects
In this case, a subvector of the names vector can be used in the same way as the positive integral labels in point 2 above. The indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector.
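A short sketch of indexing by name and of assigning through an index vector; the object names and values here are illustrative assumptions:

```r
fruit <- c(5, 10, 1, 20)
names(fruit) <- c("orange", "banana", "apple", "peach")
lunch <- fruit[c("apple", "orange")]   # selected in the order requested

x <- c(-3, 1, NA, -2, 4)
x[is.na(x)] <- 0          # replace only the missing values
x[x < 0] <- -x[x < 0]     # flip the sign of the negative entries only
```

The result of each selection has the same length as its index vector, and the assignments leave every unselected element untouched.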
Intrinsic attributes: mode and length
Changing the length of an object
Getting and setting attributes
The class of an object
You only need this facility in quite special situations, but one of them is when you get to grips with the idea of class and generic functions. Generic functions and classes are discussed further in Section 10.9 [Object Orientation], page 48, but only briefly.
A specific example
A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length.
The function tapply() and ragged arrays
The result is a structure of the same length as the levels attribute of the factor containing the results. To do this you can use tapply() one more time with the length() function to find the sample sizes, and the qt() function to find the percentage points of the appropriate t-distributions. You can also explore R's facilities for t-tests.
Ordered factors
For this we need to write an R function to calculate the standard error for any given vector. The values in the vector are grouped into groups corresponding to the different entries in the factor.
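The following is a minimal sketch of that workflow. The data, the factor levels, and the helper name stdError are assumptions made for illustration:

```r
# Hypothetical incomes grouped by a hypothetical state factor.
state   <- factor(c("nsw", "qld", "nsw", "vic", "qld", "vic", "nsw"))
incomes <- c(60, 49, 40, 61, 64, 60, 59)

# Standard error of the mean for any numeric vector.
stdError <- function(x) sqrt(var(x) / length(x))

means <- tapply(incomes, state, mean)       # one value per factor level
sems  <- tapply(incomes, state, stdError)
ns    <- tapply(incomes, state, length)     # group sample sizes

# 95% confidence limits using t-distributions with n - 1 degrees of freedom.
half <- qt(0.975, df = ns - 1) * sems
cbind(lower = means - half, upper = means + half)
```

Each tapply() result has one entry per level of the factor, so the three summaries line up element by element.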
Arrays
Array indexing. Subsections of an array
Index matrices
Index matrices must be numeric: any other form of matrix (e.g. a logical or character matrix) supplied as an index is treated as an indexing vector.
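A brief sketch of a numeric index matrix, where each row names one cell by its (row, column) position; the array values are arbitrary:

```r
x <- array(1:20, dim = c(4, 5))
i <- array(c(1:3, 3:1), dim = c(3, 2))   # rows are (1,3), (2,2), (3,1)
picked <- x[i]                            # x[1,3], x[2,2], x[3,1]
x[i] <- 0                                 # zero just those three cells
```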
The array() function
Mixed vector and array arithmetic. The recycling rule
The outer product of two arrays
Generalized transpose of an array
Matrix facilities
Matrix multiplication
Then ev$val is the vector of eigenvalues of Sm and ev$vec is the matrix of corresponding eigenvectors.
Singular value decomposition and determinants
Least squares fitting and the QR decomposition
For large matrices, it is better to avoid calculating the eigenvectors if we do not need them, by using an expression that returns only the eigenvalues. These calculate the orthogonal projection of y onto the range of X in fit, the projection onto the orthogonal complement in res, and the coefficient vector for the projection in b, which is essentially the result of Matlab's 'backslash' operator.
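A hedged sketch of both ideas, using simulated data. The names X, y, Sm and Xplus and the simulated coefficients are assumptions for illustration; the functions called (eigen(), qr(), qr.coef(), qr.fitted(), qr.resid()) are part of base R:

```r
set.seed(1)                                    # hypothetical data
X <- cbind(1, rnorm(20), rnorm(20))            # design matrix with a column of ones
y <- X %*% c(2, -1, 0.5) + rnorm(20, sd = 0.1)

Sm    <- crossprod(X)                          # a symmetric matrix
evals <- eigen(Sm, only.values = TRUE)$values  # eigenvalues only, no eigenvectors

Xplus <- qr(X)                 # QR decomposition of X
b   <- qr.coef(Xplus, y)       # coefficient vector of the projection
fit <- qr.fitted(Xplus, y)     # projection of y onto the range of X
res <- qr.resid(Xplus, y)      # projection onto the orthogonal complement
```

By construction fit + res reproduces y, and res is orthogonal to every column of X.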
Forming partitioned matrices, cbind() and rbind()
Although still useful in some contexts, it would now generally be superseded by statistical model functions, as discussed in Chapter 11 [Statistical Models in R], page 51.
The concatenation function, c(), with arrays
Frequency tables from factors
Lists
Constructing and modifying lists
Concatenating lists
The expression must be of the form vector[index_vector] since having an arbitrary expression instead of the vector name doesn't make much sense here. When the sizes of the subclasses are all the same, indexing can be done implicitly and much more efficiently, as we see in the next section.
Data frames
- Making data frames
- attach() and detach()
- Working with data frames
- Attaching arbitrary lists
- Managing the search path
However, the new value of component u is not visible until the data frame is detached and reattached. Finally, we detach the data frame and confirm it has been removed from the search path.
The read.table() function
Large data objects are usually read as values from external files rather than entered during an R session on the keyboard. If variables need to be kept primarily in data frames, as we strongly recommend, an entire data frame can be read directly using the read.table() function.
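As an illustrative sketch, the call below reads a small table supplied inline through the text argument rather than from an external file; the column names and values are made up. Because the header line has one field fewer than the data lines, read.table() uses the first column as row labels:

```r
# Hypothetical file contents, given inline instead of as a file name.
raw <- "Price Floor Area Cent.heat
01 52.00 111.0 830 no
02 54.75 128.0 710 no
03 57.50 101.0 1000 yes"
houses <- read.table(text = raw, header = TRUE)
str(houses)
```

For a real file, the first argument would be the file name instead of text = raw.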
The scan() function
Accessing builtin datasets
Loading data from other R packages
Editing data
R as a set of statistical tables
Examining the distribution of a set of data
Note that distribution theory is not valid here since we have estimated the parameters of the normal distribution from the same sample.
One- and two-sample tests
To test for the equality of the means of the two samples, we can use an unpaired t-test. We can use the F-test to test for equality in the variances, provided that the two samples are from normal populations.
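A minimal sketch of both tests on two made-up samples (the vectors A and B are illustrative values only):

```r
# Hypothetical measurements obtained by two methods.
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

tt <- t.test(A, B)     # unpaired t-test; does not assume equal variances by default
vt <- var.test(A, B)   # F-test for equality of variances, assumes normality
tt$p.value
vt$p.value
```

If the F-test gives no evidence of unequal variances, t.test(A, B, var.equal = TRUE) performs the classical pooled-variance test instead.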
Grouped expressions
Control statements
Conditional execution: if statements
Repetitive execution: for loops, repeat and while
In the process, the language gains enormously in power, convenience, and elegance, and learning to write useful functions is one of the most important ways to make your use of R comfortable and productive. It should be emphasized that most of the functions provided as part of the R system, such as mean(), var(), postscript() and so on, are themselves written in R and are therefore not substantially different from user-written functions.
Simple examples
The expression is an R expression (usually a grouped expression) that uses the arguments, arg_i, to calculate a value. It again uses the qr() and qr.coef() functions in the slightly counterintuitive way above to perform this part of the calculation.
Defining new binary operators
Therefore, it probably has some value to have just this part isolated in an easy to use function if it will be used frequently.
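One way such a helper might look is sketched below. The function name bslash and the test data are illustrative assumptions; only qr() and qr.coef() come from the text:

```r
# Least squares coefficients of y regressed on the columns of X.
bslash <- function(X, y) {
  X <- qr(X)         # replace X by its QR decomposition
  qr.coef(X, y)      # solve for the coefficients
}

X <- cbind(1, c(1, 2, 3, 4))   # intercept column plus one predictor
y <- c(3, 5, 7, 9)             # exactly 1 + 2 * x
regcoeff <- bslash(X, y)
regcoeff
```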
Named arguments and defaults
The ‘...’ argument
Assignments within functions
More advanced examples
Efficiency factors in block designs
Dropping all names in a printed array
Recursive numerical integration
Scope
The special assignment operator, <<-, is used to change the value associated with total. For most users, <<- creates a global variable and assigns the value of the right-hand side to it.
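A small sketch of the case where <<- updates a variable in an enclosing function rather than creating a global one. The function name open.account and its components are assumptions for illustration:

```r
open.account <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount   # changes 'total' in the enclosing environment
      invisible(total)
    },
    balance = function() total
  )
}

ross <- open.account(100)
ross$deposit(30)
ross$balance()
```

Each call to open.account() gets its own total, so separate accounts do not interfere with one another.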
Customizing the environment
Classes, generic functions and object orientation
Contrasts
We need at least an idea of how the model formulas specify the columns of the model matrix. This is easy if we have continuous variables, as each gives one column in the model matrix (and the intercept will give a column of ones if included in the model).
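The continuous-variable case can be checked directly with model.matrix(); the data frame below is a made-up illustration:

```r
d <- data.frame(x = c(1.5, 2.0, 3.5), y = c(1, 2, 3))
M  <- model.matrix(y ~ x, data = d)       # columns: (Intercept) and x
M0 <- model.matrix(y ~ x - 1, data = d)   # same model with the intercept removed
M
```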
Linear models
Generic functions for extracting model information
Analysis of variance and model comparison
ANOVA tables
Updating fitted models
In particular, note that if the data= argument is specified on the original call to the model fitting function, this information is passed on through the fitted model object to update() and its allies.
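A short sketch of this behaviour with a made-up data frame d; the variable and model names are illustrative:

```r
d <- data.frame(x1 = c(1, 2, 3, 4, 5, 6),
                x2 = c(2, 1, 4, 3, 6, 5),
                y  = c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8))

fm1 <- lm(y ~ x1, data = d)
fm2 <- update(fm1, . ~ . + x2)    # adds x2; data = d is carried over
fm3 <- update(fm2, sqrt(.) ~ .)   # same terms, transformed response
formula(fm3)
```

In the update formulas, a period stands for the corresponding part of the old model formula.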
Generalized linear models
Families
The class of generalized linear models handled by facilities provided in R includes Gaussian, binomial, Poisson, inverse Gaussian and gamma response distributions and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case the variance function must be specified as a function of the mean, but in other cases this function is implied by the response distribution.
The glm() function
The shape of the dependence of the variance on the mean is characteristic of the response distribution; for example, for the Poisson distribution Var[y] = µ. For quasi-likelihood estimation and inference, the exact response distribution is not specified, but only a link function and the shape of the variance function as it depends on the mean.
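To illustrate, the sketch below fits the same made-up counts twice: once with the Poisson family and once with a quasi family that specifies only a log link and the variance function "mu". The data are hypothetical:

```r
d <- data.frame(x = 1:8,
                y = c(2, 3, 6, 7, 8, 9, 10, 12))   # hypothetical counts

fit.pois  <- glm(y ~ x, family = poisson, data = d)
fit.quasi <- glm(y ~ x,
                 family = quasi(link = "log", variance = "mu"),
                 data = d)

coef(fit.pois)
coef(fit.quasi)
poisson()$variance(3)   # the Poisson variance function returns the mean itself
```

The two fits should give essentially the same coefficients, since they share a link and a variance function; they differ in how the dispersion is treated.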
Nonlinear least squares and maximum likelihood models
Least squares
For all families, the variance of the response will depend on the mean and will have the scale parameter as a multiplier. After the fit, $minimum is the SSE and $estimate holds the least squares estimates of the parameters.
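A minimal sketch of a nonlinear least squares fit by direct minimization with nlm(). The data, the exponential model, the function name sse and the starting values are all assumptions chosen for illustration:

```r
# Hypothetical data roughly following y = 3 * exp(-0.5 * x).
x <- c(0, 1, 2, 3, 4, 5)
y <- c(3.02, 1.80, 1.12, 0.66, 0.41, 0.25)

# Criterion to minimize: the sum of squared errors for parameters p.
sse <- function(p) sum((y - p[1] * exp(-p[2] * x))^2)

out <- nlm(sse, p = c(2, 1))   # starting values are a rough guess
out$minimum                    # SSE at the solution
out$estimate                   # least squares parameter estimates
```

Sensible starting values matter in practice; a poor choice can lead the optimizer to a different local minimum.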
Maximum likelihood
Some non-standard models
Graphics devices can be used in both interactive and batch mode, but in most cases, interactive use is more productive. There is a recommended package, lattice (https://CRAN.R-project.org/package=lattice), which builds on grid and provides ways to produce multi-panel plots similar to those in the Trellis system in S.
High-level plotting commands
- The plot() function
- Displaying multivariate data
- Display graphics
- Arguments to high-level plotting functions
The first two forms produce distribution plots of variables in a data frame (first form) or of a number of named objects (second form). In a dotchart the y-axis gives a label of the data in x and the x-axis gives its value.
Low-level plotting commands
Mathematical annotation
Hershey vector fonts
Interacting with graphics
Waits for the user to select locations on the current plot using the left mouse button. Allows the user to highlight any of the points defined by x and y (using the left mouse button) by drawing the corresponding label nearby (or the point's index number if no labels are given).
Using graphics parameters
Permanent changes: The par() function
When the process terminates (see above), identify() returns the indices of the selected points; you can use these indices to extract the selected points from the original vectors x and y. With named arguments (or a single list argument), sets the values of the named graphics parameters, and returns the original values of the parameters as a list.
Temporary changes: Arguments to graphics functions
Setting graphics parameters with the par() function changes the value of the parameters permanently, in the sense that all future calls to graphics functions (on the current device) will be affected by the new value. You can think of setting graphics parameters this way as setting "default" values for the parameters, which will be used by all graphics functions unless an alternative value is provided.
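The difference between a persistent par() setting and a per-call override can be sketched as follows. The null PDF device is an assumption made so the example runs without opening a window or writing a file:

```r
pdf(NULL)                          # null device: nothing is displayed or saved
oldpar <- par(col = 4, lty = 2)    # set new defaults, keeping the previous values
plot(1:10, type = "l")             # drawn with the new defaults
plot(1:10, type = "l", col = 2)    # col overridden for this call only
par(oldpar)                        # restore the original settings
restored <- par("lty")
invisible(dev.off())
```

Saving the list returned by par() and passing it back later is a common way to undo changes made inside a function.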
Graphics parameters list
- Graphical elements
- Axes and tick marks
- Figure margins
- Multiple figure environment
Negative values give tick marks outside the drawing region. Annotations always fall within the range of the data, whether the axis style is "i" or "r" (the default). The first two numbers are the row and column of the current figure; the last two are the number of rows and columns in the set of multiple figures.
Device drivers
PostScript diagrams for typeset documents
However, there are no outer margins by default, so you need to create them explicitly using oma or omi. This unusual notation derives from compatibility with S: it actually means that the output will be a single page (which is part of the EPSF specification).
Multiple graphics devices
This works best when encapsulated PostScript is produced: R always produces conformant output, but only marks the output as such if the onefile=FALSE argument is supplied.
Dynamic graphics
Users connected to the Internet can use the install.packages() and update.packages() functions (available from the Packages menu in the Windows and macOS GUIs; see “Installing Packages” in R Installation and Administration) to install and update packages. Some packages may be loaded but not available in the search list (see Section 13.3 [Namespaces], page 77): they will be included in the list provided by.
Standard packages
Use a command like.
Contributed packages and CRAN
Namespaces
Packages are often interdependent, and loading one can cause the others to load automatically. When packages with namespaces are loaded automatically, they are not added to the search list.
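The distinction between a loaded namespace and an attached package can be sketched with base R functions. The choice of the tools package is an assumption made because it ships with every R installation:

```r
# Load the namespace of 'tools' without attaching the package.
requireNamespace("tools", quietly = TRUE)

"tools" %in% loadedNamespaces()   # the namespace is now loaded
"package:tools" %in% search()     # attached only if library(tools) was called
```

Functions from a loaded but unattached package can still be reached with the :: operator, for example tools::file_ext().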
Files and directories
Filepaths
Windows allows file paths that contain drives and are relative to the current directory on the drive, e.g. d:foo/bar refers to d:/a/b/c/foo/bar if the current directory on drive d: is /a/b/c. Function path.expand does "tilde expansion", substituting values for the home directories (of the current user and possibly other users).
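A brief sketch of tilde expansion, together with a few related base R helpers (file.path(), basename() and dirname()) that are not named in the text above; the path components are illustrative:

```r
home <- path.expand("~")             # home directory of the current user
p <- file.path("foo", "bar.txt")     # builds a path with "/" as the separator
basename(p)                          # the file name part
dirname(p)                           # the directory part
```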
System commands
Multiple slashes in file paths such as /abc//def are valid on POSIX filesystems and treated as if there was only one slash. However, leading double slashes can have a different meaning. Windows UNC file paths (such as \\server\dir1\dir2\file) are not supported, but they may work in some R functions.
Compression and Archives
The name of this file is taken from the environment variable R_PROFILE_USER; if not set, a file called .Rprofile is looked for in the current directory or in the user's home directory (in that order). It also loads a saved workspace from file .RData in the current directory if there is one (unless --no-restore or --no-restore-data is specified).
Invoking R under Windows
Under Windows cmd can be an executable or a batch file, or if it has the extension .sh or .pl, the appropriate interpreter (if available) is called to run it.
Invoking R under macOS
The startup procedure under macOS is very similar to that under UNIX, but R.app does not use command line arguments. The 'home directory' is the one inside the R.framework, but the startup and current working directory is set as the user's home directory, unless a different startup directory is specified in the Preferences window accessible from the GUI.
Scripting with R
Preliminaries
Editing actions
Command-line editor summary
On most terminals, you can also use the up and down arrow keys instead of C-p and C-n respectively. On most terminals, you can also use the left and right arrow keys instead of C-b and C-f respectively.