
An Introduction to R

Nguyễn Gia Hào

Academic year: 2023


Permission is granted to make and distribute verbatim copies of this manual provided that the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute translations of this manual in another language under the conditions above for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

The R environment

Related software and documentation

R and statistics

Thus, while SAS and SPSS will provide ample output from a regression or discriminant analysis, R will provide minimal output and store the results in an appropriate object for subsequent querying with additional R functions.

R and the window system

Using R interactively

An introductory session

Getting help with functions and features

R commands, case sensitivity, etc

Comments can be placed almost anywhere: starting with a hashmark ('#'), everything up to the end of the line is a comment. If a command is not complete at the end of a line, R will give a different prompt, by default +, on the second and subsequent lines.
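A minimal sketch of both points, using made-up values. The variable names are illustrative only.

```r
# A full-line comment: R ignores everything from the hashmark to the end of the line.
x <- c(10.4, 5.6, 3.1)   # a trailing comment after a command

# A command that is syntactically incomplete at the end of a line continues
# on the next one; at the console R would show its continuation prompt here.
total <- sum(x,
             1)
print(total)
```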

Recall and correction of previous commands

Executing commands from or diverting output to a file

Data permanency and removing objects

All objects created during an R session can be permanently saved to a file for use in future R sessions. At the end of each R session, you have the option to save all currently available objects.

Vectors and assignment

Vector arithmetic

For most purposes, the user will not care whether the "numbers" in the numeric vector are integers, real, or even complex. Internal calculations are performed as double-precision real numbers or double-precision complex numbers if the input data is complex.

Generating regular sequences

If the argument to var() is an n-by-p matrix, the value is a p-by-p sample covariance matrix obtained by treating the rows as independent p-variate sample vectors. The parallel maximum and minimum functions pmax and pmin return a vector (of length equal to their longest argument) that contains in each element the largest (smallest) element in that position in any of the input vectors.
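As a small sketch of these functions, the data below are invented for illustration: a 5-by-3 matrix passed to var(), and two short vectors compared element by element.

```r
# Illustrative data only: 5 observations (rows) on 3 variables (columns).
m <- matrix(c(1, 2, 3, 4, 5,
              2, 1, 4, 3, 6,
              9, 7, 8, 6, 5), nrow = 5)
S <- var(m)          # 3-by-3 sample covariance matrix
dim(S)

a <- c(1, 7, 3)
b <- c(4, 2, 9)
pmax(a, b)           # element-wise maxima: 4 7 9
pmin(a, b)           # element-wise minima: 1 2 3
max(a, b)            # a single overall maximum: 9
```

Note the contrast with max(), which collapses all of its arguments to one number.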

Logical vectors

Missing values

Character vectors

Index vectors; selecting and modifying subsets of a data set

The corresponding elements of the vector are selected and joined, in that order, in the result. The index vector can be of any length and the result is the same length as the index vector.

Other types of objects

In this case, a subvector of the names vector can be used in the same way as the positive integral labels in point 2 above. An indexed expression can also appear on the receiving end of an assignment, in which case the assignment operation is performed only on those elements of the vector.
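The indexing forms described here can be sketched as follows. All vectors and names are made up for the example.

```r
x <- c(10, 20, 30, 40, 50)
x[c(1, 3, 1)]              # positions may repeat: 10 30 10

fruit <- c(orange = 5, banana = 10, apple = 1, peach = 30)
lunch <- fruit[c("apple", "orange")]   # select by name, in the order requested

y <- c(-2, 5, NA, -7)
y[is.na(y)] <- 0           # assign only to the missing elements
y[y < 0] <- -y[y < 0]      # flip the sign of the negative elements only
```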

Intrinsic attributes: mode and length

Changing the length of an object

Getting and setting attributes

The class of an object

You only need this facility in quite special situations, but one of them is when you get to grips with the idea of class and generic functions. Generic functions and classes are discussed further in Section 10.9 [Object Orientation], page 48, but only briefly.

A specific example

A factor is a vector object used to specify a discrete classification (grouping) of the components of other vectors of the same length.

The function tapply() and ragged arrays

The result is a structure of the same length as the levels attribute of the factor containing the results. To do this you can use tapply() once more with the length() function to find the sample sizes, and the qt() function to find the percentage points of the appropriate t-distributions. (You can also explore R's facilities for t-tests.)

Ordered factors

For this we need to write an R function to calculate the standard error for any given vector. The values in the vector are grouped into groups corresponding to the different entries in the factor.
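A minimal sketch of this workflow, assuming invented income figures and state labels. The helper name stdError is just a convenient label for a user-written function.

```r
# Hypothetical data: incomes for ten people and the state each lives in.
state   <- factor(c("nsw", "vic", "nsw", "qld", "vic",
                    "qld", "nsw", "vic", "qld", "nsw"))
incomes <- c(60, 49, 40, 61, 64, 60, 59, 54, 62, 69)

stdError <- function(x) sqrt(var(x) / length(x))   # standard error of a mean

means <- tapply(incomes, state, mean)       # one mean per factor level
ses   <- tapply(incomes, state, stdError)   # one standard error per level
ns    <- tapply(incomes, state, length)     # sample sizes per level

# 95% confidence limits from the t distribution with n - 1 degrees of freedom
half <- qt(0.975, df = ns - 1) * ses
cbind(lower = means - half, mean = means, upper = means + half)
```

Each tapply() call returns one value per level of the factor, so the three results line up and can be combined directly.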

Arrays

Array indexing. Subsections of an array

Index matrices

Index matrices must be numeric: any other form of matrix (e.g. a logical or character matrix) supplied as a matrix is treated as an indexing vector.
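As an illustration, the sketch below uses a small made-up array and a two-column numeric index matrix, where each row of the index matrix names one element by its (row, column) position.

```r
x <- array(1:20, dim = c(4, 5))
i <- array(c(1:3, 3:1), dim = c(3, 2))   # rows are (1,3), (2,2), (3,1)
picked <- x[i]                           # extracts x[1,3], x[2,2], x[3,1]
picked
x[i] <- 0                                # replaces just those three elements
x
```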

The array() function

Mixed vector and array arithmetic. The recycling rule

The outer product of two arrays

Generalized transpose of an array

Matrix facilities

Matrix multiplication

Then ev$val is the vector of eigenvalues of Sm and ev$vec is the matrix of corresponding eigenvectors.

Singular value decomposition and determinants

Least squares fitting and the QR decomposition

For large matrices, it is better to avoid computing the eigenvectors when they are not needed. These calculate the orthogonal projection of y onto the range of X in fit, the projection onto the orthogonal complement in res, and the coefficient vector for the projection in b; that is, b is essentially the result of Matlab's 'backslash' operator.
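A short sketch of the eigenvalue and QR computations, assuming a made-up design matrix X and response y. The object names Sm, ev, fit, res and b follow the text; the data do not come from the manual.

```r
# Made-up regression data: X is the design matrix, y the response.
X <- cbind(1, c(1, 2, 3, 4, 5, 6))
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)

Sm <- crossprod(X)                              # symmetric matrix t(X) %*% X
ev <- eigen(Sm)                                 # ev$values and ev$vectors
evals <- eigen(Sm, only.values = TRUE)$values   # eigenvalues only

Xplus <- qr(X)
b   <- qr.coef(Xplus, y)     # least squares coefficients
fit <- qr.fitted(Xplus, y)   # projection of y onto the range of X
res <- qr.resid(Xplus, y)    # projection onto the orthogonal complement
```

The fitted values and residuals add back up to y, and the residuals are orthogonal to every column of X.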

Forming partitioned matrices, cbind() and rbind()

Although still useful in some contexts, it would now generally be superseded by statistical model functions, as discussed in Chapter 11 [Statistical Models in R], page 51.

The concatenation function, c(), with arrays

Frequency tables from factors

Lists

Constructing and modifying lists

Concatenating lists

The expression must be of the form vector[index_vector] since having an arbitrary expression instead of the vector name doesn't make much sense here. When the sizes of the subclasses are all the same, indexing can be done implicitly and much more efficiently, as we see in the next section.

Data frames

  • Making data frames
  • attach() and detach()
  • Working with data frames
  • Attaching arbitrary lists
  • Managing the search path

However, the new value of component u is not visible until the data frame is detached and reattached. Finally, we detach the data frame and confirm it has been removed from the search path.
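The behaviour described here can be sketched with a small invented data frame. The name lentils and its columns u, v and w are assumptions made for the example.

```r
lentils <- data.frame(u = c(1, 2, 3), v = c(4, 5, 6), w = c(7, 8, 9))
attach(lentils)                  # u, v, w become visible as variables
lentils$u <- v + w               # updates the data frame itself...
stale <- u                       # ...but the attached copy still holds the old u
detach("lentils")
attach(lentils)
fresh <- u                       # the new values are visible after reattaching
detach("lentils")
"lentils" %in% search()          # confirms removal from the search path
```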

The read.table() function

Large data objects are usually read as values from external files rather than entered during an R session on the keyboard. If variables need to be kept primarily in data frames, as we strongly recommend, an entire data frame can be read directly using the read.table() function.
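A self-contained sketch of reading a data frame with read.table(). The file contents and column names are invented, and the file is written to a temporary location first so the example can run anywhere.

```r
# Write a tiny, made-up data file so the example is self-contained.
path <- tempfile(fileext = ".data")
writeLines(c("Price Floor Area Cent.heat",
             "52.00 111.0 830 no",
             "54.75 128.0 710 no",
             "57.50 101.0 1000 yes"), path)

HousePrice <- read.table(path, header = TRUE)   # first line holds the names
str(HousePrice)
unlink(path)                                    # remove the temporary file
```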

The scan() function

Accessing builtin datasets

Loading data from other R packages

Editing data

R as a set of statistical tables

Examining the distribution of a set of data

(Note that the distribution theory is not valid here since we have estimated the parameters of the normal distribution from the same sample.)

One- and two-sample tests

To test for the equality of the means of the two samples, we can use an unpaired t-test. We can use the F-test to test for equality in the variances, provided that the two samples are from normal populations.
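These tests might be run as in the sketch below. The two samples A and B are made-up measurements used only to show the calls.

```r
# Made-up measurements from two methods.
A <- c(79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97)
B <- c(80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97)

tt <- t.test(A, B)                     # unpaired; variances not assumed equal
vt <- var.test(A, B)                   # F test for equality of variances
tp <- t.test(A, B, var.equal = TRUE)   # classical pooled-variance t-test

c(welch = tt$p.value, F = vt$p.value, pooled = tp$p.value)
```

By default t.test() does not assume equal variances; var.equal = TRUE requests the pooled version, which is reasonable only if the F test gives no evidence against equal variances.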

Grouped expressions

Control statements

Conditional execution: if statements

Repetitive execution: for loops, repeat and while

In the process, the language gains enormously in power, convenience, and elegance, and learning to write useful functions is one of the most important ways to make your use of R comfortable and productive. It should be emphasized that most of the functions provided as part of the R system, such as mean(), var(), postscript() and so on, are themselves written in R and are therefore not substantially different from user-written functions.

Simple examples

The expression is an R expression (usually a grouped expression) that uses the arguments, arg_i, to calculate a value. It again uses the qr() and qr.coef() functions in the slightly counterintuitive way above to perform this part of the calculation.
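As a sketch of a function definition of this form, the example below computes a two-sample t statistic with a pooled variance. The name twosam and the input values are chosen for illustration.

```r
# General form: name <- function(arg_1, arg_2, ...) expression
twosam <- function(y1, y2) {
  n1 <- length(y1); n2 <- length(y2)
  yb1 <- mean(y1);  yb2 <- mean(y2)
  s1 <- var(y1);    s2 <- var(y2)
  s <- ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)   # pooled variance
  (yb1 - yb2) / sqrt(s * (1 / n1 + 1 / n2))              # value of the function
}

tstat <- twosam(c(5.1, 4.9, 5.6, 5.8), c(4.4, 4.7, 4.1, 4.9))
tstat
```

The value of the last expression in the braces is what the function returns, so no explicit return() call is needed.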

Defining new binary operators

Therefore, it probably has some value to have just this part isolated in an easy to use function if it will be used frequently.

Named arguments and defaults

The ‘...’ argument

Assignments within functions

More advanced examples

Efficiency factors in block designs

Dropping all names in a printed array

Recursive numerical integration

Scope

The special assignment operator, <<-, is used to change the value associated with total. For most users, <<- creates a global variable and assigns the value of the right-hand side to it.
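A minimal sketch of <<- in action, assuming a hypothetical bank-account function. Here total already exists in the enclosing function's environment, so <<- modifies it there; a global variable is created only when no such variable is found in any enclosing environment.

```r
open.account <- function(total) {
  list(
    deposit = function(amount) {
      total <<- total + amount   # changes total in the enclosing environment
      invisible(total)
    },
    balance = function() total
  )
}

ross   <- open.account(100)
robert <- open.account(200)
ross$deposit(50)
ross$balance()     # 150
robert$balance()   # 200: each account keeps its own total
```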

Customizing the environment

Classes, generic functions and object orientation

Contrasts

We need at least an idea of how the model formulas specify the columns of the model matrix. This is easy if we have continuous variables, as each gives one column in the model matrix (and the intercept will give a column of ones if included in the model).
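The columns that a formula produces can be inspected directly with model.matrix(). The data frame below is invented for the purpose of the sketch.

```r
# Made-up data: one continuous variable and one three-level factor.
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2, 5.8),
                x = c(1, 2, 3, 4, 5, 6),
                g = factor(c("a", "a", "b", "b", "c", "c")))

mm <- model.matrix(y ~ x, data = d)   # a column of ones plus one column for x
mm
model.matrix(y ~ x - 1, data = d)     # intercept removed: just the x column
model.matrix(y ~ g, data = d)         # factor: columns depend on the contrasts in use
```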

Linear models

Generic functions for extracting model information

Analysis of variance and model comparison

ANOVA tables

Updating fitted models

In particular, note that if the data= argument is specified on the original call to the model fitting function, this information is passed on through the fitted model object to update() and its allies.

Generalized linear models

Families

The class of generalized linear models handled by facilities provided in R includes Gaussian, binomial, Poisson, inverse Gaussian and gamma response distributions and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case the variance function must be specified as a function of the mean, but in other cases this function is implied by the response distribution.

The glm() function

The shape of the dependence of the variance on the mean is characteristic of the response distribution; for example, for the Poisson distribution Var[y] = µ. For quasi-likelihood estimation and inference, the exact response distribution is not specified, but only a link function and the shape of the variance function as it depends on the mean.
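A brief sketch contrasting a Poisson fit with a quasi-likelihood fit that specifies only the link and the variance function. The counts and the treatment factor are made up for illustration.

```r
# Made-up count data with a three-level treatment factor.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
treatment <- gl(3, 3)

fit.pois  <- glm(counts ~ treatment, family = poisson)
fit.quasi <- glm(counts ~ treatment,
                 family = quasi(link = "log", variance = "mu"))

poisson()$variance(2.5)   # the Poisson variance function: Var[y] = mu
coef(fit.pois)
coef(fit.quasi)           # same estimates; the dispersion is estimated instead of fixed
```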

Nonlinear least squares and maximum likelihood models

Least squares

For all families, the variance of the response will depend on the mean and will have the scale parameter as a multiplier. After the fitting, $minimum is the SSE and $estimate holds the least squares estimates of the parameters.
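A sketch of a nonlinear least squares fit with nlm(), assuming illustrative data and a two-parameter saturation curve. The function name fn, the object name out and the starting values are choices made for this example.

```r
# Illustrative data: response y measured at concentrations x.
x <- c(0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.10, 1.10)
y <- c(76, 47, 97, 107, 123, 139, 159, 152, 191, 201, 207, 200)

# Criterion to minimize: the sum of squared errors for y = p1 * x / (p2 + x).
fn <- function(p) sum((y - (p[1] * x) / (p[2] + x))^2)

out <- nlm(fn, p = c(200, 0.1), hessian = TRUE)
out$minimum    # the SSE at the solution
out$estimate   # least squares estimates of p1 and p2
```

Reasonable starting values matter for nlm(); here they were read off a plot of the data by eye.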

Maximum likelihood

Some non-standard models

Graphics devices can be used in both interactive and batch mode, but in most cases, interactive use is more productive. The recommended package lattice (https://CRAN.R-project.org/package=lattice) builds on grid and provides ways to produce multi-panel plots similar to those in the Trellis system in S.

High-level plotting commands

  • The plot() function
  • Displaying multivariate data
  • Display graphics
  • Arguments to high-level plotting functions

The first two forms produce distribution plots of variables in a data frame (first form) or of a number of named objects (second form). In a dotchart the y-axis gives a label of the data in x and the x-axis gives its value.

Low-level plotting commands

Mathematical annotation

Hershey vector fonts

Interacting with graphics

Waits for the user to select locations on the current plot using the left mouse button. Allows the user to highlight any of the points defined by x and y (using the left mouse button) by drawing the corresponding label component nearby (or the point index number if the label is missing).

Using graphics parameters

Permanent changes: The par() function

When the process terminates (see above), identify() returns the indices of the selected points; you can use these indices to extract the selected points from the original vectors x and y. With named arguments (or a single list argument), sets the values of the named graphics parameters, and returns the original values of the parameters as a list.

Temporary changes: Arguments to graphics functions

Setting graphics parameters with the par() function changes the value of the parameters permanently, in the sense that all future calls to graphics functions (on the current device) will be affected by the new value. You can think of setting graphics parameters this way as setting "default" values for the parameters, which will be used by all graphics functions unless an alternative value is provided.
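The difference between permanent and temporary settings can be sketched as follows. The example opens a PDF device that writes no file so that it runs without a display; the parameter values are arbitrary.

```r
pdf(NULL)                             # a device that produces no output file
oldpar <- par(cex = 1.5, las = 1)     # permanent change; returns the old values
plot(1:10, pch = "+")                 # pch applies to this one call only
during <- par("cex")                  # still 1.5 for any later plot
par(oldpar)                           # restore the original settings
after <- par("cex")
dev.off()
```

Saving the list returned by par() and passing it back afterwards is a convenient way to undo a set of changes in one step.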

Graphics parameters list

  • Graphical elements
  • Axes and tick marks
  • Figure margins
  • Multiple figure environment

Negative values give tick marks outside the plotting region. With the axis styles "i" (internal) and "r" (the default), tick marks always fall within the range of the data. The first two numbers are the row and column of the current figure; the last two are the number of rows and columns in the set of multiple figures.

Device drivers

PostScript diagrams for typeset documents

However, there are no outer margins by default, so you need to create them explicitly using oma or omi. This unusual notation derives from compatibility with S: it actually means that the output will be a single page (which is part of the EPSF specification).

Multiple graphics devices

This works best when encapsulated PostScript is produced: R always produces conformant output, but only marks the output as such if the onefile=FALSE argument is supplied.

Dynamic graphics

Users connected to the Internet can use the install.packages() and update.packages() functions (available from the Packages menu in the Windows and macOS GUIs, see Section “Installing packages” in R Installation and Administration) to install and update packages. Some packages may be loaded but not available in the search list (see Section 13.3 [Namespaces], page 77): they are still included in the list of loaded namespaces.

Standard packages

Use a command like.

Contributed packages and CRAN

Namespaces

Packages are often interdependent, and loading one can cause the others to load automatically. When packages with namespaces are loaded automatically, they are not added to the search list.

Files and directories

Filepaths

Windows allows file paths that contain a drive and are relative to the current directory on that drive, e.g. d:foo/bar refers to d:/a/b/c/foo/bar if the current directory on drive d: is /a/b/c. The function path.expand does "tilde expansion", substituting values for the home directories (of the current user and possibly other users).
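A few base R path helpers, shown as a sketch. The directory and file names are hypothetical and nothing is read from or written to disk.

```r
p <- file.path("data", "raw", "survey.csv")   # joins components with "/"
p
basename(p)                                   # last component of the path
dirname(p)                                    # everything before it
home <- path.expand("~")                      # tilde expansion: home directory
home
normalizePath(".", winslash = "/")            # absolute form of the current directory
```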

System commands

Multiple slashes in file paths such as /abc//def are valid on POSIX filesystems and treated as if there was only one slash. However, leading double slashes can have a different meaning. Windows UNC file paths (such as \\server\dir1\dir2\file) are not supported, but they may work in some R functions.

Compression and Archives

The name of this file is taken from the environment variable R_PROFILE_USER; if not set, a file called .Rprofile is looked for in the current directory or in the user's home directory (in that order). It also loads a saved workspace from file .RData in the current directory if there is one (unless --no-restore or --no-restore-data is specified).

Invoking R under Windows

Under Windows, cmd can be an executable or a batch file, or if it has the extension .sh or .pl, the appropriate interpreter (if available) is called to run it.

Invoking R under macOS

The startup procedure under macOS is very similar to that under UNIX, but R.app does not use command line arguments. The 'home directory' is the one inside the R.framework, but the startup and current working directory is set as the user's home directory, unless a different startup directory is specified in the Preferences window accessible from the GUI.

Scripting with R

Preliminaries

Editing actions

Command-line editor summary

On most terminals, you can also use the up and down arrow keys instead of C-p and C-n respectively. On most terminals, you can also use the left and right arrow keys instead of C-b and C-f respectively.
