Addendum 2: Preview of Descriptive Statistics and Graphics Using R

The purpose of this addendum is to provide a limited introduction to the power of R functions as they are used with simple descriptive statistics and accompanying graphics (e.g., figures). To achieve this aim, consider this early addendum a preview of how R can:

• Generate a theoretical dataset of infant birth weights, consisting of 10,000 random numbers with a collective mean of 3500 g and a standard deviation of 450 g. In avoirdupois weight (the system of pounds and ounces still widely used in some English-speaking countries), these values are approximately 7 pounds 11 ounces for the mean and 1 pound 0 ounces for the standard deviation.

• Apply functions used to display data and generate descriptive statistics such as minimum, first quartile, median, mean, third quartile, and maximum.

• Display increasingly detailed and compelling graphics (e.g., figures) that enhance understanding of the relationships between and among data.
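The pound-and-ounce figures in the first item are simple arithmetic. The sketch below shows one way to check them in R. The helper name grams_to_lb_oz is hypothetical (it is not part of base R or of this text), and it assumes the exact avoirdupois definition of 1 oz = 28.349523125 g with 16 oz to the pound.

```r
# Hypothetical helper (not from the text): express grams as pounds and ounces.
# Avoirdupois: 1 lb = 16 oz and 1 oz = 28.349523125 g (exact definition).
grams_to_lb_oz <- function(grams) {
  # Round to whole ounces first, so 15.9 oz becomes 1 lb 0 oz, not 0 lb 16 oz
  total_oz <- round(grams / 28.349523125)
  data.frame(grams  = grams,
             pounds = total_oz %/% 16,  # whole pounds
             ounces = total_oz %% 16)   # ounces left over
}

grams_to_lb_oz(c(3500, 450))  # mean and standard deviation used in this addendum
```

Run as written, the call returns 7 lb 11 oz for 3500 g and 1 lb 0 oz for 450 g. Rounding to whole ounces before splitting into pounds avoids awkward results such as 0 lb 16 oz.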

Follow along with the syntax shown below, giving special attention to the many comments that accompany each R function and its arguments. If text from prior use of R is still on the screen, it may be helpful to clear the screen using one of two methods:

• Method 1: Press the Control key and the letter l (lowercase l, not the number 1) at the same time.

• Method 2: Use the mouse or keyboard to select the R menu items Edit and then Clear console.

With the R screen clear of all unnecessary text, follow along with the R-based syntax shown below:

R Input

ls()                 # List all objects in the active R session
set.seed(8)          # Allow reproducible random numbers

BirthWeightGrams <- round(rnorm(10000, mean=3500, sd=450))

# Make an object variable called BirthWeightGrams. Note
# how the name BirthWeightGrams is an example of internal
# self-documentation, providing a ready reference to what
# was measured (e.g., BirthWeight) and the unit of
# measure (e.g., Grams).

# R supports the use of = and <- as assignment operators.
# For this text, <- is used for assignment, to avoid any
# possible confusion with == (two equal signs with no
# space between them), since == is used to test for
# equality, as in many other programming languages.

# For this demonstration, the rnorm() function is used
# to generate a dataset of:
#   N ................... 10,000 random numbers
#   Mean ................ 3,500 grams
#   Standard Deviation .. 450 grams (sd)

# These values approximate expected birth weights of
# infants in an otherwise unidentified metropolitan
# area, but a few values at either end of the continuum
# may seem extreme for this theoretical dataset of
# randomly generated numbers.

# Note how the round() function was wrapped around the
# rnorm() function so that the output shows whole
# numbers.
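The comments above make three claims worth verifying: set.seed() makes the random numbers reproducible, rnorm() returns the requested 10,000 values, and round() leaves whole numbers. The sketch below checks each one and also contrasts <- with ==. The object names first_draw, second_draw, and x are illustrative only.

```r
# Sketch: the same seed reproduces the same "random" numbers
set.seed(8)
first_draw <- round(rnorm(10000, mean=3500, sd=450))
set.seed(8)
second_draw <- round(rnorm(10000, mean=3500, sd=450))

identical(first_draw, second_draw)    # TRUE: same seed, same values
length(first_draw)                    # 10000 values, as requested
all(first_draw == trunc(first_draw))  # TRUE: round() left whole numbers

# Assignment (<-) versus comparison (==)
x <- 5    # assigns the value 5 to x
x == 5    # compares x with 5 and returns TRUE
```

Without the two calls to set.seed(), the two draws would almost certainly differ, so the later summary statistics and figures could not be reproduced exactly.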

Far more discussion of descriptive statistics and measures of central tendency appears in later lessons throughout this text. For now, note how English-like function names (head, tail, mean, median, etc.) make selected functions easy to recognize. Look also at the way the # symbol is used to introduce a comment, shown immediately below as a brief description of each function:

R Input

head(BirthWeightGrams)     # Show the head, first values
tail(BirthWeightGrams)     # Show the tail, last values

mean(BirthWeightGrams)     # Mean, arithmetic average
sd(BirthWeightGrams)       # SD, standard deviation
median(BirthWeightGrams)   # Median, midpoint

min(BirthWeightGrams)      # Minimum value
max(BirthWeightGrams)      # Maximum value

summary(BirthWeightGrams)  # Summary descriptive statistics

R Output

[Selected output is not shown, to save space.]

> summary(BirthWeightGrams) # Summary descriptive statistics
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   1619    3195    3507    3502    3804    5232

Study the output of the above functions before attempting the graphics (e.g., figures) shown below. Collectively, these functions should provide a sense of the data: the average birth weight, the dispersion of birth weights, etc.²⁵
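Footnote 25 notes that "average" can mean the mean, the median, or the mode. The sketch below computes all three, plus two measures of dispersion. One caution: base R has no built-in function for the statistical mode (mode() reports an object's storage mode instead), so stat_mode here is a hypothetical helper written for illustration, not a standard R function.

```r
# Sketch: three views of "average" and two views of dispersion
set.seed(8)                                   # same seed as above
BirthWeightGrams <- round(rnorm(10000, mean=3500, sd=450))

# Hypothetical helper: base R has no function for the statistical mode,
# and mode() reports an object's storage mode (e.g., "numeric") instead
stat_mode <- function(x) {
  counts <- table(x)                            # frequency of each distinct value
  as.numeric(names(counts)[which.max(counts)])  # first (smallest) most frequent value
}

mean(BirthWeightGrams)       # Mean: sum divided by N
median(BirthWeightGrams)     # Median: midpoint of the sorted values
stat_mode(BirthWeightGrams)  # Mode: most frequently occurring value
sd(BirthWeightGrams)         # Standard deviation
IQR(BirthWeightGrams)        # Interquartile range: 3rd Qu. minus 1st Qu.

# summary() takes its quartiles from quantile() (default type 7)
quantile(BirthWeightGrams, probs=c(0, 0.25, 0.5, 0.75, 1))
```

With 10,000 whole-gram values spread over several thousand grams, many values are likely to tie for most frequent. The mode is therefore a weak summary for data like these, which is one reason the mean and median receive more attention.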

However, to gain a more complete understanding of the object BirthWeightGrams, focus on the different graphics shown below, all based on the use of the hist() function. The graphics start out as a simple histogram, with few if any embellishments. By using different R-based arguments, note how increasingly interesting and useful figures are generated, adding color, labels, bold text, larger fonts, etc. In later lessons even greater complexity will be added to the figures, but for now merely observe what can be achieved by using R for biostatistics (Fig. 1.10).

R Input

par(ask=TRUE)            # Prompt before drawing each new page of figures
par(mfrow=c(2,2))        # 4 figures into a 2 row by 2 column grid

hist(BirthWeightGrams)

hist(BirthWeightGrams,
  breaks=25,             # Granularity of output (suggested number of bins)
  main="Distribution of Infant Birth Weight (g)",
  col="red",             # Color(s)
  xlab="Infant Birth Weight (g)",
  ylab="Count")

hist(BirthWeightGrams,
  breaks=50,             # Granularity of output (suggested number of bins)
  main="Distribution of Infant Birth Weight (g)",
  col="red",             # Color(s)
  xlab="Infant Birth Weight (g)",
  ylab="Count",
  font.axis=2,           # Make the axis bold
  font.lab=2)            # Make the labels bold

hist(BirthWeightGrams,
  breaks=50,             # Granularity of output (suggested number of bins)
  main="Distribution of Infant Birth Weight (g)",
  col="red",             # Color(s)
  xlab="Infant Birth Weight (g)",
  ylab="Count",
  font.axis=2,           # Make the axis bold
  font.lab=2,            # Make the labels bold
  cex.main=1.75,         # Size - main (title)
  cex.lab=1.25,          # Size - labels
  xlim=c(0,6000),        # Adjust X axis limits
  ylim=c(0,1000))        # Adjust Y axis limits

²⁵ By itself, the term average is somewhat ambiguous, given that there are multiple ways to view centrality. Does average refer to mode, the most frequently occurring value? Does average refer to median, the midpoint? Does average refer to mean, which is generally viewed as the arithmetic expression of Sum divided by N? Fortunately, R supports functions and function arguments that easily accommodate these multiple views of central tendency.

Figure 1.10: Multiple histograms of birth weight
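The breaks argument is only a suggestion: hist() passes it to pretty(), which picks "nice" bin boundaries, so the number of bars drawn may differ from the number requested. The sketch below uses plot=FALSE to inspect the bins without drawing anything. The object names h25 and h50 are illustrative only.

```r
# Sketch: inspect the bins hist() would use, without drawing anything
set.seed(8)
BirthWeightGrams <- round(rnorm(10000, mean=3500, sd=450))

h25 <- hist(BirthWeightGrams, breaks=25, plot=FALSE)
h50 <- hist(BirthWeightGrams, breaks=50, plot=FALSE)

length(h25$counts)          # bars actually used when breaks=25 is requested
length(h50$counts)          # bars actually used when breaks=50 is requested
unique(diff(h50$breaks))    # bin width in grams
sum(h50$counts)             # every observation lands in exactly one bin
max(h50$counts)             # tallest bar; compare with ylim=c(0,1000)
```

Checking max(h50$counts) before choosing ylim helps ensure the tallest bar is not clipped. Because par() settings such as mfrow persist for the rest of the session, a common pattern is to save the old values with old_par <- par(mfrow=c(2,2)) and restore them afterward with par(old_par).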

Source document: Thomas W. MacFarland and Jan M. Yates (pages 74-77)