Addendum 1: Specialized External Packages and Functions

There are more than 15,000 external R-based packages available for immediate download, without special permissions and at no direct cost. Among them are thousands of specialized functions that supplement the set of functions available when R is first installed. A few specialized functions specific to descriptive statistics and measures of central tendency are demonstrated below.

Some specialized functions provide not only numerical statistics of immediate use but also a graphical image, which further reinforces an understanding of how the data in question are organized. Function arguments are typically used to embellish graphical output, but in this lesson they have been kept to a minimum. Additional details, available through the use of function arguments, are demonstrated in later lessons.

R Input

install.packages("asbio")

library(asbio) # Load the asbio package.

help(package=asbio) # Show the information page.

sessionInfo() # Confirm all attached packages.

R Input

asbio::Mode(CPIIISecLbsGen.df$Lbs)

# The modes::modes() function was demonstrated previously.

# The asbio::Mode() function is merely another way to obtain

# this measure of central tendency.

R Output

[1] 114 122
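The output lists two values because the data are bimodal: 114 and 122 are tied for the highest frequency. As a rough illustration of that idea, the base-R sketch below returns every value tied for the highest count. It is not the source code of asbio::Mode(); the function name all_modes() and the small weight vector are hypothetical.

```r
# Sketch: return every value tied for the highest frequency (multimodal mode).
# Hypothetical helper; not the implementation of asbio::Mode() or modes::modes().
all_modes <- function(x) {
  x <- x[!is.na(x)]                          # ignore missing values
  counts <- table(x)                         # frequency of each distinct value
  winners <- names(counts)[counts == max(counts)]
  as.numeric(winners)                        # table names are character strings
}

# Hypothetical weights in pounds (not the CPIIISecLbsGen.df data)
lbs <- c(114, 122, 114, 130, 122, 141, NA)
print(all_modes(lbs))
```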

The epiDisplay package is quite versatile, with many functions that support frequency distributions, descriptive statistics, and measures of central tendency.

Some epiDisplay functions provide not only attractive graphics but also useful statistics printed to the screen in text format (Figs. 2.5, 2.6, and 2.7).

R Input

install.packages("epiDisplay")

library(epiDisplay) # Load the epiDisplay package.

help(package=epiDisplay) # Show the information page.

sessionInfo() # Confirm all attached packages.

R Input

par(ask=TRUE)

par(mfrow=c(1,2)) # 2 figures into a 1 row by 2 column grid

epiDisplay::tab1(CPIIISecLbsGen.df$Section,
  main="Frequency Distribution of Section",
  col=c("red", "blue"), font.lab=2, font.axis=2)

epiDisplay::tab1(CPIIISecLbsGen.df$Gender,
  main="Frequency Distribution of Gender",
  col=c("red", "blue"), font.lab=2, font.axis=2)

# Generate frequency distributions and percentages for
# each breakout group (level), along with a figure of
# each frequency distribution.
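For readers who want to see the arithmetic behind a one-way frequency distribution, a minimal base-R sketch follows. It assumes a small hypothetical factor in place of CPIIISecLbsGen.df$Section and uses only table(), prop.table(), and barplot(); it approximates what epiDisplay::tab1() reports rather than reproducing its exact layout.

```r
# Hypothetical factor standing in for CPIIISecLbsGen.df$Section
section <- factor(c("AM", "PM", "AM", "AM", "PM"))

freq <- table(section)                        # count for each level
pct  <- round(100 * prop.table(freq), 1)      # percentage for each level
print(cbind(Frequency = freq, Percent = pct)) # one row per level

# Bar chart of the same frequency distribution
barplot(freq, col = c("red", "blue"),
        main = "Frequency Distribution of Section (sketch)")
```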

R Input

par(ask=TRUE)

par(mfrow=c(1,2)) # 2 figures into a 1 row by 2 column grid

epiDisplay::tabpct(CPIIISecLbsGen.df$Section, CPIIISecLbsGen.df$Gender,
  graph=TRUE, decimal=1,
  main="Frequency Distribution of Section by Gender",
  xlab="Section", ylab="Gender", cex.axis=1,
  percent="both", las=1, col=c("red", "blue"))

epiDisplay::tabpct(CPIIISecLbsGen.df$Gender, CPIIISecLbsGen.df$Section,
  graph=TRUE, decimal=1,
  main="Frequency Distribution of Gender by Section",
  xlab="Gender", ylab="Section", cex.axis=1,
  percent="both", las=1, col=c("red", "blue"))

# Along with the figure that appears as a stacked bar

# chart, look at the detailed frequency distribution

# statistics printed to the screen, which can be

# copied and pasted into a word-processed document.
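The percent="both" argument asks for both row and column percentages of the two-way table. A base-R sketch of those two calculations, assuming hypothetical section and gender factors, uses prop.table() with margin = 1 (rows) and margin = 2 (columns); the layout differs from what epiDisplay::tabpct() prints.

```r
# Hypothetical factors standing in for Section and Gender
section <- factor(c("AM", "AM", "PM", "PM", "PM", "AM"))
gender  <- factor(c("Female", "Male", "Female", "Female", "Male", "Female"))

counts  <- table(section, gender)                          # two-way frequencies
row_pct <- round(100 * prop.table(counts, margin = 1), 1)  # % within each section
col_pct <- round(100 * prop.table(counts, margin = 2), 1)  # % within each gender

print(addmargins(counts))  # frequencies with row and column totals
print(row_pct)
print(col_pct)
```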

Figure 2.5: Section and gender: frequency distribution overall

Figure 2.6: Section and gender: frequency distribution breakouts

R Input

par(ask=TRUE)

par(mfrow=c(1,3)) # 3 figures into a 1 row by 3 column grid

epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=NULL,               # No breakout statistics
  graph=TRUE, box=TRUE,  # Dotplot and boxplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs), Overall",
  cex.X.axis=1.25,       # Note X axis label size.
  cex.Y.axis=1.25,       # Note Y axis label size.
  font.lab=2, dot.col="auto")

epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=CPIIISecLbsGen.df$Section,  # Breakout statistics
  graph=TRUE,                    # Dotplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs) by Section",
  cex.X.axis=1.25,               # Note X axis label size.
  cex.Y.axis=1.25,               # Note Y axis label size.
  font.lab=2, dot.col="auto")

epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=CPIIISecLbsGen.df$Gender,   # Breakout statistics
  graph=TRUE,                    # Dotplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs) by Gender",
  cex.X.axis=1.25,               # Note X axis label size.
  cex.Y.axis=1.25,               # Note Y axis label size.
  font.lab=2, dot.col="auto")

# Produce a sorted dotplot and accompanying descriptive

# statistics: by=NULL (i.e., overall), by=Section, and

# by=Gender. Do not confuse a sorted dotplot with a QQ plot.

# The two figures represent different constructs.
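The difference between overall and breakout statistics, and what a sorted dotplot actually plots, can be sketched in base R. The weights and section labels below are hypothetical; summary() applied to each split() group stands in for the by= breakout, and the plot places each sorted value against its position in sorted order. Because no theoretical quantiles are involved, this is not a QQ plot.

```r
# Hypothetical weights and section labels (one weight is missing)
lbs     <- c(130, 118, NA, 142, 125, 160, 121)
section <- factor(c("AM", "PM", "AM", "PM", "AM", "PM", "AM"))

print(summary(lbs))                                  # overall, like by=NULL
by_section <- lapply(split(lbs, section), summary)   # one summary per group
print(by_section)

# Sorted dotplot: each non-missing value against its position after sorting
sorted <- sort(lbs)                                  # sort() drops NA by default
plot(sorted, seq_along(sorted), pch = 20,
     xlab = "Lbs", ylab = "Subjects sorted by Lbs",
     main = "Sorted dotplot (sketch)")
```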

The s20x package is also quite good for generating detailed descriptive statistics, both overall and by the breakout groups of factor-type object variables.

R Input

install.packages("s20x", dependencies=TRUE)

library(s20x) # Load the s20x package.

help(package=s20x) # Show the information page.

sessionInfo() # Confirm all attached packages.

Figure 2.7: Weight by section and gender breakouts

R Input

s20x::summaryStats(CPIIISecLbsGen.df$Lbs, na.rm=TRUE) # Accommodate missing values.

R Output

Minimum value: 99
Maximum value: 192
Mean value: 131.73
Median: 127
Upper quartile: 139.25
Lower quartile: 121
Variance: 309.39
Standard deviation: 17.59
Midspread (IQR): 18.25
Skewness: 1.08
Number of data values: 61
Number of missing values: 1
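Each line of the s20x::summaryStats() output can be recomputed with base-R functions. The sketch below does so for a small hypothetical vector with one missing value. Two assumptions deserve caution: quantile() uses R's default (type 7) definition, and the skewness shown is the simple moment-based formula. s20x may use different definitions, so small discrepancies with its output are possible.

```r
# Hypothetical weights in pounds with one missing value
lbs <- c(99, 121, 127, 131, 139, 192, NA)
x   <- lbs[!is.na(lbs)]                           # analyze non-missing values

q    <- quantile(x, c(0.25, 0.75), names = FALSE) # type 7 (R default) quartiles
dev  <- x - mean(x)
skew <- mean(dev^3) / mean(dev^2)^1.5             # moment-based skewness (assumption)

stats <- c(Minimum = min(x), Maximum = max(x),
           Mean = mean(x), Median = median(x),
           UpperQuartile = q[2], LowerQuartile = q[1],
           Variance = var(x), SD = sd(x),
           Midspread = IQR(x), Skewness = skew,
           TotalValues = length(lbs), Missing = sum(is.na(lbs)))
print(round(stats, 2))
```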

R Input

s20x::summaryStats(Lbs ~ Gender, CPIIISecLbsGen.df, na.rm=TRUE) # Accommodate missing values

R Output

       Sample Size No. Miss.    Mean Median  Std Dev Midspread
Female          35         0 123.971    124  8.27997         9
Male            25         1 142.600    142 21.27401        32

R Input

s20x::summaryStats(Lbs ~ Section, CPIIISecLbsGen.df, na.rm=TRUE) # Accommodate missing values

R Output

   Sample Size No. Miss.    Mean Median Std Dev Midspread
AM          30         1 128.300    126 13.1520     16.25
PM          30         0 135.167    128 20.7864     21.25
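The formula interface (Lbs ~ Gender, Lbs ~ Section) produces one row per breakout group. A comparable table can be sketched in base R by splitting the variable by group and applying a helper function to each piece. The helper group_stats() and the data are hypothetical, and in this sketch n counts only non-missing values.

```r
# Hypothetical weights and genders (one weight is missing)
lbs    <- c(118, 125, NA, 142, 160, 121, 133)
gender <- factor(c("Female", "Female", "Male", "Male", "Male", "Female", "Male"))

# Hypothetical helper: statistics for one breakout group
group_stats <- function(v) {
  ok <- v[!is.na(v)]
  c(n = length(ok), missing = sum(is.na(v)),
    mean = mean(ok), median = median(ok),
    sd = sd(ok), midspread = IQR(ok))
}

# One row per group, similar in spirit to summaryStats(Lbs ~ Gender, ...)
breakout <- t(sapply(split(lbs, gender), group_stats))
print(round(breakout, 3))
```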

The s20x::rowdistr() function, when wrapped around the crosstabs() function, produces an attractive frequency distribution barchart of factor-type object variables. It also prints text-based descriptive statistics to the screen that add further insight into the data (Figs. 2.8 and 2.9).

R Input

par(ask=TRUE); s20x::rowdistr(crosstabs(~ Gender + Lbs, data=CPIIISecLbsGen.df), plot=TRUE, suppressText=FALSE, comp='basic')

Figure 2.8: Graphical representation of weight by gender

R Input

par(ask=TRUE); s20x::rowdistr(crosstabs(~ Section + Lbs, data=CPIIISecLbsGen.df), plot=TRUE, suppressText=FALSE, comp='basic')

Figure 2.9: Graphical representation of weight by section

The arsenal::tableby() function, with the syntax shown below (the output is not shown because of its length), builds on this presentation of breakout descriptive statistics. With this function, all of the breakouts for two or more factor-type object variables can be presented in one convenient table. A p-value is included in the printout for both the Gender breakouts and the Section breakouts. For now, ignore the p-values; they will be explained in far more detail in later lessons. The focus of this introductory lesson is on descriptive statistics and measures of central tendency, not inferential analyses.

R Input

install.packages("arsenal", dependencies=TRUE)

library(arsenal) # Load the arsenal package.

help(package=arsenal) # Show the information page.

sessionInfo() # Confirm all attached packages.

R Input

summary(arsenal::tableby(list(Section, Gender) ~ Lbs, data = CPIIISecLbsGen.df), text=TRUE, total=TRUE)

# Produce a highly detailed table of descriptive

# statistics, especially: mean, sd, and range.
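The convenience of tableby() is that breakouts for several factor-type variables appear in one table. A rough base-R approximation, assuming a small hypothetical data frame, loops over the grouping variables and stacks the results with rbind(). It reports mean, sd, minimum, and maximum only and omits tableby()'s formatting and p-values.

```r
# Hypothetical data frame standing in for CPIIISecLbsGen.df
df <- data.frame(
  Lbs     = c(118, 125, 140, 142, 160, 121),
  Section = c("AM", "PM", "AM", "PM", "PM", "AM"),
  Gender  = c("Female", "Female", "Male", "Male", "Male", "Female")
)

describe <- function(v) {
  c(mean = mean(v, na.rm = TRUE), sd = sd(v, na.rm = TRUE),
    min = min(v, na.rm = TRUE), max = max(v, na.rm = TRUE))
}

# Breakouts for each grouping variable, stacked into a single table
pieces <- lapply(c("Section", "Gender"), function(g) {
  s <- t(sapply(split(df$Lbs, df[[g]]), describe))
  data.frame(Variable = g, Level = rownames(s), s, row.names = NULL)
})
combined <- do.call(rbind, pieces)
print(combined, digits = 4)
```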

The pivottabler::qpvt() function prints output in text format, as shown below. To produce HTML output instead, use the same syntax with the pivottabler::qhpvt() function. For those with advanced skills, the pivottabler::qlpvt() function generates output suitable for a LaTeX document.

R Input

install.packages("pivottabler", dependencies=TRUE)

library(pivottabler) # Load the pivottabler package.

help(package=pivottabler) # Show the information page.

sessionInfo() # Confirm all attached packages.

R Input

pivottabler::qpvt(CPIIISecLbsGen.df, "Gender", "Section", c("Mean Lbs"="mean(Lbs, na.rm=TRUE)",

"SD Lbs"="sd(Lbs, na.rm=TRUE)"), formats=list("%.0f", "%.1f"))

# Row (Gender) by Column (Section) in text format

R Output

         AM                  PM                  Total
         Mean Lbs  SD Lbs    Mean Lbs  SD Lbs    Mean Lbs  SD Lbs
Female   122       9.5       125       7.3       124       8.3
Male     134       13.9      155       24.7      143       21.3
Total    128       13.2      135       20.8      132       17.6

R Input

pivottabler::qpvt(CPIIISecLbsGen.df, "Section", "Gender", c("Median Lbs"="median(Lbs, na.rm=TRUE)"),

formats=list("%.0f", "%.1f"))

# Row (Section) by Column (Gender) in text format

# Note how the header Median Lbs does not show in output.

R Output

       Female  Male  Total
AM     123     132   126
PM     124     163   128
Total  124     142   127
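The Total row and column in these pivot tables are computed from the raw values, not by averaging cell means; for example, the Female total of 124 matches the overall Female mean of 123.971 reported by s20x::summaryStats(). A base-R sketch of the same layout, using tapply() on a small hypothetical data frame, makes that choice explicit. It is an approximation of the idea, not how pivottabler computes its tables internally.

```r
# Hypothetical data frame standing in for CPIIISecLbsGen.df
df <- data.frame(
  Lbs     = c(118, 125, 140, 142, 160, 121, NA),
  Gender  = c("Female", "Female", "Male", "Male", "Male", "Female", "Male"),
  Section = c("AM", "PM", "AM", "PM", "PM", "AM", "AM")
)
mean_na <- function(v) mean(v, na.rm = TRUE)

cells     <- tapply(df$Lbs, list(df$Gender, df$Section), mean_na) # Gender x Section
row_total <- tapply(df$Lbs, df$Gender, mean_na)   # each gender, both sections
col_total <- tapply(df$Lbs, df$Section, mean_na)  # each section, both genders
grand     <- mean_na(df$Lbs)                      # all non-missing values

# Totals come from the raw values, not from averaging the cell means
pivot <- rbind(cbind(cells, Total = row_total),
               Total = c(col_total, grand))
print(round(pivot, 1))
```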

Again, there are more than 15,000 external packages available to the R community. Many more packages and associated functions for data exploration, descriptive statistics, and measures of central tendency are available and will be demonstrated in future lessons.

In: Thomas W. MacFarland and Jan M. Yates (pages 119-127)