There are more than 15,000 external R-based packages available for immediate download, without special permissions and at no direct cost. Among these are thousands of specialized functions that supplement the set of functions available when R is initially downloaded. A few specialized functions for descriptive statistics and measures of central tendency are demonstrated below.
Some specialized functions provide not only numerical statistics of immediate use but also a graphical image, which further reinforces the organization of the data in question. Function arguments are typically used to embellish graphical output, but in this lesson function arguments have been kept to a minimum. More of the detail available through function arguments is demonstrated in later lessons.
R Input
install.packages("asbio")
library(asbio) # Load the asbio package.
help(package=asbio) # Show the information page.
sessionInfo() # Confirm all attached packages.
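The same install-then-load pattern repeats for every package in this lesson. A small guard can skip packages that are already installed. The sketch below is a base-R assumption and is not part of any package shown here. The helper name missing_packages() is hypothetical. It calls install.packages() only in an interactive session, so running it as a script just reports what is missing.

```r
# Hypothetical helper: return the names of packages that cannot be loaded
missing_packages <- function(pkgs) {
  # requireNamespace() returns FALSE (quietly) when a package is not installed
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

lesson_pkgs <- c("asbio", "epiDisplay", "s20x", "arsenal", "pivottabler")
to_install <- missing_packages(lesson_pkgs)

if (length(to_install) > 0 && interactive()) {
  # Install only what is missing, and only when a person is at the console
  install.packages(to_install, dependencies = TRUE)
} else if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
}
```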
R Input
asbio::Mode(CPIIISecLbsGen.df$Lbs)
# The modes::modes() function was demonstrated previously.
# The asbio::Mode() function is merely another way to obtain
# this measure of central tendency.
R Output
[1] 114 122
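The output [1] 114 122 reports two modes because both values share the highest frequency. The base-R sketch below reproduces that idea: it tabulates every distinct value and keeps all values tied for the top count. It is not how asbio::Mode() is implemented. The helper find_modes() and the vector lbs are hypothetical.

```r
# Hypothetical helper: return every value tied for the highest frequency
find_modes <- function(x) {
  x <- x[!is.na(x)]                               # drop missing values
  counts <- table(x)                              # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]   # keep all ties
  as.numeric(modes)                               # table() names are character
}

# Hypothetical weights: 114 and 122 each appear twice
lbs <- c(114, 122, 114, 130, 122, 99, 140, NA)
find_modes(lbs)
```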
The epiDisplay package is quite versatile, with many functions that support frequency distributions, descriptive statistics, and measures of central tendency.
Some epiDisplay functions provide not only attractive graphics but also useful statistics in text format printed to the screen (Figs. 2.5, 2.6, and 2.7).
R Input
install.packages("epiDisplay")
library(epiDisplay) # Load the epiDisplay package.
help(package=epiDisplay) # Show the information page.
sessionInfo() # Confirm all attached packages.
R Input
par(ask=TRUE)
par(mfrow=c(1,2)) # 2 figures into a 1 row by 2 column grid
epiDisplay::tab1(CPIIISecLbsGen.df$Section,
  main="Frequency Distribution of Section",
  col=c("red", "blue"), font.lab=2, font.axis=2)
epiDisplay::tab1(CPIIISecLbsGen.df$Gender,
  main="Frequency Distribution of Gender",
  col=c("red", "blue"), font.lab=2, font.axis=2)
# Generate frequency distributions and percentages
# for each factor, along with a figure of each
# frequency distribution.
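The frequency-and-percentage idea behind tab1() can be sketched in base R with table() and prop.table(). The factor section below is hypothetical toy data. The two-column layout is an illustrative choice and does not reproduce tab1()'s exact output.

```r
# Hypothetical class-section data (not the CPIIISecLbsGen.df data)
section <- factor(c("AM", "PM", "AM", "AM", "PM", "AM", "AM", "PM", "AM", "PM"))

freq <- table(section)                     # counts per level
pct  <- round(100 * prop.table(freq), 1)   # percentages of the total
freq_table <- cbind(Frequency = freq, Percent = pct)
freq_table

# A simple bar chart of the counts, similar in spirit to tab1()'s figure
barplot(freq, col = c("red", "blue"),
        main = "Frequency Distribution of Section (sketch)")
```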
R Input
par(ask=TRUE)
par(mfrow=c(1,2)) # 2 figures into a 1 row by 2 column grid
epiDisplay::tabpct(CPIIISecLbsGen.df$Section, CPIIISecLbsGen.df$Gender,
  graph=TRUE, decimal=1, main="Frequency Distribution of Section by Gender",
  xlab="Section", ylab="Gender", cex.axis=1, percent="both", las=1,
  col=c("red", "blue"))
epiDisplay::tabpct(CPIIISecLbsGen.df$Gender, CPIIISecLbsGen.df$Section,
  graph=TRUE, decimal=1, main="Frequency Distribution of Gender by Section",
  xlab="Gender", ylab="Section", cex.axis=1, percent="both", las=1,
  col=c("red", "blue"))
# Along with the figure that appears as a stacked bar
# chart, look at the detailed frequency distribution
# statistics printed to the screen, which can be
# copied and pasted into a word-processed document.
Figure 2.5: Section and gender: frequency distribution overall
Figure 2.6: Section and gender: frequency distribution breakouts
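The row and column percentages that tabpct() prints with percent="both" can be sketched in base R. Use table() for the cross-tabulation and prop.table() with margin = 1 for row percentages or margin = 2 for column percentages. The data below are hypothetical. The sketch illustrates the arithmetic and is not tabpct()'s code.

```r
# Hypothetical Section and Gender data (8 students)
section <- factor(c("AM", "AM", "AM", "PM", "PM", "PM", "PM", "AM"))
gender  <- factor(c("Female", "Male", "Female", "Male",
                    "Female", "Male", "Male", "Female"))

counts <- table(Section = section, Gender = gender)
addmargins(counts)                                          # counts with totals

row_pct <- round(100 * prop.table(counts, margin = 1), 1)   # each row sums to 100
col_pct <- round(100 * prop.table(counts, margin = 2), 1)   # each column sums to 100
row_pct
col_pct

# Stacked bar chart: Gender counts within each Section
barplot(t(counts), col = c("red", "blue"), legend.text = TRUE,
        main = "Section by Gender (sketch)")
```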
R Input
par(ask=TRUE)
par(mfrow=c(1,3)) # 3 figures into a 1 row by 3 column grid
epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=NULL, # No breakout statistics
  graph=TRUE, box=TRUE, # Dotplot and boxplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs), Overall",
  cex.X.axis=1.25, # Note X axis label size.
  cex.Y.axis=1.25, # Note Y axis label size.
  font.lab=2, dot.col="auto")
epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=CPIIISecLbsGen.df$Section, # Breakout statistics
  graph=TRUE, # Dotplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs) by Section",
  cex.X.axis=1.25, # Note X axis label size.
  cex.Y.axis=1.25, # Note Y axis label size.
  font.lab=2, dot.col="auto")
epiDisplay::summ(CPIIISecLbsGen.df$Lbs,
  by=CPIIISecLbsGen.df$Gender, # Breakout statistics
  graph=TRUE, # Dotplot
  pch=20, ylab="auto",
  main="Sorted Dotplot of Weight (Lbs) by Gender",
  cex.X.axis=1.25, # Note X axis label size.
  cex.Y.axis=1.25, # Note Y axis label size.
  font.lab=2, dot.col="auto")
# Produce a sorted dotplot and accompanying descriptive
# statistics: by=NULL (i.e., overall), by=Section, and
# by=Gender. Do not confuse a sorted dotplot with a QQ plot.
# The two figures represent different constructs.
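The closing comment distinguishes a sorted dotplot from a QQ plot. The base-R sketch below makes the difference concrete using hypothetical weights. The dotplot plots each weight against its rank in sorted order. qqnorm() instead pairs the same weights with theoretical normal quantiles. This only approximates the idea behind the figure from epiDisplay::summ() and is not that function's code.

```r
# Hypothetical weights (Lbs); not the CPIIISecLbsGen.df values
lbs <- c(127, 99, 192, 121, 139, 114, 122, 131)

sorted_lbs <- sort(lbs)
ranks <- seq_along(sorted_lbs)

par(mfrow = c(1, 2))  # two panels side by side
# Sorted dotplot: observed value (x) against its position in sorted order (y)
plot(sorted_lbs, ranks, pch = 20, xlab = "Weight (Lbs)",
     ylab = "Rank", main = "Sorted dotplot (sketch)")
# QQ plot: observed values (y) against expected normal quantiles (x)
qq <- qqnorm(lbs, pch = 20, main = "Normal QQ plot (sketch)")
qqline(lbs)
```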
The s20x package is also quite good for generating detailed descriptive statistics, both overall and by breakout groups of factor-type object variables.
R Input
install.packages("s20x", dependencies=TRUE)
library(s20x) # Load the s20x package.
help(package=s20x) # Show the information page.
sessionInfo() # Confirm all attached packages.
Figure 2.7: Weight by section and gender breakouts
R Input
s20x::summaryStats(CPIIISecLbsGen.df$Lbs, na.rm=TRUE) # Accommodate missing values.
R Output
Minimum value: 99
Maximum value: 192
Mean value: 131.73
Median: 127
Upper quartile: 139.25
Lower quartile: 121
Variance: 309.39
Standard deviation: 17.59
Midspread (IQR): 18.25
Skewness: 1.08
Number of data values: 61
Number of missing values: 1
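Each printed statistic can be reproduced in base R, which is a useful check on what summaryStats() reports. For example, the midspread of 18.25 is the upper quartile minus the lower quartile (139.25 minus 121), and the variance is the square of the standard deviation. The sketch below uses a hypothetical helper, describe_numeric(), applied to a toy vector. It rests on two assumptions. First, quartiles use R's default quantile() method (type 7). Second, skewness uses one common moment-based formula. s20x may follow different conventions, so its values need not match exactly. Here n counts only non-missing values.

```r
# Hypothetical helper: base-R versions of the statistics printed above
describe_numeric <- function(x) {
  n_missing <- sum(is.na(x))   # count missing values before dropping them
  x <- x[!is.na(x)]
  m <- mean(x)
  dev <- x - m
  # Assumption: moment-based skewness; s20x may use a different formula
  skew <- mean(dev^3) / mean(dev^2)^1.5
  c(min = min(x), max = max(x), mean = m, median = median(x),
    upper_quartile = unname(quantile(x, 0.75)),   # default type 7
    lower_quartile = unname(quantile(x, 0.25)),
    variance = var(x), sd = sd(x),
    midspread = IQR(x),                           # upper minus lower quartile
    skewness = skew,
    n = length(x),                                # non-missing values only
    n_missing = n_missing)
}

toy_stats <- describe_numeric(c(1, 2, 3, 4, 10, NA))
round(toy_stats, 2)
```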
R Input
s20x::summaryStats(Lbs ~ Gender, CPIIISecLbsGen.df, na.rm=TRUE) # Accommodate missing values
R Output
       Sample Size No. Miss.    Mean Median  Std Dev Midspread
Female          35         0 123.971    124  8.27997         9
Male            25         1 142.600    142 21.27401        32
R Input
s20x::summaryStats(Lbs ~ Section, CPIIISecLbsGen.df, na.rm=TRUE) # Accommodate missing values
R Output
       Sample Size No. Miss.    Mean Median  Std Dev Midspread
AM              30         1 128.300    126  13.1520     16.25
PM              30         0 135.167    128  20.7864     21.25
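The formula interface used here, Lbs ~ Gender, has a base-R counterpart in aggregate(). The sketch below runs on a small hypothetical data frame and computes similar breakout columns. aggregate() drops rows with missing values by default, so the per-group missing-value count is computed separately with tapply(). The column names are illustrative choices and are not taken from s20x.

```r
# Hypothetical data frame; one Male weight is missing
weights <- data.frame(
  Lbs    = c(120, 135, 118, 150, 128, 142, 125, NA),
  Gender = factor(c("Female", "Male", "Female", "Male",
                    "Female", "Male", "Female", "Male"))
)

# Missing values per group (aggregate() below drops NA rows by default)
n_miss <- tapply(is.na(weights$Lbs), weights$Gender, sum)

by_gender <- aggregate(Lbs ~ Gender, data = weights,
                       FUN = function(v) c(n = length(v), mean = mean(v),
                                           median = median(v), sd = sd(v),
                                           midspread = IQR(v)))
by_gender
n_miss
```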
The s20x::rowdistr() function, when applied to the output of the crosstabs() function, produces an attractive frequency distribution barchart of factor-type object variables. It also prints text-based descriptive statistics to the screen that add further insight into the data (Figs. 2.8 and 2.9).
R Input
par(ask=TRUE); s20x::rowdistr(crosstabs(~ Gender + Lbs, data=CPIIISecLbsGen.df), plot=TRUE, suppressText=FALSE, comp='basic')
Figure 2.8: Graphical representation of weight by gender
R Input
par(ask=TRUE); s20x::rowdistr(crosstabs(~ Section + Lbs, data=CPIIISecLbsGen.df), plot=TRUE, suppressText=FALSE, comp='basic')
Figure 2.9: Graphical representation of weight by section
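In these calls Lbs is numeric, so each distinct weight becomes its own column of the cross-tabulation. The idea of a row distribution can be sketched in base R. First tabulate Gender against weight, then convert each row to proportions so that every row sums to 1. The sketch uses hypothetical data and approximates the concept only. It is not the implementation of rowdistr().

```r
# Hypothetical data: weight treated as a category, as in crosstabs(~ Gender + Lbs)
gender <- c("Female", "Female", "Female", "Male", "Male", "Male", "Male")
lbs    <- c(120, 124, 124, 124, 140, 140, 150)

tab <- table(Gender = gender, Lbs = lbs)   # one column per distinct weight
row_dist <- prop.table(tab, margin = 1)    # proportions within each Gender
round(row_dist, 3)

# Each row of a row distribution sums to 1
rowSums(row_dist)

barplot(row_dist, beside = TRUE, col = c("red", "blue"), legend.text = TRUE,
        xlab = "Weight (Lbs)", ylab = "Proportion within Gender")
```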
The arsenal::tableby() function, with the syntax shown below (the output is not shown because of its length), builds on the presentation of breakout descriptive statistics. With this function, all of the breakouts for two or more factor-type object variables can be presented in one convenient table. A p-value is included in the printout for both the Gender breakouts and the Section breakouts. For now, ignore the p-values; they are explained in far more detail in later lessons. The focus of this beginning lesson is descriptive statistics and measures of central tendency, not inferential analysis.
R Input
install.packages("arsenal", dependencies=TRUE)
library(arsenal) # Load the arsenal package.
help(package=arsenal) # Show the information page.
sessionInfo() # Confirm all attached packages.
R Input
summary(arsenal::tableby(list(Section, Gender) ~ Lbs, data = CPIIISecLbsGen.df), text=TRUE, total=TRUE)
# Produce a highly-detailed table of descriptive
# statistics, especially: mean, sd, and range.
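What makes tableby() useful is that breakouts for several grouping variables appear together in one table. As a rough base-R analogue for the descriptive part only (no p-values), the sketch below loops over a vector of grouping-variable names and stacks the per-level summaries with rbind(). The data frame and the helper breakout() are hypothetical.

```r
# Hypothetical data frame with the same column names as the lesson's data
df <- data.frame(
  Lbs     = c(120, 135, 118, 150, 128, 142, 125, 160),
  Section = c("AM", "AM", "PM", "PM", "AM", "PM", "AM", "PM"),
  Gender  = c("Female", "Male", "Female", "Male", "Female", "Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: summary rows of Lbs for each level of one grouping variable
breakout <- function(data, var) {
  groups <- split(data$Lbs, data[[var]])
  data.frame(
    Variable = var,
    Level    = names(groups),
    N        = vapply(groups, function(v) sum(!is.na(v)), integer(1)),
    Mean     = vapply(groups, mean, numeric(1), na.rm = TRUE),
    SD       = vapply(groups, sd, numeric(1), na.rm = TRUE),
    Min      = vapply(groups, min, numeric(1), na.rm = TRUE),
    Max      = vapply(groups, max, numeric(1), na.rm = TRUE),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Stack the Section and Gender breakouts into one table
combined <- do.call(rbind, lapply(c("Section", "Gender"), breakout, data = df))
combined
```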
The pivottabler::qpvt() function puts output into text format, as shown below. If desired, copy the syntax shown below but use the pivottabler::qhpvt() function to put output into HTML format. For those with advanced skills, the pivottabler::qlpvt() function generates output suitable for a LaTeX document.
R Input
install.packages("pivottabler", dependencies=TRUE)
library(pivottabler) # Load the pivottabler package.
help(package=pivottabler) # Show the information page.
sessionInfo() # Confirm all attached packages.
R Input
pivottabler::qpvt(CPIIISecLbsGen.df, "Gender", "Section", c("Mean Lbs"="mean(Lbs, na.rm=TRUE)",
"SD Lbs"="sd(Lbs, na.rm=TRUE)"), formats=list("%.0f", "%.1f"))
# Row (Gender) by Column (Section) in text format
R Output
       AM               PM               Total
       Mean Lbs SD Lbs  Mean Lbs SD Lbs  Mean Lbs SD Lbs
Female      122    9.5       125    7.3       124    8.3
Male        134   13.9       155   24.7       143   21.3
Total       128   13.2       135   20.8       132   17.6
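The Gender-by-Section layout with Total margins can be sketched in base R with tapply(), which accepts a list of two grouping variables. The sketch below uses a hypothetical data frame. Each margin is computed from all non-missing values in its row, column, or the whole data set, which matches the logic of the Total row and column above. It is an illustration and does not reproduce pivottabler's implementation.

```r
# Hypothetical data; one Male PM weight is missing
df <- data.frame(
  Lbs     = c(120, 125, 130, 138, 118, 127, 150, 160, NA),
  Gender  = c("Female", "Female", "Male", "Male", "Female", "Female", "Male", "Male", "Male"),
  Section = c("AM", "AM", "AM", "AM", "PM", "PM", "PM", "PM", "PM")
)

# Cell means: rows = Gender, columns = Section
cell_means <- tapply(df$Lbs, list(df$Gender, df$Section), mean, na.rm = TRUE)

# Margins use every non-missing value in the row, the column, or overall
row_totals <- tapply(df$Lbs, df$Gender, mean, na.rm = TRUE)
col_totals <- tapply(df$Lbs, df$Section, mean, na.rm = TRUE)
grand_mean <- mean(df$Lbs, na.rm = TRUE)

with_totals <- rbind(cbind(cell_means, Total = row_totals),
                     Total = c(col_totals, grand_mean))
round(with_totals, 1)
```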
R Input
pivottabler::qpvt(CPIIISecLbsGen.df, "Section", "Gender", c("Median Lbs"="median(Lbs, na.rm=TRUE)"),
formats=list("%.0f", "%.1f"))
# Row (Section) by Column (Gender) in text format
# Note how the header Median Lbs does not show in output.
R Output
      Female  Male  Total
AM       123   132    126
PM       124   163    128
Total    124   142    127
Again, there are more than 15,000 external packages available to the R community. Many more packages and associated functions relating to data exploration, descriptive statistics, and measures of central tendency are available and will be demonstrated in future lessons.