ylab="Count")
hist(BirthWeightGrams,
  breaks=50,                  # Granularity of output
  main="Distribution of Infant Birth Weight (g)",
  col="red",                  # Color(s)
  xlab="Infant Birth Weight (g)", ylab="Count",
  font.axis=2,                # Make the axis bold
  font.lab=2)                 # Make the labels bold
hist(BirthWeightGrams,
  breaks=50,                  # Granularity of output
  main="Distribution of Infant Birth Weight (g)",
  col="red",                  # Color(s)
  xlab="Infant Birth Weight (g)", ylab="Count",
  font.axis=2,                # Make the axis bold
  font.lab=2,                 # Make the labels bold
  cex.main=1.75,              # Size - main (title)
  cex.lab=1.25,               # Size - labels
  xlim=c(0,6000),             # Adjust X axis limits
  ylim=c(0,1000))             # Adjust Y axis limits
Figure 1.10: Multiple histograms of birth weight
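The effect of the breaks argument can be checked without drawing a figure. The following is a minimal sketch for illustration only: it uses simulated values in a hypothetical vector called SimWeight (not BirthWeightGrams), and it sets plot=FALSE so that hist() returns the bin counts instead of a graphic.

```r
set.seed(42)                                  # Reproducible simulated data
SimWeight <- rnorm(1000, mean=3300, sd=500)   # Hypothetical stand-in values

HistCoarse <- hist(SimWeight, breaks=10, plot=FALSE)
HistFine   <- hist(SimWeight, breaks=50, plot=FALSE)

length(HistCoarse$counts)   # Number of bins when breaks=10
length(HistFine$counts)     # Number of bins when breaks=50
sum(HistFine$counts)        # Every observation falls into one bin
```

When breaks is a single number, R treats it as a suggestion and rounds to convenient cut points, so the number of bins is approximate rather than exact.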
Going beyond the immediate purpose of this lesson, look below at the way R, and more specifically the R packages ggplot2, ggthemes, ggmosaic, gridExtra, grid, and scales, are used to produce figures of the highest quality. As mentioned at the beginning of this lesson, for now merely give attention to process. The actual R-based syntax will be highlighted in far more detail in later lessons.
There are more than a few datasets currently available in this active R session, as can be demonstrated by using either the ls() function or the objects() function, with the datasets ranging in alphabetical order from BreedMilk.df to the temporary object called X.
R Input
ls()
objects()
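As a small illustration of how these listings can be narrowed, the sketch below creates two hypothetical objects (Demo.df and X stand in for whatever is in the session) and uses the pattern argument of ls() to show only names that end in .df, the naming convention used for dataframes in this lesson.

```r
Demo.df <- data.frame(ID=1:3)   # Hypothetical dataframe, illustration only
X <- c(1, 2, 3)                 # Hypothetical temporary object

identical(ls(), objects())      # Both functions return the same listing
ls(pattern="\\.df$")            # Only names that end in .df
```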
In this addendum, the package ggplot2 and supporting packages are used to produce a variety of figures using the dataframes currently available. Recall that, apart from grid (which ships with R), these packages are external to what is available when R is first downloaded, and they must be downloaded and installed before their specialized functionality can be used.
R Input
install.packages("ggplot2", dependencies=TRUE)
library(ggplot2) # Load the ggplot2 package.
help(package=ggplot2) # Show the information page.
sessionInfo() # Confirm all attached packages.
install.packages("ggthemes", dependencies=TRUE)
library(ggthemes) # Load the ggthemes package.
help(package=ggthemes) # Show the information page.
sessionInfo() # Confirm all attached packages.
install.packages("ggmosaic", dependencies=TRUE)
library(ggmosaic) # Load the ggmosaic package.
help(package=ggmosaic) # Show the information page.
sessionInfo() # Confirm all attached packages.
install.packages("gridExtra", dependencies=TRUE)
library(gridExtra) # Load the gridExtra package.
help(package=gridExtra) # Show the information page.
sessionInfo() # Confirm all attached packages.
# grid is part of the base R distribution; install.packages() is not needed.
library(grid) # Load the grid package.
help(package=grid) # Show the information page.
sessionInfo() # Confirm all attached packages.
install.packages("scales", dependencies=TRUE)
library(scales) # Load the scales package.
help(package=scales) # Show the information page.
sessionInfo() # Confirm all attached packages.
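Reinstalling every package in each session is rarely necessary. One possible approach, sketched below, is to check which packages are already installed and report only those that are missing. The object names NeededPkgs, IsInstalled, and MissingPkgs are hypothetical, and the install line is left commented out so that nothing is downloaded unless that is intended.

```r
# Packages used in this addendum (grid is omitted because it ships with R)
NeededPkgs <- c("ggplot2", "ggthemes", "ggmosaic", "gridExtra", "scales")

# TRUE for each package that is already installed
IsInstalled <- vapply(NeededPkgs, requireNamespace, logical(1), quietly=TRUE)

MissingPkgs <- NeededPkgs[!IsInstalled]
if (length(MissingPkgs) > 0) {
  message("Not yet installed: ", paste(MissingPkgs, collapse=", "))
  # install.packages(MissingPkgs, dependencies=TRUE)   # Uncomment to install
}
```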
Using these specialized packages and the R-based functions available from these packages, look at the way increasingly complex figures are prepared, each adding more detail than the preceding figure. To achieve the aim of Beautiful Graphics, use the Sorghum.df dataset and in progression focus on how sorghum yield (Bushels per Acre) is presented, addressing different factor-type object variables, different graphical displays (e.g., density plot, boxplot), and different ggthemes-based presentations (e.g., theme_few, theme_stata). Once again, the syntax will be explained in detail in later lessons. Focus now on the many possibilities available with R (Fig. 1.11).
R Input
ggplotSorghum1 <-
  ggplot2::ggplot(Sorghum.df, aes(x=BUperAcre)) +
  geom_density(alpha = 0.5) +
  ggtitle("Sorghum Yield -- ggplot Simple") +
  xlab("Yield") +
  ylab("Density\n") +
  theme_few() +                    # Review the ggthemes package
  theme(legend.position="none")    # No legend
# \n is used to force a line break

ggplotSorghum2 <-
  ggplot2::ggplot(Sorghum.df,
    aes(x=Management, y=BUperAcre, fill=Management)) +
  geom_boxplot() +
  stat_summary(fun=mean, geom="point", shape=1, size=6, col="black") +
  # Add a circle to represent the mean, along with the median
  # which shows as the solid line
  # (use fun.y=mean with ggplot2 versions earlier than 3.3.0)
  facet_grid( ~ as.factor(Year)) +
  ggtitle("Sorghum Yield -- ggplot Complex") +
  xlab("Management") + ylab("Yield") +
  scale_y_continuous(labels=scales::comma, limits=c(0,104),
    breaks=scales::pretty_breaks(n = 3)) +
  theme_stata() +                  # Review the ggthemes package
  theme(legend.position="none")    # No legend

par(ask=TRUE)
gridExtra::grid.arrange(
  ggplotSorghum1,
  ggplotSorghum2, ncol=1)
Figure 1.11: ggplot2 demonstration 1—simple to complex
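The idea that each figure adds detail to the preceding one can be tried with data that ship with R. The sketch below is an illustration only: it uses the built-in mtcars dataset as a stand-in (not Sorghum.df), the object names PlotBase and PlotFull are hypothetical, and it assumes ggplot2 version 3.3.0 or later, where the fun argument of stat_summary() is available.

```r
library(ggplot2)

# Stand-in data: mtcars ships with R; Sorghum.df is not used here
PlotBase <- ggplot(mtcars, aes(x=factor(cyl), y=mpg, fill=factor(cyl))) +
  geom_boxplot()                         # Layer 1: side-by-side boxplots

PlotFull <- PlotBase +
  stat_summary(fun=mean, geom="point",
    shape=1, size=6, colour="black") +   # Layer 2: mean shown as a circle
  ggtitle("Illustration Only -- mtcars") +
  xlab("Cylinders") + ylab("Miles per Gallon") +
  theme(legend.position="none")          # No legend

print(PlotFull)                          # Draw the finished figure
```

Because a ggplot object is built by adding layers with +, the simple version (PlotBase) remains available and can be printed or extended separately.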
As a final demonstration of the ggplot2 package and how it is used to produce figures acceptable for professional publication, revisit an earlier section in this lesson and review Fuel Economy Data for 2000–2017 (http://www.fueleconomy.gov/feg/download.shtml) and the prior resource FuelMPG2008.df, which focused on fuel consumption of different cars in 2008, when new regulations about fuel economy were considered. For this last figure, look at the difference in fuel economy (e.g., MPG, Miles per Gallon) by the number of cylinders.
Using this figure only, is it reasonable to say that fuel economy decreases as the number of cylinders increases? In a later lesson on correlation and association, this type of question will be explored more closely, where different statistical tests will be used to provide a more definitive answer. Statistical tests will also be used to compare, within this dataset, fuel economy for city driving versus highway driving (e.g., City.MPG versus Hwy.MPG). For now, however, merely focus on the graphical presentation gained by using the ggplot2::ggplot() function (Fig. 1.12).
R Input
par(ask=TRUE)
ggplot2::ggplot(FuelMPG2008.df, aes(x=Cyl, y=Hwy.MPG, fill=Cyl)) +
  geom_boxplot() +
  stat_summary(fun=mean, geom="point", shape=1, size=6, col="black") +
  # Add a circle to represent the mean, along with the median
  # which shows as the solid line
  # (use fun.y=mean with ggplot2 versions earlier than 3.3.0)
  ggtitle(
    "Highway Miles per Gallon of 2008 Vehicles by the Number of Cylinders: Side-by-Side Box Plots With a Superimposed Mean as a Circle\n") +
  xlab("\nCylinders") +
  ylab("Highway Miles per Gallon (MPG)\n") +
  scale_y_continuous(labels=scales::comma, limits=c(20,46),
    breaks=scales::pretty_breaks(n = 5)) +
  theme_economist_white(base_size=12, base_family="sans",
    gray_bg=FALSE, horizontal=TRUE) +
  theme(legend.position="none")   # No legend
# \n is used to force a line break
Figure 1.12: ggplot2 demonstration 2—complex
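The circle (mean) and solid line (median) drawn in each box can also be read as numbers, which helps when judging whether fuel economy falls as the number of cylinders rises. The sketch below is an illustration only: it uses the built-in mtcars dataset as a stand-in (not FuelMPG2008.df), and the object names MeanMPG and MedianMPG are hypothetical.

```r
# Stand-in data: mtcars ships with R; FuelMPG2008.df is not used here
MeanMPG   <- tapply(mtcars$mpg, mtcars$cyl, mean)     # Mean MPG by cylinders
MedianMPG <- tapply(mtcars$mpg, mtcars$cyl, median)   # Median MPG by cylinders

round(cbind(Mean=MeanMPG, Median=MedianMPG), 1)       # Side-by-side summary
```

A table of this type describes the pattern seen in the figure, but it is not a statistical test of that pattern.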
Never underestimate the persuasive power of a well-prepared figure and how quickly graphics gain attention. A properly designed figure will receive far more attention than what many see as sterile numeric-based statistical output. Statistical tests are certainly important and are emphasized throughout this text, but do not overlook the value of quality graphics.