Conduct a Visual Data Check Using Graphics (e.g., Figures)

From the document by Thomas W. MacFarland and Jan M. Yates (pages 96-101)

Numeric descriptive statistics and measures of central tendency are desirable and are often our first thought, but a full understanding of the data also requires graphics that show how the data are organized. Graphics provide an essential complement to the numeric summaries. Other graphics will be demonstrated in later lessons, but for initial purposes the graphical functions of primary interest are hist(), plot() and plot(density()), boxplot(), stripchart(), dotchart(), and qqnorm(). Many arguments are available to embellish these figures, but for now the figures will be prepared in a simple format.

The par(ask=TRUE) function and argument are used to pause the presentation on the screen, one figure at a time. Note how the top line of the figure, under File > Save as, provides a variety of graphical formats for saving each figure, presented in the following order: Metafile, Postscript, PDF, PNG, BMP, TIFF, and JPEG.⁴
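The menu route has a syntax counterpart. As a minimal sketch, and not necessarily the approach the later lessons take, the lines below open a PDF graphics device, draw a histogram, and close the device. The object names lbs and out_file and the simulated weights are assumptions made for illustration only; they are not part of the lesson's dataset.

```r
# Simulated weights stand in for the lesson's data (an assumption).
set.seed(123)
lbs <- rnorm(50, mean = 150, sd = 20)

# Hypothetical output file, placed in the session's temporary directory.
out_file <- file.path(tempdir(), "weight_histogram.pdf")

pdf(file = out_file, width = 7, height = 5)  # Open a PDF graphics device.
hist(lbs, main = "Weight (Lbs): Histogram", xlab = "Weight (Lbs)")
dev.off()                                    # Close the device; the file is
                                             # complete only after this call.

file.exists(out_file)                        # Confirm the file was written.
```

The same open, draw, close pattern applies to the other device functions in base R, such as png() and jpeg(), where the R installation supports them.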

To save space and to provide a convenient side-by-side view of common figures, look at the way multiple figures are put into one common figure using the mfrow argument of the par() function, as in par(mfrow=c(2,3)). This technique is especially useful and should be considered not only for exploratory figures but also for final output, when appropriate.

R Input

par(ask=TRUE)
par(mfrow=c(2,3)) # 6 figures into a 2 row by 3 column grid
hist(CPIIISecLbsGen.df$Lbs, main="Weight (Lbs): Histogram")
plot(CPIIISecLbsGen.df$Lbs, main="Weight (Lbs): Plot")
plot(density(CPIIISecLbsGen.df$Lbs, na.rm=TRUE), # na.rm=TRUE removes
  main="Weight (Lbs): Density Plot")              # missing data
boxplot(CPIIISecLbsGen.df$Lbs,
  main="Weight (Lbs): Box Plot")
stripchart(CPIIISecLbsGen.df$Lbs,
  main="Weight (Lbs): Stripchart")                # Stripchart
qqnorm(CPIIISecLbsGen.df$Lbs, main="Weight (Lbs): Q-Q Plot")
# Common figures for a numeric-type object variable

⁴ It is also possible to copy and paste each graphical image or to save a graphical image by using R syntax. These actions are shown in later lessons.

Figure 2.1: Multiple visualizations of weight
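For readers who do not have the CPIIISecLbsGen.df dataframe at hand, the same six-panel layout can be tried on simulated values. This is only a sketch: the vector lbs, its two missing values, and the object old_par are assumptions introduced here. It also shows one habit worth considering, which is to save the previous par() settings and restore them afterward so that later figures are not forced into the 2 by 3 grid.

```r
# Simulated weights with two missing values (an assumption for illustration).
set.seed(123)
lbs <- c(rnorm(48, mean = 150, sd = 20), NA, NA)

old_par <- par(mfrow = c(2, 3))  # par() returns the settings it replaces.

hist(lbs, main = "Weight (Lbs): Histogram")
plot(lbs, main = "Weight (Lbs): Plot")
plot(density(lbs, na.rm = TRUE),           # density() stops with an error
  main = "Weight (Lbs): Density Plot")     # on NA unless na.rm=TRUE is set.
boxplot(lbs, main = "Weight (Lbs): Box Plot")
stripchart(lbs, main = "Weight (Lbs): Stripchart")
qqnorm(lbs, main = "Weight (Lbs): Q-Q Plot")

par(old_par)                     # Restore the single-panel layout.
```

Of the six functions, only density() needs na.rm=TRUE here; the others drop or skip missing values on their own.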

R Input

par(ask=TRUE)
par(mfrow=c(1,2)) # 2 figures into a 1 row by 2 column grid
barplot(table(CPIIISecLbsGen.df$Section),
  main="Section: Barplot Frequency Distribution",
  col=c("black", "red"), ylim=c(0,40)) # Alter color - Y scale
barplot(table(CPIIISecLbsGen.df$Gender),
  main="Gender: Barplot Frequency Distribution",
  col=c("black", "red"), ylim=c(0,40)) # Alter color - Y scale
# Common figures for a factor-type object variable
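The barplot() calls above depend on table() to count how often each factor level occurs, and the fixed ylim=c(0,40) works because the counts in this dataset fit under 40. As a hedged sketch using a simulated factor (the names gender, counts, and y_top are assumptions, not objects from the lesson), the upper Y limit can instead be derived from the counts themselves.

```r
# Simulated two-level factor standing in for Section or Gender (an assumption).
set.seed(123)
gender <- factor(sample(c("Female", "Male"), size = 60, replace = TRUE))

counts <- table(gender)   # Frequency of each factor level.
print(counts)

# Round the tallest bar up to the next multiple of 10 for the Y scale.
y_top <- ceiling(max(counts) / 10) * 10

barplot(counts,
  main = "Gender: Barplot Frequency Distribution",
  col = c("black", "red"), ylim = c(0, y_top))
```

Deriving the limit this way keeps both bars fully visible if the data change, whereas a fixed limit is useful when two side-by-side barplots need a common scale.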

As an early introduction to the ggplot2::ggplot() function and how it is used with facets to produce breakout details in graphical format, look at the density of the numeric object variable Lbs for each of the two factor-type object variables (i.e., Section and Gender). Remember that use of the ggplot2::ggplot() function requires access to multiple external packages. Typically, select the closest mirror site from which to download the external packages and be patient, waiting for all packages to download before attempting their use (Figs. 2.1 and 2.2).

Figure 2.2: Bar plots of section and gender

R Input

install.packages("ggplot2", dependencies=TRUE)
library(ggplot2)           # Load the ggplot2 package.
help(package=ggplot2)      # Show the information page.
sessionInfo()              # Confirm all attached packages.

install.packages("ggthemes", dependencies=TRUE)
library(ggthemes)          # Load the ggthemes package.
help(package=ggthemes)     # Show the information page.
sessionInfo()              # Confirm all attached packages.

install.packages("ggmosaic", dependencies=TRUE)
library(ggmosaic)          # Load the ggmosaic package.
help(package=ggmosaic)     # Show the information page.
sessionInfo()              # Confirm all attached packages.

install.packages("gridExtra", dependencies=TRUE)
library(gridExtra)         # Load the gridExtra package.
help(package=gridExtra)    # Show the information page.
sessionInfo()              # Confirm all attached packages.

# The grid package is distributed with R itself, so it does
# not need to be installed; the call below only produces a
# warning and is therefore commented out.
# install.packages("grid", dependencies=TRUE)
library(grid)              # Load the grid package.
help(package=grid)         # Show the information page.
sessionInfo()              # Confirm all attached packages.

install.packages("scales", dependencies=TRUE)
library(scales)            # Load the scales package.
help(package=scales)       # Show the information page.
sessionInfo()              # Confirm all attached packages.
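Downloading every package each time the syntax is run can be slow. One possible alternative, offered as a sketch and not as the authors' procedure, is to check which packages are already installed and to download only those that are missing. The object names pkgs, installed, and missing_pkgs are assumptions; grid is left out of the list because it is distributed with R. The install.packages() line is commented out so that nothing is downloaded until the reader chooses to do so.

```r
# Packages used in this lesson that come from an external repository.
pkgs <- c("ggplot2", "ggthemes", "ggmosaic", "gridExtra", "scales")

# requireNamespace() returns TRUE if a package is installed and
# FALSE otherwise; quietly=TRUE suppresses the usual messages.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs, dependencies=TRUE) # Uncomment to install.
} else {
  message("All listed packages are already installed.")
}
```

Each package still needs to be loaded with library(), or its functions called with the package::function() form, before use.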

R Input

# Section (two breakouts) by Lbs
DensityFacetSectionLbs <-
  ggplot2::ggplot(CPIIISecLbsGen.df, aes(x=Lbs)) +
  geom_density(col="red", lwd=2) +
  facet_grid(. ~ Section) +
  ggtitle("Section by Weight (Lbs): Facet - ggplot\n") +
  labs(x = "\nWeight (Lbs)", y = "Density\n") +
  scale_x_continuous(labels=scales::comma, limits=c(0,200),
    breaks=seq(0,200, by=25)) +
  theme_bw()
# Generate the figure DensityFacetSectionLbs, but it will
# not show until using the gridExtra::grid.arrange()
# function.

# Gender (two breakouts) by Lbs
DensityFacetGenderLbs <-
  ggplot2::ggplot(CPIIISecLbsGen.df, aes(x=Lbs)) +
  geom_density(col="red", lwd=2) +
  facet_grid(. ~ Gender) +
  ggtitle("Gender by Weight (Lbs): Facet - ggplot\n") +
  labs(x = "\nWeight (Lbs)", y = "Density\n") +
  scale_x_continuous(labels=scales::comma, limits=c(0,200),
    breaks=seq(0,200, by=25)) +
  theme_bw()

# Section (two breakouts) by Lbs
BoxplotSectionLbs <-
  ggplot(CPIIISecLbsGen.df,
    aes(x=Section, y=Lbs, fill=Section)) +
  geom_boxplot() +
  ggtitle("Section by Weight (Lbs): Boxplot\n") +
  labs(x = "\nSection", y = "Weight (Lbs)\n") +
  scale_y_continuous(labels=scales::comma, limits=c(100,200),
    breaks=seq(100,200, by=25)) +
  theme_bw()

# Gender (two breakouts) by Lbs
BoxplotGenderLbs <-
  ggplot(CPIIISecLbsGen.df,
    aes(x=Gender, y=Lbs, fill=Gender)) +
  geom_boxplot() +
  ggtitle("Gender by Weight (Lbs): Boxplot\n") +
  labs(x = "\nGender", y = "Weight (Lbs)\n") +
  scale_y_continuous(labels=scales::comma, limits=c(100,200),
    breaks=seq(100,200, by=25)) +
  theme_bw()
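The figures above do not appear on the screen because each one is assigned to an object and is never printed. As a small sketch of this behavior, using a simulated dataframe (demo.df and DemoDensity are hypothetical names, and the simulated values are assumptions), a stored ggplot object is drawn only when it is printed, whether by print(), by typing its name at the console, or by a function such as gridExtra::grid.arrange().

```r
library(ggplot2)

# Simulated dataframe standing in for CPIIISecLbsGen.df (an assumption).
set.seed(123)
demo.df <- data.frame(
  Section = factor(rep(c("A", "B"), each = 30)),
  Lbs     = rnorm(60, mean = 150, sd = 15))

# Assignment builds the figure as an object; nothing is drawn yet.
DemoDensity <-
  ggplot(demo.df, aes(x = Lbs)) +
  geom_density(col = "red") +
  facet_grid(. ~ Section) +
  theme_bw()

inherits(DemoDensity, "ggplot")  # The object is a stored ggplot figure.

print(DemoDensity)               # The figure is drawn at this point.
```

Keeping figures as objects is what makes it possible to combine several of them into one common figure later.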

Now that the figures (i.e., DensityFacetSectionLbs, DensityFacetGenderLbs, BoxplotSectionLbs, and BoxplotGenderLbs) have been generated as separate objects, put these four objects into one common figure, using the gridExtra::grid.arrange() function (Fig. 2.3).

R Input

gridExtra::grid.arrange(
  DensityFacetSectionLbs,
  DensityFacetGenderLbs,
  BoxplotSectionLbs,
  BoxplotGenderLbs, ncol=2)

Figure 2.3: Multiple visualizations of weight by section and gender

By using the gridExtra::grid.arrange() function to put selected ggplot-based objects into one common figure, it is fairly easy to compare graphical output side by side. This data exploration technique supports better decision-making on how to continue with analyses.
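The same side-by-side arrangement can be tried without the lesson's dataset. The sketch below uses a simulated dataframe and two small figures; the names demo.df, DemoDensityGender, DemoBoxplotGender, and DemoCombined, along with the simulated values, are assumptions for illustration. It relies on the fact that gridExtra::grid.arrange() both draws the combined figure and invisibly returns it as an object.

```r
library(ggplot2)
library(gridExtra)

# Simulated dataframe standing in for CPIIISecLbsGen.df (an assumption).
set.seed(123)
demo.df <- data.frame(
  Gender = factor(rep(c("Female", "Male"), each = 30)),
  Lbs    = c(rnorm(30, mean = 140, sd = 12),
             rnorm(30, mean = 165, sd = 15)))

DemoDensityGender <-
  ggplot(demo.df, aes(x = Lbs)) +
  geom_density(col = "red") +
  facet_grid(. ~ Gender) +
  theme_bw()

DemoBoxplotGender <-
  ggplot(demo.df, aes(x = Gender, y = Lbs, fill = Gender)) +
  geom_boxplot() +
  theme_bw()

# Draw both figures in one row and keep the combined object.
DemoCombined <- gridExtra::grid.arrange(
  DemoDensityGender, DemoBoxplotGender, ncol = 2)

class(DemoCombined)  # Inspect the class of the combined object.
```

Changing ncol=2 to nrow=2 stacks the two figures instead of placing them side by side.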

These initial graphics are simple and currently have only a few embellishments. They only serve as a first guide to general trends in data organization. Embellishments to the graphics will be introduced in later lessons by demonstrating the many arguments used to present titles, prepare text and lines in bold and color, etc.
