4.A Appendix: R Graphics

112 4 Graphs

Cartesian product differ for each graph type. For example, a set can be a collection of variables, functions of a single variable, levels of a single factor, functions of a fitted model, different models, etc.

When constructing a graph that can be envisioned as a Cartesian product, the code writer must be aware of the Cartesian product relationship. The lattice code for such a graph includes a command that explicitly states the Cartesian product.

4.A.2 Trellis Paradigm

Most of the graphs in this book have been constructed using the trellis paradigm as implemented in lattice. The trellis system of graphics is based on the paradigm of repeating the same graphical specifications for each element in a Cartesian product of levels of one or more factors.

The majority of the methods supplied in the R lattice package are based on a typical formula having the structure

y ~ x | a * b (4.4)

where

y is either continuous or factor
x is continuous
a is factor
b is factor

and each panel, as defined by the Cartesian product of the levels of a and b, is a plot of y ~ x for the subset of the data with the stated values of a and b.
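As a minimal sketch of formula (4.4), the following uses the built-in mtcars data as a stand-in (an assumption; it is not a dataset used in this section), with mpg as y, wt as x, and cyl and am converted to the factors a and b.

```r
library(lattice)

# Stand-in data (an assumption): mtcars, with cyl and am as the factors a and b.
dat <- transform(mtcars, a = factor(cyl), b = factor(am))

# One panel per element of the Cartesian product of the levels of a and b;
# each panel is a plot of mpg ~ wt for the matching subset of the data.
p <- xyplot(mpg ~ wt | a * b, data = dat)

# The panel array has one dimension per conditioning factor.
dim(p)

# print(p) draws the full array of panels on the current graphics device.
```

The conditioning part of the formula, a * b, is the command that states the Cartesian product explicitly.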

4.A.3 Implementation of Trellis Graphics

The concept of trellis plots can be implemented in any graphics system. In the S family of languages (S-Plus and R), selection of the set of panels, assignment of individual observations to one panel in the set, and coordinated scaling across all panels are automated in response to a formula specification at the user level.
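To make the three automated steps concrete, here is a rough base-R sketch of what would otherwise be done by hand. It illustrates the idea only and is not how lattice is implemented internally. The mtcars data and the names a, b, panels, and cells are assumptions chosen for illustration.

```r
# Stand-in data (an assumption): mtcars, with cyl and am as the factors a and b.
dat <- transform(mtcars, a = factor(cyl), b = factor(am))

# 1. Select the set of panels: one per element of the Cartesian product of levels.
panels <- expand.grid(a = levels(dat$a), b = levels(dat$b))

# 2. Assign each observation to exactly one panel.
cells <- split(dat, list(dat$a, dat$b))

# 3. Coordinate the scaling: one common range per axis, shared by all panels.
xlim <- range(dat$wt)
ylim <- range(dat$mpg)
```

A formula specification replaces all three steps with a single call.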

The term trellis comes from gardening, where it describes an open structure used as a support for vines. In graphics, a trellis provides a framework in which related graphs can be placed. The term lattice has a similar meaning.

4.A.4 Coordinating Sets of Related Graphs

There are several graphical issues that need attention in any multipanel graph. See Figure 10.8 for an example illustrating these issues.

positioning: The panels containing marginal displays (if any) need to be clearly delineated as distinct from the panels containing data from just a single set of levels of the factors. We do this by placing extra space between the set of panels for the individual factor values and the panels containing marginal displays.

scaling: All panels need to be on exactly the same scale to enhance the reader’s ability to compare the panels visually. We use the automatic scaling feature of trellis plots to scale simultaneously both the individual panels and the marginal panels.

labeling: We indicate the marginal panels by use of the strip labels.

shape of plotting characters: We used three distinct plotting characters for the three-level factor.

color of plotting characters: We used three contrasting colors for the three-level factor. The choice to use both distinct plotting characters and distinct colors is redundant (reemphasizing the difference between levels), accessible (making the graph work for people with color vision deficiencies), and defensive (protecting the interpretability of the graph from black-and-white copying by a reader).

There are several packages in R that address color selection. The RColorBrewer package (Neuwirth, 2011), based on the ColorBrewer website (Brewer, 2002), gives a discussion on the principles of color choice and gives a series of palettes for distinguishing nominal sets of items or sequences of items. The colorspace package (Ihaka et al., 2013) provides qualitative, sequential, and diverging color palettes based on HCL colors.
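The redundant use of plotting character and color can be sketched as follows. This is an illustration under stated assumptions, not the code behind Figure 10.8: the data are the built-in mtcars (with the three-level factor cyl standing in for the three-level factor of the text), and the palette comes from hcl.colors in base R (version 3.6.0 or later) rather than from RColorBrewer or colorspace, so that no extra package is needed.

```r
library(lattice)

# Three contrasting HCL-based colors from a qualitative palette (base R >= 3.6.0).
cols <- hcl.colors(3, palette = "Dark 3")

# Three distinct plotting characters: filled circle, triangle, square.
pchs <- c(16, 17, 15)

# Each level of the grouping factor gets its own color AND its own symbol,
# so the levels remain distinguishable without color.
p <- xyplot(mpg ~ wt, data = mtcars, groups = factor(cyl),
            par.settings = list(superpose.symbol = list(col = cols, pch = pchs)),
            auto.key = list(columns = 3))
```

Setting the colors and symbols through par.settings keeps the legend produced by auto.key consistent with the plotted points.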

4.A.5 Cartesian Product of Model Parameters

Figure 10.12 displays four different models of a response variable as a function of a factor and a continuous covariate. The model in the center row and right column is the same model shown in Figure 10.8. The models are shown as a Cartesian product of model parameters. The models in the columns of Figure 10.12 are distinguished by the absence or presence of a parameter for Type: the left column forces a common intercept, and the right column allows different intercepts by Type.

The three rows are distinguished by how the covariate Calories is handled: separate slopes by Type in the top row, constant slope for all Types in the middle row, or identically zero slope (horizontal line) in the bottom row.

Figure 10.12 is structured as a set of small multiples, a term introduced by Tufte (2001) to indicate repetition of the same graphical design structure. “Small multiples are economical: once viewers understand the design of one slice, they have immediate access to the data in all other slices. Thus, as the eye moves from one slice to the next, the constancy of the design allows the viewer to focus on changes in the data rather than on changes in graphical design” (Tufte (2001), page 48). Figure 10.12 may be interpreted as a four-way Cartesian product: intercept (α vs α_i), slope (β = 0, β, β_j), individual panels vs superpose, and hotdog Type (Beef, Meat, Poultry), with an ordinary two-way scatterplot with a fitted line inside each element of the four-way product.
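The crossing of intercept choices with slope choices can be sketched by generating the model formulas from a grid. This is an illustration only: the built-in mtcars data stand in for the hotdog data (an assumption), with cyl playing the role of Type, wt the role of Calories, and mpg the response. The full crossing yields six formulas, while the text describes four models in its figure, so not every row of this sketch need correspond to a panel there.

```r
# Stand-in data (an assumption): cyl as the factor Type, wt as the covariate.
dat <- transform(mtcars, Type = factor(cyl))

# The two parameter sets whose Cartesian product defines the candidate models.
grid <- expand.grid(
  intercept = c("1", "Type"),           # common intercept vs. intercepts by Type
  slope     = c("0", "wt", "wt:Type"),  # zero, constant, or separate slopes
  stringsAsFactors = FALSE
)

# One right-hand side per row; "0" means the model has no slope term at all.
grid$rhs <- ifelse(grid$slope == "0",
                   grid$intercept,
                   paste(grid$intercept, grid$slope, sep = " + "))

# Fit every model in the product and record its number of coefficients.
fits <- lapply(grid$rhs,
               function(r) lm(as.formula(paste("mpg ~", r)), data = dat))
grid$n_coef <- vapply(fits, function(f) length(coef(f)), integer(1))
```

Inspecting grid shows how each added parameter set enlarges the model, from a single common mean up to a separate line for each level of Type.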

4.A.6 Examples of Cartesian Products

1. In the plots illustrating lack of homogeneity of variance (Figure 6.6), one of the sets in the Cartesian product is the function of the data represented (observed data, median-centered data, absolute value of the median-centered data). The other set is the levels of the catalyst factor. We discuss in Section 6.10 the Brown–Forsythe test for variance homogeneity.

2. In the logistic regression plots (Figure 17.12) there are several sets used to define the Cartesian products. The rows of the array are functions of the fitted probability. The columns of the array are the levels of one of the factors (X-ray) with a marginal value of X-ray in the left-most column. The individual lines within the panels, as identified in the legend, are levels of the X.ray × stage × grade interaction. This is an ordinary xyplot of the predicted response variable, displayed on three scales (the logit scale, the odds scale, and the probability scale), against one of the predictor variables, acid.ph.

3. In the ladder-of-power plots (Figure 4.17) the rows of the array are powers of y and the columns are powers of x. This plot is useful in a regression context for determining the optimal power transformations of both the response and predictor variables.

4. Figure 4.21 shows the ability to control the position and color of boxplots. This simulated example shows the results of a clinical trial where the patients’ followup visits were scheduled with nonconstant intervals between visits. Here, the boxes for both treatment levels are grouped by week and the weeks are correctly spaced. The default positioning for bwplot places the boxes evenly spaced, honoring neither the week nor the treatment factor.

[Figure 4.21 appears here. Axis labels: Week (ticks at 1, 2, 4, 8) and Y (ticks from 10 to 60); legend: Treatment (A, B).]

Fig. 4.21 The response to treatments A and B was measured at weeks 1, 2, 4, and 8. The boxplots have been positioned at distances illustrating the time difference and with A and B adjacent at each time point.

5. Mosaic plots (Figure 15.11 and other figures in Chapter 15) are constructed as Cartesian products of several factors.

6. Diverging stacked bar charts as used in displays of Likert scale data (Figure 15.14 and others in Section 15.9) are a crossing of a set of questions (possibly nested in another factor) with a set of potential responses.
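The ladder-of-powers layout in item 3 can be sketched as a Cartesian product of powers of x and powers of y. This is not the function used to draw Figure 4.17. The helper power_tr, the choice of powers, the convention that power 0 means the logarithm, and the mtcars stand-in data are all assumptions made for illustration.

```r
library(lattice)

# Hypothetical helper: power transformation, with power 0 taken as the log.
power_tr <- function(v, p) if (p == 0) log(v) else v^p

# The two sets in the product: powers of x and powers of y.
powers <- c(-1, 0, 1)
grid <- expand.grid(px = powers, py = powers)

# Stack one transformed copy of the data per element of the product.
long <- do.call(rbind, lapply(seq_len(nrow(grid)), function(i) {
  data.frame(x  = power_tr(mtcars$wt,  grid$px[i]),
             y  = power_tr(mtcars$mpg, grid$py[i]),
             px = grid$px[i],
             py = grid$py[i])
}))

# Columns are powers of x, rows are powers of y. Free scales are needed
# because each transformation lives on its own scale.
p <- xyplot(y ~ x | factor(px) * factor(py), data = long,
            scales = list(relation = "free"))
```

Scanning the array for the panel closest to a straight line suggests a pair of transformations to try in the regression.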

4.A.7 latticeExtra—Extra Graphical Utilities Based on Lattice

The latticeExtra package provides many functions for combining independently constructed lattice plots and for controlling the size and placement of arrays of lattice plots. We use these functions in many of our graphs. The mmcplot (Figure 7.18 and elsewhere in the book) is built by constructing the two panels independently and then combining them with the latticeExtra:::c.trellis function. Many of our plots are constructed by overlaying two independently drawn graphs with the layer function or with the latticeExtra:::`+.trellis` function, as illustrated in Figure 4.22.
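The two operations named above, combining and overlaying, can be sketched as follows. This is a minimal illustration, not the construction of mmcplot or of Figure 4.22. The mtcars data and the object names are assumptions, and the latticeExtra calls run only if that package is installed.

```r
library(lattice)

# Two independently constructed lattice plots (stand-in data: mtcars).
p_wt <- xyplot(mpg ~ wt, data = mtcars)
p_hp <- xyplot(mpg ~ hp, data = mtcars)

# latticeExtra is not part of the standard R distribution, so guard its use.
if (requireNamespace("latticeExtra", quietly = TRUE)) {
  # Overlay: add a fitted regression line on top of an existing plot.
  overlaid <- p_wt + latticeExtra::layer(panel.lmline(x, y))

  # Combine: place the two plots side by side as panels of one trellis object.
  # c() dispatches to the c.trellis method supplied by latticeExtra.
  combined <- c(weight = p_wt, horsepower = p_hp)
}
```

Both results are ordinary trellis objects, so they can be printed, updated, or combined further in the same way as the plots they were built from.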
