4.B Appendix: Graphs Used in This Book

116 4 Graphs

A scatterplot matrix (splom) does not follow the semantic paradigm of Equation (4.4). It differs from the majority of trellis-based methods in two ways. First, each of the panels is a plot of a different set of variables. Second, each of the panels is based on the entire set of observations.

Subsections 4.4 and 4.7 contain extensive discussions of scatterplot matrices. We strongly recommend the use of a splom, sometimes conditioned on values of relevant categorical variables, as an initial step in analyzing a set of data.

2. xyplot can be used to construct more general matrices of panels, such as matrices with different sets of variables for the rows and columns of the scatterplot matrix. Figure 4.4, for example, shows that xyplot can be used to specify a set of variables to define the columns of the matrix and subsets of the observations (specified as different levels of a factor) to define the rows. The formula is essentially

sprice ~ beds + drarea + kitarea | CondoHouse

Sets of xyplots with coordinated subsets of variables can be useful in situations where the number of variables under study is too large to produce a legible splom containing all variables on a single page. In such a circumstance we recommend the use of two or more pages of xyplots to display pairwise relationships among variables.

3. Figure 4.22 shows several ways to combine multiple variables in one or more panels. The figure shows overlaying plots, concatenating plots, and conditioning panels on the levels of a factor.

4.B.3 Regression Diagnostics

In the regression diagnostics plots (Figure 11.6), the panels are defined by conditioning on a set of functions (one for each statistic). This plot displays all common regression diagnostics on a single page. Included are thresholds for flagging cases as unusual along with identification of such cases.

4.B.4 Graphs Requiring Multiple Calls to xyplot

When one of the sets in the Cartesian product is a set of functions, the easiest way to construct the product is to make several xyplot calls, one for each function in the set.

1. Partial residual plots (Figure 9.10): [functions of fitted values and residual] × [variables]. Response against predictors, residuals against predictors, partial residuals against predictors (partial residual plots), and partial residuals of Y against partial residuals of X (added variable plots). Each row of Figure 9.10 is a different function of fitted values or residual. Each column is either one of the predictor variables or a function of the predictor variables. See the discussion in Section 9.13.

[Figure 4.22 appears here. Panel titles, by row: "AA: y1 ~ x", "BB: y2 ~ x", "CC: y3 ~ x"; "AA + BB + CC", "c(AA, BB, CC)"; "y1 + y2 + y3 ~ x", "y1 + y2 + y3 ~ x, outer=TRUE"; "y1 ~ x | a + b".]

Fig. 4.22 Several ways to plot multiple variables simultaneously. The top row shows the "trellis" objects from three separate calls to the xyplot function. The second row shows two ways of combining the "trellis" objects in the top row. On the left they are overlaid into the same panel using the latticeExtra +.trellis function. On the right they are concatenated into a multi-panel "trellis" object by using the latticeExtra c.trellis function. The third row shows two ways of specifying similar displays with a single xyplot command. On the left there are three response variables in the model formula with the default setting that places them into the same panel. On the right the outer=TRUE argument places them into three adjacent panels. The bottom row shows placement of the points into separate panels by specifying the Cartesian product of the levels of the factors a and b in the conditioning section (following the “|” symbol) of the model formula. The code for these plots is included in the file identified by HHscriptnames(4).

2. Analysis of covariance plots (one example is the set of Figures 10.6, 10.7, 10.8, and 10.9; another is Figure 14.6): [models] × [levels]. A key feature of this set of plots is its presentation of all points both superposed into one panel and also segregated into individual panels defined by the levels of a factor. In this framework, the superposition of all levels of the factor is itself considered a level.

3. ODOFFNA plots (Figure 14.17): [transformation power] × [factors] × [factors], a 3-dimensional Cartesian product. This is a series of interaction plots indexed by a third variable, the transformation power, all on a single page. Figure 14.17 is intended to find a satisfactory power transformation to achieve homogeneity of variance and then assess interaction between the two factors for the chosen power transformation.
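The layouts in this section can be read as enumerations of a Cartesian product, with one plotting call per element of the function set. The following is a minimal Python sketch of that bookkeeping for a [functions] × [variables] layout. It is not the R code used for the figures. The function names, predictor names, and numbers are hypothetical and serve only to show how each panel pairs one function with one variable.

```python
from itertools import product

# Hypothetical row set: functions of the fitted values and residuals.
row_functions = {
    "response": lambda fitted, resid: [f + r for f, r in zip(fitted, resid)],
    "residuals": lambda fitted, resid: list(resid),
}

# Hypothetical column set: predictor variables.
predictors = {
    "x1": [1.0, 2.0, 3.0],
    "x2": [2.0, 1.0, 4.0],
}

# Hypothetical fitted values and residuals from some regression.
fitted = [1.5, 2.5, 3.5]
resid = [0.5, -1.0, 0.5]


def panel_grid(row_functions, predictors, fitted, resid):
    """Return one (row name, column name, x values, y values) tuple per panel."""
    panels = []
    for (fname, func), (xname, xvals) in product(
        row_functions.items(), predictors.items()
    ):
        panels.append((fname, xname, xvals, func(fitted, resid)))
    return panels


panels = panel_grid(row_functions, predictors, fitted, resid)
for fname, xname, xvals, yvals in panels:
    print(f"row={fname:9s} column={xname}  x={xvals}  y={yvals}")
```

Each tuple corresponds to one panel: the row is chosen by the function, the column by the variable. A three-way product such as [transformation power] × [factors] × [factors] would add a third argument to `product`.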

4.B.5 Asymmetric Roles for the Row and Column Sets

1. Interaction plots (Figure 12.1): [factors] × [factors]. Each off-diagonal panel is a standard interaction plot. Panels in transpose positions interchange the trace- and x-factors. Rows are labeled by the trace factor. Columns are labeled by the x-factor. The main diagonal is used for boxplots of the main effects.

2. ARIMA-trellis plots (Figure 18.8): [number of AR parameters] × [number of MA parameters] × [type of display]. Each of the 3×3 displays contains diagnostic information about each of the 9 models indexed by the numbers of autoregressive and moving average parameters p and q. In addition we group several types of display on a single page. This plot displays most commonly used diagnostics for identifying the number of AR and MA parameters in time series models of the ARIMA class.

4.B.6 Rotated Plots

Mean–mean multiple comparisons plots (MMC plots) (Figure 7.19): [means at levels] × [means at levels]. The plot is designed as a crossing of the means of a response variable at the levels of one factor with itself. It is then rotated 45° so the horizontal axis can be interpreted as the differences in mean levels and the vertical axis can be interpreted as the weighted averages of the means comprising each comparison. This class of plots is used to display the results of a multiple comparison procedure.
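The effect of the 45° rotation can be checked with a little arithmetic. The Python sketch below illustrates the geometric idea only. It is not the code that draws Figure 7.19, and the orientation and scaling used in that figure may differ. It assumes equal weights, so the vertical coordinate is the simple average of the two means, and the group means are hypothetical.

```python
import math
from itertools import combinations


def rotate_45(x, y):
    """Rotate the point (x, y) counterclockwise by 45 degrees about the origin."""
    c = math.sqrt(0.5)  # cos(45 degrees) == sin(45 degrees)
    return (c * x - c * y, c * x + c * y)


def mmc_coordinates(mean_i, mean_j):
    """Return (difference, average) for one pairwise comparison.

    Assumption: equal weights, so the vertical coordinate is the simple
    average of the two means rather than a weighted average.
    """
    u, v = rotate_45(mean_i, mean_j)
    difference = math.sqrt(2.0) * u  # rescaled to mean_i - mean_j
    average = v / math.sqrt(2.0)     # rescaled to (mean_i + mean_j) / 2
    return difference, average


# Hypothetical means of a response at three levels of one factor.
group_means = {"A": 10.0, "B": 14.0, "C": 20.0}

# Cross the set of means with itself, keeping each unordered pair once.
comparisons = {
    (hi, lo): mmc_coordinates(group_means[hi], group_means[lo])
    for lo, hi in combinations(group_means, 2)
}

for (hi, lo), (difference, average) in comparisons.items():
    print(f"{hi}-{lo}: difference={difference:.3f}, average={average:.3f}")
```

After the rotation and rescaling, the point for each pair of means sits at a horizontal position equal to the difference of the two means and a vertical position equal to their average.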


4.B.7 Squared Residual Plots

The fundamental concept of “least squares” is difficult to present to introductory classes. Here, we illustrate the squares. The sum of their areas is the “sum of squares” that is minimized according to the “least-squares” principle.

Illustrations of 2D and 3D least-squares fits (Figures 8.2, 9.1, and 9.5): [fitted models] × [methods of displaying residuals]. The rows of Figure 8.2 are ways of displaying residuals; the first row shows the residuals as vertical lines, the second as squares. The columns show different models: none, least-squares, and a too-shallow fit.
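The idea that the least-squares line minimizes the total area of the squares can be checked numerically. The Python sketch below uses hypothetical data and is not the code behind Figure 8.2. It compares the sum of squared residuals for the least-squares line with that of a deliberately too-shallow line, here taken to be a line through the point of means with half the least-squares slope (an assumption made for illustration).

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the simple least-squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_of_squares(x, y, intercept, slope):
    """Total area of the squares whose sides are the vertical residuals."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(x, y)
ss_least_squares = sum_of_squares(x, y, intercept, slope)

# A too-shallow line: half the slope, still passing through the point of means.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
shallow_slope = 0.5 * slope
shallow_intercept = y_bar - shallow_slope * x_bar
ss_shallow = sum_of_squares(x, y, shallow_intercept, shallow_slope)

print(f"least-squares line: sum of squares = {ss_least_squares:.4f}")
print(f"too-shallow line:   sum of squares = {ss_shallow:.4f}")
```

Each residual contributes a square whose side is the vertical distance from the point to the line, so any line other than the least-squares line yields a larger total area.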

4.B.8 Adverse Events Dotplot

There are two primary panels in Figure 15.13: [factor] × [functions of percents]. The first panel shows the observed percentages on the x-axis. The second panel shows the relative risk with its confidence interval on the x-axis. Both panels have the same y-axis showing the event names.

4.B.9 Microplots

Microplots (as in Table 13.2) are small plots embedded into a table of numbers. The plot often carries as much information as the numbers, or more.

4.B.10 Alternate Presentations

We have alternate presentations of existing ideas.

1. Transposed trellis plots are sometimes helpful. In Figure 13.13 we show a set of boxplots with the response variable on the vertical axis. The vertical orientation places the response variable in the vertical direction and accords with how we have been trained to think of functions: levels of the independent variable along the abscissa and the response variable along the ordinate. In Section 13.A we show in Figure 13.17 the same graphs with the response variable on the horizontal axis.

2. Odds-ratio CI plot (Figure 15.10). The odds ratio $\frac{p_1/q_1}{p_2/q_2}$ does not, by construction, give information on both underlying $p_1$- and $p_2$-values. It is necessary to specify one of them to estimate the other. We backtransform the CI on the odds ratio to a CI on the probability scale and plot the CI of $p_2$ for all possible values of $p_1$. The two axes have the same (0,1) probability scale.
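The backtransformation from the odds-ratio scale to the probability scale can be sketched in a few lines. The Python code below is an illustration, not the function that produces Figure 15.10. It assumes q = 1 − p for each group, takes the odds ratio to be (p1/q1)/(p2/q2), and uses a hypothetical confidence interval of (1.5, 4.0) on the odds ratio.

```python
def p2_from_odds_ratio(p1, odds_ratio):
    """Solve (p1/q1) / (p2/q2) = odds_ratio for p2, assuming q = 1 - p."""
    odds1 = p1 / (1.0 - p1)
    odds2 = odds1 / odds_ratio
    return odds2 / (1.0 + odds2)


def p2_interval(p1, or_lower, or_upper):
    """Map a CI on the odds ratio to a CI for p2 at a specified p1.

    p2 decreases as the odds ratio increases, so the endpoints swap;
    min and max put them back in increasing order.
    """
    a = p2_from_odds_ratio(p1, or_lower)
    b = p2_from_odds_ratio(p1, or_upper)
    return (min(a, b), max(a, b))


# Hypothetical CI on the odds ratio.
or_lower, or_upper = 1.5, 4.0

# Evaluate the CI of p2 over a grid of p1 values strictly inside (0, 1).
for k in range(1, 10):
    p1 = k / 10.0
    lower, upper = p2_interval(p1, or_lower, or_upper)
    print(f"p1={p1:.1f}  p2 CI=({lower:.3f}, {upper:.3f})")
```

Plotting the two endpoints against p1 would give a band over the unit square, with both axes on the same (0,1) probability scale.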

Chapter 5
