How to process the data
2.8 R graphics
2.8.1 Graphical systems
One of the most valuable parts of any statistical software is the ability to make diverse plots. R sets almost a record here. In the base, default installation, several dozen plot types are already present, more come from the recommended lattice package, and many more are in external packages from CRAN, where more than half of them (several thousand!) are able to produce at least one unique type of plot.
Therefore, there are several thousand plot types in R. But this is not all. All these plots can be enhanced by the user! Here we will try to describe the fundamental principles of R graphics.
Let us look at this example (Fig. 2.3):
[18] On Windows and macOS, this will open the internal editor; on Linux, it is better to set the editor option manually, e.g., file.edit("hello.r", editor="geany").
> plot(1:20, main="Title")
> legend("topleft", pch=1, legend="My wonderful points")
Figure 2.3: Example of the plot with title and legend. [Plot: values 1:20 (y axis) against Index (x axis); main title "Title"; legend "My wonderful points".]
(The curious reader will find many things to experiment with here. What, for example, is pch? Change its number in the second line and find out. What if you supply 20:1 instead of 1:20? Please discover and explain.)
The plot() command draws the basic plot, whereas legend() adds some details to the already drawn output. These commands represent the two basic types of R plotting commands:
1. high-level commands which create a new plot, and
2. low-level commands which add features to an existing plot.
Consider the following example:
> plot(1:20, type="n")
> mtext("Title", line=1.5, font=2)
> points(1:20)
> legend("topleft", pch=1, legend="My wonderful points")
(These commands make almost the same plot as above! Why? Please find out. And what is different?)
Note also that the type argument of the plot() command has many values, and some produce interesting and potentially useful output. To know more, try the p, l, c, s, h and b types; check also what example(plot) shows.
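One possible way to compare these types side by side is sketched below. It is only an illustration: the variable names (types, tp, old.par) and the 2 x 3 panel layout are our own choices, not something the text prescribes.

```r
# Draw the same data with each of the plot types named in the text
types <- c("p", "l", "c", "s", "h", "b")
old.par <- par(mfrow=c(2, 3))        # split the device into 2 x 3 panels
for (tp in types) {
  plot(1:20, type=tp, main=paste("type =", tp))
}
par(old.par)                         # restore the previous layout
```

Saving the result of par() and passing it back afterwards returns the graphic device to a single-panel layout, so later plots are not affected.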
Naturally, the most important plotting command is plot(). This is a “smart” command[19]. It means that plot() “understands” the type of the supplied object, and draws accordingly. For example, 1:20 is a sequence of numbers (a numeric vector, see below for more explanation), and plot() “knows” that it requires dots with coordinates corresponding to their indices (x axis) and actual values (y axis). If you supply something else to plot(), the result will most likely be different. Here is an example (Fig. 2.4):
> plot(cars)
> title(main="Cars from 1920s")
Here commands of both types appear again, but they were issued in a slightly different way. cars is an embedded dataset (you may want to call ?cars, which gives you more information). This data is not a vector but a data frame (a sort of table) with two columns, speed and dist (stopping distance). The plot() function chooses the scatterplot as the best way to represent this kind of data. On that scatterplot, the x axis corresponds to the first column, and the y axis to the second.
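To see what plot() has to "understand" here, one may inspect the two objects directly. The following is a small sketch that uses only standard R commands; the comments show what we expect them to return.

```r
class(1:20)      # "integer", i.e. a kind of numeric vector
class(cars)      # "data.frame"
names(cars)      # "speed" "dist"

# plot() is a generic command: it hands the object over to
# a method chosen by the object's class. List the available methods:
methods(plot)
```

The exact list printed by methods(plot) depends on which packages are currently loaded.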
We recommend checking what will happen if you supply a data frame with three columns (e.g., the embedded trees data) or a contingency table (like the embedded Titanic or HairEyeColor data) to plot().
There are innumerable ways to alter the plot. For example, this is a slightly fancier “twenty points”:
> plot(1:20, pch=3, col=6, main="Title")
(Please run this example yourself. What are col and pch? What will happen if you set pch=0? If you set col=0? Why?)
[19] The better term is generic command.
Figure 2.4: Example of plot showing cars data. [Plot: dist (y axis) against speed (x axis); main title "Cars from 1920s".]
* * *
Sometimes, default R plots are considered to be “too laconic”. This is simply wrong.
The plotting system in R is inherited from S, where it was thoroughly developed on the basis of systematic research by W.S. Cleveland and others at Bell Labs. There were many experiments[20]. For example, in order to understand which plot types are easier to read, they presented different plots and then asked people to reproduce the data numerically. The research resulted in recommendations on how to make graphic output more understandable and easy to read (please note that this is not always “more attractive”!).
[20] Cleveland W. S., McGill R. 1985. Graphical perception and graphical methods for analyzing scientific data. Science. 229(4716): 828–833.
In particular, they concluded that elementary graphical perception tasks should be arranged from easiest to hardest as follows: position along a scale → length → angle and slope → area → volume → color hue, color saturation and density. So it is easy to lie with statistics if your plot employs perception tasks mostly from the right side of this sequence. (Do you see now why pie charts are particularly bad? This is the reason why they are often called “chartjunk”.)
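The difference between the two ends of this sequence is easy to try out. In the sketch below, the shares vector is hypothetical data invented only for this illustration; pie() asks the reader to judge angles and areas, whereas dotchart() shows the same numbers as positions along a common scale.

```r
# Hypothetical data, invented only for this illustration
shares <- c(A=21, B=24, C=19, D=17, E=19)

old.par <- par(mfrow=c(1, 2))        # two panels side by side
pie(shares, main="Angle and area")
dotchart(shares, main="Position along a scale")
par(old.par)                         # restore the previous layout
```

Try to rank the five groups from the pie chart alone, and then check the answer on the dot chart.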
They applied this paradigm to S, and consequently, in R almost everything (point shapes, colors, axis labels, plotting size) in default plots is based on the idea of intelligible graphics. Moreover, even the order of point and color types represents the sequence from the most easily perceived to the less easily perceived features.
Look at the plot in Fig. 2.5. Guess how it was done: which commands were used?
* * *
Many packages extend the graphical capacities of R. The second well-known R graphical subsystem comes from the lattice package (Fig. 2.6):
> library(lattice)
> xyplot(1:20 ~ 1:20, main="title")
(We repeated 1:20 twice and added a tilde because xyplot() works slightly differently from plot(). By default, lattice should already be installed on your system[21].) To find out which packages are already installed, type library().
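The tilde notation becomes clearer with a data frame. The following sketch (our own example, reusing the embedded cars data) puts the variable for the y axis on the left of the tilde and the variable for the x axis on the right:

```r
library(lattice)

# Formula interface: y variable ~ x variable, columns are taken from 'data'
p <- xyplot(dist ~ speed, data=cars, main="Cars from 1920s")

# Unlike plot(), xyplot() returns a plot object; print() draws it
print(p)
```

At the interactive prompt the plot appears without print(), but inside scripts, loops and functions the explicit print() call is needed.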
Next, below is what will happen with the same 1:20 data if we apply the qplot() function from the third popular R graphics subsystem, the ggplot2[22] package (Fig. 2.7):
> library(ggplot2)
> qplot(1:20, 1:20, main="title")
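Note that in recent ggplot2 releases (3.4.0 and later, as far as we know) qplot() is marked as deprecated and may produce a warning. A roughly equivalent plot can be built with the main ggplot() interface. The sketch below assumes that ggplot2 is installed; the data frame df and its column names x and y are our own illustrative choices.

```r
library(ggplot2)

# ggplot() expects a data frame, so wrap the two sequences into one
df <- data.frame(x=1:20, y=1:20)

# Build the plot from layers: data and mapping, then points, then title
p <- ggplot(df, aes(x=x, y=y)) +
  geom_point() +
  ggtitle("title")

print(p)   # draw the plot
```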
* * *
We already mentioned above that the library() command loads a package. But what if this package is absent from your installation? ggplot2 is not installed by default.
[21] lattice came out of later ideas of W.S. Cleveland, trellis (conditional) plots (see below for more examples).
[22] ggplot2 is now the most fashionable R graphics system. Note, however, that it is based on a different “ideology” which is related more to the SYSTAT visual statistics software and is therefore alien to R.
Figure 2.5: Exercise: which commands were used to make this plot? [Plot: twenty points; both axes have ticks at 5, 10, 15 and 20; no axis labels and no title.]
In that case, you will need to download it from the Internet R archive (CRAN) and install it. This can be done with the install.packages("ggplot2") command (note the plural in the command name and the quotes in the argument). During installation, you will first be asked about the preferred Internet mirror (it is usually a good idea to choose the first). Then, you may be asked about local or system-wide installation (the local one works in most cases). Finally, R for Windows or macOS will simply unpack the downloaded archive, whereas R on Linux will compile the package from source. This takes a bit more time and may also require some additional software to be installed. Actually, some packages want additional software regardless of the system.
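If a script should work both where the package is present and where it is not, the installation can be made conditional. The helper below is a minimal sketch: ensure_package() is a hypothetical name of our own (it is not part of R), and the repos value is just one commonly used CRAN mirror address, assumed here so that R does not ask about the mirror.

```r
# Hypothetical helper (not part of R): install a package only if it is missing
ensure_package <- function(pkg, repos="https://cloud.r-project.org") {
  # requireNamespace() returns FALSE when the package cannot be found
  if (!requireNamespace(pkg, quietly=TRUE)) {
    install.packages(pkg, repos=repos)
  }
  # Report whether the package is available now
  requireNamespace(pkg, quietly=TRUE)
}
```

For example, ensure_package("ggplot2") followed by library(ggplot2) would download the package only on the first run.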
Figure 2.6: Example of plot with a title made with xyplot() command from lattice package. [Plot: 1:20 against 1:20; main title "title".]
It is also useful to know how to do the reverse operations. If you want to remove (unload) a package from R memory, use detach(package:...). If you want to remove a package from disk, use remove.packages("..."). Finally, if you want to use a package command only once, use package::command(...).
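As an illustration, here is how two of these operations might look with the lattice package (the choice of package and the object name p are ours):

```r
# Use one command from lattice without loading the whole package
p <- lattice::xyplot(1:20 ~ 1:20, main="title")

# Load the package, then remove it from the search path again
library(lattice)
detach(package:lattice)

# Check: lattice is no longer among the attached packages
"package:lattice" %in% search()
```

We deliberately do not run remove.packages() here, because it deletes the package files from disk.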
The maximal length and maximal width of birds’ eggs are likely related. Please make a plot from the eggs.txt data and confirm (or deny) this hypothesis. Explanations of the characters are in the companion eggs_c.txt file.
Figure 2.7: Example of plot with a title made with qplot() command from ggplot2 package. [Plot: 1:20 against 1:20; main title "title".]