Overview
Teaching: TBD min
Exercises: TBD min
Questions
How can I create publication-quality graphics in R?
Objectives
To be able to use base plot to generate publication-quality graphics.
Plotting our data is one of the best ways to quickly explore it and the various relationships between variables.
Today we’ll be learning about base graphics (the graphics package that ships with R), which implements commonly used plotting tasks without the need for additional packages. Base plotting is very stable and allows highly customized plots, but elaborate figures often require elaborate code.
In R, graphs are typically created interactively. We will use the mtcars data, which are included in R as an example data.frame. The data set comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. More details about it can be found in the corresponding help file (?mtcars).
# creating a scatterplot
plot(mtcars$wt, mtcars$mpg)
abline(lm(mtcars$mpg~mtcars$wt))
title("Regression of MPG on Weight")
The plot() function plots weight vs. miles per gallon in the RStudio “Plots” pane. If you are using the R GUI, or R from the command line, a plot window is opened instead.
The next line of code adds a regression line to this graph. We will learn more about the lm() function when we get to the statistics portion of this class.
The final line adds a title.
Saving Graphs
You can save the graph in a variety of formats from the RStudio plots pane by using the “Export” button. You can also save the graph programmatically using one of several graphics device functions, such as pdf(), png(), or jpeg().
When saving to a file using these functions, you call the device function to specify the output, then execute your plotting commands, and finally close the target file with the command dev.off().
# specify target file and format
pdf("plot_wt_mpg.pdf")
# make the graph
plot(mtcars$wt, mtcars$mpg)
abline(lm(mtcars$mpg~mtcars$wt))
title("Regression of MPG on Weight")
# now close the graphics device and the target file
dev.off()
You can control the dimensions of the output with additional arguments described in the help files of the device functions (for example ?pdf).
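As a minimal sketch of setting the output dimensions, the example below passes width and height to pdf(), which interprets them in inches. The file name is hypothetical, and the file is written to a temporary directory only so the example leaves the working directory untouched.

```r
# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "plot_wt_mpg_wide.pdf")

# For pdf(), width and height are given in inches
pdf(out_file, width = 8, height = 5)
plot(mtcars$wt, mtcars$mpg)
abline(lm(mtcars$mpg ~ mtcars$wt))
title("Regression of MPG on Weight")
dev.off()  # close the device so the file is written completely

# Check that the file was created
file.exists(out_file)
```

Other devices use different units: png(), for instance, takes width and height in pixels by default.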
There are many ways to create a scatterplot in R. The basic function is plot(x, y), where x and y are numeric vectors denoting the (x,y) points to plot.
# Simple Scatterplot
plot(mtcars$wt, mtcars$mpg, main="Scatterplot Example", xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
Here we used additional arguments to specify the title (main) and the axis labels (xlab and ylab) in the plotting call.
Creating a new graph by issuing a high-level plotting command (plot, hist, boxplot, etc.) will typically overwrite a previous graph. To add points or lines to an existing graph, use functions like points(), lines(), or abline().
plot(mtcars$wt, mtcars$mpg, main="Scatterplot Example", xlab="Car Weight ", ylab="Miles Per Gallon ", pch=19)
abline(h=20, col="red") # horizontal line at mpg=20
abline(v=4, col="green") # vertical line at wt=4
abline(a=0,b=10, lty=2) # line with the equation y = a + b * x
Above we used arguments like col="red", which changed the appearance of certain plot elements.
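Low-level functions can also add points and labels to an existing plot. The sketch below highlights a subset of cars on top of the scatterplot; the cut-off of 5 (wt is recorded in units of 1000 lbs) and the name heavy are arbitrary choices made for illustration.

```r
# Base scatterplot of weight against fuel consumption
plot(mtcars$wt, mtcars$mpg, xlab = "Car Weight", ylab = "Miles Per Gallon", pch = 19)

# Hypothetical subset chosen for illustration: cars with wt above 5
heavy <- mtcars[mtcars$wt > 5, ]

# Overplot the subset in red and label each point with its row name
points(heavy$wt, heavy$mpg, pch = 19, col = "red")
text(heavy$wt, heavy$mpg, labels = rownames(heavy), pos = 2, cex = 0.7)
```

Because points() and text() draw on the current plot, they must be called after plot() and before the next high-level plotting command.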
Base R offers a variety of plotting symbols, each identified by a numeric code passed to the pch (“plotting character”) argument.
The same applies to line types (via lty), of which R provides 6, and to colors (via col).
We can make a handy chart of the different symbols and line types using the plotting functions:
plot(1:25,rep(1,25), pch=1:25, col="black", bg="red")
text(x=1:25,y=0.95, labels=1:25) # note that the labels are automatically transformed to character
# we can also use arbitrary characters as plotting symbol
points(1:8, rep(0.8,8), pch=c(".", "+", "O", "a", "b", "A", "B", "%"))
text(1:8, rep(0.75,8), labels=c(".", "+", "O", "a", "b", "A", "B", "%"))
# lastly we can add different line types in the top
abline(h=1.4,lty=1, col=1)
abline(h=1.35,lty=2, col=2)
abline(h=1.3,lty=3, col=3)
abline(h=1.25,lty=4, col=4)
abline(h=1.2,lty=5, col=4)
abline(h=1.15,lty=6, col=5)
The default palette assigns numeric codes to 8 colors, but colors can also be specified by name. The possible names can be displayed with the colors() function. More about color specification can be found in the guide by Earl Glynn: http://research.stowers-institute.org/efg/R/Color/Chart/index.htm
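The following sketch shows three ways of looking up or specifying colors. The hexadecimal form built with rgb() is not covered above and is included only as one further option; the variable name semi_blue is hypothetical.

```r
# Numeric color codes index into the current palette
palette()

# colors() returns a character vector of all valid color names
head(colors())

# Colors can also be given as hexadecimal strings, here built with rgb();
# alpha = 0.5 makes the color half transparent
semi_blue <- rgb(0, 0, 1, alpha = 0.5)
plot(mtcars$wt, mtcars$mpg, pch = 19, cex = 2, col = semi_blue)
```

Semi-transparent colors can help when points overlap, although not every graphics device supports transparency.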
The plotting range of a single panel can be controlled using the xlim and ylim arguments. By default R will use the ranges of the variables you plot, plus a small buffer on the edges. Both arguments take a 2-element numeric vector of the form c(lower_limit, upper_limit).
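For example, the following sketch forces both axes to start at zero; the particular limits are arbitrary values chosen for illustration.

```r
# Fix the x axis to 0-6 and the y axis to 0-40 instead of using the data ranges
plot(mtcars$wt, mtcars$mpg,
     xlim = c(0, 6), ylim = c(0, 40),
     xlab = "Car Weight", ylab = "Miles Per Gallon", pch = 19)
```

Points that fall outside the chosen limits are not drawn, so it is worth checking range(mtcars$wt) and range(mtcars$mpg) before narrowing the axes.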
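You can create histograms with the function hist(x), where x is a numeric vector of values to be plotted. The option freq=FALSE plots probability densities instead of frequencies. The option breaks= controls the number of bins; a single number is treated as a suggestion, so the number of bins actually drawn may differ.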
# Simple Histogram
hist(mtcars$mpg)
# Colored Histogram with Different Number of Bins
hist(mtcars$mpg, breaks=12, col="red")
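The freq=FALSE option is not used in the examples above, so here is a small sketch of it. hist() also returns an object describing the bins, which is stored here under the hypothetical name h to check that the bar areas of a density histogram add up to one.

```r
# Density histogram: bar heights are probability densities, not counts
h <- hist(mtcars$mpg, breaks = 12, freq = FALSE, col = "grey",
          main = "Density Histogram", xlab = "Miles Per Gallon")

# Bar area = density * bin width; the areas sum to 1
sum(h$density * diff(h$breaks))

# Number of bins actually drawn (breaks = 12 is only a suggestion)
length(h$counts)
```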
Boxplots can be created for individual variables or for variables by group. The format is boxplot(x, data=), where x is a formula and data= denotes the data frame providing the data. An example of a formula is y~group, where a separate boxplot for the numeric variable y is generated for each value of group. Add varwidth=TRUE to make boxplot widths proportional to the square root of the sample sizes. Add horizontal=TRUE to reverse the axis orientation.
# simple boxplot
boxplot(mtcars$mpg)
# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl,data=mtcars, main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
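The varwidth and horizontal options can be combined, as in the sketch below. Because the plot is horizontal, the axis labels are swapped compared with the previous example; the name b for the returned object is hypothetical.

```r
# Horizontal boxplots whose widths reflect the group sizes
b <- boxplot(mpg ~ cyl, data = mtcars,
             varwidth = TRUE, horizontal = TRUE,
             main = "Car Mileage Data",
             xlab = "Miles Per Gallon", ylab = "Number of Cylinders")

# Number of cars in each cylinder group, as used for the box widths
b$n
```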
A good resource for basic plots can be found on the Quick-R Graphs pages: http://www.statmethods.net/graphs/index.html
Many, many customisations can be achieved with the par function. Most notably, it is possible to arrange multiple plots on a grid using the mfrow or mfcol argument in par(). Both arguments take a numeric vector of the form c(nr, nc). Subsequent figures will be drawn in an nr-by-nc array on the device by columns (mfcol), or rows (mfrow).
par(mfrow=c(1,2))
# simple boxplot
boxplot(mtcars$mpg)
# Boxplot of MPG by Car Cylinders
boxplot(mpg~cyl,data=mtcars, main="Car Mileage Data", xlab="Number of Cylinders", ylab="Miles Per Gallon")
Settings made with par usually persist until you clear the plotting device with the broom icon, or with dev.off().
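Since these settings persist, one common idiom is to store the previous values when changing them and restore them afterwards. This is a sketch of that pattern rather than the only way to reset the layout; the name op is hypothetical.

```r
# par() returns the previous values of the settings it changes
op <- par(mfrow = c(1, 2))

boxplot(mtcars$mpg)   # drawn in the left panel
hist(mtcars$mpg)      # drawn in the right panel

# Restore the previous layout without closing the device
par(op)
par("mfrow")
```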
Key Points
Use base graphics to create plots.