\chapter{Graphs and Maps}
\label{chap:graphsandmaps}
\SweaveOpts{keep.source=TRUE, pdf=FALSE, prefix.string=Chap05, grdevice=tikz.Swd}
<<options and clear workspace, echo=FALSE, eval=TRUE>>=
options(digits=3, show.signif.stars=FALSE, width=53)
rm(list=ls())
require(tikzDevice)
source("../SweaveTikZ.R")
@
%data files: read.table("SOI.txt", T); read.table("NAO.txt", T); read.table("SST.txt", T); read.table("ATL.txt", T); read.table("H.txt", T); read.table("Ivan.txt", T); read.table("LMI.txt", T); read.table("FLPop.txt", T); read.table("JulySST2005.txt", T)
%packages: lubridate, maps, classInt, maptools, rgdal, mapdata, ggplot2
%source code: NONE
%third party: NONE

\begin{quote} ``The ideal situation occurs when the things that we regard as beautiful are also regarded by other people as useful.''
\end{quote}
\indent---Donald Knuth\\

Graphs and maps help you reason about your data.  They also help you communicate your results.  A good graph gives you the most information in the shortest time, with the least ink in the smallest space \citep{Tufte1997}.  In this chapter we show you how to make graphs and maps using R.

A good strategy is to follow along with an open session, typing (or copying) the commands as you read.  Before you begin, make sure the following data files are in your working directory.  You read them into your session by typing
<<read data, echo=TRUE, eval=TRUE>>=
SOI = read.table("SOI.txt", header=TRUE)
NAO = read.table("NAO.txt", header=TRUE)
SST = read.table("SST.txt", header=TRUE)
A = read.table("ATL.txt", header=TRUE)
US = read.table("H.txt", header=TRUE)
@
We begin with graphs.  Not all the code is shown but all is available on our website.

\section{Graphs}
\label{sec:graphs}

It's easy to make a graph.  Here we provide guidance, in the form of a tutorial, on how to make an informative graph and create publishable figures from your data.  In R you have a few choices.  With the standard (base) graphics environment you can produce a variety of plots with fine details.  Most of the figures in this book are created using the standard graphics environment. The grid graphics environment is more flexible.  It allows you to design complex layouts with nested graphs where scaling is maintained when you resize the figure.  The {\bf lattice} and {\bf ggplot2} packages use grid graphics to create specialized graphing functions and methods.  The \verb@spplot@ function is an example of a plot method built with grid graphics that we use in this book to create maps of our spatial data.  The {\bf ggplot2} package is an implementation of the grammar of graphics combining advantages from the standard and lattice graphic environments.  We begin with the base graphics environment.

\subsection{Box plot}
\label{subsec:boxplot}

A box plot is a graph of the five-number summary.  When applied to a set of observations, the \verb@summary@ function produces the sample mean along with five other statistics including the minimum, the first quartile value, the median, the third quartile value, and the maximum.  The box plot graphs these numbers.  This is done using the \verb@boxplot@ function.  For example, to create a box plot of your October SOI data, type
<<create box plot, fig=FALSE, echo=TRUE, eval=FALSE>>=
boxplot(SOI$Oct, ylab="October SOI [s.d.]")
@
\begin{figure}
\centering
<<soiboxplot, fig=TRUE, echo=FALSE, eval=TRUE, width=4.5, height=3>>=
par(mfrow=c(1, 2), las=1, pty="s", mgp=c(2, .4, 0), tcl=-.3)
boxplot(SOI$Oct, ylab="October SOI [s.d.]")
mtext("a", side=3, line=1, adj=0, cex=1.1)
boxplot(SOI$Aug, ylab="August SOI [s.d.]")
mtext("b", side=3, line=1, adj=0,cex=1.1)
f = fivenum(SOI$Aug)
text(rep(1.25, 5), f, labels=c("minimum", "lower", "median", "upper",
  "maximum"), adj=c(0, 0), cex=.5)
@
\vspace{-2cm}
\caption{Box plots of the (a) October and (b) August SOI.}
\label{fig:boxplot}
\end{figure}
Figure~\ref{fig:boxplot} shows the results.  The line inside the box is the median value.   The bottom of the box (lower hinge) is the first quartile value and the top of the box (upper hinge) is  the third quartile.  The vertical line (whisker) from the top of the box extends to the maximum value and the vertical line from the bottom of the box extends to the minimum value.

Hinge values equal the quartiles exactly when there is an odd number of observations.  Otherwise hinges are the middle value of the lower (or upper) half of the observations if there is an odd number of observations below the median and are the average of the two middle values if there is an even number of observations below the median.   The \verb@fivenum@ function gives the five numbers used by \verb@boxplot@.  The height of the box is essentially the interquartile range (\verb@IQR@) and, when there are no outliers, the range is the distance from the bottom of the lower whisker to the top of the upper whisker.
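
The difference between hinges and quartiles is easiest to see on a small example.  The sketch below uses two hypothetical vectors (not part of the data used in this chapter), one with an even and one with an odd number of values, and compares \verb@fivenum@ with the default method of \verb@quantile@.
<<hinges versus quartiles, echo=TRUE, eval=FALSE>>=
# Hypothetical data for illustration only
x.even = 1:8
x.odd = 1:9
fivenum(x.even)[c(2, 4)]       # hinges: 2.5 6.5
quantile(x.even, c(.25, .75))  # quartiles: 2.75 6.25
fivenum(x.odd)[c(2, 4)]        # hinges: 3 7
quantile(x.odd, c(.25, .75))   # quartiles: 3 7
@
With an odd number of values the two agree, while with an even number they differ slightly.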

By default the whiskers are drawn as dashed lines extending from the box to the minimum and maximum data values.  Convention is to make the length of a whisker no longer than 1.5 times the height of the box.  Outliers, data values that lie beyond this limit, are marked separately with points. Figure~\ref{fig:boxplot} also shows the box plot for the August SOI values.  The text identifies the values.  Here with the default options we see one data value more than 1.5 times the interquartile range above the upper quartile.  In this case the upper whisker extends to the largest data value that is within 1.5 $\times$ IQR of the upper quartile.

For example, if you type
<<five numbers soi, echo=TRUE, eval=TRUE>>=
Q1 = fivenum(SOI$Aug)[2]
Q3 = fivenum(SOI$Aug)[4]
Q3 + (Q3 - Q1) * 1.5
@
you get the upper limit for the whisker, and there is one observation greater than \Sexpr{round(Q3+(Q3-Q1)*1.5,1)}.  In this case, the upper whisker ends at the largest observation less than \Sexpr{round(Q3+(Q3-Q1)*1.5,1)}.  Observations above and below the whiskers are considered outliers. You can find the value of the single outlier of the August SOI by typing
<<sort soi, echo=TRUE, eval=FALSE>>=
sort(SOI$Aug)
@
and noting that the value \Sexpr{boxplot.stats(SOI$Aug)$stats[5]} is the largest observation in the data less than \Sexpr{round(Q3+(Q3-Q1)*1.5,1)}.  The one value above it is the outlier.

Your observations are said to be symmetric if the median is near the middle of the box and the lengths of the two whiskers are about equal.  A symmetric set of observations will also have about the same number of high and low outliers.

To summarize, 25\% of all your observations are below the lower quartile (below the box),  50\% are below (and above) the median, and 25\% are above the upper quartile.  The box contains 50\% of all your data.  The upper whisker extends from the upper quartile to the maximum and the lower whisker extends from the lower quartile to the minimum, unless these values lie more than 1.5 times the interquartile range above the upper quartile or below the lower quartile.  In this case outliers are plotted as points.  This outlier option can be turned off by setting the \verb@range@ argument to zero.
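
The whisker rule can be checked by hand.  The following is a minimal sketch using eight hypothetical values (not the SOI data).  It computes the upper limit from the hinges and compares it with the output of \verb@boxplot.stats@, the function that \verb@boxplot@ uses for its calculations.  The \verb@coef@ argument of \verb@boxplot.stats@ plays the role of the \verb@range@ argument of \verb@boxplot@.
<<whisker rule sketch, echo=TRUE, eval=FALSE>>=
# Hypothetical values, not the SOI data
x = c(-1.2, -.5, -.1, .2, .4, .9, 1.1, 3.8)
h = fivenum(x)
# Upper limit: upper hinge + 1.5 * box height
upper = h[4] + 1.5 * (h[4] - h[2])
upper                        # 2.95
b = boxplot.stats(x)
b$stats[5]                   # whisker ends at 1.1
b$out                        # outlier: 3.8
# With coef=0 no values are flagged as outliers
boxplot.stats(x, coef=0)$out
@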

The box plot is an efficient graphical summary of your data, but it can be designed better.  For example, the box sides provide redundant information.  By removing the box lines altogether, the same information is available with less ink.  Figure~\ref{fig:tuftebox} is a series of box plots representing the SOI for each month.  The dot represents the median; the ends of the lines towards the dot are the lower and upper quartiles, respectively; the ends of the lines towards the bottom and top of the graph are the minimum and maximum values, respectively.
\begin{figure}
\centering
<<Tufteboxplotfig, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=3>>=
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(c(1, 12), c(-5, 5), type="n", xaxt="n", bty="n", xlab="", ylab="SOI [s.d.]")
axis(1, at=seq(1, 12, 1), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
  "Aug", "Sep", "Oct", "Nov", "Dec"), cex=.5)
for(i in 1:12){
points(i, fivenum(SOI[, i+1])[3], pch=19)
lines(c(i, i), c(fivenum(SOI[, i+1])[1], fivenum(SOI[, i+1])[2]))
lines(c(i, i), c(fivenum(SOI[, i+1])[4], fivenum(SOI[, i+1])[5]))
}
@
\vspace{-1cm}
\caption{Five-number summary of the monthly SOI.}
\label{fig:tuftebox}
\end{figure}

\subsection{Histogram}
\label{subsec:histogram}

A histogram is a graph of the distribution of your observations.  It shows where the values tend to cluster and where they tend to be sparse.  The histogram is similar but not identical to a bar plot (see Chapter~\ref{chap:Rtutorial}).  The histogram uses bars to indicate frequency (or proportion) in data intervals, whereas a bar plot uses bars to indicate the frequency of data by categories.  The \verb@hist@ function creates a histogram.

As an example, consider NOAA's annual values of accumulated cyclone energy (ACE) for the North Atlantic and July SOI values.  Annual ACE is calculated by squaring the maximum wind speed for each six-hour tropical cyclone observation and summing over all cyclones in the season.  The values obtained from NOAA (\url{http://www.aoml.noaa.gov/hrd/tcfaq/E11.html}) are expressed in units of knots squared $\times 10^4$.  You create the two histograms and plot them side-by-side.  First set the plotting parameters with the \verb@par@ function.  Details on plotting options are given in \cite{Murrell2005}.  After your histogram is plotted the function \verb@rug@ (like a floor carpet) adds tick marks along the horizontal axis at the location of each observation.
<<histogram ace, fig=FALSE, echo=TRUE, eval=FALSE>>=
par(mfrow=c(1, 2), pty="s")
hist(A$ACE)
rug(A$ACE)
hist(SOI$Jul)
rug(SOI$Jul)
@
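
The ACE calculation described above can be sketched for a single storm.  The wind speeds below are hypothetical six-hourly maxima in knots, not values from the NOAA record.  The factor \verb@.5144@ converts knots to m~s$^{-1}$, so its square converts the result to the units used in the figures.  The annual value is the sum of such contributions over all storms in the season.
<<ace sketch, echo=TRUE, eval=FALSE>>=
# Hypothetical 6-hourly maximum winds (kt)
w = c(35, 45, 60, 75, 65, 40)
# Sum of squared winds in units of 10^4 kt^2
ace.kt = sum(w^2) / 10^4
ace.kt                   # 1.83
# Convert to units of 10^4 m^2/s^2
ace.kt * .5144^2
@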
\begin{figure}
\centering
<<histogramace, fig=TRUE, echo=FALSE, eval=TRUE, width=4.6, height=3.2>>=
c = .5144^2
par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3)
hist(A$ACE * c, main="", xlab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]")
rug(A$ACE * c)
mtext("a", side=3, line=1, adj=0, cex=1.1)
hist(SOI$Jul, main="", xlab="SOI [s.d.]")
rug(SOI$Jul)
mtext("b", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-1cm}
\caption{Histograms of (a) ACE and (b) SOI.}
\label{fig:hist}
\end{figure}

Figure~\ref{fig:hist} shows the result.  Here we added an axis label, turned off the default title,  and placed text (`a' and `b') in the figure margins.  Plot titles are useful in presentations, but are not needed in publication.  The default horizontal axis label is the name of the data vector.  The default vertical axis is frequency and is labeled accordingly.

The \verb@hist@ function has many options.  Default values for these options provide a good starting point, but you might want to make adjustments.  Thus it is good to know how the histogram is assembled.  First a contiguous collection of disjoint intervals, called bins (or classes), is chosen that cover the range of data values.  The default for the number of bins is the value $\lceil \log_2(n)+1\rceil $, where $n$ is the sample size and $\lceil \rceil$ indicates the ceiling value (the smallest integer not less than the argument).  If you type
<<ceiling function, echo=TRUE, eval=TRUE>>=
n = length(SOI$Jul)
ceiling(log(n, base=2) + 1)
@
you get the suggested number of bins.  Adjustments are made to this number so that the cut points fall on round values.  In the case of ACE the adjustment results in 7 bins and in the case of the SOI it results in 11 bins.  Thus the computed number of bins is a suggestion that gets modified to make for nice breaks.

The bins are contiguous and disjoint so the intervals look like $(a, b]$ or $[a, b)$ where the interval $(a, b]$ means from $a$ to $b$ including $b$ but not $a$.  Next, the number of data values in each of the intervals is counted.  Finally, a bar is drawn above the interval so that the bar height is the number of data values (frequency).  A useful argument to make your histogram more visually understandable is \verb@prob=TRUE@, which sets the bar height to the density, where the sum of the densities times the bar interval width equals one.
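
The steps used to assemble a histogram can be inspected without drawing anything.  The sketch below uses simulated values as a stand-in for your data (the vector \verb@x@ is hypothetical) and the \verb@plot=FALSE@ argument to return the histogram object.  The function \verb@nclass.Sturges@ returns the suggested number of bins given by the formula above.
<<histogram assembly sketch, echo=TRUE, eval=FALSE>>=
# Simulated stand-in for 160 yearly values
set.seed(1)
x = rnorm(160)
nclass.Sturges(x)        # suggested bins: 9
h = hist(x, plot=FALSE)
h$breaks                 # cut points actually used
sum(h$counts)            # 160, every value counted
# Densities times bin widths sum to one
sum(h$density * diff(h$breaks))
@
Comparing the length of \verb@h$counts@ with the suggested value shows whether the number of bins was adjusted to give round cut points.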

You conclude that ACE is positively skewed with relatively few years having very large values.  Clearly ACE does not follow a normal distribution.  In contrast, the SOI appears quite symmetric with rather short tails as you would expect from a normal distribution.

\subsection{Density plot}
\label{subsec:densityplot}

A histogram outlines the general shape of your data.  Usually that is sufficient.   You can adjust the number of bins (or bin width) to get more or less detail on the shape.  An alternative is a density plot.  A density plot captures the distribution shape by essentially smoothing the histogram.  Instead of specifying the bin width you specify the amount (and type) of smoothing.

There are two steps to produce a density plot.  First you need to use the \verb@density@ function to obtain a set of kernel density estimates from your observations.  Second you need to plot these estimates, typically by using the \verb@plot@ method.

A kernel density is a function that provides an estimate of the average number of values at any location in the space defined by your data.  This is illustrated in Fig.~\ref{fig:kernels} where the October SOI values in the period 2005--2010 are indicated as a rug and a kernel density function is shown as the black curve.  The height of the function, representing the local density, is a sum of the heights of the individual kernels shown in red.
\begin{figure}
\centering
<<kerneldensity, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=3>>=
x = tail(SOI$Oct)
xlims = c(-3, 3)
bandw = .5
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(density(x,bw=bandw), xlim=xlims, type="n", main="", xlab="October SOI [s.d.]")
rug(x, lwd=3)
# Draw one scaled kernel per observation
for(xi in x){
  d = density(xi, bw=bandw)
  lines(d$x, d$y/length(x), col="red")
}
# Overall density: the sum of the scaled kernels
lines(density(x, bw=bandw), lwd=2)
@
\vspace{-.5cm}
\caption{Density of October SOI (2005--2010).}
\label{fig:kernels}
\end{figure}
The kernel is a Gaussian (normal) distribution centered at each data value.  The width of the kernel, called the bandwidth, controls the amount of smoothing.  The bandwidth is the standard deviation of the kernel in the \verb@density@ function.  This means the inflection points on the kernel occur one bandwidth away from the data location in units of the data values.  Here with the SOI in units of standard deviation the bandwidth equals .5 s.d.
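
The idea that the density is a sum of kernels can be verified numerically.  The following is a sketch under stated assumptions: the six values in \verb@x@ are hypothetical (not the SOI data) and the evaluation points in \verb@g@ are chosen arbitrarily.  It averages Gaussian kernels centered on the data and compares the result with the output of \verb@density@, which uses a fast approximation, so the two agree closely but not exactly.
<<kernel sum sketch, echo=TRUE, eval=FALSE>>=
# Six hypothetical values, not the SOI data
x = c(-1.1, -.4, .1, .3, .9, 1.6)
bw = .5
g = seq(-3, 3, by=.5)    # evaluation points
# Average of Gaussian kernels centered on the data
manual = sapply(g, function(u)
  mean(dnorm(u, mean=x, sd=bw)))
# density() estimates interpolated to the same points
d = density(x, bw=bw, from=-3, to=3)
est = approx(d$x, d$y, xout=g)$y
max(abs(manual - est))   # small
@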

A larger bandwidth produces a smoother density plot for a fixed number of observations because the kernels have greater overlap.  Figure~\ref{fig:bw} shows the density plot of June NAO values from the period 1851--2010 using bandwidths of .1, .2, .5, and 1.
\begin{figure}
\centering
<<kerneldensityplot, fig=TRUE, echo=FALSE, eval=TRUE, width=3.7, height=3.2>>=
par(mfrow=c(2, 2), mar=c(5, 5, 4, 2) + .1, mex=.7, las=1, mgp=c(3, .4, 0), tcl=-.3)
x = NAO$Jun
xlims = range(x)
plot(density(x, bw=.1), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2)
rug(x)
mtext("a", side=3, line=1, adj=0, cex=1.1)
plot(density(x, bw=.2), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2)
rug(x)
mtext("b", side=3,line=1,adj=0,cex=1.1)
plot(density(x, bw=.5), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2)
rug(x)
mtext("c", side=3, line=1, adj=0, cex=1.1)
plot(density(x, bw=1), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2)
rug(x)
mtext("d", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-.2cm}
\caption{Density of June NAO. (a) .1, (b) .2, (c) .5, and (d) 1 s.d. bandwidth.}
\label{fig:bw}
\end{figure}
The smallest bandwidth produces a density plot that has spikes as it captures the fine scale variability in the distribution of values.  As the bandwidth increases the spikes disappear and the density gets smoother.  The largest bandwidth produces a smooth symmetric density centered on the value of zero.

To create a density plot for the NAO values with a histogram overlay, type
<<histogram and density, fig=FALSE, echo=TRUE, eval=FALSE>>=
d = density(NAO$Jun, bw=.5)
plot(d, main="", xlab="June NAO [s.d.]",
  lwd=2, col="red")
hist(NAO$Jun, prob=TRUE, add=TRUE)
rug(NAO$Jun)
@
\begin{figure}
\centering
<<histogramanddensity, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=3>>=
par(las=1, mgp=c(2.3, .4, 0), tcl=-.3)
plot(density(NAO$Jun, bw=.5), main="", xlab="June NAO [s.d.]", lwd=2, col="red")
hist(NAO$Jun, prob=TRUE, add=TRUE)
rug(NAO$Jun)
@
\vspace{-.5cm}
\caption{Density and histogram of June NAO.}
\label{fig:densityhist}
\end{figure}
The \verb@density@ function takes your vector of data values as input and allows you to specify a bandwidth using the \verb@bw@ argument.  Here you are using the vector of June NAO values and a bandwidth of .5 s.d.  The bandwidth units are the same as the units of your data, here s.d. for the NAO.  The output is saved as a density object, here called \verb@d@.  The object is then plotted using the \verb@plot@ method.  You turn off the default plot title with \verb@main=""@ and you specify a label for the values to be plotted below the horizontal axis.  You specify the line width as 2 and the line color as red.

You then add the histogram over the density line using the \verb@hist@ function.  You use the \verb@prob=TRUE@ argument to make the bar height proportional to the density.  The \verb@add=TRUE@ argument is needed so that the histogram is drawn on the existing plot rather than starting a new one.   One reason for plotting the histogram or density is to see whether your data have a normal distribution.  The Q-Q plot provides another way to make this assessment.

\subsection{Q-Q plot}
\label{subsec:qqplot}

A Q-Q plot is a graphical way to compare distributions.  It does this by plotting quantile (Q) values of one distribution against the corresponding quantile (Q) values of the other distribution.  In the case of assessing whether or not your data are normally distributed, the sample quantiles are plotted on the vertical axis and quantiles from a standard normal distribution are plotted along the horizontal axis.  In this case it is called a Q-Q normal plot.

That is, the $k$th smallest observation is plotted against the expected value of the $k$th smallest random value from a $N(0, 1)$ sample of size $n$.   The pattern of points in the plot is then used to compare your data against a normal distribution.  If your data are normally distributed then the points should fall along a straight line.

This somewhat complicated procedure is done using the \verb@qqnorm@ function.  To make side-by-side Q-Q normal plots for the ACE values and the July SOI values you type
<<qqnorm plot, fig=FALSE, echo=TRUE, eval=FALSE>>=
par(mfrow=c(1, 2), pty="s")
qqnorm(A$ACE)
qqline(A$ACE, col="red")
qqnorm(SOI$Jul)
qqline(SOI$Jul, col="red")
@
\begin{figure}
\centering
<<qqnormplot, fig=TRUE, echo=FALSE, eval=TRUE, width=4.6, height=3.3>>=
c = .5144^2
par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3)
qqnorm(A$ACE*c, main="", ylab=
  "Sample quantiles\n (ACE [$\\times$10$^4$ m$^2$s$^{-2}$])",
  xlab="Theoretical quantiles")
qqline(A$ACE*c, col="red")
mtext("a", side=3, line=1, adj=0, cex=1.1)
qqnorm(SOI$Jul, main="", ylab="Sample quantiles\n (SOI [s.d.])",
  xlab="Theoretical quantiles")
qqline(SOI$Jul, col="red")
mtext("b", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-.75cm}
\caption{Q-Q normal plot of (a) ACE and (b) July SOI.}
\label{fig:qqnorm}
\end{figure}
The plots are shown in Fig.~\ref{fig:qqnorm}.  The quantiles are non-decreasing.  A reference line through the first and third quartiles is added to the plot using the \verb@qqline@ function.  Additionally we adjust the vertical axis label and turn the default title off.
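
The quantile pairing behind the Q-Q normal plot can be reproduced by hand.  The sketch below uses a simulated sample (the vector \verb@x@ is hypothetical).  According to the help pages, \verb@qqnorm@ uses \verb@ppoints@ to approximate the expected normal order statistics and \verb@qqline@ draws a line through the first and third quartile pairs.  The slope and intercept computed here follow that description.
<<qq by hand, echo=TRUE, eval=FALSE>>=
# Simulated sample for illustration
set.seed(1)
x = rnorm(30)
n = length(x)
# Normal quantiles at n evenly spread probabilities
theo = qnorm(ppoints(n))
q = qqnorm(x, plot.it=FALSE)
all.equal(sort(q$x), theo)        # TRUE
all.equal(sort(q$y), sort(x))     # TRUE
# Line through the first and third quartile pairs
slope = diff(quantile(x, c(.25, .75))) /
  diff(qnorm(c(.25, .75)))
int = quantile(x, .25) - slope * qnorm(.25)
@
For a sample from a normal distribution the slope estimates the standard deviation and the intercept estimates the mean.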

The plots show that July SOI values appear to have a normal distribution while the seasonal ACE does not.  For observations that have a positive skew, like the ACE, the pattern of points on a Q-Q normal plot is concave upward.  For observations that have a negative skew the pattern of points is concave downward.  For values that have a symmetric distribution but with fatter tails than the normal (e.g., the $t$-distribution), the pattern of points resembles an inverse sine function.

The Q-Q normal plot is useful in checking the residuals from a regression model.  The assumption is that the residuals are independent and identically distributed from a normal distribution centered on zero.  In Chapter~\ref{chap:classicalstatistics} you created a multiple linear regression model for August SST using March SST and year as the explanatory variables.  To examine the assumption of normally distributed residuals with a Q-Q normal plot type
<<qqnorm residuals, fig=FALSE, echo=TRUE, eval=FALSE>>=
model = lm(Aug ~ Year + Mar, data=SST)
qqnorm(model$residuals)
qqline(model$residuals, col="red")
@
The points align along the reference line indicating a normal distribution.

\subsection{Scatter plot}
\label{subsec:scatterplot}

The \verb@plot@ method is used to create a scatter plot.  The values of one variable are plotted against the values of the other variable as points in a Cartesian plane (see Chapter~\ref{chap:Rtutorial}).  The values named in the first argument are plotted along the horizontal axis.

This pairing of two variables is useful in generating and testing hypotheses about a possible  relationship.  In the context of correlation which variable gets plotted on which axis is irrelevant.  Either way, the scatter of points illustrates the amount of correlation.  However, in the context of a statistical model, by convention the dependent variable (the variable you are interested in explaining) is plotted on the vertical axis and the explanatory variable is plotted on the horizontal axis.  For example, if your interest is whether pre-hurricane season ocean warmth (e.g., June SST) is related to ACE, your model is
<<linear model ace, echo=TRUE, eval=TRUE>>=
ace = A$ACE*.5144^2
sst = SST$Jun
model = lm(ace ~ sst)
@
and you plot ACE on the vertical axis.  Since your slope and intercept coefficients from the linear regression model are saved as part of the object \verb@model@, you can first create a scatter plot then use the \verb@abline@ function to add the linear regression line.  Here the function extracts the intercept and slope coefficient values from the model object and draws the straight line using the slope-intercept formula.

Note here you use the model formula syntax (\verb@ace ~ sst@) as the first argument in the \verb@plot@ function.
<<scatter ace and sst, fig=FALSE, echo=TRUE, eval=FALSE>>=
plot(ace ~ sst, ylab=expression(
   paste("ACE [x", 10^4," ", m^2, s^-2,"]")),
   xlab=expression(paste("SST [",degree,"C]")))
abline(model, col="red", lwd=2)
@
\begin{figure}
\centering
<<acesstscatterplot, fig=TRUE, echo=FALSE, eval=TRUE, width=3.5, height=3.5>>=
ci.lwr = predict(model, data.frame(sst=sort(sst)), level=.95,
  interval="confidence")[, 2]
ci.upr = predict(model, data.frame(sst=sort(sst)), level=.95,
  interval="confidence")[, 3]
par(pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(ace ~ sst, type="n",
  ylab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]",
  xlab="SST [$^\\circ$C]")
grid()
xx = c(sort(sst), rev(sort(sst)))
yy = c(ci.upr, rev(ci.lwr))
polygon(xx, yy, col="gray", border="gray")
points(sst, ace, pch=19, cex=.8)
abline(model, col="red", lwd=2)
@
\vspace{-.5cm}
\caption{Scatter plot and linear regression line of ACE and June SST.}
\label{fig:ACEvsJunSST}
\end{figure}
Figure~\ref{fig:ACEvsJunSST} is the result.  The relationship between ACE and SST is summarized by the linear regression model shown by the straight line.  The slope of the line indicates that for every 1$^\circ$C increase in SST the average value of ACE increases by \Sexpr{round(coef(model)[2],1)}$\times 10^4$~m$^2$/s$^2$ (type \verb@coef(model)[2]@).

Since the regression line is based on a sample of data you should display it inside a band of uncertainty.  As we saw in Chapter~\ref{chap:classicalstatistics} there are two types of uncertainty bands: a confidence band (narrow) and a prediction band (wide).  The confidence band reflects the uncertainty about the line itself, which, like the standard error of the mean, indicates the precision with which you know the mean.  In regression, the mean is not constant but rather a function of the explanatory variable.

The 95\% confidence band is shown in Fig.~\ref{fig:ACEvsJunSST}.  The width of the band is inversely related to the sample size.  In a large sample of data, the confidence band will be narrow reflecting a well-determined line.  Note that it's impossible to draw a horizontal line that fits completely within the band.  This indicates that there is a significant relationship between ACE and SST.

The band is narrowest in the middle which is understood by the fact that the predicted value at the mean SST will be the mean of ACE, whatever the slope, and thus the standard error of the predicted value at this point is the standard error of the mean of ACE.  At other values of SST we need to add the variability associated with the estimated slope.  This variability is larger for values of SST farther from the mean, which is why the band looks like a bow tie.

The prediction band adds another layer of uncertainty: the uncertainty about {\it future} values of ACE.  The prediction band captures the majority of the observed points in the scatter plot.  Unlike the confidence band, the width of the prediction band depends strongly on the assumption of normally distributed errors with a constant variance across the values of the explanatory variable.
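
The two bands can be compared using the \verb@interval@ argument of the \verb@predict@ method.  The following is a sketch on simulated data.  The variables \verb@x@ and \verb@y@ and the coefficients used to generate them are hypothetical and are not estimates from the ACE and SST data.
<<bands sketch, echo=TRUE, eval=FALSE>>=
# Simulated data; names and values are hypothetical
set.seed(1)
x = runif(50, 19, 21)
y = -200 + 12 * x + rnorm(50, sd=5)
fit = lm(y ~ x)
new = data.frame(x=seq(19, 21, by=.1))
cb = predict(fit, new, interval="confidence")
pb = predict(fit, new, interval="prediction")
cw = cb[, "upr"] - cb[, "lwr"]
pw = pb[, "upr"] - pb[, "lwr"]
new$x[which.min(cw)]     # close to mean(x)
all(pw > cw)             # TRUE
@
The confidence band is narrowest at the grid value nearest the mean of the explanatory variable, and the prediction band is wider everywhere.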

\subsection{Conditional scatter plot}
\label{subsec:conditionalscatterplot}

Separate scatter plots {\it conditional} on the values of a third variable can be quite informative.  This is done with the \verb@coplot@ function.  The syntax is the same as above except you add the name of the conditioning variable after a vertical bar.

For example, as you saw above, there is a positive relationship between ACE and SST.  The conditioning plot answers the question: is there a change in the relationship depending on values of a third variable?  Here you use August SOI values as the conditioning variable and type
<<coplot, fig=FALSE, echo=TRUE, eval=FALSE>>=
soi = SOI$Aug
coplot(ace ~ sst | soi, panel=panel.smooth)
@
The syntax is read `conditioning plot of ACE versus SST given values of SOI.'  The function divides the range of the conditioning variable (SOI) into six intervals with each interval having approximately the same number of years.  The range of SOI values in each interval overlaps by  50\%.  The conditioning intervals are plotted in the top panel as horizontal bars (shingles).  The plot is shown in Fig.~\ref{fig:coplot}.

The scatter plots of ACE and SST are arranged in a matrix of panels below the shingles.  The panels are arranged from lower left to upper right.  The lower left panel corresponds to the lowest range of SOI values (less than about $-$1 s.d.) and the upper right panel corresponds to the highest range of SOI values (greater than about $+$.5 s.d.).  Half of the data points in a panel are shared with the panel to the left and half of the data points are shared with the panel to the right.  This is indicated by the amount of shingle overlap.
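
The conditioning intervals can be examined directly with the \verb@co.intervals@ function.  The sketch below uses a simulated conditioning variable (the vector \verb@z@ is hypothetical) with six intervals and an overlap of one half, the values described above.
<<shingle sketch, echo=TRUE, eval=FALSE>>=
# Simulated conditioning variable (hypothetical)
set.seed(1)
z = rnorm(120)
iv = co.intervals(z, number=6, overlap=.5)
iv                       # one row per interval
# Number of values falling in each interval
apply(iv, 1, function(r)
  sum(z >= r[1] & z <= r[2]))
@
Each row gives the lower and upper limits of one shingle, and the counts show that the intervals hold roughly the same number of values.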
\begin{figure}
\centering
<<coplotfigure, fig=TRUE, echo=FALSE, eval=TRUE, width=4.7, height=5>>=
par(las=0, mgp=c(2, .4, 0), tcl=-.3)
soi = SOI$Aug
n = length(soi)
coplot(ace[16:n] ~ sst[16:n] | soi[16:n], panel=panel.smooth, pch=16,
   ylab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]",
   xlab=c("SST [$^\\circ$C]", "Conditioning variable: SOI [s.d.]"))
@
\vspace{0cm}
\caption{Scatter plots of ACE and SST conditional on the SOI.}
\label{fig:coplot}
\end{figure}

Results show a positive, nearly linear, relationship between ACE and SST for most ranges of SOI values.  However, over the SOI range between $-$1.5 and 0 the relationship is nonlinear.  ACE is least sensitive to SST when SOI is the most negative (El Ni\~no years) as indicated by the nearly flat line in the lower-left panel.  The argument \verb@panel=panel.smooth@ adds a locally smoothed curve (red line) through the set of points in each plot.

\section{Time series}
\label{sec:timeseries}

Hurricane data often take the form of a time series.  A time series is a sequence of data values measured at successive times and spaced at uniform intervals.  You can treat a time series as a vector and use structured data functions (see Chapter~\ref{chap:Rtutorial}) to generate time series.

However, additional functions are available for data that are converted to time-series objects.  Time-series objects are created using the \verb@ts@ function.  You do this with the monthly NAO data frame as follows.  First create a matrix of the monthly values skipping the year column in the data frame.  Second take the transpose of this matrix (switch the rows with the columns) using the \verb@t@ function and save the matrix as a vector.  Finally, create a time-series object, specifying the frequency of values and the start month.  Here the first value is from January 1851.
<<time series nao, echo=TRUE, eval=TRUE>>=
nao.m = as.matrix(NAO[, 2:13])
nao.v = as.vector(t(nao.m))
nao.ts = ts(nao.v, frequency=12, start=c(1851, 1))
@
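The reason for the transpose is that R stores a matrix column by column.  The following sketch uses a small hypothetical table (two years of made-up values numbered in time order) to show the difference.
<<ts layout sketch, echo=TRUE, eval=FALSE>>=
# Hypothetical 2-year table: rows are years,
# columns are months, values 1 to 24 in time order
m = matrix(1:24, nrow=2, byrow=TRUE,
  dimnames=list(c("1851", "1852"), month.abb))
as.vector(m)[1:4]        # 1 13 2 14: wrong order
v = as.vector(t(m))      # 1 2 3 ...: time order
x = ts(v, frequency=12, start=c(1851, 1))
window(x, start=c(1852, 1), end=c(1852, 3))
end(x)                   # 1852 12
@
Without the transpose the values from different years are interleaved.  With it, the first three months of 1852 are correctly returned as 13, 14, and 15.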
For comparison, also create a time-series object for the cumulative sum of the monthly NAO values.  This is done with the \verb@cumsum@ function applied to your data vector.
<<cumsum nao, echo=TRUE, eval=TRUE>>=
nao.cts = ts(cumsum(nao.v),
   frequency=12, start=c(1851, 1))
@
This results in objects of class {\tt ts}, which is used for time series having numeric time information.  Additional classes for working with time series data that can handle dates and other types of time information are available in other packages.  For example, the {\bf fts} package implements regular and irregular time series based on {\tt POSIXct} time stamps (see \S\ref{subsec:datesandtimes}) and the {\bf zoo} package provides functions for most time series classes.

\subsection{Time-series graph}
\label{subsec:timeseriesgraph}

The objects of class {\tt ts} make it easy to plot your data as a time series.  For instance, you plot the cumulative sum of the NAO values using the \verb@plot@ method.  The method recognizes the object as a time series and plots it accordingly eliminating the need to specify a separate time variable.
<<plot time series, fig=FALSE, echo=TRUE, eval=FALSE>>=
plot(nao.cts)
@
\begin{figure}
\centering
<<plottimeseries, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=3>>=
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(nao.cts, xlab="Year", ylab="Cumulative NAO [s.d.]")
abline(h=0, col="red")
grid()
lines(nao.cts, lwd=2)
@
\vspace{-.75cm}
\caption{Time series of the cumulative sum of NAO values.}
\label{fig:tsNAO}
\end{figure}

Figure~\ref{fig:tsNAO} shows the result. The cumulative sum indicates a pattern typical of a random walk.  That is, over the long term there is a tendency for more positive-value months leading to a `wandering' of the cumulative sum away from the zero line.  This tendency begins to reverse in the late 20th century.

\subsection{Autocorrelation}
\label{subsec:autocorrelation}

Autocorrelation is correlation between values of a single variable.  For time data it refers to a single series correlated with itself as a function of temporal lag.  For spatial data it refers to a single variable correlated with itself as a function of spatial lag, which can be a vector of distance and orientation (see Chapter~\ref{chap:spatialmodels}).  In both cases the term `autocorrelation function' is used, but with spatial data the term is often qualified with the word `spatial.'

As an example, save 30 random values from a standard normal distribution in a vector where the elements are considered ordered in time.  First create a time-series object.  Then use the \verb@lag.plot@ function to create a scatter plot of the time series against a lagged copy where the lagged copy starts one time interval earlier.
<<lag plot, echo=TRUE, eval=FALSE>>=
t0 = ts(rnorm(30))
lag.plot(t0, lags=1)
@
With $N$ values, the plot for lag one contains $N-1$ points.  The points are plotted using the text number indicating the temporal order so that the first point labeled `1' is given by the coordinates (\verb@t0[1]@, \verb@t0[2]@).  The correlation at lag one can be inferred from the scatter of points.  The plot can be repeated for any number of lags.  A separate panel is drawn for each lag, and the number of points decreases as the lag increases.
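
The correlation suggested by the lag plot can be computed directly by pairing each value with its successor.  This is a minimal sketch on a freshly simulated series; the seed and the names \verb@x@ and \verb@r1@ are illustrative.
<<lag one correlation sketch, echo=TRUE, eval=FALSE>>=
set.seed(2)            # arbitrary seed
x = rnorm(30)          # illustrative series, like t0 above
n = length(x)
# pair each value with its successor: n - 1 points
r1 = cor(x[-n], x[-1])
r1
@
The value will differ slightly from the lag-one estimate returned by \verb@acf@, which uses the mean and variance of the full series rather than those of the two shortened vectors.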

You use the autocorrelation function (\verb@acf@) to quantify the correlation at various temporal lags.  The function accepts univariate and multivariate numeric time-series objects and produces a plot of the autocorrelation values as a function of lag.  For example, to create a plot of the autocorrelation function for the NAO time series object created in the previous section, type
<<autocorrelation NAO, echo=TRUE, eval=FALSE>>=
acf(nao.ts, xlab="Lag [Years]",
  ylab="Autocorrelation")
@
\begin{figure}
\centering
<<NAOacfpcf, echo=FALSE, eval=TRUE, fig=TRUE, width=4.6, height=3.2>>=
par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2.5, .4, 0), tcl=-.3)
acf(nao.ts, ci.col="red", xlab="Lag [years]", main="",
  ylab="Autocorrelation")
mtext("a", side=3, line=1, adj=0, cex=1.1)
pacf(nao.ts, ci.col="red", xlab="Lag [years]", main="",
  ylab="Partial autocorrelation")
mtext("b", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-1cm}
\caption{Autocorrelation and partial autocorrelation functions of monthly NAO.}
\label{fig:aut}
\end{figure}
The lag values on the horizontal axis are plotted in units of time rather than numbers of observations (see Fig.~\ref{fig:aut}).  Dashed lines are the 95\% confidence limits.  Here the time-series object is created using a monthly frequency, so the lags are given in twelfths of a year, with 1.0 corresponding to one year.  By default the maximum lag is calculated as $10\times \log_{10}N$, where $N$ is the number of observations.  This can be changed using the argument \verb@lag.max@.

The lag-zero autocorrelation is fixed at 1 by convention.  The autocorrelations at non-zero lags are all less than 0.1 in absolute value, indicative of an uncorrelated process.  By default the plot includes 95\% confidence limits (\verb@ci@) computed as $\pm 1.96/\sqrt{N}$.
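
The default maximum lag and the confidence limits can be reproduced by hand.  The sketch below uses a simulated monthly series, not the NAO data, and the names \verb@w@, \verb@N@, \verb@ci@, and \verb@a@ are illustrative.
<<acf limits sketch, echo=TRUE, eval=FALSE>>=
set.seed(3)                       # arbitrary seed
w = ts(rnorm(120), frequency=12)  # 10 simulated years
N = length(w)
ci = 1.96/sqrt(N)                 # approximate 95% limits
a = acf(w, plot=FALSE)
max(a$lag) * 12                   # largest lag in months
floor(10 * log10(N))              # default maximum lag
sum(abs(a$acf[-1]) > ci)          # lags outside the limits
@
With uncorrelated values you expect roughly one in twenty of the non-zero lags to fall outside the limits by chance.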

The partial autocorrelation function \verb@pacf@ computes the autocorrelation at lag $k$ after the linear dependencies at lags 1 through $k-1$ are removed.  The partial autocorrelation is used to identify the extent of the lag in an autoregressive model.  Here the partial autocorrelation vacillates between positive and negative values indicative of a moving-average process.\footnote{A moving-average process is one in which the expectation of the current value of the series is linearly related to previous white noise errors.}
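
To see what such a signature looks like in a case where the process is known, you can simulate a first-order moving-average series.  This is a sketch under stated assumptions: the coefficient of 0.8, the series length, and the names \verb@ma1@, \verb@a@, and \verb@p@ are chosen for illustration and have nothing to do with the NAO data.
<<ma one sketch, echo=TRUE, eval=FALSE>>=
set.seed(5)                          # arbitrary seed
ma1 = arima.sim(model=list(ma=.8), n=2000)
a = acf(ma1, plot=FALSE)$acf[2:4]    # lags 1 to 3
p = pacf(ma1, plot=FALSE)$acf[1:3]   # lags 1 to 3
round(rbind(acf=a, pacf=p), 2)
@
For this simulated process the autocorrelation is appreciable only at lag one, whereas the partial autocorrelation alternates in sign and decays gradually.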

If your regression model uses time-series data it is important to examine the autocorrelation in the model residuals.  If residuals from your regression model have significant autocorrelation then the assumption of independence is violated.  This violation does not bias the coefficient estimates, but with positive autocorrelation the standard errors on the coefficients tend to be too small, giving you unwarranted confidence in your inferences.
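
A check of this kind might look as follows.  The data are simulated (a linear trend plus first-order autoregressive noise), and the coefficient values and object names are assumptions made only for this sketch.
<<residual acf sketch, echo=TRUE, eval=FALSE>>=
set.seed(4)                      # arbitrary seed
n = 200
tt = 1:n
# positively autocorrelated errors
e = as.numeric(arima.sim(model=list(ar=.7), n=n))
y = 1 + .02 * tt + e             # trend plus AR(1) noise
fit = lm(y ~ tt)
acf(residuals(fit), main="")     # inspect the residuals
@
Here the residual autocorrelation at the first few lags lies well outside the confidence limits, which signals that the standard errors reported by \verb@summary(fit)@ should not be taken at face value.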

\subsection{Dates and times}
\label{subsec:datesandtimes}

You have various options for working with date and time data in R.  The \verb@as.Date@ function gives you flexibility in handling dates through the \verb@format@ argument.  The default format is a four-digit year, a month, then a day, separated by dashes or slashes.  For example, the character string \verb@"1992-8-24"@ will be accepted as a date by typing
<<as date andrew, echo=TRUE, eval=TRUE>>=
Andrew = as.Date("1992-8-24")
@
Although the print method displays it as a character string, the object is of class {\tt Date} and is stored as the number of days since January 1, 1970, with negative numbers for earlier dates.
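
You can inspect the internal representation by converting a date to a number.  A minimal sketch, using the illustrative name \verb@d@:
<<date storage sketch, echo=TRUE, eval=FALSE>>=
d = as.Date("1992-08-24")
class(d)
as.numeric(d)                      # days since 1970-01-01
as.numeric(as.Date("1969-12-31"))  # day before the origin: -1
@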

If your input dates are not in the standard year, month, day order, a format string can be composed using the elements shown in Table~\ref{tab:formatdatecodes}.  For instance, if your date is specified as August 29, 2005 then you type
<<as date katrina, echo=TRUE, eval=TRUE>>=
Katrina = as.Date("August 29, 2005",
   format="%B %d, %Y")
@
\begin{table}
\begin{center}
\caption{\label{tab:formatdatecodes} Format codes for dates.}
\begin{tabular}{ll} \hline
Code  & Value \\ \hline
\%d   & Day of the month (decimal number) \\
\%m   & Month (decimal number) \\
\%b   & Month (abbreviated, e.g., Jan) \\
\%B   & Month (full name) \\
\%y   & Year (2 digit) \\
\%Y   & Year (4 digit) \\ \hline
\end{tabular}
\end{center}
\end{table}
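
The same codes work in both directions: \verb@as.Date@ uses them to read a string and \verb@format@ uses them to write one.  The sketch below sticks to numeric codes because month names (\verb@%b@, \verb@%B@) depend on your locale; the name \verb@d1@ is illustrative.
<<format codes sketch, echo=TRUE, eval=FALSE>>=
# day/month/two-digit year
d1 = as.Date("24/08/92", format="%d/%m/%y")
d1
format(d1, "%Y-%m-%d")
format(d1, "%d.%m.%Y")
@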

Without knowing how many leap years occurred between hurricanes Andrew and Katrina, you can find the number of days between them by typing
<<days between andrew and katrina, echo=TRUE, eval=TRUE>>=
difftime(Katrina, Andrew, units="days")
@
Or you can obtain the number of days from today since Andrew by typing
<<days since andrew, echo=TRUE, eval=FALSE>>=
difftime(Sys.Date(), Andrew, units="days")
@
The function \verb@Sys.Date@ with no arguments gives the current day in year-month-day format as a {\tt Date} object.

The portable operating system interface (POSIX) has formats for dates and times, with functionality for converting between time zones \citep{Spector2008}.  The POSIX date/time classes store times to the nearest second.  There are two such classes differing only in the way the values are stored internally.  The {\tt POSIXct} class stores date/time values as the number of seconds since January 1, 1970, while the {\tt POSIXlt} class stores them as a list.  The list contains elements for second, minute, hour, day, month, and year, among others.
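
The difference between the two classes is easy to see by example.  This is a minimal sketch with illustrative object names (\verb@ct@, \verb@lt@).
<<posix classes sketch, echo=TRUE, eval=FALSE>>=
ct = as.POSIXct("1992-08-24 09:05", tz="GMT")
as.numeric(ct)     # seconds since 1970-01-01 00:00 GMT
lt = as.POSIXlt(ct)
lt$hour            # components are list elements
lt$min
lt$year + 1900     # years are stored relative to 1900
lt$mon + 1         # months are numbered from 0
@
The {\tt POSIXct} form is convenient as a column in a data frame, while the {\tt POSIXlt} form is convenient when you need to extract individual components.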

The default input format for POSIX dates consists of the year, month, and day, separated by slashes or dashes, with time information following after white space.  The time information is in the format hour:minutes:seconds or simply hour:minutes.  For example, according to the U.S. National Hurricane Center, Hurricane Andrew hit Homestead Air Force Base at 0905 UTC on August 24, 1992.  You add time information to your Andrew date object and convert it to a {\tt POSIXct} object.
<<add time info to andrew time, echo=TRUE, eval=TRUE>>=
Andrew = as.POSIXct(paste(Andrew, "09:05"),
  tz="GMT")
@
You then retrieve your local time from your operating system and use the date-time conversion function \verb@strptime@, which reads it as a character string and converts it to an object of class {\tt POSIXlt}.
<<get now time info, echo=TRUE, eval=TRUE>>=
mytime = strptime(Sys.time(), format=
  "%Y-%m-%d %H:%M:%S", tz="EST5EDT")
@
Our time zone is U.S. Eastern time, so we use \verb@tz="EST5EDT"@.  You then find the number of hours since Andrew's landfall by typing
<<hours since andrew, echo=TRUE, eval=TRUE>>=
difftime(mytime, Andrew, units="hours")
@
Note that time zone names are not portable across operating systems, but \verb@EST5EDT@ comes pretty close.

Additional functionality for working with times is available in the {\bf chron} and {\bf lubridate} packages.  In particular, {\bf lubridate} (great package name) makes it easy to work with dates and times by providing functions to identify and parse date-time data, extract and modify components (years, months, days, hours, minutes, and seconds), perform accurate math on date-times, and handle time zones and daylight saving time \citep{GrolemundWickham2011}.

For example, to return the day of the week from your object {\tt Andrew} you use the \verb@wday@ function in the package by typing
<<lubridate, echo=TRUE, eval=TRUE>>=
require(lubridate)
wday(Andrew, label=TRUE, abbr=FALSE)
@
If you lived in south Florida, what a \Sexpr{as.character(wday(Andrew, label=TRUE,abbr=FALSE))} it was.  Other useful functions in the package, applied to the Andrew time object, return the year, whether it was a leap year, the week of the year, and the local time.  Finally, what is the current time in Chicago?
<<lubridate examples, echo=TRUE, eval=FALSE>>=
year(Andrew)
leap_year(Andrew)
week(Andrew)
with_tz(Andrew, tz="America/New_York")
now(tz="America/Chicago")
@

\section{Maps}
\label{sec:maps}

A great pleasure in working with graphs is the chance to visualize patterns.  Maps are among the most compelling graphics as the space they map is the space in which hurricanes occur.  We can use them to find interesting patterns that are otherwise hidden.  Various packages are available for creating maps.  Here we look at some examples.

\subsection{Boundaries}
\label{subsec:boundaries}

Sometimes all that is needed is a reference map to show your study location.  This can be created using state and country boundaries.  For example, the {\bf maps} package is used to draw country and state borders.  To draw a map of the United States with state boundaries, type
<<map example, echo=TRUE, eval=FALSE>>=
require(maps)
map("state")
@
The call to \verb@map@ creates the country outline and adds the state boundaries.  The map is shown in Fig.~\ref{fig:usamap}.  The package contains outlines for countries around the world (e.g., type \verb@map()@).
\begin{figure}
\centering
<<mapexample, fig=TRUE, echo=FALSE, eval=TRUE, width=3.5, height=3.5>>=
par(las=1)
require(maps)
map("state", interior=FALSE)
map("state", boundary=FALSE, col="gray", add=TRUE)
@
\vspace{-2cm}
\caption{Map with state boundaries.}
\label{fig:usamap}
\end{figure}

The coordinate system is latitude and longitude, so you can overlay other spatial data.  As an example, first input the track of Hurricane Ivan (2004) as it approached the U.S. Gulf coast.  Then list the first six rows of data.
<<read ivan data, echo=TRUE, eval=TRUE>>=
Ivan = read.table("Ivan.txt", header=TRUE)
head(Ivan)
@
Among other attributes, the data frame \verb@Ivan@ contains the latitude and longitude position of the hurricane every hour from 24 hours before landfall until 12 hours after landfall.

Here your geographic domain is the southeast, so first create a character vector of state names.
<<states character string, echo=TRUE, eval=TRUE>>=
cs = c('texas', 'louisiana', 'mississippi',
  'alabama', 'florida', 'georgia', 'south carolina')
@
Next use the \verb@map@ function with this vector to plot the state boundaries and fill the state polygons with a gray shade.  Finally connect the hourly location points with the \verb@lines@ function and add an arrowhead between the last two locations.
<<ivan track map, fig=FALSE, echo=TRUE, eval=FALSE>>=
map("state", region=cs, boundary=FALSE, col="gray",
  fill=TRUE)
Lo = Ivan$Lon
La = Ivan$Lat
n = length(Lo)
lines(Lo, La, lwd=2.5, col="red")
arrows(Lo[n - 1], La[n - 1], Lo[n], La[n], lwd=2.5,
   length=.1, col="red")
@
\begin{figure}
\centering
<<ivantrackmap, fig=TRUE, echo=FALSE, eval=TRUE, width=3.5, height=3.5>>=
par(las=1)
map("state", region=cs, boundary=FALSE, col="gray", fill=TRUE)
Lo = Ivan$Lon
La = Ivan$Lat
n = length(Lo)
lines(Lo, La, lwd=2.5, col="red")
arrows(Lo[n - 1], La[n - 1], Lo[n], La[n], lwd=2.5, length=0.1, col="red")
@
\vspace{-2cm}
\caption{Track of Hurricane Ivan (2004) before and after landfall.}
\label{fig:Ivantrack}
\end{figure}
The result is shown in Fig.~\ref{fig:Ivantrack}.  Hurricane Ivan moves northward from the central Gulf of Mexico and makes landfall in the western panhandle region of Florida before moving into southeastern Alabama.

The scale of the map is defined as the ratio of the map distance in a particular unit (e.g., centimeters) to the actual distance in the same unit.  Small scale describes maps of large regions where this ratio is small and large scale describes maps of small regions where the ratio is larger.  The boundary data in the {\bf maps} package are adequate for small-scale maps, but the number of boundary points is too small for close-up (high-resolution) views.  Higher-resolution boundary data are available in the {\bf mapdata} package.

\subsection{Data types}
\label{subsec:datatypes}

The type of map you make depends on the type of spatial data.  Broadly speaking there are three types of spatial data: point, areal, and field data.  Point data are event locations.  Any location in a continuous spatial domain may have an event.  The events may carry additional information, called `marks.'  Interest centers on the distribution of events and on whether there are clusters of events.  The set of all locations where hurricanes first reached their maximum intensity is an example of point data.  The events are the locations of the hurricanes at maximum intensity and a mark could be the corresponding wind speed.

Areal data are aggregated or grouped values within fixed polygon areas.  The set of areas forms a lattice so the data are sometimes called `lattice data.'  Interest typically centers on how the values change across the domain and how much correlation exists within neighborhoods defined by polygon contiguity or distance from polygon centroids.  County-wide population is an example of areal data.  The values may be the number of people living in the county or a population density indicating the average number of people per area.

Field data are measurements or observations of some spatially continuous variable, like pressure or temperature.  The values are given at certain locations and the interest centers on using these values to create a continuous surface from which values can be inferred at any location.  Sea-surface temperature is an example of field data.  The values may be at randomly located sites or they may be on a fixed grid.

\subsubsection{Point data}
\label{subsubsec:pointdata}

Consider the set of events defined by the location at which a hurricane first reaches its lifetime maximum intensity.  The data are available in the file {\it LMI.txt} and are input by typing
<<input lifetime maximum intensity data, echo=TRUE, eval=TRUE>>=
LMI.df = read.table("LMI.txt", header=TRUE)
LMI.df$WmaxS = LMI.df$WmaxS * .5144
head(LMI.df[, c(4:10, 11)])
@
The \verb@Wmax@ column (not shown) is a spline interpolated maximum wind speed and \verb@WmaxS@ is first smoothed then spline interpolated to allow time derivatives to be computed.  Chapter~\ref{chap:datasets} provides more details and explains how this data set is constructed.

The raw wind speed values are given in 5~kt increments.  Although knots (kt) are the operational unit used for reporting tropical cyclone intensity to the public in the United States, here you use the SI units of m~s$^{-1}$.  We use the term `intensity' as shorthand for `maximum wind speed,' where maximum wind speed refers to the estimated fastest wind velocity somewhere in the core of the hurricane.  Lifetime maximum refers to the highest maximum wind speed during the life of the hurricane.

You draw a map of the event locations with the \verb@plot@ method using the longitude coordinate as the $x$ variable and the latitude coordinate as the $y$ variable by typing
<<locationmap, echo=TRUE, eval=FALSE>>=
with(LMI.df, plot(lon, lat, pch=19))
map("world", col="gray", add=TRUE)
grid()
@
Adding country borders and latitude/longitude grid lines (\verb@grid@ function) enhances the geographic information.  The argument \verb@pch@ specifies a point character using an integer code.  Here 19 refers to a solid circle (type \verb@?points@ for more information).  The \verb@with@ function allows you to use the column names from the data frame in the \verb@plot@ method.

Note the order of function calls.  By plotting the events first, then adding the country borders, the borders are clipped to the plot window.  By default the plot window is slightly larger than the range of the longitude and latitude coordinates.  The function places a reasonable number of axis ticks along the range of coordinate values at reasonable intervals.

Since the events are marked by storm intensity it is informative to add this information to the map.  Hurricane intensity, as indexed by an estimate of the wind speed maximum at 10~m height somewhere inside the hurricane, is a continuous variable.  You can choose a set of discrete intensity intervals and group the events by these class intervals.  For example, you might want to choose the Saffir/Simpson hurricane intensity scale.

To efficiently communicate differences in intensities with colors, you should limit the number of classes to six or fewer.  The package {\bf classInt} is a collection of functions for choosing class intervals.  Here you require the package and create a vector of lifetime maxima.  You then obtain class boundaries using the \verb@classIntervals@ function.  Here the number of class intervals is set to five and the method of determining the interval breaks is based on Jenks optimization (\verb@style="jenks"@).  Given the number of classes, the optimization minimizes the variance of the values within the intervals while maximizing the variance between the intervals.
<<class intervals, echo=TRUE, eval=TRUE>>=
require(classInt)
lmi = LMI.df$WmaxS
q5 = classIntervals(lmi, n=5, style="jenks",
   dataPrecision=1)
@
The \verb@dataPrecision@ argument sets the number of digits to the right of the decimal place.
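
The quantity being minimized can be illustrated with a few lines of base R.  This is only a sketch of the criterion, not the Jenks algorithm and not the code used by {\bf classInt}; the function \verb@wss@ and the values in \verb@v@ are made up for the illustration.
<<within class variance sketch, echo=TRUE, eval=FALSE>>=
# total within-class sum of squares for a set of breaks
wss = function(x, brks) {
  g = cut(x, brks, include.lowest=TRUE)
  ss = tapply(x, g, function(u) sum((u - mean(u))^2))
  sum(ss, na.rm=TRUE)        # empty classes contribute 0
}
v = c(1, 2, 3, 10, 11, 12, 30, 31, 32)  # made-up values
wss(v, c(1, 11.5, 22, 32))   # equal-width breaks
wss(v, c(1, 6, 20, 32))      # breaks placed in the gaps
@
Breaks placed in the natural gaps of the data give a much smaller within-class sum of squares than equal-width breaks, which is the behavior the optimization seeks.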

Next you need to choose a palette of colors.  This is best left to someone with an understanding of hues and color separation schemes.  The palettes described and printed in \cite{BrewerEtAl2003} for continuous, diverging, and categorical variables can be examined on maps at \url{http://colorbrewer2.org/}.  Select the HEX radio button for a color palette of your choice and then copy and paste the hex code into a character vector preceded by the pound symbol.

For example, here you create a character vector (\verb@cls@) of length 5 containing the hex codes from the color brewer website for a sequential color ramp ranging from yellow through orange to red.
<<hex colors, echo=TRUE, eval=TRUE>>=
cls = c("#FFFFB2", "#FECC5C", "#FD8D3C", "#F03B20",
  "#BD0026")
@
To use your own set of colors simply modify this vector.  A character vector of color hex codes can also be generated automatically with functions in the {\bf colorRamps} package (see Chapter~\ref{chap:spatialmodels}).

The empirical cumulative distribution function of cyclone intensities with the corresponding class intervals and colors is then plotted by typing
<<plot cumulative frequency, fig=FALSE, echo=TRUE, eval=FALSE>>=
plot(q5, pal=cls, main="", xlab=
  expression(paste("Wind Speed [m ", s^-1,"]")),
  ylab="Cumulative Frequency")
@
The graph is shown in Fig.~\ref{fig:lmi:cdf}.  The points (with horizontal dashes) are the lifetime maximum intensity wind speeds in rank order from lowest to highest.  You can see that half of all hurricanes have lifetime maximum intensities greater than \Sexpr{round(quantile(lmi,.5),0)}~m~s$^{-1}$.
\begin{figure}
\centering
<<cumulativefreqplot, fig=TRUE, echo=FALSE, eval=TRUE, width=3.2, height=3.2>>=
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(q5, pal=cls, main="", xlab="Lifetime maximum wind speed [m s$^{-1}$]",
   ylab="Cumulative frequency")
@
\vspace{-.5cm}
\caption{Cumulative distribution of lifetime maximum intensity.  Vertical lines and corresponding color bar mark the class intervals with the number of classes set at five.}
\label{fig:lmi:cdf}
\end{figure}

Once you are satisfied with the class intervals and color palette, you can plot the events on a map.  First you need to assign a color for each event depending on its wind speed value.  This is done with the \verb@findColours@ function as
<<find colors, echo=TRUE, eval=TRUE>>=
q5c = findColours(q5, cls)
@
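
The idea behind this assignment can be sketched in base R with \verb@findInterval@.  The breaks, palette, and values below are made up, and the treatment of values that fall exactly on a break may differ from that of \verb@findColours@.
<<assign colors sketch, echo=TRUE, eval=FALSE>>=
brks = c(0, 10, 20, 30)             # made-up class boundaries
pal = c("yellow", "orange", "red")  # one color per class
val = c(3, 12, 29, 30, 18)          # made-up values
# index of the class interval containing each value
idx = findInterval(val, brks, rightmost.closed=TRUE)
pal[idx]
@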
Now, instead of black dots with a color bar, each value is assigned a color corresponding to the class interval. For convenience you create the axis labels and save them as an expression object.  You do this with the \verb@expression@ and \verb@paste@ functions to get the degree symbol.
<<axis labels, echo=TRUE, eval=TRUE>>=
xl = expression(paste("Longitude [",{}^o,"E]"))
yl = expression(paste("Latitude [",{}^o,"N]"))
@
Since the degree symbol is not attached to a character you use \verb@{}@ in front of the superscript symbol.  You again use the \verb@plot@ method on the location coordinates, but this time set the color argument to the corresponding vector of colors saved in \verb@q5c@.
<<location plot, fig=FALSE, echo=TRUE, eval=FALSE>>=
plot(LMI.df$lon, LMI.df$lat, xlab=xl, ylab=yl,
   col=q5c, pch=19)
points(LMI.df$lon, LMI.df$lat)
@
To improve the map, you add country boundaries, place axis labels in the top and right margins, and add a coordinate grid.
<<add country boundaries, fig=FALSE, echo=TRUE, eval=FALSE>>=
map("world", add=TRUE)
axis(3)
axis(4)
grid()
@
To complete the map you add a legend by typing
<<add legend, fig=FALSE, echo=TRUE, eval=FALSE>>=
legend("bottomright", bg="white",
   fill=attr(q5c, "palette"),
   legend=names(attr(q5c, "table")),
   title=expression(paste("Wind Speed [m "
   , s^-1, "]")))
@
Note that fill colors and names for the legend are obtained using the \verb@attr@ function on the \verb@q5c@ object.  The function retrieves the palette and table attributes of the object.  The result is shown in Fig.~\ref{fig:lmi:map}.  Colors indicate the wind speed in five classes as described in Fig.~\ref{fig:lmi:cdf}.
\begin{figure}
\centering
<<lifetimemaxmap, fig=TRUE, echo=FALSE, eval=TRUE, width=4.6, height=3.2>>=
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
plot(LMI.df$lon, LMI.df$lat, xlab="Longitude [$^\\circ$E]",
  ylab="Latitude [$^\\circ$N]", type="n")
map("world", add=TRUE, col="gray", fill=TRUE)
grid()
points(LMI.df$lon, LMI.df$lat, col=q5c, pch=19)
points(LMI.df$lon, LMI.df$lat)
axis(3)
axis(4)
legend("bottomright", bg="white", fill=attr(q5c, "palette"), cex=.7,
  legend=names(attr(q5c, "table")),
  title="Wind speed [m s$^{-1}$]")
@
\vspace{-.5cm}
\caption{Location of lifetime maximum wind speed.}
\label{fig:lmi:map}
\end{figure}

The spatial distribution of lifetime maxima is fairly uniform over the ocean for locations west of the $-$40$^\circ$E longitude.  Fewer events are noted over the eastern Caribbean Sea and southwestern Gulf of Mexico.  Events over the western Caribbean tend to have the highest intensities.  Also, as you might expect, there is a tendency for a hurricane that reaches its lifetime maximum at lower latitudes to have a higher intensity.

\subsubsection{Areal data}
\label{subsubsec:arealdata}

A shapefile stores geometry and attribute information for spatial data.  The geometry for a feature is stored as a shape consisting of a set of vector coordinates.  Shapefiles support point, line, and area data.  Area data are represented as closed loop polygons.  Each attribute record has a one-to-one relationship with the associated shape record.  For example, a shapefile might consist of the set of polygons for the counties in Florida and an attribute might be population.  Each county population record (attribute) then has an associated shape record.

The shapefile is actually a set of several files in a single directory.  The three individual files with extensions \verb@*.shp@ (file of geometries), \verb@*.shx@ (index file to the geometries), and \verb@*.dbf@ (file for storing attribute data) form the core of the directory.  Note there is no standard for specifying missing attribute values.  The \verb@*.prj@ file, if present, contains the coordinate reference system (CRS; see \S\ref{sec:crs}).

Data in shapefile format are easy to map.  As an example, consider the U.S.~Census Bureau boundary file for the state of Florida, available at \url{http://www.census.gov/cgi-bin/geo/shapefiles/national-files}.   Browse to Current State and Equivalent, Select State, then Florida.  Download the zipped file.  Unzip it to your R working directory folder.  To make things a bit easier for typing, rename the directory and the shapefiles to \verb@FL@.
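
Before reading the shapefile it is worth checking that the core files are where you expect them.  A minimal sketch, assuming you renamed the directory and files to \verb@FL@ as described above; the name \verb@core@ is chosen here for illustration.
<<check shapefile sketch, echo=TRUE, eval=FALSE>>=
core = paste("FL/FL", c("shp", "shx", "dbf"), sep=".")
core
file.exists(core)          # all three should be TRUE
file.exists("FL/FL.prj")   # optional CRS file
@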

The \verb@readShapeSpatial@ function from the {\bf maptools} package reads in the polygon shapefile consisting of the boundaries of the 67 Florida counties.
<<read shapefile, echo=TRUE, eval=TRUE>>=
require(maptools)
FLpoly = readShapeSpatial("FL/FL")
class(FLpoly)
@
Note the shapefiles are in directory \verb@FL@ with file names the same as the directory name.  The object \verb@FLpoly@ is of class {\tt SpatialPolygonsDataFrame}.  It extends the class {\tt data.frame} by adding geographic information (see \cite{BivandEtAl2008}).

You can use the \verb@plot@ method to produce a map of the polygon borders.  Of greater interest is a map displaying an attribute of the polygons.  For instance, demographic data at the county level are important for emergency managers.  First read in a table of the percentage change in population over the ten year period 2000 to 2010.
<<read florida population data, echo=TRUE, eval=TRUE>>=
FLPop = read.table("FLPop.txt", header=TRUE)
names(FLPop)
@
Here the table rows are arranged in the order of the polygons.  You assign the column {\tt Change} to the data slot of the spatial data frame by typing
<<add column to data slot, echo=TRUE, eval=TRUE>>=
FLpoly$Change = FLPop$Change
@
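
The assignment relies on the table rows being in polygon order.  If that is not the case for your own data, you can match the rows by name first.  The sketch below uses made-up names and values and hypothetical column names; it is not based on the contents of {\it FLPop.txt}.
<<match rows sketch, echo=TRUE, eval=FALSE>>=
# made-up polygon names and a table in a different order
polyNames = c("Alpha", "Beta", "Gamma")
tab = data.frame(Name=c("Gamma", "Alpha", "Beta"),
  Change=c(10.8, 13.9, 15.0))
idx = match(polyNames, tab$Name)  # table row per polygon
stopifnot(!any(is.na(idx)))       # every polygon needs a match
tab$Change[idx]                   # values in polygon order
@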
Then use the function \verb@spplot@ to create a choropleth map of the attribute {\tt Change}.
<<spplot function, fig=FALSE, echo=TRUE, eval=FALSE>>=
spplot(FLpoly, "Change")
@
Results are shown in Fig.~\ref{fig:FLpopchange}.  The map shows that with the exception of Monroe and Pinellas counties population throughout the state increased over this period.  Largest population increases are noted over portions of north Florida.
\begin{figure}
\centering
<<flpopulationchange, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=3>>=
al = colorRampPalette(c("yellow", "green", "blue"), space="Lab")
mar = .3
print(spplot(FLpoly, "Change", col.regions=al(6), at=seq(-20, 100, 20),
  colorkey=list(space="bottom", cex=.7, labels=paste(seq(-20, 100, 20))),
  sub=list("Population [\\% change]", cex=.8, font=1),
  xlim=c(FLpoly@bbox[1] - mar,
  FLpoly@bbox[3] + mar), ylim=c(FLpoly@bbox[2] - mar, FLpoly@bbox[4] + mar)))
@
\vspace{0cm}
\caption{Population change in Florida counties.}
\label{fig:FLpopchange}
\end{figure}

The \verb@spplot@ method is available in the {\bf sp} package.  It is an example of a lattice plot method \citep{Sarkar2008} for spatial data with attributes.  The function returns a plot of class {\tt trellis}.  If the function does not automatically bring up your graphics device you need to wrap it in the \verb@print@ function.  Missing values in the attributes are not allowed.

\subsubsection{Field data}
\label{subsubsec:fielddata}

Climate data are often presented as a grid of values.  For example, NOAA-CIRES 20th Century Reanalysis version 2 provides monthly sea-surface temperature values at latitude-longitude intersections.  A portion of these data are available in the file {\it JulySST2005.txt}.  The data are the SST values on a 2$^\circ$~latitude-longitude grid for the month of July 2005.  The grid is bounded by $-$100 and 10$^\circ$E longitudes and the equator and 70$^\circ$N latitude.

First input the data and convert the column of SST values to a matrix using the \verb@matrix@ function, specifying the number of columns as the number of latitudes.  The number of rows, here the number of longitudes, is inferred from the length of the vector.  By default the matrix is filled by columns.  Next create two structured vectors, one of the meridians and the other of the parallels, using the \verb@seq@ function.  Specify the geographic limits and an interval of 2$^\circ$ in both directions.
<<read sst data, echo=TRUE, eval=TRUE>>=
sst.df = read.table("JulySST2005.txt", header=TRUE)
sst = matrix(sst.df$SST, ncol=36)
lo = seq(-100, 10, 2)
la = seq(0, 70, 2)
@
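
The column-wise fill and the grid dimensions can be checked on a toy example.  This is a minimal sketch with an illustrative matrix \verb@m@; it does not read the SST file.
<<matrix fill sketch, echo=TRUE, eval=FALSE>>=
m = matrix(1:6, ncol=3)    # 2 rows inferred, filled by columns
m
m[2, 1]                    # second value goes down column one
length(seq(-100, 10, 2))   # number of meridians (rows)
length(seq(0, 70, 2))      # number of parallels (columns)
@
The two lengths are the row and column dimensions that the SST matrix needs in order to line up with the coordinate vectors.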

To create a map of the SST field first choose a set of colors.  Since the values represent temperature you want the colors to go from blue (cool) to red (warm).  R provides a number of color palettes including \verb@rainbow@, \verb@heat.colors@, \verb@cm.colors@, \verb@topo.colors@, \verb@grey.colors@, and \verb@terrain.colors@.  The palettes are functions that generate a sequence of color codes interpolated between two or more colors.  The \verb@cm.colors@ is the default palette in \verb@spplot@ and the colors diverge from white to cyan and magenta.

More color options are available from the website given in \S\ref{subsubsec:pointdata}.  The package {\bf RColorBrewer} provides the palettes described in \cite{BrewerEtAl2003}.  Palettes are available for continuous, diverging, and categorical variables and for choices of print and screen projection.  The {\bf sp} package has the \verb@bpy.colors@ function that produces a range of colors from blue to yellow that work for color and black-and-white print.  You can create your own function using the \verb@colorRampPalette@ function and specifying the colors you want.  Here you save the function as \verb@bwr@ and use a set of three colors.  The number of colors to interpolate is the argument to the \verb@bwr@ function.
<<color ramp palette, echo=TRUE, eval=TRUE>>=
bwr = colorRampPalette(c("blue", "white", "red"))
@
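
A function created this way returns a character vector of hex codes.  A minimal sketch, using the illustrative name \verb@pal3@ for a palette function built from the same three colors:
<<palette function sketch, echo=TRUE, eval=FALSE>>=
pal3 = colorRampPalette(c("blue", "white", "red"))
pal3(3)   # the three colors themselves, as hex codes
pal3(5)   # two intermediate shades are interpolated
@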

The function \verb@image@ creates a grid of rectangles with colors corresponding to the values in the third argument as specified by the palette and the number of colors set here at 20.  The first two arguments correspond to the two dimensional location of the rectangles.  The $x$ and $y$ labels use the \verb@expression@ and \verb@paste@ functions to get the degree symbol.  To complete the graph you add country boundaries and place axis labels in the top and right margins (margins 3 and 4).
<<image function, fig=FALSE, echo=TRUE, eval=FALSE>>=
image(lo, la, sst, col=bwr(20), xlab=xl, ylab=yl)
map("world", add=TRUE)
axis(3)
axis(4)
@
Note that \verb@image@ interprets the matrix of SST values as a table with the $x$-axis corresponding to the row number and the $y$-axis to the column number, with column one at the bottom.  This is a 90$^\circ$ counter-clockwise rotation of the conventional matrix layout.
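
The orientation is easiest to see with a small matrix whose cells are labeled with their values.  This is a sketch with made-up objects \verb@x@, \verb@y@, and \verb@z@.
<<image orientation sketch, echo=TRUE, eval=FALSE>>=
z = matrix(1:6, nrow=3)  # 3 rows (x) by 2 columns (y)
x = 1:3
y = 1:2
image(x, y, z, col=gray.colors(6))
# label each cell; z[1, 1] appears at the lower left
text(x[row(z)], y[col(z)], labels=z)
@
The values 1 to 3 from the first column of \verb@z@ run left to right along the bottom of the plot, and the second column sits above them.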

You overlay a contour plot of the SST data using the \verb@contour@ function.  First determine the range of the SST values and round to the nearest integer.  Note there are missing values (over land) so you need to use the \verb@na.rm@ argument in the \verb@range@ function.
<<sst range, echo=TRUE, eval=TRUE>>=
r = round(range(sst, na.rm=TRUE))
@
Next create a sequence of temperature values at equal intervals over this range.  Contours are drawn at these values.
<<temperature levels, echo=TRUE, eval=TRUE>>=
levs = seq(r[1], r[2], 2)
levs
@
Then paste the character string `C' onto the interval labels.  The resulting character vector will be used as contour labels.
<<contour plot, echo=TRUE, eval=FALSE>>=
cl = paste(levs, "C")
contour(lo, la, sst, levels=levs, labels=cl,
  add=TRUE)
@

The result is shown in Fig.~\ref{fig:SSTJuly2005}.  Ocean temperatures above about 28$^\circ$C are warm enough to support the development of hurricanes.  This covers a large area from the west coast of Africa westward through the Caribbean and Gulf of Mexico and northward toward Bermuda.
\begin{figure}
\centering
<<contourplot, fig=TRUE, echo=FALSE, eval=TRUE, width=4.5, height=3>>=
par(las=1, mgp=c(2, .4, 0), tcl=-.3)
cl = paste(levs, "C")
xl = "Longitude [$^\\circ$E]"
yl = "Latitude [$^\\circ$N]"
image(lo, la, sst, col=bwr(20), xlab=xl, ylab=yl)
axis(3)
axis(4)
map("world", add=TRUE)
contour(lo, la, sst, levels=levs, labels=cl, add=TRUE)
@
\vspace{-.5cm}
\caption{Sea surface temperature field from July 2005.}
\label{fig:SSTJuly2005}
\end{figure}

\section{Coordinate Reference Systems}
\label{sec:crs}

For data covering a large geographic area you need a map with a projected coordinate reference system (CRS).  A geographic CRS includes a model for the shape of the Earth (oblate spheroid) plus latitudes and longitudes.  Longitudes and latitudes can be used to create a two-dimensional coordinate system for plotting hurricane data but this framework is for a sphere rather than a flat map.

A projected CRS is a two-dimensional approximation of the Earth as a flat surface.  It includes a model for the Earth's shape plus a specific geometric model for projecting coordinates to the plane.  The PROJ.4 Cartographic Projections library uses a \verb@tag=value@ representation of a CRS, with the tag and value pairs collected in a single character string.  The Geospatial Data Abstraction Library (GDAL) contains code for translating between different CRSs.  Both the PROJ.4 and GDAL libraries are available in the {\bf rgdal} package \citep{KeittEtAl2012}.

Here you specify a geographic CRS and save it in a CRS object called \verb@ll_crs@ (lat-lon coordinate reference system).  At the time of writing, there are no Mac OS X binaries for the {\bf rgdal} package.  Appendix~\ref{app:installfromsource} gives you steps for installing the package from the source code and for getting a binary from the CRAN extras repository.
<<require rgdal and mapdata, echo=TRUE, eval=TRUE>>=
require(rgdal)
require(mapdata)
ll_crs = CRS("+proj=longlat +ellps=WGS84")
@
The only values used autonomously in CRS objects are whether the string is a character \verb@NA@ (missing) for an unknown CRS, and whether it contains the string \verb@longlat@, in which case the CRS is geographic coordinates \citep{BivandEtAl2008}.

There are a number of different tags, each beginning with `+' and separated from its value with `='.  White space separates the tag/value pairs.  Here you specify the Earth's shape using the World Geodetic System (WGS) 1984, which is the reference coordinate system used by the Global Positioning System.  Its origin is the Earth's center of mass.

As an example, you create a \verb@SpatialPoints@ object called \verb@LMI_ll@ by combining the matrix of event coordinates (location of lifetime maximum intensity) in native longitude and latitude degrees with the CRS object defined above.
<<create spatialpoints object, echo=TRUE, eval=TRUE>>=
LMI_mat = cbind(LMI.df$lon, LMI.df$lat)
LMI_ll = SpatialPoints(LMI_mat,
   proj4string=ll_crs)
summary(LMI_ll)
@
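
To check which kind of CRS a spatial object carries, you can query the object directly.  The following is a small sketch using two made-up points (the object names \verb@demo_mat@ and \verb@demo_ll@ are hypothetical); the same calls should work on \verb@LMI_ll@.
<<check crs sketch, echo=TRUE, eval=FALSE>>=
require(sp)
# Two made-up (lon, lat) points, for illustration only
demo_mat = cbind(c(-80, -60), c(25, 30))
demo_ll = SpatialPoints(demo_mat,
   proj4string=CRS("+proj=longlat +ellps=WGS84"))
proj4string(demo_ll)   # the tag=value string
is.projected(demo_ll)  # FALSE for a geographic CRS
bbox(demo_ll)          # range of the coordinates
@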

Here you are interested in transforming the geographic CRS into a Lambert conformal conic (LCC) planar projection.  The projection superimposes a cone over the sphere of the Earth, with the cone cutting through the globe along two reference (secant) parallels.  The LCC projection is used for aeronautical charts.  In particular it is used by the U.S.~National Hurricane Center (NHC) in their seasonal summary maps.  Other projections, ellipsoids, and datums are available.  A list of the available projection tags can be generated by typing
<<other projections, echo=TRUE, eval=FALSE>>=
projInfo(type = "proj")
@

Besides the projection tag (lcc) you need to specify the two secant parallels and a central meridian.  The NHC summary maps use the parallels 30 and 60$^\circ$N and a meridian of 60$^\circ$W.  First save the projection as a CRS object, then use the \verb@spTransform@ function to transform the latitude-longitude coordinates to coordinates of an LCC planar projection.
<<specify crs, echo=TRUE, eval=TRUE>>=
lcc_crs = CRS("+proj=lcc +lat_1=60 +lat_2=30
   +lon_0=-60")
LMI_lcc = spTransform(LMI_ll, lcc_crs)
@

This transforms the original set of longitude/latitude event coordinates to a set of projected event coordinates.  But you need to repeat this transformation for each of the map components.  For instance to transform the country borders, first you save them from a call to the \verb@map@ function.  The function includes arguments to specify a longitude/latitude bounding box.  Second, you convert the returned map object to a spatial lines object with the \verb@map2SpatialLines@ function using a geographic CRS.  Finally, you transform the coordinates of the spatial lines object to the LCC coordinates.
<<transform map object to spatiallines, echo=TRUE, eval=TRUE>>=
brd = map('world', xlim=c(-100, 0), ylim=c(5, 50),
   interior=FALSE, plot=FALSE)
brd_ll = map2SpatialLines(brd, proj4string=ll_crs)
brd_lcc = spTransform(brd_ll, lcc_crs)
@
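
A quick way to check a transformation of this kind is to invert it and compare with the starting coordinates.  The sketch below assumes three made-up points and repeats the two CRS definitions used in this section; the object names \verb@p_ll@, \verb@p_lcc@, and \verb@p_back@ are hypothetical.
<<round trip sketch, echo=TRUE, eval=FALSE>>=
require(rgdal)
# Made-up (lon, lat) points, for illustration only
p_ll = SpatialPoints(cbind(c(-90, -60, -30),
   c(15, 30, 45)),
   proj4string=CRS("+proj=longlat +ellps=WGS84"))
# Forward: geographic to LCC
p_lcc = spTransform(p_ll, CRS(paste("+proj=lcc",
   "+lat_1=60 +lat_2=30 +lon_0=-60")))
# Inverse: LCC back to geographic
p_back = spTransform(p_lcc, CRS(proj4string(p_ll)))
# Largest coordinate difference after the round trip
max(abs(coordinates(p_back) - coordinates(p_ll)))
@
The difference should be negligible relative to the precision of the event coordinates.  A large value would suggest that one of the CRS strings is mistyped.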

To include longitude/latitude grid lines you need to use the \verb@gridlines@ function on the longitude/latitude borders and then transform them to LCC coordinates.  Similarly to include grid labels you need to convert the locations in longitude/latitude space to LCC space.
<<grid lines, echo=TRUE, eval=TRUE>>=
grd_ll = gridlines(brd_ll)
grd_lcc = spTransform(grd_ll, lcc_crs)
at_ll = gridat(brd_ll)
at_lcc = spTransform(at_ll, lcc_crs)
@
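
By default the grid positions are chosen from the bounding box of the object.  If you want a different spacing, both functions accept \verb@easts@ and \verb@norths@ arguments.  The following sketch assumes a 10$^\circ$ spacing in both directions and uses hypothetical object names ending in \verb@2@ so that the objects created above are left unchanged.
<<grid spacing sketch, echo=TRUE, eval=FALSE>>=
# Assumed spacing: every 10 degrees of longitude
# and latitude within the map's bounding box
ea = seq(-100, 0, by=10)
no = seq(10, 50, by=10)
grd2_ll = gridlines(brd_ll, easts=ea, norths=no)
grd2_lcc = spTransform(grd2_ll, lcc_crs)
at2_ll = gridat(brd_ll, easts=ea, norths=no)
at2_lcc = spTransform(at2_ll, lcc_crs)
@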

Finally, to plot the events on a projected map, first plot the grid, then add the country borders and event locations.  Use the \verb@text@ function to add the grid labels.  If you want a frame around the plot, follow with a call to \verb@box()@.
<<plot events, fig=FALSE, echo=TRUE, eval=FALSE>>=
plot(grd_lcc, col="grey60", lty="dotted")
plot(brd_lcc, col="grey60", add=TRUE)
plot(LMI_lcc, pch=19, add=TRUE, cex=.7)
text(coordinates(at_lcc), pos=at_lcc$pos, 
   offset=at_lcc$offset-.3, labels=
   parse(text=as.character(at_lcc$labels)),
   cex=.6)
@
The result is shown in Fig.~\ref{fig:lmi:projmap}.  Conformal maps preserve angles and shapes of small figures, but not size.  The size distortion is zero at the two reference latitudes.  This is useful for hurricane tracking maps.
\begin{figure}
\centering
<<lifetimemaxprojectedmap, fig=TRUE, echo=FALSE, eval=TRUE, width=4.7, height=3.6>>=
par(las=1)
labs = c("100$^\\circ$W", "80$^\\circ$W", "60$^\\circ$W", "40$^\\circ$W",
         "20$^\\circ$W", "0$^\\circ$", "10$^\\circ$N", "20$^\\circ$N",
         "30$^\\circ$N", "40$^\\circ$N", "50$^\\circ$N")
plot(grd_lcc, col="grey60", lty="dotted")
plot(brd_lcc, col="grey60", add=TRUE)
plot(LMI_lcc, pch=20, add=TRUE)
text(coordinates(at_lcc), pos=at_lcc$pos, offset=at_lcc$offset-.3,
  labels=labs, cex=.6)
#box()
@
\vspace{-2cm}
\caption{Lifetime maximum intensity events on a Lambert conic conformal map.}
\label{fig:lmi:projmap}
\end{figure}

The \verb@spplot@ method for points, lines, and polygons has advantages over successive calls to \verb@plot@.  You will make use of this in Chapter~\ref{chap:spatialmodels}.

\section{Export}
\label{sec:export}

The {\bf rgdal} package contains drivers (software component plugged in on demand) for reading and writing spatial vector data using the OGR\footnote{Historically, OGR was an abbreviation for `OpenGIS Simple Features Reference Implementation.'  However, since OGR is not fully compliant with the OpenGIS Simple Feature specification and is not approved as a reference implementation, the name was changed to `OGR Simple Features Library.' OGR is the prefix used everywhere in the library source for class names, filenames, etc.} Simple Features Library modeled on the OpenGIS simple features data model supported by the Open Geospatial Consortium, Inc.\textregistered.  If the data have a CRS it will be read or written.  The availability of OGR drivers depends on your computing platform.  To get a list of the drivers available on your machine, type \verb@ogrDrivers()@.
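
Before exporting, it can help to confirm that the drivers you need are present and support writing.  The sketch below assumes the data frame returned by \verb@ogrDrivers@ has columns \verb@name@ and \verb@write@; the object name \verb@drv_tab@ is hypothetical.
<<ogr drivers sketch, echo=TRUE, eval=FALSE>>=
require(rgdal)
drv_tab = ogrDrivers()
# Are the two drivers used in this section present?
c("KML", "ESRI Shapefile") %in% drv_tab$name
# Names of the drivers that support writing
as.character(drv_tab$name[drv_tab$write])
@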

Here you consider two examples.  First export the lifetime maximum intensity events as a Keyhole Markup Language (KML) file for overlay using Google Earth\texttrademark, and then export the events as an ESRI\texttrademark{} shapefile suitable for input into ArcMap\textregistered, the main component of ESRI\texttrademark's Geographic Information System (GIS).

First create a spatial points data frame from the spatial points object.  This is done using the \verb@SpatialPointsDataFrame@ function.  The first argument is the coordinates of the spatial points object.   The underlying CRS for Google Earth\texttrademark{} is geographical in the WGS84 datum, so you use the \verb@LMI_ll@ object defined above and specify the argument \verb@proj4string@ as the CRS object \verb@ll_crs@, also defined above.
<<create spatialpointsdataframe, echo=TRUE, eval=TRUE>>=
LMI_sdf = SpatialPointsDataFrame(coordinates(LMI_ll),
   proj4string=ll_crs, data=as(LMI.df, "data.frame")
   [c("WmaxS")])
class(LMI_sdf)
@
The resulting spatial points data frame (\verb@LMI_sdf@) contains a data slot with a single variable {\tt WmaxS} from the {\tt LMI.df} non-spatial data frame, which was specified by the \verb@data@ argument.

To compactly display the structure of the object, type
<<structure of spatialpointsdataframe, echo=TRUE, eval=TRUE, results=hide>>=
str(LMI_sdf, max.level=3)
@
The argument \verb@max.level@ specifies the level of nesting (e.g., lists containing sub lists).  By default all nesting levels are shown and this can produce too much output for spatial objects.  Note there are five slots with names {\tt data}, {\tt coords.nrs}, {\tt coords}, {\tt bbox}, and {\tt proj4string}.  The data slot contains a single variable.

The \verb@writeOGR@ function takes as input the spatial data frame object and the name of the data layer and outputs a file in the working directory of R with a name given by the \verb@dsn@ argument and in a format given by the \verb@driver@ argument.
<<write OGR, echo=TRUE, eval=TRUE>>=
writeOGR(LMI_sdf, layer="WmaxS", dsn="LMI.kml",
  driver="KML", overwrite_layer=TRUE)
@
The resulting file can be viewed in Google Earth\texttrademark{} with pushpins for event locations.  The pins can be selected revealing the layer values.  You will see how to create an overlay image in Chapter~\ref{chap:spatialmodels}.

You can also export to a shapefile.  Since shapefiles can have arbitrary CRS, first transform your spatial data frame into the Lambert conic conformal used by the NHC.
<<structure of events, echo=TRUE, results=hide>>=
LMI_sdf2 = spTransform(LMI_sdf, lcc_crs)
str(LMI_sdf2, max.level=2)
@
Note that the coordinate values are not longitude and latitude and neither are the dimensions of the bounding box ({\tt bbox} slot).

You export using the driver \verb@ESRI Shapefile@.  The argument \verb@dsn@ is a folder name.
<<write OGR 2, echo=TRUE, eval=TRUE>>=
drv = "ESRI Shapefile"
writeOGR(LMI_sdf2, layer="WmaxS", dsn="WmaxS",
  driver=drv, overwrite_layer=TRUE)
@
The output contains a set of four files in the {\tt WmaxS} folder including a \verb@.prj@ file with the fully specified coordinate reference system.  The data can be imported as a layer to ArcMap\textregistered.
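
To confirm that the export worked, you can read the shapefile back into R.  The sketch below assumes the {\tt WmaxS} folder written above is in your working directory; the object name \verb@LMI_chk@ is hypothetical.
<<read back sketch, echo=TRUE, eval=FALSE>>=
# Component files of the shapefile
list.files("WmaxS")
# Read the layer back in as a spatial data frame
LMI_chk = readOGR(dsn="WmaxS", layer="WmaxS")
# CRS recovered from the .prj file
proj4string(LMI_chk)
summary(LMI_chk$WmaxS)
@
The recovered CRS string may list more tags than the one you supplied because defaults are written out in full.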

\section{Other Graphic Packages}
\label{sec:othergraphicpackages}

R's traditional (standard) graphics offer a nice set of tools for making statistical plots including box plots, histograms, and scatter plots.  These basic plot types can be produced using a single function call.  Yet some plots require a lot of work and even simple changes can sometimes be tedious.  This is particularly true when you want to make a series of related plots for different groups.  Two alternatives are worth mentioning.

\subsection{lattice}

The {\bf lattice} package \citep{Sarkar2008} contains functions for creating trellis graphs for a wide variety of plot types.  A trellis graph displays a variable or the relationship between variables, conditioned on one or more other variables.

In simple usage, lattice functions work like traditional graphics functions.  For example, to produce a density plot of the June NAO values with a lattice function, type
<<lattice example, echo=TRUE, eval=FALSE>>=
require(lattice)
densityplot(~ Jun, data=NAO)
@
The function's syntax includes the name of the variable and the name of the data frame.  The variable is preceded by the tilde symbol.  By default the density plot includes the values as points jittered above the horizontal axis.

The power comes from being able to easily create a series of plots with the same axes (trellis) as you did with the \verb@coplot@ function in \S\ref{subsec:conditionalscatterplot}.  For instance in an exploratory analysis you might want to see if the annual U.S. hurricane count is related to the NAO.  You first create a variable that splits the NAO into four groups.
<<create shingle for the nao, echo=TRUE, eval=FALSE>>=
steer = equal.count(NAO$Jun, number=4, overlap=.1)
@
The grouping variable has class {\tt shingle} and the number of years in each group is the same.  The \verb@overlap@ argument indicates the fraction of overlap used to group the years.  If you want to leave gaps you specify a negative fraction.  You can type \verb@plot(steer)@ to see the range of values for each group.
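
The effect of the \verb@overlap@ argument is easiest to see by comparing the group intervals directly.  The following sketch uses made-up values rather than the NAO data; the object names \verb@sh_over@ and \verb@sh_gap@ are hypothetical.
<<shingle sketch, echo=TRUE, eval=FALSE>>=
require(lattice)
set.seed(1)
x = rnorm(100)   # made-up values, not the NAO data
# Four groups that share 10% of their values
sh_over = equal.count(x, number=4, overlap=.1)
# Four groups separated by gaps
sh_gap = equal.count(x, number=4, overlap=-.1)
levels(sh_over)  # interval end points, overlapping
levels(sh_gap)   # interval end points, with gaps
@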

Next you use the \verb@histogram@ function to plot the percentage of hurricanes by count conditional on your grouping variable.
<<histogram of counts by steer, echo=TRUE, eval=FALSE>>=
histogram(~ US$All | steer, breaks=seq(0, 8))
@
The vertical bar in the formula indicates that the variable to follow is the conditioning variable.  The \verb@breaks@ argument is used on the hurricane counts.  The resulting four-panel graph is arranged from lower left to upper right with increasing values of the grouping variable.  Each panel contains a histogram of U.S. hurricane counts drawn using identical scale for the corresponding range of NAO values.  The relative range is shown above each panel as a strip (shingle).

Lattice functions produce an object of class {\tt trellis} that contains a description of the plot.  The print method for objects of this class does the actual drawing of the plot.  For example, the following code produces the same density plot of the June NAO values as before.
<<lattice example 2, echo=TRUE, eval=FALSE>>=
dplot = densityplot(~Jun, data=NAO)
print(dplot)
@
But now you can use the \verb@update@ function to modify the plot design.  For example, to add an axis label you type
<<add axis label, echo=TRUE, eval=FALSE>>=
update(dplot, xlab="June NAO (s.d.)")
@
To save the modified plot for additional changes you will need to reassign it.
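
A minimal sketch of this reassignment follows, assuming a made-up data frame \verb@demo.df@ in place of the NAO data and a hypothetical title.
<<update sketch, echo=TRUE, eval=FALSE>>=
require(lattice)
set.seed(1)
demo.df = data.frame(Jun=rnorm(50))  # made-up values
dp = densityplot(~ Jun, data=demo.df)
# Each call returns a new trellis object, so assign
# the result if you want to keep the change
dp = update(dp, xlab="June NAO (s.d.)")
dp = update(dp, main="Hypothetical title")
print(dp)
@
Without the reassignment the modified plot is drawn once on the graphics device but the stored object keeps its original design.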

\subsection{ggplot2}

The {\bf ggplot2} package \citep{Wickham2009} contains plotting functions that are more flexible than the traditional R graphics.  The \verb@gg@  stands for the `Grammar of Graphics,' a theory of how to create a graphics system \citep{Wilkinson2005}. The grammar specifies how a graphic maps data to attributes of geometric objects.  The attributes are things like color, shape, and size and the geometric objects are things like points, lines, bars, and polygons.

The plot is drawn on a specific coordinate system (which can be geographical) and it may contain statistical manipulations of the data.  Faceting can be used to replicate the plot but with different subsets of your data.  The application of the grammar provides greater power to help you devise graphics specific to your needs, which can help you better understand your data.  Here we give a few examples to help you get started.

Return to your October SOI values.  To create a histogram with a bin width of one standard deviation (units of SOI), type
<<histogram qplot, echo=TRUE, eval=FALSE>>=
require(ggplot2)
qplot(Oct, data=SOI, geom="histogram", binwidth=1)
@
The \verb@geom@ argument (short for geometric object) represents what you actually see on the plot with the default geometric objects being points and lines.

Figure~\ref{fig:ggplothistograms} shows histograms of the October SOI for two different bin widths.  Note the default use of grids and a background gray shade.  This can be changed with the \verb@theme_set@ function.
\begin{figure}
\centering
<<ggplothistograms, fig=TRUE, echo=FALSE, eval=TRUE, width=4.5, height=3>>=
require(ggplot2)
require(grid)
p1 = qplot(Oct, data=SOI, geom="histogram", binwidth=1, ylab="Count",
  xlab="October SOI [s.d.]")
p2 = qplot(Oct, data=SOI, geom="histogram", binwidth=.2, ylab="Count",
  xlab="October SOI [s.d.]")
grid.newpage()
pushViewport(viewport(layout=grid.layout(1, 2)))
vplayout = function(x, y)
    viewport(layout.pos.row=x, layout.pos.col=y)
print(p1 + theme_bw() +
  opts(axis.title.x=theme_text(vjust=0),
       axis.title.y=theme_text(size=10, angle=90),
       title="a", plot.title=theme_text(hjust=0, size=11),
       aspect.ratio=1),
  vp=vplayout(1, 1))
print(p2 + theme_bw() +
  opts(axis.title.x=theme_text(vjust=0),
       axis.title.y=theme_text(size=10, angle=90),
       title="b", plot.title=theme_text(hjust=0, size=11),
       aspect.ratio=1),
  vp=vplayout(1, 2))
@
\vspace{-1cm}
\caption{Histograms of October SOI.}
\label{fig:ggplothistograms}
\end{figure}
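
As a sketch of changing the default appearance, the following code switches to the black-and-white theme for one plot and then restores the previous theme.  The object name \verb@old@ is hypothetical.
<<theme set sketch, echo=TRUE, eval=FALSE>>=
require(ggplot2)
# theme_set() returns the theme that was in effect
old = theme_set(theme_bw())
qplot(Oct, data=SOI, geom="histogram", binwidth=1)
# Restore the previous default theme
theme_set(old)
@
Setting the theme this way affects all subsequent plots in the session.  To change a single plot, add the theme to it as a layer instead.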

You create a scatter plot using the same \verb@qplot@ function and in the same way as \verb@plot@.  Here you specify the data with an argument.  The default geometric object is the point.
<<scatter plot ggplot, echo=TRUE, eval=FALSE>>=
qplot(Aug, Sep, data=SOI)
@
You add a smoothing function (an example of a statistical manipulation of your data) by including \verb@smooth@ as a character string in the \verb@geom@ argument.
<<scatter smooth ggplot, echo=TRUE, eval=FALSE>>=
qplot(Aug, Sep, data=SOI, geom=c("point", "smooth"))
@
The default method for smoothing is local regression.  You can change this to a linear regression by specifying \verb@method="lm"@.  Scatter plots with both types of smoothers are shown in Fig.~\ref{fig:scattersmoothggplot}.  The graph on the left uses a linear regression and the graph on the right uses the default local smoothing.
\begin{figure}
\centering
<<scattersmoothggplot, fig=TRUE, echo=FALSE, eval=TRUE, width=4.5, height=3>>=
bestfit1 = geom_smooth(method="lm", color='red')
bestfit2 = geom_smooth(color='red')
p1 = qplot(Aug, Sep, data=subset(SOI, Year >= 1866),
  xlab="August SOI [s.d.]", ylab="September SOI [s.d.]") + bestfit1
p2 = qplot(Aug, Sep, data=subset(SOI, Year >= 1866),
  xlab="August SOI [s.d.]", ylab="September SOI [s.d.]") + bestfit2
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
vplayout <- function(x, y)
    viewport(layout.pos.row = x, layout.pos.col = y)
print(p1 + theme_bw() +
  opts(axis.title.x=theme_text(vjust=0),
       axis.title.y=theme_text(size=10, angle=90),
       title="a", plot.title=theme_text(hjust=0, size=11),
       aspect.ratio=1),
  vp=vplayout(1, 1))
print(p2 + theme_bw() +
  opts(axis.title.x=theme_text(vjust=0),
       axis.title.y=theme_text(size=10, angle=90),
       title="b", plot.title=theme_text(hjust=0, size=11),
       aspect.ratio=1),
  vp=vplayout(1, 2))
@
\vspace{-1cm}
\caption{Scatter plots of August and September SOI.}
\label{fig:scattersmoothggplot}
\end{figure}
The \verb@geom@ plots the points and adds a best fit line through them.  The line is drawn by connecting predictions of the best fit model at a set of equally spaced values of the explanatory variable (here August SOI) over the range of data values.  A 95\% confidence band on the predicted values is also included.

Plots are built layer by layer.  Layers are regular R objects and so can be stored as variables.  This makes it easy for you to write clean code with a minimal amount of duplication.  For instance, a set of plots can be enhanced by adding new data as a separate layer.  As an example, here is code to produce the left plot in Fig.~\ref{fig:scattersmoothggplot}.
<<layers, echo=TRUE, eval=FALSE>>=
bestfit = geom_smooth(method="lm", color='red')
pts = qplot(Aug, Sep, data=SOI)
pts + bestfit
@
The \verb@bestfit@ layer is created and saved as a {\bf geom} object and the \verb@pts@ layer is created from the \verb@qplot@ function.  The two layers are added and then rendered to your graphics device in the final line of code.
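
Because a layer is an ordinary R object, the same layer can be added to more than one plot.  The following is a small sketch using the August, September, and October columns of the SOI data frame.
<<reuse layer sketch, echo=TRUE, eval=FALSE>>=
bestfit = geom_smooth(method="lm", color='red')
# The same stored layer added to two different plots
qplot(Aug, Sep, data=SOI) + bestfit
qplot(Sep, Oct, data=SOI) + bestfit
@
A change to the definition of \verb@bestfit@ then carries through to every plot that uses it.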

As a final example of the grammar-of-graphics plot, consider again the NAO time series object you created in \S\ref{sec:timeseries}.  You create a vector of times at which the series was sampled using the \verb@time@ function.  Here you use the line {\bf geom} instead of the default point.
<<time series ggplot, fig=FALSE, echo=TRUE, eval=FALSE>>=
tm = time(nao.ts)
qplot(tm, nao.ts, geom="line")
@
\begin{figure}
\centering
<<timeseriesggplot, fig=TRUE, echo=FALSE, eval=TRUE, width=4, height=2.6>>=
tm = time(nao.ts)
p1 = qplot(tm, nao.ts, geom=c("line"), xlim=c(1850, 2011),
  xlab="Year", ylab="North Atlantic Oscillation [s.d.]") + 
  geom_smooth(stat="smooth", span=.1, col="red") + theme_bw() +
  opts(axis.title.x=theme_text(vjust=0),
       axis.title.y=theme_text(size=10, angle=90))
print(p1)
@
\vspace{-.5cm}
\caption{Time series of the monthly NAO.  The red line is a local smoother.}
\label{fig:timeseriesggplot}
\end{figure}
Results are shown in Fig.~\ref{fig:timeseriesggplot}.  The values fluctuate widely from one month to the next, but there is no long-term trend.  A local regression smoother (\verb@geom_smooth@) using a span of 10\% of the data indicates a tendency for a greater number of negative NAO values since the start of the 21st century.

As with the \verb@plot@ function, the first two arguments to \verb@qplot@ are the abscissa and ordinate data vectors, but you can use the optional argument \verb@data@ to specify a data frame in which to look up column names.  The \verb@ggplot@ function, which allows greater flexibility, accepts only data frames.  The philosophy is that your data are important, and it is better to be explicit about exactly what is done with them.  The functions in the {\bf plyr} and {\bf reshape} packages help you create data frames from other data objects \citep{Teetor2011}.
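
As a sketch of the more explicit approach, the code below converts a time series to a data frame and passes it to \verb@ggplot@.  A made-up monthly series (\verb@x.ts@) stands in for the NAO time series, and the column names \verb@Time@ and \verb@Value@ are hypothetical.
<<ggplot data frame sketch, echo=TRUE, eval=FALSE>>=
require(ggplot2)
set.seed(1)
# Made-up monthly series standing in for nao.ts
x.ts = ts(rnorm(120), start=c(2000, 1),
   frequency=12)
# Put the times and values in a data frame
x.df = data.frame(Time=as.numeric(time(x.ts)),
   Value=as.numeric(x.ts))
# Map columns to the axes, then add a line layer
ggplot(x.df, aes(x=Time, y=Value)) + geom_line()
@
The mapping from data frame columns to plot attributes is stated in the \verb@aes@ function, so nothing about the plot depends on objects outside the data frame.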

\subsection{ggmap}

The {\bf ggmap} package \citep{KahleWickham2012} extends the grammar of graphics to maps.  The function \verb@ggmap@ queries the Google Maps server or OpenStreetMap server for a map at a specified location and zoom.  For example, to grab a map of Tallahassee, Florida, type
<<grab a map of Tallahassee, eval=FALSE, echo=TRUE>>=
require(ggmap)
Tally = ggmap(location = "Tallahassee", zoom=13)
str(Tally)
@
The result is an object of class {\tt ggmap} with a matrix (640 $\times$ 640) of character strings specifying the fill color for each raster cell.

The level of zoom ranges from 0 for the entire world to 19 for the highest.  The default zoom is 10.  The default map type (\verb@maptype@) is terrain with options for `roadmap', `mobile', `hybrid', and others. To plot the map on your graphics device, type
<<plotmap, eval=FALSE, echo=TRUE>>=
ggmapplot(Tally)
@

To determine a center for your map you can use the \verb@geocode@ function to get a location from Google Maps.  For example, to determine the location of Florida State University, type
<<geocode FSU, echo=TRUE, eval=FALSE>>=
geocode("Florida State University")
@

This chapter showed you how to produce graphs and maps with R. A good graphic helps you understand your data and communicate your results.  We began by looking at how to make bar charts, histograms, density plots, scatter plots, and graphs involving time.  We then looked at some utilities for drawing maps and described the types of spatial data.  We showed you how to create different coordinate reference systems and transform between them.  We also showed you how to export your graphs and maps.  We ended by taking a look at two additional graphics systems.  You will become better acquainted with these tools as you work through the book.