\chapter{Graphs and Maps} \label{chap:graphsandmaps} \SweaveOpts{keep.source=TRUE, pdf=FALSE, prefix.string=Chap05, grdevice=tikz.Swd} <>= options(digits=3, show.signif.stars=FALSE, width=53) rm(list=ls()) require(tikzDevice) source("../SweaveTikZ.R") @ %data files: read.table("SOI.txt", T); read.table("NAO.txt", T); read.table("SST.txt", T); read.table("ATL.txt", T); read.table("H.txt", T); read.table("Ivan.txt", T); read.table("LMI.txt", T); read.table("FLPop.txt", T); read.table("JulySST2005.txt", T) %packages: lubridate, maps, classInt, maptools, rgdal, mapdata, ggplot2 %source code: NONE %third party: NONE \begin{quote} ``The ideal situation occurs when the things that we regard as beautiful are also regarded by other people as useful.'' \end{quote} \indent---Donald Knuth\\ Graphs and maps help you reason about your data. They also help you communicate your results. A good graph gives you the most information in the shortest time, with the least ink in the smallest space \citep{Tufte1997}. In this chapter we show you how to make graphs and maps using R. A good strategy is to follow along with an open session, typing (or copying) the commands as you read. Before you begin make sure you have the following data sets available in your working directory. This is done by typing <>= SOI = read.table("SOI.txt", header=TRUE) NAO = read.table("NAO.txt", header=TRUE) SST = read.table("SST.txt", header=TRUE) A = read.table("ATL.txt", header=TRUE) US = read.table("H.txt", header=TRUE) @ We begin with graphs. Not all the code is shown but all is available on our website. \section{Graphs} \label{sec:graphs} It's easy to make a graph. Here we provide guidance to help you make an informative graph. It is a tutorial on how to create publishable figures from your data. In R you have a few choices. With the standard (base) graphics environment you can produce a variety of plots with fine details. Most of the figures in this book are created using the standard graphics environment. The grid graphics environment is more flexible. It allows you to design complex layouts with nested graphs where scaling is maintained when you resize the figure. The {\bf lattice} and {\bf ggplot2} packages use grid graphics to create specialized graphing functions and methods. The \verb@spplot@ function is an example of a plot method built with grid graphics that we use in this book to create maps of our spatial data. The {\bf ggplot2} package is an implementation of the grammar of graphics combining advantages from the standard and lattice graphic environments. We begin with the base graphics environment. \subsection{Box plot} \label{subsec:boxplot} A box plot is a graph of the five-number summary. When applied to a set of observations, the \verb@summary@ function produces the sample mean along with five other statistics including the minimum, the first quartile value, the median, the third quartile value, and the maximum. The box plot graphs these numbers. This is done using the \verb@boxplot@ function. For example, to create a box plot of your October SOI data, type <>= boxplot(SOI$Oct, ylab="October SOI (s.d.)") @ \begin{figure} \centering <>= par(mfrow=c(1, 2), las=1, pty="s", mgp=c(2, .4, 0), tcl=-.3) boxplot(SOI$Oct, ylab="October SOI [s.d.]") mtext("a", side=3, line=1, adj=0, cex=1.1) boxplot(SOI$Aug, ylab="August SOI [s.d.]") mtext("b", side=3, line=1, adj=0,cex=1.1) f = fivenum(SOI$Aug) text(rep(1.25, 5), f, labels=c("minimum", "lower", "median", "upper", "maximum"), adj=c(0, 0), cex=.5) @ \vspace{-2cm} \caption{Box plot of the October SOI.} \label{fig:boxplot} \end{figure} Figure~\ref{fig:boxplot} shows the results. The line inside the box is the median value. The bottom of the box (lower hinge) is the first quartile value and the top of the box (upper hinge) is the third quartile. The vertical line (whisker) from the top of the box extends to the maximum value and the vertical line from the bottom of the box extends to the minimum value. Hinge values equal the quartiles exactly when there is an odd number of observations. Otherwise hinges are the middle value of the lower (or upper) half of the observations if there is an odd number of observations below the median and are the middle of two values if there is an even number of observations below the median. The \verb@fivenum@ function gives the five numbers used by \verb@boxplot@. The height of the box is essentially the interquartile range (\verb@IQR@) and the range is the distance from the bottom of the lower whisker to the top of the upper whisker. By default the whiskers are drawn as a dashed line extending from the box to the minimum and maximum data values. Convention is to make the length of the whiskers no longer than 1.5 times the height of the box. The outliers, data values larger or smaller than this range, are marked separately with points. Figure~\ref{fig:boxplot} also shows the box plot for the August SOI values. The text identifies the values. Here with the default options we see one data value greater than 1.5 times the interquartile range. In this case the upper whisker extends to the last data value less than 1.5 $\times$ IQR. For example, if you type <>= Q1 = fivenum(SOI$Aug)[2] Q2 = fivenum(SOI$Aug)[3] Q3 = fivenum(SOI$Aug)[4] Q2 + (Q3 - Q1) * 1.5 @ you see one observation greater than \Sexpr{round(Q2+(Q3-Q1)*1.5,1)}. In this case, the upper whisker ends at the next highest observation value less than \Sexpr{round(Q2+(Q3-Q1)*1.5,1)}. Observations above and below the whiskers are considered outliers. You can find the value of the single outlier of the August SOI by typing <>= sort(SOI$Aug) @ and noting the value \Sexpr{boxplot.stats(SOI$Aug)$stats[5]} is the largest observation in the data less than \Sexpr{round(Q2+(Q3-Q1)*1.5,1)}. Your observations are said to be symmetric if the median is near the middle of the box and the length of the two whiskers are about equal. A symmetric set of observations will also have about the same number of high and low outliers. To summarize, 25\% of all your observations are below the lower quartile (below the box), 50\% are below (and above) the median, and 25\% are above the upper quartile. The box contains 50\% of all your data. The upper whisker extends from the upper quartile to the maximum and the lower quartile extends from the lower quartile value to the minimum except if they exceed 1.5 times the interquartile range above the upper or below the lower quartiles. In this case outliers are plotted as points. This outlier option can be turned off by setting the \verb@range@ argument to zero. The box plot is an efficient graphical summary of your data, but it can be designed better. For example, the box sides provide redundant information. By removing the box lines altogether, the same information is available with less ink. Figure~\ref{fig:tuftebox} is series of box plots representing the SOI for each month. The dot represents the median; the ends of the lines towards the dot are the lower and upper quartile, respectively; the ends of the lines towards the bottom and top of the graph are the minimum and maximum values, respectively. \begin{figure} \centering <>= par(las=1, mgp=c(2, .4, 0), tcl=-.3) plot(c(1, 12), c(-5, 5), type="n", xaxt="n", bty="n", xlab="", ylab="SOI [s.d.]") axis(1, at=seq(1, 12, 1), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"), cex=.5) for(i in 1:12){ points(i, fivenum(SOI[, i+1])[3], pch=19) lines(c(i, i), c(fivenum(SOI[, i+1])[1], fivenum(SOI[, i+1])[2])) lines(c(i, i), c(fivenum(SOI[, i+1])[4], fivenum(SOI[, i+1])[5])) } @ \vspace{-1cm} \caption{Five-number summary of the monthly SOI.} \label{fig:tuftebox} \end{figure} \subsection{Histogram} \label{subsec:histogram} A histogram is a graph of the distribution of your observations. It shows where the values tend to cluster and where they tend to be sparse. The histogram is similar but not identical to a bar plot (see Chapter~\ref{chap:Rtutorial}). The histogram uses bars to indicate frequency (or proportion) in data intervals, whereas a bar plot uses bars to indicate the frequency of data by categories. The \verb@hist@ function creates a histogram. As an example, consider NOAA's annual values of accumulated cyclone energy (ACE) for the North Atlantic and July SOI values. Annual ACE is calculated by squaring the maximum wind speed for each six hour tropical cyclone observation and summing over all cyclones in the season. The values obtained from NOAA (\url{http://www.aoml.noaa.gov/hrd/tcfaq/E11.html}) are expressed in units of knots squared $\times 10^4$. You create the two histograms and plot them side-by-side. First set the plotting parameters with the \verb@par@ function. Details on plotting options are given in \cite{Murrell2005}. After your histogram is plotted the function \verb@rug@ (like a floor carpet) adds tick marks along the horizontal axis at the location of each observation. <>= par(mfrow=c(1, 2), pty="s") hist(A$ACE) rug(A$ACE) hist(SOI$Jul) rug(SOI$Jul) @ \begin{figure} \centering <>= c = .5144^2 par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3) hist(A$ACE * c, main="", xlab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]") rug(A$ACE * c) mtext("a", side=3, line=1, adj=0, cex=1.1) hist(SOI$Jul, main="", xlab="SOI [s.d.]") rug(SOI$Jul) mtext("b", side=3, line=1, adj=0, cex=1.1) @ \vspace{-1cm} \caption{Histograms of (a) ACE and (b) SOI.} \label{fig:hist} \end{figure} Figure~\ref{fig:hist} shows the result. Here we added an axis label, turned off the default title, and placed text ('a' and 'b') in the figure margins. Plot titles are useful in presentations, but are not needed in publication. The default horizontal axis label is the name of the data vector. The default vertical axis is frequency and is labeled accordingly. The \verb@hist@ function has many options. Default values for these options provide a good starting point, but you might want to make adjustments. Thus it is good to know how the histogram is assembled. First a contiguous collection of disjoint intervals, called bins (or classes), is chosen that cover the range of data values. The default for the number bins is the value $\lceil \log_2(n)+1\rceil $, where $n$ is the sample size and $\lceil \rceil$ indicates the ceiling value (next largest integer). If you type <>= n = length(SOI$Jul) ceiling(log(n, base=2) + 1) @ you can see that adjustments are made to this number so that the cut points correspond to whole number data values. In the case of ACE the adjustment results in 7 bins and in the case of the SOI it results in 11 bins. Thus the computed number of bins is a suggestion that gets modified to make for nice breaks. The bins are contiguous and disjoints so the intervals look like $(a, b]$ or $[a, b)$ where the interval $(a, b]$ means from $a$ to $b$ including $b$ but not $a$. Next, the number of data values in each of the intervals is counted. Finally, a bar is drawn above the interval so that the bar height is the number of data values (frequency). A useful argument to make your histogram more visually understandable is \verb@prob=TRUE@ which allows you to set the bar height to the density, where the sum of the densities times the bar interval width equals one. You conclude that ACE is positively skewed with a relatively few years having very large values. Clearly ACE does not follow a normal distribution. In contrast, the SOI appears quite symmetric with rather short tails as you would expect from a normal distribution. \subsection{Density plot} \label{subsec:densityplot} A histogram outlines the general shape of your data. Usually that is sufficient. You can adjust the number of bins (or bin width) to get more or less detail on the shape. An alternative is a density plot. A density plot captures the distribution shape by essentially smoothing the histogram. Instead of specifying the bin width you specify the amount (and type) of smoothing. There are two steps to produce a density plot. First you need to use the \verb@density@ function to obtain a set of kernel density estimates from your observations. Second you need to plot these estimates, typically by using the \verb@plot@ method. A kernel density is a function that provides an estimate of the average number of values at any location in the space defined by your data. This is illustrated in Fig.~\ref{fig:kernels} where the October SOI values in the period 2005--2010 are indicated as a rug and a kernel density function is shown as the black curve. The height of the function, representing the local density, is a sum of the heights of the individual kernels shown in red. \begin{figure} \centering <>= x = tail(SOI$Oct) xlims = c(-3, 3) bandw = .5 par(las=1, mgp=c(2, .4, 0), tcl=-.3) plot(density(x,bw=bandw), xlim=xlims, type="n", main="", xlab="October SOI [s.d.]") rug(x, lwd=3) d = density(x[1], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") d = density(x[2], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") d = density(x[3], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") d = density(x[4], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") d = density(x[5], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") d = density(x[6], bw=bandw) lines(d$x, d$y/length(x), xlim=xlims, col="red") lines(density(x, bw=bandw), xlim=xlims, lwd=2) @ \vspace{-.5cm} \caption{Density of October SOI (2005--2010).} \label{fig:kernels} \end{figure} The kernel is a Gaussian (normal) distribution centered at each data value. The width of the kernel, called the bandwidth, controls the amount of smoothing. The bandwidth is the standard deviation of the kernel in the \verb@density@ function. This means the inflection points on the kernel occur one bandwidth away from the data location in units of the data values. Here with the SOI in units of standard deviation the bandwidth equals .5 s.d. A larger bandwidth produces a smoother density plot or a fixed number of observations because the kernels have greater overlap. Figure~\ref{fig:bw} shows the density plot of June NAO values from the period 1851--2010 using bandwidths of .1, .2, .5, and 1. \begin{figure} \centering <>= par(mfrow=c(2, 2), mar=c(5, 5, 4, 2) + .1, mex=.7, las=1, mgp=c(3, .4, 0), tcl=-.3) x = NAO$Jun xlims = range(x) plot(density(x, bw=.1), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2) rug(x) mtext("a", side=3, line=1, adj=0, cex=1.1) plot(density(x, bw=.2), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2) rug(x) mtext("b", side=3,line=1,adj=0,cex=1.1) plot(density(x, bw=.5), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2) rug(x) mtext("c", side=3, line=1, adj=0, cex=1.1) plot(density(x, bw=1), xlim=xlims, main="", xlab="June NAO [s.d.]", lwd=2) rug(x) mtext("d", side=3, line=1, adj=0, cex=1.1) @ \vspace{-.2cm} \caption{Density of June NAO. (a) .1, (b) .2, (c) .5, and (d) 1 s.d. bandwidth.} \label{fig:bw} \end{figure} The smallest bandwidth produces a density plot that is has spikes as it captures the fine scale variability in the distribution of values. As the bandwidth increases the spikes disappear and the density gets smoother. The largest bandwidth produces a smooth symmetric density centered on the value of zero. To create a density plot for the NAO values with a histogram overlay, type <>= d = density(NAO$Jun, bw=.5) plot(d, main="", xlab="June NAO [s.d.]", lwd=2, col="red") hist(NAO$Jun, prob=TRUE, add=TRUE) rug(NAO$Jun) @ \begin{figure} \centering <>= par(las=1, mgp=c(2.3, .4, 0), tcl=-.3) plot(density(NAO$Jun, bw=.5), main="", xlab="June NAO [s.d.]", lwd=2, col="red") hist(NAO$Jun, prob=TRUE, add=TRUE) rug(NAO$Jun) @ \vspace{-.5cm} \caption{Density and histogram of June NAO.} \label{fig:densityhist} \end{figure} The density function takes your vector of data values as input and allows you to specify a bandwidth using the \verb@bw@ argument. Here you are using the vector of June NAO values and a bandwidth of .5 s.d. The bandwidth units are the same as the units of your data, here s.d. for the NAO. The output is saved as a density object, here called \verb@d@. The object is then plotted using the \verb@plot@ method. You turn off the default plot title with the \verb@main=""@ and you specify a label for the values to be plotted below the horizontal axis. You specify the line width as 2 and the line color as red. You then add the histogram over the density line using the \verb@hist@ function. You use the \verb@prob=TRUE@ argument to make the bar height proportional to the density. The \verb@add=TRUE@ argument is needed so that the histogram plots on the same device. One reason for plotting the histogram or density is to see whether your data have a normal distribution. The Q-Q plot provides another way to make this assessment. \subsection{Q-Q plot} \label{subsec:qqplot} A Q-Q plot is a graphical way to compare distributions. It does this by plotting quantile (Q) values of one distribution against the corresponding quantile (Q) values of the other distribution. In the case of assessing whether or not your data are normally distributed, the sample quantiles are plotted on the vertical axis and quantiles from a standard normal distribution are plotted along the horizontal axis. In this case it is called a Q-Q normal plot. That is, the $k$th smallest observation is plotted against the expected value of the $k$th smallest random value from a $N(0, 1)$ sample of size $n$. The pattern of points in the plot is then used to compare your data against a normal distribution. If your data are normally distributed then the points should align along the $y=x$ line. This somewhat complicated procedure is done using the \verb@qqnorm@ function. To make side-by-side Q-Q normal plots for the ACE values and the July SOI values you type <>= par(mfrow=c(1, 2), pty="s") qqnorm(A$ACE) qqline(A$ACE, col="red") qqnorm(SOI$Jul) qqline(SOI$Jul, col="red") @ \begin{figure} \centering <>= c = .5144^2 par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3) qqnorm(A$ACE*c, main="", ylab= "Sample quantiles\n (ACE [$\\times$10$^4$ m$^2$s$^{-2}$])", xlab="Theoretical quantiles") qqline(A$ACE*c, col="red") mtext("a", side=3, line=1, adj=0, cex=1.1) qqnorm(SOI$Jul, main="", ylab="Sample quantiles\n (SOI [s.d.])", xlab="Theoretical quantiles") qqline(SOI$Jul, col="red") mtext("b", side=3, line=1, adj=0, cex=1.1) @ \vspace{-.75cm} \caption{Q-Q normal plot of (a) ACE and (b) July SOI.} \label{fig:qqnorm} \end{figure} The plots are shown in Fig.~\ref{fig:qqnorm}. The quantiles are non-decreasing. The $y=x$ line is added to the plot using the \verb@qqline@ function. Additionally we adjust the vertical axis label and turn the default title off. The plots show that July SOI values appear to have a normal distribution while the seasonal ACE does not. For observations that have a positive skew, like the ACE, the pattern of points on a Q-Q normal plot is concave upward. For observations that have a negative skew the pattern of points is concave downward. For values that have a symmetric distribution but with fatter tails than the normal (e.g., the $t$-distribution), the pattern of points resembles an inverse sine function. The Q-Q normal plot is useful in checking the residuals from a regression model. The assumption is that the residuals are independent and identically distributed from a normal distribution centered on zero. In Chapter~\ref{chap:classicalstatistics} you created a multiple linear regression model for August SST using March SST and year as the explanatory variables. To examine the assumption of normally distributed residuals with a Q-Q normal plot type <>= model = lm(Aug ~ Year + Mar, data=SST) qqnorm(model$residuals) qqline(model$residuals, col="red") @ The points align along the $y=x$ axis indicating a normal distribution. \subsection{Scatter plot} \label{subsec:scatterplot} The \verb@plot@ method is used to create a scatter plot. The values of one variable are plotted against the values of the other variable as points in a Cartesian plane (see Chapter~\ref{chap:Rtutorial}). The values named in the first argument are plotted along the horizontal axis. This pairing of two variables is useful in generating and testing hypotheses about a possible relationship. In the context of correlation which variable gets plotted on which axis is irrelevant. Either way, the scatter of points illustrates the amount of correlation. However, in the context of a statistical model, by convention the dependent variable (the variable you are interested in explaining) is plotted on the vertical axis and the explanatory variable is plotted on the horizontal axis. For example, if your interest is whether pre-hurricane season ocean warmth (e.g., June SST) is related to ACE, your model is <>= ace = A$ACE*.5144^2 sst = SST$Jun model = lm(ace ~ sst) @ and you plot ACE on the vertical axis. Since your slope and intercept coefficients from the linear regression model are saved as part of the object \verb@model@, you can first create a scatter plot then use the \verb@abline@ function to add the linear regression line. Here the function extracts the intercept and slope coefficient values from the model object and draws the straight line using the point-intercept formula. Note here you use the model formula syntax (\verb@ace ~ sst@) as the first argument in the \verb@plot@ function. <>= plot(ace ~ sst, ylab=expression( paste("ACE [x", 10^4," ", m^2, s^-2,"]")), xlab=expression(paste("SST [",degree,"C]"))) abline(model, col="red", lwd=2) @ \begin{figure} \centering <>= ci.lwr = predict(model, data.frame(sst=sort(sst)), level=.95, interval="confidence")[, 2] ci.upr = predict(model, data.frame(sst=sort(sst)), level=.95, interval="confidence")[, 3] par(pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3) plot(ace ~ sst, type="n", ylab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]", xlab="SST [$^\\circ$C]") grid() xx = c(sort(sst), rev(sort(sst))) yy = c(ci.upr, rev(ci.lwr)) polygon(xx, yy, col="gray", border="gray") points(sst, ace, pch=19, cex=.8) abline(model, col="red", lwd=2) @ \vspace{-.5cm} \caption{Scatter plot and linear regression line of ACE and June SST.} \label{fig:ACEvsJunSST} \end{figure} Figure~\ref{fig:ACEvsJunSST} is the result. The relationship between ACE and SST is summarized by the linear regression model shown by the straight line. The slope of the line indicates that for every 1$^\circ$C increase in SST the average value of ACE increases by \Sexpr{round(coef(model)[2],1)}$\times 10^4$~m$^2$/s$^2$ (type \verb@coef(model[2])@). Since the regression line is based on a sample of data you should display it inside a band of uncertainty. As we saw in Chapter~\ref{chap:classicalstatistics} there are two types of uncertainty bands; a confidence band (narrow) and a prediction band (wide). The confidence band reflects the uncertainty about the line itself, which like the standard error of the mean indicates the precision by which you know the mean. In regression, the mean is not constant but rather a function of the explanatory variable. The 95\% confidence band is shown in Fig.~\ref{fig:ACEvsJunSST}. The width of the band is inverse related to the sample size. In a large sample of data, the confidence band will be narrow reflecting a well-determined line. Note that it's impossible to draw a horizontal line that fits completely within the band. This indicates that there is a significant relationship between ACE and SST. The band is narrowest in the middle which is understood by the fact that the predicted value at the mean SST will be the mean of ACE, whatever the slope, and thus the standard error of the predicted value at this point is the standard error of the mean of ACE. At other values of SST we need to add the variability associated with the estimated slope. This variability is larger for values of SST farther from the mean, which is why the band looks like a bow tie. The prediction band adds another layer of uncertainty; the uncertainty about {\it future} values of ACE. The prediction band captures the majority of the observed points in the scatter plot. Unlike the confidence band, the width of the prediction band depends strongly on the assumption of normally distributed errors with a constant variance across the values of the explanatory variable. \subsection{Conditional scatter plot} \label{subsec:conditionalscatterplot} Separate scatter plots {\it conditional} on the values of a third variable can be quite informative. This is done with the \verb@coplot@ function. The syntax is the same as above except you add the name of the conditioning variable after a vertical bar. For example, as you saw above, there is a positive relationship between ACE and SST. The conditioning plot answers the question; is there a change in the relationship depending on values of the third variable. Here you use August SOI values as the conditioning variable and type <>= soi = SOI$Aug coplot(ace ~ sst | soi, panel=panel.smooth) @ The syntax is read `conditioning plot of ACE versus SST given values of SOI.' The function divides the range of the conditioning variable (SOI) into six intervals with each interval having approximately the same number of years. The range of SOI values in each interval overlaps by 50\%. The conditioning intervals are plotted in the top panel as horizontal bars (shingles). The plot is shown in Fig.~\ref{fig:coplot}. The scatter plots of ACE and SST are arranged in a matrix of panels below the shingles. The panels are arranged from lower left to upper right. The lower left panel corresponds to the lowest range of SOI values (less than about $-$1 s.d.) and the upper right panel corresponds to the highest range of SOI values (greater than about $+$.5 s.d.). Half of the data points in a panel are shared with the panel to the left and half of the data points are shared with the panel to the right. This is indicated by the amount of shingle overlap. \begin{figure} \centering <>= par(las=0, mgp=c(2, .4, 0), tcl=-.3) soi = SOI$Aug n = length(soi) coplot(ace[16:n] ~ sst[16:n] | soi[16:n], panel=panel.smooth, pch=16, ylab="ACE [$\\times$10$^4$ m$^2$ s$^{-2}$]", xlab=c("SST [$^\\circ$C]", "Conditioning variable: SOI [s.d.]")) @ \vspace{0cm} \caption{Scatter plots of ACE and SST conditional on the SOI.} \label{fig:coplot} \end{figure} Results show a positive, nearly linear, relationship between ACE and SST for all ranges of SOI values. However over the SOI range between $-$1.5 and 0 the relationship is nonlinear. ACE is least sensitive to SST when SOI is the most negative (El Ni\~no years) as indicated by the nearly flat line in the lower-left panel. The argument \verb@panel@ adds a local linear curve (red line) through the set of points in each plot. \section{Time series} \label{sec:timeseries} Hurricane data often take the form of a time series. A time series is a sequence of data values measured at successive times and spaced at uniform intervals. You can treat a time series as a vector and use structured data functions (see Chapter~\ref{chap:Rtutorial}) to generate time series. However, additional functions are available for data that are converted to time-series objects. Time-series objects are created using the \verb@ts@ function. You do this with the monthly NAO data frame as follows. First create a matrix of the monthly values skipping the year column in the data frame. Second take the transpose of this matrix (switch the rows with the columns) using the \verb@t@ function and save the matrix is a vector. Finally, create a time-series object, specifying the frequency of values and the start month. Here the first value is from January 1851. <