\chapter{Spatial Models}
\label{chap:spatialmodels}
\SweaveOpts{keep.source=TRUE, pdf=FALSE, prefix.string=Chap09, grdevice=tikz.Swd}
<<>>=
options(digits=3, show.signif.stars=FALSE, width=53)
rm(list=ls())
require(tikzDevice)
source("../SweaveTikZ.R")
@
%data files: load("best.use.RData"); read.table("sstJuly2005.txt", T); read.table("FayRain.txt", T)
%packages: gstat, maptools, sp, rgdal, maps, colorRamps, spdep, spgwr
%source code: NONE
%third party: NONE
\begin{quote}
``Design is a question of substance, not just form.''
\end{quote}
\indent ---Adriano Olivetti\\

In Chapter~\ref{chap:frequencymodels} annual counts are used to create rate models and in Chapter~\ref{chap:intensitymodels} lifetime maximum winds are used to create intensity models. Here we show you how to use cyclone track data together with climate field data to create spatial models. Spatial models make use of location information in data. Geographic coordinates locate the cyclone's center on the surface of the Earth and wind speed provides an attribute. Spatial models treat cyclone location separately from cyclone attributes. Given a common spatial support, these models accommodate climate data including indices (e.g., North Atlantic Oscillation) and fields (e.g., sea-surface temperature).

\section{Track Hexagons}

Here we show you how to create a spatial framework for combining cyclone data with spatially continuous climate data. The method tessellates the basin with hexagons and populates them with local cyclone and climate information \citep{ElsnerHodgesJagger2012}.

\subsection{Spatial points data frame}

In Chapter~\ref{chap:graphsandmaps} you learned how to create a spatial data frame using functions from the {\bf sp} package \citep{BivandEtAl2008}. Let's review. Here you are interested in wind speeds along the entire track for all tropical storms and hurricanes during the 2005 North Atlantic season.
You begin by creating a data frame from the {\it best.use.RData} file where you subset on year and wind speed and convert the speed to meters per second.
<<>>=
load("best.use.RData")
W.df = subset(best.use, Yr==2005 & WmaxS >= 34 & Type=="*")
W.df$WmaxS = W.df$WmaxS * .5144
@
The asterisk for \verb@Type@ indicates a tropical cyclone as opposed to a tropical wave or extratropical cyclone. The number of rows in your data frame is the total number of cyclone hours (\Sexpr{nrow(W.df)}) and you save this by typing
<<>>=
ch = nrow(W.df)
@
Next, assign the \verb@lon@ and \verb@lat@ columns of the data frame as spatial coordinates using the \verb@coordinates@ function ({\bf sp}). Make a copy of your data frame keeping only the spatial coordinates and the wind speed columns.
<<>>=
require(sp)
W.sdf = W.df[c("lon", "lat", "WmaxS")]
coordinates(W.sdf) = c("lon", "lat")
str(W.sdf, max.level=2, strict.width="cut")
@
The result is a spatial points data frame with five slots. The data slot is a data frame and contains the attributes (here only wind speed). The coordinate (\verb@coords@) slot contains the longitude and latitude columns from the original data frame and the coordinate numbers (here two spatial dimensions) are given in the \verb@coords.nrs@ slot. Note that these two columns are not in the data slot. The bounding box (\verb@bbox@) slot is a matrix containing the maximal extent of the hourly positions as defined by the lower left and upper right longitude/latitude coordinates. A summary of the information in your spatial points data frame is obtained by typing
<<>>=
summary(W.sdf)
@
There are \Sexpr{dim(W.sdf)[1]} cyclone hours. The projection (\verb@proj4string@) slot contains an \verb@NA@ character indicating that it has not yet been specified. You give your spatial data frame a coordinate reference system using the \verb@CRS@ function.
Here you specify a geographic reference (intersections of parallels and meridians) as a character string in an object called \verb@ll@, then use the \verb@CRS@ function to generate a CRS object and the \verb@proj4string@ replacement function to assign it to your spatial data frame.
<<>>=
ll = "+proj=longlat +datum=WGS84"
proj4string(W.sdf) = CRS(ll)
@
Check this slot by typing
<<>>=
slot(W.sdf, "proj4string")
@
Next you transform the geographic CRS into a Lambert conformal conic (LCC) planar projection using the parallels 30 and 60$^\circ$N and a center longitude of 60$^\circ$W. First save the reference system as a CRS object then use the \verb@spTransform@ function from the {\bf rgdal} package.
<<>>=
lcc = "+proj=lcc +lat_1=60 +lat_2=30 +lon_0=-60"
require(rgdal)
W.sdf = spTransform(W.sdf, CRS(lcc))
bbox(W.sdf)
@
The coordinates are now planar rather than geographic although the coordinate names remain the same (\verb@lon@ and \verb@lat@). Transformation to a plane makes it easy to perform geometric calculations.

\subsection{Hexagon tessellation}

Next you construct a hexagon tessellation of the cyclone tracks. This is done by situating the points representing the center of each hexagon and then drawing the hexagon boundaries. First you create a set of equally-spaced points in a hexagon grid configuration within the bounding box defined by your spatial points data frame using the \verb@spsample@ function. Here you specify the number of hexagons, but the actual number will vary depending on the bounding box. You expand the bounding box by multiplying each coordinate value by 20\% and use an offset vector to fix the position of the hexagons over the tracks.
<<>>=
hpt = spsample(W.sdf, type="hexagonal", n=250,
  bb=bbox(W.sdf) * 1.2, offset=c(1, -1))
@
The expansion and offset values require a bit of trial-and-error. The methods used in the spatial sampling function assume that the geometry has planar coordinates, so your spatial data object should not have geographic coordinates.
Next you call the function to convert the points to hexagons. The more points, the smaller the area of each hexagon.
<<>>=
hpg = HexPoints2SpatialPolygons(hpt)
@
The number of polygons generated and the area of each polygon are obtained by typing
<<>>=
np = length(hpg@polygons)
area = hpg@polygons[[1]]@area
np; area
@
This results in \Sexpr{np} equal-area hexagons. The length unit of the LCC projection is meters. To convert the area from square meters to square kilometers, multiply by 10$^{-6}$. Thus the area of each hexagon is approximately \Sexpr{round(area*1.0e-6,0)}~km$^2$.

\subsection{Overlays}

With your hexagons and cyclone locations having the same projection, next you obtain the maximum intensity and number of observations per hexagon. The function \verb@over@ combines points (or grids) and polygons by performing point-in-polygon operations on all point-polygon combinations. First you obtain a vector containing the hexagon identification number for each hourly cyclone observation by typing
<<>>=
hexid = over(x=W.sdf, y=hpg)
@
The length of the vector is the number of hourly observations. Not all hexagons have cyclones, so you subset your spatial polygons object keeping only those that do.
<<>>=
hpg = hpg[unique(hexid)]
@
Next you create a data frame with a single column listing the maximum wind speed over the set of all cyclone observations in each hexagon.
<<>>=
int = over(x=hpg, y=W.sdf, fn=max)
colnames(int) = c("WmaxS")
head(int)
@
The result is a data frame with a single column representing the maximum wind speed value over all cyclone observations in each hexagon. The row names are the hexagon numbers prefixed with {\tt ID}. Finally, you combine the spatial polygons with the maximum per hexagon wind speeds into a spatial polygons data frame.
<<>>=
hspdf = SpatialPolygonsDataFrame(hpg, int, match.ID=TRUE)
@
You plot the hourly storm locations together with an overlay of your hexagons by typing
<<>>=
plot(hspdf)
plot(W.sdf, pch=20, cex=.3, add=TRUE)
@
<<>>=
W.sdf = W.df[c("lon", "lat", "WmaxS")]
coordinates(W.sdf) = c("lon", "lat")
ll = "+proj=longlat +datum=WGS84"
proj4string(W.sdf) = CRS(ll)
bbox(W.sdf)
lcc = "+proj=lcc +lat_1=60 +lat_2=30 +lon_0=-60"
W.sdf = spTransform(W.sdf, CRS(lcc))
bb = matrix(c(-100, -5, -100, 45, -10, 45, -10, -5, -100, -5),
  byrow=TRUE, ncol=2)
bb.sp = SpatialPolygons(list(Polygons(list(Polygon(bb)), "1")),
  proj4string=CRS(ll))
bb.sp = spTransform(bb.sp, CRS(lcc))
hpt = spsample(bb.sp, type="hexagonal", n=250, offset=c(.5, -.5))
hpg = HexPoints2SpatialPolygons(hpt)
hpgk = hpg[sort(unique(over(x=W.sdf, y=hpg)))]
int = over(x=hpgk, y=W.sdf, fn=max)
colnames(int) = c("WmaxS")
hspdf = SpatialPolygonsDataFrame(hpgk, int, match.ID=TRUE)
plot(hspdf)
plot(W.sdf, pch=20, cex=.3, add=TRUE)
@
Some hexagons contain many cyclone observations while others contain only one or two. This difference might be important in modeling intensity so you add total cyclone hours as a second attribute. First, replace the data in the spatial points data frame with an index of ones.
<<>>=
W.sdf@data = data.frame(num=rep(1, ch))
@
Then perform an overlay of the hexagons on the cyclone locations with the function argument (\verb@fn@) set to \verb@sum@. Check that the sum of the counts over all hexagons equals the total number of cyclone hours.
<<>>=
co = over(x=hspdf, y=W.sdf, fn=sum)
sum(co) == ch
@
Finally, add the counts to the spatial polygons data frame. Here the rows of \verb@co@ correspond to those in the data slot of \verb@hspdf@ so there is no need to match the polygon IDs.
<<>>=
hspdf$count = co[, 1]
head(slot(hspdf, "data"))
@
Here you list the first six rows of the data frame corresponding to the first six hexagon grids.
\subsection{Maps}

You now have a spatial polygons data frame with each polygon as an equal-area hexagon on an LCC projection and two attributes (maximum wind speed and cyclone count) in the data slot. Choropleth maps of cyclone attributes are created using the \verb@spplot@ method introduced in Chapter~\ref{chap:graphsandmaps}. Before mapping you create a couple of lists that are used by the \verb@sp.layout@ argument. One specifies the set of hourly locations from the spatial points data frame and assigns values to plot arguments.
<<>>=
l1 = list("sp.points", W.sdf, pch=20, col="gray", cex=.3)
@
Another specifies the coastlines as a spatial lines object. Some work is needed up front. First require the {\bf maps} and {\bf maptools} packages. The first package contains the \verb@map@ function to generate an object of country borders and the second contains the \verb@map2SpatialLines@ conversion function. The conversion is made to geographic coordinates, which are then transformed to the LCC projection of your above spatial objects. You set geographic coordinate limits on the map domain to limit the amount of conversion and projection calculations.
<<>>=
require(maps)
require(maptools)
cl = map("world", xlim=c(-120, 20), ylim=c(-10, 70), plot=FALSE)
clp = map2SpatialLines(cl, proj4string=CRS(ll))
clp = spTransform(clp, CRS(lcc))
l2 = list("sp.lines", clp, col="gray")
@
Depicting attribute levels on a map is done using a color ramp. A color ramp creates a character vector of color hex codes. The number of colors is specified by the argument to the color ramp function. Numerous color ramp functions are available in the {\bf colorRamps} package \citep{Keitt2009}. Acquire the package and assign 20 colors to the vector \verb@cr@ using the blue to yellow color ramp.
<<>>=
require(colorRamps)
cr = blue2yellow(20)
@
If the number of colors is less than the number of levels, the colors get recycled.
A map of the cyclone frequency is made by typing
<<>>=
spplot(hspdf, "count", col="white", col.regions=blue2yellow(20),
  sp.layout=list(l1, l2), colorkey=list(space="bottom"),
  sub="Cyclone Hours")
@
Similarly a map of the highest cyclone intensity is made by typing
<<>>=
spplot(hspdf, "WmaxS", col="white", col.regions=blue2red(20),
  sp.layout=list(l2), colorkey=list(space="bottom"),
  sub="Highest Intensity (m/s)")
@
The results are shown in Fig.~\ref{fig:freqint}.
\begin{figure}
\centering
<<>>=
require(grid)
p1 = spplot(hspdf, "count", col="white", col.regions=blue2yellow(20),
  sp.layout=list(l1, l2), at=seq(0, 140, 20),
  colorkey=list(space="bottom",
    labels=list(labels=paste(seq(0, 140, 20)), cex=.8)),
  sub=list("Cyclone hours", cex=.8, font=1))
p2 = spplot(hspdf, "WmaxS", col="white", col.regions=blue2red(20),
  sp.layout=list(l2), at=seq(10, 90, 10),
  colorkey=list(space="bottom",
    labels=list(labels=paste(seq(10, 90, 10)), cex=.8)),
  sub=list("Highest intensity [m s$^{-1}$]", cex=.8, font=1))
p1 = update(p1, main=textGrob("a", x=unit(.05, "npc")))
p2 = update(p2, main=textGrob("b", x=unit(.05, "npc")))
print(p1, split=c(1, 1, 1, 2), more=TRUE)
print(p2, split=c(1, 2, 1, 2), more=FALSE)
@
\vspace{0cm}
\caption{Cyclone frequency and intensity. (a) Hours and (b) highest intensity.}
\label{fig:freqint}
\end{figure}
Areas along the southeastern coastline of the United States have the greatest number of cyclone hours while the region from the western Caribbean into the Gulf of Mexico has the highest intensities.

\section{SST Data}

Next you examine sea-surface temperature (SST) data, here July 2005 averages. Input the data by typing
<<>>=
sst = read.table("JulySST2005.txt", header=TRUE)
head(sst)
@
The data are listed in the column labeled {\tt SST}. You treat these values as point data because they have longitudes and latitudes although they are regional averages.
At 25$^\circ$N latitude the 2$^\circ$ spacing of the SST values covers an area of approximately the same size as the hexagon grids used in the previous section. To create a spatial points data frame recall you need to first assign coordinates and a projection string to the {\tt proj4string} slot.
<<>>=
coordinates(sst) = c("lon", "lat")
proj4string(sst) = CRS(ll)
sst = spTransform(sst, CRS(lcc))
@
To examine the SST data as spatial points, type
<<>>=
spplot(sst, "SST", col.regions=rev(heat.colors(20)))
@
This produces a plot of the spatial points data frame, where the SST attribute is specified as a character string in the second argument. Your interest is the average SST within each hexagon. Again, use \verb@over@ to combine your SST values with the track polygons setting the argument \verb@fn@ as \verb@mean@.
<<>>=
ssta = over(x=hspdf, y=sst, fn=mean)
@
The result is a data frame with a single column representing the average over all SST values in each hexagon. The row names are the hexagon numbers prefixed with {\tt ID}; however, hexagons completely over land have missing SST values. Next, you add the average SSTs as an attribute to the spatial polygons data frame and then remove hexagons with missing SST values.
<<>>=
hspdf$sst = ssta$SST
hspdf = hspdf[!is.na(hspdf$sst), ]
str(slot(hspdf, "data"))
@
Finally you create the July SST map corresponding to where the cyclones occurred during the season by typing
<<>>=
spplot(hspdf, "sst", col="white", col.regions=blue2red(20),
  sp.layout=list(l1, l2), colorkey=list(space="bottom"),
  sub="Sea Surface Temperature (C)")
@
Results are shown in Fig.~\ref{fig:sst}. Ocean temperatures exceed 26$^\circ$C over a wide swath of the North Atlantic extending into the Caribbean Sea and Gulf of Mexico. The coldest waters are noted off the coasts of Nova Scotia and Newfoundland.
\begin{figure}
\centering
<<>>=
print(spplot(hspdf, "sst", col="white", col.regions=blue2red(20),
  sp.layout=list(l1, l2), at=seq(14, 30, 2),
  colorkey=list(space="bottom", labels=list(cex=.8)),
  sub=list("Sea surface temperature [$^\\circ$C]", cex=.8, font=1)))
@
\vspace{0cm}
\caption{Sea-surface temperature in the cyclone hexagons.}
\label{fig:sst}
\end{figure}

\section{SST and Intensity}

Analyzing and modeling your SST and cyclone data is easy with your spatial polygons data frame. For instance, the average maximum wind speed in regions where the SST exceeds 28$^\circ$C is obtained by typing
<<>>=
mean(hspdf$WmaxS[hspdf$sst > 28])
@
Here you treat your spatial data frame as you would a regular data frame. Continuing, you create side-by-side box plots of wind speed conditional on the collocated values of SST (see Chapter~\ref{chap:graphsandmaps}) by typing
<<>>=
boxplot(hspdf$WmaxS ~ hspdf$sst > 28)
@
Importantly, the spatial information allows you to analyze relationships on a map. Figure~\ref{fig:grouped} shows the hexagons colored by groups defined by a two-way table of cyclone intensity and SST using above and below median values. The median SST and storm intensity values calculated from your data are \Sexpr{round(median(hspdf$sst),1)}$^\circ$C and \Sexpr{round(median(hspdf$WmaxS),1)}~m~s$^{-1}$, respectively.
\begin{figure}
\centering
<<>>=
mT = median(hspdf$sst); mI = median(hspdf$WmaxS)
hspdf$group = ifelse(hspdf$sst >= mT & hspdf$WmaxS >= mI, 4,
  ifelse(hspdf$sst >= mT & hspdf$WmaxS < mI, 3,
  ifelse(hspdf$sst <= mT & hspdf$WmaxS >= mI, 2,
  ifelse(hspdf$sst <= mT & hspdf$WmaxS < mI, 1, 0))))
mycolors = c("blue", "cyan", "magenta", "red")
txt = list(c("Low SST, Low intensity", "Low SST, High intensity",
  "High SST, Low intensity", "High SST, High intensity"))
print(spplot(hspdf, "group", col="white", col.regions=mycolors,
  sp.layout=list(l1, l2), colorkey=FALSE,
  key=list(space="bottom", rect=list(col=mycolors),
    text=txt, cex=.7)))
@
\vspace{0cm}
\caption{SST and cyclone intensity. Groups are based on median values.}
\label{fig:grouped}
\end{figure}
Red hexagons indicate regions of high intensity and relatively high ocean temperature and blue hexagons indicate regions of low intensity and relatively low ocean temperature. More interesting are regions of mismatch. Magenta hexagons show low intensity coupled with relatively high ocean temperature indicating `under-performing' cyclones (cyclones weaker than the thermodynamic potential of their environment). In contrast, cyan hexagons show high intensity coupled with relatively low temperature indicating `over-performing' cyclones (cyclones stronger than the thermodynamic potential of their environment). Maps like these provide insight into the relationship between hurricanes and climate not accessible with basin-level analyses.

Next we show you how to model these spatial data. We begin with a look at spatial correlation and then use a spatial regression model to quantify the regional change in intensity as a function of SST.

\section{Spatial Autocorrelation}

Values in neighboring hexagons will tend to be more similar than values in hexagons farther away. Spatial autocorrelation quantifies the degree of similarity across geographic space.
It is a bit more complex than the familiar temporal autocorrelation because space is multi-dimensional and multi-directional.

\subsection{Moran's I}

A measure of spatial autocorrelation is Moran's $I$ \citep{Moran1950} defined as
\begin{equation}
I = \frac{m}{s} \frac{y^TWy}{y^Ty}
\label{eq:moransI}
\end{equation}
where $m$ is the number of hexagons, $y$ is the vector of values within each hexagon (e.g., cyclone intensities) where the values are deviations from the overall mean, $W$ is a weights matrix, $s$ is the sum over all the weights, and the superscript $T$ indicates the transpose operator. Values of Moran's $I$ range from $-$1 to $+$1 with a value of zero indicating a pattern with no spatial autocorrelation. Although not widely used in climate studies, \cite{deBeursHenebry2008} use it to identify spatially coherent eco-regions and biomes related to the North Atlantic Oscillation.

To compute Moran's $I$ you need a weights matrix. The weights matrix is square with the number of rows and columns equal to the number of hexagons. Here the weight in row $i$, column $j$ of the matrix is assigned a non-zero value if hexagon $i$ is contiguous with hexagon $j$, otherwise it is assigned a zero. The {\bf spdep} package \citep{Bivand2011} has functions for creating weights based on contiguity (and distance) neighbors. The process requires two steps. First, you use the \verb@poly2nb@ function on your spatial polygons data frame to create a contiguity-based neighborhood list object.
<<>>=
require(spdep)
hexnb = poly2nb(hspdf)
@
A summary of the neighborhood list is obtained by typing
<<>>=
hexnb
@
The list is ordered by hexagon number starting with the southwest-most hexagon. It has \Sexpr{length(hexnb[[1]])} neighbors; hexagon numbers \Sexpr{hexnb[[1]][1]} and \Sexpr{hexnb[[1]][2]}. Hexagon numbers increase to the east and north. A hexagon has at most six contiguous neighbors. Hexagons at the borders have fewer neighbors.
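Before turning to {\bf spdep}, it can help to compute Eq.~\ref{eq:moransI} directly in base R. The sketch below uses four hypothetical regions in a strip (the values and contiguities are made up for illustration, not taken from the hurricane data); note that with row-standardized weights the sum of the weights $s$ equals the number of regions $m$.

```r
# Moran's I computed directly from the formula on a toy strip of
# four regions; values and contiguities are hypothetical.
y <- c(10, 12, 20, 22)                 # attribute values
y <- y - mean(y)                       # deviations from the overall mean
A <- matrix(c(0, 1, 0, 0,              # binary contiguity matrix
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow=4, byrow=TRUE)
W <- A / rowSums(A)                    # row-standardized weights (style "W")
m <- length(y)                         # number of regions
s <- sum(W)                            # global sum of the weights
I <- (m / s) * c(t(y) %*% W %*% y) / c(t(y) %*% y)
I                                      # about 0.54: neighbors are similar
```

The positive value reflects that adjacent regions carry similar values; swapping the middle two values so that neighbors alternate high-low drives $I$ negative.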
A graph of the hexagon connectivity, here defined by the first-order contiguity, is made by typing
<<>>=
plot(hexnb, coordinates(hspdf))
plot(hspdf, add=TRUE)
@
A summary method applied to the neighborhood list (\verb@summary(hexnb)@) reveals the average number of neighbors and the distribution of connectivity among the hexagons.

You turn the neighborhood list object into a \verb@listw@ object using the \verb@nb2listw@ function that duplicates the neighborhood list and adds the weights. The \verb@style@ argument determines the weighting scheme. With the argument value set to \verb@W@, the weights are the inverse of the number of neighbors. For instance, the six neighbors of a fully connected hexagon each get a weight of 1/6.
<<>>=
wts = nb2listw(hexnb, style="W")
summary(wts)
@
Now you are ready to compute the value of Moran's $I$. This is done using the \verb@moran@ function. The first argument is the variable of interest followed by the name of the \verb@listw@ object. Also needed is the number of hexagons and the global sum of the weights, which is obtained using the \verb@Szero@ function.
<<>>=
m = length(hspdf$WmaxS)
s = Szero(wts)
moran(hspdf$WmaxS, wts, n=m, S0=s)
@
The function returns Moran's $I$ and the sample kurtosis.\footnote{Kurtosis is a measure of the peakedness of the distribution. A normal distribution has a kurtosis of 3.} The value of \Sexpr{round(moran(hspdf$WmaxS,wts,n=m,S0=s)$I,2)} indicates fairly high spatial autocorrelation in cyclone intensity. This is expected since the strength of a cyclone as it moves across the hexagons often does not vary by much and because stronger cyclones tend to occur at lower latitudes.

\subsection{Spatial lag variable}

Insight into the Moran's $I$ statistic is obtained by noting that it is equivalent to the slope coefficient in a regression model of $Wy$ on $y$, where $Wy$ is called the spatial lag variable (see Eq.~\ref{eq:moransI}).
Let \verb@y@ be the set of cyclone intensities in each hexagon, then you create a spatial lag intensity variable using the \verb@lag.listw@ function.
<<>>=
y = hspdf$WmaxS
Wy = lag.listw(wts, y)
@
Thus for each cyclone intensity value in the vector object {\tt y} there is a corresponding value in the vector object {\tt Wy} representing the mean intensity over the neighboring hexagons. The neighbor average does not include the value in {\tt y}, so for a completely connected hexagon the average is taken over the adjoining six neighboring values.

A scatter plot of the neighborhood average intensity versus the intensity in each hexagon shows the spatial autocorrelation relationship. The slope of a least-squares regression line through the points is the value of Moran's $I$. Use the code below to create the scatter plot shown in Fig.~\ref{fig:moranscatterplot}.
<<>>=
par(las=1, pty="s")
plot(y, Wy, pch=20, xlab="Intensity (m/s)",
  ylab="Neighborhood Avg Intensity (m/s)")
abline(lm(Wy ~ y), lwd=2, col="red")
@
\begin{figure}
\centering
<<>>=
par(las=1, pty="s", mgp=c(2, .4, 0), tcl=-.3)
plot(y, Wy, pch=20, xlab="Intensity [m s$^{-1}$]",
  ylab="Neighborhood avg intensity [m s$^{-1}$]")
grid()
abline(lm(Wy ~ y), lwd=2, col="red")
points(y, Wy, pch=20)
@
\vspace{-.7cm}
\caption{Moran's scatter plot for cyclone intensity.}
\label{fig:moranscatterplot}
\end{figure}
The scatter plot (Moran's scatter plot) suggests a significant amount of spatial autocorrelation. High intensity hexagons tend to be surrounded, on average, by other hexagons with high intensity and vice versa, as evidenced by the upward slope of the regression line.

\subsection{Statistical significance}

The expected value of Moran's $I$ under the hypothesis of no spatial autocorrelation is
\begin{equation}
E(I) = \frac{-1}{n-1}
\end{equation}
where $n$ is the number of hexagons. This allows you to test the significance of your sample Moran's $I$. The test is available in the \verb@moran.test@ function.
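You can verify the slope equivalence in base R with a small synthetic example (hypothetical values on a chain of six regions, not the hurricane data). With row-standardized weights the least-squares slope of the lag on the values matches Moran's $I$ exactly:

```r
# With row-standardized weights, the slope of lm(Wy ~ y) equals
# Moran's I; toy chain of six regions with hypothetical values.
y <- c(3, 1, 4, 1, 5, 9)
A <- outer(1:6, 1:6, function(i, j) abs(i - j) == 1)  # chain contiguity
W <- A / rowSums(A)                # row-standardized weights (style "W")
Wy <- c(W %*% y)                   # spatial lag: neighborhood averages
slope <- unname(coef(lm(Wy ~ y))[2])
yc <- y - mean(y)
I <- (length(y) / sum(W)) * sum(yc * c(W %*% yc)) / sum(yc^2)
all.equal(slope, I)                # TRUE
```

The identity holds because, with row sums of one, centering $y$ also centers $Wy$, so the regression slope reduces to $y^TWy/y^Ty$.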
The first argument is the vector of intensities and the second is the spatial weights matrix in the weights list form.
<<>>=
moran.test(y, wts)
@
The output shows the standard deviate computed by taking the difference between the estimated $I$ and its expected value under the null hypothesis of no autocorrelation. The difference is divided by the square root of the difference between the variance of $I$ and the square of the mean of $I$. The $p$-value is the chance of observing a standard deviate this large or larger assuming there is no spatial autocorrelation. The $p$-value is extremely small leading you to conclude there is significant autocorrelation. The output also includes the value of Moran's $I$ along with its expected value and variance.

By default the variance of $I$ is computed by randomizing the intensities across the hexagon tiling. If the values are normally distributed, a version of the test with more statistical power (a higher probability that the test will reject the null hypothesis of no autocorrelation when there is, in fact, autocorrelation) is obtained by adding the argument \verb@randomisation=FALSE@. Moran's $I$ and the corresponding significance test are sensitive to the definition of neighbors and to the neighborhood weights, so conclusions should be stated as conditional on your definition of neighborhoods.

\section{Spatial Regression Models}

Models that take advantage of spatial autocorrelation are called spatial regression models. If significant autocorrelation exists, spatial regression models have parameters that are more stable and statistical tests that are more reliable than non-spatial alternatives. For instance, confidence intervals on a regression slope will have the proper coverage probabilities and prediction errors will be smaller. Autocorrelation is included in a regression model by adding a spatial lag dependent variable or by including a spatially correlated error term \citep{AnselinEtAl2006}.
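The randomization idea behind the test can be illustrated with a hand-rolled Monte Carlo version in base R (a sketch on hypothetical values; {\bf spdep} provides this properly as \verb@moran.mc@). The observed $I$ is compared against the distribution of $I$ values obtained by shuffling the data across the regions:

```r
# Permutation (Monte Carlo) test for Moran's I on a toy chain of
# regions; values are hypothetical, not the cyclone intensities.
set.seed(1)
y <- c(2, 3, 3, 5, 6, 7, 9, 10)    # spatially trending values
A <- outer(1:8, 1:8, function(i, j) abs(i - j) == 1)
W <- A / rowSums(A)
moran.I <- function(y, W) {
  yc <- y - mean(y)
  (length(y) / sum(W)) * sum(yc * c(W %*% yc)) / sum(yc^2)
}
I.obs <- moran.I(y, W)
I.null <- replicate(999, moran.I(sample(y), W))  # shuffle across regions
p.value <- (sum(I.null >= I.obs) + 1) / (999 + 1)
```

The one-sided $p$-value estimates the chance of an $I$ at least as large as the observed value when the values are assigned to regions at random.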
Spatial autocorrelation can also enter a model by allowing the relationship between the response and the explanatory variable to vary across the domain. This is called geographically-weighted regression \citep{BrunsdonEtAl1998, FotheringhamEtAl2000}. Geographically-weighted regression (GWR) allows you to see where an explanatory variable contributes strongly to the relationship and where it contributes weakly. It is similar to a local linear regression. For example, you compare a standard linear regression model of intensity on SST with a local linear regression of the same relationship by typing
<<>>=
x = hspdf$sst
par(las=1, pty="s")
plot(x, y, pch=20, xlab="SST (C)", ylab="Intensity (m/s)")
abline(lm(y ~ x), lwd=2)
lines(loess.smooth(x, y, span=.85), lwd=2, col="red")
@
\begin{figure}
\centering
<<>>=
x = hspdf$sst
par(las=1, pty="s", mgp=c(2, .4, 0), tcl=-.3)
plot(x, y, pch=20, xlab="SST [$^\\circ$C]",
  ylab="Intensity [m s$^{-1}$]")
grid()
abline(lm(y ~ x), lwd=2)
lines(loess.smooth(x, y, span=.85), lwd=2, col="red")
points(x, y, pch=20)
legend("topleft", legend=c("Linear regression",
  "Local linear regression"), lwd=2, col=c(1, 2),
  cex=.8, bg="white")
@
\vspace{-.7cm}
\caption{Linear and local linear regressions of cyclone intensity on SST.}
\label{fig:locallinear}
\end{figure}
With the local regression the relationship between intensity and SST changes for different values of SST (Fig.~\ref{fig:locallinear}). For lower values of SST the relationship has a gentle slope and for higher values of SST the relationship has a steeper slope. In contrast, the `global' linear regression results in a single moderate slope across all values of SST.

Fitting is done locally in the domain of SST. That is, for a point $s$ along the SST axis, the fit is made using points in a neighborhood of $s$ weighted by their distance to $s$. The size of the neighborhood is controlled by the \verb@span@ argument. With \verb@span=.85@ the neighborhood includes 85\% of the SST values.
With GWR this localization is done in the domain of geographic space. For example, a regression of cyclone intensity on SST is performed using the paired intensity and SST values at each of the \Sexpr{length(y)} hexagons. For each hexagon the weight associated with a paired value in another hexagon is inversely proportional to the physical distance between the two hexagons. In this way the relationship between intensity and SST is localized.

\subsection{Linear regression}

The standard regression model consists of a vector $y$ of response values and a matrix $X$ containing the set of explanatory variables plus a column of ones. The relationship is modeled as
\begin{equation}
y = X\beta + \varepsilon
\end{equation}
where $\beta$ is a vector of regression coefficients and $\varepsilon \sim N(0, \sigma^2I)$ is a vector of independent and identically distributed residuals with variance $\sigma^2$. The maximum likelihood estimate of $\beta$ is given by
\begin{equation}
\hat \beta = (X^TX)^{-1}X^T y.
\end{equation}
You begin with a linear regression of cyclone intensity on SST across the set of hexagons. Although your interest is the relationship between intensity and SST, you know that cyclone intensity within a hexagon will likely also depend on the number of cyclone hours. In general a hexagon with more cyclone hours will have a higher intensity. Thus your regression model includes SST and cyclone hours as explanatory variables, in which case the SST coefficient from the regression is an estimate of the effect of SST on intensity after accounting for cyclone occurrence.

You use the \verb@lm@ function to create a linear regression model (see Chapter~\ref{chap:classicalstatistics}). The \verb@summary@ method is used to obtain statistical information about the model.
<<>>=
model = lm(WmaxS ~ sst + count, data=hspdf)
summary(model)
@
The formula is specified as \verb@WmaxS ~ sst + count@ to indicate the mean \verb@WmaxS@ is related to \verb@sst@ and \verb@count@.
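The closed form for $\hat\beta$ can be checked against \verb@lm@ in base R using simulated data (the names \verb@sst@ and \verb@count@ below are hypothetical stand-ins generated for the check, not the hexagon data):

```r
# Verify beta-hat = (X'X)^{-1} X'y against lm() on simulated data.
set.seed(1)
n <- 50
sst <- runif(n, 20, 30)                # hypothetical explanatory variables
count <- rpois(n, 10)
y <- 2 + 1.5 * sst + .1 * count + rnorm(n)
X <- cbind(1, sst, count)              # column of ones for the intercept
beta.hat <- solve(t(X) %*% X, t(X) %*% y)
fit <- lm(y ~ sst + count)
all.equal(c(beta.hat), unname(coef(fit)))  # TRUE
```

Solving the normal equations with \verb@solve(A, b)@ avoids forming the matrix inverse explicitly, which is both faster and numerically safer.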
Results show SST and cyclone hours are important in statistically explaining cyclone intensity across the set of hexagons. The parameter on the SST variable is interpreted to mean that for every 1$^\circ$C increase in sea-surface temperature cyclone intensity goes up by \Sexpr{round(model$coefficients[2],2)} m~s$^{-1}$ after accounting for cyclone hours. The model explains \Sexpr{round(summary(model)$r.squared*100,1)}\% of the variation in cyclone intensity. These results represent the average relationship over the basin.

\subsection{Geographically-weighted regression}

In GWR the SST parameter is replaced by a vector of parameters, one for each hexagon. The relationship between the response vector and the explanatory variables is expressed mathematically as
\begin{equation}
y = X\beta(g) + \varepsilon
\end{equation}
where $g$ is a vector of geographic locations, here the set of hexagons with cyclone intensities, and
\begin{equation}
\hat \beta(g) = (X^TWX)^{-1}X^TW y
\end{equation}
where $W$ is a weights matrix given by
\begin{equation}
W = \exp(-D^2/h^2)
\end{equation}
where $D$ is a matrix of pairwise distances between the hexagons and $h$ is the bandwidth. The elements of the weights matrix, $w_{ij}$, are proportional to the influence hexagon $j$ has on hexagon $i$ in determining the relationship between $X$ and $y$. Weights are determined by a distance-decay function (kernel) so that values in nearby hexagons have greater influence on the local relationship than values in hexagons farther away.

The bandwidth controls the amount of smoothing. It is chosen as a trade-off between variance and bias. A bandwidth too narrow (steep gradients on the kernel) results in large variations in the parameter estimates (large variance). A bandwidth too wide leads to a large bias as the parameter estimates are influenced by processes that do not represent the local conditions.
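A one-dimensional caricature of GWR can be written in a few lines of base R: at each location the slope is re-estimated by weighted least squares with kernel weights $\exp(-D^2/h^2)$. Everything below is simulated (the locations, variables, and the fixed bandwidth \verb@h@ are assumptions for illustration); the {\bf spgwr} package does this properly, including bandwidth selection.

```r
# Geographically weighted slopes along a 1-D transect of locations;
# all data simulated, bandwidth h fixed rather than selected.
set.seed(1)
g <- seq(0, 10, length=40)                # region locations
x <- runif(40)
y <- (1 + g / 10) * x + rnorm(40, sd=.1)  # true slope grows across space
h <- 2                                    # bandwidth (assumed)
local.slope <- sapply(g, function(gi) {
  w <- exp(-(g - gi)^2 / h^2)             # kernel weights exp(-D^2/h^2)
  coef(lm(y ~ x, weights=w))[2]           # locally weighted slope estimate
})
range(local.slope)                        # slopes increase along the transect
```

A narrower \verb@h@ tracks the spatial change in the slope more closely but with noisier estimates, which is exactly the variance-bias trade-off described above.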
Functions for selecting a bandwidth and running GWR are available in the {\bf spgwr} package \citep{BivandYu2011}. First require the package and select the bandwidth using the \verb@gwr.sel@ function. The first argument is the model formula as in \verb@lm@ and the second is the data frame. The data frame can be a spatial points or spatial polygons data frame.