\chapter{Bayesian Models}

``Errors using inadequate data are much less than those using no data at all.''---Charles Babbage

\section{Space-Time Model}

You save your most ambitious model for last. It draws on your knowledge of frequency models (Chapter~\ref{chap:frequencymodels}), spatial models (Chapter~\ref{chap:spatialmodels}), and Bayesian methods. Substantial progress has been made in understanding and predicting hurricane activity on the seasonal and longer time scales over the basin as a whole. Much of this progress comes from statistical models described in this book.

Before building the model, here are some practical observations on fitting space-time models with the integrated nested Laplace approximation (INLA), an alternative to the MCMC approach used below.
\begin{enumerate}
\item Normalize all variables to mean zero and variance one.
\item Avoid \verb@constr=F@ in the \verb@f()@ syntax; it appears to cause problems. Instead, separate out the covariate (weight) as a fixed effect.
\item Check the posterior means of the hyperparameters; they should be comparable to the variance of the (posterior mean) random effects. You do not want the means to be orders of magnitude larger.
\item Check for outliers in the estimated random effects.
\item A location shift of a covariate induces a random intercept in the equivalent spatial Besag model. Therefore always include a Besag random effect with the same neighborhood structure (i.e., with weights equal to one) when the covariate values are relative, such as SST in degrees Celsius.
\end{enumerate}

To see why the last observation matters, consider a simple model, where $s$ indexes location and $t$ time:
\begin{equation}
y_{s,t} \sim \mathrm{Poisson}(\lambda_{s,t}), \qquad \log(\lambda_{s,t}) = \nu_{s,t}.
\end{equation}
Assume there is a single covariate $x_{s,t}$ and that the $\nu_{s,t}$ have a multivariate normal distribution in space. Assume further that the spatial relationships in this distribution can be described using the INLA model formula \verb@y ~ x + f(id, x, model="besag", constr=F, ...)@, although sometimes it works better to add back a fixed term and use \verb@constr=F@. Suppose you decompose the covariate as $x_{s,t} = x_0 + x_{1,s} + x_{2,s,t}$, where $x_0$ is an overall constant, $x_{1,s}$ is a constant for each spatial location (cell), and $x_{2,s,t}$ is the space-time residual, so that $\sum_t x_{2,s,t} = 0$ for every location $s$ and $\sum_s x_{1,s} = 0$. Set the sum-to-zero constraints aside for the discussion. The model is
\begin{equation}
\eta_{s,t} = b_0 + b_{1,s}\, x_{s,t} + e_{s,t},
\end{equation}
where, conditional on the hyperparameters, $b_0$ is the fixed intercept and the slope is $b_{1,s} = b_{1c} + z_s$, with $b_{1c}$ fixed (internally) and the spatially structured part $z_s$ following an intrinsic conditional autoregressive (ICAR) distribution with $\sum_s z_s = 0$. Here $\lambda_{s,t} = \exp(\eta_{s,t})$ is the Poisson rate, $\eta_{s,t}$ is the `structured additive predictor,' and $e_{s,t}$ is the residual or unstructured term, i.i.d.\ $N(0, \tau)$, say. Expanding $\eta_{s,t}$ gives
\begin{equation}
\eta_{s,t} = b_0 + b_{1,s}\, x_0 + b_{1,s}\, x_{1,s} + b_{1,s}\, x_{2,s,t} + e_{s,t}.
\end{equation}
This expansion contains several varying terms. Note first that if you change $x_0$, say by using Kelvin instead of Celsius for SST, you introduce a term $b_{1,s}\, x_0$ that does not appear in the original model. In a linear model, centering the covariates affects only the intercept; here it affects the random intercept for each region $s$. This term can be added back into the model, and the shift is transparent to the model only if it includes a term such as $b_{3,s}$, where $b_{3,s}$ follows an ICAR model with the same neighborhood structure. It also seems possible to split the term $b_1$ into separate terms, since $1$, $x_{1,s}$, and $x_{2,s,t}$ are orthogonal components. We have often used this approach with SST, writing $\mathrm{SST} = \mathrm{SST}_0 + \mathrm{SST}_{1,s} + \mathrm{SST}_{2,s,t}$, where $\mathrm{SST}_{1,s}$ is the mean temperature at a given site and $\mathrm{SST}_{2,s,t}$ is the local temporal anomaly.

Splitting the slope in this way, the full model would be, with \verb@id@ the spatial index and \verb@id1@, \verb@id2@ copies of it,
\begin{verbatim}
formula = y ~ x1 + x2 + f(id, model="besag") +
  f(id1, x1, model="besag") + f(id2, x2, model="besag")
\end{verbatim}
where the intercept \verb@(1)@ is implicit and the random effects carry the default sum-to-zero constraints. The only possible partial model would be
\begin{verbatim}
formula = y ~ x + f(id, model="besag") + f(id1, x, model="besag")
\end{verbatim}
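As a concrete illustration of the split-covariate approach, a minimal R-INLA sketch might look as follows. This is a sketch only: the data frame \verb@df@, its column names, and the graph file \verb@hex.graph@ are hypothetical placeholders (a graph file can be written from a neighborhood list with, for example, \verb@nb2INLA@ in {\bf spdep}).
<<eval=FALSE>>=
# A sketch of the split-covariate model in R-INLA (hypothetical names).
# Assumes a data frame 'df' with the count y, the site-mean SST x1,
# the local SST anomaly x2, and three copies of the hexagon index
# (id, id1, id2), plus a neighborhood graph file "hex.graph".
library(INLA)
formula = y ~ x1 + x2 +
  f(id,  model="besag", graph="hex.graph") +
  f(id1, x1, model="besag", graph="hex.graph") +
  f(id2, x2, model="besag", graph="hex.graph")
fit = inla(formula, family="poisson", data=df)
summary(fit)
@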
Some comments on reading the R-INLA documentation for the linear latent model (http://www.math.ntnu.no/inla/r-inla.org/doc/latent/linear.pdf):
\begin{enumerate}
\item The fixed parameters are themselves `hyperparameters' and so are not estimated in step one of the algorithm.
\item The random effects (values) are estimated in each step.
\item Any posterior of interest can be estimated, provided it is specified to INLA.
\item INLA does not work with the generalized Pareto distribution for $\xi < 0$, which unfortunately covers most of our work with hurricanes.
\end{enumerate}

Despite the substantial progress noted above, significant gaps remain in our knowledge of what regulates hurricane activity {\it regionally}. Here your goal is a multilevel (hierarchical) statistical model to better understand and predict regional hurricane activity. Multilevel models are not new, but they have gained popularity with the growth of computing power and better software. They have yet to be widely employed in studies of hurricane climate. For a comprehensive modern treatment of statistics for spatio-temporal data see \cite{CressieWikle2011}. You begin with a set of local regressions.

\subsection{Lattice data}
\label{subsec:spacetimelatticedata}

Here you use the spatial hexagon framework described in Chapter~\ref{chap:spatialmodels} to create a space-time data set consisting of cyclone counts at each hexagon for each year, together with accompanying covariate climate information. Some of the covariate information is at the local (hexagon) level and some of it is at the regional level (e.g., climate indices). Single-level models, including linear and generalized linear models, are commonly used to describe basin-wide cyclone activity. Multilevel models allow you to model variations in relationships at the level of individual hexagons and individual years. They are capable of describing and explaining within-basin variations.

Some data organization is needed. First input the hourly best-track data, the netCDF SST grids, and the annual aggregated counts and climate covariates that you arranged in Chapter~\ref{chap:datasets}. Specify also the range of years over which you want to model and make a copy of the best-track data frame.
<<>>=
load("best.use.RData")
load("ncdataframe.RData")
load("annual.RData")
years = 1886:2009
Wind.df = best.use
@

Next define the hexagon tiling. Here you follow closely the work flow outlined in Chapter~\ref{chap:spatialmodels}. Acquire the {\bf sp} package. Then assign coordinates to the location columns and add geographic projection information as a coordinate reference system.
<<>>=
require(sp)
coordinates(Wind.df) = c("lon", "lat")
coordinates(ncdataframe) = c("lon", "lat")
ll = "+proj=longlat +ellps=WGS84"
proj4string(Wind.df) = CRS(ll)
proj4string(ncdataframe) = CRS(ll)
slot(Wind.df, "coords")[1:3, ]
@

With the \verb@spTransform@ function ({\bf rgdal}), you change the geographic CRS to a Lambert conformal conic (LCC) planar projection using the parallels 15 and 45$^\circ$N and a center longitude of 60$^\circ$W.
<<>>=
lcc = "+proj=lcc +lat_1=45 +lat_2=15 +lon_0=-60"
require(rgdal)
Wind.sdf = spTransform(Wind.df, CRS(lcc))
SST.sdf = spTransform(ncdataframe, CRS(lcc))
slot(Wind.sdf, "coords")[1:3, ]
@

The transformation does not rename the old coordinates, but the values are now the LCC projected coordinates. You compare the bounding boxes to make sure that the cyclone data are contained within the SST data.
<<>>=
bbox(Wind.sdf)
bbox(SST.sdf)
@

Next, generate the hexagons. First sample the hexagon centers using the bounding box from the cyclone data. Specify the number of centers to be 250 and fix the offset so that the sampler will choose the same set of centers given the number of centers and the bounding box. Then create a spatial polygons object.
<<>>=
hpt = spsample(Wind.sdf, type="hexagonal", n=250,
  bb=bbox(Wind.sdf) * 1.2, offset=c(1, -1))
hpg = HexPoints2SpatialPolygons(hpt)
@

<<>>=
np = length(hpg@polygons)
area = hpg@polygons[[1]]@area
@

This results in \Sexpr{np} hexagons, each with an area of approximately \Sexpr{round(area*1.0e-6,0)}~km$^2$. Next overlay the hexagons on the cyclone and SST locations separately.
<<>>=
Wind.hexid = over(x=Wind.sdf, y=hpg)
SST.hexid = over(x=SST.sdf, y=hpg)
@

This creates a vector containing the hexagon identification number for each hourly cyclone observation. The length of the vector is the number of hourly observations. Similarly, for the SST data the integer vector has elements indicating in which hexagon each SST value occurs. You then use the \verb@split@ function to divide the data frame into groups defined by the hexagon number. The groups are saved as lists. Each list element is a data frame containing only the cyclones occurring in the particular hexagon. You do this for the cyclone and SST data.
<<>>=
Wind.split = split(Wind.sdf@data, Wind.hexid)
SST.split = split(SST.sdf@data, SST.hexid)
@

You find that the first hexagon contains \Sexpr{dim(Wind.split[[1]])[1]} cyclone observations. To view a selected set of the columns from this data frame, type
<<>>=
Wind.split[[1]][c(1:2, 5:7, 9, 11)]
@

A given hexagon tends to capture more than one cyclone hour. To view a selected set of SST grid values from the corresponding hexagon, type
<<>>=
SST.split[[1]][, 1:5]
@

The hexagon contains \Sexpr{dim(SST.split[[1]])[1]} SST grid values, and there are \Sexpr{dim(SST.split[[1]])[2]} months as separate columns starting in January 1854 (\verb@Y1854M01@). Next, reassign names to match those corresponding to the hexagons.
<<>>=
names(Wind.split) = sapply(hpg@polygons,
  function(x) x@ID)[as.numeric(names(Wind.split))]
names(SST.split) = sapply(hpg@polygons,
  function(x) x@ID)[as.numeric(names(SST.split))]
@

There are hexagons with cyclone data over land areas (no SST data) and there are hexagons over the ocean where no cyclones occur. Thus you subset each list to keep only the hexagons having both cyclone and SST data.
<<>>=
Wind.subset = Wind.split[names(Wind.split) %in% names(SST.split)]
SST.subset = SST.split[names(SST.split) %in% names(Wind.split)]
@

The function \verb@%in%@ returns a logical vector indicating whether there is a match for the names in \verb@Wind.split@ from the set of names in \verb@SST.split@. The data sets are now aligned. There are \Sexpr{length(SST.subset)} hexagons with both cyclone and SST data. Note that for the cyclone data, you could subset \verb@best.use@ on \verb@M==FALSE@ to remove cyclone observations over land. Next, compute the average SST within each hexagon by month and save the averages as a data frame.
<<>>=
SST.mean = data.frame(t(sapply(SST.subset, function(x) colMeans(x))))
head(SST.mean)[1:5]
@

The data frame is organized with the hexagon identifier as the row (observation) and consecutive months as the columns (variables). Thus there are \Sexpr{dim(SST.mean)[1]} rows and \Sexpr{dim(SST.mean)[2]} columns. Your interest in SST is more narrowly focused on the months of August through October when hurricanes occur. To generate these values by year, type
<<>>=
SSTYearMonth = strsplit(substring(colnames(SST.mean), 2), "M")
SSTYear = as.numeric(sapply(SSTYearMonth, function(x) x[1]))
SSTMonth = as.numeric(sapply(SSTYearMonth, function(x) x[2]))
SSTKeep = which(SSTMonth %in% c(8, 9, 10))
SSTYear = SSTYear[SSTKeep]
@

The vector \verb@SSTKeep@ lists the column numbers corresponding to August, September, and October for each year, and the vector \verb@SSTYear@ is the set of years for those months. Next subset \verb@SST.mean@ by \verb@SSTKeep@ and then compute the August--October average for each year.
<<>>=
SST.mean.keep = SST.mean[, SSTKeep]
SST.mean.year = sapply(unique(SSTYear), function(x)
  as.vector(rowMeans(SST.mean.keep[, which(x==SSTYear)])))
dimnames(SST.mean.year) = list(id=rownames(SST.mean.keep),
  Year=paste("Y", unique(SSTYear), sep=""))
SST.mean.year.subset = SST.mean.year[, paste("Y", years, sep="")]
SST.pdf = SpatialPolygonsDataFrame(
  hpg[rownames(SST.mean.year.subset)],
  data.frame(SST.mean.year.subset))
@

The data slot in the spatial polygons data frame \verb@SST.pdf@ contains the average SST during the hurricane season for each year. To list the first few rows and columns, type
<<>>=
slot(SST.pdf, "data")[1:5, 1:6]
@

You can plot the SST data as you did in Chapter~\ref{chap:spatialmodels} using the \verb@spplot@ method. First obtain the map borders in geographic coordinates and project them using the same CRS as your data.
<<>>=
require(maps)
require(maptools)
require(colorRamps)
cl = map("world", xlim=c(-120, 20), ylim=c(-10, 70), plot=FALSE)
clp = map2SpatialLines(cl, proj4string=CRS(ll))
clp = spTransform(clp, CRS(lcc))
l2 = list("sp.lines", clp, col="darkgray")
@

Then obtain a color ramp and plot the values from the year 1959, for example.
<<1959 hurricane season sst, echo=TRUE, eval=FALSE>>=
spplot(SST.pdf, "Y1959", col="white",
  col.regions=blue2red(20), pretty=TRUE,
  colorkey=list(space="bottom"), sp.layout=list(l2),
  sub="Sea Surface Temperature (C)")
@

The plot shows graphically how the data are organized. Next you need to generate a data set of cyclones that corresponds to these hexagons in space and time. First, generate a list of each cyclone's maximum intensity by hexagon using your \verb@get.max@ function and count the number of cyclones per hexagon per year.
<<>>=
source("getmax.R")
Wind.max = lapply(Wind.subset, function(x)
  get.max(x, maxfield="WmaxS"))
Wind.count = t(sapply(Wind.max, function(x)
  table(factor(subset(x, WmaxS >= 33 & Yr %in% years)$Yr,
  level=years))))
colnames(Wind.count) = paste("Y", colnames(Wind.count), sep="")
Wind.count = data.frame(Wind.count)
Wind.pdf = SpatialPolygonsDataFrame(
  hpg[rownames(Wind.count)], Wind.count)
@

As an example, to list the hurricane counts by hexagon for 1959, type
<<>>=
slot(Wind.pdf, "data")["Y1959"]
@

\subsection{Local independent regressions}

With your annual hurricane counts and seasonal SST collocated in each polygon, you build local (hexagon-level) independent Poisson regressions (LIPR). Here `independent' refers to separate regressions, one for each hexagon.
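In symbols, each local regression can be sketched as follows, where $h_{i,t}$ denotes the hurricane count in hexagon $i$ during year $t$ and the covariates are those described next (the $\beta$ notation is ours):
\begin{equation}
h_{i,t} \sim \mathrm{Poisson}(\lambda_{i,t}), \qquad
\log(\lambda_{i,t}) = \beta_{0,i} + \beta_{1,i}\,\mathrm{SST}_{i,t} + \beta_{2,i}\,\mathrm{SOI}_{t} + \beta_{3,i}\, t,
\end{equation}
with a separate set of coefficients estimated for each hexagon.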
Along with the hexagon-level SST variable, you include the SOI as a covariate. The SOI varies by year but not by hexagon. The SOI covariate was organized in Chapter~\ref{chap:datasets} and saved in {\it annual.RData}. Load the annual climate covariates and subset the SOI and SST columns for the years specified in the previous section. These are your regional covariates.
<<>>=
load("annual.RData")
Cov = subset(annual, Year %in% years)[, c("soi", "sst", "Year")]
@

Then create a data frame of your hexagon-level SST covariate and a data frame of the hurricane counts from the data slots in the corresponding spatial polygons data frames.
<<>>=
LSST = slot(SST.pdf, "data")
Count = slot(Wind.pdf, "data")
@

Here you build \Sexpr{dim(LSST)[1]} separate regressions, one for each hexagon. Your annual count, indicating the number of hurricanes whose centers passed through the hexagon, varies by hexagon and by year, and you have local SST and regional SOI as covariates. Both are averages over the months of August--October. The model is a Poisson regression with a logarithmic link function.
<<>>=
lipr = lapply(1:nrow(Count), function(i)
  glm(unlist(Count[i, ]) ~ unlist(LSST[i, ]) + Cov$soi +
  Cov$Year, family="poisson"))
@

The standardized coefficients for the covariates, indicating the strength of their relationship with the annual counts, are saved for each hexagon in the matrix \verb@zvals@, which you turn into a spatial polygons data frame.
<<>>=
zvals = t(sapply(lipr, function(x) summary(x)$coef[, 3]))
rownames(zvals) = rownames(Count)
colnames(zvals) = c("Intercept", "Local.SST", "SOI", "Year")
zvals.pdf = SpatialPolygonsDataFrame(
  hpg[rownames(zvals)], data.frame(zvals))
@

To map the results, first generate a color ramp function, then use the \verb@spplot@ method.
<<>>=
al = colorRampPalette(c("blue", "white", "red"), space="Lab")
spplot(zvals.pdf, c("Local.SST", "SOI"), col.regions=al(20),
  col="white", names.attr=c("SST", "SOI"),
  at=seq(-5, 5),
  colorkey=list(space="bottom", labels=paste(seq(-5, 5))),
  sp.layout=list(l2),
  sub="Standardized Coefficient")
@

\begin{figure}
\centering
<<>>=
sb = trellis.par.get("strip.background")
sb[["col"]][1] = "white"
trellis.par.set("strip.background", sb)
al = colorRampPalette(c("blue", "white", "red"), space="Lab")
p1 = spplot(zvals.pdf, c("Local.SST", "SOI"), col="white",
  col.regions=al(20), at=seq(-5, 5),
  names.attr=c("Local SST", "SOI"),
  colorkey=list(space="bottom", labels=paste(seq(-5, 5))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub=list("Standardized coefficient", cex=1, font=1))
print(p1)
@
\vspace{0cm}
\caption{Poisson regression coefficients of counts on local SST and SOI.}
\label{fig:poissonregressions}
\end{figure}

The maps in Fig.~\ref{fig:poissonregressions} represent \Sexpr{dim(LSST)[1]} independent hypothesis tests, one for each of the hexagons. In general the probability of a hurricane is higher where the ocean is warm and when the SOI is positive (La Ni\~na conditions), a result you would anticipate from your basin-wide models (Chapter~\ref{chap:frequencymodels}). The SST and SOI effects are strongest over the central and western North Atlantic at low latitudes. Curiously, the SST effect is muted along much of the eastern coast of the United States and along portions of the Mexican coast northward through Texas. This helps explain statistically why, despite warming seas, U.S. hurricane counts do not show an increase over the past century and a half. Your model contains the year as a covariate to address changes in occurrence rates over time.
The exponent of the coefficient on the year term is the factor by which the occurrence rates have been changing per year. The factor is shown for each hexagon in Fig.~\ref{fig:annualratechange}.

\begin{figure}
\centering
<<>>=
rcf = cbind(sapply(lipr, function(x)
  exp(summary(x)$coef["Cov$Year", 1])))
rownames(rcf) = rownames(Count)
colnames(rcf) = c("RCF")
rcf.pdf = SpatialPolygonsDataFrame(
  hpg[rownames(rcf)], data.frame(rcf))
al = colorRampPalette(c("blue", "white", "red"), space="Lab")
p1 = spplot(rcf.pdf, c("RCF"), col="white",
  col.regions=al(20), at=seq(.98, 1.02, .005),
  colorkey=list(space="bottom",
  labels=paste(seq(.98, 1.02, .005))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub=list("Hurricane rate change factor [/year]", cex=1, font=1))
print(p1)
@
\vspace{0cm}
\caption{Factor by which hurricane rates have changed per year.}
\label{fig:annualratechange}
\end{figure}

The factors range between 0.98 and 1.02 annually depending on location. A factor of one indicates no change, less than one a decreasing trend, and greater than one an increasing trend. Since the data cover the period beginning in 1886, the increasing trends over the central and eastern North Atlantic might be due in part to better surveillance over time. However, the downward trend over a large part of the Caribbean Sea into the Gulf of Mexico is intriguing. It might be related to increasing wind shear or continental aerosols. In fact, you can see the downward trend along parts of the U.S. coastline from Louisiana to South Carolina and over New England.

Model residuals, defined as the observed count minus the predicted rate for a given year and hexagon, should also be mapped. Here you compute and map the residuals for the 2005 hurricane season. First compute the residuals by typing
<<>>=
preds = t(sapply(lipr, function(x) predict(x, type="response")))
rownames(preds) = rownames(Count)
err2005 = Count[, "Y2005"] - preds[, "Y2005"]
err.pdf = SpatialPolygonsDataFrame(
  hpg[names(err2005)], data.frame(err2005))
@

Then select a color ramp function and create a choropleth map with the \verb@spplot@ method.
<<>>=
al = colorRampPalette(c("blue", "white", "red"), space="Lab")
spplot(err.pdf, c("err2005"), col="white",
  col.regions=al(20), at=seq(-5, 5, 1),
  colorkey=list(space="bottom", labels=paste(seq(-5, 5, 1))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub="Observed [count] - Predicted [rate]")
@

\begin{figure}
\centering
<<>>=
al = colorRampPalette(c("blue", "white", "red"), space="Lab")
p1 = spplot(err.pdf, c("err2005"), col="white",
  col.regions=al(20), at=seq(-5, 5, 1),
  colorkey=list(space="bottom", labels=paste(seq(-5, 5, 1))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub=list("Observed [count] - predicted [rate]", cex=1, font=1))
print(p1)
@
\vspace{0cm}
\caption{Model residuals for the 2005 hurricane season.}
\label{fig:residuals2005}
\end{figure}

Figure~\ref{fig:residuals2005} shows where the model over-predicts (blues) and under-predicts (reds) for the 2005 hurricane season. The model under-predicted the large amount of hurricane activity over the western Caribbean and Gulf of Mexico.

\subsection{Spatial autocorrelation}

Note that the residuals tend to be spatially correlated. Residuals in neighboring hexagons tend to be more similar than residuals in hexagons farther away (see Chapter~\ref{chap:spatialmodels}). To quantify the degree of spatial correlation you create a weights matrix indicating the spatial neighbors for each hexagon.
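For reference, the statistic you compute below is Moran's $I$, which for values $y_i$ and spatial weights $w_{ij}$ is
\begin{equation}
I = \frac{n}{S_0}\,\frac{\sum_{i=1}^n \sum_{j=1}^n w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^n (y_i - \bar{y})^2},
\end{equation}
where $n$ is the number of hexagons and $S_0 = \sum_i \sum_j w_{ij}$ is the global sum of the weights. This is why the \verb@moran@ function used below needs both $n$ and $S_0$ as arguments.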
The {\bf spdep} package \citep{Bivand2011} has functions for creating weights based on contiguity neighbors. First, you use the \verb@poly2nb@ function ({\bf spdep}) on the spatial polygons data frame to create a contiguity-based neighborhood list object.
<<>>=
require(spdep, quietly=TRUE)
hexnb = poly2nb(err.pdf)
@

The list is ordered by hexagon number starting with the southwest-most hexagon. That hexagon has \Sexpr{length(hexnb[[1]])} neighbors: hexagon numbers \Sexpr{hexnb[[1]][1]}, \Sexpr{hexnb[[1]][2]}, and \Sexpr{hexnb[[1]][3]}. Hexagon numbers increase to the west and north. A hexagon has at most six contiguous neighbors. Hexagons at the borders have fewer neighbors. A graph of the hexagon connectivity, here defined by first-order contiguity, is made by typing
<<>>=
plot(hexnb, coordinates(err.pdf))
plot(err.pdf, add=TRUE)
@

A summary method applied to the neighborhood list (\verb@summary(hexnb)@) reveals the average number of neighbors and the distribution of connectivity among the hexagons.

You turn the neighborhood list object into a \verb@listw@ object using the \verb@nb2listw@ function, which adds weights to the neighborhood list. The \verb@style@ argument determines the weighting scheme. With the argument value set to \verb@W@, the weights are the inverse of the number of neighbors.
<<>>=
wts = nb2listw(hexnb, style="W")
@

Next you quantify the amount of spatial correlation through the value of Moran's $I$. This is done using the function \verb@moran@ ({\bf spdep}). The first argument is the variable of interest followed by the name of the \verb@listw@ object. You also need the number of hexagons and the global sum of the weights, which is obtained using the \verb@Szero@ function.
<<>>=
n = length(err.pdf$err2005)
s = Szero(wts)
mI = moran(err.pdf$err2005, wts, n=n, S0=s)$I
@

The function returns a value for Moran's $I$ of \Sexpr{round(mI,2)}, which indicates spatial autocorrelation in the model residuals for 2005. The expected value of Moran's $I$ under the hypothesis of no spatial autocorrelation is $-1/(n-1)$, where $n$ is the number of hexagons. The autocorrelation indicates that the model can be improved by including a term for the spatial correlation. Other years have more or less similar autocorrelation, which in and of itself might be a useful diagnostic of hurricane climate variability.
<<>>=
y = err.pdf$err2005
Wy = lag.listw(wts, y)
par(las=1, pty="s")
plot(y, Wy, pch=20, xlab="Prediction Error",
  ylab="Neighborhood Avg Prediction Error")
abline(lm(Wy ~ y), lwd=2, col="red")
@

\subsection{BUGS data}

Next you build a spatial regression model. The model is motivated by the fact that the residuals from your non-spatial regressions above are spatially correlated and that hurricanes are rather infrequent in most hexagons. The spatial model lets you borrow information from neighboring hexagons. The model is written in BUGS (see Chapter~\ref{chap:bayesianstatistics}). Here you will run BUGS (WinBUGS or OpenBUGS) outside of R. First you convert your neighborhood list object to BUGS format. Then you gather the hurricane counts and covariates as lists and put them all together in an object called \verb@BUGSData@.
<<>>=
hexadj = nb2WB(hexnb)
BUGSData = c(list(S=nrow(Wind.count),
  T=ncol(Wind.count),
  h=as.matrix(Wind.count),
  SST=as.matrix(SST.mean.year.subset),
  SOI=Cov$soi), hexadj)
@

For each hexagon (\verb@S@ of them) and year (\verb@T@ of them) there is a cyclone count (\verb@h@) that indicates the number of hurricanes. The associated covariates are the SOI and the local SST.
The local SST is constructed by averaging the August--October monthly gridded SST over each hexagon for each year. For each hexagon, the neighborhood lists indicate the IDs of the neighboring hexagons (adjacency \verb@adj@), the weights for the neighbors (weights \verb@weights@), and the number of neighbors (number \verb@num@). Finally you use \verb@writeDatafileR@ from the file {\bf writedatafileR.R} to write an ASCII text representation of the data to your working directory.
<<>>=
source("writedatafileR.R")
writeDatafileR(BUGSData, "BugsData.txt")
@

\subsection{MCMC output}

Counts in each hexagon for each year are described by a Poisson distribution (\verb@dpois@) with a rate (\verb@lambda@) that depends on hexagon and year. The logarithm of the rate is conditional on the local SST and the SOI. The error term (\verb@error@) and the local effect terms include structured and unstructured components following \cite{BesagEtAl1991}, where the structured component is an intrinsic conditional autoregressive (ICAR) specification. The adjacency matrix (\verb@adj@), which gives your contiguity neighborhood as defined above, is part of the ICAR specification. The BUGS code for the model is
\begin{verbatim}
___BUGS code___
model {
for(hx in 1:S) {
  for(yr in 1:T) {
# Poisson likelihood for observed counts
    h[hx, yr] ~ dpois(lambda[hx, yr])
    log(lambda[hx, yr]) <- sst[hx] * SST[hx, yr] + soi[hx] * SOI[yr] + error[hx]
    }
# Error terms
  error[hx] <- u.error[hx] + s.error[hx]
  sst[hx] <- u.sst[hx] + s.sst[hx]
  soi[hx] <- u.soi[hx] + s.soi[hx]
# Unstructured errors
  u.error[hx] ~ dnorm(int, tau)
  u.sst[hx] ~ dnorm(int.sst, tau.sst)
  u.soi[hx] ~ dnorm(int.soi, tau.soi)
  }
# Structured errors
s.error[1:S] ~ car.normal(adj[], weights[], num[], tau.s)
s.sst[1:S] ~ car.normal(adj[], weights[], num[], tau.s.sst)
s.soi[1:S] ~ car.normal(adj[], weights[], num[], tau.s.soi)
# Priors
int ~ dflat()
int.sst ~ dflat()
int.soi ~ dflat()
tau ~ dgamma(.5, .005)
tau.sst ~ dgamma(.5, .005)
tau.soi ~ dgamma(.5, .005)
tau.s ~ dgamma(.5, .005)
tau.s.sst ~ dgamma(.5, .005)
tau.s.soi ~ dgamma(.5, .005)
}
_______________
\end{verbatim}

Again, think of the model as a directed acyclic graph (DAG) as shown in Chapter~\ref{chap:bayesianstatistics}. Nodes are the parameters and data, and the arrows indicate conditional dependency, either stochastic (\verb@~@) or deterministic (\verb@<-@). Uninformative (flat) priors are used for the intercept terms. The gamma distribution is used for the priors on the precisions. A shape parameter of 0.5 and a rate of 0.005 translates to a 1\% probability that the standard deviation is less than 0.04 and a 1\% probability that the standard deviation is greater than 8 (a quick R check of this claim follows the initial values below). It would also be reasonable to combine the models into a multivariate CAR model with a Wishart prior, say with 4 degrees of freedom and a diagonal of 0.05.

Open BUGS (outside R) and copy the model code into a new BUGS document, saving it as \verb@BUGSmodel.odc@. Open the \verb@BugsData.txt@ file and copy its entire contents into your BUGS document. Finally copy the initial values below into the same document.
\begin{verbatim}
___Initial Values___
list(int=0, int.sst=0, int.soi=0, tau=100, tau.sst=100,
  tau.soi=100, tau.s=100, tau.s.sst=100, tau.s.soi=100)
list(int=.5, int.sst=.5, int.soi=.5, tau=100, tau.sst=100,
  tau.soi=100, tau.s=100, tau.s.sst=100, tau.s.soi=100)
____________________
\end{verbatim}
It might be necessary to specify high values for the precisions since you need BUGS to generate realistic samples of the uninitialized parameters.
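As the quick check promised above (not part of the BUGS workflow), you can translate the \verb@dgamma(.5, .005)@ prior on a precision $\tau$ into the implied prior quantiles of the standard deviation $1/\sqrt{\tau}$ directly in R.
<<>>=
# Prior quantiles of the standard deviation 1/sqrt(tau) implied by
# tau ~ dgamma(.5, .005): about a 1% chance that the standard
# deviation is below 0.04 and a 1% chance that it exceeds 8.
1/sqrt(qgamma(c(.99, .01), shape=.5, rate=.005))
@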
You use two sets of initial values (two separate \verb@list@ objects) to help monitor and diagnose convergence toward the posterior density. Each set will result in a separate chain of samples. Here you specify separate chains using different values for the intercept terms.

First select Model $>$ Specification to open the specification tool. Highlight the word \verb@model@ in your document then select \verb@check model@. It will tell you in the lower left corner of the window that your model is syntactically correct. Next, highlight the word \verb@list@ in front of your data list and select \verb@load data@. Then change the number of chains to two and select \verb@compile@. It will tell you that the data are loaded and the model is compiled. Next, highlight the word \verb@list@ in front of your first set of initial values and select \verb@load inits@. It will tell you that this chain contains uninitialized values. Repeat for your second set of initial values. It will again tell you about uninitialized values. Finally, select \verb@gen inits@ to initialize the remaining values (for both chains).

Next you tell BUGS what model parameters (nodes) you want to monitor. Select Inference $>$ Samples to open the sample monitor tool. In the \verb@node@ window type \verb@sst@ then select \verb@set@. These parameters are the coefficients on the SST variable. Repeat for the coefficients on the SOI variable by typing \verb@soi@ and selecting \verb@set@. Note that there are \Sexpr{n} SST and SOI coefficients, one of each for each hexagon.

Next select Model $>$ Update to open the update tool. In the \verb@update@ window type 5000, and in the \verb@refresh@ window type 10. Although you are monitoring only the SST and SOI coefficients, each update (iteration) generates new values for all parameters in your model using an MCMC algorithm (see Chapter~\ref{chap:bayesianstatistics}). Select \verb@update@. This will take time. The refresh number indicates the progress in intervals of 10 updates.

After the MCMC has finished updating, you output the monitored parameter values from both chains. In the \verb@node@ window, scroll to \verb@sst@, then select \verb@coda@. CODA is a suite of functions for analyzing output from BUGS software. Save the samples and index files separately (use the extension \verb@txt@). The index file provides the order of the samples within the samples file.

\subsection{Convergence and mixing}

Before using your samples for inference, you need to check a few things. Starting from your initial values, the MCMC algorithm produces a new set of values (the first update sample) for each parameter (node). In most cases (except in simple models), these new values will not represent the posterior density. As updating continues, however, the MCMC samples are guaranteed to {\it converge} to a stationary distribution representing samples from your posterior. It is useful to know the minimum number of updates needed until convergence. The period before convergence is called `burn-in,' during which the samples are moving away from the initial set of values toward the posterior distribution.

You can analyze your MCMC samples in BUGS directly, but there are more options in R. Input the CODA files corresponding to the SST parameters using the \verb@read.coda@ function ({\bf coda}). The first argument is the name of the samples file and the second is the name of the index file. Your new objects ({\tt chain1} and {\tt chain2}) have class {\tt mcmc}.
<<>>=
require(coda)
chain1 = read.coda("Chain1.txt", "Index.txt", quiet=TRUE)
chain2 = read.coda("Chain2.txt", "Index.txt", quiet=TRUE)
@

To plot the MCMC samples of the SST coefficient in the first hexagon (hexagons are ordered from southwest to northeast) from both chains, type
<<>>=
traceplot(chain1[, "sst[1]"], ylim=c(-.3, .7))
traceplot(chain2[, "sst[1]"], col="red", add=TRUE)
@

\begin{figure}
\centering
<<>>=
par(las=1, mar=c(5, 6, 4, 2) + .1, mgp=c(2, .4, 0), tcl=-.3)
traceplot(chain1[, "sst[1]"], ylim=c(-.3, .7),
  ylab="$\\beta_{SST}$ [/$^\\circ$C]")
traceplot(chain2[, "sst[1]"], col="red", add=TRUE)
legend("topright", legend=c("Chain 1", "Chain 2"),
  col=1:2, lty=1, lwd=2, bg="white")
@
\vspace{-.75cm}
\caption{MCMC from a space-time model of cyclone counts in hexagon one.}
\label{fig:traceplots1}
\end{figure}

This generates a graph showing the sequence of values (trace plot) from the first (black) and second (red) chains (Fig.~\ref{fig:traceplots1}). Values fluctuate from one iteration to the next but tend toward a stable distribution after about 3000 updates. This tendency toward convergence is even more apparent when comparing the two trace plots. Initially the values from chain one are quite different from those of chain two, but after about 4000 iterations the distributions are visually indistinguishable. From this analysis you estimate the model needs about 4000 iterations to forget its initial values.

You quantify MCMC convergence with the potential scale reduction factor (PSRF) proposed by \cite{GelmanRubin1992}. At convergence your two chains, started with different initial conditions, should represent samples from the same distribution. You assess this by comparing the mean and variance of each chain to the mean and variance of the {\it combined} chain. Specifically, with two chains the between-chain variance $B/n$ and pooled within-chain variance $W$ are defined by
\begin{equation}
\frac{B}{n} = \frac{1}{2-1} \sum_{j=1}^2 (\bar s_{j.} - \bar s_{..})^2
\end{equation}
and
\begin{equation}
W = \frac{1}{2(n-1)} \sum_{j=1}^2 \sum_{t=1}^n (s_{jt} - \bar s_{j.})^2,
\end{equation}
where $s_{jt}$ is the parameter value of the $t$th sample in the $j$th chain, $\bar s_{j.}$ is the mean of the samples in chain $j$, and $\bar s_{..}$ is the mean of the combined chains. By taking the sampling variability of the combined mean into account you get a pooled estimate for the variance:
\begin{equation}
\hat V = \frac{n-1}{n} W + \frac{B}{n}.
\end{equation}
An estimate $\hat R$ of the PSRF is then obtained by dividing the pooled variance by the pooled within-chain variance,
\begin{equation}
\hat R = \frac{\hat V}{W} = \frac{n-1}{n} + \frac{B}{nW}.
\end{equation}
If the chains have not converged, Bayesian credible intervals based on the $t$-distribution are too wide and have the potential to shrink by the potential scale reduction factor. Thus the PSRF is sometimes called the shrink factor. A value of $\hat R$ is available for each monitored node using the \verb@gelman.diag@ function ({\bf coda}). Values substantially above one indicate lack of convergence. For example, to get the PSRF estimate on the SST parameter for the first hexagon, type
<<>>=
gelman.diag(mcmc.list(chain1[, "sst[1]"], chain2[, "sst[1]"]))
@

By default only the second half of each chain is used. The point estimate indicates near convergence, consistent with the evidence in the trace plots.
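To connect the \verb@gelman.diag@ output with the equations above, here is a minimal hand computation of $\hat R$ for the same node, using (as \verb@gelman.diag@ does by default) only the second half of each chain. The value will differ slightly from the function's point estimate, which includes an additional degrees-of-freedom correction.
<<>>=
# Hand computation of the PSRF for sst[1] following the equations above.
s1 = as.numeric(chain1[, "sst[1]"])
s2 = as.numeric(chain2[, "sst[1]"])
keep = (floor(length(s1)/2) + 1):length(s1)
s1 = s1[keep]; s2 = s2[keep]
m = length(s1)                                     # samples per chain
sbar = mean(c(s1, s2))                             # combined-chain mean
B.n = (mean(s1) - sbar)^2 + (mean(s2) - sbar)^2    # between-chain variance B/n
W = (sum((s1 - mean(s1))^2) +
     sum((s2 - mean(s2))^2)) / (2 * (m - 1))       # within-chain variance
V = (m - 1)/m * W + B.n                            # pooled variance estimate
V/W                                                # Rhat
@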
To see the evolution of $\hat R$ as the number of iterations increases, type
<<>>=
gelman.plot(mcmc.list(chain1[, "sst[1]"], chain2[, "sst[1]"]))
@

The plot indicates convergence after about 3000 iterations.
%\begin{figure}
%\centering
<<>>=
par(las=1, mar=c(7, 6, 7, 2) + .1, mgp=c(2, .4, 0), tcl=-.3)
source("gelmanplot2.R")
gelman.plot2(mcmc.list(chain1[, "sst[1]"], chain2[, "sst[1]"]),
  ylab="Gelman-Rubin statistic\n shrink factor",
  xlab="Last iteration in the chain",
  lty=1, lwd=2, col=1:2)
@
%\vspace{-1cm}
%\caption{Gelman-Rubin statistic for the SST coefficient in hexagon one.}
%\label{fig:gelmanrubinstatistic}
%\end{figure}

Convergence does not guarantee that your samples have visited all (or even a large portion) of the posterior density. The speed with which your samples work their way through the posterior (mixing) depends on how far they advance from one update to the next. The quality of the mixing is reflected in how quickly the between-sample correlation decays as a function of sample lag. Sharp decay of the correlation indicates good mixing. The autocorrelation function shows the correlation as a function of consecutive sample lag. Here you use the \verb@acf@ function on the chain to examine the correlation by typing
<<>>=
acf(chain1[, "sst[1]"], ci=0, lag.max=2000)
@

\begin{figure}
\centering
<<>>=
par(mfrow=c(1, 2), pty="s", las=1, mgp=c(2, .4, 0), tcl=-.3)
acf(chain1[, "sst[1]"], lag.max=2000, ci=0,
  xlab="Lag [iteration]", main="", ylab="Autocorrelation")
mtext("a", side=3, line=1, adj=0, cex=1.1)
acf(chain1[, "sst[60]"], lag.max=2000, ci=0,
  xlab="Lag [iteration]", main="", ylab="Autocorrelation")
mtext("b", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-1.25cm}
\caption{Autocorrelation of the SST coefficient in hexagon (a) one and (b) 60.}
\label{fig:acfMCMCsamples}
\end{figure}

The results for the SST coefficient in hexagon one and hexagon 60 are shown in Fig.~\ref{fig:acfMCMCsamples}. Depending on the hexagon, the positive correlation decays to near zero only after several hundred samples. You use the effective chain size to quantify this decay. Because of the between-sample correlation, your effective chain size is less than the 5K updates. You use the \verb@effectiveSize@ function ({\bf coda}) to estimate the effective chain size by typing
<<>>=
effectiveSize(chain1[, "sst[1]"])
effectiveSize(chain1[, "sst[60]"])
@

The effective size is an estimate of how many independent samples you have given the amount of autocorrelation. The slower the decay of the autocorrelation function, the smaller the effective size. For chain one the effective size on the SST coefficient is \Sexpr{round(effectiveSize(chain1[, "sst[1]"]),0)} for the first hexagon and \Sexpr{round(effectiveSize(chain1[, "sst[60]"]),0)} for hexagon 60. These numbers are too small to make reliable inferences, so you need more updates. Your goal is 1000 independent samples. Since a conservative estimate of your effective sample size is 25 per 5K updates, you need 200K updates to reach your goal.

\subsection{Updates}

First return to BUGS, create a single chain, and generate 205K updates. This time you monitor both the SST and SOI coefficients. After updating (this will take several hours),\footnote{200K updates took 10.5 hr on a 2 $\times$ 2.66 GHz dual-core Intel Xeon processor running OS X version 10.6.8.} you output every 200th sample over the last 200K updates. This is done with the sample monitor tool by setting \verb@beg@ equal to 5001 and \verb@thin@ equal to 200.
Save the index files and chain values for both coefficients, then input them into R.
<<>>=
sstChain = read.coda("SSTChain.txt", "SSTIndex.txt", quiet=TRUE)
soiChain = read.coda("SOIChain.txt", "SOIIndex.txt", quiet=TRUE)
@

You create a trace plot of your samples for hexagon one by typing
<<>>=
traceplot(sstChain[, "sst[1]"], ylim=c(-.1, .3))
@

\begin{figure}
\centering
<<>>=
par(mfrow=c(2, 1), mex=.7, las=1, mgp=c(2.3, .4, 0), tcl=-.3)
traceplot(sstChain[, "sst[1]"], ylim=c(-.1, .3),
  ylab="$\\beta_{SST}$ [/$^\\circ$C]")
mtext("a", side=3, line=1, adj=0, cex=1.1)
traceplot(sstChain[, "sst[60]"], ylim=c(-.1, .3),
  ylab="$\\beta_{SST}$ [/$^\\circ$C]")
mtext("b", side=3, line=1, adj=0, cex=1.1)
@
\vspace{-.5cm}
\caption{MCMC trace of the SST coefficients for hexagon (a) one and (b) 60.}
\label{fig:traceplots2}
\end{figure}

Note the apparent stationarity (see Fig.~\ref{fig:traceplots2}). Also note that since you saved only every 200th sample, the variation from one value to the next is much higher. You use these samples to make inferences. For example, the probability that the SST coefficient for hexagon one is greater than zero is obtained by typing
<<>>=
sum(sstChain[, "sst[1]"] > 0)/
  length(sstChain[, "sst[1]"]) * 100
@

You interpret the exponentiated coefficient of the Poisson regression as the factor by which the occurrence rate increases (or decreases) per unit of the explanatory variable, given that the other covariates are held constant. This is a relative risk, the ratio of two probabilities. The relative risk of hurricanes per $^\circ$C in hexagon one is estimated using the posterior mean as
<<>>=
exp(mean(sstChain[, "sst[1]"]))
@

A relative risk of one indicates no increase or decrease in occurrence probability relative to the long-term average. So you interpret the value of \Sexpr{round(exp(mean(sstChain[, "sst[1]"])),2)} to mean an \Sexpr{round((exp(mean(sstChain[, "sst[1]"]))-1)*100,0)}\% increase per $^\circ$C, with only a \Sexpr{(1-sum(sstChain[, "sst[1]"]>0)/length(sstChain[, "sst[1]"]))*100}\% chance that the relative risk is one or less (no increase).

\subsection{Relative risk maps}

Here you map the relative risk of hurricanes. This time you estimate the risk using the posterior median, and you do this for all hexagons by typing
<<>>=
RRsst = exp(apply(sstChain, 2, function(x) median(x)))
@

Next, create a spatial polygons data frame of the relative risk with row names equal to your hexagon ids.
<<>>=
RRsst.df = data.frame(RRsst)
rownames(RRsst.df) = rownames(Wind.count)
RRsst.pdf = SpatialPolygonsDataFrame(
  hpg[rownames(RRsst.df)], RRsst.df)
@

Then choose a color ramp and create a choropleth map.
<<>>=
al = colorRampPalette(c("#FEE8C8", "#FDBB84", "#E34A33"),
  space="Lab")
spplot(RRsst.pdf, col="white", col.regions=al(20),
  at=seq(1, 1.25, .05),
  colorkey=list(space="bottom",
  labels=paste(seq(1, 1.25, .05))),
  sp.layout=list(l2),
  sub="Relative Hurricane Risk [/C]")
@

\begin{figure}
\centering
<<>>=
require(grid)
p1 = spplot(RRsst.pdf, col="white", col.regions=al(20),
  at=seq(1, 1.25, .05),
  colorkey=list(space="bottom",
  labels=paste(seq(1, 1.25, .05))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub=list("Relative risk [/$^\\circ$C]", cex=1, font=1))
RRsoi = exp(apply(soiChain, 2, function(x) median(x)))
RRsoi.df = data.frame(RRsoi)
rownames(RRsoi.df) = rownames(Wind.count)
RRsoi.pdf = SpatialPolygonsDataFrame(hpg[rownames(RRsoi.df)],
  RRsoi.df)
al2 = colorRampPalette(c("#E7E1EF", "#C994C7", "#DD1C77"),
  space="Lab")
p2 = spplot(RRsoi.pdf, col="white", col.regions=al2(20),
  at=seq(1, 1.1, .02),
  colorkey=list(space="bottom",
  labels=paste(seq(1, 1.1, .02))),
  sp.layout=list(l2),
  par.settings=list(fontsize=list(text=10)),
  sub=list("Relative risk [/s.d.]", cex=1, font=1))
p1 = update(p1, main=textGrob("a", x=unit(.05, "npc")))
p2 = update(p2, main=textGrob("b", x=unit(.05, "npc")))
print(p1, split=c(1, 1, 2, 1), more=TRUE)
print(p2, split=c(2, 1, 2, 1), more=FALSE)
@
\vspace{-.75cm}
\caption{Hurricane risk per change in (a) local SST and (b) SOI.}
\label{fig:riskchange}
\end{figure}

Figure~\ref{fig:riskchange} maps the hurricane risk relative to a per $^\circ$C change in local SST and a per s.d. change in SOI. Hurricane occurrence is most sensitive to rising ocean temperatures over the eastern tropical North Atlantic and less so across the western Caribbean Sea, the Gulf of Mexico, and much of the coastlines of Central America, Mexico, and the United States. In comparison, hurricane occurrence is most sensitive to ENSO over much of the Caribbean and less so over the central and northeastern North Atlantic.

This chapter demonstrated Bayesian models for hurricane climate research. We began by showing how to combine information about hurricane counts from the modern and historical cyclone archives to get a baseline estimate of future activity over the next several decades. We then showed how to create a Bayesian model for seasonal forecasts, where the model is based on an MCMC algorithm that allows you to exploit the older, less reliable cyclone information. We showed how to create a consensus model for seasonal prediction based on Bayesian model averaging, an approach that circumvents the need to choose a single `best' model. Finally, we showed how to create a hierarchical model that exploits the space-time nature of hurricane activity over the years. Bayesian hierarchical models like this will help us better understand hurricane climate.