Bayesian Models

“Errors using inadequate data are much less than those using no data at all.”—Charles Babbage

data files used: read.table(“US.txt”, T); load(“annual.RData”); load(“best.use.RData”); load(“ncdataframe.RData”); read.coda(“Chain1.txt”, “Index.txt”, quiet=TRUE); read.coda(“Chain2.txt”, “Index.txt”, quiet=TRUE); read.coda(“SSTChain.txt”, “SSTIndex.txt”, quiet=TRUE); read.coda(“SOIChain.txt”, “SOIIndex.txt”, quiet=TRUE)

required packages: rjags, fields, BMA, sp, rgdal, maps, maptools, colorRamps, spdep, coda, bootstrap, xtable

source code: source(“bic.glm.R”); source(“imageplot2.R”); source(“imageplot3.R”); source(“prediction.R”); source(“getmax.R”); source(“writedatafileR.R”)

Information about past hurricanes is available from instruments and written accounts. Written accounts are generally less precise than instrumental observations, which tend to become even more precise as technology advances. Here we show you how to build Bayesian models that make use of the available information while accounting for differences in levels of precision.

Long-Range Outlook

You begin with a model for predicting hurricane activity over the next three decades. This climatology model is useful as a benchmark for climate change studies. The methodology is originally presented in Elsner and Bossak (2001) based on the formalism given by Epstein (1985).

Poisson-gamma conjugate

It is useful to consider the arrival of hurricanes on the coast as a stochastic process, where the annual counts are described by a Poisson distribution. The Poisson distribution is a limiting form of the binomial distribution with no upper bound on the number of occurrences and where the parameter \( \lambda \) characterizes the rate process.

Knowledge of \( \lambda \) allows you to make statements about future hurricane frequency. Since the process is stochastic your statements will be given in terms of probabilities.

For example, the probability of \( \hat h \) hurricanes occurring over the next \( T \) years (e.g., 1, 5, 20, etc) is \[ f(\hat h | \lambda, T) = \hbox{exp}(-\lambda T){(\lambda T) ^h \over h!} \hspace{0.2cm} \hbox{for} \hspace{0.2cm} h=0, 1, \ldots, \hspace{0.2cm} \lambda > 0, \hspace{0.2cm} \hbox{and} \hspace{0.2cm} T > 0. \] The hat notation is used to indicate future values.

The parameter \( \lambda \) and statistic \( T \) appear in the formula as a product, which is the mean and variance of the distribution. Knowledge about \( \lambda \) can come from historical written archives and instrumental records. Clearly you want to use as much of this information as possible before inferring something about future activity.

This requires you to treat \( \lambda \) as a parameter that can be any positive real number, rather than as a fixed constant. One form for expressing your judgement about \( \lambda \) is through the gamma distribution. The numbers that are used to estimate \( \lambda \) from a set of data are the time interval \( T' \) and the number of hurricanes \( h' \) that occurred during this interval.

The prime notation indicates prior (here, earlier) information. For instance, observations from the hurricane record since 1851 indicate 15 hurricanes over the first ten years, so \( T'=10 \) and \( h'=15 \). To verify this type

con = ""
H = read.table(con, header = TRUE)
## [1] 15

The gamma distribution of possible future values for \( \lambda \) is given by \[ f(\hat \lambda | h', T') = {T'^{h'}\lambda^{h'-1} \over \Gamma (h')} \hbox{exp}(-\lambda T'), \] with the expected value E(\( \hat \lambda \)) = \( h'/T' \), and the gamma function \( \Gamma (x) \) given by \[ \Gamma (x) = \int^{\infty}_0 t^{x-1} \hbox{e}^{-t} \hbox{d}t. \]

Of importance here is the fact that if the probability density on \( \hat \lambda \) is a gamma distribution, with initial numbers (prior parameters) \( h' \) and \( T' \), and the numbers \( h \) and \( T \) are later observed, then the posterior density of \( \hat \lambda \) is also gamma with parameters \( h+h' \) and \( T+T' \). In other words the gamma density is the conjugate prior for the Poisson rate \( \lambda \).

Prior parameters

This gives you a convenient way to combine earlier, less reliable information with later information. You simply add the prior parameters \( h' \) and \( T' \) to the sample numbers \( h \) and \( T \) to get the posterior parameters.

But how do you estimate the prior parameters? You have actual values but the values could be too low (or too high) due to missing or misclassified information. One way is to use bootstrapping. Bootstrapping is sampling from your sample (resampling) to provide an estimate of the variation about your statistic of interest.

Here you use the bootstrap() function in the bootstrap package (Tibshirani and Leisch 2007) to obtain a confidence interval about \( \lambda \) from data before 1899.

First load the package and save a vector of the counts over the earlier period of record. Then to get a bootstrap sample of the mean use the bootstrap() function on this vector of counts.

early = H$All[H$Year < 1899]
bs = bootstrap(early, theta = mean, nboot = 1000)

To obtain a 90% bootstrapped confidence interval about the mean, type

qbs = quantile(bs$thetastar, prob = c(0.05, 0.95))
##    5%   95% 
## 1.417 2.104

Although you cannot say with certainty what the true hurricane rate was over this early period, you can make a sound judgement that you are 90% confident that the interval contains it. In other words you are willing to admit a 5% chance that the true rate is less than 1.42 and a 5% chance that it is greater than 2.1.

Given this appraisal about the early hurricane rate you need to obtain an estimate of the parameters of the gamma distribution. Said another way, given your 90% confidence interval for the rate what is your best estimate for the number of hurricanes and the length of time over which those hurricanes occurred?

You do this with the optimization function optim(). You start by creating an objective function defined as the absolute value of the difference between gamma quantiles and your target quantiles.

obj = function(x) {
    sum(abs(pgamma(q = qbs, shape = x[1], rate = x[2]) - c(0.05, 0.95)))

You then apply the optimization function starting with reasonable initial values for the gamma parameters given in the par argument and save the solution in the vector theta.

theta = optim(par = c(2, 1), obj)$par

Store these parameters as separate objects by typing

hp = theta[1]
Tp = theta[2]

The above procedure quantifies your judgement about hurricanes prior to the reliable set of counts. It does so terms of the shape and rate parameter of the gamma distribution.

Posterior density

Now you have two distinct pieces of information from which to obtain a posterior distribution for \( \lambda \) (landfall rate). Your prior parameters \( h' \) = 69.7 and \( T' \) = 39.9 from above and your likelihood statistics based on the data over the reliable period of record (1899–2010).

The total number of hurricanes over this reliable period and the record length are

late = H$All[H$Year >= 1899]
h = sum(late)
T = length(late)
## [1] 187
## [1] 112

The posterior parameters are therefore \( h'' = h+h' \) = 256.7 and \( T'' = T+T' \) = 151.9. Note that although the likelihood parameters \( h \) and \( T \) must be integers, the prior parameters can take on any real value depending on your degree of belief.

Since the prior, likelihood, and posterior are all in the same gamma family, you can use the dgamma() function to compute the densities.

par(las = 1, mgp = c(2, 0.4, 0), tcl = -0.3)
curve(dgamma(x, shape = h + hp, rate = T + Tp), from = 1, to = 3, xlab = "Landfall rate [hur/yr]", 
    ylab = "Density", col = 1, lwd = 4)
curve(dgamma(x, shape = h + hp, rate = T + Tp), add = TRUE, col = 1, lwd = 4)
curve(dgamma(x, shape = h, rate = T), add = TRUE, col = 2, lwd = 4)
curve(dgamma(x, shape = hp, rate = Tp), add = TRUE, col = 3, lwd = 4)
legend("topright", c("Prior", "Likelihood", "Posterior"), col = c(3, 2, 1), 
    lwd = c(3, 3, 3), bg = "white")

plot of chunk priorLikePost

The posterior density resembles the likelihood but is shifted in the direction of the prior. It is also narrower. The posterior is a weighted average of the prior and the likelihood where the weights are proportional to the precision. The greater the precision, the more weight it carries in determining the posterior.

The relatively broad density on the prior estimate indicates low precision. Combining the prior and likelihood results in a posterior distribution that represents your best information about \( \lambda \).

Predictive distribution

The information you have about \( \lambda \) is codified in the two parameters \( h'' \) and \( T'' \) of the gamma density. Of practical interest is how to use the information to predict future hurricane activity.

The answer lies in the fact that the predictive density for observing \( \hat h \) hurricanes over the next \( \hat T \) years is a negative binomial distribution, with parameters \( h'' \) and \( {T'' \over \hat T + T''} \) given by \[ f\Bigl(\hat h | h'', {T'' \over \hat T + T''}\Bigr) = {\Gamma(\hat h + h'') \over \Gamma (h'') \hat h!}\Bigl[{T'' \over \hat T + T''}\Bigr]^{h''}\Bigl[{\hat T \over \hat T + T''}\Bigr]^{\hat h}. \]

The mean and variance of the negative binomial are \( \hat T{h'' \over T''} \) and \( \hat T{h'' \over T''}\bigl({\hat T + T'' \over T''}\bigr) \), respectively. Note that the variance of the predictive distribution is larger than it would be if \( \lambda \) were known precisely.

If you are interested in the climatological probability of a hurricane {\it next} year, then \( \hat T \) is one and small compared with \( T'' \) so it makes little difference, but if you're interested in the distribution of hurricane activity over the next 20 years then it is important.

You plot the posterior probabilities and cumulative probabilities for the number of U.S. hurricanes over the next ten years by typing,

Th = 10
m = Th * (h + hp)/(T + Tp)
v = Th * m * (Th + T + Tp)/(T + Tp)
nl = 0:30
hh = dnbinom(nl, mu = m, size = v)
plot(nl, hh, type = "h", xlab = "Number of hurricanes ($H$)", ylab = "Probability of $H$", 
    col = "lightgray", lwd = 4)
par(new = TRUE)
p = pnbinom(nl, mu = m, size = v)
plot(nl, p, type = "l", xaxt = "n", yaxt = "n", xlab = "", ylab = "", lwd = 2)
mtext("Probability $h \\le H$", side = 4, line = 1.6, las = 0)
mtext("a", side = 3, line = 1, adj = 0, cex = 1.1)

plot of chunk plotPosteriorDistribution

plot(c(0, 80), c(0, 1), type = "n", yaxt = "n", ylab = "", xlab = "Number of hurricanes ($H$)", 
    lwd = 2)
mtext("Probability $h > H$", side = 4, line = 1.6, las = 0)
mtext("b", side = 3, line = 1, adj = 0, cex = 1.1)
Th = c(30, 20, 10)
cls = rev(c("black", "gray", "lightgray"))
ey = c(80, 60, 40)
for (i in 1:3) {
    m = Th[i] * (h + hp)/(T + Tp)
    v = Th[i] * m * (Th + T + Tp)/(T + Tp)
    nl = 0:ey[i]
    p = 1 - pnbinom(nl, mu = m, size = v)
    points(nl, p, pch = 16, col = cls[i])

plot of chunk plotPosteriorDistribution

The probabilities of \( H \) hurricanes in ten years are shown as vertical bars and with a scale plotted along the left vertical axis and the probability that the number of hurricanes will be less than or equal to \( H \) is shown as a solid line and with a scale along the right axis.

The probability that the number of hurricanes will exceed \( H \) is shown for a random 10, 20, and 30-year period is shown in the right panel. The expected number of U.S. hurricanes over the next 30 years is 51 of which 18 (not shown) are anticipated to be major hurricanes. These probabilities represent the best estimates of the future baseline hurricane climate.

The above approach is a rational and coherent foundation for incorporating all available information about hurricane occurrences, while accounting for the differences in the precision as it varies over the years. It could be used to account for the influence of climate change by discounting the older information. Records influenced by recent changes can be given more weight than records from earlier decades.

Seasonal Model


Here you create a Bayesian model for predicting annual hurricane counts. The counts are reliable starting in the middle 20th century, especially in the United States. But data records on past hurricanes extend farther back and these earlier records are useful to understand and predict seasonal activity. The logarithm of the annual rate is linearly related to sea-surface temperature (SST) and the Southern Oscillation Index (SOI).

Consensus Model

In choosing one model over another you typically apply a selection procedure on a set of covariates to find the single best' model. You then make predictions with the model as if it had generated the data. This approach, however, ignores the uncertainty in your model selection procedure resulting in you having too much confidence in your predictions.

For example, given a pool of covariates for hurricane activity a stepwise regression procedure is employed to search through hundreds of candidate models (combinations of covariates). The model that provides the best level of skill is chosen. The best model is subsequently subjected to a leave-one-out cross-validation (LOOCV) exercise to obtain an estimate of how well the it will predict future data. This does not result in a cross validation as the procedure for selecting the reduced set of covariates is not itself cross validated. Cross validation is a procedure for assessing how well an algorithm for choosing a particular model (including the predictor selection phase) will do in forecasting the unknown future.

An alternative is to use Bayesian model averaging (BMA). BMA works by assigning a probability to each model (combination of covariates), then averaging over all models weighted by their probability. In this way model uncertainty is included (Raftery et a. 2005). Here we produce a consensus forecast of seasonal hurricane activity. In doing so we show how the approach can facilitate a physical interpretation of your modeled relationships. The presentation follows closely the work of \cite{JaggerElsner2010}.

Bayesian model averaging

Let \( H_i \), \( i=1, \ldots, N \) denote your set of observed hurricane counts by year. Assume that your model has \( k \) covariates, then let \( \mathbf{X} \) be the covariate matrix with components \( X[i,j+1] \), \( i=1, \ldots, N \), \( j=1, \ldots k \) associated with the $i$th observation of the $j$th covariate and with the intercept term \( X[i, 1] =1 \) for all \( i \). Then associated with the intercept and \( k \) covariates are \( k+1 \) parameters \( \beta_j \), \( j=1, \ldots, k+1 \).

You assume the counts are adequately described with a Poisson distribution. The logarithm of the rate is regressed onto the covariates as \[ H_i \sim \hbox{pois}(\lambda_i)\nonumber\\ \log(\lambda_i) = \sum_{j=1}^{k+1}{X}[i,j] \beta_j \nonumber \] This is a generalized linear model (GLM) and the method of maximum likelihoods is used to estimate the parameters. From these parameter estimates and the values of the corresponding covariates you infer \( \lambda \) from the regression equation. The future hurricane count conditional on these covariates has a Poisson distribution with mean of \( \lambda \). Thus your model is probabilistic and the average count represents a single forecast.

A full model is defined as one that uses all \( k \) covariates. However, it is usual that some of the covariates do not contribute much to the model. In classical statistics these are the ones that are not statistically significant. You can choose a reduced model by setting some of the \( k \) parameters to zero.

Thus with \( k \) covariates there are a total of \( m=2^k \) possible models. The important idea behind BMA is that none of the \( m \) models are discarded. Instead a probability is assigned to each model. Predictions are made by a weighted average over predictions from all models where the weights are the model probabilities. Models with greater probability have proportionally more weight in the averaging.

Consider a simple case. You have observations of \( Y \) arising from either one of two possible regression models. Let \( Y_1 = \alpha_1 + \epsilon_1 \) be a constant mean model and \( Y_2 = \alpha_2 + \beta x + \epsilon_2 \) be a simple regression model where \( x \) is a single covariate. The residual terms \( \epsilon_1, \epsilon_2 \) are independent and normally distributed with means of zero and variances of \( \sigma_1^2 \) and \( \sigma_2^2 \), respectively.

Suppose you assign a probability \( p \) that the constant mean model generated the observed data. Then there is a probability \( 1-p \) that the simple regression model generated the data instead. Now with BMA the posterior predictive expectation (mean) of \( Y \) is \( \mu= p\mu_1+(1-p)\mu_2 =p\alpha_1 +(1-p) (\alpha_2+ \beta x) \). This represents a consensus opinion that combines information from both models as opposed to choosing one over the other.

The posterior predictive distribution of \( Y \) given the data is not necessarily normal. Instead it is a mixture of normal distributions with a posterior predicted variance of \( p\sigma_1^2+(1-p)\sigma_2^2 + p(1-p)(\alpha_2+\beta x-\alpha_1)^2 \). This variance under BMA is larger than a simple weighted sum of the individual model variances by an amount \( p(1-p)(\alpha_2+\beta x-\alpha_1)^2 \) that represents the uncertainty associated with model choice. Thus, the predictive distribution under BMA has a larger variance than the predictive distribution of a single model.

Over a set of competing models you need a way to assign a probability to each. You start with a collection of models \( M_i, i=1,\ldots,m \), where each model is a unique description of your data. For example, in the above example, you need to assign a probability to the constant mean model and a probability to the simple regression model under the constraint that the total probability over both models is one.

Now with your data \( D \) and a set of proposed models \( M_i \), you determine the probability of your data given each model [\( P(D | M_i) \)]. You also assign a prior probability to each model [\( P(M_i) \)] representing your belief that each model generated your data. Under the situation in which you are maximally non-committal on a model, you assigned \( 1/m \) to each model's prior probability. For example in the above case if you believe both models are equally likely then you assign \( P(M_1)=P(M_2)=0.5 \). Using Bayes rule you find the probability of the model given the data \( P(M_i|D)= P(D|M_i) \times P(M_i)/P(D) \) since \( P(D) \) is fixed for all models you can let \( W_i = P(D|M_i) \times P(M_i) \) be the model weights with probabilities \( P(M_i | D)= W_i/\sum_{i=1}^m W_i \).

Let the random variable \( H \) represent the prediction of a future hurricane count. The posterior distribution of \( H \) at \( h \) under each model is given by \( f(h|D, M_i) \). The marginal posterior probability over all models is given by \[ f(h|D)= \sum_{i=1}^m f(h |D, M_i) P(M_i |D ). \] A point estimate for the future count (i.e. the posterior mean of \( H \) over all models) is obtained by taking the expectation of \( H \) given the data as \[ \mathbb{E}(H|D) = \sum_{h=0}^{\infty} h f(h|D). \]

Expanding \( f(h|D) \) and switching the order of summation you get \[ \mathbb{E}(H|D)= \sum_{i=1}^m P(M_i |D) \sum_{h=0}^{\infty}h f(h|D,M_i), \] which is \[ \sum_{i=1}^m P(M_i|D) \mathbb{E}(H |D, M_i), \] where \( \mathbb{E}(H |D, M_i)=\mu_i \). For a given model, \( P(D|M_i) \) is the marginal likelihood over the parameter space. In other words, \( P(D|M_i) = \int P(D|M_i,\theta) f(\theta | M_i) d\theta \) where \( f(\theta | M_i) \) is the prior distribution of the parameters for model \( M_i \) and \( P(D | M_i, \theta) \) is the likelihood of the data given the model [\( L(\theta; M_i, D) \)].

In many cases the above integral cannot be evaluated analytically or it is infinite as when an improper prior is put on the parameter vector \( \theta \). Approximation methods can be used (Hoeting et al. 1999). Here you use the Bayesian Information Criterion (BIC) approximation, which is based on a Laplace expansion of the integral about the maximum likelihood estimates (Madigan and Raftery 1994).

BMA keeps all candidate models assigning a probability based on how likely it would be for your data to have come from each model. A consensus model, representing a weighted average of all models, is then used to make predictions.

If values for the prior parameters come from reasonably well-behaved distributions, then a consensus model from a BMA procedure yields the lowest mean square error of any single best model (Raftery and Zheng2003). BMA is described in more detail in Hoeting et al. (1999) and Raftery (1996).

BMA provides better coverage probabilities on the predictions than any single model (Raftery and Zheng 2003). Consider a data record split into a training and testing set. Using the training set you can create \( 1- \alpha \) credible intervals on the predictions. Then, using the testing set, you can calculate the proportion of observations that lie within the credible intervals. This is called the coverage probability.

In standard practice with a single best model, the credible intervals are too small resulting in coverage probabilities less than \( 1-\alpha \). Since BMA provides a larger variance than any model individually, the coverage probabilities on the predictions are greater or equal to \( 1-\alpha \).

Data plots

You use the data saved in file annual.RData. Load the data and create a new data frame that is a subset of the years since 1866.

dat = annual[annual$Year >= 1866, ]

The counts are the number of near-coastal hurricanes passing through the grids. You consider monthly values of SST, SOI, NAO, and sunspots as covariates.

The monthly covariate values are shown in Fig.~\ref{fig:covariatesplot} as image plots. The monthly values for May through October displayed on the vertical axis are plotted as a function of year displayed on the horizontal axis. The values are shown using a color ramp from blue (low) to yellow (high). The SST and sunspot number (SSN) covariates are characterized by high month-to-month correlation as can be seen by the vertical striations.

mo = 5:10
ytb = colorRampPalette(c("#EDF8B1", "#7FCDBB", "#2C7FB8"), bias = 1)
covplot = function(x = dat, months = mo, col = ytb(100), field = "sst") {
    leglab = switch(field, sst = "SST [C]", ssn = "Sunspot number", nao = "NAO [s.d.]", 
        soi = "SOI [s.d.]")
    abc = switch(field, sst = "a", nao = "b", soi = "c", ssn = "d")
    y = dat[, paste(field,[mo], sep = ".")]
    image.plot(x = dat$Year, y = mo, z = as.matrix(y), ylab = "", xlab = "Year", 
        yaxt = "n", col = col, las = 1, mgp = c(1.8, 0.7, 0))
    axis(side = 2, at = 5:10, labels = c("May", "Jun", "Jul", "Aug", "Sep", 
        "Oct"), las = 1, mgp = c(1.8, 0.7, 0))
    mtext(text = leglab, side = 4, line = -0.2, las = 0, cex = 0.7)
    mtext(text = abc, side = 3, line = 0.2, adj = 0)
par(mfrow = c(2, 2), mex = 0.8, las = 1)
covplot(field = "sst")
covplot(field = "nao")
covplot(field = "soi")
covplot(field = "ssn")

plot of chunk covariatePlots

Model selection

You assume the logarithm of the annual hurricane rate is a linear combination of a fixed subset of your covariates (Poisson generalized linear model). With six months and four environmental variables per month you have \( 2^{24} \) or more than 16.5 million possible models.

Model selection is done with functions in the BMA package (Raftery et al. 2009). Obtain the package and source additional functions in bic.glm.R that allows you to make posterior predictions.

## Warning: package 'BMA' was built under R version 2.15.3

Specifically, you use the bic.glm() function for determining the probability on each model and the imageplot.bma() function for displaying the results.

First save the model formula involving the response variable (here US.1) and all covariates.

fml = US.1 ~ sst.Oct + sst.Sep + sst.Aug + sst.Jul + sst.Jun + sst.May + nao.Oct + 
    nao.Sep + nao.Aug + nao.Jul + nao.Jun + nao.May + soi.Oct + soi.Sep + soi.Aug + 
    soi.Jul + soi.Jun + soi.May + ssn.Oct + ssn.Sep + ssn.Aug + ssn.Jul + ssn.Jun + 

Then save the output from the bic.glm() function by typing

mdls = bic.glm(f = fml, data = dat, = "poisson")

The function returns an object of class bic.glm. By default only models with a Bayesian Information Criterion (BIC) within a factor of 20 of the model with the lowest BIC are kept and only the top 150 models of each size (number of covariates) are considered.

The summary method is used to summarize the results. The top three models having the lowest BIC values (highest posterior probabilities) are summarized in the table below.

summary(mdls, n.models = 3, digits = 2)
## Call:
## bic.glm.formula(f = fml, data = dat, = "poisson")
##   209  models were selected
##  Best  3  models (cumulative posterior probability =  0.1 ): 
##            p!=0   EV       SD       model 1  model 2  model 3
## Intercept  100    7.5e-01  0.10695    0.692    0.695    0.698
## sst.Oct     3.2   7.5e-03  0.07167      .        .        .  
## sst.Sep     5.8  -3.3e-02  0.21834      .        .        .  
## sst.Aug     5.0  -3.4e-03  0.15889      .        .        .  
## sst.Jul    23.9   1.7e-01  0.39788      .        .        .  
## sst.Jun     5.5   1.5e-02  0.11962      .        .        .  
## sst.May     3.4  -5.9e-03  0.11293      .        .        .  
## nao.Oct     2.2   6.1e-04  0.00729      .        .        .  
## nao.Sep     3.2   1.2e-03  0.00973      .        .        .  
## nao.Aug     1.9   3.9e-04  0.00564      .        .        .  
## nao.Jul     3.4   1.6e-03  0.01198      .        .        .  
## nao.Jun    43.3  -4.3e-02  0.05662   -0.105   -0.096   -0.103
## nao.May     5.4  -3.2e-03  0.01713      .        .        .  
## soi.Oct    37.7   2.0e-02  0.02841    0.055      .        .  
## soi.Sep     4.6   1.5e-03  0.00869      .        .        .  
## soi.Aug    25.0   1.3e-02  0.02519      .        .      0.054
## soi.Jul    39.2   2.4e-02  0.03381      .      0.054      .  
## soi.Jun     9.1  -4.6e-03  0.01740      .        .        .  
## soi.May     1.4   2.0e-04  0.00402      .        .        .  
## ssn.Oct     7.0  -3.8e-04  0.00178      .        .        .  
## ssn.Sep    94.4  -1.1e-02  0.00442   -0.012   -0.012   -0.012
## ssn.Aug     1.1   1.1e-05  0.00041      .        .        .  
## ssn.Jul     1.7  -3.7e-05  0.00055      .        .        .  
## ssn.Jun    88.2   8.5e-03  0.00440    0.010    0.011    0.011
## ssn.May     5.7   3.0e-04  0.00149      .        .        .  
## nVar                                    4        4        4  
## BIC                                 171.072  171.511  171.528
## post prob                             0.040    0.032    0.031

The first column lists the covariates (and intercept). The second column gives the posterior probability that a given coefficient is not zero over all the 209 models. One could view this as the inclusion probability for a given covariate. That is, what is the probability that the associated covariate had a non-zero coefficient in the data generating model.

For example, the posterior probability that the June NAO covariate is in the data generating model is 43.3%. The third and fourth columns are the posterior expected value (EV) and standard deviation (SD) across all models. Subsequent columns include the most probable models as indicated by values in rows corresponding to a covariate. The number of variables in the model, the model BIC, and the posterior probability are also given in the table.

Models are ordered by BIC value with the first model having the lowest BIC value, the second model having the second lowest BIC, and so on. The value of BIC for a given model is \[ -2\cdot\ln L + k \ln(n), \] where \( L \) is the likelihood evaluated at the parameter estimates, \( k \) is the number of parameters to be estimated, and \( n \) is the number of years. BIC includes a penalty term (\( k\ln(n) \)) which makes it useful for comparing models with different sizes. If the penalty term is removed, \( -2\cdot\ln L \), can be reduced simply by increasing the number of model covariates.

The BIC as a selection criterion results in choosing models that are parsimonious and asymptotically consistent, meaning that the model with the lowest BIC converges to the true model as the number of hurricane years increases.

You use a plot method to display the model coefficients (by sign) ordered by decreasing posterior probabilities. Models are listed along the horizontal axis by decreasing posterior probability. Covariates in a model are shown with colored bars. A brown bar indicates a positive relationship with hurricane rate and a green bar indicates a negative relationship. Bar width is proportional to the model's posterior probability.

par(cex = 0.8)

plot of chunk imagePlots

BMAfull = glm(family = "poisson", data = dat, newformula, x = TRUE, y = TRUE)
qrX = qr(BMAfull$x)
qrQ = qr.Q(qrX)
qrR = qr.R(qrX)
colnames(qrQ) = colnames(BMAfull$x)
qrdata = data.frame(US.1 = BMAfull$y, qrQ)
BMAalldataqr = bic.glm( = "poisson", data = qrdata, newformula)

The plot makes it easy to see the covariates picked by the most probable models. They are the ones with the most consistent coloring from left to right across the image. A covariate with only a few gaps indicates that it is included in most of the higher probable models. These include September and June SSN, June NAO, July SST, and any of the months of July through September for the SOI.

You might ask why July SST is selected as a model covariate more often than August and September? The answer lies in the fact that when the hurricanes arrive in August and September, they draw heat from the ocean surface so the correlation between hurricane activity and SST weakens.

The thermodynamics of hurricane intensification works against the statistical correlation. Said another way, July SST better relates to an active hurricane season not because a warm ocean in July causes tropical cyclones in August and September, but because hurricanes in August and September cool the ocean.

The SOI covariates get chosen frequently by the most probable models but with a mixture across the months of July through October. The posterior probability is somewhat higher for the months of June and October and smallest for August and September. Convection over the eastern equatorial Pacific during El Nino produces increased shear and subsidence across the Atlantic especially over the western Caribbean where during the months of July and October a relatively larger percentage of the North Atlantic hurricane activity occurs. Moreover, the inhibiting influence of El Nino might be less effective during the core months of August and September when, on average, other conditions tend to be favorable.

The sign on the September SSN parameter is negative indicating that the probability of a U.S. hurricane decreases with increasing number of sunspots. This result accords with the hypothesis that increases in UV radiation from an active sun (greater number of sunspots) warms the upper troposphere resulting in greater thermodynamic stability and a lower probability of a hurricane over the western Caribbean and Gulf of Mexico (Elsner and Jagger 2008, Elsner et al. 2010).

The positive relationship between hurricane probability and June SSN is explained by the direct influence the sun has on ocean temperature. Alternative explanations are possible especially in light of role the solar cycle likely plays in modulating the NAO (Kodera 2002, Ogi et al. 2003).

You can find the probability that a covariate irrespective of month is chosen by calculating the total posterior probability over all models that include this covariate during at least one month.

First, use the substring() function on the covariate names given in the bic.glm object under namesx to remove the last four characters in each name. Also create a character string containing only the unique names.

cn = substring(mdls$namesx, 1, 3)
cnu = unique(cn)

Next create a matrix of logical values using the outer() function, which performs an outer product of matrices and arrays. Also assign column names to the matrix.

mn = outer(cn, cnu, "==")
colnames(mn) = cnu

Next perform a matrix multiplication of the matching names with the matrix of logical entries indicating which particular covariate was chosen. This returns a matrix with dimensions of number of models by number of covariate types. The matrix entries are the number of covariates in each model of that type.

Finally multiply the posterior probabilities given under \verb@postprob@ by a logical version of this matrix using the condition that the covariate type is included.

xx = mdls$which %*% mn
pc = 100 * mdls$postprob %*% (xx > 0)
round(pc, 1)
##       sst  nao  soi  ssn
## [1,] 39.7 51.9 98.5 98.5

You see that the SOI and sunspot number have the largest probabilities at 98.5% while the NAO and SST have posterior probabilities of 51.9% and 39.7%, respectively. The lower probability of choosing the NAO reflects the rather large intra-seasonal variability in this covariate.

It is informative to compare the results of your BMA with a BMA performed on a random series of counts. Here you do this by resampling the actual hurricane counts. The randomization results in the same set of counts but the counts are placed randomly across the years. The random series together with the covariates are used as before and the results mapped.

s1 = dat
s1$US.1 = sample(s1$US.1)
mdls1 = bic.glm( = "poisson", data = s1, fml)
s1 = dat
s1$US.1 = sample(s1$US.1)
mdls2 = bic.glm( = "poisson", data = s1, fml)
s1 = dat
s1$US.1 = sample(s1$US.1)
mdls3 = bic.glm( = "poisson", data = s1, fml)
par(mfrow = c(2, 2), mex = 0.8, cex = 0.6)
imageplot.bma3(mdls, order = "probne0")
mtext(text = "a", side = 3, line = 0.5, adj = 0)
imageplot.bma3(mdls1, order = "probne0")
mtext(text = "b", side = 3, line = 0.5, adj = 0)
imageplot.bma3(mdls2, order = "probne0")
mtext(text = "c", side = 3, line = 0.5, adj = 0)
imageplot.bma3(mdls3, order = "probne0")
mtext(text = "d", side = 3, line = 0.5, adj = 0)

plot of chunk BMArandom

The comparison shows that your set of covariates has a meaningful relationship with U.S. hurricane activity. There are fewer models chosen with the randomized data sets and the number of variables included in set of most probable models is lower.

In fact the averaged number of variables in the 20 most probable models is 4, which compares with an average of one for the three randomized series. Moreover, there is little consistency in the variable selected from one model to the next as it should be.

Consensus hindcasts

In contrast to selecting a single model the BMA procedure assigns a posterior probability to a set of the most probable models. Each model can be used to make a prediction. But which forecast should you believe? Fortunately no choice is necessary. Each model makes a prediction and then the forecasts are averaged. The average is weighted where the weights are proportional to the model's posterior probability. Here you assume perfect knowledge of the covariates and hindcasts are made in-sample. For an actual forecast situation this is not available, but the method would be the same.

You use the prediction functions in {\bf prediction.R} to hindcast annual rates, rate distributions, and posterior count distributions. First, source the code file then compute the mean and standard deviation of the annual rate for each year. From the output, compute the in-sample average square error.

ar = bic.poisson(mdls, newdata = mdls$x, simple = TRUE)
sqrt(mean((ar[1, ] - mdls$y)^2))
## [1] 1.36

Thus on average the consensus model results in a mean square error of \Sexpr{round(sqrt(mean((ar[1,]-mdls$y)2)),1)} hurricanes per year.

You compare hindcast probabilities from the consensus model between two years. Here you examine the consecutive years of 2007 and 2008. You determine the posterior probabilities for the number of hurricanes for each year for hurricane numbers between zero and eight and display them using a side-by-side bar plot.

yr1 = 2007
yr2 = 2008
r1 = which(dat$Year == yr1)
r2 = which(dat$Year == yr2)
Pr = bic.poisson(mdls, newdata = mdls$x[c(r1, r2), ], N = 9)
par(las = 1, mgp = c(2.2, 0.4, 0), tcl = -0.3)
barplot(t(Pr), beside = TRUE, las = 1, xlab = "Number of hurricanes", ylab = "Probability", 
    legend.text = c(as.character(yr1), as.character(yr2)))

plot of chunk compareYears

The vertical axis is the probability of observing \( h \) number of hurricanes. Forecasts are shown for the 2007 and 2008 hurricane seasons. The model predicts a higher probability of at least one U.S. hurricane for 2008 compared with 2007. There is a 54% chance of 3 or more hurricanes for 2007 and a 57% chance of 3 or more hurricanes for 2008. There was 1 hurricane in 2007 and 3 hurricanes in 2008.

The consensus model hindcasts larger probabilities of an extreme year given the rate than would be expected from a Poisson process. That is, the consensus model is over dispersed with respect to a Poisson distribution. This makes sense since model uncertainty is incorporated in the consensus hindcasts.

A cross validation of the BMA procedure is needed to get an estimate of how well the consensus model will do in predicting future counts. This is done in Jagger and Elsner (2010) using various scoring rules including the mean square error, the ranked probability score, the quadratic (Brier) score, and the logarithmic score.

They find that the consensus model provides more accurate predictions than a procedure that selects a single best model using BIC or AIC irrespective of the scoring rule. The consensus forecast will not necessarily give you the smallest forecast error every year, but it will always provide a better assessment of forecast uncertainty compared to a forecast from a single model. The BMA procedure provides a rational way for you to incorporate competing models in the forecast process.

Space-Time Model