\chapter{Bayesian Models}
\label{chap:bayesianmodels}
\SweaveOpts{keep.source=TRUE, pdf=TRUE}
<>=
options(digits=3, show.signif.stars=F, width=53)
options(width=50)
rm(list=ls())
@
%data files used: US.txt, ATL.txt, load("alldata.R")
%required packages: bootstrap, rjags, fields, BMA, R2WinBUGS, coda, lattice
%source code:
\begin{quote}
``Errors using inadequate data are much less than those using no data at all.''
\end{quote}
\indent---Charles Babbage\\

This chapter focuses on modeling from the Bayesian perspective. Information about past hurricanes is available from instrument records and historical accounts. Historical accounts carry greater uncertainty. Here we show you how the Bayesian perspective can help you make use of all the available information while accounting for differences in levels of uncertainty.
\section{Long-Range Outlook}
We begin with a long-range outlook for U.S. hurricane activity over the next 30 years. This is useful for providing a benchmark in climate change studies. The methodology is presented in \cite{ElsnerBossak2001} and is based on the formalism presented in \cite{Epstein1985}.
\subsection{Poisson-gamma conjugate}
As shown in Chapter~\ref{chap:frequencymodels} you can consider the arrival of hurricanes on the coast as a stochastic process where the annual counts have a Poisson distribution. The Poisson distribution is a limiting form of the binomial distribution with no upper bound on the number of occurrences. The rate parameter $\lambda$ characterizes this stochastic process. Knowledge of $\lambda$ allows you to make statements about future hurricane frequency. Since the process is stochastic your statements must be given in terms of probabilities.

For example, the probability of observing $\hat h$ hurricanes over the next $T$ years (e.g., 1, 5, or 20) is
\begin{equation}
f_{\hbox{Poisson}}(\hat h | \lambda, T) = \hbox{exp}(-\lambda T){(\lambda T)^{\hat h} \over \hat h!} \hspace{0.3cm} \hbox{for} \hspace{0.2cm} \hat h=0, 1, 2,\ldots, \hspace{0.2cm} \lambda > 0, \hspace{0.2cm} \hbox{and} \hspace{0.2cm} T > 0.
\end{equation}
The hat notation is used to indicate future values. The parameter $\lambda$ and statistic $T$ appear in the formula as the product $\lambda T$, which is both the mean and the variance of the distribution.

Knowledge about $\lambda$ can come from a variety of sources. Clearly you want to use as much of this information as possible before inferring something about future activity. This requires you to treat $\lambda$ as a continuous random variable that can be any positive number, rather than as a fixed constant. The form for expressing your judgement about $\lambda$ is the gamma distribution.

The numbers that are used to estimate $\lambda$ from a set of observations are the time interval $T'$ and the number of hurricanes $h'$ that occurred during this interval. For instance, observations from the official U.S. hurricane record indicate 15 hurricanes over the first ten years, so $T'=10$ and $h'=15$. To verify this, type
<>=
H=read.table("US.txt",header=T)
sum(H$All[1:10])
@
The gamma distribution of possible future values for $\lambda$ is given by
\begin{equation}
f_{\gamma}(\hat \lambda | h', T') = {T'^{h'}\hat\lambda^{h'-1} \over \Gamma (h')} \hbox{exp}(-\hat\lambda T'),
\end{equation}
with the expected value E($\hat \lambda$) = $h'/T'$, and the gamma function $\Gamma (x)$ given by
\begin{equation}
\Gamma (x) = \int^{\infty}_0 t^{x-1} \hbox{e}^{-t} \hbox{d}t.
\end{equation}
Of importance here is the fact that if the probability density on $\hat \lambda$ is a gamma distribution, with initial numbers (prior parameters) $h'$ and $T'$, and the numbers $h$ and $T$ are later observed, then the posterior density of $\hat \lambda$ is also gamma with parameters $h+h'$ and $T+T'$. That is, the gamma density is the conjugate prior for the intensity of the Poisson process, $\lambda$ (see Chapter~\ref{chap:bayesianstatistics}).
\subsection{Estimating prior parameters}
This gives you a convenient way to combine earlier, less reliable information with later information. You simply add the prior parameters $h'$ and $T'$ to the sample numbers $h$ and $T$ to get the posterior parameters. But how do you estimate the prior parameters?

One way is to use bootstrapping. Bootstrapping is sampling from your sample (resampling) to provide an estimate of the variation about your statistic of interest \citep{EfronTibshirani1986}. Here you use the \verb@bootstrap@ function in the {\bf bootstrap} package to obtain a confidence interval about $\lambda$ using data from before the year 1899.

First load the package and save a vector of the counts over the earlier period of record. Then use the \verb@bootstrap@ function on this vector of counts.
<>=
library(bootstrap)
early=H$All[H$Year<1899]
bs=bootstrap(early,theta=mean, nboot=1000)
@
To obtain a 90~\% bootstrap confidence interval on the mean, type
<>=
qbs=quantile(bs$thetastar,prob=c(.05,.95))
qbs
@
Although you cannot say with certainty what the true hurricane rate was over this early period, you can make the sound judgement of being 90~\% confident that the interval contains it. In other words, you are willing to admit a 5~\% chance that the true rate is less than \Sexpr{round(qbs[1],2)} and a 5~\% chance that it is greater than \Sexpr{round(qbs[2],2)}.

To take advantage of your judgement about the early hurricane rate in assessing future activity, you make use of the relationship between the gamma and $\chi^2$ distributions. Specifically, if $\hat \lambda$ is gamma with parameters $h'$ and $T'$, then the random variable $\hat Z = 2\hat \lambda T'$ is $\chi^2$ with $2h'$ degrees of freedom \citep{Epstein1985}.

The 5~\% probability that $\lambda$ is less than \Sexpr{round(qbs[1],2)} implies that $\hat Z$ = 2(\Sexpr{round(qbs[1],2)})$T'$ = \Sexpr{round(2*qbs[1],2)}$T'$ is $\chi^2_n(.05)$, where $\chi^2_n(.05)$ is the lower .05 quantile of a $\chi^2$ distribution with $n$ degrees of freedom. Similarly, the 95~\% probability that $\lambda$ is less than \Sexpr{round(qbs[2],2)} implies that $\hat Z$ = \Sexpr{round(2*qbs[2],2)}$T'$ is $\chi^2_n(.95)$, where $\chi^2_n(.95)$ is the upper 0.95 quantile. Thus the ratio of the upper to lower quantiles of the $\chi^2$ variable is
<>=
ratio=round(as.numeric(qbs[2]/qbs[1]),2)
ratio
@
To determine the degrees of freedom when the $\chi_n^2$ ratio is \Sexpr{ratio} you create a function that computes the ratio of the corresponding upper and lower quantiles of the distribution. You then apply the function to a vector of candidate degrees of freedom and keep the smallest value whose quantile ratio matches.
<>=
rchi2=function(x){
  round(qchisq(.95,x)/qchisq(.05,x),2)
}
df=1:200
dof=min(df[sapply(df,FUN=rchi2)==ratio])
dof
@
Your estimate of the effective number of hurricanes during the early period $h'$ is one half this many degrees of freedom, or \Sexpr{dof/2}.
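As a quick numerical check of the gamma--$\chi^2$ relationship you can compare quantiles directly. The values of $h'$ and $T'$ below are made up for illustration only (the actual prior parameters are estimated next).
<<>>=
# check that 2*lambda*T' has a chi-squared distribution with 2h' df
# (hpc and Tpc are hypothetical values, used only for this check)
hpc=20; Tpc=12
p=c(.05,.5,.95)
q.gamma=qgamma(p,shape=hpc,rate=Tpc)
q.chisq=qchisq(p,df=2*hpc)/(2*Tpc)
all.equal(q.gamma,q.chisq)
@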
Continuing, the effective record length $T'$ over this period is the .05 quantile of the $\chi^2$ with \Sexpr{dof} degrees of freedom divided by two times the corresponding quantile of the bootstrap sample mean. You save these prior parameters by typing
<>=
hp=dof/2
Tp=round(qchisq(.95,dof)/(2*qbs[2]),1)
@
The above procedure quantifies your judgement about hurricanes prior to the reliable set of counts.
\subsection{Posterior density for landfall rate}
Now you have two distinct pieces of information from which to obtain a posterior distribution for $\lambda$ (landfall rate): your prior parameters $h'$ = \Sexpr{hp} and $T'$ = \Sexpr{Tp} from above and your likelihood statistics based on the data over the reliable period of record (1899--2010). The total number of hurricanes over this reliable period and the record length are
<>=
late=H$All[H$Year>=1899]
h=sum(late)
T=length(late)
h;T
@
The posterior parameters are therefore $h'' = h+h'$ = \Sexpr{h+hp} and $T'' = T+T'$ = \Sexpr{T+Tp}. Note that although the likelihood parameters $h$ and $T$ must be integers, the prior parameters can take on any real value depending on your degree of belief.

Since the prior, likelihood, and posterior are all in the same gamma family, you can use \verb@dgamma@ to compute the densities.
<>=
curve(dgamma(x,shape=h+hp,rate=T+Tp),from=1,to=3,
  xlab="Landfall Rate [hurricanes/year]",
  ylab="Density", col=1,lwd=4, las=1)
curve(dgamma(x,shape=h,rate=T),add=TRUE,
  col=2,lwd=4)
curve(dgamma(x,shape=hp,rate=Tp),add=TRUE,
  col=3,lwd=4)
legend("topright",c("Prior","Likelihood",
  "Posterior"), col=c(3,2,1),lwd=c(3,3,3))
@
\begin{figure}
\begin{center}
<>=
op=par(las=1)
curve(dgamma(x,shape=h+hp,rate=T+Tp),from=1,to=3,
  xlab="Landfall Rate [hurricanes/year]",
  ylab="Density", col=1,lwd=4)
curve(dgamma(x,shape=h,rate=T),add=TRUE,
  col=2,lwd=4)
curve(dgamma(x,shape=hp,rate=Tp),add=TRUE,
  col=3,lwd=4)
legend("topright",c("Prior","Likelihood",
  "Posterior"), col=c(3,2,1),lwd=c(3,3,3))
grid()
par(op)
@
\end{center}
\caption{Gamma densities on the U.S. landfall hurricane rate.}
\label{fig:prior:like:post}
\end{figure}
The densities are shown in Fig.~\ref{fig:prior:like:post}. Note that the posterior density resembles the likelihood but is shifted slightly in the direction of the prior and is narrower. Recall the posterior is a weighted average of the prior and the likelihood where the weights are proportional to the precision. The greater the precision, the more weight it carries in determining the posterior. The relatively broad density on the prior estimate reflects the greater uncertainty and the shorter effective length of the unreliable period. Combining the prior and likelihood results in a posterior distribution that represents the best information about $\lambda$.
<>=
YEAR=1899
early=H$MUS[H$Year<YEAR]
h=sum(late)
T=length(late)
op=par(las=1)
curve(dgamma(x,shape=h+hp,rate=T+Tp),from=1,to=3,
  xlab="U.S. Hurricane Rate [hurricanes/year]",
  ylab="Density", col=1,lwd=4)
curve(dgamma(x,shape=h,rate=T),add=TRUE,
  col=2,lwd=4)
curve(dgamma(x,shape=hp,rate=Tp),add=TRUE,
  col=3,lwd=4)
legend("topright",c("Prior","Likelihood",
  "Posterior"), col=c(3,2,1),lwd=c(3,3,3))
grid()
par(op)
m=30*(h+hp)/(T+Tp)
@
\subsection{Predictive distribution}
The information you have about $\lambda$ is codified in the two parameters $h'' = h + h'$ and $T'' = T + T'$ of the gamma density. Of practical interest is how to use the information to predict future hurricane activity. The answer lies in the fact that the {\it predictive density} for observing $\hat h$ U.S.
hurricanes over the next $\hat T$ years is a negative binomial distribution, with parameters $h''$ and ${T'' \over \hat T + T''}$ given by
\begin{equation}
f_{nb}\Bigl(\hat h | h'', {T'' \over \hat T + T''}\Bigr) = {\Gamma(\hat h + h'') \over \Gamma (h'') \hat h!}\Bigl[{T'' \over \hat T + T''}\Bigr]^{h''}\Bigl[{\hat T \over \hat T + T''}\Bigr]^{\hat h}.
\end{equation}
The mean and variance of the negative binomial are $\hat T{h'' \over T''}$ and $\hat T{h'' \over T''}\bigl({\hat T + T'' \over T''}\bigr)$, respectively. Note that the variance of the predictive distribution is larger than it would be if $\lambda$ were known precisely. If you are interested in the climatological probability of a hurricane {\it next} year, then $\hat T$ is one and small compared with $T''$ so it makes little difference, but if you're interested in the distribution of hurricane activity over the next 20 years then it is important.

For example, you plot the posterior probability and cumulative probability distributions of the number of U.S. hurricanes over the next ten years by typing,
<>=
Th=10
m=Th*(h+hp)/(T+Tp)
nl=0:32
hh=dnbinom(nl,mu=m,size=h+hp)
op=par(las=1,mar=c(5,4,2,4))
plot(nl,hh,type="h",
  xlab="Number of U.S. Hurricanes",
  ylab="Probability",
  col="gray",lwd=4)
par(new=TRUE)
p=pnbinom(nl,mu=m,size=h+hp)
plot(nl,p,type="l",col="red",xaxt="n",yaxt="n",
  xlab="",ylab="",lwd=2)
axis(4)
mtext("Probability of h less than or equal to H",
  side=4,line=2.5,las=0)
@
Here the negative binomial size parameter is $h''$ and the mean is $\hat T h''/T''$, matching the predictive density above.
\begin{figure}
\begin{center}
<>=
op=par(mfrow=c(1,2),las=1,mar=c(5,4,2,4))
Th=10
m=Th*(h+hp)/(T+Tp)
nl=0:30
hh=dnbinom(nl,mu=m,size=h+hp)
plot(nl,hh,type="h",
  xlab="Number of U.S. Hurricanes (H)",
  ylab="Probability of H Hurricanes",
  col="lightgray",lwd=4)
par(new=TRUE)
p=pnbinom(nl,mu=m,size=h+hp)
plot(nl,p,type="l",xaxt="n",yaxt="n",
  xlab="",ylab="",lwd=2)
axis(4)
mtext(expression(paste("Probability that h",{}<={},"H")),
  side=4,line=2.5,las=0)
mtext("a",side=3,line=1,adj=0,cex=1.1)
plot(c(0,80),c(0,1), type="n", yaxt="n",
  ylab="",xlab="Number of U.S. Hurricanes (H)",lwd=2)
axis(4)
mtext("Probability that h > H",side=4,line=2.5,las=0)
grid()
mtext("b",side=3,line=1,adj=0,cex=1.1)
Th=c(30,20,10)
cls=rev(c("black","gray","lightgray"))
ey=c(80,60,40)
#legend(60,1,legend=c("10 yr","20 yr","30 yr"), xjust=1, col=rev(cls), pch=16, cex=.8)
for(i in 1:3){
  m=Th[i]*(h+hp)/(T+Tp)
  nl=0:ey[i]
  p=1-pnbinom(nl,mu=m,size=h+hp)
  points(nl,p,pch=16,col=cls[i])
}
par(op)
@
\end{center}
\caption{Predictive probability distributions for U.S. hurricanes. (a) Probabilities of $H$ hurricanes over a random ten-year period are shown as vertical bars. The probability that the number of hurricanes will be less than or equal to $H$ is shown as a solid line. (b) Probabilities that the number of hurricanes will exceed $H$ in 10-year (black), 20-year (gray), and 30-year (light gray) periods.}
\label{fig:posteriorprobs}
\end{figure}
The results are shown in Figure~\ref{fig:posteriorprobs}. The probabilities of $H$ hurricanes in ten years are shown as vertical bars with a scale along the left vertical axis, and the probability that the number of hurricanes will be less than or equal to $H$ is shown as a solid line with a scale along the right axis. The probability that the number of hurricanes will exceed $H$ over a random 10, 20, and 30-year period is shown in the right panel. The expected number of U.S.
hurricanes over the next 30 years is \Sexpr{round(30*(h+hp)/(T+Tp),0)} of which 18 (not shown) are anticipated to be major hurricanes. These probability distributions represent the best estimates of the future climate of U.S. hurricanes.
<>=
YEAR=1899 #select cutoff year of reliability
early=H$MUS[H$Year<YEAR]
late=H$MUS[H$Year>=YEAR]
bs=bootstrap(early,theta=mean, nboot=1000)
qbs=quantile(bs$thetastar,prob=c(.05,.95))
ratio=round(as.numeric(qbs[2]/qbs[1]),2)
rchi2=function(x){
  round(qchisq(.95,x)/qchisq(.05,x),2)
}
df=1:200
dof=min(df[sapply(df,FUN=rchi2)==ratio])
hp=dof/2
Tp=round(qchisq(.95,dof)/(2*qbs[2]),1)
h=sum(late)
T=length(late)
round(30*(h+hp)/(T+Tp),2) #predicted average number over 30 years
@
The above approach provides a rational and coherent foundation for incorporating all available information, while explicitly accounting for the differences in the degree of uncertainty. It can be used to account for the influence of climate change by discounting the older information. That is, records influenced by recent changes can be given more weight than records from earlier decades.
\section{Seasonal Model}
You can also create a Bayesian model for annual hurricane counts. The counts are reliable starting in the middle of the 20th century, especially in the United States. But data records on past hurricanes extend farther back, and these earlier records are useful for understanding and predicting seasonal activity.

The model assumes the logarithm of the annual rate is linearly related to sea-surface temperature (SST) and the Southern Oscillation Index (SOI) as discussed in Chapter~\ref{chap:frequencymodels}. Samples from the posterior distribution are generated using the Markov chain Monte Carlo approach discussed in Chapter~\ref{chap:bayesianstatistics}.

The JAGS code is given below. You copy and paste it into a text file in your working directory with the name {\it JAGSmodel2.txt}.
\begin{verbatim}
___JAGS code___
model {
for(i in 1:N) {
   # Poisson likelihood for observed counts
   h[i] ~ dpois(lambda[i])
   mu[i] <- b0 + b1*SOI[i] + b2*SST[i] +
            b3*RI[i] + eta[i]
   lambda[i] <- exp(mu[i])
   tt[i] <- tau[RI[i] + 1]
   eta[i] ~ dnorm(0, tt[i])
   }
b0 ~ dnorm(0,.0001)
b1 ~ dnorm(0,.0001)
b2 ~ dnorm(0,.0001)
b3 <- log(pm)
pm ~ dunif(lo,hi)
tau[1] ~ dgamma(.001,.001)
tau[2] ~ dgamma(.001,.001)
}
_______________
\end{verbatim}
The code specifies the regression model and associated priors. In particular, the annual hurricane count {\tt h[i]} is stochastically generated from a Poisson distribution with a rate {\tt lambda[i]}. The parameter {\tt lambda[i]} is deterministically linked to {\tt mu[i]} through the exponential function, where {\tt mu[i]} is linearly related to the SST and SOI covariates and a ``reliability'' index ({\tt RI[i]}). The reliability index is coded as 1 for years prior to aircraft reconnaissance and 0 for years after. Here you use 1943 as the cutoff year.

Your model includes a random effects term ({\tt eta[i]}) that quantifies the extra variation in annual hurricane rates not modeled by the covariates. It accommodates two variance distributions to account for different levels of uncertainty before and after the cutoff year. Recall that JAGS is a declarative language so the order of the assignment statements is irrelevant. Your model is a directed acyclic graph (DAG) where the nodes are the parameters and data and the arrows indicate the conditional dependency, either stochastic or deterministic.
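Before fitting the model it can help to see the data-generating process the code describes. The short simulation below is only a sketch: the coefficient values are made up and the covariates are random numbers standing in for the actual SOI and SST series. It illustrates how the multiplicative factor $pm^{RI}$ discounts the early counts and how the two precisions allow extra variation in the early years.
<<>>=
# generative sketch of the count model (all values hypothetical)
set.seed(1)
N=145
SOI=rnorm(N)                 # stand-in covariates, not the real series
SST=rnorm(N,sd=.3)
RI=as.numeric(1:N<=77)       # 1 = less reliable early years
b0=1.6; b1=.25; b2=.5        # hypothetical regression coefficients
pm=.85                       # hypothetical recording probability (early years)
sd.eta=c(.1,.3)              # extra variation: RI=0 years vs RI=1 years
eta=rnorm(N,0,sd.eta[RI+1])
lambda=exp(b0+b1*SOI+b2*SST+log(pm)*RI+eta)
h=rpois(N,lambda)
tapply(h,RI,mean)            # simulated mean counts, late vs early years
@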
Acquire the data by typing
<>=
load("alldata.R")
ATL=read.table("ATL.txt",header=T)
@
Next, to simplify things, create a data frame for the data to be modeled.
<>=
df=data.frame(YR=ATL$Year,H=ATL$H,SOI=alldata$soi,
  SST=alldata$sst,RI=as.numeric(ATL$Year<1943))
df=df[16:160,]  #start year 1866
@
Here you include the year {\tt YR}, the basin-wide hurricane count {\tt H}, the August--October value of the SOI, the August--October value of the SST, and the reliability index {\tt RI} coded as 1 for years prior to aircraft reconnaissance and 0 for years after.
<>=
require(rjags)
model=jags.model('JAGSmodel2.txt',
  data = list(N=length(df$H),h=df$H,
    SST=df$SST,SOI=df$SOI,RI=df$RI,
    lo=.2,hi=.95),
  inits=list(b0=0,b1=0,b2=0,
    tau=c(.1,.1),pm=.9),
  n.chains = 2,
  n.adapt = 1000)
@
<<>>=
update(model,2000)
out=coda.samples(model,c('b1','b2'),1000)
@
Here you first update (burn in) the model and then generate samples of the coefficients on the SOI and SST terms, plotting histograms of the samples (Figure~\ref{fig:sstsoihist}).
<>=
hist(out[[1]][,1],main="",xlab=expression(beta[1]))
hist(out[[1]][,2],main="",xlab=expression(beta[2]))
@
\begin{figure}
\begin{center}
<>=
op=par(mfrow=c(1,2),las=1)
hist(out[[1]][,1],main="",xlab=expression(beta[1]))
hist(out[[1]][,2],main="",xlab=expression(beta[2]))
par(op)
@
\end{center}
\caption{Histograms of the posterior samples of the model coefficients. The samples are from the first chain and the parameter $\beta_1$ corresponds to the SOI coefficient and $\beta_2$ to the SST coefficient.}
\label{fig:sstsoihist}
\end{figure}
Note that \verb@out@ is an \verb@mcmc.list@ object containing two chains. The chain number is given inside the double brackets and the sample values are given as a matrix where the rows are the consecutive samples and the columns are the parameters. As expected, the histograms show that both covariates are important in modulating annual hurricane rates, as the distributions of their coefficients are shifted to the right of zero.

A time series plot of the five-number summaries is instructive for examining the distribution of the annual rate parameter ($\lambda$) as a function of year. First generate samples of \verb@lambda@.
<>=
out=coda.samples(model,c('lambda'),1000)
@
Then use the \verb@fivenum@ function with the \verb@points@ and \verb@lines@ functions within a loop over all years.
<>=
plot(c(1866,2011),c(0,20),type="n",bty="n",
  xlab="Year",ylab="Annual Rate [hur/yr]")
for(i in 1:145){
  points(i+1865,fivenum(out[[1]][,i])[3],
    pch=16)
  lines(c(i+1865,i+1865),c(fivenum(out[[1]][,i])[1],
    fivenum(out[[1]][,i])[5]))
}
@
\begin{figure}
\begin{center}
<>=
op=par(las=1)
plot(c(1866,2011),c(0,20),type="n",bty="n",
  xlab="Year",ylab="Annual Rate [hur/yr]")
for(i in 1:145){
  points(i+1865,fivenum(out[[1]][,i])[3],
    pch=16)
  lines(c(i+1865,i+1865),c(fivenum(out[[1]][,i])[1],
    fivenum(out[[1]][,i])[5]))
}
par(op)
@
\end{center}
\caption{Annual hurricane rates. The median rate is shown as a point and the range is given as a vertical line. The rates are posterior samples from an implementation of an MCMC algorithm for a model of basin-wide hurricane counts. The model is written in JAGS code. The model includes a term to accommodate more uncertainty about the exact number of hurricanes prior to 1943. The covariates include the SST and SOI.}
\label{fig:annualrates}
\end{figure}
Results are shown in Figure~\ref{fig:annualrates}. The median rate is given as a point and the range is given as a vertical line. Rates are posterior samples from an implementation of an MCMC algorithm for your model of basin-wide hurricane counts (your JAGS code above).
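A compact numerical summary of the same samples is also easy to obtain. For example, the median and a 90~\% credible interval of the annual rate for the first few years (here assuming, as in the plotting loop, that the first column of the chain corresponds to 1866):
<<>>=
# median and 90% credible interval of lambda for the first five years
t(apply(out[[1]][,1:5],2,quantile,probs=c(.05,.5,.95)))
@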
Model covariates include the SST and SOI as used throughout this book. As mentioned above, \verb@eta@ quantifies the extra variation in rates not modeled by SST and SOI, and it accounts for the different levels of uncertainty before and after the start of the aircraft reconnaissance era. A time series plot of posterior samples of \verb@eta@ is shown in Figure~\ref{fig:extravariation}.
\begin{figure}
\begin{center}
<>=
out=coda.samples(model,c('eta'),1000)
m=numeric()
rang=numeric()
op=par(mfrow=c(1,1),las=1)
plot(c(1866,2011),c(-.2,.2),type="n",bty="n",
  xlab="Year",ylab="Extra Variation")
for(i in 1:145){
  points(i+1865,fivenum(out[[1]][,i])[3],
    pch=16)
  lines(c(i+1865,i+1865),c(fivenum(out[[1]][,i])[2],
    fivenum(out[[1]][,i])[4]))
  m[i]=median(out[[1]][,i])
  rang[i]=fivenum(out[[1]][,i])[5]-fivenum(out[[1]][,i])[1]
}
lines(loess.smooth(1866:2010,m),col="red",lwd=2)
par(op)
@
\end{center}
\caption{Extra variation in hurricane rates. The posterior median is plotted as a dot and the vertical lines extend from the 25th to the 75th percentiles. The red line is a local regression smoother through the median values.}
\label{fig:extravariation}
\end{figure}
The posterior median is plotted as a dot and the vertical lines extend from the 25th to the 75th percentiles. The red line is a local regression smoother through the median values. The graph shows larger unexplained variations and a tendency for underprediction of the rates (positive values) during the earlier years.
%plot(1866:2010,rang,type="l",ylim=c(0,1))
%bayesglm
\section{Consensus Model}
Model uncertainty is the uncertainty associated with choosing a single model from a suite of good candidate models. You typically use a selection procedure on a set of covariates for a given class of models (model selection) and then use a method to estimate the parameter values. You make predictive inferences with the model as if it had generated the data. Unfortunately, this approach ignores the uncertainty in your model selection procedure, resulting in too much confidence in your predictions.

For example, given a pool of covariates for hurricane activity a stepwise regression procedure is frequently employed to search through hundreds of candidate models (combinations of covariates). The model that provides the best level of skill is chosen. The best model is subsequently subjected to a hold-one-out cross-validation exercise to obtain an estimate of how well it will predict future data. However, this exercise does not result in a ``full'' cross validation as the procedure for selecting the reduced set of covariates is not itself cross validated. Cross validation is a procedure for assessing how well an algorithm for choosing a particular model (including the predictor selection phase) will do in forecasting the unknown future \cite[]{Michaelsen1987, ElsnerSchmertmann1994}.

An alternative is to use Bayesian model averaging (BMA). BMA works by assigning a probability to each model (combination of covariates), then averaging over all models weighted by their probability. In this way model uncertainty is included \citep{RafteryEtAl2005}. Here we produce a consensus forecast of seasonal hurricane activity \citep{JaggerElsner2010}. In doing so we show how the approach can facilitate a physical interpretation of your modeled relationships.
\subsection{Bayesian model averaging}
Let $H_i$, $i=1, \ldots, N$ denote the observed hurricane counts by year.
Assume that your model has $k$ covariates. Let $\mathbf{X}$ be the model matrix with components $X[i,j]$, $i=1, \ldots, N$, $j=1, \ldots, k+1$, where the first column holds the intercept term ($X[i, 1] = 1$ for all $i$) and column $j+1$ holds the $i$th observation of the $j$th covariate. Associated with the intercept and $k$ covariates are $k+1$ parameters $\beta_j$, $j=1, \ldots, k+1$. You assume the counts have a Poisson distribution and the logarithm of the rate is regressed onto the covariates as
\begin{eqnarray}
H_i &\sim& \hbox{pois}(\lambda_i)\nonumber\\
\log(\lambda_i) &=& \sum_{j=1}^{k+1}{X}[i,j] \beta_j \nonumber
\end{eqnarray}
This is a generalized linear model and the method of maximum likelihood can be used to estimate the parameters. From these parameter estimates and the values of the corresponding covariates you infer $\lambda$ from the regression equation. The future hurricane count conditional on these covariate values has a Poisson distribution with a mean of $\lambda$. Thus your model is probabilistic and the average count is a ``point'' forecast.

A full model is defined as one that uses all $k$ covariates. However, it is usual that some of the covariates do not contribute much to the model. In classical statistics these are the ones that are not statistically significant. You can choose a reduced model by setting some of the $k$ parameters to zero. Thus with $k$ covariates there are a total of $m=2^k$ possible models. The important idea behind BMA is that none of the $m$ models are discarded. Instead a probability is assigned to each model. Predictions are made by a weighted average over predictions from all models where the weights are the model probabilities. Models with greater probability have proportionally more weight in the averaging.

To solidify this point, consider a simple case. You have observations of $Y$ arising from either one of two possible regression models. Let $Y_1 = \alpha_1 + \epsilon_1$ be a constant mean model and $Y_2 = \alpha_2 + \beta x + \epsilon_2$ be a simple regression model where $x$ is a single covariate. The residual terms $\epsilon_1, \epsilon_2$ are independent and normally distributed with means of zero and variances of $\sigma_1^2$ and $\sigma_2^2$, respectively.

Suppose you assign a probability $p$ that the constant mean model generated the observed data. Then there is a probability $1-p$ that the simple regression model generated the data instead. Now with BMA the posterior predictive expectation (mean) of $Y$ is $\mu= p\mu_1+(1-p)\mu_2 =p\alpha_1 +(1-p) (\alpha_2+ \beta x)$. This represents a consensus opinion that combines information from both models as opposed to choosing one over the other.

The posterior predictive distribution of $Y$ given the data is not necessarily normal. Instead it is a mixture of normal distributions with a posterior predictive variance of $p\sigma_1^2+(1-p)\sigma_2^2 + p(1-p)(\alpha_2+\beta x-\alpha_1)^2$. This variance under BMA is larger than a simple weighted sum of the individual model variances by an amount $p(1-p)(\alpha_2+\beta x-\alpha_1)^2$ that represents the uncertainty associated with model choice. Thus, the predictive distribution under BMA has a larger variance than the predictive distribution of a single model. This makes sense.

Over a set of competing models you need a way to assign a probability to each. You start with a collection of models $M_i, i=1,\ldots,m$, where each model is a unique description of your data.
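Before turning to how those probabilities are assigned, here is a minimal numerical sketch of the two-model average described above. All values are made up; the point is only to show how the model-choice term inflates the predictive variance.
<<>>=
# two-model averaging with hypothetical values
p=.4                       # probability on the constant mean model
a1=5.5                     # alpha_1
a2=4.0; b=.8; x=2          # alpha_2, beta, and a covariate value
s1=1.2; s2=1.0             # residual standard deviations
mu1=a1; mu2=a2+b*x
mu=p*mu1+(1-p)*mu2                 # BMA predictive mean
extra=p*(1-p)*(mu2-mu1)^2          # variance due to model choice
v=p*s1^2+(1-p)*s2^2+extra          # BMA predictive variance
c(mean=mu,var=v,extra=extra)
@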
For example, in the two-model case you need to assign a probability to the constant mean model and a probability to the simple regression model under the constraint that the total probability over both models is one. Now with your data $D$ and a set of proposed models $M_i$, you determine the probability of your data given each model [$P(D | M_i)$]. You also assign a prior probability to each model [$P(M_i)$] representing your belief that each model generated your data. If you are maximally noncommittal as to which model to choose, you assign a prior probability of $1/m$ to each model. For example, in the two-model case, if you believe both models are equally likely then you assign $P(M_1)=P(M_2)=0.5$.

Using Bayes rule (see Chapter~\ref{chap:bayesianstatistics}) you find the probability of the model given the data, $P(M_i|D)= P(D|M_i) \times P(M_i)/P(D)$. Since $P(D)$ is fixed for all models you can let $W_i = P(D|M_i) \times P(M_i)$ be the model weights, so that the probabilities are $P(M_i | D)= W_i/\sum_{i=1}^m W_i$.

Let the random variable $H$ represent the prediction of a future hurricane count. The posterior distribution of $H$ at $h$ under each model is given by $f(h|D, M_i)$. The marginal posterior probability over all models is given by $f(h|D)= \sum_{i=1}^m f(h |D, M_i) P(M_i |D )$. A point estimate of the future count (i.e., the posterior mean of $H$ over all models) is obtained by taking the expectation of $H$ given the data as
\begin{equation}
\mathbb{E}(H|D) = \sum_{h=0}^{\infty} h f(h|D).
\end{equation}
Expanding $f(h|D)$ and switching the order of summation you get
\begin{equation}
\mathbb{E}(H|D)= \sum_{i=1}^m P(M_i |D) \sum_{h=0}^{\infty}h f(h|D,M_i),
\end{equation}
which is
\begin{equation}
\sum_{i=1}^m P(M_i|D) \mathbb{E}(H |D, M_i),
\end{equation}
where $\mathbb{E}(H |D, M_i)=\mu_i$.

For a given model, $P(D|M_i)$ is the marginal likelihood over the parameter space. In other words, $P(D|M_i) = \int P(D|M_i,\theta) f(\theta | M_i) d\theta$, where $f(\theta | M_i)$ is the prior distribution of the parameters for model $M_i$ and $P(D | M_i, \theta)$ is the likelihood of the data given the model [$L(\theta; M_i, D)$]. In many cases the above integral cannot be evaluated analytically, or it is infinite as when an improper prior is put on the parameter vector $\theta$. Approximation methods can be used \citep{HoetingEtAl1999}. Here you use the Bayesian Information Criterion (BIC) approximation, which is based on a Laplace expansion of the integral about the maximum likelihood estimates \citep{MadiganRaftery1994}.

BMA keeps all candidate models, assigning a probability based on how likely it would be for your data to have come from each model. A consensus model, representing a weighted average of all models, is then used to make predictions. If values for the prior parameters come from reasonably well-behaved distributions, then a consensus model from a BMA procedure yields a lower mean squared prediction error than any single best model \citep{RafteryZheng2003}. BMA is described in more detail in \citet{HoetingEtAl1999} and \citet{Raftery1996}.

It turns out BMA also provides better coverage probabilities on the predictions than any single model \citep{RafteryZheng2003}. Consider a data record split into a training and testing set. Using the training set you can create $1- \alpha$ credible intervals on the predictions. Then using the testing set you can calculate the proportion of observations that lie within the credible intervals (coverage probability).
In standard practice with a single best model, the credible intervals are too small, resulting in coverage probabilities less than $1-\alpha$. Since BMA provides a larger variance than any model individually, the coverage probabilities on the predictions are greater than or equal to $1-\alpha$.
\subsection{Data plots}
You again use the data set saved in the file \verb@alldata.R@ and described in Chapter~\ref{chap:datasets}. Load the data and create a new data frame subsetting only years since 1866.
<>=
load("alldata.R")
dat=alldata[alldata$Year>=1866,]
@
The counts are near-coastal hurricanes passing through the grids shown in Fig.~\ref{fig:coastalregions}. You consider monthly values of SST, SOI, NAO, and sunspots as covariates. The monthly covariate values are shown in Figure~\ref{fig:covariatesplot} as image plots. The monthly values for May through October on the vertical axis are plotted as a function of year on the horizontal axis. The values are shown using a color ramp from yellow (low) to blue (high). The SST and sunspot number (SSN) covariates are characterized by high month-to-month correlation, as can be seen by the vertical striations.
\begin{figure}
\begin{center}
<>=
require(fields)
ytb=colorRampPalette(c("#EDF8B1","#7FCDBB","#2C7FB8"),bias=1)
covplot<-function(x=dat,months=5:10,col=ytb(100),field="sst") {
  leglab<-switch(field,
    sst=expression(paste("SST [",degree,"C]",sep="")),
    sun="Sunspot Count",nao="NAO [s.d.]",soi="SOI [s.d.]","")
  abc=switch(field,sst="a",nao="b",soi="c",sun="d")
  y<-dat[,paste(field,month.abb[months],sep=".")]
  image.plot(x=dat$Year,y=months,z=as.matrix(y),ylab="",xlab="Year",
    yaxt="n",col=col,las=1,mgp=c(1.8,0.7,0))
  title("")
  axis(side=2,at=5:10,labels=c("May","Jun","Jul","Aug","Sep","Oct"),
    las=1,mgp=c(1.8,0.7,0))
  mtext(text=leglab,side=4,line=0,las=0,cex=0.85)
  mtext(text=abc,side=3,line=0.2,adj=0)
}
op=par(mfrow=c(2,2),mex=.8,las=1)
covplot(field="sst"); covplot(field="nao")
covplot(field="soi"); covplot(field="sun")
par(op)
@
\end{center}
\caption{Covariates by month. (a) SST, (b) NAO, (c) SOI, and (d) sunspot number. The image plot shows the monthly values for May through October on the vertical axis as a function of year on the horizontal axis. Values are shown using a color ramp from yellow (low) to blue (high).}
\label{fig:covariatesplot}
\end{figure}
\subsection{Results}
You assume the logarithm of the annual hurricane rate is a linear combination of a fixed subset of your covariates (Poisson generalized linear model--GLM). With six months and four environmental variables per month you have $2^{24}$, or more than 16.7 million, possible models. Calculations are carried out using the {\bf BMA} package \citep{RafteryEtAl2009}. Acquire the package by typing
<>=
require(BMA)
source("bic.glm.r")
@
Specifically, you use the \verb@bic.glm@ function for determining the probability on each model and the \verb@imageplot.bma@ function for displaying the results. First save the model formula involving the response variable (here \verb@US.1@) and all covariates.
<>=
fml=US.1~sst.Oct+sst.Sep+sst.Aug+sst.Jul+sst.Jun+
  sst.May+nao.Oct+nao.Sep+nao.Aug+nao.Jul+nao.Jun+
  nao.May+soi.Oct+soi.Sep+soi.Aug+soi.Jul+soi.Jun+
  soi.May+sun.Oct+sun.Sep+sun.Aug+sun.Jul+sun.Jun+
  sun.May
@
Then save the models output from the \verb@bic.glm@ function. By default only models whose posterior probability (based on BIC) is within a factor of 20 of the most probable model are kept, and only the top 150 models of each size (number of covariates) are considered.
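Under the BIC approximation with equal prior model probabilities, each model's weight is proportional to $\exp(-\mathrm{BIC}/2)$, so a posterior odds cutoff of 20 corresponds to a BIC difference of $2\ln 20 \approx 6$. The sketch below, using made-up BIC values, shows the conversion.
<<>>=
# converting BIC values to approximate posterior model probabilities
# (the BIC values here are made up for illustration)
bic=c(300,301.5,303.2,306.1)
w=exp(-.5*(bic-min(bic)))
round(w/sum(w),3)
2*log(20)   # BIC difference matching 20:1 posterior odds
@
Now fit the actual set of models.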
<>=
mdls=bic.glm(f=fml,data=dat,glm.family="poisson")
@
The function returns an object of class \verb@bic.glm@. The summary method is used to summarize the results.
<>=
summary(mdls, n.models=3)
@
%<>=
%require(xtable)
%xt=xtable(summary(mdls,n.models=3, digits=2), caption='Selected models.',
%label='tab:selectedmodels')
%print(xt)
%@
The top three models having the lowest BIC values (highest posterior probabilities) are given in the summary table. The table lists the intercept and covariates in the first column. The second column gives the posterior probability that the model parameter is not equal to zero across all \Sexpr{mdls$n.models} models. For example, the June NAO covariate has a probability of \Sexpr{mdls$probne0[11]}~\% of being included in a model. The third and fourth columns are the posterior expected value and standard deviation across all models. The subsequent columns give the parameter estimates for the most probable models, with values listed only in the rows corresponding to covariates included in that model. The number of variables in the model, the model BIC, and the posterior probability are also given in the table.

Models are ordered by BIC value with the first model having the lowest BIC value, the second model having the second lowest BIC, and so on. The value of BIC for a given model is
$$-2\cdot\ln L + k \ln(n),$$
where $L$ is the likelihood evaluated at the parameter estimates, $k$ is the number of parameters to be estimated, and $n$ is the number of years. BIC includes a penalty term ($k\ln(n)$), which makes it useful for comparing models of different sizes. If the penalty term is removed, $-2\cdot\ln L$ can be reduced simply by increasing the number of model covariates. The BIC as a selection criterion results in choosing models that are parsimonious and asymptotically consistent, meaning that the model with the lowest BIC converges to the true model as the number of hurricane years increases.

You use a plot method to display the model coefficients (by sign) ordered by decreasing posterior probabilities.
<>=
imageplot.bma(mdls)
@
\begin{figure}
\begin{center}
<>=
source("imageplot2.R")
op=par(mfrow=c(1,1))
imageplot.bma2(mdls)
par(op)
@
\end{center}
\caption{Covariates versus model. Models are listed along the horizontal axis by decreasing posterior probability. Covariates included in a particular model are shown with colored bars with brown indicating the covariate has a positive relationship with hurricane rate and green indicating it has a negative relationship. The bar width is proportional to the model's posterior probability.}
\label{fig:imageplotmodels}
\end{figure}
<>=
# refit the full model and rerun the BMA on orthogonalized (QR) covariates
BMAfull=glm(family="poisson",data=dat,fml,x=T,y=T)
qrX=qr(BMAfull$x)
qrQ=qr.Q(qrX)
qrR=qr.R(qrX)
colnames(qrQ)=colnames(BMAfull$x)
qrdata=data.frame(US.1=BMAfull$y,qrQ)
BMAalldataqr=bic.glm(glm.family="poisson",data=qrdata,fml)
@
The plot makes it easy to see which covariates are picked by the most probable models. They are the ones with the most consistent coloring from left to right across the image. A covariate with only a few gaps indicates that it is included in most of the most probable models. Here these include September and June SSN, June NAO, July SST, and any of the months of July through September for the SOI.

Results also provoke physical insight. You might ask why July SST is selected as a model covariate more often than August or September SST.
The answer lies in the fact that when the hurricanes arrive in August and September, they take heat from the ocean, so the correlation between hurricane activity and SST weakens. That is, the thermodynamics of hurricane intensification works against the statistical correlation. Said another way, July SST better relates to an active hurricane season not because a warm ocean in July {\it causes} tropical cyclones in August and September, but because hurricanes in August and September cool the ocean.

The SOI covariates get chosen frequently by the most probable models, but with a mixture across the months of July through October. The posterior probability is somewhat higher for the months of July and October and smallest for August and September. With El Ni\~no conditions, convection over the eastern equatorial Pacific produces increased shear and subsidence across the Atlantic \citep{Gray1984}, but especially over the western Caribbean, where during the months of July and October a relatively larger percentage of the North Atlantic hurricane activity occurs. Moreover, the inhibiting influence of El Ni\~no might be less effective during the core months of August and September when, on average, other conditions tend to be favorable.

The sign on the September SSN parameter is negative, indicating that the probability of a U.S. hurricane decreases with an increasing number of sunspots. This result accords with the hypothesis that increased UV radiation from an active sun (greater number of sunspots) warms the upper troposphere, resulting in greater thermodynamic stability and a lower probability of a hurricane over the western Caribbean and Gulf of Mexico \cite[]{ElsnerJagger2008, ElsnerEtAl2010}. The positive relationship between hurricane probability and June SSN is explained by the direct influence the sun has on ocean temperature. Other alternative explanations are possible, especially in light of the role the solar cycle likely plays in modulating the NAO \cite[]{Kodera2002, OgiEtAl2003}.

You can find the probability that a particular covariate type is chosen for at least one month by calculating the total posterior probability over all models that include that covariate during at least one month. First, use the \verb@substring@ function on the covariate names given in the \verb@bic.glm@ object under \verb@namesx@ to keep only the first three characters (the covariate type) of each name. Also create a character string containing only the unique names.
<>=
cn=substring(mdls$namesx,1,3)
cnu=unique(cn)
@
Next create a matrix of logical values using the \verb@outer@ function, which performs an outer product of matrices and arrays. Also assign column names to the matrix.
<>=
mn=outer(cn,cnu,"==")
colnames(mn)=cnu
@
Next multiply the matrix of logical entries indicating which covariates are in each model (\verb@which@) by this matching matrix. This returns a matrix with dimensions of number of models by number of covariate types. The matrix entries are the number of covariates of that type in each model. Finally multiply the posterior probabilities given under \verb@postprob@ by a logical version of this matrix indicating whether the covariate type is included.
<>=
xx=mdls$which%*%mn
pc=100*mdls$postprob%*%(xx>0)
round(pc,1)
@
You see that the SOI and sunspot number have the largest probabilities at \Sexpr{round(pc[3],1)}~\%, while the NAO and SST have posterior probabilities of \Sexpr{round(pc[2],1)}~\% and \Sexpr{round(pc[1],1)}~\%, respectively.
The lower probability of choosing the NAO reflects the rather large intra-seasonal variability in this covariate, as seen in Figure~\ref{fig:covariatesplot}.

It is informative to compare the results of your BMA with a BMA performed on a random series of counts. Here you do this by resampling the actual hurricane counts. The randomization results in the same set of counts, but the counts are placed randomly across the years. The random series, together with the covariates, are used as before and the results are shown in Fig.~\ref{fig:bmarandom}.
\begin{figure}
\begin{center}
<>=
s1=dat
s1$US.1=sample(s1$US.1)
mdls1=bic.glm(glm.family="poisson",data=s1,fml)
s1=dat
s1$US.1=sample(s1$US.1)
mdls2=bic.glm(glm.family="poisson",data=s1,fml)
s1=dat
s1$US.1=sample(s1$US.1)
mdls3=bic.glm(glm.family="poisson",data=s1,fml)
source("imageplot3.R")
op=par(mfrow=c(2,2),mex=0.8,cex=0.6)
imageplot.bma3(mdls, order="probne0")
mtext(text="a",side=3,line=0.2,adj=0)
imageplot.bma3(mdls1,order="probne0")
mtext(text="b",side=3,line=0.2,adj=0)
imageplot.bma3(mdls2,order="probne0")
mtext(text="c",side=3,line=0.2,adj=0)
imageplot.bma3(mdls3,order="probne0")
mtext(text="d",side=3,line=0.2,adj=0)
par(op)
@
\end{center}
\caption{Covariates versus model as in Figure~\ref{fig:imageplotmodels}. Covariates are ordered by posterior probability of inclusion in the set of most probable models. (a) The BMA procedure using the actual time series of U.S. hurricane counts, and (b--d) the BMA procedure using three randomly permuted series.}
\label{fig:bmarandom}
\end{figure}
The comparison shows that your set of covariates has a meaningful relationship with U.S. hurricane activity. There are fewer models chosen with the randomized data sets and the number of variables included in the set of most probable models is lower. In fact the average number of variables in the most probable models is \Sexpr{round(mean(mdls$size),0)}, which compares with averages of \Sexpr{round(mean(mdls1$size),0)}, \Sexpr{round(mean(mdls2$size),0)}, and \Sexpr{round(mean(mdls3$size),0)} for the three randomized series. Moreover, there is little consistency in the variables selected from one model to the next, as expected for series with no real relationship to the covariates.
\subsection{Consensus hindcasts}
As noted above, in contrast to selecting a single model, the BMA procedure assigns a posterior probability to a set of the most probable models. Each model can be used to make a prediction. But which forecast should you believe? Fortunately no choice is necessary. Instead a prediction is made from each model and the forecasts are then averaged. The average is weighted, with the weights proportional to the models' posterior probabilities.

Here you assume perfect knowledge of all covariates every month, and hindcasts are made in-sample. For an actual forecast situation this is not available, but the method would be the same. You use the prediction functions in {\bf prediction.R} to retrodict annual rates, rate distributions, and posterior count distributions. First source the code file, then compute the mean and standard deviation of the annual rate for each year. Then from the output compute the in-sample root mean squared error.
<>=
source("prediction.R")
ar=bic.poisson(mdls,newdata=mdls$x, simple=TRUE)
sqrt(mean((ar[1,]-mdls$y)^2))
@
Thus the consensus model results in an in-sample root mean squared error of \Sexpr{round(sqrt(mean((ar[1,]-mdls$y)^2)),1)} hurricanes per year.

You compare hindcast probabilities from the consensus model between two years. Here you examine the consecutive years of 2007 and 2008.
You determine the posterior probabilities of observing between zero and eight hurricanes in each of the two years and display them using a side-by-side bar plot.
\begin{figure}
\begin{center}
<>=
yr1=2007; yr2=2008
r1=which(dat$Year==yr1); r2=which(dat$Year==yr2)
Pr=bic.poisson(mdls,newdata=mdls$x[c(r1,r2),], N=9)
barplot(t(Pr),beside=TRUE,xlab="Number of Hurricanes",
  ylab="Probability",legend.text=
  c(as.character(yr1),as.character(yr2)))
@
\end{center}
\caption{Hindcasts from the consensus model. The vertical axis is the probability of observing $h$ number of hurricanes. Hindcasts are shown for the \Sexpr{yr1} and \Sexpr{yr2} hurricane seasons. There was \Sexpr{dat$US.1[dat$Year==yr1]} hurricane in \Sexpr{yr1} and \Sexpr{dat$US.1[dat$Year==yr2]} hurricanes in \Sexpr{yr2}.}
\label{fig:compareyears}
\end{figure}
Results are shown in Figure~\ref{fig:compareyears}. The consensus hindcasts indicate a higher probability of at least one U.S. hurricane in 2008 compared with 2007. The model retrodicts a \Sexpr{round(sum(Pr[4:9,1])*100,0)}~\% chance of three or more hurricanes for 2007 and a \Sexpr{round(sum(Pr[4:9,2])*100,0)}~\% chance of three or more hurricanes for 2008. There was \Sexpr{dat$US.1[dat$Year==yr1]} hurricane in \Sexpr{yr1} and \Sexpr{dat$US.1[dat$Year==yr2]} hurricanes in \Sexpr{yr2}.

The consensus model hindcasts larger probabilities of an extreme year, given the rate, than would be expected from a Poisson process. That is, the consensus model is overdispersed with respect to a Poisson distribution. This makes sense as model uncertainty is incorporated in the consensus hindcasts.

To provide an estimate of how well the consensus model will do in predicting the future, a cross validation of the BMA procedure is needed. This is done in \cite{JaggerElsner2010} using various scoring rules including the mean squared error, the ranked probability score, the quadratic (Brier) score, and the logarithmic score. They find that for all scoring rules the consensus model provides more accurate predictions than a procedure that selects a single best model using either BIC or AIC.
\section{Space-Time Modeling on Hexagons}
\begin{verbatim}
___WinBUGS code___
model {
for(yr in 1:T) {
  for(hx in 1:80) {
    # Poisson likelihood for observed counts
    h[hx,yr] ~ dpois(lambda[hx,yr])
    mu[hx,yr] <- b0 + b1*SOI[yr] + b2*NAO[yr] +
                 b3*SST[hx,yr] + eta[hx]
    lambda[hx,yr] <- exp(mu[hx,yr])
    }
  }
# Spatial model
for(hx in 1:80){
  eta[hx] <- ueta[hx] + seta[hx]
  ueta[hx] ~ dnorm(0,utau)
  }
# Priors
utau ~ dgamma(.5,.0005)
seta[1:80] ~ car.normal(adj[],weights[],num[],stau)
stau ~ dgamma(.5,.0005)
b0 ~ dnorm(0,.0001)
b1 ~ dnorm(0,.0001)
b2 ~ dnorm(0,.0001)
b3 ~ dnorm(0,.0001)
}
_______________
\end{verbatim}
Multilevel statistical models are now widely used in social science research. Here you apply this approach to model seasonal hurricane activity over the North Atlantic basin. Single-level models, including linear and generalized linear models, are used to describe basin-wide activity, but models capable of capturing within-region variations have not been attempted. The model makes use of your hexagon tessellation developed in Chapter~\ref{chap:spatialmodels}. The model can be useful in anticipating basin-wide and regional hurricane activity a few months in advance.

The files are (1) the model: {\it Hexagonwindsstsoi.odc}, (2) the data: {\it WinBugsData.txt} (generated by {\it hexwinds.R}), and (3)
the initial values for chain 1: {\it WinBugsInit.txt}. Sample output files are {\it CODAchain1.txt} and {\it CODAindex.txt}.

The model consists of a hexagonal tessellation of a Lambert conformal conic projection over the North Atlantic, with each cell containing the number of hurricanes whose centers passed through it in each year from 1866 through 2009. The associated covariates are the local SST and the SOI. The local SST is constructed by averaging the monthly gridded SST over each hexagon. A single value for each year in each hexagon is generated by averaging the August--October months over the years 1866--2009. The model is a Poisson regression with a logarithmic link function.

The model consists of three independent spatial factors, each of which is an unconstrained ICAR (intrinsic conditional autoregressive) model using nearest-neighbor adjacencies. The spatial factors are the local SST regression coefficient, the SOI regression coefficient, and the intercept. The spatial hyperpriors (taus) are given a gamma (conjugate) distribution with shape .5 and rate .005; this prior has a 1~\% probability that the standard deviation is less than .04 and a 1~\% probability that it is greater than 8. Given the expected values of the covariates (SOI about .2, SST about .5), these seem reasonable. We could have chosen to combine the models into a multivariate CAR model, with a Wishart prior, say with 4 degrees of freedom and a diagonal of about .05.
<>=
#library(lme4)
#library(arm)
#radon=read.csv("radon.txt",header=T)
#radonMN=subset(radon,state=="MN")
#y=radonMN$activity
#x=radonMN$floor
#county=radonMN$county
#M0=lmer(y~1+(1|county))
#display(M0)
#M1=lmer(y~x+(1|county))
#display(M1)
@
For modeling these data you use a hierarchical model of the kind proposed by \citet{GelmanEtAl2003}; the workflow is illustrated here with the schools example that comes with the {\bf R2WinBUGS} package. You assume a normal distribution for the observed estimate for each school with mean \verb@theta@ and inverse-variance \verb@tau.y@. The inverse-variance is given as \verb@1/sigma.y@$^2$, where \verb@sigma.y@ is the reported standard error of each school's estimate. For the mean \verb@theta@, you employ another normal distribution with mean \verb@mu.theta@ and inverse-variance \verb@tau.theta@; the corresponding standard deviation \verb@sigma.theta@ is given a uniform prior distribution on the interval (0, 1000).

The WinBUGS code is given here and it must be stored in a separate file. Here you store it in {\bf schools.txt}.
\begin{verbatim}
___WinBUGS code___
model{
for (j in 1:J)
  {
  y[j] ~ dnorm (theta[j], tau.y[j])
  theta[j] ~ dnorm(mu.theta, tau.theta)
  tau.y[j] <- pow(sigma.y[j], -2)
  }
mu.theta ~ dnorm(0.0, 1.0E-6)
tau.theta <- pow(sigma.theta, -2)
sigma.theta ~ dunif(0, 1000)
}
_______________
\end{verbatim}
Next you prepare the data inputs required by the \verb@bugs@ function from the {\bf R2WinBUGS} package. This can be a list containing the name of each data vector.
<>=
library(R2WinBUGS)
data(schools)
J = nrow(schools)
y = schools$estimate
sigma.y = schools$sd
data = list("J", "y", "sigma.y")
@
Using these data together with the model, you can run an MCMC simulation to get estimates for \verb@theta@, \verb@mu.theta@, and \verb@sigma.theta@.

Beforehand you decide how many chains to run and for how many iterations. If the length of the burn-in is not specified, then the burn-in is taken as half the number of iterations. You also specify the starting values for the chains.
Here you do this by writing the following function.
<>=
inits=function(){
  list(theta=rnorm(J, 0, 100),
    mu.theta = rnorm(1, 0, 100),
    sigma.theta = runif(1, 0, 100))
}
@
You start the MCMC by typing
<>=
schools.sim = bugs(data, inits, model.file="schools.txt",
  parameters = c("theta", "mu.theta", "sigma.theta"),
  n.chains = 3, n.iter = 1000,
  bugs.directory = "C:/Program Files/WinBUGS14")
@
The argument \verb@bugs.directory@ must point to the directory containing the WinBUGS executable and you must have write permission in this directory. The results are saved in the bugs object \verb@schools.sim@ and are printed using the \verb@print@ method. Additionally you use the \verb@plot@ method to create graphs summarizing the posterior statistics of your parameters and displaying diagnostics relating to convergence of the chains. The chains have converged when the samples are being drawn from the posterior distribution. An $\hat R$ close to one for all parameters indicates good mixing of the three chains and thus approximate convergence.

The \verb@bugs@ function uses the parameter names you gave in the WinBUGS text file to structure the output into scalars, vectors, and arrays of parameters. In addition the parameter values are stored in a long vector. Greater flexibility in summarizing and plotting your results is available with functions in the {\bf coda} package \citep{PlummerEtAl2010}. You turn on the \verb@codaPkg@ switch and rerun the simulations.
<>=
schools.sim = bugs(data, inits, model.file="schools.txt",
  parameters = c("theta", "mu.theta", "sigma.theta"),
  n.chains = 3, n.iter = 1000, codaPkg = TRUE,
  bugs.directory = "C:/Program Files/WinBUGS14")
@
Now \verb@schools.sim@ is a character vector of file names, with each file containing coda output for one of the three chains. You create an \verb@mcmc.list@ object with the \verb@read.bugs@ function.
<>=
out=read.bugs(schools.sim, quiet=TRUE)
@
To plot a histogram of the 500 posterior samples of \verb@mu.theta@ from the first chain, type
<>=
hist(out[[1]][,"mu.theta"])
@
Additional plotting options are available using the {\bf lattice} package. To plot the density of all the model parameters type
<>=
library(lattice)
densityplot(out)
@
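The {\bf coda} package also provides convergence diagnostics for the same object. The sketch below computes the $\hat R$ statistic mentioned above along with effective sample sizes; it assumes the three chains are stored in \verb@out@ as created with \verb@read.bugs@.
<<>>=
library(coda)
gelman.diag(out)     # Rhat; values near one indicate approximate convergence
effectiveSize(out)   # effectively independent samples per parameter
@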