“We must think about what our models mean, regardless of fit, or we will promulgate nonsense.”—Leland Wilkinson

data files: read.table(“LMI.txt”, T); read.table(“SST.txt”, T); read.table(“SOI.txt”, T); load(“catwinds.RData”); load(“catcounts.RData”) packages: quantreg, xtable, ismev, gamlss, gamlss.cens source code: source(“CountyWinds.R”) third party: NONE

Often interest is on the strongest. Hurricanes like Camille in 1969, Andrew in 1992, and Katrina in 2005 cause catastrophic damage. It's important to have an estimate of when the next big one will occur. You also might want to know what influences the strongest hurricanes and if they are getting stronger.

Here we show you how to model hurricane intensity. The data are basin-wide lifetime highest intensities for individual tropical cyclones over the North Atlantic and county-level hurricane wind intervals.

We begin by considering trends using the method of quantile regression and then examine extreme-value models for estimating return period of the strongest hurricanes. We also look at modeling cyclone winds when the values are given by category and use Miami-Dade County as an example.

Here we consider cyclones above tropical storm intensity (\( \ge \) 17 m~s\( ^{-1} \)) during the period 1967–2010, inclusive. The period is long enough to see changes but not too long that it includes intensity estimates before satellite observations. We use 'intensity' and 'strength' synonymously to mean the fastest wind inside the hurricane.

Consider the set of events defined by the location and wind speed at which a tropical cyclone first reaches its lifetime maximum intensity. The data are in the file LMI.txt.

Import and list the values in ten columns of the first six rows of the data frame by typing

```
con = "http://www.hurricaneclimate.com/storage/chapter-8/LMI.txt"
LMI.df = read.table(con, header = TRUE)
round(head(LMI.df)[c(1, 5:9, 12, 16)], 1)
```

```
## Sid Yr Mo Da hr lon WmaxS maguv
## 26637.5 941 1967 9 3 17 -52.2 70.5 27.5
## 26703.4 942 1967 9 20 10 -97.1 136.2 8.0
## 26747.2 943 1967 9 13 2 -51.0 94.5 4.2
## 26807.2 944 1967 9 13 20 -65.0 74.3 3.8
## 26849.5 945 1967 9 28 23 -56.9 47.3 9.0
## 26867 946 1967 10 3 0 -93.7 69.0 5.6
```

Interest centers on the smoothed intensity estimate at the time of lifetime maximum (WmaxS).

First convert the wind speeds from the operational units of knots to the SI units of m~s\( ^{-1} \).

```
LMI.df$WmaxS = LMI.df$WmaxS * 0.5144
```

Next, determine the quartiles (0.25 and 0.75 quantiles) of the wind speed distribution. The quartiles divide the cumulative distribution function (CDF) into three equal-sized subsets.

```
quantile(LMI.df$WmaxS, c(0.25, 0.75))
```

```
## 25% 75%
## 25.50 45.97
```

We find that 25% of the cyclones have a maximum wind speed less than 26 m~s\( ^{-1} \) and 75% have a maximum wind speed less than 46 m~s\( ^{-1} \) so that 50% of all cyclones have a maximum wind speed between 26 and 46 m~s\( ^{-1} \) (interquartile range–IQR).

Given a sample of maximum wind speeds \( w_1,\ldots, w_n \), the $\tau$th sample quantile is the $\tau$th quantile of the corresponding empirical CDF. Formally, let \( W \) be a random maximum storm intensity then the $k$th \( q \)-quantile is defined as the value \( w \) such that \[ p(W \le w) \ge \tau \hbox{ and } p(W \ge w) \ge 1 - \tau, \] where \( \tau=\frac{k}{n} \).

The CDF appears to have three distinct regions, indicated by the vertical lines. The function is nearly a straight line for intensities less than 40 m~s\( ^{-1} \) and greater than 65 m~s\( ^{-1} \).

```
pt = seq(0, 1, 0.01)
qp = quantile(LMI.df$WmaxS, prob = pt)
par(mfrow = c(1, 2), las = 1, mgp = c(2, 0.4, 0), tcl = -0.3)
plot(ecdf(LMI.df$WmaxS), xlab = "Wind speed [m/s]", ylab = "Cumulative distribution",
main = "")
abline(v = 40, col = "red")
abline(v = 65, col = "red")
rug(LMI.df$WmaxS)
mtext("a", side = 3, line = 1, adj = 0, cex = 1.1)
plot(pt, qp, type = "l", xlab = "Quantile", ylab = "Wind speed [m s$^{-1}$]",
lwd = 2)
mtext("b", side = 3, line = 1, adj = 0, cex = 1.1)
```

Is there a trend in cyclone intensities? Start with a plot of your data. By specifying the first argument in the boxplot() function as a model formula you create a sequence of conditional box plots.

For example, to create a series of wind speed box plot conditional on year, type

```
boxplot(LMI.df$WmaxS ~ as.factor(LMI.df$SYear))
```

The conditioning variable must be specified as a factor. The graph is useful for a quick look at the distribution of your wind speed data over time.

Earlier you created a series of box plots of the SOI by month that minimized the amount of redundant ink. Here you reuse this code, modifying it a bit, to create a series of wind speed box plots by year.

Begin by creating a vector of years and saving the length of the vector as a numeric object.

```
yrs = 1967:2010
n = length(yrs)
```

Next create the plot frame without the data and without the horizontal axis tic labels. You also add a label to the vertical axis.

```
plot(c(1967, 2010), c(15, 85), type = "n", xaxt = "n", bty = "n", xlab = "",
ylab = "Lifetime maximum wind speed (m/s)")
axis(1, at = yrs, labels = yrs, cex = 0.5)
```

The fivenum() function lists the minimum, first quartile, median, third quartile, and maximum value in that order, so to obtain the median value from a vector of values called x, you type fivenum(x)[3].

Thus you loop over each year indexed by i and plot the median wind speed value for that year as a point using the points() function. In the same loop you create vertical lines connecting the minimum with the first quartile and the third quartile with the maximum using the lines() function.

```
for (i in 1:n) {
fn = fivenum(LMI.df$WmaxS[LMI.df$SYear == yrs[i]])
points(yrs[i], fn[3], pch = 19)
lines(c(yrs[i], yrs[i]), c(fn[1], fn[2]))
lines(c(yrs[i], yrs[i]), c(fn[4], fn[5]))
}
```

```
## Error: plot.new has not been called yet
```

Here we added the least-squares regression line about the annual mean lifetime highest wind speed (black line) and the least-squares regression line about the annual lifetime highest wind speed (red). While there is no upward or downward trend in the average cyclone intensity, there is an upward trend to the set of strongest cyclones.

```
maxw = numeric()
plot(c(1967, 2010), c(15, 85), type = "n", xaxt = "n", bty = "n", xlab = "Year",
ylab = "Wind speed [m s$^{-1}$]")
axis(1, at = yrs, labels = yrs, cex = 0.5)
for (i in 1:n) {
fn = fivenum(LMI.df$WmaxS[LMI.df$SYear == yrs[i]])
points(yrs[i], fn[3], pch = 19)
lines(c(yrs[i], yrs[i]), c(fn[1], fn[2]))
lines(c(yrs[i], yrs[i]), c(fn[4], fn[5]))
maxw[i] = fn[5]
}
abline(lm(maxw ~ yrs), col = "red", lwd = 2)
abline(lm(WmaxS ~ SYear, data = LMI.df), lwd = 2)
```

The theory of maximum potential intensity, which relates intensity to ocean heat, refers to a theoretical limit given the thermodynamic conditions (Emanuel 1988). So the upward trend in the observed lifetime maximum intensity is physically consistent with what you expect given the increasing ocean temperature.

It is informative then to explore the relationship of lifetime highest wind speed to ocean temperature. Get the monthly North Atlantic sea-surface temperature (SST) data by typing

```
con = "http://www.hurricaneclimate.com/storage/chapter-8/SST.txt"
SST = read.table(con, header = TRUE)
```

Here you subset the data on years since 1967, and keep June values only.

```
lg = SST$Year >= 1967
sst.df = data.frame(Yr = SST$Year[lg], sst = SST$Jun[lg])
```

Next merge your SST data frame with your cyclone intensity data.

This is done using the merge() function. Merge is performed on the common column name Yr as specified with the by argument.

```
lmisst.df = merge(LMI.df, sst.df, by = "Yr")
head(lmisst.df[c("Yr", "WmaxS", "sst")])
```

```
## Yr WmaxS sst
## 1 1967 36.27 21
## 2 1967 70.06 21
## 3 1967 48.59 21
## 4 1967 38.23 21
## 5 1967 24.33 21
## 6 1967 35.52 21
```

There are more instances of Yr in the intensity data frame (one for each cyclone) so the June SST values in the SST data frame get duplicated. All cyclones for a particular year get the same SST value as it should be.

You are interested in regressing cyclone intensity on SST as you did above on the year, but the SST values are continuous rather than discrete. So you first create SST intervals. This is done with the cut() function.

```
brk = quantile(lmisst.df$sst, prob = seq(0, 1, 0.2))
sst.i = cut(lmisst.df$sst, brk, include.lowest = TRUE)
```

Your cuts divide the SST values into five equal quantiles (pentiles). The intervals represent categories of much below normal, below normal, normal, above normal, and much above normal SST. The choice of quantiles is a compromise between having enough years for a given range of SSTs and having enough quantiles to assess differences.

Repeat this procedure for your SOI data. You create a merged data frame and cut the SOI values into pentads.

```
con = "http://www.hurricaneclimate.com/storage/chapter-8/SOI.txt"
SOI = read.table(con, header = TRUE)
lg = SOI$Year >= 1967
soi.df = data.frame(Yr = SOI$Year[lg], soi = SOI$Sep[lg])
lmisoi.df = merge(LMI.df, soi.df, by = "Yr")
brk = quantile(lmisoi.df$soi, prob = seq(0, 1, 0.2))
soi.i = cut(lmisoi.df$soi, brk, include.lowest = TRUE)
```

Finally, you create a series of box plots corresponding to the SST intervals. Begin by creating a character vector of horizontal axis labels corresponding to the SST intervals and, to simplify the code, save the wind speeds as a vector.

```
xlabs = c("MB", "B", "N", "A", "MA")
W = lmisst.df$WmaxS
y = boxplot(W ~ sst.i, plot = FALSE)
```

Plot and add regression lines through the medians and third quartile values using the saved statistics of the box plot and regressing on the sequence from one to five. Repeat the code using the September SOI covariate and create two box plots.

```
par(mfrow = c(1, 2), las = 1, mgp = c(2, 0.4, 0), tcl = -0.3)
y = boxplot(W ~ sst.i, plot = FALSE)
boxplot(W ~ sst.i, notch = TRUE, names = xlabs, xlab = "June SST pentile", ylab = "Wind speed [m/s]",
cex.axis = 0.7)
x = 1:5
abline(lm(y$stats[3, ] ~ x), lwd = 2)
abline(lm(y$stats[4, ] ~ x), col = "red", lwd = 2)
mtext("a", side = 3, line = 1, at = 0, cex = 1.1)
y = boxplot(W ~ soi.i, plot = FALSE)
boxplot(W ~ soi.i, notch = TRUE, names = xlabs, xlab = "September SOI pentile",
ylab = "Wind speed [m/s]", cex.axis = 0.7)
abline(lm(y$stats[3, ] ~ x), lwd = 2)
abline(lm(y$stats[4, ] ~ x), col = "red", lwd = 2)
mtext("b", side = 3, line = 1, at = 0, cex = 1.1)
```

The first pentad is the lowest 20% of all values. The upper and lower limits of the boxes represent the first and third quartiles of cyclone intensity. The median for each group is represented by the horizontal bar in the middle of each box.

Notches on the box sides represent an estimated confidence interval about the median. The full range of the observed intensities in each group is represented by the horizontal bars at the end of the dashed whiskers.

In cases where the whiskers would extend more than one and half times the interquartile range, they are truncated and the remaining outlying points are indicated by open circles. The red line is the best-fit line through the upper quartile and the black line is through the medians.

The box plot summarizes the distribution of maximum storm intensity by pentiles of the covariate. The graphs show a tendency for the upper quantiles of cyclone intensity values to increase with both SST and SOI.

As SST increases so does the intensity of the strongest cyclones. Also as SOI increases (toward more La Nina-like conditions) so does the intensity of the cyclones. Results from this exploratory analysis give you reason to continue your investigation.

The next step is to model these data. The box plots provide evidence that a model for the mean will not capture the relationships as the trends are larger for higher quantiles. So instead of linear regression you use quantile regression.

The quantile function and the conditional box plots shown above are useful for exploratory analysis. They are adequate for describing and comparing univariate distributions. However, since you are interested in modeling the relationship between a response variable (intensity) and the covariates (SST and SOI) it is necessary to introduce a regression-type model for the quantile function.

Quantile regression extends ordinary least squares regression model to conditional quantiles of the response variable. Although you used linear regression on the conditional quantiles in the plots above, this is not the same as quantile regression on the covariates.

Quantile regression allows you to examine the relationship without the need to consider discrete levels of the covariate. Ordinary regression model specifies how the mean changes with changes in the covariates while the quantile regression model specifies how the quantile changes with changes in the covariates. Quantile regression relies on empirical quantiles, but uses parameters to assess the relationship between the quantile and the covariates.

The quantile regression model with two covariates is given by \[ \hat \mu(\tau) = \hat \beta_0(\tau) + \hat \beta_1(\tau) x_1 + \hat \beta_2 (\tau) x_2\\ \] where \( \hat \mu(\tau) \) is the predicted conditional quantile of tropical cyclone intensity (\( W \)) and where the \( \hat \beta_i \)'s are obtained by minimizing the piecewise linear least absolute deviation function given by \[ \frac{1-\tau}{n}\sum_{w_i < q_i}|w_i - q_i| + \frac{\tau}{n}\sum_{w_i > q_i}|w_i - q_i|. \] for a given \( \tau \), where \( q_i \) is the predicted \( \tau \) quantile corresponding to observation \( i \) (\( \hat \mu_i(\tau) \)).

The value of a simple trend analysis (involving only one variable—usually time) is limited by the fact that other explanatory variables also might be trending.

It is well known that the ENSO cycle can significantly alter the frequency and intensity of hurricane activity on the seasonal time scale. A trend over time in hurricane intensity could simply reflect a change in this cycle.

Thus it is important to look at the trend after controlling for this important factor. Here we show the trend as a function of Atlantic SST after controlling for the ENSO cycle. Thus we answer the question of whether the data support the contention that the increasing trend in the intensity of the strongest hurricanes is related to an increase in ocean warmth conditional on ENSO.

Here we use the **quantreg** package for performing quantile regression developed by Roger Koenker. Load the package and print a BibTeX citation.

```
require(quantreg)
x = citation(package = "quantreg")
toBibtex(x)
```

```
## @Manual{,
## title = {quantreg: Quantile Regression},
## author = {Roger Koenker},
## year = {2012},
## note = {R package version 4.94},
## url = {http://CRAN.R-project.org/package=quantreg},
## }
```

Begin with median regression. Here \( \tau \) is set to 0.5 and is specified with the tau argument. The function rq performs the regression. The output is assigned to the object qrm.

```
Year = lmisst.df$Yr
W = lmisst.df$WmaxS
SOI = lmisoi.df$soi
SST = lmisst.df$sst
qrm = rq(W ~ Year + SST + SOI, tau = 0.5)
```

By default a simplex method is used to fit the regression. It is a variant of the Barrodale and Roberts (1974) approach described in Koenkerd and Orey (1987).

You obtain a concise summary of the regression results by typing

```
qrm
```

```
## Call:
## rq(formula = W ~ Year + SST + SOI, tau = 0.5)
##
## Coefficients:
## (Intercept) Year SST SOI
## 238.0389 -0.2210 11.0087 0.8268
##
## Degrees of freedom: 500 total; 496 residual
```

The output shows the estimated coefficients and information about the degrees of freedom. You find that the median lifetime intensity decreases with year (negative trend) and increases with both SST and the SOI.

More details are available through the the summary() method.

```
summary(qrm)
```

```
##
## Call: rq(formula = W ~ Year + SST + SOI, tau = 0.5)
##
## tau: [1] 0.5
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) 238.03885 11.67072 414.77836
## Year -0.22095 -0.33418 -0.05102
## SST 11.00873 0.73362 16.98951
## SOI 0.82685 -0.65041 1.92524
```

The table gives the estimated coefficients and confidence intervals (95%) for these parameters. The confidence intervals are computed by the rank inversion method developed in Koenker (2005).

The confidence interval includes zero for Year and the SOI indicating these terms are not significant in explaining the median per cyclone intensity. However, the SST variable is significant and positive. The relationship indicates that for every 1$^{\circ\( C increase in SST the median intensity increases by 11 m~s \){-1}$.}

But this seems too large (by an order of magnitude) given the box plot above and the range of SST values.

```
range(SST)
```

```
## [1] 20.78 21.83
```

The problem stems from the other variables in the model. To see this, refit the regression model removing the variables that are not significant.

```
qrm2 = rq(W ~ SST, tau = 0.5)
summary(qrm2)
```

```
## Warning: Solution may be nonunique
```

```
##
## Call: rq(formula = W ~ SST, tau = 0.5)
##
## tau: [1] 0.5
##
## Coefficients:
## coefficients lower bd upper bd
## (Intercept) -14.227 -141.106 63.122
## SST 2.226 -1.438 8.176
```

Now the relationship indicates that for every 1$^{\circ\( C increase in SST the median intensity increases by 2.23 m~s \){-1}$.} This amount is not statistically significant as you might have guessed from your exploratory plot.

The theory of maximum potential intensity relates a theoretical highest wind speed to ocean temperature so it is interesting to consider quantiles above the median. You repeat the modeling exercise using \( \tau \) = 0.9. Here again you find year and SOI not significant so you exclude them in your final model.

```
summary(rq(W ~ SST, tau = 0.9), se = "iid")
```

```
##
## Call: rq(formula = W ~ SST, tau = 0.9)
##
## tau: [1] 0.9
##
## Coefficients:
## Value Std. Error t value Pr(>|t|)
## (Intercept) -307.30045 68.69143 -4.47364 0.00001
## SST 17.15850 3.21853 5.33117 0.00000
```

Here instead of the rank-inversion CI you obtain a table of coefficients that includes standard errors, \( t \)-statistics, and \( p \)-values using the se=“iid” argument from the summary() function.

As anticipated from theory and your exploratory data analysis, you see a statistically significant positive relationship between cyclone intensity and SST for the top 10% of the strongest tropical cyclones.

The estimated coefficient indicates that for every 1$^{\circ\( C increase in SST the upper percentile intensity increases by -0.2 m~s \){-1}$.}

Other options exist for computing standard errors including a bootstrap approach (se=“boot”; see Koenker (2005).

```
iid = summary(rq(W ~ SST, tau = 0.9), se = "iid")
boot = summary(rq(W ~ SST, tau = 0.9), se = "boot")
percinc = (boot$coef[2, 2] - iid$coef[2, 2])/iid$coef[2, 2] * 100
```

The bootstrap standard error is

```
Error in parse(text = code[i], srcfile = NULL) : 1:40: unexpected symbol
1: I(round(boot$coef[2,2],2)) (difference of
^
```

r I(round(percinc))%). The larger standard error results in a significance level that is somewhat less, but the results still provide conclusive evidence of a climate change signal.

To visualize the intensity-SST relationship in more detail you plot several quantile regression lines on a scatter plot. For reference you include the least-squares regression line.

The code is given below. Note that you use type=“n” in the plot function and use the points() function to add the points last so the lines do not cover them.

```
par(las = 1, mgp = c(2, 0.4, 0), tcl = -0.3)
plot(SST, W, type = "n", xlab = "SST [$^\\circ$C]", ylab = "Wind speed [m s$^{-1}$]")
abline(lm(W ~ SST), col = "red", lwd = 2)
taus = c(0.1, 0.25, 0.5, 0.75, 0.9)
for (i in 1:length(taus)) {
abline(rq(W ~ SST, tau = taus[i]), col = "gray", lwd = 2)
}
points(SST, W, pch = 19, cex = 0.4)
```

The 0.1, 0.25, 0.5, 0.75, and 0.9 quantile regression lines are shown in gray and the least squares regression line about the mean is shown in red. Trend lines are close to horizontal for the weaker tropical cyclones but have a significant upward slope for the stronger cyclones.

To see all the distinct quantile regressions for a particular model you specify a tau=-1. For example, save the quantile regressions of wind speed on SST and SOI in an object by typing

```
model = rq(W ~ SST + SOI, tau = -1)
```

This will cause the rq() function to find the entire sample path of the quantile process. The returned object is of class rq.process.

You plot the regression coefficients for each variable in the model as a function of quantile by typing

```
plot.rq.process(model)
```

Another plot at discrete quantiles is made by typing.

```
model = rq(W ~ SST + SOI, tau = seq(0.025, 0.975, 0.05))
labs = round(quantile(W, seq(0.2, 0.8, 0.2)))
par(las = 1, mgp = c(2, 0.4, 0), tcl = -0.3)
plot(summary(model, se = "boot"), parm = 2, lcol = "transparent", xaxt = "n",
mar = c(5, 5, 4, 2) + 0.1, pch = 16, lwd = 2, ylab = expression(paste(beta[SST],
" [m/s/C]")), xlab = expression(paste("Wind speed quantile [", tau,
"]")), main = "")
grid()
abline(h = 0)
axis(3, at = seq(0.2, 0.8, 0.2), labels = labs)
mtext("Lifetime highest wind speed [m/s]", side = 3, line = 2)
```

Values of \( \tau \) range from 0.025 to 0.975 in intervals of 0.05. The 95% confidence band (gray) is based on a bootstrap method. The plot shows the rising trend of the most intense hurricanes as the ocean temperatures rise after accounting for the El Nino.

The trends depart from zero for quantiles above about 0.4 and become significant for cyclones exceed about 50 m~s\( ^{-1} \). Additional capabilities for quantile modeling and inference are available in the **quantreg** package. Next we consider a model for the most intense hurricanes.