Chapter 6: Hurricane Data

“Data, data, data, I cannot make bricks without clay.”—Sherlock Holmes


Hurricane data originate from careful analysis by operational meteorologists. The data include estimates of the hurricane position and intensity at six-hourly intervals. Information on landfall time, local wind speeds, damages, deaths, and cyclone size is also included. The data are archived by season.

Effort is needed to make the data useful for climate studies. Here we describe the datasets used throughout the book. We show you a workflow that includes importing the data into R, interpolating, smoothing, and adding attributes.

We show you how to create useful subsets of the data. The code in this chapter is more complicated than elsewhere and can take longer to run. You can skip this material on first reading and return to it when you need an updated version of the data that includes the most recent years.


Most statistical models in this book use the best-track data. Here we describe them and provide original source material. We also explain how to smooth and interpolate them. Interpolations and derivatives are needed for regional analysis.


The so-called best-track dataset contains the six-hourly center locations and intensities of all known tropical cyclones across the North Atlantic basin including the Gulf of Mexico and Caribbean Sea. The dataset is called HURDAT for HURricane DATa. It is maintained by the U.S. National Oceanic and Atmospheric Administration (NOAA) at the National Hurricane Center (NHC).

Center locations are given in geographic coordinates (in tenths of degrees). The intensities, representing the one-minute near-surface (~10 m) wind speeds, are given in knots (1 kt = 0.5144 m/s), and the minimum central pressures are given in millibars (1 mb = 1 hPa). The data are provided at six-hourly intervals starting at 00 UTC (Coordinated Universal Time).
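As a quick illustration of these units, the following R sketch converts raw values to more convenient quantities. The helper names kt2ms and tenths2deg are hypothetical and are not part of any HURDAT tool.

```r
# Hypothetical helpers for the unit conventions described above
kt2ms <- function(kt) kt * 0.5144     # knots to meters per second
tenths2deg <- function(x) x / 10      # tenths of a degree to degrees

wind_ms <- kt2ms(80)        # an 80 kt wind expressed in m/s
lat_deg <- tenths2deg(280)  # 280 tenths of a degree is 28.0 degrees
```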

The version of the HURDAT file used here contains cyclones over the period 1851 through 2010 inclusive (obtained in August 2011). Information on the history and origin of these data is found in Jarvinen et al. (1984).

The file has a logical structure that makes it easy to read with FORTRAN. Each cyclone contains a header record, a series of data records, and a trailer record. The original best-track data for the first cyclone in the file are shown below.

00005 06/25/1851 M= 4 1 SNBR= 1 NOT NAMED XING=1 SSS=1
00010 06/25*280 948 80 0*280 954 80 0*280 960 80 0*281 965 80 0*
00015 06/26*282 970 70 0*283 976 60 0*284 983 60 0*286 989 50 0*
00020 06/27*290 994 50 0*295 998 40 0*3001000 40 0*3051001 40 0*
00025 06/28*3101002 40 0* 0 0 0 0* 0 0 0 0* 0 0 0 0*
00030 HRBTX1

The header (beginning with 00005) and trailer (beginning with 00030) records are single rows. The header has eight fields. The first field is the line number in intervals of five and padded with leading zeros. The second is the start day for the cyclone in MM/DD/YYYY format.

The third is M= 4, indicating four data records to follow before the trailer record. The fourth field is a number indicating the cyclone sequence for the season; here 1 indicates the first cyclone of 1851. The fifth field, beginning with SNBR=, is the cyclone number over all cyclones and all seasons.

The sixth field is the cyclone name. Cyclones were named beginning in 1950. The seventh field indicates whether the cyclone hit the United States, with XING=1 indicating it did and XING=0 indicating it did not. A hit is defined as the center of the cyclone crossing the coast of the continental United States as a tropical storm or hurricane.

The final field indicates the Saffir-Simpson hurricane scale (1 to 5) impact in the United States based on the estimated maximum sustained winds at the coast. The value 0 was used to indicate U.S. tropical storm landfalls, but has been deprecated.
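The eight header fields can be pulled apart with a single regular expression. The sketch below is only an illustration applied to the header line shown above; the list hdr and its element names are hypothetical, and the pattern matches on whitespace rather than on the fixed column positions of the original file.

```r
# Header record of the first cyclone, copied from the listing above
header <- "00005 06/25/1851 M= 4 1 SNBR= 1 NOT NAMED XING=1 SSS=1"

# One capture group per header field; \\s+ tolerates variable spacing
pattern <- paste0(
  "^(\\d{5})\\s+(\\d{2}/\\d{2}/\\d{4})\\s+M=\\s*(\\d+)\\s+(\\d+)\\s+",
  "SNBR=\\s*(\\d+)\\s+(.*?)\\s+XING=(\\d)\\s+SSS=(\\d)"
)
m <- regmatches(header, regexec(pattern, header, perl = TRUE))[[1]]

# m[1] is the full match; m[2] to m[9] are the eight fields
hdr <- list(
  line  = m[2],                                 # line number
  start = as.Date(m[3], format = "%m/%d/%Y"),   # start day
  nrec  = as.integer(m[4]),                     # number of data records
  seq   = as.integer(m[5]),                     # sequence within the season
  snbr  = as.integer(m[6]),                     # number over all seasons
  name  = m[7],                                 # cyclone name
  xing  = as.integer(m[8]),                     # U.S. hit indicator
  sss   = as.integer(m[9])                      # Saffir-Simpson impact
)
```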

The next four rows contain the data records. Each row has the same format. The first field is again the line number. The second field is the cyclone day in MM/DD format. The next 16 fields are divided into four blocks of four fields each. The first block is the 00 UTC record and the next three blocks are in six-hour increments (6, 12, and 18 UTC).

Each block has the same layout and begins with a code indicating the stage of the cyclone: tropical cyclone (*), subtropical cyclone (S), extratropical low (E), wave (W), and remnant low (L). The three digits immediately to the right of the cyclone stage code are the latitude of the center position in tenths of a degree north (280 is 28.0N) and the next four digits are the longitude in tenths of a degree west (948 is 94.8W), followed by a space.

The next three digits give the maximum sustained (one-minute) surface (10 m) wind speed in knots. These are estimated to the nearest 10 kt for cyclones prior to 1886 and to the nearest 5 kt afterwards. The final four digits, after another space, are the central surface pressure of the cyclone in mb, if available. If not, the field is set to zero. Central pressures are available for all cyclones after 1978.
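The block layout described above (stage code, three-digit latitude, four-digit longitude, a space, three-digit wind speed, a space, four-digit pressure) amounts to 17 characters. The following sketch decodes one such block by character position. It is an illustration under that assumed layout: the block is rebuilt here with sprintf() rather than read from the file, and parse_block is a hypothetical helper.

```r
# Rebuild the first 00 UTC block in the assumed 17-character layout
block <- sprintf("%s%3d%4d %3d %4d", "*", 280L, 948L, 80L, 0L)

# Hypothetical helper: decode one block by character position
parse_block <- function(b) {
  list(
    type = substr(b, 1, 1),                    # cyclone stage code
    lat  = as.numeric(substr(b, 2, 4)) / 10,   # degrees north
    lon  = as.numeric(substr(b, 5, 8)) / 10,   # degrees west
    wmax = as.numeric(substr(b, 10, 12)),      # wind speed in kt
    pmin = as.numeric(substr(b, 14, 17))       # pressure in mb, 0 if missing
  )
}

rec <- parse_block(block)
```

Here rec$lat is 28.0 and rec$lon is 94.8, matching the first data record of the 1851 cyclone.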

The trailer has at least two fields. The first field is again the line number. The second field is the maximum intensity of the cyclone as a code using HR for hurricane, TS for tropical storm, and SS for subtropical storm. If there are additional fields they relate to landfall in the United States. The fields are given in groups of four with the first three indicating location by state and the last indicating the Saffir-Simpson scale based on wind speeds in the state.

Two-letter state abbreviations are used with the exception of Texas and Florida, which are further subdivided as follows: ATX, BTX, CTX for south, central, and north Texas, respectively, and AFL, BFL, CFL, and DFL for northwest, southwest, southeast, and northeast Florida, respectively. An I is used as a prefix in cases where a cyclone has had a hurricane impact across a non-coastal state.
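The trailer code can be split along the same lines. The sketch below assumes the intensity code occupies the first two characters and that each landfall group is exactly four characters long (a three-character location plus one digit). How two-letter state abbreviations are padded to three characters is not covered here, so only three-letter codes are used. The function parse_trailer is hypothetical, and the second test string is made up for illustration.

```r
# Hypothetical helper: split a trailer code into intensity and landfall groups
parse_trailer <- function(code) {
  rest <- substring(code, 3)                # everything after the intensity code
  n <- nchar(rest) %/% 4                    # number of four-character groups
  starts <- seq(1, by = 4, length.out = n)
  groups <- if (n > 0) substring(rest, starts, starts + 3) else character(0)
  list(
    max_intensity = substr(code, 1, 2),                 # HR, TS, or SS
    location      = substr(groups, 1, 3),               # e.g. BTX
    category      = as.integer(substr(groups, 4, 4))    # Saffir-Simpson scale
  )
}

first <- parse_trailer("HRBTX1")       # trailer of the 1851 cyclone above
multi <- parse_trailer("HRCFL2BFL1")   # made-up code with two landfall groups
none  <- parse_trailer("TS")           # no landfall information
```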


The HURDAT file is appended each year with the set of cyclones from the previous season. The latest version is usually available by late spring or early summer. Additional modifications to older cyclones are made when newer information becomes available. After downloading the HURDAT file we use a FORTRAN executable for the Windows platform (BT2flat.exe) to create a csv file (BTflat.csv) listing the data records. The file is created by typing the following in a terminal.

BT2flat.exe tracks.txt > BTflat.csv

The resulting comma-separated flat file is read into R, and the lines between the separate cyclone records are removed, by typing

con = ""
best = read.csv(con)
best = best[!is.na(best[, 1]), ]
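To see what removing the separator lines amounts to, here is a small self-contained sketch. It assumes the separators show up as rows with an empty first column, which R reads as NA. The toy text and the name best_toy are made up for illustration and do not reproduce the actual contents of BTflat.csv.

```r
# Toy csv text with one empty separator line between two cyclones
txt <- "SYear,Sn,Wmax\n1851,1,80\n1851,1,80\n,,\n1851,2,70\n"
best_toy <- read.csv(text = txt)

n_before <- nrow(best_toy)                       # includes the separator row
best_toy <- best_toy[!is.na(best_toy[, 1]), ]    # keep rows with a season year
n_after <- nrow(best_toy)
```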

Adjustments are made to convert the hours from hundreds to units (600 becomes 6), to change the longitude to degrees east, and to rename the column for the type of cyclone.

best$hr = best$hr/100
best$lon = -best$lon
east = best$lon < -180
best$lon[east] = 360 + best$lon[east]
names(best)[12] = "Type"
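The effect of these adjustments is easiest to see on a few made-up values. In the sketch below, the data frame toy is hypothetical, and the last longitude (181.5 degrees west) is chosen only to show how a position beyond the date line wraps to a positive value in degrees east.

```r
# Made-up hours (in hundreds) and longitudes (in degrees west)
toy <- data.frame(hr = c(0, 600, 1200, 1800),
                  lon = c(94.8, 95.4, 96.0, 181.5))

toy$hr <- toy$hr / 100                 # 600 becomes 6, and so on
toy$lon <- -toy$lon                    # degrees west become negative degrees east
east <- toy$lon < -180                 # positions beyond the date line
toy$lon[east] <- 360 + toy$lon[east]   # wrap them to positive degrees east
```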

The first six lines of the data frame are shown here (head(best)).

SYear Sn name Yr Mo Da hr lat lon Wmax pmin Type
1 1851 1 NOT NAMED 1851 6 25 0 28.0 -94.8 80 0 *
2 1851 1 NOT NAMED 1851 6 25 6 28.0 -95.4 80 0 *
3 1851 1 NOT NAMED 1851 6 25 12 28.0 -96.0 80 0 *
4 1851 1 NOT NAMED 1851 6 25 18 28.1 -96.5 80 0 *
5 1851 1 NOT NAMED 1851 6 26 0 28.2 -97.0 70 0 *
6 1851 1 NOT NAMED 1851 6 26 6 28.3 -97.6 60 0 *

Note the 10 kt precision of the Wmax column. The precision improves to 5 kt from 1886 onward.

Unique cyclones in the data frame are identified by SYear and Sn together, but not by a single column identifier. To make it easier to subset by cyclone, you add one as follows. First, use the paste() function to create a character id string that combines the two columns. Second, tabulate the number of cyclone records for each character id and save the counts as an integer vector (nrs). Third, create a sequence vector indexing the cyclones, beginning with the first one. Fourth, repeat each index by the number of records in its cyclone and save the result in the Sid vector.

id = paste(best$SYear, format(best$Sn), sep = ":")
nrs = as.vector(table(id))
cycn = 1:length(nrs)
Sid = rep(cycn, nrs[cycn])
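The same four steps can be checked on a toy example. The data frame below is made up (three cyclones with three, one, and two records) and is a sketch of the idea rather than a copy of the real data. Note that the approach relies on the records being ordered by season and sequence number, since table() returns its counts sorted by id.

```r
# Made-up season year and sequence number for six records
toy <- data.frame(SYear = c(1851, 1851, 1851, 1851, 1852, 1852),
                  Sn    = c(1, 1, 1, 2, 1, 1))

id   <- paste(toy$SYear, format(toy$Sn), sep = ":")   # one string per record
nrs  <- as.vector(table(id))                          # records per cyclone
cycn <- seq_along(nrs)                                # cyclone index 1, 2, 3
Sid  <- rep(cycn, nrs)                                # index repeated per record
```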

Next you create a column identifying the cyclone hours. This is needed to perform time interpolations. Begin by creating a character vector with strings identifying the year, month, day, and hour. Note that you first need to take care of cyclones that crossed into a new calendar year, because in the best-track file the year remains the year of the season. The character vector is turned into a POSIXlt object with the strptime() function, with the time zone argument set to GMT (UTC).

yrs = best$Yr
mos = best$Mo
yrs[mos == 1] = yrs[mos == 1] + 1
dtc = paste(yrs, "-", mos, "-", best$Da, " ", best$hr, ":00:00", sep = "")
dt = strptime(dtc, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
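A two-record sketch shows why the year adjustment matters. The values below are made up: a cyclone of the 1954 season with one record at 18 UTC on 31 December and the next at 00 UTC on 1 January. Without the adjustment the second record would be dated a year too early.

```r
# Made-up records from a cyclone that crosses into the next calendar year
yrs <- c(1954, 1954)   # season year as stored in the file
mos <- c(12, 1)
das <- c(31, 1)
hrs <- c(18, 0)

yrs[mos == 1] <- yrs[mos == 1] + 1   # January records belong to the next year
dtc <- paste(yrs, "-", mos, "-", das, " ", hrs, ":00:00", sep = "")
dt  <- strptime(dtc, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")

# Hours between the two consecutive records
gap_hours <- as.numeric(difftime(dt[2], dt[1], units = "hours"))
```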

Each cyclone record begins at either 0, 6, 12, or 18 UTC. Retrieve the starting time of each cyclone using the cumsum() function and the number of cyclone records as an index. Offsets are needed for the first and last cyclone. Then subsample the time vector obtained above at the corresponding values of the index and repeat those start times across all records of each cyclone. Finally, the cyclone hour is the time difference between the two time vectors in units of hours and is saved as a numeric vector Shour.

i0 = c(1, cumsum(nrs[-length(nrs)]) + 1)
dt0 = dt[i0]
dt1 = rep(dt0, nrs[cycn])
Shour = as.vector(difftime(dt, dt1, units = "hours"))
best$Sid = Sid
best$Shour = Shour
dim(best)
## [1] 41192    14
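The indexing with cumsum() is compact, so a toy example may help. The sketch below uses two made-up cyclones with three and two records. For brevity it builds the times with as.POSIXct() instead of strptime(), which is a substitution made for this illustration only.

```r
# Made-up record counts and times for two cyclones
nrs <- c(3, 2)
dt <- as.POSIXct(c("1851-06-25 00:00:00", "1851-06-25 06:00:00",
                   "1851-06-25 12:00:00", "1851-07-05 12:00:00",
                   "1851-07-05 18:00:00"), tz = "GMT")

i0  <- c(1, cumsum(nrs[-length(nrs)]) + 1)   # row of each cyclone's first record
dt0 <- dt[i0]                                # start time of each cyclone
dt1 <- rep(dt0, nrs)                         # start time repeated per record
Shour <- as.vector(difftime(dt, dt1, units = "hours"))
```

Here i0 is 1 and 4, and Shour restarts at zero with the first record of the second cyclone.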

The best-track data provide information on 1442 individual tropical cyclones over the period 1851–2010, inclusive. The data frame you created contains these data in 41192 separate six-hourly records, each having 14 columns. You can output the data as a spreadsheet using the write.table() function.
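As a minimal sketch of the spreadsheet export, the following writes a made-up two-row data frame to a temporary comma-separated file and reads it back. The file location and the choice of a comma separator are assumptions for illustration.

```r
# Made-up data frame standing in for the best-track records
toy <- data.frame(Sid = c(1, 1), Wmax = c(80, 70))

f <- tempfile(fileext = ".csv")                            # temporary output file
write.table(toy, file = f, sep = ",", row.names = FALSE)   # comma-separated export
back <- read.csv(f)                                        # read it back to check
```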

If you want to send the file to someone who uses R or load it into another R session, use the save() function to export it. This exports a binary file that is imported back using the load() function.

save(best, file = "best.RData")
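The round trip can be tried on a small object first. The sketch below saves a made-up data frame to a temporary file, removes it from the session, and restores it. Note that load() restores the object under its original name and returns that name as a character vector.

```r
# Made-up data frame standing in for the best-track records
toy <- data.frame(Sid = c(1, 1, 2), Shour = c(0, 6, 0))

f <- tempfile(fileext = ".RData")
save(toy, file = f)    # write the binary file
rm(toy)                # remove the object from the session
loaded <- load(f)      # restore it; loaded holds the object name "toy"
```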

Alternatively you might be interested in the functions available in the RNetCDF and ncdf packages for exporting data in Network Common Data Form.