Confidence Interval: COVID Vaccine tests

Data from Moderna Vaccine Study Control Group

X <- 95
N <- 1500

In a sample of 1500 volunteers receiving the placebo, there were 95 positive cases; so our estimate for the rate of COVID-19 in this population (and this time period) is 0.063.

Recall that the formula for the \(1-\alpha\) confidence interval (for example, the 95% interval) is

C.I. \[ \bar X \pm z_{1-\alpha/2}\, \sigma_{\bar X}\] Here \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution, which we can look up in a normal table. For a 95% interval, \(\alpha=.05\) and \(z_{1-\alpha/2}=z_{.975}=1.96\approx 2\).
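Instead of a printed table, the quantile can be computed directly in R. The short sketch below uses only base R's `qnorm()` and `pnorm()`; the variable names `alpha` and `z` are just for illustration. It also checks that the interval \((-z, z)\) really holds 95% of the standard normal distribution.

```r
# A 95% confidence level corresponds to alpha = 0.05
alpha <- 0.05

# The 1 - alpha/2 = 0.975 quantile of the standard normal
z <- qnorm(1 - alpha/2)
z   # about 1.96

# Check: the probability between -z and z should be 1 - alpha = 0.95
pnorm(z) - pnorm(-z)
```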

For the binomial distribution, \[ \bar X = X/N=\hat p\] This is the value 0.063 we calculated earlier. The hat over the \(p\) signals that it is a (maximum likelihood) estimate.

The usual formula for the standard error of a mean (from a simple random sample) is \[ \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}} \] For a single binomial (Bernoulli) trial, where each volunteer either tests positive or does not, the standard deviation is \[ \sigma = \sqrt{p(1-p)}; \qquad s = \sqrt{\hat p(1-\hat p)}\] Plugging that into the formula for the standard error, we get:

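\[ \sigma_{\bar X} = \sqrt{p(1-p)/N} \] Since \(p\) is unknown, we replace it with \(\hat p\) to get the estimated standard error \(s_{\bar X} = \sqrt{\hat p(1-\hat p)/N}\). Let's go ahead and calculate those: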
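Written out by hand with \(X=95\) and \(N=1500\), the arithmetic is:

\[ \hat p = \frac{95}{1500} \approx 0.0633, \qquad s_{\bar X} = \sqrt{\frac{\hat p(1-\hat p)}{N}} = \sqrt{\frac{0.0633 \times 0.9367}{1500}} \approx \sqrt{3.95\times 10^{-5}} \approx 0.0063 \]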

p.hat <- X/N                   # estimated rate of COVID-19
se <- sqrt(p.hat*(1-p.hat)/N)  # estimated standard error of p.hat

The probability estimate is 0.063 and the standard error is 0.0063.
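One way to see what the standard error means is to simulate. The sketch below assumes, purely for illustration, that the true rate equals our estimate 95/1500, draws many hypothetical placebo groups of 1500 with base R's `rbinom()`, and compares the spread of the resulting \(\hat p\) values with the formula. The seed and the number of simulated groups are arbitrary choices.

```r
set.seed(2021)                    # arbitrary seed, for reproducibility
p.true <- 95/1500                 # assumed "true" rate for the simulation
N <- 1500

# Simulate 10,000 hypothetical placebo groups and compute p-hat for each
p.hats <- rbinom(10000, size = N, prob = p.true) / N

sd(p.hats)                        # empirical standard error
sqrt(p.true * (1 - p.true) / N)   # formula: about 0.0063
```

The two numbers should agree closely; the small difference is simulation noise.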

I’ll now use an R trick. qnorm() is the R function that calculates quantiles of the normal distribution. If I give it a vector of two probabilities, it returns both quantiles. So I will pass it \((\alpha/2,1-\alpha/2) = (.025,.975)\), which gives the negative and positive values \((-1.96, 1.96)\).

Because R does calculations on vectors, it will calculate both sides of the confidence interval with one formula.

ci <- p.hat + qnorm(c(.025,.975))*se

The 95% confidence interval for the prevalence of COVID-19 at the time and in the locations where the study was run is (5.1%, 7.6%).
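The "95%" describes the procedure rather than any single interval: if the study were repeated many times, about 95% of the intervals built this way would contain the true rate. The sketch below checks that by simulation, assuming (hypothetically) a true rate of 0.063 and reusing the same normal-approximation formula for each simulated study. The seed and number of simulations are arbitrary.

```r
set.seed(2021)
p.true <- 0.063    # assumed true rate, for illustration only
N <- 1500
n.sims <- 10000

# Each simulated study: count positives, then build the 95% interval
X.sim <- rbinom(n.sims, size = N, prob = p.true)
p.sim <- X.sim / N
se.sim <- sqrt(p.sim * (1 - p.sim) / N)
lower <- p.sim - qnorm(0.975) * se.sim
upper <- p.sim + qnorm(0.975) * se.sim

# Fraction of intervals that contain the true rate; should be close to 0.95
coverage <- mean(lower <= p.true & p.true <= upper)
coverage
```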

Note that a lot of things have changed between then and now, in particular the rise of the much more transmissible delta variant, but also changes in how seriously people take masking and other precautions. Also, there is probably considerable regional variation in the prevalence of COVID-19.

The website https://www.microcovid.org/ tracks this on a county-by-county basis.

Severe Covid

We can do the same thing with the severe COVID-19 cases (hospitalization or death).

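X1 <- 11                             # severe cases among placebo recipients
p1 <- X1/N                           # estimated rate of severe COVID-19
se1 <- sqrt(p1*(1-p1)/N)             # estimated standard error
ci1 <- p1 + qnorm(c(.025,.975))*se1  # 95% confidence interval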

The 95% confidence interval for the prevalence of severe COVID-19 at the time and in the locations where the study was run is (0.3%, 1.2%).
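Since both sections repeat the same three lines, they can be wrapped in a small helper. `wald.ci()` below is a hypothetical name, not a built-in R function; it implements exactly the normal-approximation formula used above. With only 11 severe cases the normal approximation is rougher, so the sketch also prints, for comparison, the exact (Clopper-Pearson) interval that base R's `binom.test()` reports. That is a different method from the one used in this document.

```r
# Hypothetical helper: normal-approximation (Wald) interval for a proportion
wald.ci <- function(x, n, conf = 0.95) {
  p <- x / n
  se <- sqrt(p * (1 - p) / n)
  alpha <- 1 - conf
  p + qnorm(c(alpha/2, 1 - alpha/2)) * se
}

N <- 1500
wald.ci(95, N)   # about (0.051, 0.076), same as ci
wald.ci(11, N)   # about (0.003, 0.012), same as ci1

# Exact (Clopper-Pearson) interval from base R, for comparison
binom.test(11, N)$conf.int
```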