Kurtosis Boxplot Practice
Kurtosis Determination Exercise.
In this exercise, the computer will generate 3 datasets: X, Y and Z. These will be randomly assigned to high (leptokurtic), medium (mesokurtic) and low (platykurtic) distributions.
You can redraw from the same distributions by changing the sample size.
#| standalone: true
#| viewerHeight: 750
library(lattice)
library(shiny)
distlist <-list(
platykurtic = list("Uniform"=runif,
"Mixture of normals (different means)"=
function (n)
ifelse(runif(n)<.5,rnorm(n,-1),
rnorm(n,1)),
"beta(1.5,1.5" = function (n)
rbeta(n,1.5,1.5)),
leptokurtic = list("t(df=5)"=function(n) rt(n,5),
"Mixture of normals (different sds)"=
function (n)
ifelse(runif(n)<.25,rnorm(n,0,3),
rnorm(n,0,1)),
"Exponential" = rexp),
mesokurtic = list("normal"=rnorm,
"Wiebul(2,2)"= function (n)
rweibull(n,2,2),
"Binomial(.45,10)"=function(n)
rbinom(n,10,.45)))
longnames <- c("Platykurtic (flat)"="platykurtic",
"Leptokurtic (heavy tails)"="leptokurtic",
"Mesokurtic (normal)"="mesokurtic")
## Initial draw, so that we have some starting values.
key <-
{
## Randomly permute the types.
key <- sample(names(distlist),length(distlist))
## Label from A -- C (or whatever)
names(key) <- sapply(1L:length(key),
function (i)
intToUtf8(utf8ToInt("Z")-length(key)+i))
key
}
kdist <-
{
# draw random distribution for each plot
sapply(key, function (r)
sample(names(distlist[[r]]),1L))
}
ui <- fluidPage(
inputPanel(
selectInput("nn", label = "Sample Size:",
choices = c(50, 100, 500, 1000), selected = 100)),
mainPanel(
plotOutput("boxplots")),
h4("Which is which?"),
p("Identify the skewness of each distribution."),
do.call(inputPanel,
lapply(names(key), function (k)
selectInput(k, label=k,
choices=c(Unknown="unknown",
longnames),
selected="unknown"))),
h4("Answers:\n"),
tableOutput("answers"))
server <- function (input,output) {
output$boxplots <- renderPlot({
## Draw random data
kdat <- lapply(names(key), function (k) {
x <-do.call(distlist[[key[k]]][[kdist[k]]],
list(input$nn))
scale(x,(min(x)+max(x))/2,(max(x)-min(x)))*100+50
})
names(kdat) <- names(key)
kdat <- as.data.frame(kdat)
boxplot(kdat, xlab = "X")
})
output$answers <- renderTable({
answer <- sapply(names(key),
function (k) {
if (input[[k]]=="unknown") {
"Make your selection.\n"
} else {
paste(ifelse(input[[k]]==key[k],
"Correct:", "Incorrect:"),
"Distribution was",kdist[k],
"(",
names(longnames)[grep(key[k],longnames)],
")\n")
}})
names(answer) <- names(key)
as.data.frame(answer)
}, colnames=FALSE,rownames=TRUE)
}
shinyApp(ui=ui,server=server)
To try again with different distributions, reload the page. If you are having trouble, try increasing the sample size: sometimes a small sample won’t display the characteristics of the distribution strongly.
What to look for
In a normal distribution the whiskers extend about 1.5 box-lengths (IQR)s from the hinges (sides of the box).
- If the box is long compared to the whiskers, this is a sign the distribuiton is platykurtic.
- If the box is short compared to the whiskers, this is a sign the distribution is leptokurtic.
The whiskers only extend to the farthest data point within 1.5 IQRs from the box. So if there is high kurtosis, this will show up as lots of outliers.
- With a normal distribution, there is often 1–2 outliers per 100 data points. Much more than that is a sign of high kurtosis.
Is the length of the box (IQR) long compared to the length of the whiskers?