Skewness Boxplot Practice

Author

Russell Almond

Published

February 23, 2019

Skewness Determination Exercise.

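In this exercise, the computer generates three datasets, labeled A, B, and C. Each dataset is randomly assigned a different distribution type: positively skewed, negatively skewed, or symmetric. Within each type, one specific distribution (for example, a gamma, beta, or t distribution) is also chosen at random, and every sample is rescaled to run from 0 to 100. Your job is to decide which dataset has which type.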

Changing the sample size draws new samples from the same three distributions.

#| standalone: true
#| viewerHeight: 500
## Only shiny is needed; the plot uses base graphics boxplot().
library(shiny)

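## Candidate distributions, grouped by skewness type.  Each entry is a
## function of the sample size n.
distlist <- list(
  skewNeg = list(
    "beta(8,2)" = function(n) rbeta(n, 8, 2),
    "normal with neg outliers" = function(n) {
      ## about 5% of the values come from a normal centered at -3
      ifelse(runif(n) < .05, rnorm(n, -3), rnorm(n))
    },
    "hypergeometric(975,25,100)" = function(n) rhyper(n, 975, 25, 100)),
  skewPos = list(
    "gamma(3)" = function(n) rgamma(n, 3),
    "normal with positive outliers" = function(n) {
      ## about 5% of the values come from a normal centered at +3
      ifelse(runif(n) < .05, rnorm(n, 3), rnorm(n))
    },
    "lognormal" = function(n) rlnorm(n, 0, .3)),
  sym = list(
    "normal" = rnorm,
    "uniform" = runif,
    "t(5 d.f.)" = function(n) rt(n, 5)))
## Display names (names) for the three type codes (values).
longnames <- c("Negatively Skewed" = "skewNeg",
               "Positively Skewed" = "skewPos",
               "Symmetric" = "sym")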
## Initial draw, made once when the app starts.
## Randomly permute the three types and label them A, B, C.
key <- sample(names(distlist), length(distlist))
names(key) <- LETTERS[seq_along(key)]
## For each label, pick one distribution at random from its type.
kdist <- sapply(key, function(r) sample(names(distlist[[r]]), 1L))


ui <- fluidPage(
  inputPanel(
    selectInput("nn", label = "Sample Size:",
                choices = c(50, 100, 500, 1000), selected = 100)),
  mainPanel(
    plotOutput("boxplots")),
  h4("Which is which?"),
  p("Identify the skewness of each distribution."),
  ## One drop-down per dataset (A, B, C), each starting at "Unknown".
  do.call(inputPanel,
          lapply(names(key), function(k)
            selectInput(k, label = k,
                        choices = c(Unknown = "unknown", longnames),
                        selected = "unknown"))),
  h4("Answers:"),
  tableOutput("answers"))

server <- function(input, output) {
  output$boxplots <- renderPlot({
    ## Draw fresh data each time the sample size changes.  selectInput()
    ## returns a character string, so convert it to an integer first.
    n <- as.integer(input$nn)
    kdat <- lapply(names(key), function(k) {
      x <- distlist[[key[[k]]]][[kdist[[k]]]](n)
      ## Rescale (center at the midrange, divide by the range) so that
      ## every sample runs from 0 to 100.
      as.vector(scale(x, (min(x) + max(x)) / 2, max(x) - min(x))) * 100 + 50
    })
    names(kdat) <- names(key)
    boxplot(as.data.frame(kdat), xlab = "Dataset", ylab = "Rescaled value")
  })
  output$answers <- renderTable({
    answer <- sapply(names(key), function(k) {
      if (input[[k]] == "unknown") {
        "Make your selection."
      } else {
        paste(if (input[[k]] == key[[k]]) "Correct:" else "Incorrect:",
              "Distribution was", kdist[[k]],
              paste0("(", names(longnames)[longnames == key[[k]]], ")"))
      }
    })
    names(answer) <- names(key)
    as.data.frame(answer)
  }, colnames = FALSE, rownames = TRUE)
}
shinyApp(ui=ui,server=server)
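The app rescales every sample with `scale(x, (min(x)+max(x))/2, max(x)-min(x))*100+50` before plotting. The minimal sketch below isolates that step in a hypothetical helper, `rescale_0_100()`, to show what it does: centering at the midrange and dividing by the range maps each sample onto [-0.5, 0.5], and the final `*100+50` stretches that to [0, 100]. Because the map is linear and increasing, the order and relative spacing of the values, and therefore the shape of each boxplot, are unchanged. Presumably the point is that the three datasets cannot be told apart by location or spread, only by shape.

```r
## Hypothetical helper isolating the app's rescaling step.
rescale_0_100 <- function(x) {
  mid <- (min(x) + max(x)) / 2   # midrange
  rng <- max(x) - min(x)         # range
  ## scale() returns a one-column matrix; as.vector() drops the matrix attributes
  as.vector(scale(x, center = mid, scale = rng)) * 100 + 50
}

rescale_0_100(c(2, 4, 10))   # 0 25 100
```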

To try again with different distributions, reload the page. If you are having trouble, try increasing the sample size: a small sample may not show the shape of its distribution clearly.
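To see how much a small sample can hide the shape, the sketch below draws repeatedly from the app's "normal with positive outliers" distribution and records how often the sample skewness comes out zero or negative, that is, points the wrong way. With only about 5% of values drawn as outliers, a sample of 50 has roughly a 1 in 13 chance (0.95^50 ≈ 0.077) of containing no outliers at all. The helpers `sample_skew()`, `pos_outliers()`, and `wrong_sign_rate()`, the 2000 repetitions, and the use of moment-based skewness in place of a visual boxplot judgment are all assumptions made for illustration.

```r
## Moment-based sample skewness (an assumption; the exercise uses boxplots).
sample_skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

## The app's "normal with positive outliers" generator: about 5% of the
## values come from a normal distribution centered at 3.
pos_outliers <- function(n) ifelse(runif(n) < .05, rnorm(n, 3), rnorm(n))

## Hypothetical helper: fraction of `reps` samples of size n whose sample
## skewness is zero or negative, i.e. points the wrong way.
wrong_sign_rate <- function(n, reps = 2000) {
  mean(replicate(reps, sample_skew(pos_outliers(n)) <= 0))
}

set.seed(2019)
sizes <- c(50, 100, 500, 1000)
rates <- sapply(sizes, wrong_sign_rate)
names(rates) <- sizes
print(rates)
```

The wrong-sign rate should shrink toward zero as the sample size grows, which is why a larger sample usually makes the exercise easier.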

What to look for:

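Is the box longer on one side of the median than on the other?
Is one whisker longer than the other?
Are there outliers on one side and not the other?

Each of these is a sign of skewness toward the longer side, or toward the side with the outliers: a longer upper half of the box, a longer upper whisker, or outliers above the box point to positive skew, and the mirror image points to negative skew. A symmetric distribution has roughly equal box halves and whiskers, with any outliers roughly balanced between the two ends.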
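The three questions above can also be checked numerically. The sketch below uses base R's `boxplot.stats()`, whose `stats` component holds the lower whisker end, lower hinge, median, upper hinge, and upper whisker end, and whose `out` component holds the points drawn as outliers. The helper `skew_signs()` is hypothetical and not part of the exercise; it returns +1 when a feature points to positive skew, -1 for negative skew, and 0 when both sides match. Note that the hinges come from `fivenum()` and can differ slightly from the quartiles returned by `quantile()`.

```r
## Hypothetical helper: read the three boxplot signs of skewness.
skew_signs <- function(x) {
  bs <- boxplot.stats(x)
  s <- bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
  c(box      = sign((s[4] - s[3]) - (s[3] - s[2])),  # upper vs lower half of box
    whisker  = sign((s[5] - s[4]) - (s[2] - s[1])),  # upper vs lower whisker
    outliers = sign(sum(bs$out > s[5]) - sum(bs$out < s[1])))  # above vs below
}

x <- c(1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 30)
skew_signs(x)    # all three point to positive skew
skew_signs(-x)   # the mirror image: all three point to negative skew
```

On random draws the three signs will not always agree, especially for small samples, so treat them as evidence rather than a verdict.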
