Skewness Practice with Quantile-Quantile Plots

Author

Russell Almond

Published

February 23, 2019

Skewness Determination Exercise.

In this exercise, the computer will generate 3 datasets: A, B and C. These will be randomly assigned to a positively skewed, negatively skewed, and symmetric distribution type. The data (sorted in order) are plotted on the Y axis and the quantiles of a standard normal (qnorm) distribution are plotted on the X axis. A normal distribution should appear as a straight line; positively skewed distributions will yield a ‘U’ shaped curve, and negatively skewed distributions a ‘C’ shaped curve. (An ‘S’ or ‘Z’ shaped curve indicates kurtosis, not skewness). Note: SPSS plots the normal quantiles on the Y axis and the data on the X: Which means that leptokurtic and platykurtic distributions will curve in the opposite direction from these Q-Q plots.

You can redraw from the same distributions by changing the sample size. (Bigger sample sizes are easier.)

#| standalone: true
#| viewerHeight: 500
library(lattice)
library(shiny)
distlist <-list(
skewNeg = list("beta(8,2)"=function(n) rbeta(n,8,2),
                "normal with neg outliers"=function (n) {
                  ifelse(runif(n)<.05,rnorm(n,-3),rnorm(n))
                },
                "hypergeometric(975,25,100)" = 
                  function (n) rhyper(n,975,25,100)),
skewPos = list("gamma(3)"=function(n) rgamma(n,3),
                "normal with positive outliers"=function(n) {
                  ifelse(runif(n)<.05,rnorm(n,3),rnorm(n))
                },
                "lognormal"=function (n) rlnorm(n,0,.3)),
sym = list("normal"=rnorm, "uniform"=runif,
            "t(5 d.f.)"=function (n) rt(n,5)))
longnames <- c("Negatively Skewed"="skewNeg",
               "Positively Skewed"="skewPos",
               "Symmetric"="sym")
## Initial draw, so that we have some starting values.
key <- 
{
  ## Randomly permute the types.  
  key <- sample(names(distlist),length(distlist))
  ## Label from A -- C (or whatever)  
  names(key) <- sapply(1L:length(key),
       function (i)
         intToUtf8(utf8ToInt("A")-1L+i))
  key
}
kdist <- 
{
    # draw random distribution for each plot
    sapply(key, function (r)  
        sample(names(distlist[[r]]),1L))
}

ui <- fluidPage(
inputPanel(
  selectInput("nn", label = "Sample Size:",
              choices = c(50, 100, 500, 1000), selected = 500)),
mainPanel(
  plotOutput("QQplots")),
  h4("Which is which?"),
  p("Identify the skewness of each distribution."),
  do.call(inputPanel,
         lapply(names(key), function (k)
                selectInput(k, label=k,
                  choices=c(Unknown="unknown", 
                           longnames),
                  selected="unknown"))),
  h4("Answers:\n"),
  tableOutput("answers"))

server <- function (input,output) {
  output$QQplots <- renderPlot({
    ## Draw random data 
    print(key)
    kdat <- lapply(names(key), function (k) {
    x <-do.call(distlist[[key[k]]][[kdist[k]]],
                list(input$nn))
      scale(x,(min(x)+max(x))/2,(max(x)-min(x)))*100+50
  })
  names(kdat) <- names(key)
 kdat <- 
    data.frame(dat=do.call(c,kdat),
               group=rep(names(key),
                          each=input$nn))
  
  qqmath(~dat|group, data = kdat,
          layout=c(3,1),horizontal=FALSE,
         panel=function(x,...) {
           panel.qqmathline(x,...)
           panel.qqmath(x,...)
           })

})
  output$answers <- renderTable({
  answer <- sapply(names(key),
   function (k) {
      if (input[[k]]=="unknown") {
        "Make your selection.\n"
      } else {
        paste(ifelse(input[[k]]==key[k],
                     "Correct:", "Incorrect:"),
              "Distribution was",kdist[k],
               "(",
        names(longnames)[grep(key[k],longnames)],
               ")\n")
    }})
  names(answer) <- names(key)
  as.data.frame(answer)
}, colnames=FALSE,rownames=TRUE)
}
shinyApp(ui=ui,server=server)

To try again with different distributions, reload the page. If you are having trouble, try increasing the sample size: sometimes a small sample won’t display the characteristics of the distribution strongly.

Here are the other exercises in this series: