Independence

Author

Russell Almond

Published

October 5, 2020

Imagine a population which is split into two groups: \(A\) and \(B\). We select 100 people at random and ask them a question, which has two answers yes and no. Define the following quantities:

\(N_{Ay}\) The number of people from Group \(A\) who answered yes.
\(N_{An}\) The number of people from Group \(A\) who answered no.
\(N_{By}\) The number of people from Group \(B\) who answered yes.
\(N_{Bn}\) The number of people from Group \(B\) who answered no.

Define the following values (row and column totals):

\(N_{A+}=N_{Ay}+N_{An}\) The number of people from Group \(A\).
\(N_{B+}=N_{By}+N_{Bn}\) The number of people from Group \(B\)
\(N_{+y}=N_{Ay}+N_{By}\) The number of people who answered yes.
\(N_{+n}=N_{An}+N_{Bn}\) The number of people who answered no.
\(N=N_{A+}+N_{B+} = N_{+y} + N_{+n}\) The total number of people in the sample.

Dividing any of those numbers by \(N_{xx}\) produces a corresponding proportion \(P_{xx}\) (which can be interpreted as a probability or proportion.

Suppose group membership and the answer to the question are statistically indepedent. In the diagram below, adjust \(P_{A+}\) and \(P_{+y}\) to make a two-by-two table:

#| standalone: true
#| viewerHeight: 300
library(shiny)
N <- 100
ui <- fluidPage(
inputPanel(
  sliderInput("pa", label = "P(Member of Group A)",
              min = 0, max = 1, value = .5, step = 0.05),
  sliderInput("py", label = "P(Answered `yes`)",
              min = 0, max = 1, value = .5, step = 0.05)
),
mainPanel(
  tableOutput("table"),
  plotOutput("mosaic")))

server <- function (input,output) {

tabi <- reactive(
  matrix(N*c(input$pa*input$py,
           input$pa*(1-input$py),
           (1-input$pa)*input$py,
           (1-input$pa)*(1-input$py)),2,2,
         byrow=TRUE,
         dimnames=list(Group=c("A","B"),Answer=c("y","n")))
)
output$table <- renderTable(tabi(),rownames=TRUE, digits=1)
output$mosaic <- renderPlot(mosaicplot(tabi()),color=TRUE,main="Independent data")
}
shinyApp(ui=ui,server=server)

There are two things you should notice about the independent data.

The proportion of yes and no answers in group A and B are always the same: \(N_{Ay}/N_{A+} = N_{By}/N_{B+} = N_{+y}/N\)
The proportion of people in both Groups \(A\) and \(B\) are the same for people who answered both yes and no: \(N_{Ay}/N_{+y} = N_{An}/N_{+n} = N_{A+}/N\)

We could say that the row and column proportions are always the same.

Another way to think about this is to say:

If we learned which group a person belongs to, that would not change the probability of their answer.
If we learned how a person answered, that would not change the probablity of their group.

Dependent

To make the table dependence, we need to add another parameter to the model to specify the degree of dependence.

For a two-by-two table, the odds ratio is as fairly easy to understand choice: \[ OR = \frac{P_{Ay}/P_{An}}{P_{By}/P_{Bn}}\] When group and answer are indpendent the cross product ratio should be 1.

If Group \(A\) is more likely to answer yes, then the ratio should be bigger than 1.

If Group \(B\) is more likely to answer yes, then the ratio should be less than one.

#| standalone: true
#| viewerHeight: 300
library(shiny)
N <- 100
ui1 <- fluidPage(
inputPanel(
  sliderInput("pad", label = "P(Member of Group A)",
              min = 0, max = 1, value = .5, step = 0.05),
  sliderInput("pyd", label = "P(Answered `yes`)",
              min = 0, max = 1, value = .5, step = 0.05),
  selectInput("OR",label="Odds Ratio",
              choices=c("1/4","1/3","1/2","2/3","1","3/2","2","3","4"), selected ="1")
),
mainPanel(
  tableOutput("table1"),
  plotOutput("mosaic1")))

server1 <- function (input,output) {

tabd <- reactive({
  pa <-input$pad
  py <-input$pyd
  OR <- eval(str2lang(input$OR))
  cat(pa,py,OR,"\n")
  if (OR == 1) {
    pay <- pa*py
  } else {
    S <- sqrt((1+(pa+py)*(OR-1))^2 + 4*OR*(1-OR)*pa*py)
    cat(S,"\n")
    pay <- (1+(pa+py)*(OR-1)-S)/2/(OR-1)
    cat(pay,"\n")
  }
  matrix(N*c(pay,(pa-pay),(py-pay),(1-py-pa+pay)),
        2,2, byrow=TRUE,
        dimnames=list(Group=c("A","B"),Answer=c("y","n")))
})
output$table1 <- renderTable(tabd(),rownames=TRUE, digits=1)
output$mosaic1 <- renderPlot(mosaicplot(tabd()),color=TRUE,main="Dependent data")
}
shinyApp(ui=ui1,server=server1)