combinedListDirect()
This vignette will provide a brief
introduction to the combined list estimator described in Aronow,
Coppock, Crawford, and Green (2015): Combining List Experiment and
Direct Question Estimates of Sensitive Behavior Prevalence. In addition
to the mechanics of the combinedListDirect() function, you
will learn how to interpret the results of the two placebo tests that
can serve as checks on the validity of the list experimental
assumptions.
To use the combined estimator, you must modify the standard list experimental design: all subjects must be asked the direct question in addition to being randomly assigned to either the treatment or the control list. It is recommended that the order in which subjects are asked the two questions (direct and list) be randomized.
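The design requirements above can be sketched as two independent randomizations per subject. This is only an illustration: the names `design`, `n_subjects`, and `direct_first` are hypothetical and are not part of the package.

```r
# Hypothetical design sketch; all object names are illustrative assumptions
set.seed(2015)
n_subjects <- 1500
design <- data.frame(
  id = seq_len(n_subjects),
  # 1 = treatment list (J + 1 items), 0 = control list (J items)
  Z = rbinom(n_subjects, 1, 0.5),
  # 1 = direct question asked before the list question, 0 = asked after
  direct_first = rbinom(n_subjects, 1, 0.5)
)
# Cross-tabulate the two independent randomizations
table(list_arm = design$Z, direct_first = design$direct_first)
```

Every subject receives the direct question; only the list version and the question order vary.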
List experiments are designed to estimate the prevalence of some sensitive attitude or behavior. Typically, direct questioning would lead to an underestimate of prevalence because some subjects who do hold the attitude or engage in the behavior would withhold.
For example, suppose we have 1500 subjects, 1000 of whom engage in the behavior. Of those 1000, 500 would withhold if asked directly.
# Set a seed for reproducibility
set.seed(123)
# Define subject types.
# Truthfully respond "Yes" to direct question
N.trueadmitter <- 500
# Falsely respond "No" to direct question
N.withholder <- 500
# Truthfully respond "No" to direct question
N.innocent <- 500
type <- rep(c("TA", "WH", "IN"), times=c(N.trueadmitter, N.withholder, N.innocent))
Now suppose we were to ask the direct question, “Do you engage?”
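# Direct question response: only true admitters answer "Yes"
D <- ifelse(type == "TA", 1, 0)
mean(D)
## [1] 0.3333333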
The true proportion of engagers is 1000/1500 = 0.67. However, the direct question is badly biased by social desirability: our direct question prevalence estimate is 0.33.
A conventional list experiment addresses social desirability by asking a control group how many of J (non-sensitive) behaviors they engage in and a treatment group how many of J + 1 behaviors they engage in, where the additional behavior is the sensitive one. The (possibly covariate-adjusted) difference-in-means yields a prevalence estimate that is free from social desirability bias. This estimate relies on two additional assumptions: No Liars and No Design Effects. No Liars requires that treatment subjects respond truthfully to the list question. No Design Effects requires that the presence of the sensitive item does not change treated subjects’ responses to the non-sensitive items.
N <- length(type)
# Generate list response potential outcomes
# Control potential outcome
Y0 <- sample(1:4, N, replace=TRUE)
# Treated potential outcome is 1 higher for true admitters and withholders
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
# Conduct random assignment
Z <- rbinom(N, 1, 0.5)
# Reveal list responses
Y <- Z*Y1 + (1-Z)*Y0
list.est <- mean(Y[Z==1]) - mean(Y[Z==0])
list.se <- sqrt((var(Y[Z==1])/sum(Z) + var(Y[Z==0])/sum(1-Z)))
list.est
## [1] 0.619082
list.se
## [1] 0.05910645
The list experiment comes closer to the truth: our estimate is now 0.62. The standard error is somewhat large, at 0.06. A principal difficulty with using list experiments is that estimates can be quite imprecise.
The purpose of the combined estimator is to increase precision by combining direct questioning with list experimentation. The combined estimate is a weighted average of the direct question estimate and the list experiment estimate among those who answer “No” to the direct question. Under two additional assumptions (Treatment Independence and Monotonicity), the combined estimator yields more precise estimates than the conventional estimator. Treatment Independence requires that the treatment not have any effect on the direct question response. Monotonicity requires that no subjects “falsely confess” to the direct question.
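The weighted-average idea can be sketched by hand. The block below is a minimal illustration on freshly simulated data, assuming Monotonicity so that everyone who answers “Yes” is counted as an engager. It is not the package's internal implementation, and the names `direct_hat`, `list_no_hat`, and `combined_hat` are hypothetical.

```r
# Simulate data with the same structure as the running example
set.seed(42)
type <- rep(c("TA", "WH", "IN"), each = 500)
N <- length(type)
D <- ifelse(type == "TA", 1, 0)                    # direct question response
Y0 <- sample(1:4, N, replace = TRUE)               # control list response
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)   # treated list response
Z <- rbinom(N, 1, 0.5)
Y <- Z * Y1 + (1 - Z) * Y0

# Share answering "Yes" to the direct question
direct_hat <- mean(D)
# List experiment estimate among those answering "No"
no <- D == 0
list_no_hat <- mean(Y[no & Z == 1]) - mean(Y[no & Z == 0])
# Weighted average: "Yes" responders count as 1, "No" responders
# contribute their list experiment estimate
combined_hat <- direct_hat + (1 - direct_hat) * list_no_hat
combined_hat
```

Because the “Yes” responders contribute no list experiment noise, only the “No” subgroup's difference-in-means adds sampling variability to the estimate.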
Estimation is straightforward:
library(list)
## Loading required package: sandwich
# Wrap up all data in a dataframe
df <- data.frame(Y, Z, D)
out.1 <- combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
out.1
##
## Combined List Estimates
##
## Call: combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
##
## Prevalence estimate
## Prevalence
## Estimate 0.69551882
## Standard Error 0.04933932
If we compare the standard errors of the two methods, we can see that the combined estimator is more precise than the conventional estimator.
The combinedListDirect() function automatically conducts two placebo tests that can check the assumptions underlying the list experimental design.
Placebo Test I checks whether the list experiment estimate among those who answer “Yes” to the direct question is significantly different from 1. Rejecting the hypothesis that this estimate is equal to 1 indicates that one or more of four list experiment assumptions might be wrong: No Liars, No Design Effects, Treatment Ignorability, or Monotonicity.
Placebo Test II checks whether the direct question response is affected by the treatment. If Treatment Independence is satisfied, the (possibly covariate-adjusted) difference-in-means should not be significantly different from 0.
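The logic of the two tests can be sketched by hand on simulated data. This is an illustration under stated assumptions: it uses an unadjusted difference-in-means, a Neyman standard error, and a normal approximation for the p-values, which may differ from what combinedListDirect() computes. The helper `dim_est()` is hypothetical.

```r
set.seed(7)
type <- rep(c("TA", "WH", "IN"), each = 500)
N <- length(type)
D <- ifelse(type == "TA", 1, 0)
Y0 <- sample(1:4, N, replace = TRUE)
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
Z <- rbinom(N, 1, 0.5)
Y <- Z * Y1 + (1 - Z) * Y0

# Hypothetical helper: difference-in-means with a Neyman standard error
dim_est <- function(y, z) {
  est <- mean(y[z == 1]) - mean(y[z == 0])
  se <- sqrt(var(y[z == 1]) / sum(z == 1) + var(y[z == 0]) / sum(z == 0))
  c(estimate = est, se = se)
}

# Placebo Test I: list estimate among "Yes" responders, H0: beta = 1
yes <- D == 1
beta <- dim_est(Y[yes], Z[yes])
p_beta <- unname(2 * pnorm(-abs((beta["estimate"] - 1) / beta["se"])))

# Placebo Test II: effect of the treatment list on D, H0: delta = 0
delta <- dim_est(D, Z)
p_delta <- unname(2 * pnorm(-abs(delta["estimate"] / delta["se"])))

c(p_beta = p_beta, p_delta = p_delta)
```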
It’s easy to see the results of both tests using summary.comblist(). Because we generated the data above respecting the list experiment assumptions, we expect to pass both tests.
summary(out.1)
##
## Combined List Estimates
##
## Call: combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
##
## Prevalence estimates
## Combined Direct Conventional
## Estimate 0.69551882 0.33333333 0.61908199
## Standard Error 0.04933932 0.01217567 0.05910645
##
## Placebo Test I
## Beta is the conventional list experiment estimate among those who answer 'Yes' to the direct question.
## Ho: beta = 1
## Ha: beta != 1
##
## Estimate SE p n
## beta 0.7702253 0.09870956 0.01992348 500
##
## Placebo Test II
## Delta is the average effect of the receiving the treatment list on the direct question response.
## Ho: delta = 0
## Ha: delta != 0
##
## Estimate SE p n
## delta 0.00177798 0.02436116 0.9418187 1500
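For Placebo Test II, the high p-value (0.94) means that we cannot reject the null hypothesis. For Placebo Test I, the p-value in this particular draw (0.02) falls below the conventional 0.05 threshold even though the data were generated to satisfy the assumptions; when the null is true, such a rejection occurs by chance in about 5% of samples. In general, high p-values on both tests indicate that the assumptions underlying the conventional and combined list experiment estimators have not been demonstrated to be false.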
Let’s show cases where the tests indicate that there are problems. First, let’s consider the case where some subjects are “design affected”, i.e., they lower their response to the non-sensitive items when the sensitive item is also on the list.
# Define three subject types as before plus one new type
N.trueadmitter <- 400
N.withholder <- 500
N.innocent <- 500
# Truthfully responds "Yes" to direct question
# but decreases response to the non-sensitive items
# in the presence of the sensitive item
N.designaffected <- 100
type <- rep(c("TA", "WH", "IN", "DA"),
times=c(N.trueadmitter, N.withholder, N.innocent, N.designaffected))
N <- length(type)
D <- ifelse(type%in%c("TA","DA"), 1, 0)
# Control potential outcome
Y0 <- sample(1:4, N, replace=TRUE)
# Treated potential outcome is 1 higher for true admitters and withholders
# Note that it is NOT higher for those who are "design affected"
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
Z <- rbinom(N, 1, 0.5)
Y <- Z*Y1 + (1-Z)*Y0
df <- data.frame(Y, Z, D)
out.2 <- combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
# Extract Placebo Test I results
unlist(out.2$placebo.I)
## estimate se p n
## 0.75978816 0.10374990 0.02059668 500.00000000
The low p-value suggests that we should reject the hypothesis that the list experiment estimate is equal to 1 among those who answer “Yes” to the direct question. We could reject this hypothesis if any of the four assumptions above were violated in some way. If the null is rejected, both the conventional and the combined list experiment estimates may be biased.
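A short calculation shows why the estimate falls below 1 in this simulation. Among the 500 subjects who answer “Yes”, the 400 true admitters raise their list response by 1 under treatment, while the 100 design-affected subjects were simulated with no increase. The names `n_yes`, `effect`, and `expected_beta` are illustrative.

```r
# Composition of the "Yes" responders in the simulation above
n_yes <- c(TA = 400, DA = 100)
# Simulated treatment effect on the list response for each type
effect <- c(TA = 1, DA = 0)
# Expected value of the Placebo Test I estimate
expected_beta <- weighted.mean(effect, w = n_yes)
expected_beta
## [1] 0.8
```

The observed estimate of about 0.76 is consistent with this expected value of 0.8.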
Next, let’s consider a case where the treatment does affect the direct question response, violating the Treatment Independence assumption.
# Define three subject types as before plus one new type
N.trueadmitter <- 400
N.withholder <- 500
N.innocent <- 500
# Truthfully answers "Yes" when in control
# But falsely answers "No" when in treatment
N.affectedbytreatment <- 100
type <- rep(c("TA", "WH", "IN", "ABT"),
times=c(N.trueadmitter, N.withholder, N.innocent, N.affectedbytreatment))
N <- length(type)
# Direct Question Potential outcomes
D0 <- ifelse(type%in%c("TA","ABT"), 1, 0)
D1 <- ifelse(type%in%c("TA"), 1, 0)
# List Experiment potential outcomes
Y0 <- sample(1:4, N, replace=TRUE)
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
# Reveal outcomes according to random assignment
Z <- rbinom(N, 1, 0.5)
Y <- Z*Y1 + (1-Z)*Y0
D <- Z*D1 + (1-Z)*D0
df <- data.frame(Y, Z, D)
out.3 <- combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
# Extract Placebo Test II results
unlist(out.3$placebo.II)
## estimate se p n
## -0.05092491 0.02364766 0.03128051 1500.00000000
Again, the low p-value suggests that the null hypothesis that the average effect of the treatment on the direct response is zero is false. When this null is rejected, the combined estimator may yield biased results.
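The size of the violation built into this simulation can be computed directly from the potential outcomes. The sketch below uses illustrative names (`n`, `d0`, `d1`, `expected_delta`) and only restates the type counts defined above.

```r
# Subject counts by type in the simulation above
n <- c(TA = 400, WH = 500, IN = 500, ABT = 100)
# Direct question potential outcomes under control (d0) and treatment (d1)
d0 <- c(TA = 1, WH = 0, IN = 0, ABT = 1)
d1 <- c(TA = 1, WH = 0, IN = 0, ABT = 0)
# Average effect of the treatment list on the direct response:
# -100 / 1500, roughly -0.067
expected_delta <- weighted.mean(d1 - d0, w = n)
expected_delta
```

The observed estimate of about -0.05 is within sampling error of this expected value.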
Another way to increase the precision of list experiments is to include pre-treatment covariates that are predictive of the list experiment outcome. The combined estimator can accommodate pre-treatment covariates quite easily.
# Define subject types.
N.trueadmitter <- 500
N.withholder <- 500
N.innocent <- 500
type <- rep(c("TA", "WH", "IN"), times=c(N.trueadmitter, N.withholder, N.innocent))
N <- length(type)
# Generate a predictive pre-treatment covariate "X"
X <- rnorm(N, sd = 2)
# Control potential outcome is related to "X"
Y0 <- as.numeric(cut(X + runif(N), breaks = 4))
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
Z <- rbinom(N, 1, 0.5)
D <- ifelse(type=="TA", 1, 0)
Y <- Z*Y1 + (1-Z)*Y0
df <- data.frame(Y, Z, D, X)
# Conduct estimation without covariate adjustment
out.4 <- combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
out.4
##
## Combined List Estimates
##
## Call: combinedListDirect(formula = Y ~ Z, data = df, treat = "Z", direct = "D")
##
## Prevalence estimate
## Prevalence
## Estimate 0.68373531
## Standard Error 0.03165778
# Conduct estimation with covariate adjustment
# Just add the covariate on the right-hand side of the formula
out.5 <- combinedListDirect(formula = Y ~ Z + X, data = df, treat = "Z", direct = "D")
out.5
##
## Combined List Estimates
##
## Call: combinedListDirect(formula = Y ~ Z + X, data = df, treat = "Z",
## direct = "D")
##
## Prevalence estimate
## Prevalence
## Estimate 0.67850607
## Standard Error 0.02051244
A comparison of the standard errors with and without covariate adjustment confirms that the covariate-adjusted estimator is more precise. When you include covariates, the placebo tests become more powerful as well.
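The same precision gain can be seen in a plain regression for the conventional list estimate. This sketch uses base R `lm()` with classical OLS standard errors on freshly simulated data; it illustrates the general principle rather than what combinedListDirect() computes, and the names `fit_unadj`, `fit_adj`, `se_unadj`, and `se_adj` are hypothetical.

```r
set.seed(99)
N <- 1500
type <- rep(c("TA", "WH", "IN"), each = 500)
# Pre-treatment covariate that predicts the control list response
X <- rnorm(N, sd = 2)
Y0 <- as.numeric(cut(X + runif(N), breaks = 4))
Y1 <- Y0 + ifelse(type %in% c("TA", "WH"), 1, 0)
Z <- rbinom(N, 1, 0.5)
Y <- Z * Y1 + (1 - Z) * Y0

# Conventional list estimate without and with covariate adjustment
fit_unadj <- lm(Y ~ Z)
fit_adj <- lm(Y ~ Z + X)
se_unadj <- summary(fit_unadj)$coefficients["Z", "Std. Error"]
se_adj <- summary(fit_adj)$coefficients["Z", "Std. Error"]
c(unadjusted = se_unadj, adjusted = se_adj)
```

Because X explains much of the variation in the list response, the residual variance and therefore the standard error on Z shrink once X is included.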