The purpose of writing up this document is that, while one can find some sources that mention the theoretical range of D-scores (they are bounded between -2 and +2), the fact has really not been discussed much. Consequently, at the time of writing, it does not seem to be very well known even to IAT users. In fact, some IAT users to whom I’ve pointed this out have been incredulous. But it’s useful to know, partly because it gives us a quick sanity check on D-scores we’ve computed: at the very least, they should not fall outside this interval. Anyway, this document is mainly to convince the incredulous people, but I hope it gives a little intuition about the phenomenon as well.

Below I’ll show a little computational example in R, and then a mathematical proof. But before I do that I want to point to a couple of sources that have also pointed this fact out. The first is a recent paper by Blanton, Jaccard, and Burrows (2014), which discusses this a little and then gives a sketch of a proof in an appendix; below I give a more thorough proof. The second source is a response paper written by some of the IAT authors (Nosek & Sriram, 2007), although there the comment is buried in Footnote 2. Finally, the fact is briefly mentioned on a webpage I found on the Project Implicit website.

Computational example

IAT scores, as defined by Greenwald, Nosek, and Banaji (2003) and named D-scores, are computed as the mean difference between conditions divided by the overall standard deviation (SD). This is important: it’s the overall SD and not the pooled SD (the latter being a weighted mean of the SDs computed separately in each condition, which is what e.g. Cohen’s d uses)! This fact is exactly what causes the bounding, because unlike the pooled SD, the overall SD is not independent of the mean difference: as the mean difference increases, the overall SD increases right along with it. Because D-scores are the mean difference divided by the SD, they increase when either the mean difference increases or the SD decreases. The strategy I’ll use, both here and in the proof, is to show what happens to D-scores when (a) the pooled SD is held constant and the mean difference goes to infinity, and (b) the mean difference is held constant and the pooled SD goes to 0. In both cases the D-score approaches, but never exceeds, ±2.
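
To make the distinction concrete, here is a minimal sketch comparing the two denominators on simulated data (the variable names and the mean difference of 3 are just for illustration):

con <- rnorm(50, mean=0)
inc <- rnorm(50, mean=3)

# pooled SD: the average of the within-condition variances (equal n here,
# so a simple unweighted average), as in Cohen's d
pooledSD <- sqrt((var(con) + var(inc))/2)

# overall SD: the SD of all 100 trials combined, as in the D-score
overallSD <- sd(c(con, inc))

pooledSD   # close to 1, whatever the mean difference
overallSD  # close to sqrt(1 + (3/2)^2) ~ 1.8, inflated by the mean difference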

First we define an R function that computes a D-score given two sets of data representing the incongruent trials and the congruent trials:

# D-score: the mean latency difference between the incongruent and
# congruent trials, divided by the SD of all trials combined
Dscore <- function(incongruent, congruent){
  (mean(incongruent)-mean(congruent))/sd(c(incongruent, congruent))
}

Now we’ll compute D-scores for a series of datasets in which the pooled SD is held constant but the mean difference is a little bigger in each dataset, up to Cohen’s d = 10:

# 41 datasets of 50 incongruent trials each, with means stepping from 0 to 10
dat <- lapply(seq(from=0, to=10, by=.25), function(x){
  rnorm(50, mean=x)
})

# D-score for each dataset, pairing it with 50 congruent trials with mean 0
# (sapply returns a numeric vector, which is what plot() expects below)
D <- sapply(dat, function(x){
  Dscore(incongruent=x, congruent=rnorm(50))
})

And plot the results:

plot(y=D, x=seq(from=0, to=10, by=.25), type="l", ylab="D-score",
  xlab="Mean difference", main="Pooled SD = 1")

The curve here is bumpy because there’s just a single simulated dataset at each mean difference, but it would smooth out if we simulated many datasets for each mean difference, as in the sketch below.
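
Here’s one way that smoother version might look (a minimal sketch; the 200 replications per point is an arbitrary choice):

# average the D-score over 200 simulated datasets at each mean difference
meandiffs <- seq(from=0, to=10, by=.25)
Dsmooth <- sapply(meandiffs, function(x){
  mean(replicate(200, Dscore(incongruent=rnorm(50, mean=x),
                             congruent=rnorm(50))))
})

plot(y=Dsmooth, x=meandiffs, type="l", ylab="D-score",
  xlab="Mean difference", main="Pooled SD = 1, 200 replications each")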

Now we’ll do the same sort of thing again, but this time we hold constant the mean difference (setting it equal to 1) and let the pooled SD get closer and closer to 0 in each successive dataset. Here’s how we create the datasets and corresponding D-scores:

# 40 pooled-SD values stepping down from 10 to .01 on a log scale; the mean
# difference is fixed at exactly 1 by shifting the same data by a constant
D <- sapply(seq(from=log(10), to=log(.01), length.out=40), function(x){
  con <- rnorm(50, sd=exp(x))
  Dscore(incongruent=con+1, congruent=con)
})

And plot the results:

# custom x-axis so the tick labels show the actual SD values
plot(y=D, x=seq(from=-log(10), to=-log(.01), length.out=40),
  type="l", xaxt="n", ylab="D-score",
  xlab="Pooled SD (log scale)", main="Mean difference = 1")
axis(side=1, at=-log(c(10, 1, .1, .01)),
  labels=c(10, 1, .1, .01))
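
As a final quick check, we can hand the Dscore function deliberately extreme data (the particular numbers are arbitrary) and confirm that the result still lands just inside the bound:

# huge mean difference and tiny within-condition SD: the D-score should
# approach, but not reach, 2
con <- rnorm(50, mean=0, sd=.001)
inc <- rnorm(50, mean=1e6, sd=.001)
Dscore(incongruent=inc, congruent=con)  # just under 2 (about 1.99)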

Proof

For the proof we will use the same strategy as above, that is, we’ll see what happens to the D-score when we let the mean difference go to infinity and when we let the pooled SD go to 0.

First we define some variables. Let \(x\) be the mean difference, \(\sigma\) be the pooled SD, and \(\sigma'\) be the overall SD, so that the D-score is \(D=x/\sigma'\). The first step is to rewrite \(\sigma'\) as a function of \(\sigma\) and \(x\). To do that, note that we can write the equation for a single observation in the \(i\)th condition (where \(i\) refers to either the congruent or incongruent condition) as \[ y_i=\mu+\tau_i+e_i \] where \(\mu\) is the participant’s overall mean response, \(\tau_i\) is the “effect” of the \(i\)th condition (that is, the difference between the participant’s mean in that condition and the participant’s overall mean), and \(e_i\) is the trial-level deviation from the condition mean, with standard deviation \(\sigma\). Assuming equal numbers of trials in the two conditions, \(\tau_i\) takes the values \(+x/2\) and \(-x/2\) equally often, so \(\text{var}(\tau_i)=(x/2)^2\). Taking the variance of the equation over all trials (\(\mu\) is constant, and \(\tau_i\) and \(e_i\) are independent), we get \[ \begin{aligned} \text{var}(y_i) &= \text{var}(\tau_i)+\text{var}(e_i) \\ &= \left(\frac{x}{2}\right)^2+\sigma^2 \end{aligned} \] Since \(\sigma'\) is the square root of the variance of \(y\), we can rewrite the D-score as \[ D=\frac{x}{\sqrt{\sigma^2+\left(\frac{x}{2}\right)^2}} \]

First we’ll look at what happens when \(\sigma\) goes to 0, since that’s much simpler: \[ \lim_{\sigma\to0} \frac{x}{\sqrt{\sigma^2+\left(\frac{x}{2}\right)^2}}=\frac{x}{\sqrt{\left(\frac{x}{2}\right)^2}}=\frac{x}{|x|/2}=\pm2 \] where the sign is the sign of \(x\). We don’t technically need a limit in this case, since it’s theoretically possible for \(\sigma\) to be 0, but at least for me the limit more neatly matches the way I think about this issue. Letting the mean difference go to infinity is slightly more complicated (and here we do need a limit), but it’s the same basic idea: \[ \lim_{x\to\infty} \frac{x}{\sqrt{\sigma^2+\left(\frac{x}{2}\right)^2}} =\lim_{x\to\infty} \pm\sqrt{\frac{x^2}{\sigma^2+\frac{x^2}{4}}} =\lim_{x\to\infty} \pm\sqrt{\frac{1}{\frac{\sigma^2}{x^2}+\frac{1}{4}}} =\pm\sqrt{4}=\pm2 \] So the D-score approaches \(\pm2\) in both cases, and since \(\sqrt{\sigma^2+(x/2)^2}\ge|x|/2\) for any \(\sigma\), we have \(|D|\le2\) everywhere in between as well.
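
As a sanity check on the algebra, we can compare the closed-form expression to a direct computation on simulated data; with large samples and equal trial counts per condition, the two should agree closely (the values of x and sigma below are arbitrary):

x <- 3; sigma <- 1.5

# direct computation on simulated data
con <- rnorm(1e5, mean=0, sd=sigma)
inc <- rnorm(1e5, mean=x, sd=sigma)
Dscore(incongruent=inc, congruent=con)

# closed form from the proof: 3/sqrt(4.5), about 1.41
x/sqrt(sigma^2 + (x/2)^2)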

Addendum, June 2, 2015

Tom Stafford has pointed out to me that some variants of the D-scoring algorithm involve adding a constant “error penalty” to incorrect responses (e.g., code incorrect responses as the recorded time + 0.6 seconds), and that this procedure can, in some cases, lead to D-scores that fall outside of the theoretical bounds. Very interesting! Thanks Tom.
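
I haven’t worked through the penalized algorithms in detail, but here is one hypothetical sketch of how this can happen (my own illustration, not the procedure from any particular paper): if the penalized error trials contribute to the means but the SD in the denominator is based on correct responses only, the numerator can outgrow the denominator:

# 50 trials per condition with tiny latency variability (in seconds)
con <- rnorm(50, mean=1, sd=.05)
inc <- rnorm(50, mean=1, sd=.05)

# suppose half the incongruent trials were errors: add the 0.6 s penalty
errors <- 1:25
inc[errors] <- inc[errors] + .6

# hypothetical variant: the means include the penalized trials, but the
# SD is computed from correct responses only
(mean(inc) - mean(con)) / sd(c(inc[-errors], con))  # well above 2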

References

Blanton, H., Jaccard, J., & Burrows, C. N. (2014). Implications of the Implicit Association Test D-transformation for psychological assessment. Assessment.

Greenwald, A. G., Nosek, B. A., & Banaji, M. R. (2003). Understanding and using the Implicit Association Test: I. An improved scoring algorithm. Journal of Personality and Social Psychology, 85(2), 197-216.

Nosek, B. A., & Sriram, N. (2007). Faulty assumptions: A comment on Blanton, Jaccard, Gonzales, and Christie (2006). Journal of Experimental Social Psychology, 43(3), 393-398.