We examine methods for measuring performance in signal-detection-like tasks when each participant provides only a few observations. Monte Carlo simulations demonstrate that standard statistical techniques applied to a d′ analysis can lead to large numbers of Type I errors (incorrectly rejecting a hypothesis of no difference). Various statistical methods were compared in terms of their Type I and Type II error (incorrectly accepting a hypothesis of no difference) rates. Our conclusions are the same whether these two types of errors are weighted equally or Type I errors are weighted more heavily. The most promising method is to combine an aggregate d′ measure with a percentile bootstrap confidence interval, a computer-intensive nonparametric method of statistical inference. Researchers who prefer statistical techniques more commonly used in psychology, such as a repeated measures t test, should use γ (Goodman & Kruskal, 1954), since it performs slightly better than or nearly as well as d′. In general, when repeated measures t tests are used, γ is more conservative than d′: It makes more Type II errors, but its Type I error rate tends to be much closer to that of the traditional .05 α level. It is somewhat surprising that γ performs as well as it does, given that the simulations that generated the hypothetical data conformed completely to the d′ model. Analyses in which H − FA was used had the highest Type I error rates. Detailed simulation results can be downloaded from www.psychonomic.org/archive/Schooler-BRM-2004.zip.
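To make the measures named above concrete, the following is a minimal Python sketch of d′, γ for a 2 × 2 table, and a percentile bootstrap confidence interval on a difference in aggregate d′. It is an illustration under stated assumptions, not the authors' implementation: the function names, the (hits, misses, false alarms, correct rejections) tuple layout, the log-linear correction for rates of 0 or 1, and the choice to resample participants as paired units across two conditions are all hypothetical choices made for this example.

```python
import random
from statistics import NormalDist


def d_prime(hits, misses, fas, crs):
    """d' = z(H) - z(FA). The +0.5 / +1 log-linear correction is an
    assumption of this sketch; it keeps both rates away from 0 and 1."""
    h = (hits + 0.5) / (hits + misses + 1.0)
    fa = (fas + 0.5) / (fas + crs + 1.0)
    z = NormalDist().inv_cdf
    return z(h) - z(fa)


def gamma_2x2(hits, misses, fas, crs):
    """Goodman-Kruskal gamma for a 2 x 2 table: (ad - bc) / (ad + bc)."""
    concordant = hits * crs
    discordant = misses * fas
    total = concordant + discordant
    return (concordant - discordant) / total if total else float("nan")


def aggregate_d_prime(counts):
    """Pool the four counts over participants, then compute a single d'."""
    pooled = [sum(column) for column in zip(*counts)]
    return d_prime(*pooled)


def bootstrap_ci_difference(participants, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for aggregate d'(A) - aggregate d'(B).

    `participants` is a list of (counts_a, counts_b) pairs, one pair per
    participant (hypothetical layout). Participants are resampled with
    replacement as whole units, which assumes a repeated measures design.
    """
    rng = random.Random(seed)
    n = len(participants)
    diffs = []
    for _ in range(n_boot):
        sample = [rng.choice(participants) for _ in range(n)]
        d_a = aggregate_d_prime([a for a, _ in sample])
        d_b = aggregate_d_prime([b for _, b in sample])
        diffs.append(d_a - d_b)
    diffs.sort()
    lower = diffs[round((alpha / 2) * n_boot)]
    upper = diffs[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Made-up counts for three participants, four signal and four noise
    # trials per condition; for illustration only.
    data = [
        ((3, 1, 1, 3), (2, 2, 2, 2)),
        ((4, 0, 2, 2), (3, 1, 2, 2)),
        ((2, 2, 0, 4), (2, 2, 1, 3)),
    ]
    print(bootstrap_ci_difference(data))
```

Under this sketch, an interval that excludes zero would be read as evidence of a difference between the two conditions at the chosen α level.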
ASJC Scopus subject areas
- Experimental and Cognitive Psychology
- Developmental and Educational Psychology
- Arts and Humanities (miscellaneous)
- Psychology (miscellaneous)