TY - JOUR

T1 - Simpson's aggregation paradox in nonparametric statistical analysis

T2 - Theory, computation, and susceptibility in public health data

AU - Sanders, Shane

AU - Ehrlich, Justin

AU - Boudreau, James

N1 - Publisher Copyright: Copyright © 2023 Sanders, Ehrlich and Boudreau.

PY - 2023

Y1 - 2023

AB - This study establishes sufficient conditions for observing instances of Simpson's (data aggregation) Paradox under rank sum scoring (RSS), as used, e.g., in the Wilcoxon-Mann-Whitney (WMW) rank sum test. The WMW test is a primary nonparametric statistical test in FDA drug product evaluation and other prominent medical settings. Using computational nonparametric statistical methods, we also establish the relative frequency with which paradox-generating Simpson Reversals occur under RSS when an initial data sequence is pooled with its ordinal replicate. For each 2-sample, n-element per sample or 2 x n case of RSS considered, strict Reversals occurred for between 0% and 1.74% of data poolings across the whole sample space, roughly similar to that observed for 2 x 2 x 2 contingency tables and considerably less than that observed for path models. The Reversal rate conditional on observed initial sequence is highly variable. Despite a mode at 0%, this rate exceeds 20% for some initial sequences. Our empirical application identifies clusters of Simpson Reversal susceptibility for publicly-released mobile phone radiofrequency exposure data. Simpson Reversals under RSS are not simply a theoretical concern but can reverse nonparametric or parametric biostatistical results even in vitally important public health settings. Conceptually, Paradox incidence can be viewed as a robustness check on a given WMW statistical test result. When an instance of Paradox occurs, results constituting this instance are found to be data-scale dependent. Given that the rate of Reversal can vary substantially by initial sequence, the practice of calculating this rate conditional on observed initial sequence represents a potentially important robustness check upon a result.

KW - Simpson's Aggregation Paradox

KW - aggregation rules

KW - collective choice

KW - nonparametric statistical analysis

KW - public choice

KW - social choice theory

UR - http://www.scopus.com/inward/record.url?scp=85152538167&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85152538167&partnerID=8YFLogxK

U2 - 10.3389/fams.2023.1169164

DO - 10.3389/fams.2023.1169164

M3 - Article

AN - SCOPUS:85152538167

SN - 2297-4687

VL - 9

JO - Frontiers in Applied Mathematics and Statistics

JF - Frontiers in Applied Mathematics and Statistics

M1 - 1169164

ER -