TY - JOUR
T1 - Arbitrariness and Social Prediction
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
AU - Cooper, A. Feder
AU - Lee, Katherine
AU - Choksi, Madiha Zahrah
AU - Barocas, Solon
AU - De Sa, Christopher
AU - Grimmelmann, James
AU - Kleinberg, Jon
AU - Sen, Siddhartha
AU - Zhang, Baobao
N1 - Publisher Copyright:
© 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
AB - Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest-to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair binary classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fair binary classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions - before we even try to apply any fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should reconsider how we choose to measure fairness in binary classification.
UR - http://www.scopus.com/inward/record.url?scp=85189634045&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85189634045&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i20.30203
DO - 10.1609/aaai.v38i20.30203
M3 - Conference Article
AN - SCOPUS:85189634045
SN - 2159-5399
VL - 38
SP - 22004
EP - 22012
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 20
Y2 - 20 February 2024 through 27 February 2024
ER -