TY - JOUR
T1 - Large-Scale Non-convex Stochastic Constrained Distributionally Robust Optimization
AU - Zhang, Qi
AU - Zhou, Yi
AU - Prater-Bennette, Ashley
AU - Shen, Lixin
AU - Zou, Shaofeng
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
Y1 - 2024/3/25
N2 - Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the χ²-divergence as a special case. We prove that our algorithm finds an ε-stationary point with improved computational complexity over existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
AB - Distributionally robust optimization (DRO) is a powerful framework for training robust models against data distribution shifts. This paper focuses on constrained DRO, which has an explicit characterization of the robustness level. Existing studies on constrained DRO mostly focus on convex loss functions and exclude the practical and challenging case of non-convex loss functions, e.g., neural networks. This paper develops a stochastic algorithm and its performance analysis for non-convex constrained DRO. The computational complexity of our stochastic algorithm at each iteration is independent of the overall dataset size, and thus is suitable for large-scale applications. We focus on uncertainty sets defined by the general Cressie-Read family of divergences, which includes the χ²-divergence as a special case. We prove that our algorithm finds an ε-stationary point with improved computational complexity over existing methods. Our method also applies to the smoothed conditional value at risk (CVaR) DRO.
UR - http://www.scopus.com/inward/record.url?scp=85184601023&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85184601023&partnerID=8YFLogxK
U2 - 10.1609/aaai.v38i8.28662
DO - 10.1609/aaai.v38i8.28662
M3 - Conference Article
AN - SCOPUS:85184601023
SN - 2159-5399
VL - 38
SP - 8217
EP - 8225
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 8
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -