TY - JOUR
T1 - vqSGD
T2 - Vector Quantized Stochastic Gradient Descent
AU - Gandikota, Venkata
AU - Kane, Daniel
AU - Maity, Raj Kumar
AU - Mazumdar, Arya
N1 - Funding Information:
This work was supported in part by NSF under Award CCF 2127929, Award CCF 1934846, and Award CCF 1909046.
Publisher Copyright:
© 2022 IEEE.
PY - 2022/7/1
Y1 - 2022/7/1
N2 - In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process, we derive the following fundamental information-theoretic fact: \Theta\left(\frac{d}{R^{2}}\right) bits are necessary and sufficient (up to an additive O(\log d) term) to describe an unbiased estimator \hat{\boldsymbol{g}}(\boldsymbol{g}) for any \boldsymbol{g} in the d-dimensional unit sphere, under the constraint that \|\hat{\boldsymbol{g}}(\boldsymbol{g})\|_{2} \le R almost surely, R > 1. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a d-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only o(d) bits of communication, at the expense of a tolerable increase in error. The instances of our quantization scheme are obtained using well-known families of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that vqSGD also offers automatic privacy guarantees.
AB - In this work, we present a family of vector quantization schemes, vqSGD (Vector-Quantized Stochastic Gradient Descent), that provide an asymptotic reduction in the communication cost with convergence guarantees in first-order distributed optimization. In the process, we derive the following fundamental information-theoretic fact: \Theta\left(\frac{d}{R^{2}}\right) bits are necessary and sufficient (up to an additive O(\log d) term) to describe an unbiased estimator \hat{\boldsymbol{g}}(\boldsymbol{g}) for any \boldsymbol{g} in the d-dimensional unit sphere, under the constraint that \|\hat{\boldsymbol{g}}(\boldsymbol{g})\|_{2} \le R almost surely, R > 1. In particular, we consider a randomized scheme based on the convex hull of a point set that returns an unbiased estimator of a d-dimensional gradient vector with almost surely bounded norm. We provide multiple efficient instances of our scheme that are near optimal and require only o(d) bits of communication, at the expense of a tolerable increase in error. The instances of our quantization scheme are obtained using well-known families of binary error-correcting codes and provide a smooth tradeoff between the communication cost and the estimation error of quantization. Furthermore, we show that vqSGD also offers automatic privacy guarantees.
KW - Vector quantization
KW - communication efficiency
KW - mean estimation
KW - stochastic gradient descent (SGD)
UR - http://www.scopus.com/inward/record.url?scp=85127032482&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85127032482&partnerID=8YFLogxK
U2 - 10.1109/TIT.2022.3161620
DO - 10.1109/TIT.2022.3161620
M3 - Article
AN - SCOPUS:85127032482
SN - 0018-9448
VL - 68
SP - 4573
EP - 4587
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 7
ER -