Convergence analysis of proximal gradient with momentum for nonconvex optimization

Qunwei Li, Yi Zhou, Yingbin Liang, Pramod K. Varshney

Research output: Chapter in Book/Entry/PoemConference contribution

13 Scopus citations

Abstract

In this work, we investigate the accelerated proximal gradient method for nonconvex programming (APGnc). The method compares between a usual proximal gradient step and a linear extrapolation step, and accepts the one that has a lower function value to achieve a monotonie decrease. In specific, under a general nonsmooth and nonconvex setting, we provide a rigorous argument to show that the limit points of the sequence generated by APGnc are critical points of the objective function. Then, by exploiting the Kurdyka-Lojasiewicz (KL) property for a broad class of functions, we establish the linear and sub-linear convergence rates of the function value sequence generated by APGnc. We further propose a stochastic variance reduced APGnc (SVRG-APGnc), and establish its linear convergence under a special case of the KL property. We also extend the analysis to the inexact version of these methods and develop an adaptive momentum strategy that improves the numerical performance.

Original languageEnglish (US)
Title of host publication34th International Conference on Machine Learning, ICML 2017
PublisherInternational Machine Learning Society (IMLS)
Pages3341-3357
Number of pages17
ISBN (Electronic)9781510855144
StatePublished - 2017
Event34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6 2017Aug 11 2017

Publication series

Name34th International Conference on Machine Learning, ICML 2017
Volume5

Other

Other34th International Conference on Machine Learning, ICML 2017
Country/TerritoryAustralia
CitySydney
Period8/6/178/11/17

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'Convergence analysis of proximal gradient with momentum for nonconvex optimization'. Together they form a unique fingerprint.

Cite this