ADMM-NN

An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers

Ao Ren, Tianyun Zhang, Shaokai Ye, Jiayu Li, Wenyao Xu, Xuehai Qian, Xue Lin, Yanzhi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Model compression is an important technique to facilitate efficient embedded and hardware implementations of deep neural networks (DNNs). The goal is to simultaneously reduce the model storage size and accelerate the computation, with minor effect on accuracy. Two important categories of DNN model compression techniques are weight pruning and weight quantization. The former leverages the redundancy in the number of weights, whereas the latter leverages the redundancy in the bit representation of weights. These two sources of redundancy can be combined, leading to a higher degree of DNN model compression. However, a systematic framework for joint weight pruning and quantization of DNNs has been lacking, limiting the achievable model compression ratio. Moreover, computation reduction, energy efficiency improvement, and hardware performance overhead need to be accounted for, beyond model size reduction alone. To address these limitations, we present ADMM-NN, the first algorithm-hardware co-optimization framework for DNNs using the Alternating Direction Method of Multipliers (ADMM), a powerful technique for solving non-convex optimization problems with possibly combinatorial constraints. The first part of ADMM-NN is a systematic, joint framework of DNN weight pruning and quantization using ADMM. It can be understood as a smart regularization technique whose regularization target is dynamically updated in each ADMM iteration, resulting in higher model compression performance than the state-of-the-art. The second part is hardware-aware DNN optimizations to facilitate hardware-level implementations. We perform ADMM-based weight pruning and quantization considering (i) the computation reduction and energy efficiency improvement, and (ii) the hardware performance overhead due to irregular sparsity. The first consideration prioritizes compressing convolutional layers over fully-connected layers, while the second motivates the concept of the break-even pruning ratio, defined as the minimum pruning ratio of a specific layer that results in no hardware performance degradation. Without accuracy loss, ADMM-NN achieves 85× and 24× pruning on the LeNet-5 and AlexNet models, respectively, significantly higher than the state-of-the-art. Combining weight pruning and quantization, we achieve 1,910× and 231× reductions in overall model size on these two benchmarks. Highly promising results are also observed on other representative DNNs such as VGGNet and ResNet-50. We release code and models at https://github.com/yeshaokai/admm-nn.
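The ADMM-based pruning the abstract describes — a loss-driven weight update alternating with a projection onto a sparsity constraint, where the projected copy acts as the dynamically updated regularization target — can be sketched in highly simplified form as below. The toy quadratic loss, step sizes, and helper names are illustrative assumptions, not the paper's actual implementation or training setup:

```python
import numpy as np

def project_topk(w, k):
    """Euclidean projection onto {at most k nonzero weights}: keep the k largest magnitudes."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(w0, grad_loss, k, rho=1.0, lr=0.1, iters=200):
    """ADMM sketch for: minimize loss(W) subject to ||W||_0 <= k.
    Z is the sparse regularization target, refreshed every iteration."""
    w, u = w0.copy(), np.zeros_like(w0)
    z = project_topk(w, k)
    for _ in range(iters):
        # W-update: gradient steps on loss(W) + (rho/2)||W - Z + U||^2
        for _ in range(10):
            w -= lr * (grad_loss(w) + rho * (w - z + u))
        z = project_topk(w + u, k)   # Z-update: re-project (target dynamically updated)
        u += w - z                   # dual-variable update
    return project_topk(w, k)        # final hard prune

# Toy loss ||W - t||^2 with a dense target t; k=3 keeps the 3 largest-magnitude weights.
t = np.array([3.0, -0.1, 2.0, 0.05, -1.5])
w = admm_prune(np.zeros(5), lambda w: 2 * (w - t), k=3)
```

In each iteration the quadratic penalty pulls W toward the current sparse target Z, while Z itself is recomputed from W, which is why the scheme behaves like a regularizer whose target moves with the solution rather than a fixed penalty.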

Original language: English (US)
Title of host publication: ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher: Association for Computing Machinery
Pages: 925-938
Number of pages: 14
ISBN (Electronic): 9781450362405
DOI: 10.1145/3297858.3304076
State: Published - Apr 4 2019
Externally published: Yes
Event: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019 - Providence, United States
Duration: Apr 13 2019 - Apr 17 2019

Publication series

Name: International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS

Conference

Conference: 24th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2019
Country: United States
City: Providence
Period: 4/13/19 - 4/17/19

Fingerprint

  • Hardware
  • Redundancy
  • Energy efficiency
  • Deep neural networks
  • Degradation

Keywords

  • ADMM
  • Hardware Optimization
  • Neural Network
  • Quantization
  • Weight Pruning

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture

Cite this

Ren, A., Zhang, T., Ye, S., Li, J., Xu, W., Qian, X., ... Wang, Y. (2019). ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Method of Multipliers. In ASPLOS 2019 - 24th International Conference on Architectural Support for Programming Languages and Operating Systems (pp. 925-938). (International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS). Association for Computing Machinery. https://doi.org/10.1145/3297858.3304076
