TY - GEN
T1 - PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices
T2 - 34th AAAI Conference on Artificial Intelligence, AAAI 2020
AU - Ma, Xiaolong
AU - Guo, Fu Ming
AU - Niu, Wei
AU - Lin, Xue
AU - Tang, Jian
AU - Ma, Kaisheng
AU - Ren, Bin
AU - Wang, Yanzhi
N1 - Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
N2 - Model compression techniques for Deep Neural Networks (DNNs) are widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective such method. Current pruning methods fall into two mainstreams representing two extremes of pruning regularity: non-structured, fine-grained pruning can achieve high sparsity and accuracy but is not hardware friendly, while structured, coarse-grained pruning exploits hardware-efficient structures but suffers an accuracy drop when the pruning rate is high. In this paper, we introduce PCONV, which comprises a new sparsity dimension: fine-grained pruning patterns inside coarse-grained structures. PCONV comprises two types of sparsity: Sparse Convolution Patterns (SCP), generated by intra-convolution-kernel pruning, and connectivity sparsity, generated by inter-convolution-kernel pruning. Essentially, SCP enhances accuracy due to its special vision properties, and connectivity sparsity increases the pruning rate while maintaining a balanced workload across filter computations. To deploy PCONV, we develop a novel compiler-assisted DNN inference framework and execute PCONV models in real time without compromising accuracy, which cannot be achieved in prior work. Our experimental results show that PCONV outperforms three state-of-the-art end-to-end DNN frameworks, TensorFlow-Lite, TVM, and Alibaba Mobile Neural Network, with speedups of up to 39.2×, 11.4×, and 6.3×, respectively, with no accuracy loss. Mobile devices can thus achieve real-time inference on large-scale DNNs.
UR - http://www.scopus.com/inward/record.url?scp=85094805360&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85094805360&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85094805360
T3 - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
SP - 5117
EP - 5124
BT - AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
PB - AAAI Press
Y2 - 7 February 2020 through 12 February 2020
ER -