TY - GEN
T1 - CIRCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices
T2 - 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
AU - Ding, Caiwen
AU - Liao, Siyu
AU - Wang, Yanzhi
AU - Li, Zhe
AU - Liu, Ning
AU - Zhuo, Youwei
AU - Wang, Chao
AU - Qian, Xuehai
AU - Bai, Yu
AU - Yuan, Geng
AU - Ma, Xiaolong
AU - Zhang, Yipeng
AU - Tang, Jian
AU - Qiu, Qinru
AU - Lin, Xue
AU - Yuan, Bo
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/10/14
Y1 - 2017/10/14
N2 - Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CIRCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CIRCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CIRCNN is distinct due to its mathematical rigor: DNNs based on CIRCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CIRCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, and scale). In the CIRCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CIRCNN on FPGA, ASIC, and embedded processors. Our results show that the CIRCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CIRCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
AB - Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) the increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CIRCNN, a principled approach to represent weights and process neural networks using block-circulant matrices. CIRCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CIRCNN is distinct due to its mathematical rigor: DNNs based on CIRCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CIRCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, and scale). In the CIRCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CIRCNN on FPGA, ASIC, and embedded processors. Our results show that the CIRCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CIRCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
KW - Acceleration
KW - Block-circulant matrix
KW - Compression
KW - Deep learning
KW - FPGA
UR - http://www.scopus.com/inward/record.url?scp=85034015289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85034015289&partnerID=8YFLogxK
U2 - 10.1145/3123939.3124552
DO - 10.1145/3123939.3124552
M3 - Conference contribution
AN - SCOPUS:85034015289
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 395
EP - 408
BT - MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
PB - IEEE Computer Society
Y2 - 14 October 2017 through 18 October 2017
ER -