TY - GEN
T1 - Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices
AU - Liao, Siyu
AU - Li, Zhe
AU - Lin, Xue
AU - Qiu, Qinru
AU - Wang, Yanzhi
AU - Yuan, Bo
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/12/13
Y1 - 2017/12/13
N2 - Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligence applications. However, the large sizes of DNNs make them both computation- and memory-intensive, thereby limiting the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient, high-performance, highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reductions in both the computational cost (from O(n²) to O(n log n)) and the storage requirement (from O(n²) to O(n)) of DNNs. Second, we carefully optimize the hardware architecture, especially the key fast Fourier transform (FFT) module, to improve the overall performance in terms of energy efficiency, computation performance, and resource cost. Third, we propose a design flow for hardware-software co-optimization that achieves a good balance between the test accuracy and the hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on an FPGA. The fixed-point quantization and the proposed block-circulant matrix-based inference scheme enable the network to achieve up to 3.5 TOPS of computation performance and 3.69 TOPS/W energy efficiency, while reducing memory usage by 108× to 116× with negligible accuracy degradation.
AB - Deep neural networks (DNNs) have emerged as the most powerful machine learning technique in numerous artificial intelligence applications. However, the large sizes of DNNs make them both computation- and memory-intensive, thereby limiting the hardware performance of dedicated DNN accelerators. In this paper, we propose a holistic framework for energy-efficient, high-performance, highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reductions in both the computational cost (from O(n²) to O(n log n)) and the storage requirement (from O(n²) to O(n)) of DNNs. Second, we carefully optimize the hardware architecture, especially the key fast Fourier transform (FFT) module, to improve the overall performance in terms of energy efficiency, computation performance, and resource cost. Third, we propose a design flow for hardware-software co-optimization that achieves a good balance between the test accuracy and the hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on an FPGA. The fixed-point quantization and the proposed block-circulant matrix-based inference scheme enable the network to achieve up to 3.5 TOPS of computation performance and 3.69 TOPS/W energy efficiency, while reducing memory usage by 108× to 116× with negligible accuracy degradation.
UR - http://www.scopus.com/inward/record.url?scp=85043491124&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85043491124&partnerID=8YFLogxK
U2 - 10.1109/ICCAD.2017.8203813
DO - 10.1109/ICCAD.2017.8203813
M3 - Conference contribution
AN - SCOPUS:85043491124
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
SP - 458
EP - 465
BT - 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
Y2 - 13 November 2017 through 16 November 2017
ER -