TY - GEN
T1 - Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks
T2 - 28th Great Lakes Symposium on VLSI, GLSVLSI 2018
AU - Ding, Caiwen
AU - Ren, Ao
AU - Yuan, Geng
AU - Ma, Xiaolong
AU - Li, Jiayu
AU - Liu, Ning
AU - Yuan, Bo
AU - Wang, Yanzhi
N1 - Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/5/30
Y1 - 2018/5/30
N2 - Both industry and academia have extensively investigated hardware acceleration of deep neural networks. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique reduces computational complexity from O(n^2) to O(n log n) and storage complexity from O(n^2) to O(n) for each layer, in both the training and inference phases. For FPGA implementations of deep convolutional neural networks (DCNNs), the SWM-based framework achieves at least 152X improvement in performance and 72X improvement in energy efficiency compared with the baseline IBM TrueNorth processor under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 data sets. For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM achieves up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline accelerator. For ASIC implementations, the SWM-based design exhibits substantial advantages in power, throughput, and energy efficiency. Experimental results indicate that this method is well suited for deploying DNNs on both FPGAs and mobile/IoT devices.
AB - Both industry and academia have extensively investigated hardware acceleration of deep neural networks. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) implementations. On the algorithm side, the SWM-based framework adopts block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The SWM-based technique reduces computational complexity from O(n^2) to O(n log n) and storage complexity from O(n^2) to O(n) for each layer, in both the training and inference phases. For FPGA implementations of deep convolutional neural networks (DCNNs), the SWM-based framework achieves at least 152X improvement in performance and 72X improvement in energy efficiency compared with the baseline IBM TrueNorth processor under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 data sets. For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM achieves up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline accelerator. For ASIC implementations, the SWM-based design exhibits substantial advantages in power, throughput, and energy efficiency. Experimental results indicate that this method is well suited for deploying DNNs on both FPGAs and mobile/IoT devices.
KW - ASIC
KW - Accelerator
KW - Deep learning
KW - FPGA
KW - Structured weight matrices
UR - http://www.scopus.com/inward/record.url?scp=85049457507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049457507&partnerID=8YFLogxK
U2 - 10.1145/3194554.3194625
DO - 10.1145/3194554.3194625
M3 - Conference contribution
AN - SCOPUS:85049457507
T3 - Proceedings of the ACM Great Lakes Symposium on VLSI, GLSVLSI
SP - 353
EP - 358
BT - GLSVLSI 2018 - Proceedings of the 2018 Great Lakes Symposium on VLSI
PB - Association for Computing Machinery
Y2 - 23 May 2018 through 25 May 2018
ER -