TY - GEN
T1 - C-LSTM: Enabling efficient LSTM using structured compression techniques on FPGAs
T2 - 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2018
AU - Wang, Shuo
AU - Li, Zhe
AU - Ding, Caiwen
AU - Yuan, Bo
AU - Qiu, Qinru
AU - Wang, Yanzhi
AU - Liang, Yun
N1 - Funding Information:
This work is supported by Beijing Natural Science Foundation (No. L172004) and National Science Foundation under grants CNS #1704662 and CNS #1739748. We thank all the anonymous reviewers for their feedback.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/2/15
Y1 - 2018/2/15
N2 - Recently, significant accuracy improvements have been achieved for acoustic recognition systems by increasing the model size of Long Short-Term Memory (LSTM) networks. Unfortunately, the ever-increasing size of LSTM models leads to inefficient designs on FPGAs due to the limited on-chip resources. Previous work proposes a pruning-based compression technique to reduce the model size and thus speed up inference on FPGAs. However, the random nature of pruning transforms the dense matrices of the model into highly unstructured sparse ones, which leads to unbalanced computation and irregular memory accesses and thus hurts overall performance and energy efficiency. In contrast, we propose a structured compression technique which not only reduces the LSTM model size but also eliminates the irregularities in computation and memory accesses. This approach employs block-circulant matrices instead of sparse matrices to compress the weight matrices, reducing the storage requirement from O(k^2) to O(k). The Fast Fourier Transform algorithm is utilized to further accelerate inference by reducing the computational complexity from O(k^2) to O(k log k). The datapath and activation functions are quantized to 16 bits to improve resource utilization. More importantly, we propose a comprehensive framework called C-LSTM to automatically optimize and implement a wide range of LSTM variants on FPGAs. According to the experimental results, C-LSTM achieves up to 18.8X and 33.5X gains in performance and energy efficiency, respectively, compared with the state-of-the-art LSTM implementation under the same experimental setup, with very small accuracy degradation.
AB - Recently, significant accuracy improvements have been achieved for acoustic recognition systems by increasing the model size of Long Short-Term Memory (LSTM) networks. Unfortunately, the ever-increasing size of LSTM models leads to inefficient designs on FPGAs due to the limited on-chip resources. Previous work proposes a pruning-based compression technique to reduce the model size and thus speed up inference on FPGAs. However, the random nature of pruning transforms the dense matrices of the model into highly unstructured sparse ones, which leads to unbalanced computation and irregular memory accesses and thus hurts overall performance and energy efficiency. In contrast, we propose a structured compression technique which not only reduces the LSTM model size but also eliminates the irregularities in computation and memory accesses. This approach employs block-circulant matrices instead of sparse matrices to compress the weight matrices, reducing the storage requirement from O(k^2) to O(k). The Fast Fourier Transform algorithm is utilized to further accelerate inference by reducing the computational complexity from O(k^2) to O(k log k). The datapath and activation functions are quantized to 16 bits to improve resource utilization. More importantly, we propose a comprehensive framework called C-LSTM to automatically optimize and implement a wide range of LSTM variants on FPGAs. According to the experimental results, C-LSTM achieves up to 18.8X and 33.5X gains in performance and energy efficiency, respectively, compared with the state-of-the-art LSTM implementation under the same experimental setup, with very small accuracy degradation.
KW - Block-circulant matrix
KW - Compression
KW - FFT
KW - FPGA
KW - LSTM
KW - RNNs
UR - http://www.scopus.com/inward/record.url?scp=85052102073&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85052102073&partnerID=8YFLogxK
U2 - 10.1145/3174243.3174253
DO - 10.1145/3174243.3174253
M3 - Conference contribution
AN - SCOPUS:85052102073
T3 - FPGA 2018 - Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 11
EP - 20
BT - FPGA 2018 - Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery, Inc
Y2 - 25 February 2018 through 27 February 2018
ER -