TY - GEN
T1 - E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
T2 - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
AU - Li, Zhe
AU - Ding, Caiwen
AU - Wang, Siyue
AU - Wen, Wujie
AU - Zhuo, Youwei
AU - Liu, Chang
AU - Qiu, Qinru
AU - Xu, Wenyao
AU - Lin, Xue
AU - Qian, Xuehai
AU - Wang, Yanzhi
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/3/26
Y1 - 2019/3/26
AB - Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications which require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. It is a challenging task to have real-time, efficient, and accurate hardware RNN implementations because of the high sensitivity to imprecision accumulation and the requirement of special activation function implementations. Recently, two works have focused on FPGA implementation of the inference phase of LSTM RNNs with model compression. First, ESE uses a weight-pruning-based compressed RNN model but suffers from irregular network structure after pruning. The second work, C-LSTM, mitigates the irregular network limitation by incorporating block-circulant matrices for weight matrix representation in RNNs, thereby achieving simultaneous model compression and acceleration. A key limitation of the prior works is the lack of a systematic design optimization framework of the RNN model and hardware implementations, especially when the block size (or compression ratio) should be jointly optimized with the RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance/energy efficiency under the accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations providing guidance on block size and reducing RNN training trials. Based on the two observations, we decompose E-RNN into two phases: Phase I on determining the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II on hardware implementations given the RNN model, including processing element design/optimization, quantization, activation implementation, etc. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy.
KW - Block-circulant matrix
KW - Design optimization
KW - FPGAs
KW - RNN
UR - http://www.scopus.com/inward/record.url?scp=85064200253&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064200253&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2019.00028
DO - 10.1109/HPCA.2019.00028
M3 - Conference contribution
AN - SCOPUS:85064200253
T3 - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
SP - 69
EP - 80
BT - Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 February 2019 through 20 February 2019
ER -