E-RNN: Design optimization for efficient recurrent neural networks in FPGAs

Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Recurrent Neural Networks (RNNs) are becoming increasingly important for time series-related applications that require efficient and real-time implementations. The two major types are Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks. Achieving real-time, efficient, and accurate hardware RNN implementations is challenging because of the high sensitivity to imprecision accumulation and the need for special activation function implementations. Recently, two works have focused on FPGA implementation of the inference phase of LSTM RNNs with model compression. The first, ESE, uses a weight-pruning-based compressed RNN model but suffers from an irregular network structure after pruning. The second, C-LSTM, mitigates the irregularity by representing weight matrices with block-circulant matrices, thereby achieving simultaneous model compression and acceleration. A key limitation of these prior works is the lack of a systematic design optimization framework spanning the RNN model and the hardware implementation, especially when the block size (or compression ratio) should be jointly optimized with the RNN type, layer size, etc. In this paper, we adopt the block-circulant matrix-based framework and present the Efficient RNN (E-RNN) framework for FPGA implementations of the Automatic Speech Recognition (ASR) application. The overall goal is to improve performance/energy efficiency under an accuracy requirement. We use the alternating direction method of multipliers (ADMM) technique for more accurate block-circulant training, and present two design explorations providing guidance on block size and reducing RNN training trials. Based on these two observations, we decompose E-RNN into two phases: Phase I determines the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II covers the hardware implementation given the RNN model, including processing element design/optimization, quantization, and activation implementation. Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy.
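For context, the block-circulant representation at the heart of C-LSTM and E-RNN stores each b×b weight block as a single length-b vector, and computes each block's matrix-vector product in the FFT domain, cutting per-block cost from O(b²) to O(b log b). The sketch below is a minimal NumPy illustration of that computation, not the authors' FPGA design; all names, and the convention that each block is defined by its first column, are assumptions for illustration.

```python
import numpy as np

def block_circulant_matvec(w, x):
    """Multiply a block-circulant matrix by a vector via FFTs.

    w: (p, q, b) array; w[i, j] is the first column defining the
       b-by-b circulant block in block-row i, block-column j.
    x: vector of length q * b.
    Returns y of length p * b, where y_i = sum_j circ(w[i, j]) @ x_j.
    """
    p, q, b = w.shape
    xf = np.fft.fft(x.reshape(q, b), axis=1)   # FFT of each input block
    wf = np.fft.fft(w, axis=2)                  # FFT of each defining vector
    # circ(w) @ v == IFFT(FFT(w) * FFT(v)); accumulate over block-columns
    yf = (wf * xf[np.newaxis, :, :]).sum(axis=1)
    return np.fft.ifft(yf, axis=1).real.reshape(p * b)

def circ(w):
    """Dense b-by-b circulant matrix whose first column is w (for checking)."""
    return np.stack([np.roll(w, s) for s in range(w.size)], axis=1)

# Quick self-check against the dense equivalent on tiny sizes
rng = np.random.default_rng(0)
p, q, b = 2, 3, 4
w = rng.standard_normal((p, q, b))
x = rng.standard_normal(q * b)
dense = np.block([[circ(w[i, j]) for j in range(q)] for i in range(p)])
assert np.allclose(block_circulant_matvec(w, x), dense @ x)
```

With block size b, storage also drops by a factor of b (one length-b vector per block instead of b² entries), which is why the block size doubles as the compression-ratio knob that the paper jointly optimizes with RNN type and layer size.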

Original language: English (US)
Title of host publication: Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 69-80
Number of pages: 12
ISBN (Electronic): 9781728114446
DOIs: https://doi.org/10.1109/HPCA.2019.00028
State: Published - Mar 26, 2019
Event: 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 - Washington, United States
Duration: Feb 16, 2019 - Feb 20, 2019

Publication series

Name: Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019

Conference

Conference: 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
Country: United States
City: Washington
Period: 2/16/19 - 2/20/19

Keywords

  • Block-circulant matrix
  • Design optimization
  • FPGAs
  • RNN

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Li, Z., Ding, C., Wang, S., Wen, W., Zhuo, Y., Liu, C., ... Wang, Y. (2019). E-RNN: Design optimization for efficient recurrent neural networks in FPGAs. In Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 (pp. 69-80). [8675229] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HPCA.2019.00028
