Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices

Siyu Liao, Zhe Li, Xue Lin, Qinru Qiu, Yanzhi Wang, Bo Yuan

Research output: Chapter in Book/Report/Conference proceedingConference contribution

6 Citations (Scopus)

Abstract

Deep neural networks (DNNs) have emerged as one of the most powerful machine learning techniques in numerous artificial intelligence applications. However, the large size of DNNs makes them both computation- and memory-intensive, thereby limiting the performance of dedicated DNN hardware accelerators. In this paper, we propose a holistic framework for energy-efficient, high-performance, highly-compressed DNN hardware design. First, we propose block-circulant matrix-based DNN training and inference schemes, which theoretically guarantee Big-O complexity reductions in both the computational cost (from O(n²) to O(n log n)) and the storage requirement (from O(n²) to O(n)) of DNNs. Second, we optimize the hardware architecture, especially the key fast Fourier transform (FFT) module, to improve overall energy efficiency, computation performance, and resource cost. Third, we propose a design flow for hardware-software co-optimization that achieves a good balance between the test accuracy and the hardware performance of DNNs. Based on the proposed design flow, two block-circulant matrix-based DNNs on two different datasets are implemented and evaluated on FPGA. Fixed-point quantization and the proposed block-circulant matrix-based inference scheme enable the networks to achieve up to 3.5 TOPS computation performance and 3.69 TOPS/W energy efficiency, while memory is reduced by 108× to 116× with negligible accuracy degradation.
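The complexity reduction claimed in the abstract comes from the structure of circulant blocks: each k×k block is fully determined by a single length-k vector, and its matrix-vector product reduces to element-wise multiplication in the frequency domain via the FFT. The sketch below is illustrative only (it is not code from the paper, and it assumes each block is defined by its first column); the helper names `circulant` and `blockcirc_matvec` are hypothetical:

```python
import numpy as np

def circulant(c):
    # Dense k x k circulant matrix from its first column c (reference only;
    # the compressed representation stores just c, i.e. O(k) instead of O(k^2)).
    k = len(c)
    return np.array([np.roll(c, i) for i in range(k)]).T

def blockcirc_matvec(W_cols, x, k):
    """Multiply a block-circulant matrix by vector x using FFTs.

    W_cols[i][j] is the length-k first column of circulant block (i, j).
    Each block product becomes IFFT(FFT(w) * FFT(x_j)): O(k log k) work
    per block instead of O(k^2), matching the abstract's Big-O claims.
    """
    p, q = len(W_cols), len(W_cols[0])
    x_blocks = x.reshape(q, k)
    Fx = np.fft.fft(x_blocks, axis=1)          # FFT of each input sub-vector
    y = np.zeros((p, k), dtype=complex)
    for i in range(p):
        for j in range(q):
            # Element-wise product in the frequency domain; sums accumulate
            # across the q blocks in row i before a single inverse FFT.
            y[i] += np.fft.fft(W_cols[i][j]) * Fx[j]
    return np.fft.ifft(y, axis=1).real.reshape(p * k)

# Check against a dense reference multiply.
rng = np.random.default_rng(0)
k, p, q = 4, 2, 3
W_cols = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * k)
dense = np.block([[circulant(W_cols[i][j]) for j in range(q)] for i in range(p)])
assert np.allclose(blockcirc_matvec(W_cols, x, k), dense @ x)
```

In a hardware design like the one described, the per-block FFTs of the weights can be precomputed and stored, so inference needs only the input FFTs, element-wise multiplies, and inverse FFTs, which is why the FFT module is the key optimization target.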

Original language: English (US)
Title of host publication: 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 458-465
Number of pages: 8
Volume: 2017-November
ISBN (Electronic): 9781538630938
DOI: 10.1109/ICCAD.2017.8203813
State: Published - Dec 13 2017
Event: 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017 - Irvine, United States
Duration: Nov 13 2017 - Nov 16 2017



ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design

Cite this

Liao, S., Li, Z., Lin, X., Qiu, Q., Wang, Y., & Yuan, B. (2017). Energy-efficient, high-performance, highly-compressed deep neural network design using block-circulant matrices. In 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017 (Vol. 2017-November, pp. 458-465). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCAD.2017.8203813
