CIRCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices

Caiwen Ding, Siyu Liao, Yanzhi Wang, Zhe Li, Ning Liu, Youwei Zhuo, Chao Wang, Xuehai Qian, Yu Bai, Geng Yuan, Xiaolong Ma, Yipeng Zhang, Jian Tang, Qinru Qiu, Xue Lin, Bo Yuan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

54 Citations (Scopus)

Abstract

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CIRCNN, a principled approach to representing weights and processing neural networks using block-circulant matrices. CIRCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CIRCNN is distinct due to its mathematical rigor: DNNs based on CIRCNN can converge to the same "effectiveness" as DNNs without compression. We propose the CIRCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales). In the CIRCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CIRCNN on FPGA, ASIC, and embedded processors. Our results show that the CIRCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CIRCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
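
To make the core idea concrete, below is a minimal NumPy sketch (illustrative only, not the paper's released implementation) of the FFT-based block-circulant matrix-vector product the abstract describes: each k x k circulant block is stored as a single length-k vector (O(n) storage instead of O(n²)), and each block product reduces to an element-wise multiply in the Fourier domain (O(k log k) instead of O(k²)). All function and variable names here are our own.

import numpy as np
from scipy.linalg import circulant  # used only to verify against a dense multiply

def circulant_matvec(c, x):
    # C x via the DFT diagonalization C = F^{-1} diag(F c) F,
    # where c is the first COLUMN of the circulant matrix C.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, k):
    # blocks[i][j] is the length-k vector defining the (i, j) circulant
    # block of a (p*k) x (q*k) weight matrix; x has length q*k.
    p, q = len(blocks), len(blocks[0])
    x_parts = x.reshape(q, k)
    y = np.empty(p * k)
    for i in range(p):
        acc = np.zeros(k)
        for j in range(q):
            acc += circulant_matvec(blocks[i][j], x_parts[j])
        y[i * k:(i + 1) * k] = acc
    return y

# Sanity check against an explicit dense block-circulant matrix.
rng = np.random.default_rng(0)
p, q, k = 2, 3, 4
blocks = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
x = rng.standard_normal(q * k)
dense = np.block([[circulant(blocks[i][j]) for j in range(q)] for i in range(p)])
assert np.allclose(dense @ x, block_circulant_matvec(blocks, x, k))

In a real accelerator, the per-block FFTs of the weights would be precomputed and the FFT of each input chunk reused across a block row; the sketch omits these optimizations for clarity.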

Original language: English (US)
Title of host publication: MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings
Publisher: IEEE Computer Society
Pages: 395-408
Number of pages: 14
Volume: Part F131207
ISBN (Electronic): 9781450349529
DOI: https://doi.org/10.1145/3123939.3124552
State: Published - Oct 14, 2017
Event: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017 - Cambridge, United States
Duration: Oct 14, 2017 - Oct 18, 2017

Other

Other: 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017
Country: United States
City: Cambridge
Period: 10/14/17 - 10/18/17

Fingerprint

  • Energy efficiency
  • Application specific integrated circuits
  • Fast Fourier transforms
  • Field programmable gate arrays (FPGA)
  • Throughput
  • Inference engines
  • Network architecture
  • Computer hardware
  • Deep neural networks
  • Scalability
  • Computational complexity
  • Neural networks
  • Hardware
  • Data storage equipment

Keywords

  • Acceleration
  • Block-circulant matrix
  • Compression
  • Deep learning
  • FPGA

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Ding, C., Liao, S., Wang, Y., Li, Z., Liu, N., Zhuo, Y., ... Yuan, B. (2017). CIRCNN: Accelerating and compressing deep neural networks using block-circulant weight matrices. In MICRO 2017 - 50th Annual IEEE/ACM International Symposium on Microarchitecture Proceedings (Vol. Part F131207, pp. 395-408). IEEE Computer Society. https://doi.org/10.1145/3123939.3124552
