7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture

Jinshan Yue, Ruoyang Liu, Wenyu Sun, Zhe Yuan, Zhibo Wang, Yung Ning Tu, Yi Ju Chen, Ao Ren, Yanzhi Wang, Meng Fan Chang, Xueqing Li, Huazhong Yang, Yongpan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures with separate CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n·log(n)) computation-complexity reduction.
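
To make the transpose-domain idea concrete, the sketch below is a minimal NumPy illustration of block-circulant matrix-vector multiplication, not the paper's hardware datapath or the exact training flow of [4]. It assumes each b×b weight block is circulant, so the block is stored as a single defining column and applied through FFTs; this is where the per-block n²-to-n storage saving and the O(n²)-to-O(n·log(n)) compute reduction come from. The function and variable names (block_circulant_mvm, expand_block, block_cols) are illustrative assumptions, not from the paper.

```python
import numpy as np

def block_circulant_mvm(block_cols, x, b):
    """block_cols: (p, q, b) array holding the defining (first) column of each
    circulant block; x: length q*b input vector; b: block size. Returns W @ x."""
    p, q, _ = block_cols.shape
    x_blocks = x.reshape(q, b)
    X = np.fft.fft(x_blocks, axis=-1)      # one length-b FFT per input sub-vector
    C = np.fft.fft(block_cols, axis=-1)    # FFT of each block's defining column (precomputable)
    # Circular-convolution theorem: each block's MVM becomes a point-wise product
    # in the frequency domain; summing over input blocks yields each output block.
    Y = np.fft.ifft((C * X[None, :, :]).sum(axis=1), axis=-1)
    return Y.real.reshape(p * b)

def expand_block(c):
    """Build the explicit b-by-b circulant matrix whose first column is c."""
    return np.stack([np.roll(c, j) for j in range(len(c))], axis=1)

# Usage: check the FFT-domain result against the dense block-circulant matrix.
rng = np.random.default_rng(0)
b, p, q = 4, 2, 3                              # block size, row blocks, column blocks
blocks = rng.standard_normal((p, q, b))        # stores p*q*b weights instead of (p*b)*(q*b)
W = np.block([[expand_block(blocks[i, j]) for j in range(q)] for i in range(p)])
x = rng.standard_normal(q * b)
assert np.allclose(W @ x, block_circulant_mvm(blocks, x, b))
```

Per block, the frequency-domain product costs b complex multiplies plus length-b FFTs (the weight FFTs can be precomputed and the input/output FFTs shared across a row or column of blocks), rather than b² multiply-accumulates, which is the algorithmic advantage the processor exploits in hardware.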

Original language: English (US)
Title of host publication: 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 138-140
Number of pages: 3
ISBN (Electronic): 9781538685310
DOI: 10.1109/ISSCC.2019.8662360
State: Published - Mar 6, 2019
Externally published: Yes
Event: 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019 - San Francisco, United States
Duration: Feb 17, 2019 - Feb 21, 2019

Publication series

Name: Digest of Technical Papers - IEEE International Solid-State Circuits Conference
Volume: 2019-February
ISSN (Print): 0193-6530

Conference

Conference: 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019
Country: United States
City: San Francisco
Period: 2/17/19 - 2/21/19

Fingerprint

Neural networks
Reconfigurable architectures
TOPS
Deep learning

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering

Cite this

Yue, J., Liu, R., Sun, W., Yuan, Z., Wang, Z., Tu, Y. N., ... Liu, Y. (2019). 7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture. In 2019 IEEE International Solid-State Circuits Conference, ISSCC 2019 (pp. 138-140). [8662360] (Digest of Technical Papers - IEEE International Solid-State Circuits Conference; Vol. 2019-February). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISSCC.2019.8662360

@inproceedings{dec552e7f1bf48cc96c2f9a9977abdde,
title = "7.5 A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm² and 6T HBST-TRAM-Based 2D Data-Reuse Architecture",
abstract = "Energy-efficient neural-network (NN) processors have been proposed for battery-powered deep-learning applications, where convolutional (CNN), fully-connected (FC) and recurrent NNs (RNN) are three major workloads. To support all of them, previous solutions [1-3] use either area-inefficient heterogeneous architectures with separate CNN and RNN cores, or an energy-inefficient reconfigurable architecture. A block-circulant algorithm [4] can unify CNN/FC/RNN workloads with transpose-domain acceleration, as shown in Fig. 7.5.1. Once NN weights are trained using the block-circulant pattern, all workloads are transformed into consistent matrix-vector multiplications (MVM), which can potentially achieve 8-to-128× storage savings and an O(n²)-to-O(n·log(n)) computation-complexity reduction.",
author = "Jinshan Yue and Ruoyang Liu and Wenyu Sun and Zhe Yuan and Zhibo Wang and Tu, {Yung Ning} and Chen, {Yi Ju} and Ao Ren and Yanzhi Wang and Chang, {Meng Fan} and Xueqing Li and Huazhong Yang and Yongpan Liu",
year = "2019",
month = "3",
day = "6",
doi = "10.1109/ISSCC.2019.8662360",
language = "English (US)",
series = "Digest of Technical Papers - IEEE International Solid-State Circuits Conference",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "138--140",
booktitle = "2019 IEEE International Solid-State Circuits Conference, ISSCC 2019",

}
