Towards ultra-high performance and energy efficiency of deep learning systems: An algorithm-hardware co-optimization framework

Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan, Siyu Liao, Xiaolong Ma, Bo Yuan, Xuehai Qian, Jian Tang, Qinru Qiu, Xue Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Hardware acceleration of deep learning systems has been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural networks (DNNs). An algorithm-hardware co-optimization framework is developed, which is applicable to different DNN types, sizes, and application scenarios. The algorithm part adopts general block-circulant matrices to achieve a fine-grained tradeoff between accuracy and compression ratio. The method applies to both fully-connected and convolutional layers, and its effectiveness is supported by a mathematically rigorous proof. The proposed algorithm reduces computational complexity per layer from O(n²) to O(n log n) and storage complexity from O(n²) to O(n), for both training and inference. The hardware part consists of highly efficient Field Programmable Gate Array (FPGA)-based implementations using effective reconfiguration, batch processing, deep pipelining, resource reuse, and hierarchical control. Experimental results demonstrate that the proposed framework achieves at least 152X speedup and 71X energy efficiency gain compared with the IBM TrueNorth processor under the same test accuracy. It achieves at least 31X energy efficiency gain compared with the reference FPGA-based work.
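The O(n²) → O(n log n) reduction comes from the structure of circulant blocks: a circulant matrix-vector product is a circular convolution, computable with FFTs. As a rough illustration only (a NumPy sketch, not the paper's FPGA implementation), the following assumes a weight matrix partitioned into b×b circulant blocks, each stored by its first column — so storage per block is O(b) rather than O(b²):

```python
import numpy as np

def circulant_matvec(c, x):
    """Product of the b x b circulant matrix defined by first column c
    with vector x, in O(b log b) via FFT instead of O(b^2)."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(C, x, b):
    """y = W @ x where W is a (p*b) x (q*b) block-circulant matrix.
    C[i][j] is the length-b first column defining circulant block (i, j),
    so total storage is O(p*q*b) = O(n) rather than O(n^2)."""
    p, q = len(C), len(C[0])
    y = np.zeros(p * b)
    for i in range(p):
        acc = np.zeros(b)
        for j in range(q):
            # Accumulate the contribution of block (i, j) applied to
            # the j-th length-b segment of the input vector.
            acc += circulant_matvec(C[i][j], x[j * b:(j + 1) * b])
        y[i * b:(i + 1) * b] = acc
    return y
```

In a real accelerator the per-row FFTs of the input segments would be shared across blocks, which is part of where the hardware-level savings come from; this sketch recomputes them for clarity.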

Original language: English (US)
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher: AAAI Press
Pages: 4235-4243
Number of pages: 9
ISBN (Electronic): 9781577358008
State: Published - Jan 1 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: Feb 2 2018 - Feb 7 2018

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country: United States
City: New Orleans
Period: 2/2/18 - 2/7/18


ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Wang, Y., Ding, C., Li, Z., Yuan, G., Liao, S., Ma, X., ... Lin, X. (2018). Towards ultra-high performance and energy efficiency of deep learning systems: An algorithm-hardware co-optimization framework. In 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 (pp. 4235-4243). (32nd AAAI Conference on Artificial Intelligence, AAAI 2018). AAAI Press.

