Effective utilization of CUDA hyper-Q for improved power and performance efficiency

Ryan S. Luley, Qinru Qiu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

High utilization of hardware resources is the key for designing performance and power optimized GPU applications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased GPU utilization. However, without careful design, false serialization may occur due to the contention for shared hardware resources such as direct memory access (DMA) engines. In this paper, we reveal the impact of such contention on performance and assess a method for overcoming the limitation with minimal algorithmic overhead. We demonstrate a method to achieve up to 31.8% improvement in performance and 10.4% reduction in energy on average for a finite set of application tasks when maximizing GPU execution concurrency.

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Publisher: IEEE Computer Society
Pages: 1160-1169
Number of pages: 10
Volume: 2016-August
ISBN (Electronic): 9781509021406
DOIs: https://doi.org/10.1109/IPDPSW.2016.154
State: Published - Aug 2 2016
Event: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: May 23 2016 to May 27 2016

Other

Other: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Country: United States
City: Chicago
Period: 5/23/16 to 5/27/16

Fingerprint

Computer hardware
Program processors
Engines
Hardware
Data storage equipment
Graphics processing unit

Keywords

  • Concurrency
  • GPU performance
  • GPU utilization
  • Hyper-Q
  • Power efficiency
  • Resource sharing

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Hardware and Architecture
  • Software

Cite this

Luley, R. S., & Qiu, Q. (2016). Effective utilization of CUDA hyper-Q for improved power and performance efficiency. In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 (Vol. 2016-August, pp. 1160-1169). [7529999] IEEE Computer Society. https://doi.org/10.1109/IPDPSW.2016.154

Effective utilization of CUDA hyper-Q for improved power and performance efficiency. / Luley, Ryan S.; Qiu, Qinru.

Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August IEEE Computer Society, 2016. p. 1160-1169 7529999.

Luley, RS & Qiu, Q 2016, Effective utilization of CUDA hyper-Q for improved power and performance efficiency. in Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. vol. 2016-August, 7529999, IEEE Computer Society, pp. 1160-1169, 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016, Chicago, United States, 5/23/16. https://doi.org/10.1109/IPDPSW.2016.154
Luley RS, Qiu Q. Effective utilization of CUDA hyper-Q for improved power and performance efficiency. In Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August. IEEE Computer Society. 2016. p. 1160-1169. 7529999 https://doi.org/10.1109/IPDPSW.2016.154
Luley, Ryan S. ; Qiu, Qinru. / Effective utilization of CUDA hyper-Q for improved power and performance efficiency. Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016. Vol. 2016-August IEEE Computer Society, 2016. pp. 1160-1169
@inproceedings{3c2f084424f64160a0c41aeabddc997a,
title = "Effective utilization of CUDA hyper-Q for improved power and performance efficiency",
abstract = "High utilization of hardware resources is the key for designing performance and power optimized GPU applications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased GPU utilization. However, without careful design, false serialization may occur due to the contention for shared hardware resources such as direct memory access (DMA) engines. In this paper, we reveal the impact of such contention on performance and assess a method for overcoming the limitation with minimal algorithmic overhead. We demonstrate a method to achieve up to 31.8{\%} improvement in performance and 10.4{\%} reduction in energy on average for a finite set of application tasks when maximizing GPU execution concurrency.",
keywords = "Concurrency, GPU performance, GPU utilization, Hyper-Q, Power efficiency, Resource sharing",
author = "Luley, {Ryan S.} and Qiu, Qinru",
year = "2016",
month = "8",
day = "2",
doi = "10.1109/IPDPSW.2016.154",
language = "English (US)",
volume = "2016-August",
pages = "1160--1169",
booktitle = "Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016",
publisher = "IEEE Computer Society",
address = "United States",

}

TY - GEN

T1 - Effective utilization of CUDA hyper-Q for improved power and performance efficiency

AU - Luley, Ryan S.

AU - Qiu, Qinru

PY - 2016/8/2

Y1 - 2016/8/2

N2 - High utilization of hardware resources is the key for designing performance and power optimized GPU applications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased GPU utilization. However, without careful design, false serialization may occur due to the contention for shared hardware resources such as direct memory access (DMA) engines. In this paper, we reveal the impact of such contention on performance and assess a method for overcoming the limitation with minimal algorithmic overhead. We demonstrate a method to achieve up to 31.8% improvement in performance and 10.4% reduction in energy on average for a finite set of application tasks when maximizing GPU execution concurrency.

AB - High utilization of hardware resources is the key for designing performance and power optimized GPU applications. The efficiency of applications and kernels, which do not fully utilize the GPU resources, can be improved through concurrent execution with independent kernels and/or applications. Hyper-Q enables multiple CPU threads or processes to launch work on a single GPU simultaneously for increased GPU utilization. However, without careful design, false serialization may occur due to the contention for shared hardware resources such as direct memory access (DMA) engines. In this paper, we reveal the impact of such contention on performance and assess a method for overcoming the limitation with minimal algorithmic overhead. We demonstrate a method to achieve up to 31.8% improvement in performance and 10.4% reduction in energy on average for a finite set of application tasks when maximizing GPU execution concurrency.

KW - Concurrency

KW - GPU performance

KW - GPU utilization

KW - Hyper-Q

KW - Power efficiency

KW - Resource sharing

UR - http://www.scopus.com/inward/record.url?scp=84991584944&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84991584944&partnerID=8YFLogxK

U2 - 10.1109/IPDPSW.2016.154

DO - 10.1109/IPDPSW.2016.154

M3 - Conference contribution

AN - SCOPUS:84991584944

VL - 2016-August

SP - 1160

EP - 1169

BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016

PB - IEEE Computer Society

ER -