Optimizing data transfers for improved performance on shared GPUs using reinforcement learning

Ryan S. Luley, Qinru Qiu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Optimizing resource utilization is a critical issue in cloud and cluster-based computing systems. In such systems, computing resources often consist of one or more GPU devices, and much research has already been conducted on means for maximizing compute resources through shared execution strategies. However, one of the most severe resource constraints in these scenarios is the data transfer channel between the host (i.e., CPU) and the device (i.e., GPU). Data transfer contention has been shown to have a significant impact on performance, yet methods for optimizing such contention have not been thoroughly studied. Techniques that have been examined make certain assumptions which limit their effectiveness in the general case. In this paper, we introduce a heuristic which selectively aggregates transfers in order to maximize system performance by optimizing the transfer channel bandwidth. We compare this heuristic to the traditional first-come-first-served approach, and apply Monte Carlo reinforcement learning to find an optimal policy for message aggregation. Finally, we evaluate the performance of Monte Carlo reinforcement learning with an arbitrarily-initialized policy. We demonstrate its effectiveness in learning an optimal data transfer policy without detailed system characterization, which will enable a general, adaptable solution for resource management of future systems.
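The abstract does not specify the authors' formulation, but the core idea — using Monte Carlo reinforcement learning from an arbitrarily-initialized policy to decide when to aggregate pending host-to-device transfers — can be illustrated with a minimal first-visit Monte Carlo control sketch. Everything below (the cost model, the pending-count state space, the constants, the episode structure) is an assumption for illustration, not the paper's implementation.

```python
import random
from collections import defaultdict

# Assumed toy cost model: each host-to-device transfer pays a fixed
# launch overhead plus size / bandwidth, so aggregating n pending
# messages into one transfer pays the overhead only once.
OVERHEAD = 10.0      # fixed per-transfer cost (hypothetical units)
BANDWIDTH = 100.0    # bytes per cost unit (hypothetical)
ACTIONS = ("send", "wait")
MAX_PENDING = 4      # cap on how many messages may accumulate

def transfer_cost(total_bytes):
    return OVERHEAD + total_bytes / BANDWIDTH

def run_episode(policy, sizes, eps=0.1):
    """Process a queue of message sizes; return (state, action, reward) steps."""
    steps, pending = [], []
    for size in sizes:
        pending.append(size)
        state = min(len(pending), MAX_PENDING)
        # Epsilon-greedy exploration around the current deterministic policy.
        action = random.choice(ACTIONS) if random.random() < eps else policy[state]
        if action == "send" or state == MAX_PENDING:
            reward = -transfer_cost(sum(pending))  # negative cost as reward
            pending = []
        else:
            reward = 0.0
        steps.append((state, action, reward))
    if pending:  # flush whatever remains at the end of the episode
        steps.append((min(len(pending), MAX_PENDING), "send",
                      -transfer_cost(sum(pending))))
    return steps

def mc_control(episodes=5000, seed=0):
    """First-visit Monte Carlo control starting from an arbitrary policy."""
    random.seed(seed)
    q = defaultdict(float)
    counts = defaultdict(int)
    policy = {s: random.choice(ACTIONS) for s in range(1, MAX_PENDING + 1)}
    for _ in range(episodes):
        sizes = [random.randint(50, 500) for _ in range(8)]
        steps = run_episode(policy, sizes)
        # Compute undiscounted returns from each step onward.
        g, returns = 0.0, []
        for state, action, reward in reversed(steps):
            g += reward
            returns.append((state, action, g))
        seen = set()
        for state, action, g in reversed(returns):  # forward order: first-visit
            if (state, action) not in seen:
                seen.add((state, action))
                counts[(state, action)] += 1
                q[(state, action)] += (g - q[(state, action)]) / counts[(state, action)]
        # Greedy policy improvement per pending-count state.
        for s in range(1, MAX_PENDING + 1):
            policy[s] = max(ACTIONS, key=lambda a: q[(s, a)])
    return policy

policy = mc_control()
print(policy)  # learned greedy action per pending-count state
```

Under this cost model the fixed overhead dominates small transfers, so the learner tends toward "wait" (aggregate) in low-occupancy states — mirroring the paper's claim that a good aggregation policy can be learned without detailed system characterization.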

Original language: English (US)
Title of host publication: Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 378-381
Number of pages: 4
ISBN (Electronic): 9781538658154
DOI: 10.1109/CCGRID.2018.00061
State: Published - Jul 13, 2018
Event: 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018 - Washington, United States
Duration: May 1, 2018 - May 4, 2018



Keywords

  • Concurrent kernel execution
  • Data transfer
  • GPGPU
  • Reinforcement learning
  • Resource contention

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Luley, R. S., & Qiu, Q. (2018). Optimizing data transfers for improved performance on shared GPUs using reinforcement learning. In Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018 (pp. 378-381). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CCGRID.2018.00061

