TY - GEN
T1 - Optimizing data transfers for improved performance on shared GPUs using reinforcement learning
AU - Luley, Ryan S.
AU - Qiu, Qinru
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/13
Y1 - 2018/7/13
AB - Optimizing resource utilization is a critical issue in cloud and cluster-based computing systems. In such systems, computing resources often consist of one or more GPU devices, and much research has already been conducted on ways to maximize the use of compute resources through shared execution strategies. However, one of the most severe resource constraints in these scenarios is the data transfer channel between the host (i.e., CPU) and the device (i.e., GPU). Data transfer contention has been shown to have a significant impact on performance, yet methods for optimizing such contention have not been thoroughly studied. The techniques that have been examined make assumptions that limit their effectiveness in the general case. In this paper, we introduce a heuristic that selectively aggregates transfers in order to maximize system performance by optimizing use of the transfer channel bandwidth. We compare this heuristic to the traditional first-come-first-served (FCFS) approach, and apply Monte Carlo reinforcement learning to find an optimal policy for message aggregation. Finally, we evaluate the performance of Monte Carlo reinforcement learning with an arbitrarily initialized policy. We demonstrate its effectiveness in learning an optimal data transfer policy without detailed system characterization, which will enable a general, adaptable solution for resource management in future systems.
KW - Concurrent kernel execution
KW - Data transfer
KW - GPGPU
KW - Reinforcement learning
KW - Resource contention
UR - http://www.scopus.com/inward/record.url?scp=85050990019&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050990019&partnerID=8YFLogxK
U2 - 10.1109/CCGRID.2018.00061
DO - 10.1109/CCGRID.2018.00061
M3 - Conference contribution
AN - SCOPUS:85050990019
T3 - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
SP - 378
EP - 381
BT - Proceedings - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018
Y2 - 1 May 2018 through 4 May 2018
ER -