TY - JOUR
T1 - PnP-DRL: A Plug-and-Play Deep Reinforcement Learning Approach for Experience-Driven Networking
T2 - IEEE Journal on Selected Areas in Communications
AU - Xu, Zhiyuan
AU - Wu, Kun
AU - Zhang, Weiyi
AU - Tang, Jian
AU - Wang, Yanzhi
AU - Xue, Guoliang
N1 - Funding Information:
Manuscript received December 15, 2020; revised March 15, 2021; accepted April 18, 2021. Date of publication June 14, 2021; date of current version July 16, 2021. This work was supported in part by the NSF under Grant 1704662 and Grant 1704092. (Corresponding author: Jian Tang.) Zhiyuan Xu, Kun Wu, and Jian Tang are with the Department of Computer Science and Engineering, Syracuse University, Syracuse, NY 13244 USA (e-mail: zxu105@syr.edu; kwu102@syr.edu; jtang02@syr.edu).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/8
Y1 - 2021/8
N2 - While Deep Reinforcement Learning (DRL) has emerged as a de facto approach to many complex experience-driven networking problems, it remains challenging to deploy DRL in real systems. Due to random exploration and half-trained deep neural networks during online training, a DRL agent may make unexpected decisions, which can degrade system performance or even crash the system. In this paper, we propose PnP-DRL, an offline-trained, plug-and-play DRL solution that leverages batch reinforcement learning to learn the best control policy from pre-collected transition samples without interacting with the system. Once trained, our plug-and-play DRL agent starts working seamlessly, without additional exploration or possible disruption of the running systems. We implement and evaluate PnP-DRL on a prevalent experience-driven networking problem, Dynamic Adaptive Streaming over HTTP (DASH). Extensive experimental results demonstrate that 1) the existing batch reinforcement learning method has its limits; 2) PnP-DRL significantly outperforms classical adaptive bitrate algorithms in average user Quality of Experience (QoE); and 3) unlike state-of-the-art online DRL methods, PnP-DRL can be up and running without learning gaps while achieving comparable performance.
AB - While Deep Reinforcement Learning (DRL) has emerged as a de facto approach to many complex experience-driven networking problems, it remains challenging to deploy DRL in real systems. Due to random exploration and half-trained deep neural networks during online training, a DRL agent may make unexpected decisions, which can degrade system performance or even crash the system. In this paper, we propose PnP-DRL, an offline-trained, plug-and-play DRL solution that leverages batch reinforcement learning to learn the best control policy from pre-collected transition samples without interacting with the system. Once trained, our plug-and-play DRL agent starts working seamlessly, without additional exploration or possible disruption of the running systems. We implement and evaluate PnP-DRL on a prevalent experience-driven networking problem, Dynamic Adaptive Streaming over HTTP (DASH). Extensive experimental results demonstrate that 1) the existing batch reinforcement learning method has its limits; 2) PnP-DRL significantly outperforms classical adaptive bitrate algorithms in average user Quality of Experience (QoE); and 3) unlike state-of-the-art online DRL methods, PnP-DRL can be up and running without learning gaps while achieving comparable performance.
KW - Experience-driven networking
KW - batch reinforcement learning
KW - deep reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85110621965&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85110621965&partnerID=8YFLogxK
U2 - 10.1109/JSAC.2021.3087270
DO - 10.1109/JSAC.2021.3087270
M3 - Article
AN - SCOPUS:85110621965
SN - 0733-8716
VL - 39
SP - 2476
EP - 2486
JO - IEEE Journal on Selected Areas in Communications
JF - IEEE Journal on Selected Areas in Communications
IS - 8
M1 - 9454317
ER -