TY - GEN
T1 - Asynchronous multitask reinforcement learning with dropout for continuous control
AU - Jiao, Zilong
AU - Oh, Jae
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - Deep reinforcement learning is sample-inefficient for solving complex tasks. Recently, multitask reinforcement learning has received increased attention because of its ability to learn general policies with improved sample efficiency. In multitask reinforcement learning, a single agent must learn multiple related tasks, either sequentially or simultaneously. Based on the DDPG algorithm, this paper presents Asyn-DDPG, which asynchronously learns a multitask policy for continuous control with simultaneous worker agents. We empirically found that sparse policy gradients can significantly reduce interference among conflicting tasks and make multitask learning more stable and sample-efficient. To ensure the sparsity of the gradients evaluated for each task, Asyn-DDPG represents both the actor and critic functions as deep neural networks and regularizes them using Dropout. During training, worker agents share the actor and critic functions and asynchronously optimize them using task-specific gradients. To evaluate Asyn-DDPG, we propose robotic navigation tasks based on realistically simulated robots and physics-enabled maze-like environments. Although the number of tasks in our experiments is small, each task reflects a real-world setting and poses a challenging environment. Through extensive evaluation, we demonstrate that Dropout regularization effectively stabilizes asynchronous learning and enables Asyn-DDPG to significantly outperform DDPG. Moreover, Asyn-DDPG learned a multitask policy that generalizes well to environments unseen during training.
AB - Deep reinforcement learning is sample-inefficient for solving complex tasks. Recently, multitask reinforcement learning has received increased attention because of its ability to learn general policies with improved sample efficiency. In multitask reinforcement learning, a single agent must learn multiple related tasks, either sequentially or simultaneously. Based on the DDPG algorithm, this paper presents Asyn-DDPG, which asynchronously learns a multitask policy for continuous control with simultaneous worker agents. We empirically found that sparse policy gradients can significantly reduce interference among conflicting tasks and make multitask learning more stable and sample-efficient. To ensure the sparsity of the gradients evaluated for each task, Asyn-DDPG represents both the actor and critic functions as deep neural networks and regularizes them using Dropout. During training, worker agents share the actor and critic functions and asynchronously optimize them using task-specific gradients. To evaluate Asyn-DDPG, we propose robotic navigation tasks based on realistically simulated robots and physics-enabled maze-like environments. Although the number of tasks in our experiments is small, each task reflects a real-world setting and poses a challenging environment. Through extensive evaluation, we demonstrate that Dropout regularization effectively stabilizes asynchronous learning and enables Asyn-DDPG to significantly outperform DDPG. Moreover, Asyn-DDPG learned a multitask policy that generalizes well to environments unseen during training.
KW - Asynchronous method
KW - Continuous control
KW - Deep reinforcement learning
KW - Multitask reinforcement learning
KW - Partial observability
UR - http://www.scopus.com/inward/record.url?scp=85080877032&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080877032&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2019.00099
DO - 10.1109/ICMLA.2019.00099
M3 - Conference contribution
AN - SCOPUS:85080877032
T3 - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
SP - 529
EP - 534
BT - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
A2 - Wani, M. Arif
A2 - Khoshgoftaar, Taghi M.
A2 - Wang, Dingding
A2 - Wang, Huanjing
A2 - Seliya, Naeem
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Y2 - 16 December 2019 through 19 December 2019
ER -