TY - GEN
T1 - End-to-end reinforcement learning for multi-agent continuous control
AU - Jiao, Zilong
AU - Oh, Jae
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - In end-to-end reinforcement learning, an agent captures the entire mapping from its raw sensor data to actuation commands with a single neural network. End-to-end reinforcement learning has mostly been studied in single-agent domains, and its scalability to multi-agent settings is under-explored. Without effective techniques, learning policies from the joint observations of agents can be intractable, particularly when the sensor data perceived by each agent are high-dimensional. Extending the multi-agent actor-critic method MADDPG, this paper presents Rec-MADDPG, an end-to-end reinforcement learning method for multi-agent continuous control in cooperative environments. To ease end-to-end learning in a multi-agent setting, we propose two embedding mechanisms, joint embedding and independent embedding, that project the agents' joint sensor observations onto low-dimensional features. For training efficiency, we apply parameter sharing and an A3C-based asynchronous training framework to Rec-MADDPG. Considering the challenges that arise in real-world multi-agent control, we evaluate Rec-MADDPG on robotic navigation tasks with realistic simulated robots in physics-enabled environments. Through extensive evaluation, we demonstrate that Rec-MADDPG significantly outperforms MADDPG and learns individual end-to-end policies for continuous control from raw sensor data. Moreover, independent embedding enables Rec-MADDPG to learn better policies than joint embedding.
KW - Continuous control
KW - End-to-end reinforcement learning
KW - Multi-agent learning
KW - State abstraction
UR - http://www.scopus.com/inward/record.url?scp=85080884201&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85080884201&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2019.00100
DO - 10.1109/ICMLA.2019.00100
M3 - Conference contribution
AN - SCOPUS:85080884201
T3 - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
SP - 535
EP - 540
BT - Proceedings - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
A2 - Wani, M. Arif
A2 - Khoshgoftaar, Taghi M.
A2 - Wang, Dingding
A2 - Wang, Huanjing
A2 - Seliya, Naeem
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE International Conference on Machine Learning and Applications, ICMLA 2019
Y2 - 16 December 2019 through 19 December 2019
ER -