TY - GEN
T1 - A real-time actor-critic architecture for continuous control
AU - Jiao, Zilong
AU - Oh, Jae
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - Reinforcement learning has achieved impressive results in various challenging artificial environments and demonstrated its practical potential. In a real-world environment, an agent operates in continuous time, and control delay is unavoidable. In the context of reinforcement learning, we define control delay as the time delay before an agent actuates an action in a particular state. The high variance of control delay can destabilize an agent’s learning performance and make an environment non-Markovian, creating a challenging situation for reinforcement learning algorithms. To address this issue, we present RTAC, a scalable real-time architecture for reinforcement learning in continuous control. A reinforcement learning application usually consists of a policy training phase in simulation and a deployment phase of the learned policy in the real-world environment. We evaluated RTAC in a simulated environment close to its real-world setting, where agents operate in real time and learn to map high-dimensional sensor data to continuous actions. In extensive experiments, RTAC stabilized control delay and consistently learned optimal policies. Additionally, we demonstrated that RTAC is suitable for distributed learning even in the presence of control delay.
AB - Reinforcement learning has achieved impressive results in various challenging artificial environments and demonstrated its practical potential. In a real-world environment, an agent operates in continuous time, and control delay is unavoidable. In the context of reinforcement learning, we define control delay as the time delay before an agent actuates an action in a particular state. The high variance of control delay can destabilize an agent’s learning performance and make an environment non-Markovian, creating a challenging situation for reinforcement learning algorithms. To address this issue, we present RTAC, a scalable real-time architecture for reinforcement learning in continuous control. A reinforcement learning application usually consists of a policy training phase in simulation and a deployment phase of the learned policy in the real-world environment. We evaluated RTAC in a simulated environment close to its real-world setting, where agents operate in real time and learn to map high-dimensional sensor data to continuous actions. In extensive experiments, RTAC stabilized control delay and consistently learned optimal policies. Additionally, we demonstrated that RTAC is suitable for distributed learning even in the presence of control delay.
KW - Actor-critic methods
KW - Continuous control
KW - Control delay
KW - Distributed learning
KW - Real-time architecture
UR - http://www.scopus.com/inward/record.url?scp=85091312033&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091312033&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-55789-8_47
DO - 10.1007/978-3-030-55789-8_47
M3 - Conference contribution
AN - SCOPUS:85091312033
SN - 9783030557881
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 545
EP - 556
BT - Trends in Artificial Intelligence Theory and Applications. Artificial Intelligence Practices - 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020, Proceedings
A2 - Fujita, Hamido
A2 - Sasaki, Jun
A2 - Fournier-Viger, Philippe
A2 - Ali, Moonis
PB - Springer Science and Business Media Deutschland GmbH
T2 - 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2020
Y2 - 22 September 2020 through 25 September 2020
ER -