TY - GEN
T1 - Efficient Human Activity Classification from Egocentric Videos Incorporating Actor-Critic Reinforcement Learning
AU - Lu, Yantao
AU - Li, Yilan
AU - Velipasalar, Senem
N1 - Funding Information:
The information, data, or work presented herein was funded in part by the National Science Foundation under Grants 1739748 and 1816732, and by the Advanced Research Projects Agency-Energy, U.S. Department of Energy, under Award Number DE-AR0000940. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - In this paper, we introduce a novel framework that significantly reduces the computational cost of human temporal activity recognition from egocentric videos while maintaining the same level of accuracy. We propose to apply the actor-critic model of reinforcement learning to optical flow data to locate a bounding box around the region of interest, which is then used to clip a sub-image from the video frame. We also propose to use one shallow and one deeper 3D convolutional neural network to process the original image and the clipped image region, respectively. We compare our proposed method with another approach that uses 3D convolutional networks on the recently released Dataset of Multimodal Semantic Egocentric Video. Experimental results show that the proposed method reduces the processing time by 36.4% while providing comparable accuracy.
AB - In this paper, we introduce a novel framework that significantly reduces the computational cost of human temporal activity recognition from egocentric videos while maintaining the same level of accuracy. We propose to apply the actor-critic model of reinforcement learning to optical flow data to locate a bounding box around the region of interest, which is then used to clip a sub-image from the video frame. We also propose to use one shallow and one deeper 3D convolutional neural network to process the original image and the clipped image region, respectively. We compare our proposed method with another approach that uses 3D convolutional networks on the recently released Dataset of Multimodal Semantic Egocentric Video. Experimental results show that the proposed method reduces the processing time by 36.4% while providing comparable accuracy.
KW - activity classification
KW - actor critic
KW - reinforcement learning
UR - http://www.scopus.com/inward/record.url?scp=85076814542&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076814542&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2019.8803823
DO - 10.1109/ICIP.2019.8803823
M3 - Conference contribution
AN - SCOPUS:85076814542
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 564
EP - 568
BT - 2019 IEEE International Conference on Image Processing, ICIP 2019 - Proceedings
PB - IEEE Computer Society
T2 - 26th IEEE International Conference on Image Processing, ICIP 2019
Y2 - 22 September 2019 through 25 September 2019
ER -