TY - JOUR
T1 - Autonomous Human Activity Classification from Wearable Multi-Modal Sensors
AU - Lu, Yantao
AU - Velipasalar, Senem
N1 - Funding Information:
Manuscript received June 14, 2019; revised August 6, 2019; accepted August 7, 2019. Date of publication August 12, 2019; date of current version November 13, 2019. The information, data, or work presented herein was funded in part by the National Science Foundation (NSF) under Grant 1739748 and Grant 1816732, and by the Advanced Research Projects Agency-Energy (ARPA-E), U.S. Department of Energy, under Award Number DE-AR0000940. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. The associate editor coordinating the review of this article and approving it for publication was Prof. Chang-hee Won. (Corresponding author: Senem Velipasalar.) The authors are with the Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244 USA (e-mail: ylu25@syr.edu; svelipas@syr.edu). Digital Object Identifier 10.1109/JSEN.2019.2934678
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12/1
Y1 - 2019/12/1
AB - There has been a significant amount of research on human activity classification relying either on Inertial Measurement Unit (IMU) data or on data from static cameras providing a third-person view. There has been relatively less work using wearable cameras, which provide a first-person or egocentric view, and even fewer approaches combining egocentric video with IMU data. Using only IMU data limits the variety and complexity of the activities that can be detected. For instance, a sitting activity can be detected from IMU data, but it cannot be determined whether the subject has sat on a chair or a sofa, or where the subject is. To perform fine-grained activity classification, and to distinguish between activities that cannot be differentiated by IMU data alone, we present an autonomous and robust method using data from both wearable cameras and IMUs. In contrast to convolutional neural network-based approaches, we propose to employ capsule networks to obtain features from egocentric video data. Moreover, a Convolutional Long Short-Term Memory framework is employed on both the egocentric videos and the IMU data to capture the temporal aspect of actions. We also propose a genetic algorithm-based approach to autonomously and systematically set various network parameters, rather than relying on manual settings. Experiments have been conducted on 9- and 26-label activity classification, and the proposed method, using autonomously set network parameters, provides very promising results, achieving overall accuracies of 86.6% and 77.2%, respectively. The proposed approach, combining both modalities, also provides increased accuracy compared to using only egovision data or only IMU data.
KW - Activity classification
KW - IMU data
KW - capsule network
KW - egocentric
KW - egovision
KW - genetic algorithm
UR - http://www.scopus.com/inward/record.url?scp=85077499463&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077499463&partnerID=8YFLogxK
U2 - 10.1109/JSEN.2019.2934678
DO - 10.1109/JSEN.2019.2934678
M3 - Article
AN - SCOPUS:85077499463
SN - 1530-437X
VL - 19
SP - 11403
EP - 11412
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 23
M1 - 8794564
ER -