TY - GEN
T1 - Human activity classification incorporating egocentric video and inertial measurement unit data
AU - Lu, Yantao
AU - Velipasalar, Senem
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Many methods have been proposed for human activity classification, relying either on Inertial Measurement Unit (IMU) data or on data from static cameras watching subjects. There has been relatively little work using egocentric videos, and even fewer approaches combine egocentric video and IMU data. Systems relying only on IMU data are limited in the complexity of the activities they can detect. In this paper, we present a robust and autonomous method for fine-grained activity classification that leverages data from multiple wearable sensor modalities to differentiate between activities that are similar in nature, with a level of accuracy that would be unattainable with each sensor alone. We use both egocentric videos and IMU sensors on the body. We employ Capsule Networks together with Convolutional Long Short-Term Memory (LSTM) to analyze egocentric videos, and an LSTM framework to analyze IMU data and capture the temporal aspects of actions. We performed experiments on the CMU-MMAC dataset, achieving overall recall and precision rates of 85.8% and 86.2%, respectively. We also present results of using each sensor modality alone, which show that the proposed approach provides 19.47% and 39.34% increases in accuracy compared to using only ego-vision data and only IMU data, respectively.
AB - Many methods have been proposed for human activity classification, relying either on Inertial Measurement Unit (IMU) data or on data from static cameras watching subjects. There has been relatively little work using egocentric videos, and even fewer approaches combine egocentric video and IMU data. Systems relying only on IMU data are limited in the complexity of the activities they can detect. In this paper, we present a robust and autonomous method for fine-grained activity classification that leverages data from multiple wearable sensor modalities to differentiate between activities that are similar in nature, with a level of accuracy that would be unattainable with each sensor alone. We use both egocentric videos and IMU sensors on the body. We employ Capsule Networks together with Convolutional Long Short-Term Memory (LSTM) to analyze egocentric videos, and an LSTM framework to analyze IMU data and capture the temporal aspects of actions. We performed experiments on the CMU-MMAC dataset, achieving overall recall and precision rates of 85.8% and 86.2%, respectively. We also present results of using each sensor modality alone, which show that the proposed approach provides 19.47% and 39.34% increases in accuracy compared to using only ego-vision data and only IMU data, respectively.
KW - Activity classification
KW - Capsule networks
KW - Egocentric video
KW - IMU data
KW - Multi-modal sensors
UR - http://www.scopus.com/inward/record.url?scp=85063083739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063083739&partnerID=8YFLogxK
U2 - 10.1109/GlobalSIP.2018.8646367
DO - 10.1109/GlobalSIP.2018.8646367
M3 - Conference contribution
AN - SCOPUS:85063083739
T3 - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 - Proceedings
SP - 429
EP - 433
BT - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018
Y2 - 26 November 2018 through 29 November 2018
ER -