Multi-modal Fusion Using Spatio-temporal and Static Features for Group Emotion Recognition

Mo Sun, Jian Li, Hui Feng, Wei Gou, Haifeng Shen, Jian Tang, Yi Yang, Jieping Ye

Research output: Chapter in Book/Entry/Poem › Conference contribution

11 Scopus citations

Abstract

This paper presents our approach to the Audio-video Group Emotion Recognition sub-challenge of EmotiW 2020. The task is to classify a video into one of three group emotions: positive, neutral, or negative. Our approach exploits two feature levels for this task: the spatio-temporal level and the static level. At the spatio-temporal level, we feed multiple input modalities (RGB, RGB difference, optical flow, warped optical flow) into multiple video classification networks to train the spatio-temporal models. At the static level, we crop all faces and bodies in an image with a state-of-the-art human pose estimation method and train several kinds of CNNs with the image-level group emotion labels. Finally, we fuse the results of all 14 models and achieve third place in this sub-challenge, with classification accuracies of 71.93% and 70.77% on the validation and test sets, respectively.
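The abstract does not say how the 14 model results are combined. As a rough illustration only, the sketch below assumes score-level (late) fusion by weighted averaging of per-model class probabilities; this is a common choice, not necessarily the authors' method. The function name `fuse_scores`, the weights, and the example scores are all hypothetical.

```python
# Hypothetical late-fusion sketch: weighted average of per-model class
# probabilities. The fusion rule is an assumption, not taken from the paper.
LABELS = ("positive", "neutral", "negative")


def fuse_scores(model_probs, weights=None):
    """Return (predicted_label, fused_probabilities).

    model_probs: one probability list per model, ordered as LABELS.
    weights: optional non-negative weight per model (defaults to equal).
    """
    n_models = len(model_probs)
    if n_models == 0:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(LABELS)
    for probs, weight in zip(model_probs, weights):
        if len(probs) != len(LABELS):
            raise ValueError("each model must score every label")
        for i, p in enumerate(probs):
            fused[i] += (weight / total) * p  # normalised weighted sum

    best = max(range(len(LABELS)), key=fused.__getitem__)
    return LABELS[best], fused


# Made-up scores from three illustrative models (not results from the paper).
example_scores = [
    [0.6, 0.3, 0.1],  # e.g. an RGB spatio-temporal model
    [0.2, 0.5, 0.3],  # e.g. an optical-flow model
    [0.5, 0.4, 0.1],  # e.g. a face-crop CNN
]
label, fused = fuse_scores(example_scores)
```

In a setup like this, the per-model weights would typically be tuned on the validation set, which lets stronger models contribute more to the final decision.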

Original language: English (US)
Title of host publication: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction
Publisher: Association for Computing Machinery, Inc
Pages: 835-840
Number of pages: 6
ISBN (Electronic): 9781450375818
State: Published - Oct 21 2020
Event: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020 - Virtual, Online, Netherlands
Duration: Oct 25 2020 to Oct 29 2020

Publication series

Name: ICMI 2020 - Proceedings of the 2020 International Conference on Multimodal Interaction

Conference

Conference: 22nd ACM International Conference on Multimodal Interaction, ICMI 2020
Country/Territory: Netherlands
City: Virtual, Online
Period: 10/25/20 to 10/29/20

Keywords

  • audio-video based emotion recognition
  • group-level emotion recognition
  • multi-model

ASJC Scopus subject areas

  • Hardware and Architecture
  • Human-Computer Interaction
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
