TY - GEN
T1 - DSTER
T2 - 18th IEEE International Joint Conference on Biometrics, IJCB 2024
AU - Chen, Frank
AU - Rao, Shruti
AU - Tiwari, Brijesh
AU - Phoha, Vir V.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Emotion recognition is a critical research area for enhancing human-computer interaction. Keystroke dynamics, a behavioral biometric capturing typing patterns, offers a non-intrusive, user-friendly method for recognizing emotions. We propose a Dual-Stream Transformer-based Emotion Recognition (DSTER) model, which leverages keystroke dynamics to determine emotional states. The DSTER model features a dual-stream architecture that separately extracts temporal-over-channel and channel-over-temporal information. Each stream employs multi-head self-attention mechanisms, Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) layers, along with dense vector embeddings of keycode data, to improve the extraction of temporal and contextual information from typing sequences. To the best of our knowledge, the DSTER model is the first to integrate a transformer architecture with keystroke dynamics for emotion recognition. Our experiments on a widely used fixed-text dataset demonstrate that the DSTER model significantly outperforms the three most recent baseline models, achieving average F1 scores of up to 0.989 and an average accuracy increase of up to 66.04%. Unlike the significant performance variations reported for the baseline models, the DSTER model maintains consistent and robust performance across all five tested emotional states. Further analysis shows that the model performs better with longer window lengths and greater overlaps.
UR - http://www.scopus.com/inward/record.url?scp=85211376137&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211376137&partnerID=8YFLogxK
DO - 10.1109/IJCB62174.2024.10744524
M3 - Conference contribution
AN - SCOPUS:85211376137
T3 - Proceedings - 2024 IEEE International Joint Conference on Biometrics, IJCB 2024
BT - Proceedings - 2024 IEEE International Joint Conference on Biometrics, IJCB 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 September 2024 through 18 September 2024
ER -