Abstract
Rapid progress in machine learning for acoustic signal processing has produced many solutions for detecting emotions from speech. Early work targeted clean, acted speech and a fixed set of emotions. Importantly, both the datasets and the solutions assumed that a person exhibits only one of these emotions. More recent work has steadily added realism to emotion detection by considering issues such as reverberation, de-amplification, and background noise, but it often considers one dataset at a time and assumes that all emotions are accounted for in the model. We significantly improve the realism of emotion detection by (i) assessing a broader range of situations by combining five common publicly available datasets into one and enhancing the new dataset with data augmentation that considers reverberation and de-amplification, (ii) incorporating 11 typical home noises into the acoustics, and (iii) recognizing that in real situations a person may exhibit many emotions that are not currently of interest, and that these should neither be forced into a predefined category nor be improperly labeled. Our novel solution combines a CNN with out-of-distribution detection. Our solution increases the range of situations in which emotions can be effectively detected and outperforms a state-of-the-art baseline.
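The three acoustic conditions named in the abstract (de-amplification, background noise, and reverberation) can be illustrated with a minimal NumPy sketch. This is not the authors' augmentation pipeline: the function names, the `rt60` parameter, and the synthetic exponentially decaying impulse response are assumptions made here for illustration only. A real setup would more likely use measured room impulse responses and recorded home noises.

```python
import numpy as np

def deamplify(signal, gain_db):
    """Scale the waveform by a gain in decibels (negative values attenuate)."""
    return signal * 10.0 ** (gain_db / 20.0)

def add_noise(signal, noise, snr_db):
    """Mix noise into the signal at the requested signal-to-noise ratio in dB."""
    noise = np.resize(noise, signal.shape)  # repeat or trim noise to match length
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

def reverberate(signal, sample_rate, rt60=0.4, seed=0):
    """Convolve with a synthetic impulse response (hypothetical stand-in for a room)."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude envelope falls by 60 dB (a factor of 1e-3) after rt60 seconds.
    impulse = rng.standard_normal(n) * 10.0 ** (-3.0 * t / rt60)
    impulse[0] = 1.0  # direct path
    return np.convolve(signal, impulse)[: len(signal)]

if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    speech = np.sin(2 * np.pi * 220 * t)  # placeholder tone standing in for speech
    home_noise = np.random.default_rng(1).standard_normal(sr // 2)
    augmented = add_noise(reverberate(deamplify(speech, -12.0), sr), home_noise, 10.0)
    print(augmented.shape)
```

Applying the steps in this order (attenuate, reverberate, then add noise) is one plausible choice that mimics a distant speaker in a noisy room; the ordering and parameter ranges used in the paper are not stated in the abstract.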
Original language | English (US) |
---|---|
Article number | 15 |
Journal | ACM Transactions on Computing for Healthcare |
Volume | 3 |
Issue number | 2 |
DOIs | |
State | Published - Apr 2022 |
Keywords
- Synthetic datasets
- convolutional neural networks
- distance
- emotion detection
- noise
- out-of-distribution detection
- reverberation
ASJC Scopus subject areas
- Software
- Medicine (miscellaneous)
- Information Systems
- Biomedical Engineering
- Computer Science Applications
- Health Informatics
- Health Information Management