Arasid: Artificial reverberation-adjusted indoor speaker identification dealing with variable distances

Zeya Chen, Mohsin Y. Ahmed, Asif Salekin, John A. Stankovic

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citations

Abstract

Indoor speaker identification systems have been researched for a long time and are widely used in many human interaction acoustic monitoring systems. Many works have focused on improving accuracy under real-world conditions, including noise and varying distances from the microphone. However, these works require either significant extra effort, such as measuring room types and dimensions or obtaining many speakers’ samples, or expensive hardware such as microphone arrays and complex deployment settings. In this paper, we introduce a complete speaker identification solution that uses an artificial reverberation generator with different parameters to adjust the original close-distance speech samples, so that each speaker has several different artificial voice samples. Samples recorded in different environments are not required because these artificial samples closely approximate different environments. Two kinds of models, GMM-UBM and the i-vector, are evaluated. The models are trained on all samples separately, and testing is done against all of them in parallel. A score fusing approach with two thresholds, a minimum value and a minimum difference, is applied to the scores to produce the final result. In addition, several standard acoustic pre-processing routines, including a voice activity detection algorithm and an overlapped speech remover, are included to make the system fully deployable. Finally, to assess the improvement from applying a reverberation adjustment, we evaluate our system on two speech databases from the literature: one has 251 people and the other has four kinds of emotions. Further, we perform an in-lab speaking experiment. The evaluation results show that our system has more than 90% accuracy in identifying speakers within 6 meters if the emotion is neutral, and a 10% improvement over no reverberation adjustment when speakers have non-neutral emotions.
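
The score fusing step with its two thresholds can be illustrated with a minimal Python sketch. The abstract does not say how scores from the separately trained models are combined, so everything here is an assumption made for illustration: the function name `fuse_scores`, fusing by taking each speaker's best score across models, the convention that higher scores mean a better match, and returning `None` when a decision is rejected. It should not be read as the authors' implementation.

```python
from typing import Dict, Optional


def fuse_scores(
    scores_per_model: Dict[str, Dict[str, float]],
    min_value: float,
    min_difference: float,
) -> Optional[str]:
    """Return the identified speaker, or None if the decision is rejected.

    scores_per_model maps a model name (for example, one reverberation
    setting) to a {speaker: score} dictionary. Higher scores are assumed
    to mean a better match.
    """
    # Assumption: fuse by keeping each speaker's best score across models.
    fused: Dict[str, float] = {}
    for speaker_scores in scores_per_model.values():
        for speaker, score in speaker_scores.items():
            fused[speaker] = max(score, fused.get(speaker, float("-inf")))

    if not fused:
        return None

    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    best_speaker, best_score = ranked[0]

    # Threshold 1: the winning score must reach a minimum value.
    if best_score < min_value:
        return None

    # Threshold 2: the winner must lead the runner-up by a minimum difference.
    if len(ranked) > 1 and best_score - ranked[1][1] < min_difference:
        return None

    return best_speaker


# Hypothetical scores from three models trained on differently reverberated samples.
scores = {
    "dry": {"alice": 0.2, "bob": 0.1},
    "reverb_small": {"alice": 0.9, "bob": 0.3},
    "reverb_large": {"alice": 0.5, "bob": 0.6},
}
print(fuse_scores(scores, min_value=0.5, min_difference=0.2))
```

In this sketch, raising either threshold trades missed identifications for fewer false ones: a test utterance whose best score is low, or whose top two candidates are close, is rejected rather than assigned to a speaker.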

Original language: English (US)
Title of host publication: International Conference on Embedded Wireless Systems and Networks, EWSN 2019
Editors: Yunhao Liu, Guoliang Xing
Publisher: Junction Publishing
Pages: 154-165
Number of pages: 12
ISBN (Print): 9780994988638
State: Published - 2019
Externally published: Yes
Event: International Conference on Embedded Wireless Systems and Networks, EWSN 2019 - Beijing, China
Duration: Feb 25 2019 to Feb 27 2019

Publication series

Name: International Conference on Embedded Wireless Systems and Networks
ISSN (Electronic): 2562-2331

Conference

Conference: International Conference on Embedded Wireless Systems and Networks, EWSN 2019
Country/Territory: China
City: Beijing
Period: 2/25/19 to 2/27/19

Keywords

  • Distance
  • Reverberation
  • Speaker identification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Electrical and Electronic Engineering
