A real-time prototype for small-vocabulary audio-visual ASR

J. H. Connell, N. Haas, E. Marcheret, C. Neti, G. Potamianos, S. Velipasalar

Research output: Chapter in Book/Entry/Poem › Conference contribution

15 Scopus citations

Abstract

We present a prototype for the automatic recognition of audio-visual speech, developed to augment the IBM ViaVoice™ speech recognition system. Frontal-face, full-frame video is captured through a USB 2.0 interface by means of an inexpensive PC camera, and processed to obtain appearance-based visual features. Subsequently, these are combined with audio features, synchronously extracted from the acoustic signal, using a simple discriminant feature fusion technique. On average, the required computations utilize approximately 67% of a Pentium™ 4, 1.8 GHz processor, leaving the remaining resources available to hidden Markov model based speech recognition. Real-time performance is therefore achieved for small-vocabulary tasks, such as connected-digit recognition. In the paper, we discuss the prototype architecture based on the ViaVoice engine, the basic algorithms employed, and the modifications needed to ensure real-time performance and causality of the visual front-end processing. We benchmark the resulting system performance on stored videos against prior research experiments, and we report a close match between the two.
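The abstract names a "simple discriminant feature fusion technique" without giving details. As a rough illustration only, one common reading is to concatenate time-aligned audio and visual feature vectors and project the joint vector with linear discriminant analysis (LDA). The sketch below assumes that reading; the function names `fit_lda` and `fuse`, the feature dimensions, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_lda(features, labels, out_dim):
    """Estimate an LDA projection (in_dim x out_dim) from labeled frames."""
    in_dim = features.shape[1]
    mean_all = features.mean(axis=0)
    s_within = np.zeros((in_dim, in_dim))
    s_between = np.zeros((in_dim, in_dim))
    for c in np.unique(labels):
        x = features[labels == c]
        mean_c = x.mean(axis=0)
        centered = x - mean_c
        s_within += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        s_between += len(x) * (diff @ diff.T)
    # Small ridge term (an assumption) keeps the within-class scatter invertible.
    s_within += 1e-6 * np.eye(in_dim)
    # Whiten with the Cholesky factor so a symmetric eigensolver can be used.
    inv_chol = np.linalg.inv(np.linalg.cholesky(s_within))
    eigvals, eigvecs = np.linalg.eigh(inv_chol @ s_between @ inv_chol.T)
    top = np.argsort(eigvals)[::-1][:out_dim]
    return inv_chol.T @ eigvecs[:, top]


def fuse(audio, visual, projection):
    """Concatenate synchronous audio and visual frames, then project."""
    joint = np.concatenate([audio, visual], axis=1)
    return joint @ projection


# Synthetic, time-aligned frames (hypothetical sizes: 6 audio dims, 4 visual dims).
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=300)
audio = rng.normal(size=(300, 6)) + labels[:, None]
visual = rng.normal(size=(300, 4)) + 0.5 * labels[:, None]

projection = fit_lda(np.concatenate([audio, visual], axis=1), labels, out_dim=2)
fused = fuse(audio, visual, projection)
print(fused.shape)
```

In this sketch the projection is estimated offline, so the per-frame cost at run time is a single matrix multiply, which is one plausible reason a fusion scheme of this kind would suit a real-time front end.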

Original language: English (US)
Title of host publication: Proceedings - 2003 International Conference on Multimedia and Expo, ICME
Publisher: IEEE Computer Society
Pages: II469-II472
ISBN (Electronic): 0780379659
State: Published - 2003
Externally published: Yes
Event: 2003 International Conference on Multimedia and Expo, ICME 2003 - Baltimore, United States
Duration: Jul 6 2003 to Jul 9 2003

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
Volume: 2
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Other

Other: 2003 International Conference on Multimedia and Expo, ICME 2003
Country/Territory: United States
City: Baltimore
Period: 7/6/03 to 7/9/03

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
