Humanity's desire to enable machines to "understand" us drives research that seeks to uncover the mysteries of human beings and their reactions, because a computer's ability to correctly classify our emotions would enhance the user experience. Using the computer's eye, a webcam, we can acquire human reaction data by capturing facial images in response to stimuli. The data of interest in this research are changes in pupil size and gaze patterns, in conjunction with classification of facial expression. Although fusion of these measurements has been considered in the past by Xiang and Kankanhalli as well as Valverde et al., their approaches were quite different from ours: both groups used a multimodal set-up (an eye tracker alongside a webcam), and their stimuli were visual. Our novel approach avoids costly eye trackers, relying on images acquired only from a standard webcam to measure changes in pupil size, gaze patterns, and facial expression in response to auditory stimuli. Auditory stimulation is often preferred because, unlike visual stimulation from a monitor, luminance does not need to be accounted for. The fusion of the information from these features is then used to distinguish between negative, neutral, and positive emotional states. In this paper we discuss an experiment (n = 15) in which stimuli from the auditory version of the international affective picture system (IAPS) are used to elicit these three main emotions in participants. Webcam data are recorded during the experiments, and advanced signal processing and feature extraction techniques are applied to the resulting image files to build a model capable of predicting neutral, positive, and negative emotional states.
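To make the fusion idea concrete, the sketch below is a purely illustrative toy, not the pipeline described in this paper: each trial is summarised by three webcam-derived features (a pupil-size change, a gaze-dispersion measure, and a facial-expression valence score, all hypothetical names and values), which are fused into one vector and classified into negative/neutral/positive with a simple nearest-centroid rule.

```python
# Illustrative sketch only (assumed feature names and toy data, not the
# authors' actual method): fuse per-trial pupil, gaze, and expression
# features and classify with a nearest-centroid rule.
import math

def fuse(pupil_change, gaze_dispersion, expression_valence):
    """Concatenate the three per-trial measurements into one feature vector."""
    return (pupil_change, gaze_dispersion, expression_valence)

def fit_centroids(trials):
    """trials: list of (feature_vector, label) pairs.
    Returns the per-label mean feature vector (class centroid)."""
    sums, counts = {}, {}
    for vec, label in trials:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}

def predict(centroids, vec):
    """Assign the label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda lab: math.dist(vec, centroids[lab]))

# Toy training data: (fused features, elicited emotional state)
train = [
    (fuse(0.12, 0.30, -0.8), "negative"),
    (fuse(0.02, 0.10,  0.0), "neutral"),
    (fuse(0.09, 0.25,  0.7), "positive"),
]
centroids = fit_centroids(train)
print(predict(centroids, fuse(0.10, 0.28, 0.6)))  # → positive
```

In practice any multiclass classifier could replace the nearest-centroid step; the point is only that the three measurement streams are combined into a single feature vector before classification.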