Audiovisual Integration in Speech Perception


Audiovisual Effects

The way we perceive speech can be dramatically affected by visual information from a speaker's face. For instance, if you hear the consonant /b/ while watching the face of a speaker saying /g/, you are likely to hear the sound as /d/, a phenomenon known as the McGurk effect (McGurk & MacDonald, 1976).

reference: McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.

Stephens & Holt (submitted): Behavioral Data

Try this example: watch these movies, then play them again with your eyes closed. What did you hear each time?

Movie 1.

Movie 2.

It is likely that you heard the first as /aba/ and the second as /ada/ (or possibly /atha/). However, if you listened with your eyes closed, you may have discovered that the sounds were actually identical! Information from the movements of the speaker's mouth changed your perception of the sounds.

Various theories have been used to explain how auditory and visual information are combined. However, little research has explored the influence of learning on audiovisual speech perception. We have investigated the effects of experience on audiovisual integration in speech perception by training individuals to use novel visual cues for speech. In our experiments, participants watched and listened to an animated cartoon robot that moved as it produced speech sounds. Over several training sessions, participants learned about a systematic relationship between the robot's movements and the consonants /b/, /d/, and /g/.
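One common way of formalizing this combination treats the auditory and visual signals as independent sources of support for each speech category and combines them multiplicatively, in the spirit of Massaro's Fuzzy Logical Model of Perception. The Python sketch below illustrates that idea only; the integrate() function and the support values are hypothetical examples, not data or analyses from the studies described here.

# Minimal sketch of multiplicative audiovisual cue combination
# (in the spirit of the Fuzzy Logical Model of Perception).
# All values are illustrative, not fitted to any experiment.

def integrate(auditory, visual):
    """Combine per-category auditory and visual support multiplicatively,
    then normalize so the combined values sum to 1."""
    combined = {c: auditory[c] * visual[c] for c in auditory}
    total = sum(combined.values())
    return {c: combined[c] / total for c in combined}

# Hypothetical support values: the audio is ambiguous between /b/ and /d/,
# while the visual signal (no lip closure) favors /d/ or /g/.
auditory = {"b": 0.45, "d": 0.45, "g": 0.10}
visual   = {"b": 0.05, "d": 0.55, "g": 0.40}

print(integrate(auditory, visual))
# The combined percept strongly favors /d/, echoing the fusion described above.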

Here are some examples of the animated robot:

/aba/
/ada/
/aga/


After training, most participants learned the relationship between the robot's movements and the speech sounds well enough that they could identify which consonant the robot produced just by watching it move, without sound.

Participants were able to use information from the robot to improve their accuracy in identifying consonants in noise (top two panels of the figure). Depending on how the robot videos were presented during training, listeners could use the robot to improve identification accuracy as much as they could by watching a real speaker's face (bottom panel of the figure). Further research in this area may have implications for improving speech perception in noise or for listeners with hearing impairments, and may help refine theories of information integration in speech perception.

reference: Stephens, J.D.W., & Holt, L.L. (submitted). Training of an artificial visual cue for use in speech identification.
















