Conference Presentations
2008
Workshop on Linguistic Variation Across the Lifespan
Columbus, OH, USA
May 2-3, 2008
6-Month Olds are More Sensitive to Variations in F1 than F2 in Vowels
Hufnagle, D., Curtin, S., Holt, L. L., & Kiesling, S. F.
Infants at six months have already begun to organize vowel categories according to their native language, as shown by the fact that they are less able to perceive non-native vowel contrasts (Kuhl et al., 1992). However, the details of infants' native vowel organization are not well characterized. We test infants' sensitivity to two prominent cues to vowel identity, the first and second formants (F1 and F2). Ultimately we are pursuing an understanding of whether and how infants perceive variation within vowel categories, such as the variation that occurs across dialects.
Adults native to an English-speaking area in North America were recorded to determine the parameters of the local vowel space. Based on these measurements, we determined the center of the vowel space for each category in terms of F1 and F2. We then synthesized nine /u/ and nine /I/ vowels that varied along F1 and F2: the vowel from the center of that space (centroid), four vowels that varied along F1 (with centroid F2), and four that varied along F2 (with centroid F1). Ten 6-month-old infants from the same area heard trials that contained four tokens of the /u/ or /I/ centroid vowel, followed by four instances of a variant, followed by four more centroid tokens (task adapted from Best & Jones, 1998). All infants heard two pretest trials (a non-varying trial for each phoneme) followed by all 16 variable trial types and the two non-varying trials for comparison, in random order. Looking time after the first switch was recorded using the Sequential Looking Preference procedure (Cooper & Aslin, 1990). The assumption is that infants who notice the vowel change in a given trial (and thus discriminate among the tokens) will look longer. Other vowels were not tested because of constraints imposed by infants' attention span.
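The stimulus set and trial structure described in this paragraph can be sketched in a few lines of Python. This is an illustration only: the centroid formant values, the step size, the symmetric placement of variants around the centroid, and the names `build_variants` and `trial_sequence` are all hypothetical and are not reported in the abstract.

```python
# Hypothetical centroid (F1, F2) values in Hz; the abstract does not report them.
CENTROIDS = {"u": (350.0, 1100.0), "I": (450.0, 1900.0)}
STEP_HZ = 50.0  # assumed spacing between neighboring variants


def build_variants(f1, f2, step=STEP_HZ):
    """Return 8 variants for one vowel: 4 along F1 and 4 along F2."""
    # Assumption: two variants below and two above the centroid on each dimension.
    offsets = (-2 * step, -step, step, 2 * step)
    f1_variants = [("F1", f1 + d, f2) for d in offsets]  # centroid F2 held fixed
    f2_variants = [("F2", f1, f2 + d) for d in offsets]  # centroid F1 held fixed
    return f1_variants + f2_variants


def trial_sequence(centroid, variant=None):
    """Four centroid tokens, four variant tokens, four more centroid tokens."""
    middle = variant if variant is not None else centroid
    return [centroid] * 4 + [middle] * 4 + [centroid] * 4


variable_trials = []
for vowel, (f1, f2) in CENTROIDS.items():
    for dim, v1, v2 in build_variants(f1, f2):
        variable_trials.append((vowel, dim, trial_sequence((f1, f2), (v1, v2))))

# One non-varying comparison trial per vowel.
non_varying_trials = [(v, trial_sequence(c)) for v, c in CENTROIDS.items()]

print(len(variable_trials), len(non_varying_trials))
```

Under these assumptions, each vowel contributes one centroid plus eight variants (nine synthesized vowels), which yields the 16 variable trial types and two non-varying trials mentioned in the text.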
A repeated-measures ANOVA revealed a main effect of formant, with infants listening significantly longer to F1 variations than to F2 variations. The formant effect was significant for both vowels, and the identity of the vowel did not interact with the formant effect in other ways. Other effects and their interactions were not significant. These results suggest that infants are more sensitive to differences in F1 than to differences in F2. One possible explanation is that no vowels closely border /I/ or /u/ along F2 in this English variety, so infants should be more tolerant of variation along that dimension. By contrast, both /I/ and /u/ have neighboring vowels along F1, making it more likely that infants would be sensitive to this dimension. Our results show that infants are quite sensitive to variation within vowel categories, even at six months of age.
These results also suggest that community variation in the F2 dimension will likely be greater than in the F1 dimension for both peripheral and non-peripheral vowels (in Labov's 1994 terminology). However, they also support a view of sound change in which closely realized vowels produce misunderstandings or mischaracterizations that lead to linguistic change (see Labov 1994:586-588).
15th Annual Cognitive Neuroscience Meeting
San Francisco, CA, USA
April 12-15, 2008
Changes in functional organization following naturalistic training on complex auditory categories
R. Leech, L. Holt, J. Devlin, & F. Dick
Poster available in PDF format
International Society on Infant Studies
Vancouver, BC, Canada
March 27-29, 2008
Six-month olds’ acquisition of variable and stable vowel categories
Hufnagle, D., Curtin, S., Holt, L. L., & Kiesling, S. F.
Poster available in PDF format
2nd Annual Auditory Cognitive Science Society
Tucson, AZ, USA
January 11, 2008
Cue weighting in speech perception
Lori L. Holt
On Motor Theory and Mirror Neurons
Lori L. Holt
2007
Auditory Cognitive Science Society -- Inaugural Meeting
Tucson, AZ, USA
January 8, 2007
Speech Perception Cognition
Lori L. Holt
Talk slides available in PDF format
2006
151st Meeting of the Acoustical Society of America
Providence, Rhode Island, USA
June 5-9, 2006
A theoretical model of cochlear processing improves spectrally degraded speech perception
Evan C. Smith & Lori L. Holt
Poster available in PDF format
Smith and Lewicki (Neural Comp. 17, 19–45, 2005a; Adv. Neural. Inf. Process. Syst. 17, 1289–1296, 2005b; Nature, 439, 7079, 2006) demonstrated that mammalian hearing follows an efficient coding principle (Barlow, Sensory Communications, 217–234, 1961; Atick, Network, 3(2), 213–251, 1992; Simoncelli and Olshausen, Ann. Rev. Neurosci., 24, 1193–1216, 2001; Laughlin and Sejnowski, "Communication in Neuronal Networks," Science, 301, 1870–1874, 2003). Auditory neurons efficiently code for natural sounds in the environment, maximizing information rate while minimizing coding cost (Shannon, Bell System Technical Journal 27, 379–423, 623–656, 1948). Applying the same analysis to speech coding suggests that speech acoustics are optimally adapted to the mammalian auditory code (Smith and Lewicki, Neural Comp. 17, 19–45, 2005a; Adv. Neural. Inf. Process. Syst. 17, 1289–1296, 2005b; Nature, 439, 7079, 2006). The present work applies this efficient coding theory to the problem of speech perception in individuals using cochlear implants (CI), among whom there exist vast individual differences in speech perception and spectral resolution (Zeng et al., Auditory Prostheses and Electric Hearing, 20, 1–14, 2004). A machine-learning method for CI filterbank design based on the efficient-coding hypothesis is presented. Further, a pair of experiments to evaluate this approach using noise-excited vocoder speech (Shannon et al., Science, 270, 303–304, 1995) is described. Participants' recognition of continuous speech and isolated syllables is significantly more accurate for speech filtered through the theoretically motivated efficient-coding filterbank than for speech filtered through the standard cochleotopic filterbank, particularly for speech transients. These findings offer insight into CI design and provide behavioral evidence for efficient coding in human perception.
Experience-driven effects of visual cues in speech perception
Joseph D. W. Stephens & Lori L. Holt
Poster available in PDF format
The integration of information across modalities is a key component of behavior in everyday settings. The current study examined the extent to which experience drives multimodal speech integration. Two groups of participants were trained on combinations of speech sounds with corresponding videos of an animated robot, whose movements and features bore no resemblance to speech articulators. Participants' identification of acoustically presented consonants was influenced by simultaneous presentation of learned visual stimuli in a manner that reflected the correlation structure of auditory and visual cues in training. The influence of novel non-face visual cues on speech perception developed over the course of training, suggesting that experience altered the perceptual mechanisms used in combining this cross-modal information. The different pairings of auditory and visual cues given to the two groups of participants resulted in patterns of bimodal perception that differed in systematic ways. Perceptual integration of the newly learned visual cues with auditory speech was not optimal, reflecting a bias toward the sound cue. The findings are discussed in reference to current models of speech perception and multimodal integration.
2005
46th Meeting of the Psychonomic Society
Toronto, ON, Canada
November 10-13, 2005
Learned cross-modal integration of novel visual cues with auditory speech
J.D.W. Stephens & L.L. Holt
Poster available in PDF format
The integration of information across modalities is a key component of behavior in everyday settings. However, little is known about the extent to which experience affects mechanisms of multimodal integration. In the current study, participants were trained for more than ten sessions on audiovisual combinations of speech sounds and corresponding movements of an animated robot, whose features bore no resemblance to speech articulators.
Participants' use of auditory and visual information was tested periodically throughout the experiment. During training, participants' identification of acoustically-presented consonants began to be influenced by simultaneous presentation of trained visual stimuli. The nature of this influence changed by the end of training, suggesting that further experience altered perceptual mechanisms for combining information. A subsequent experiment manipulated relations between the trained visual stimuli, such that they were more incompatible with the structure of natural visual speech. The findings are relevant to theories of speech perception and multimodal integration.
149th Meeting of the Acoustical Society of America
Vancouver, BC, Canada
May 16-20, 2005
Perception of coarticulated speech with contrastively enhanced spectrotemporal patterns
L.L. Holt & T. Wade
Poster available in PDF format
High-level contrastive mechanisms cause perception of auditory events to be influenced by spectral and temporal properties of surrounding acoustic context, and may play a role in perceptual compensation for coarticulation in human speech. However, it is unknown whether auditory contrast is incorporated optimally to compensate for different speakers, languages and situations or whether amplification of the processes involved would provide additional benefit, for example, in the perception of hypoarticulated speech, under adverse listening conditions, or in an incompletely acquired language. This study examines effects of artificial contrastive modification of spectrotemporal trajectories on the intelligibility of connected speech in noise by native and non-native listeners. Adopting methods known to improve automatic classification of speech sounds, we model contrast-providing context as an averaged estimated vocal tract function (LPC-derived log area ratio coefficient vector) over a Gaussian-weighted temporal window. Local coefficient values are adjusted from this context based on previously observed contrastive perceptual tendencies, and the intelligibility of the resulting speech is compared with that of unmodified trajectories across listener language backgrounds. Results are discussed with respect to implementation and applicability of general auditory processes.
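The general idea of a Gaussian-weighted context average followed by a contrastive adjustment can be sketched as below. This is a minimal illustration, not the authors' implementation: each frame stands in for a log area ratio coefficient vector (the LPC analysis itself is not shown), and the linear update rule, the `gain` and `sigma` parameters, and the function names are assumptions. The abstract states only that adjustments were based on previously observed contrastive perceptual tendencies.

```python
import math


def gaussian_context(frames, center, sigma):
    """Gaussian-weighted average of coefficient vectors around frame `center`."""
    weights = [math.exp(-0.5 * ((t - center) / sigma) ** 2)
               for t in range(len(frames))]
    total = sum(weights)
    dims = len(frames[0])
    return [sum(w * f[k] for w, f in zip(weights, frames)) / total
            for k in range(dims)]


def enhance_contrast(frames, sigma=5.0, gain=0.5):
    """Move each frame away from its context: x + gain * (x - context).

    Assumed linear rule; gain = 0 leaves the trajectory unchanged.
    """
    out = []
    for t, frame in enumerate(frames):
        ctx = gaussian_context(frames, t, sigma)
        out.append([x + gain * (x - c) for x, c in zip(frame, ctx)])
    return out


# Toy one-dimensional trajectory with a single peak.
trajectory = [[0.0], [0.0], [1.0], [0.0], [0.0]]
enhanced = enhance_contrast(trajectory, sigma=1.0, gain=0.5)
```

With this rule, the peak in the toy trajectory is exaggerated and its neighbors are pushed in the opposite direction, which is the sense in which the modification is contrastive.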
Categorization of spectrally complex non-invariant auditory stimuli in a computer game task
T. Wade & L.L. Holt
Poster available in PDF format
This study examined perceptual learning of spectrally complex nonspeech auditory categories in an interactive multi-modal training paradigm. Participants played a computer game in which they navigated through a three-dimensional space while responding to animated characters encountered along the way. Characters' appearances in the game correlated with distinctive sound category distributions, exemplars of which repeated each time the characters were encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became harder to identify visually, so quick identification of approaching characters by sound patterns was, although never required or encouraged, of gradually increasing benefit. After thirty minutes of play, participants performed a categorization task, matching sounds to characters. Despite not being informed of audio-visual correlations, participants exhibited reliable learning of these patterns at post-test. Categorization accuracy was related to several measures of game performance, and category learning was sensitive to category distribution differences modeling acoustic structures of speech categories. Category knowledge resulting from the game was qualitatively different from that gained from an explicit unsupervised categorization task involving the same stimuli. Results are discussed with respect to information sources and mechanisms involved in acquiring complex, context-dependent auditory categories, including phonetic categories, and to multi-modal statistical learning.
How auditory discontinuities and linguistic experience affect the perception of speech and non-speech in English- and Spanish-speaking listeners.
Jessica F. Hay, Lori L. Holt, Andrew J. Lotto & Randy L. Diehl
Poster available in PDF format
The present study was designed to investigate the effects of long-term linguistic experience on the perception of non-speech sounds in English and Spanish speakers. Research using tone-onset-time (TOT) stimuli, a type of non-speech analogue of voice-onset-time (VOT) stimuli, has suggested that there is an underlying auditory basis for the perception of stop consonants based on a threshold for detecting onset asynchronies in the vicinity of +20 ms. For English listeners, stop consonant labeling boundaries are congruent with the positive auditory discontinuity, while Spanish speakers place their VOT labeling boundaries and discrimination peaks in the vicinity of 0 ms VOT. The present study addresses the question of whether long-term linguistic experience with different VOT categories affects the perception of non-speech stimuli that are analogous in their acoustic timing characteristics. A series of synthetic VOT stimuli and TOT stimuli were created for this study. Using language-appropriate labeling and ABX discrimination tasks, labeling boundaries (VOT) and discrimination peaks (VOT and TOT) are assessed for 24 monolingual English speakers and 24 monolingual Spanish speakers. The interplay between language experience and auditory biases is discussed. [Work supported by NIDCD.]
Annual Meeting of the Cognitive Neuroscience Society
New York, NY
April 9-12, 2005
Experience-driven audio-visual integration in speech perception
J. D. W. Stephens & L. L. Holt
Poster available in PDF format
Integration of auditory and visual cues greatly affects speech perception. Current theories of speech perception make different assumptions about the perceptual processes underlying cross-modal integration. A method was developed for testing these theoretical assumptions by training participants on novel visual speech cues that can be controlled and manipulated in ways that normal visual speech cues (i.e., speakers' faces) cannot. In a preliminary experiment, participants played an hour-long video game on each of five consecutive days. During the game they learned to identify consonants based on the movements of an animated robot, whose features bore no resemblance to speech articulators. Subsequent to training, participants accurately identified consonants based solely on the newly-learned visual cues, and their accuracy in identifying consonants presented in noise was improved when the visual cues were present.
Additionally, participants' identification of acoustically ambiguous consonants was influenced by simultaneous presentation of the trained visual stimuli. A subsequent experiment studied the development of these audiovisual effects over longer time frames. The data from both studies are relevant to current theoretical issues and provide a basis for investigating the development of cross-modal integration through learning.
28th Midwinter Meeting of the Association for Research in Otolaryngology
New Orleans, LA
February 19-24, 2005
Incidental Complex Auditory Category Learning in a Computer Game Task
Travis Wade, Lori Holt
Poster available in PDF format
This study examined the perceptual learning of spectrally complex non-speech auditory categories in a novel incidental, interactive multi-modal training paradigm. Participants played a computer game in which they were required to navigate through a three-dimensional space while responding appropriately to animated characters encountered along the way. Each character's appearance in the game correlated with a sound category distribution, a randomly selected member of which was repeated each time the character was encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became gradually more difficult to identify by visual patterns alone. As a result, quick identification of approaching characters by means of sound patterns was, while never required or explicitly encouraged, of gradually increasing benefit. After a thirty-minute session, participants performed a categorization task, matching sounds to characters encountered during game play. Despite not being informed of audio-visual correlations beforehand, participants showed reliable learning of these patterns at post-test. Post-test performance was shown to be related to several measures of success at the game task, and learning was also sensitive to differences in category structure analogous to patterns seen in speech categories. Category knowledge resulting from the game was shown to be quantitatively different from that gained from an explicit grouping task involving the same categories. Results are discussed with respect to the mechanisms and information sources involved in the acquisition of complex, context-dependent phonetic categories.
Factors Affecting Perceptual Weighting of Acoustic Cues in a Categorization Task
Lori L. Holt, Andrew J. Lotto
The perception of complex sounds, such as speech, often requires the integration of information across multiple dimensions. The present experiments investigate the perceptual effectiveness or "weighting" of acoustic dimensions in a categorization task. Human listeners categorized sounds drawn from two input distributions lying within a two-dimensional acoustic space defining the center frequency (CF) and modulation frequency (MF) of frequency-modulated sinewaves. The 2-d acoustic space was scaled such that each dimension was psychophysically matched to be equally discriminable and, in the first experiment, equally informative for accurate categorization. Despite this normalization, listeners' category responses reflected a bias for use of CF. The CF bias was moderated when training distribution overlap was increased along the CF dimension, thereby decreasing the informativeness of CF for the task. A reversal of weighting (MF over CF) was obtained when distribution variance was increased along the CF dimension. These results demonstrate that even when equally informative and equally discriminable, acoustic cues are not necessarily equivalently weighted in perception; listeners exhibit biases when integrating multiple acoustic dimensions. Drastic changes in cue weighting strategies can be effected by changes in input distribution parameters. Moreover, a final experiment demonstrates that listeners can be encouraged to re-weight acoustic dimensions by mere exposure to acoustic tokens varying along the less preferred acoustic dimension. These methods provide potential insights into acquisition of speech sound categories, particularly second language categories for which cue weighting is a critical issue.
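How overlap and variance along one dimension change that dimension's informativeness for the task can be illustrated with a simple separability measure. This is a sketch under stated assumptions: the two categories are treated as equal-variance Gaussians on each dimension, the units are arbitrary (standing in for the psychophysically scaled space), the numbers are invented, and `d_prime` and `relative_informativeness` are hypothetical helper names. It describes the statistics of the input, not listeners' perceptual weights, which the abstract reports as biased toward CF even when the dimensions are equally informative.

```python
def d_prime(mean_a, mean_b, sd):
    """Separation of two equal-variance categories along one dimension."""
    return abs(mean_a - mean_b) / sd


def relative_informativeness(cf, mf):
    """Share of total d' carried by each dimension.

    `cf` and `mf` are (mean_a, mean_b, sd) tuples for the two categories.
    """
    d_cf, d_mf = d_prime(*cf), d_prime(*mf)
    total = d_cf + d_mf
    return d_cf / total, d_mf / total


# Equally informative dimensions (as in the first experiment).
baseline = relative_informativeness((0.0, 2.0, 1.0), (0.0, 2.0, 1.0))

# Doubling the spread along CF halves its separability.
wide_cf = relative_informativeness((0.0, 2.0, 2.0), (0.0, 2.0, 1.0))

print(baseline, wide_cf)
```

In this toy setup, increasing the CF variance shifts the balance of usable information toward MF, which is the direction of the manipulation that produced a reversal of weighting in the experiments.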
2004
45th Annual Meeting of the Psychonomic Society
Minneapolis, MN
November 18-21, 2004
Learning to integrate auditory and visual information in speech perception
Joseph D. W. Stephens & Lori L. Holt
Poster available in PDF format
Integration of auditory and visual cues greatly affects speech perception. Current theories of speech perception make different assumptions about the perceptual representations underlying cross-modal integration. A method was developed for testing these theoretical assumptions by training participants on novel visual speech cues that can be controlled and manipulated in ways that normal visual speech cues (i.e., speakers' faces) cannot. Participants played an hour-long video game on each of five consecutive days. During the game they learned to identify consonants based on the movements of an animated robot, whose features bore no resemblance to speech articulators. Subsequent to training, participants accurately identified consonants based solely on the newly-learned visual cues. Additionally, participants' identification of acoustically-presented consonants was influenced by simultaneous presentation of the trained visual stimuli. The data are relevant to current theoretical issues and provide a basis for investigating the development of cross-modal integration through learning.
148th Meeting of the Acoustical Society of America
San Diego, CA
November 15-19, 2004
Contrastive backward effects of nonspeech tones on speech perception
Travis Wade & Lori L. Holt
Poster available in PDF format
Nonspeech stimuli influence phonetic categorization, but effects observed so far have been limited to precursors' influence on perception of following sounds. However, both preceding and following speech affect phonetic categorization. This asymmetry in nonspeech and speech effects raises questions about whether general auditory processes play a role in context-dependent speech perception. Here, experiments test whether the asymmetry stems from methodological issues rather than genuine mechanistic limitations. To determine whether backward effects of nonspeech on speech may be achieved when listeners are sufficiently encouraged to incorporate later-occurring acoustic events, a series of experiments examined perception of CVC words with [da]–[ga] series onsets followed by embedded tones and one of two possible final consonants. When the final consonant was required for word identification, subjects showed clear contrastive effects; more [d]-initial words were heard with higher-frequency tones approximating a [g] third formant location, and vice versa. More limited effects were observed when subjects identified only the initial consonant and when no final consonant was present. Results are discussed with respect to a contrastive mechanism not speech specific but operating at a relatively high level, taking into account spectral patterns occurring over extended periods before and after a target event. [Work supported by NIH.]
Auditory categorization: Cue weighting and dimension bias
Lori L. Holt & Andrew J. Lotto
Slides available in PDF format
The ability to integrate and weight information across dimensions is central to perception. The present experiments investigate this issue by training participants to categorize sounds drawn from two input distributions in a two-dimensional acoustic space defined by the center frequency (CF) and modulation frequency (MF) of frequency-modulated sine waves. These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias was moderated when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, drastic changes in weighting strategies can be effected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by adding variance to noninformative dimensions in training stimuli. [Work supported by NIH.]
26th Annual Meeting of the Cognitive Science Society
Chicago, IL
August 2004
Attentional modulation of lexical effects in an interactive model of speech perception
Mirman, D., McClelland, J. L., & Holt, L. L.
Abstract available in PDF format
Poster available in PDF format
From Sound to Sense: Fifty+ Years of Discoveries in Speech Communication
Massachusetts Institute of Technology
Cambridge, MA
June 2004
Non-linguistic sentence-length precursors affect speech perception: Implications for speaker and rate normalization
Holt, L. L. & Wade, T.
Proceedings paper available in PDF format
Poster available in PDF format
Speech contexts can influence phonetic perception considerably, even across extended temporal windows. For example, manipulating spectral or temporal characteristics of precursor sentences leads to dramatic changes in categorization of subsequent vowels and consonants (e.g., Ladefoged & Broadbent, 1957; Summerfield, 1981). These findings often have been discussed in terms of speaker and rate normalization. The present study aimed to discover precisely which types of information in the speech signal subserve such shifts in speech categorization. A series of experiments examined the influence of sentence-length non-speech precursors—series of brief pure tones—on the perception of speech segments with which they shared critical spectral and temporal properties. Across multiple experimental manipulations, the non-speech precursors affected the perceived place (alveolar, velar) and manner (stop, glide) of articulation of synthesized English consonants. Effects were observed even when non-speech precursor series were temporally-nonadjacent to the speech categorization targets and even when multiple interrupting acoustic events separated precursor and target. Both category boundary shifts and changes in graded internal category structure were observed. These results indicate that the auditory system is sensitive to both spectral and temporal information conveyed by non-linguistic sounds across sentence-length temporal windows. Moreover, this sensitivity influences speech categorization, highlighting that general auditory processing may play a role in the speech categorization shifts described as rate and speaker normalization.
147th Meeting of the Acoustical Society of America
New York, NY
May 2004
Perception of correlations between acoustic cues in category tuning and speaker adaptation
Lori L. Holt & Travis Wade
Poster available in PDF format
In English and many other languages, fundamental frequency (f0) varies with voicing such that voiced consonants are produced with lower f0s than their voiceless counterparts. This regularity robustly influences perception, such that sounds synthesized or spoken with a low f0 are more often perceived as voiced than are sounds with a higher f0. This series of studies exploited these observations to investigate category tuning as a function of incidental exposure to correlations among speech cues and adaptation to speaker idiosyncrasies or accent. Manipulation of f0 across sets of natural speech utterances produced stimulus sets varying in their inherent f0/voicing relationship. Listeners were exposed to these different f0/voicing patterns via spoken word and non-word items in a lexical decision task, and their resulting categorization of ambiguous consonants varying in f0 and voice onset time (VOT) was measured. The results suggest listeners adapt quickly to speaker-specific cues but also remain influenced by more global, naturally-occurring covariance patterns of f0 and voicing in English. This pattern contrasts somewhat with studies where idiosyncrasy is represented instead by manipulation of primary, first-order cues to speech sounds, in which listeners are seen to adapt more straightforwardly to the cues they are presented.
Perceptual effects of preceding non-speech rate information on temporal properties of speech categories
Travis Wade & Lori L. Holt
Poster available in PDF format
The rate of context speech presentation can influence speech perception. This study investigated the bounds of rate-dependent speech categorization, observing influences of non-speech precursor rate on speech perception. Two experiment sets tested effects of pure-tone presentation rate on perception of following speech continua involving duration-varying formant transitions that shared critical temporal and spectral characteristics with the tones. Results showed small but consistent shifts in the stop-continuant boundary distinguishing /ba/ and /wa/ syllables based on the rate of precursor tones, across differences in amplitude of tones and despite variability in their duration. Additionally, the shift was shown to involve the entire graded structure of the [w] category and was not limited to an ambiguous boundary region, affecting goodness judgments on both sides of an estimated best exemplar range. These results are problematic for accounts of rate-dependent processing that explicitly reference speech categories or articulatory events and are consistent with a contrast account.
2003
146th Meeting of the Acoustical Society of America
Austin, TX
November, 2003
What are the statistics in statistical learning?
Lori L. Holt & Andrew J. Lotto
Lay Person Press Article
The idea that speech perception is shaped by the statistical structure of the input is gaining wide enthusiasm and growing empirical support. Nonetheless, statistics and statistical learning are broad terms with many possible interpretations and, perhaps, many potential underlying mechanisms. In order to define the role of statistics in speech perception mechanistically, we will need to more precisely define the statistics of statistical learning and examine similarities and differences across subgroups. In this talk, we examine learning of four types of information: (1) acoustic variance that is defining for contrastive categories, (2) the correlation between acoustic attributes or linguistic features, (3) the probability or frequency of events or a series of events, (4) the shape of input distributions. We present representative data from online speech perception and speech development and discuss inter-relationships among the subgroups. [Work supported by NSF, NIH and the James S. McDonnell Foundation.]
James S. McDonnell Foundation
21st Century Science Initiative
Tarrytown, NY
June, 2003
The Role of Experience in Speech Perception
Lori L. Holt
Poster available in PDF format
8th Annual Carnegie Mellon Meeting of the Minds
Pittsburgh, PA
May, 2003
SIMON SOUNDS: A new paradigm for studying auditory categories
Lori L. Holt & Seth A. Liber
Poster available in PDF format
Categorization refers to a perceiver's ability to treat discriminably different stimuli equivalently. The goal of this project was to develop a new method for studying categorization in the auditory domain. Most auditory categories are learned incidentally, without an explicit teacher. We developed a new experimental protocol for incidentally teaching listeners novel auditory categories. Using the framework of the memorization game Simon, we have created adaptable hardware and software to incidentally teach subjects two to four categories. As participants play Simon, they are exposed to underlying auditory categories associated as warning sounds with Simon response buttons. This new protocol will open new arenas of research in speech and non-speech category learning.
10th Annual Meeting of the Cognitive Neuroscience Society
New York, NY
March, 2003
Perceptual discontinuities and categorization: Implications for speech perception
Lori L. Holt, Andrew J. Lotto, & Randy L. Diehl
Poster available in PDF format
Behavioral experiments with infants, adults and nonhuman animals
converge with findings from neurophysiological investigations of the
VIIIth nerve, inferior colliculus and primary auditory cortex to suggest
that there is a discontinuity in auditory processing of stimulus
components differing in onset time by about 20 ms. This discontinuity
has been implicated as a basis for boundaries between speech categories
distinguished by voice onset time (VOT, e.g., /ba/ versus /pa/). Here,
we investigate how this discontinuity interacts with the learning of
novel perceptual categories. Adult listeners were trained to respond
based on a non-speech acoustic cue that mimics the temporal distinction
of VOT. One group of listeners learned categories with a boundary
coincident with the perceptual discontinuity. Another group learned
categories defined such that the perceptual discontinuity fell within a
category. Listeners in the latter group required significantly more
experience to reach criterion categorization performance. Evidence of
interactions between the perceptual discontinuity and the learned
categories extended to discrimination and generalization tests as well.
It has been hypothesized that languages make use of perceptual
discontinuities to promote perceptual distinctiveness among sounds
within a language inventory. The present data suggest that these
influences interact with category learning. As such, "learnability" may
play a predictive role in selection of language sound inventories.
Moreover, since some categories are more easily learned than others, it
may be possible to observe predictable learning effects in infant speech
perception. Finally, the data have implications for the neural
processing of speech.
-------------- top | Research --------------
2002
1st Annual Auditory Perception, Cognition, and Action Meeting
Kansas City, MO
November, 2002
Speech Perception as a Paradigmatic Case of Auditory Cognition
Andrew J. Lotto & Lori L. Holt
Talk available in PDF format
Traditionally, the perception of speech sounds has been described as a
special process that is different in kind from general auditory
cognition. This speech-is-special approach has robbed auditory
cognitive science of an important theoretical testing ground. Just as
text recognition has been essential to the development of visual
cognitive science, the study of speech perception has potential to
significantly advance auditory cognitive science. Recent evidence
suggests that much of the perception of speech can be explained by
appealing to general processes of audition and learning. We will
present data from new work on auditory category formation and
statistical learning that demonstrates the proposed symbiosis between
speech and general audition research. Results from categorization
tasks using speech and non-speech sounds and human and animal subjects
reveal patterns of responses that are consistent with computational
models of learning and recent theoretical proposals from visual
categorization.
-------------- top | Research --------------
43rd Annual Meeting of the Psychonomic Society
Kansas City, MO
November, 2002
Are context effects in speech perception modulated by visual information?
Joseph D. Stephens & Lori L. Holt
Poster available in PDF format
An important goal in speech perception research is to understand the
means by which the perception of speech sounds is influenced by
surrounding context. Fowler, Brown, & Mann (2000) reported a shift in
perception of a consonant-vowel syllable as a function of visual speech
information accompanying a preceding syllable. That finding was
interpreted as contradictory to a general auditory account of such
context effects (e.g., Lotto & Kluender, 1998). The present study
attempted to replicate that finding. Replication was only possible
using the stimulus materials of the original study, and data from a
modification of the original experiment suggest the effect may have been
caused by unintended variation in the visual tracks of the original
stimuli. Results will be described in terms of a general perceptual
account of context effects in speech perception.
-------------- top | Research --------------
14th Annual Beckman Frontiers of Science Symposium
Los Angeles, CA
November, 2002
Perceptual context effects in speech perception
Lori L. Holt
Poster available in PDF format
-------------- top | Research --------------
Acoustical Society of America Special Session
Pittsburgh, PA
2002
Models of phonetic category formation and structure
Organized by Andrew J. Lotto & Lori L. Holt
See coverage in Scientific American.
-------------- top | Research --------------
Acoustical Society of America
Pittsburgh, PA
2002
Speech perception as complex auditory categorization
Lori L. Holt
Despite a long and rich history of categorization research in cognitive
psychology, very little work has addressed the issue of complex auditory
category formation. This is especially unfortunate because the general
underlying cognitive and perceptual mechanisms that guide auditory
category formation are of great importance to understanding speech
perception. I will discuss a new methodological approach to examining
complex auditory category formation that specifically addresses issues
relevant to speech perception. This approach utilizes novel nonspeech
sound stimuli to gain full experimental control over listeners' history
of experience. As such, the course of learning is readily measurable.
Results from this methodology indicate that the structure and formation
of auditory categories are a function of the statistical input
distributions of sound that listeners hear, aspects of the operating
characteristics of the auditory system, and characteristics of the
perceptual categorization system. These results have important
implications for phonetic acquisition and speech perception.
-------------- top | Research --------------
Acoustical Society of America
Pittsburgh, PA
2002
Formation of categories for complex novel auditory stimuli
Daniel Mirman, Lori L. Holt, and James L. McClelland
Categorization of complex sounds with multiple, imperfectly valid cues
is fundamental to phonetic perception. To study the general perceptual
and cognitive processes that support complex sound categories, a novel
stimulus set was created that allows tight control of category structure
and input distributions. Stimuli were created from 300-ms noise bursts
by applying bandstop filters at varying center frequencies and
manipulating rise/fall time of stimulus onset and offset. Stimuli were
assigned to one of two categories and presented to participants in a
category identification task and an AX discrimination task. Feedback was
provided during identification trials, but not during discrimination
trials. Participants quickly learned to apply the category labels with
high accuracy. Identification reaction times followed a pattern typical
of speech stimuli, with a peak in reaction time at the category boundary.
These results are consistent with formation of new auditory categories.
Preliminary results indicate that discrimination performance is not
tightly coupled with development of sharp identification functions and
response-time peaks at category boundaries. Implications for mechanisms
of speech categorization and category formation will be discussed. [Work
supported by CNBC, NIH, and NSF.]
-------------- top | Research --------------
Acoustical Society of America
Pittsburgh, PA
2002
Effect of preceding speech on nonspeech sound perception
Joseph D. Stephens and Lori L. Holt
Poster available in PDF format
Data from Japanese quail suggest that the effect of preceding liquids
(/l/ or /r/) on response to subsequent stops (/g/ or /d/) arises from
general auditory processes sensitive to the spectral structure of sound
[A. J. Lotto, K. R. Kluender, and L. L. Holt, J. Acoust. Soc. Am. 102,
1134-1140 (1997)]. If spectral content is key, appropriate nonspeech
sounds should influence perception of speech sounds and vice versa. The
former effect has been demonstrated [A. J. Lotto and K. R. Kluender,
Percept. Psychophys. 60, 602-619 (1998)]. The current experiment
investigated the influence of speech on the perception of nonspeech
sounds. Nonspeech stimuli were 80-ms chirps modeled after the F2 and F3
transitions in /ga/ and /da/. F3 onset was increased in equal steps from
1800 Hz (/ga/ analog) to 2700 Hz (/da/ analog) to create a ten-member
series. During AX discrimination trials, listeners heard chirps that
were three steps apart on the series. Each chirp was preceded by a
synthesized /al/ or /ar/. Results showed context effects predicted from
differences in spectral content between the syllables and chirps. These
results are consistent with the hypothesis that spectral contrast
influences context effects in speech perception. [Work supported by ONR,
NOHR, and CNBC.]
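The stimulus series described above can be sketched numerically. This is only an illustration under stated assumptions: the abstract gives the endpoints (1800 Hz and 2700 Hz), the series length (ten members), and the three-step separation of AX pairs, while the function names, the linear spacing in Hz, and the choice to list every possible pair are hypothetical and not taken from the study.

```python
def f3_onset_series(low_hz=1800.0, high_hz=2700.0, n_members=10):
    """Return F3 onset frequencies spaced in equal steps (assumed linear in Hz)."""
    step = (high_hz - low_hz) / (n_members - 1)
    return [low_hz + i * step for i in range(n_members)]


def ax_pairs(series, separation=3):
    """Return every pair of series members that lie `separation` steps apart."""
    return [(series[i], series[i + separation])
            for i in range(len(series) - separation)]


series = f3_onset_series()
pairs = ax_pairs(series)
print(series)   # ten onset values from the /ga/ analog to the /da/ analog
print(pairs)    # candidate different-pairs for AX discrimination
```

Under these assumptions the step size is 100 Hz and seven distinct three-step pairs are available. Which of those pairs were actually presented, and in what order, is not stated in the abstract.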
-------------- top | Research --------------
2001
Cognitive Neuroscience Society
New York, NY
March 2001
Perceptual Context Effects in Speech Perception
Lori L. Holt, Carnegie Mellon Univ., Dept. Psych., Pittsburgh, PA
Andrew J. Lotto, Washington State University, Dept. Psych., Pullman, WA
Poster available in PDF format
Perceptual identification of speech sounds is greatly influenced by spectral characteristics of adjacent sound.
For example, listeners will label an ambiguous syllable as /da/ when it is
preceded by /ar/ and as /ga/ when it is preceded by /al/. Recent work
demonstrates that context effects in speech identification can be induced
by adjacent non-speech sounds that mimic spectral characteristics of /al/
and /ar/ (Lotto & Kluender, 1998). Non-human animals trained to “label”
syllables as /ga/ or /da/ also exhibit context-dependent response shifts
(Lotto, Kluender & Holt, 1997). These results have led to speculation
that speech context effects arise from general perceptual mechanisms. To
examine the nature of these putative mechanisms, the temporal course and
frequency range of this context effect were examined. Identification
functions for varied context conditions were collected from adult native-English-speaking
listeners. Results suggest peripheral sensory processes play little
or no role. Shifts in identification are present even when context
and target syllable are separated by 400 ms and nearly identical shifts are
obtained for dichotic presentation of target and context. Also, the
context effect is strongly related to spectral content of the context and
not to the phonemic label assigned to the context, suggesting that the effect
is due to general auditory mechanisms and not to cognitive or speech-specific
processes. These data will be discussed in terms of possible physiological
explanations.
-------------- top | Research --------------
2000
The 140th Meeting of the Acoustical Society of America
Newport Beach, CA
December 4-8, 2000
Context-dependent neural coding in the chinchilla cochlear nucleus.
Holt, L. L., Ventura, V., Rhode, W. R., Behesta, S., & Rinaldo, A.
One of the most challenging dilemmas for theories of speech perception is the lack of invariance between acoustic signal
and perception. Due to physical constraints upon articulators, there is a good
deal of context-dependency both in speech production and in the resulting acoustic
speech signal. Consequently, the acoustic pattern most closely related to a
given speech sound varies dramatically depending on context. Yet, by some means,
the perceptual system perceives these unique acoustic events as linguistically
equivalent. This phenomenon is observed experimentally as perceptual context
effects whereby adjacent speech can modulate perceived identity of a given
speech sound. Recent perceptual results suggest that this phenomenon may be
governed by general auditory mechanisms [Holt, Lotto, & Kluender, 1996;
Lotto, Kluender, & Holt, 1997; Lotto & Kluender, 1998; Holt, 1999]
rather than speech-specific processes. In the present study, we sought to explore
how such context-dependencies might be encoded. We recorded responses of ventral
cochlear nucleus (VCN) neurons of anesthetized chinchillas to nonspeech stimulus
targets with adjacent context stimuli that varied in spectral content. Results
demonstrate context-dependent effects of frequency and intensity.
-------------- top | Research --------------
Association for Research in Otolaryngology
St. Petersburg Beach, FL
February 20-24, 2000
Examining Context-Dependent Speech Perception in the Chinchilla Cochlear Nucleus
Lori L. Holt & William R. Rhode
One of the most challenging dilemmas for theories of speech perception is the lack of invariance between
acoustic signal and perception. Due to physical constraints upon articulators,
there is a good deal of context-dependency both in speech production and in
the resulting acoustic speech signal. Consequently, the acoustic pattern most
closely related to a given speech sound varies dramatically depending on context.
Yet, by some means, the perceptual system perceives these unique acoustic events
as linguistically equivalent. This phenomenon is observed experimentally as
perceptual context effects whereby adjacent speech can modulate perceived identity
of a given speech sound. Recent perceptual results suggest that this phenomenon
may be governed by general auditory mechanisms rather than speech-specific
processes. In the present study, we sought to explore how such context-dependencies
might be encoded. We recorded responses of ventral cochlear nucleus (VCN) neurons
of anesthetized chinchillas to speech and non-speech stimulus targets with adjacent
context stimuli that varied in spectral content. Though neural responses captured properties
of the target stimulus, there was little evidence that adjacent context critically
influenced how the target was encoded by VCN neurons. These results, along with
recent perceptual findings (Holt, 1999), suggest more central levels of the
auditory system may be responsible for encoding context-dependencies in speech
perception.
-------------- top | Research --------------
1999
The 138th Meeting of the Acoustical Society of America
Columbus, OH
November 1-5, 1999
Influence of fundamental frequency on stop-consonant voicing perception: A case of learned covariation or auditory enhancement?
Lori L. Holt, Andrew J. Lotto, & Keith R. Kluender
Poster available in PDF format
Listeners labeling members of an acoustic series modeling VOT (e.g., /ba/-/pa/) are more
likely to identify tokens with higher f0 as voiceless than otherwise-identical tokens
with lower f0. This pattern of results may arise because a high f0 enhances
perception of voicelessness, in line with auditory enhancement accounts of
speech perception. Alternatively, because f0 and VOT covary in English production,
it is possible that listeners respond in this manner due to experience with
VOT/f0 covariation in the speech signal. The present investigation was designed
to tease apart the relative contributions of these two potential mechanisms.
Japanese quail (Coturnix coturnix japonica) were trained to "label" stimuli
drawn from VOT series by pecking a key. During training, each quail experienced
one of three styles of VOT/f0 covariation. For one group of quail, VOT and
f0 covaried naturally with voiceless series members having higher f0s than
voiced members. Another group of quail heard the inverse, "unnatural" covariation.
A final group experienced stimuli for which there was no covariation between
VOT and f0. Results indicate that experience with VOT/f0 covariation is the
predominant force in shaping perception. Thus, general learning mechanisms
may account for this symmetry between perception and production.
-------------- top | Research --------------
The 138th Meeting of the Acoustical Society of America
Columbus, OH
November 1-5, 1999
Structure of phonetic categories produced by general learning mechanisms.
Andrew J. Lotto, Lori L. Holt, & Keith R. Kluender
The development of categories for complex auditory stimuli is of interest to studies of both general category
learning and language acquisition. Previous work [Kluender et al., J. Acoust.
Soc. Am. 104, 3568–3582 (1998)] demonstrated that avian species can learn
to respond differentially to sounds from two vowel categories and that the structure of their responses correlates well
with human adult ratings of the vowels. In the current study, Japanese quail
(Coturnix japonica) were trained to respond to members of either an /i/ or
an /E/ distribution and to refrain, in both cases, from responding to members
of an /I/ and /æ/ distribution. Birds responding to /E/ (surrounded by
/I/ and /æ/ in the vowel space) showed a prominent peak or "prototype"
in their responses. Birds responding to /i/ (extreme in the vowel space) showed
a weak or no "prototype," but showed a strong gradient with response rate
increasing for tokens further away from the other vowel distributions in the
F1–F2 space. These data demonstrate that internal structure of (phonetic)
categories is strongly influenced by relations to the competing stimulus set
(vowel space). This is particularly important for theories of categorization
or language acquisition that rely heavily on the existence of a "prototype."
-------------- top | Research --------------
Site designed by Seth Liber,
maintained by Anthony Kelly.