Abstracts
Lotto, A. J., Hickok, G. S., & Holt, L. L. (2009).
Reflections on mirror neurons and speech perception.
Trends in Cognitive Sciences
The discovery of mirror neurons, a class of neurons that respond when a monkey performs an action and also when the monkey observes others producing the same action, has promoted a renaissance for the Motor Theory (MT) of speech perception. This is because mirror neurons seem to accomplish the same kind of one-to-one mapping between perception and action that MT theorizes to be the basis of human speech communication. However, this seeming correspondence is superficial, and there are theoretical and empirical reasons to temper enthusiasm about the explanatory role mirror neurons might have for speech perception. In fact, rather than providing support for MT, mirror neurons are actually inconsistent with the central tenets of MT.
Leech, R., Holt, L. L., Devlin, J. T., & Dick, F. (2009)
Expertise with artificial non-speech sounds recruits speech-sensitive cortical regions.
Journal of Neuroscience
Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial nonlinguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants’ ability to explicitly categorize the nonspeech sounds predicted the change in pretraining to posttraining activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space.
Huang, J. & Holt, L. L. (2009)
General perceptual contributions to lexical tone normalization.
Journal of the Acoustical Society of America.
Within tone languages that use pitch variations to contrast meaning, large variability exists in the pitches produced by different speakers. Context-dependent perception may help to resolve this perceptual challenge. However, whether listeners rely on context in contour tone perception is unclear; previous studies have produced inconsistent results. The present study aimed to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms. In three experiments, Mandarin listeners’ perception of the Mandarin first (high-level) and second (mid-rising) tones was investigated with preceding speech and non-speech contexts. Results indicate that the mean fundamental frequency (f0) of a preceding sentence affects perception of contour lexical tones and that the effect is contrastive: following a sentence with a higher mean f0, a syllable is more likely to be perceived as a lower-frequency lexical tone, and vice versa. Moreover, non-speech precursors modeling the mean spectrum of f0 also elicit this effect, suggesting general perceptual processing rather than articulatory-based or speaker-identity-driven mechanisms.
Holt, L. L. & Lotto, A. J. (2008)
Speech perception within an auditory cognitive science framework.
Current Directions in Psychological Science
The complexities of the acoustic speech signal introduce many significant challenges for listeners. Listening to speech begins with auditory processing, but investigation of speech perception has progressed mostly independently of study of the auditory system. Nevertheless, a growing body of evidence demonstrates the productivity of cross-fertilization. We briefly describe two examples of symbiotic study of general auditory processing and speech perception. These examples demonstrate that the operating characteristics of the auditory system constrain and influence speech perception and that our understanding of the processes involved in speech perception is enhanced by study within a more general framework. The disconnect between speech and auditory perception has stunted the development of a truly interdisciplinary auditory cognitive science, but there is an opportunity for great strides in understanding with development of an integrated field of auditory cognitive science.
Holt, L. L. (2006)
The mean matters: Effects of statistically-defined non-speech spectral distributions on speech categorization.
Journal of the Acoustical Society of America
Adjacent speech, and even non-speech, contexts influence phonetic categorization. Four experiments investigated how preceding sequences of sine-wave tones influence phonetic categorization. This experimental paradigm provides a means of investigating the statistical regularities of acoustic events that influence online speech categorization and, reciprocally, reveals regularities of the sound environment tracked by auditory processing. The tones comprising the sequences were drawn from distributions sampling different acoustic frequencies. Results indicate that whereas the mean of the distributions predicts contrastive shifts in speech categorization, variability of the distributions has little effect. Moreover, speech categorization is influenced by the global mean of the tone sequence, without significant influence of local statistical regularities within the tone sequence. Further arguing that the effect is strongly related to the average spectrum of the sequence, notched noise spectral complements of the tone sequences produce a complementary effect on speech categorization. Lastly, these effects are modulated by the number of tones in the acoustic history and the overall duration of the sequence, but not by the density with which the distribution defining the sequence is sampled. Results are discussed in light of stimulus-specific adaptation to statistical regularity in the acoustic input and a speculative link to talker normalization is postulated.
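The central result, that the mean of the precursor distribution, not its variance, predicts the shift, can be sketched in a few lines of Python. This is an illustrative toy model, not the study's stimulus-generation or analysis code: the `tone_sequence` sampler, the reference frequency, and the contrast gain are assumptions chosen only to show why two sequences with equal means but different variances predict the same contrastive shift.

```python
import random

def tone_sequence(mean_hz, sd_hz, n=100, seed=1):
    """Sample a precursor sequence of pure-tone frequencies from a
    normal distribution (a hypothetical stand-in for the stimuli)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_hz, sd_hz) for _ in range(n)]

def predicted_shift(context, reference_hz=2200.0, gain=0.05):
    """Toy contrast rule: perception of a following target is pushed
    away from the context mean, so only the mean of the distribution
    (not its variance) drives the predicted categorization shift."""
    context_mean = sum(context) / len(context)
    return -gain * (context_mean - reference_hz)

low_var = tone_sequence(1800.0, 50.0)
high_var = tone_sequence(1800.0, 300.0, seed=2)
# Equal means -> nearly identical predicted shifts despite unequal variance.
```

Under this rule, manipulating variance while holding the mean fixed leaves the predicted shift essentially unchanged, which is the pattern the experiments report.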
Mirman, D., McClelland, J. L. & Holt, L. L. (2006).
Interactive activation and Hebbian learning produce lexically guided tuning of speech perception.
Psychonomic Bulletin and Review
We describe an account of lexically guided tuning of speech perception based on interactive processing and Hebbian learning. Interactive feedback provides lexical information to pre-lexical levels and Hebbian learning uses that information to retune the mapping from acoustic input to pre-lexical representations of speech. Simulations of an extension of the TRACE model of speech perception are presented that demonstrate the efficacy of this mechanism and raise questions for future experiments. Crucially, this account addresses the role of lexical information in guiding both perception and learning within a single set of principles of information propagation.
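As a rough intuition for the mechanism, consider the following minimal sketch. It is not the TRACE extension itself; the two-phoneme setup, activation rule, and learning rate are simplifying assumptions. Interactive feedback adds lexical support to one pre-lexical unit, and a Hebbian rule then strengthens the acoustic-to-phoneme weight of whichever unit the lexicon supported.

```python
def phoneme_activation(acoustic_input, weights, lexical_feedback):
    # Pre-lexical activation = bottom-up evidence scaled by the learned
    # weight, plus top-down lexical support (interactive feedback).
    return {p: acoustic_input * weights[p] + lexical_feedback.get(p, 0.0)
            for p in weights}

def hebbian_update(weights, acoustic_input, activations, rate=0.1):
    # Hebbian learning: weight change proportional to the product of
    # pre-synaptic (acoustic) and post-synaptic (phoneme) activity.
    return {p: w + rate * acoustic_input * activations[p]
            for p, w in weights.items()}

weights = {"s": 0.5, "sh": 0.5}   # initially ambiguous acoustic-to-phoneme mapping
ambiguous_input = 1.0
feedback = {"sh": 0.3}            # the lexicon favors /sh/ for this word

for _ in range(10):
    acts = phoneme_activation(ambiguous_input, weights, feedback)
    weights = hebbian_update(weights, ambiguous_input, acts)
# After exposure, the same ambiguous input maps more strongly to /sh/.
```

The key property the sketch shares with the account in the paper is that a single set of information-propagation principles does double duty: feedback shapes online perception, and the same feedback-driven activity supplies the teaching signal for retuning.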
Mirman, D., McClelland, J. L., & Holt, L. L. (2006).
Are there interactive processes in speech perception?
Trends in Cognitive Sciences
Lexical information facilitates speech perception, especially when sounds are ambiguous or degraded. The interactive approach to understanding this effect posits that this facilitation is accomplished through bi-directional flow of information, allowing lexical knowledge to influence pre-lexical processes. Alternative autonomous theories posit feed-forward processing with lexical influence restricted to post-perceptual decision processes. We review evidence supporting the prediction of interactive models that lexical influences can affect pre-lexical mechanisms, triggering compensation, adaptation, and retuning of phonological processes generally taken to be pre-lexical. These and other findings point to interactive processing as a fundamental principle for perception of speech and other modalities.
Holt, L. L. (2006).
Speech categorization in context: joint effects of non-speech and speech precursors.
Journal of the Acoustical Society of America
The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with non-speech tone contexts previously shown to affect speech categorization. Listeners’ context-dependent categorization across these conditions provides evidence that speech and non-speech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and non-speech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of non-speech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli.
Holt, L. L. & Lotto, A. J. (2006).
Cue weighting in auditory categorization: Implications for first and second language acquisition.
Journal of the Acoustical Society of America
The ability to integrate and weight information across dimensions is central to perception and is particularly important for speech categorization. The present experiments investigate acoustic cue weighting by training participants to categorize sounds drawn from a two-dimensional acoustic space defined by the center frequency (CF) and modulation frequency (MF) of frequency-modulated sine waves. These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias remained even when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, changes in weighting strategies can be affected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by adding variance to uninformative dimensions in training stimuli.
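A reliability-based ideal observer gives a useful baseline for these findings. The sketch below is illustrative only; the weighting rule and the numbers are assumptions, not the paper's model. With equal separation-to-spread ratios the cues should be weighted equally, and inflating the variance of one cue should push ideal weight onto the other, which is the direction listeners moved, though their initial bias toward CF departs from this baseline.

```python
def normalized_cue_weights(separations, spreads):
    """Ideal-observer-style weighting: each cue's reliability is its
    between-category separation divided by its within-category spread
    (a d'-like quantity), normalized so the weights sum to 1."""
    reliability = [sep / spread for sep, spread in zip(separations, spreads)]
    total = sum(reliability)
    return [r / total for r in reliability]

# Hypothetical (CF, MF) category statistics with equally informative cues:
equal = normalized_cue_weights([200.0, 20.0], [50.0, 5.0])
# Added variance on the CF dimension shifts ideal weight toward MF:
cf_noisy = normalized_cue_weights([200.0, 20.0], [150.0, 5.0])
```

The contrast between this normative baseline and listeners' observed CF bias is what makes the empirical result informative: equal discriminability and informativeness do not guarantee equal perceptual weight.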
Mirman, D., McClelland, J. L., & Holt, L. L. (2006).
Attentional modulation of lexical effects on speech perception: Computational and behavioral experiments.
Proceedings of the 28th Annual Conference of the Cognitive Science Society
A number of studies suggest that attention can modulate the extent to which lexical processing influences phonological processing. We propose dampening of activation as a neurophysiologically-plausible computational mechanism that can account for this type of modulation in the context of an interactive model of speech perception. Simulation results from two concrete implementations of this mechanism indicate that each of the implementations can account for attentional modulation of lexical feedback effects but that they have different consequences on the dynamics of lexical activation. We also present a behavioral test of attentional modulation of lexical effects that is not contaminated by task or stimulus effects.
Lotto, A. J. & Holt, L. L. (2006)
Putting phonetic context effects into context: A commentary on Fowler (2006).
Perception & Psychophysics
Based on a review of the literature and three new experiments, Fowler (2006) concludes that a "contrast account" for phonetic context effects is not tenable and is inferior to a gestural account. We believe that this conclusion is premature and that it is based on a restricted set of assumptions about a general perceptual account. Here, we briefly address the criticisms of Fowler (2006) with the intent of clarifying what a general auditory and learning approach to speech perception entails.
Holt, L. L., Stephens, J. D., & Lotto, A. J. (2005).
A critical evaluation of visually-moderated phonetic context effects.
Perception & Psychophysics
Fowler, Brown, and Mann (2000) report a visually-moderated phonetic context effect in which a video disambiguates an acoustically ambiguous precursor syllable that, in turn, influences perception of a following syllable. The present experiments explore this finding and claims that stem from it. Experiment 1 failed to replicate Fowler et al. with novel materials modeled after the original study, but Experiment 2 successfully replicated the effect using Fowler et al.'s stimulus materials. This discrepancy was investigated in Experiments 3 and 4, which demonstrate that variation in visual information concurrent with the target syllable is sufficient to account for the original results. The Fowler et al. visually-moderated phonetic context effect appears to have been a demonstration of audiovisual interaction between concurrent stimuli and not an effect whereby preceding visual information elicits changes in the perception of subsequent speech sounds.
Wade, T. & Holt, L. L. (2005c).
Perceptual effects of preceding non-speech rate on temporal properties of speech categories.
Perception & Psychophysics.
The rate of context speech presentation can influence speech perception. This study investigated the bounds of rate-dependent speech categorization, observing influences of non-speech precursor rate on speech perception. Four experiments tested effects of pure-tone presentation rate on perception of following speech continua involving duration-varying formant transitions that shared critical temporal and spectral characteristics with the tones. Results showed small but consistent shifts in the stop-continuant boundary distinguishing /ba/ and /wa/ syllables based on the rate of precursor tones, across differences in amplitude of tones and despite variability in their duration. Additionally, the shift was shown to involve the entire graded structure of the [w] category and was not limited to an ambiguous boundary region, affecting goodness judgments on both sides of an estimated best exemplar range. These results are problematic for accounts of rate-dependent processing that explicitly reference speech categories or articulatory events and are consistent with a contrast account.
Wade, T. & Holt, L. L. (2005b)
Incidental categorization of spectrally complex non-invariant auditory stimuli in a computer game task.
Journal of the Acoustical Society of America
This study examined perceptual learning of spectrally complex nonspeech auditory categories in an interactive multi-modal training paradigm. Participants played a computer game in which they navigated through a three-dimensional space while responding to animated characters encountered along the way. Characters' appearances in the game correlated with distinctive sound category distributions, exemplars of which repeated each time the characters were encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became harder to identify visually, so quick identification of approaching characters by sound patterns was, although never required or encouraged, of gradually increasing benefit. After thirty minutes of play, participants performed a categorization task, matching sounds to characters. Despite not being informed of audio-visual correlations, participants exhibited reliable learning of these patterns at post-test. Categorization accuracy was related to several measures of game performance and category learning was sensitive to category distribution differences modeling acoustic structures of speech categories. Category knowledge resulting from the game was qualitatively different from that gained from an explicit unsupervised categorization task involving the same stimuli. Results are discussed with respect to information sources and mechanisms involved in acquiring complex, context-dependent auditory categories, including phonetic categories, and to multi-modal statistical learning.
Wade, T. & Holt, L. L. (2005)
Effects of later-occurring non-linguistic sounds on speech categorization.
Journal of the Acoustical Society of America.
Non-speech stimuli influence phonetic categorization, but effects observed so far have been limited to precursors' influence on perception of following speech. However, both preceding and following speech affect phonetic categorization. This asymmetry raises questions about whether general auditory processes play a role in context-dependent speech perception. This study tested whether the asymmetry stems from methodological issues or genuine mechanistic limitations. To determine whether and how backward effects of non-speech context on speech may occur, one experiment examined perception of CVC words with [ga]-[da] series onsets followed by one of two possible embedded tones and one of two possible final consonants. When the tone was separated from the target onset by 100 ms, contrastive effects of tone frequency similar to those of previous studies were observed; however, when the tone was moved closer to the target segment, integrative effects were observed. In another experiment, contrastive effects of a following tone were observed in both CVC words and CV non-words, although the size of the effects depended on syllable structure. Results are discussed with respect to contrastive mechanisms not speech-specific but operating at a relatively high level, taking into account spectrotemporal patterns occurring over extended periods before and after target events.
Holt, L. L. (2005).
Temporally non-adjacent non-linguistic sounds affect speech categorization.
Psychological Science.
Speech perception is an ecologically important example of the highly context-dependent nature of perception; adjacent speech, and even non-speech, sounds influence how listeners categorize speech. Some theories emphasize linguistic or articulation-based processes in speech-elicited context effects and peripheral (cochlear) auditory perceptual interactions in non-speech-elicited context effects. The present studies challenge this division. Results of three experiments indicate that acoustic histories composed of sine-wave tones drawn from spectral distributions with different mean frequencies robustly affect speech categorization. These context effects are observed even when the acoustic context temporally adjacent to the speech stimulus is held constant and when more than a second of silence or multiple intervening sounds separate nonlinguistic acoustic context and speech targets. These experiments indicate that speech categorization is sensitive to statistical distributions of spectral information, even if the distributions are composed of non-linguistic elements. Acoustic context need neither be linguistic nor local to influence speech perception.
Kluender, K. R., Lotto, A. J., & Holt, L. L. (2005).
Contributions of nonhuman animal models to understanding human speech perception.
In S. Greenberg and W. Ainsworth (Eds.) Listening to Speech: An Auditory Perspective, Oxford University Press: New York, NY.
Broadly speaking, there are two ways nonhuman animal models contribute to our understanding of speech perception by humans -- by analogy and by homology. The former is generally the easier task, and historical examples are more abundant. Because demonstrating strict homology requires deeper explication of underlying mechanisms, claims may be more precarious, but they carry greater explanatory potential. When studying the nonhuman organism as analogy, emphasis is most often upon how animal neurophysiological or behavioral processes have adapted to fulfill some requirement of a particular ecological niche. By contrast, study of nonhumans as homology must exceed the bounds of such niches in search of common underlying mechanisms across varying ecologies. In practice, this frequently involves undermining ecological integrity -- presenting non-ecological stimuli, intruding into the cranium, having subjects press bars or peck keys, and controlling experience in ways that are unethical with human subjects. When studying common underlying processes (homology), it also is true that nonhuman animals become a method more than an object of study. In service of revealing processes central to human speech perception, data from experiments with nonhuman subjects join a greater arsenal that also includes data from human perception studies and from computational simulations. The problem, not the organism, dictates these methods.
Mirman, D., McClelland, J. L. & Holt, L. L. (2005).
Computational and behavioral investigations of lexically induced delays in phoneme recognition.
Journal of Memory and Language
Previous studies have failed to demonstrate lexically induced delays in phoneme recognition, casting doubt on interactive models of speech perception. Simulations of the interactive TRACE model make specific predictions about the conditions necessary for lexical feedback to delay phoneme recognition. TRACE simulations of previously tested conditions failed to produce lexically induced delay effects because the input was too unambiguous. Since between-layer connections are solely excitatory, between-layer delay effects can emerge only indirectly through facilitation of within-layer competition. If the lexically consistent phoneme partially matches the input acoustics, it will become partially active. Additional support from lexical feedback will extend the duration of competition between the acoustically present phoneme and the lexically consistent phoneme, thus delaying detection. This prediction is tested and confirmed in two behavioral experiments. These results answer one of the challenges to the interactive view of speech perception.
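The predicted delay mechanism can be illustrated with a toy two-unit competition. This is not the TRACE simulations themselves; the parameters and update rule are arbitrary assumptions chosen only to show the qualitative effect: feedback to a partially matching competitor prolongs within-layer competition and delays the target's detection.

```python
def steps_to_detect(lexical_feedback, threshold=0.9, rate=0.2,
                    inhibition=0.15, max_steps=500):
    """Toy within-layer competition: the acoustically present phoneme
    and a partially matching, lexically consistent competitor inhibit
    each other. Lexical feedback to the competitor extends the
    competition, delaying the target's crossing of the threshold."""
    target, competitor = 0.0, 0.0
    for step in range(1, max_steps + 1):
        target_input = 1.0                         # full acoustic match
        competitor_input = 0.6 + lexical_feedback  # partial match plus feedback
        new_target = target + rate * (target_input - target) - inhibition * competitor
        new_competitor = competitor + rate * (competitor_input - competitor) - inhibition * target
        target = max(0.0, min(1.0, new_target))
        competitor = max(0.0, min(1.0, new_competitor))
        if target >= threshold:
            return step
    return max_steps

# Detection of the target phoneme is slower when the lexicon feeds the competitor.
no_feedback = steps_to_detect(0.0)
with_feedback = steps_to_detect(0.2)
```

Note that the delay here is indirect, as in the abstract's argument: feedback never suppresses the target directly (connections between layers are excitatory); it only sustains the competitor, whose lateral inhibition slows the target.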
Holt, L. L., Lotto, A. J., & Diehl, R. L. (2004)
Auditory discontinuities interact with categorization: Implications for speech perception.
Journal of the Acoustical Society of America.
Behavioral experiments with infants, adults and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, we investigate how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize non-speech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.
Mirman, D., Holt, L. L. & McClelland, J. L. (2004).
Categorization and discrimination of non-speech sounds: Differences between steady-state and rapidly-changing acoustic cues.
Journal of the Acoustical Society of America.
Different patterns of performance across vowels and consonants in tests of categorization and discrimination indicate that vowels tend to be perceived more continuously, or less categorically, than consonants. The present experiments examined whether analogous differences in perception would arise in nonspeech sounds that share critical transient acoustic cues of consonants and steady-state spectral cues of simplified synthetic vowels. Listeners were trained to categorize novel nonspeech sounds varying along a continuum defined by a steady-state cue, a rapidly-changing cue, or both cues. Listeners' categorization of stimuli varying on the rapidly changing cue showed a sharp category boundary and posttraining discrimination was well predicted from the assumption of categorical perception. Listeners more accurately discriminated but less accurately categorized steady-state nonspeech stimuli. When listeners categorized stimuli defined by both rapidly-changing and steady-state cues, discrimination performance was accurate and the categorization function exhibited a sharp boundary. These data are similar to those found in experiments with dynamic vowels, which are defined by both steady-state and rapidly-changing acoustic cues. A general account for the speech and nonspeech patterns is proposed based on the supposition that the perceptual trace of rapidly-changing sounds decays faster than the trace of steady-state sounds.
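The proposed decay account can be made concrete with a toy model. This sketch is an assumption-laden illustration, not the authors' formal account: discrimination draws on a decaying auditory trace plus a category label, and the trace of a rapidly-changing cue is assumed to decay faster than that of a steady-state cue.

```python
import math

def discriminability(acoustic_difference, same_category, decay_rate,
                     delay=1.0, category_boost=0.5):
    """Toy dual-source discrimination: an exponentially decaying
    auditory trace plus a categorical label that only contributes
    when the pair straddles a category boundary."""
    trace = acoustic_difference * math.exp(-decay_rate * delay)
    label = 0.0 if same_category else category_boost
    return trace + label

# Hypothetical decay rates: rapidly-changing vs steady-state cue.
FAST, SLOW = 3.0, 0.3

# Within-category pairs: the steady-state trace survives the delay; the rapid one fades.
within_rapid = discriminability(1.0, same_category=True, decay_rate=FAST)
within_steady = discriminability(1.0, same_category=True, decay_rate=SLOW)
# Across the boundary, the category label props up discrimination of rapid cues.
across_rapid = discriminability(1.0, same_category=False, decay_rate=FAST)
```

With a fast-decaying trace, discrimination is good across the boundary and poor within it, the classic "categorical" pattern, while a slow-decaying trace preserves within-category detail, yielding the more continuous perception observed for steady-state stimuli.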
Holt, L. L. & Wade, T. (2004).
Non-linguistic sentence-length precursors affect speech perception: implications for speaker and rate normalization.
Proceedings of From Sound to Sense: Fifty+ Years of Discoveries in Speech Communication.
Speech contexts can influence phonetic perception considerably, even across extended temporal windows. For example, manipulating spectral or temporal characteristics of precursor sentences leads to dramatic changes in categorization of subsequent vowels and consonants (e.g., Ladefoged & Broadbent, 1957; Summerfield, 1981). These findings often have been discussed in terms of speaker and rate normalization. The present study aimed to uncover precisely which types of information in the speech signal subserve such shifts in speech categorization. A series of experiments examined the influence of sentence-length non-speech precursors--series of brief pure tones--on the perception of speech segments with which they shared critical spectral and temporal properties. Across multiple experimental manipulations, the non-speech precursors affected the perceived place (alveolar, velar) and manner (stop, glide) of articulation of synthesized English consonants. Effects were observed even when non-speech precursor series were temporally-nonadjacent to the speech categorization targets and even when multiple interrupting acoustic events separated precursor and target. Both category boundary shifts and changes in graded internal category structure were observed. These results indicate that the auditory system is sensitive to both spectral and temporal information conveyed by non-linguistic sounds across sentence-length temporal windows. Moreover, this sensitivity influences speech categorization, highlighting that general auditory processing may play a role in the speech categorization shifts described as rate and speaker normalization.
Lacerda, F., Sundberg, U., Carlson, R. & Holt, L. L. (2004).
Modeling interactive language learning: Project presentation.
Proceedings of FONETIK 2004.
This paper describes a recently started inter-disciplinary research program aiming at investigating and modeling fundamental aspects of the language acquisition process. The working hypothesis assumes that general purpose perception and memory processes, common to both human and other mammalian species, along with the particular context of initial adult-infant interaction, underlie the infant's ability to progressively derive linguistic structure implicitly available in the ambient language. The project is conceived as an interdisciplinary research effort involving the areas of Phonetics, Psychology and Speech recognition. Experimental speech perception techniques will be used at Dept. of Linguistics, SU, to investigate the development of the infant's ability to derive linguistic information from situated connected speech. These experiments will be matched by behavioural tests of animal subjects, carried out at CMU, Pittsburgh, to disclose the potential significance that recurrent multi-sensory properties of the stimuli may have for spontaneous category formation. Data from infant and child vocal productions as well as infant-adult interactions will also be collected and analyzed to address the possibility of a production-perception link. Finally, the data from the infant and animal studies will be integrated and tested in mathematical models of the language acquisition process, developed at TMH, KTH.
Diehl, R. L., Lotto, A. J. & Holt, L. L. (2004).
Speech perception.
Annual Review of Psychology, 55, 149-179.
This chapter focuses on one of the first steps in comprehending spoken language: How do listeners extract the most fundamental linguistic elements - consonants and vowels, or the distinctive features which compose them - from the acoustic signal? We begin by describing three major theoretical perspectives on the perception of speech. Then we review several lines of research that are relevant to distinguishing these perspectives. The research topics surveyed include categorical perception, phonetic context effects, learning of speech and related nonspeech categories, and the relation between speech perception and production. Finally, we describe challenges facing each of the major theoretical perspectives on speech perception.
Pollak, S. D., Holt, L. L. & Wismer Fries, A. B. (2004)
Hemispheric asymmetries in children's perception of nonlinguistic human affective sounds.
Developmental Science, 7, 10-18.
In the present work, we developed a database of nonlinguistic sounds that mirror prosodic characteristics typical of language and thus carry affective information, but do not convey linguistic information. In a dichotic-listening task, we used these novel stimuli as a means of disambiguating the relative contributions of linguistic and affective processing across the hemispheres. This method was applied to both children and adults with the goal of investigating the role of developing cognitive resource capacity on affective processing. Results suggest that children's limited computational resources influence how they process affective information and rule out attentional biases as a factor in children's perceptual asymmetries for nonlinguistic affective sounds. These data further suggest that investigation of perception of nonlinguistic affective sounds is a valuable tool in assessing interhemispheric asymmetries in affective processing, especially in parceling out linguistic contributions to hemispheric differences.
-------------- back to Publications --------------
Lotto, A. J., Sullivan, S. C., & Holt, L. L. (2003).
Central locus for non-speech effects on phonetic identification.
Journal of the Acoustical Society of America, 113(1), 53-56.
Recently, Holt and Lotto [Hear. Res. 167, 156-169 (2002)] reported that
preceding speech sounds can influence phonetic identification of a
target syllable even when the context sounds are presented to the
opposite ear or when there is a long intervening silence. These results
led them to conclude that phonetic context effects are mostly due to
non-peripheral auditory interactions. In the present paper, similar
presentation manipulations were made with non-speech context sounds.
The results agree qualitatively with the results for speech contexts.
Taken together, these findings suggest that the same non-peripheral
mechanisms may be responsible for effects of both speech and non-speech
context on phonetic identification.
-------------- back to Publications --------------
Stephens, J. D. W., & Holt, L. L. (2003).
Preceding phonetic context affects perception of nonspeech.
Journal of the Acoustical Society of America, 114, 3036-3039.
A discrimination paradigm was used to detect the influence of phonetic context on speech (experiment 1a) and nonspeech (experiment 1b) stimuli. Results of experiment 1a were consistent with the previously observed phonetic context effect of liquid consonants (/l/ and /r/) on subsequent stop consonant (/g/ and /d/) perception. Experiment 1b demonstrated a context effect of liquid consonants on subsequent nonspeech sounds that were spectrally similar to the stop consonants. The results are consistent with findings that implicate spectral contrast in phonetic context effects.
-------------- back to Publications --------------
Holt, L. L., & Lotto, A. J. (2002).
Behavioral examinations of the neural mechanisms of speech context effects.
Hearing Research, 167, 156-169
One of the central findings of speech perception is that identical acoustic signals can be perceived as different speech sounds depending on adjacent speech context. Although these phonetic context effects are ubiquitous in speech perception, their neural mechanisms remain largely unknown. The present work presents a review of recent data suggesting that spectral content of speech mediates phonetic context effects and argues that these effects are likely to be governed by general auditory processes. A descriptive framework known as spectral contrast is presented as a means of interpreting these findings. Finally, and most centrally, four behavioral experiments that begin to delineate the level of the auditory system at which interactions among stimulus components occur are described. Two of these experiments investigate the influence of diotic versus dichotic presentation upon two phonetic context effects. Results indicate that context effects remain even when context is presented to the ear contralateral to that of the target syllable. The other two experiments examine the time course of phonetic context effects by manipulating the silent interval between context and target syllables. These studies reveal that phonetic context effects persist for hundreds of milliseconds. Results are interpreted in terms of auditory mechanisms with particular attention to the putative link between auditory enhancement and phonetic context effects.
-------------- back to Publications --------------
Holt, L. L., Lotto, A. J., & Kluender, K. R. (2001).
Influence of fundamental frequency on stop-consonant voicing perception:
A case of learned covariation or auditory enhancement?
Journal of the Acoustical Society of America, 109, 764-774.
For stimuli modeling stop consonants varying in the acoustic
correlates of voice onset time (VOT), human listeners are more likely to perceive
stimuli with lower f0s as voiced consonants – a pattern of perception
that follows regularities in English speech production. The present study
examines the basis of this observation. One hypothesis is that lower f0s enhance
perception of voiced stops by virtue of perceptual interactions that arise
from the operating characteristics of the auditory system. A second hypothesis
is that this perceptual pattern develops as a result of experience with f0-voicing
covariation. In a test of these hypotheses, Japanese quail learned to respond
to stimuli drawn from a series varying in VOT through training with one of
three patterns of f0-voicing covariation. Voicing and f0 varied in the natural
pattern (shorter VOT, lower f0), in an inverse pattern (shorter VOT, higher
f0), or in a random pattern (no f0-voicing covariation). Birds trained with
stimuli that had no f0-voicing covariation exhibited no effect of f0 on response
to novel stimuli varying in VOT. For the other groups, birds’ responses
followed the experienced pattern of covariation. These results suggest f0
does not exert an obligatory influence on categorization of consonants as
[VOICE] and emphasize the learnability of covariation among acoustic characteristics
of speech.
-------------- back to Publications --------------
Holt, L. L. & Kluender, K. R. (2000).
General auditory processes contribute to perceptual accommodation of coarticulation.
Phonetica, 57, 170-180. Invited contribution to a special issue on Speech Communication and Language Development.
The ability of listeners to recover speech information,
despite dramatic articulatory and acoustic assimilation between adjacent
speech sounds, is remarkable and central to understanding perception of fluent
speech. Lindblom (1963) shared with the field some of the most compelling
early descriptions of the acoustic effects of coarticulation, and with Studdert-Kennedy
(1967), provided perceptual data that remain central to theorization about
processes for perceiving coarticulated speech. In years that followed, hypotheses
by others, which intended to explain the ability to maintain perceptual constancy
despite coarticulation, have relied in some way or another upon relatively
detailed reference to speech articulation. A number of new findings are reported
here which suggest general auditory processes, not at all specific
to speech, contribute significantly to perceptual accommodation of coarticulation.
Studies using nonspeech flanking-energy, capturing minimal spectral aspects
of speech, suggest simple processes (that can be portrayed as contrastive)
serve to “undo” assimilative effects of coarticulation. Data
from nonhuman animal subjects suggest broad generality of these processes.
At a more mechanistic explanatory level, psychoacoustic and neurophysiological
data suggestive of underlying sensory and neural mechanisms are presented.
Lindblom and Studdert-Kennedy’s early hypotheses about the potential
for such mechanisms are revived and supported.
-------------- back to Publications --------------
Lotto, A. J.,& Holt, L. L. (2000).
The illusion of the phoneme.
In Chicago Linguistic Society, Volume 35: The Panels. Chicago: Chicago Linguistic Society. 191-204.
A caveat is warranted here. While our title is provocative, our ambitions are much more prosaic. Obviously the debate on the ontological status of
the phoneme has a long and complicated history. We offer neither a summary
of this debate nor a last word on the question. We seek only to question
the role of the phoneme in the perception of speech and, in doing so, we hope
to demonstrate that the empirical evidence for the causal role of the phoneme
in perception is limited.
-------------- back to Publications --------------
Holt, L. L., Lotto, A. J., & Kluender, K. R. (2000)
Neighboring spectral content influences vowel identification.
Journal of the Acoustical Society of America, 108, 710-722
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables /bVb/ and /dVd/ varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from /uh/ to /eh/ were synthesized. In a forced-choice identification task, listeners more often labeled vowels as /uh/ in /dVd/ context than in /bVb/ context. To examine whether spectral content predicts this effect, nonspeech–speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2s to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling /dVd/ were more often labeled as /uh/ than were the same vowels surrounded by nonspeech modeling /bVb/. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude, more diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts.
The data are discussed in terms of a general perceptual account of context effects in speech perception.
-------------- back to Publications --------------
Lotto, A. J., Kluender, K. R., & Holt, L. L. (2000).
Effects of language experience on organization of vowel sounds.
In M. Broe & J. Pierrehumbert (Eds.), Laboratory Phonology V: Language Acquisition and the Lexicon. Cambridge University Press: Cambridge.
A critical reappraisal of the Perceptual Magnet Effect that offers an outline of an alternative account and directions for future research.
-------------- back to Publications --------------
Holt, L. L., Lotto, A. J., & Kluender, K. R. (1998).
Incorporating principles of general learning in theories of language acquisition.
In M. Gruber, C. Derrick Higgins, K. S. Olson & T. Wysocki (Eds.), Chicago Linguistic Society, Volume 34: The Panels. Chicago: Chicago Linguistic Society, 253-268.
Learning is fundamental to language acquisition. Infants, after all, do become speakers of the language or languages native to the community in which they are reared, and experience with a native language is known to affect phonetic perception. Thus, from any theoretical standpoint, learning is pivotal to language acquisition. However,
few theories have taken the role of experience to its logical extent and examined
the degree to which language acquisition may be accounted for by principles
of general learning, without reference to innate constructs.
Principles of general learning, applied to language acquisition, generate provocative and testable predictions. For example, it has long been noted that the interior structure of phonetic categories is graded. That is, not all spoken exemplars of a speech sound
are judged to be equally effective category members. Several models, including
the Native Language Magnet Theory, have been promoted to account for within-category
structure with specialized mechanisms for the creation of internal representations.
However, simple principles of general learning also predict that functionally-equivalent
sounds will exhibit graded structure because, with experience, organisms'
responses come to mimic the statistical distributions of input structure as
well as reinforcement contingencies. Recent experiments with avian models
have provided support for an account of graded category structure based on
general learning principles.
Principles of general learning also may be able to account for the observation that whereas young infants excel at discriminating nonnative speech contrasts, older infants (like adults) are able only to discriminate among native speech contrasts and contrasts
that are acoustically very different from speech sounds used in their native
language. Given the modest assumptions that native and nonnative speech sounds
possess different statistical probability distributions, statistical models
of learning may account for infants' changing discrimination behavior. To
determine whether a general learning theory might account for the developmental
trajectory of speech-contrast discrimination in the first year of language
acquisition, experiments to test sensitivity to distributional differences
are needed. Recent empirical investigations [Saffran, 1997] suggest
that infants are indeed sensitive to statistical distributions, at least as
they relate to the segmentation of adjacent phonemes.
Ultimately, the precise underpinnings of experience-based phonetic acquisition may best be determined with nonhuman animal models, in which the statistical input
distributions of speech sounds can be ethically controlled and manipulated.
A methodology for modeling phonetic category acquisition in a nonhuman animal
will be presented and discussed in terms of statistical learning theory.
-------------- back to Publications --------------
Kluender, K. R., Lotto, A. J., Holt, L. L., & Bloedel, S. B. (1998).
Role of experience in language-specific functional mappings for vowel sounds as inferred from human, nonhuman, and computational models.
Journal of the Acoustical Society of America, 104, 3568-3582.
Studies involving human infants and monkeys suggest that experience plays a critical role in modifying how subjects respond to vowel sounds between and within
phonemic classes. Experiments with human listeners were conducted to establish
appropriate stimulus materials. Then, eight European starlings (Sturnus vulgaris)
were trained to respond differentially to vowel tokens drawn from stylized
distributions for the English vowels /i/ and /I/, or from two distributions
of vowel sounds that were orthogonal in the F1–F2 plane. Following training,
starlings' responses generalized with facility to novel stimuli drawn from
these distributions. Responses could be predicted well on the bases of frequencies
of the first two formants and distributional characteristics of experienced
vowel sounds with a graded structure about the central "prototypical" vowel
of the training distributions. Starling responses corresponded closely to
adult human judgments of "goodness" for English vowel sounds. Finally, a simple
linear association network model trained with vowels drawn from the avian
training set provided a good account for the data. Findings suggest that little
more than sensitivity to statistical regularities of language input (probability–density distributions) together with organizational processes that serve to enhance distinctiveness may accommodate much of what is known about the functional equivalence of vowel sounds.
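The "simple linear association network" idea can be illustrated with a minimal sketch. This is not the authors' implementation: the formant centroids, spread, learning rate, and normalization below are all hypothetical placeholders, chosen only to show how Hebbian weight accumulation over two vowel distributions yields graded responses around each distribution's centroid, with no built-in prototype mechanism.

```python
import random

random.seed(0)

# Hypothetical F1/F2 centroids (Hz) for two vowel distributions,
# loosely in the range of English /i/ and /I/; not the study's stimuli.
CENTROIDS = {"i": (300.0, 2300.0), "I": (450.0, 2000.0)}
SPREAD = 60.0  # standard deviation of each training distribution (Hz)

def sample(vowel):
    """Draw one training token from a Gaussian distribution of formants."""
    f1, f2 = CENTROIDS[vowel]
    return (random.gauss(f1, SPREAD), random.gauss(f2, SPREAD))

def normalize(x):
    # crude scaling so both formants contribute comparably
    return (x[0] / 500.0, x[1] / 2500.0)

# Hebbian-style learning: weights accumulate input-target correlations.
weights = {"i": [0.0, 0.0], "I": [0.0, 0.0]}
lr = 0.01
for _ in range(2000):
    v = random.choice(["i", "I"])
    x = normalize(sample(v))
    for cat in weights:
        target = 1.0 if cat == v else -1.0
        weights[cat][0] += lr * target * x[0]
        weights[cat][1] += lr * target * x[1]

def response(stimulus):
    """Linear association: dot product of stimulus with each category's weights."""
    x = normalize(stimulus)
    return {cat: w[0] * x[0] + w[1] * x[1] for cat, w in weights.items()}

# Responses are graded: a token at the /i/ centroid evokes a stronger /i/
# response than a token displaced toward the /I/ distribution.
center = response(CENTROIDS["i"])
edge = response((CENTROIDS["i"][0] + 120, CENTROIDS["i"][1] - 250))
print(center["i"] > edge["i"])
```

Because the weights simply track the statistics of the input, graded "goodness" falls out of the training distributions themselves, which is the point the abstract makes against specialized prototype mechanisms.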
-------------- back to Publications --------------
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1998).
Depolarizing the perceptual magnet effect.
Journal of the Acoustical Society of America, 103, 3648-3655.
In recent years there has been a great deal of interest in demonstrations of
the so-called "Perceptual-Magnet Effect" (PME). In these studies, AX-discrimination
tasks purportedly reveal that discriminability of speech sounds from a single
category varies with judged phonetic "goodness" of the sounds. However, one
possible confound is that category membership is determined by identification
of sounds in isolation, whereas discrimination tasks include pairs of stimuli.
In the first experiment of the current study, identifications and goodness
judgments were obtained for vowels (/i/–/e/) presented in pairs. A substantial
shift in phonetic identity was evidenced with changes in the context vowel.
In a second experiment, listeners participated in an AX-discrimination task
with the vowel pairs from the first experiment. Using the contextual identification
functions from the first experiment, predictions of discriminability were
calculated using the classic tenets of Categorical Perception. Obtained discriminability
functions were well accounted for by predictions from identification. There
was no additional unexplained variance that required the proposal of "perceptual
magnets." These results suggest that PME may be nothing more than further
demonstration that general discriminability is greater for cross-category
stimulus pairs than for within-category pairs.
-------------- back to Publications --------------
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1997).
Animal models of speech perception phenomena.
In K. Singer, R. Eggert, & G. Anderson (Eds.), Chicago
Linguistic Society, Volume 33, 357-367, Chicago Linguistic Society:
Chicago, IL.
Three phenomena often argued to be indicative of perceptual processes specialized for speech are: 1) the lack of invariant acoustic attributes which map directly
onto perceived phonemic identity of speech sounds; 2) the symmetry between
speech production and perception, e.g. the apparent perceptual compensation
for acoustic effects of coarticulation; 3) the ability of infants to discriminate
and categorize speech sounds in a linguistically-relevant manner. These
phenomena are often explained with reference to a putative species-specific
perceptual module which is dedicated to the perception of phonetic signals.
In recent years, the species specificity of these phenomena has been called
into question by experiments with nonhuman animals. We will present
data from two avian species which address each of the speech-perception phenomena
listed above. In the first study, Japanese quail (Coturnix japonica)
trained to respond when presented a syllable-initial [d] as opposed to syllable-initial
[b] or [g] readily generalize this learning across a broad range of following
vowel sounds. This “phonetic labeling” occurs despite a
lack of any readily apparent acoustic invariant to signal a correct response.
In a second study, Japanese quail were trained to respond differentially to
[da] and [ga]. When these CVs were presented following [al] or [ar],
there was a shift in the birds' responses. This shift is similar to one
reported for human adults hearing similar speech sounds. Remarkably,
this shift in perception appears to compensate for the assimilative acoustic
effects of coarticulation. Because quail also demonstrate such a shift,
it is reasonable to question whether the symmetry between speech perception
and production may result from rather general processes of audition (spectral
contrast). Finally, we will present data from a learning study with
European starlings (Sturnus vulgaris). Starlings were trained to peck
to vowels from two different distributions of sounds (e.g. [i] vs. [I]). The
resulting structure of the birds' behavior after less than 100 hours of exposure
was very similar to that of human adults categorizing the same stimuli. The
response structure of the birds and human adults can be well predicted from
simple models of association (Hebbian learning) and general laws of learning.
Given this demonstration of learning by birds with brains the size of almonds,
the ability of infants to respond in a manner that is relevant to their language
experience becomes less remarkable. Taken together with data from other
animal studies, these results strongly suggest that the perception of speech
sounds is accomplished by quite general perceptual processes and not by a
specialized speech module. In fact, it appears that speech, as a communication
system, takes advantage of operating characteristics of the auditory system
which are common across a number of species.
-------------- back to Publications --------------
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1997).
Effect of voice quality on perceived height of English vowels.
Phonetica, 54, 76-93 .
Across a variety of languages, phonation type and vocal-tract shape systematically covary in vowel production. Breathy phonation tends to accompany vowels
produced with a raised tongue body and/or advanced tongue root. A potential
explanation for this regularity, based on an hypothesized interaction between
the acoustic effects of vocal-tract shape and phonation type, is evaluated.
It is suggested that increased spectral tilt and first-harmonic amplitude
resulting from breathy phonation interacts with the lower-frequency first
formant resulting from a raised tongue body to produce a perceptually "higher"
vowel. To test this hypothesis, breathy and modal versions of vowel
series modeled after male and female productions of English vowel pairs /i/
and /I/, /u/ and /U/, and /ʌ/ and /ɑ/ were synthesized. Results
indicate that for most cases, breathy voice quality led to more tokens being
identified as the higher vowel (i.e. /i/, /u/, /ʌ/). In addition,
the effect of voice quality is greater for vowels modeled after female productions.
These results are consistent with an hypothesized perceptual explanation for
the covariation of phonation type and tongue-root advancement in West African
languages. The findings may also be relevant to gender differences in
phonation type.
-------------- back to Publications --------------
Lotto, A. J., Kluender, K. R., & Holt, L. L. (1997).
Perceptual compensation for coarticulation by Japanese quail (Coturnix coturnix japonica).
Journal of the Acoustical Society of America, 102, 1134-1140.
When members of a series of synthesized stop consonants varying in third-formant
(F3) characteristics and varying perceptually from /da/ to /ga/ are preceded
by /al/, human listeners report hearing more /ga/ syllables than when the
members of the series are preceded by /ar/. It has been suggested that this
shift in identification is the result of specialized processes that compensate
for acoustic consequences of coarticulation. To test the species-specificity
of this perceptual phenomenon, data were collected from nonhuman animals in
a syllable "labeling" task. Four Japanese quail (Coturnix coturnix japonica)
were trained to peck a key differentially to identify clear /da/ and /ga/
exemplars. After training, ambiguous members of a /da/–/ga/ series
were presented in the context of /al/ and /ar/ syllables. Pecking performance
demonstrated a shift which coincided with data from humans. These results
suggest that processes underlying "perceptual compensation for coarticulation"
are species-general. In addition, the pattern of response behavior expressed
is rather common across perceptual systems.
-------------- back to Publications --------------