
Conference Presentations





2013

20th Annual Meeting of the Cognitive Neuroscience Society
San Francisco, California
April 13-16, 2013

Investigating the Neural Basis of Video-game-based Category Learning
Sung-Joo Lim, Julie A. Fiez, Mark E. Wheeler & Lori L. Holt

Poster available in PDF format



2011

52nd Annual Meeting of the Psychonomic Society
Seattle, Washington
November 3-6, 2011

Acoustic similarity influences the “phonological” similarity effect
Jingyuan Huang & Lori Holt

Poster available in PDF format

The phonological similarity effect (PSE) is believed to support the abstract phonological nature of short-term memory: immediate serial recall of phonologically-similar items (e.g., b, d, g, t, c) is poorer than that of dissimilar items (f, q, r, h, y; Conrad, 1964). Here, we use the PSE to investigate the existence of more graded auditory information in short-term memory. In experiment 1, perceptually-ambiguous vowels appeared in word contexts so that although acoustic information was constant, phonological information varied. Recall of items with acoustically-identical, but phonologically distinct vowels was poorer than recall of the same items with acoustically-distinct vowels. In experiment 2, six-item lists were composed of words possessing either three acoustically similar (/æ/, /e/, and /ʌ/) or dissimilar (/u/, /i/, or /a/) categories. Recall was better for acoustically-dissimilar lists although each had three phonological categories. The results indicate that acoustic similarity is sufficient to elicit a “PSE” for phonologically-equivalent items (Experiment 1) and phonological similarity is not necessary for the effect (Experiment 2). The results suggest detailed acoustic information is preserved in auditory short-term memory.




161st Meeting of the Acoustical Society of America
Seattle, Washington
May 23-27, 2011

Learning acoustically complex word-like units within a video-game training paradigm.
Sung-joo Lim, Francisco Lacerda & Lori L. Holt

Poster available in PDF format

Over the course of language development, infants learn native speech categories and word boundaries from speech input. Although speech category learning and word segmentation learning occur in parallel, most investigations have focused on one, assuming somewhat mature development of the other. To investigate the extent to which listeners can simultaneously solve the categorization and segmentation learning challenges, we created an artificial, nonlinguistic stimulus space that modeled the acoustic complexities of natural speech by recording a single talker’s multiple utterances of a set of sentences containing four keywords. There was acoustic variability across utterances, presenting a categorization challenge. The keywords were embedded in continuous speech, presenting a segmentation challenge. Sentences were spectrally rotated, rendering them wholly unintelligible, and presented within a videogame training paradigm that does not rely upon explicit feedback and yet is effective in training nonspeech and nonnative speech categorization [Wade & Holt (2005); Lim & Holt (submitted)]. With just 2 h of play, adult listeners reliably extracted word-length sound categories from continuous sound streams and generalized learning to novel tokens. The amount of “sentence” variability within training did not influence learning.
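Spectral rotation is a standard signal-processing manipulation: band-limit the speech, then ring-modulate it so that energy at frequency f is mirrored within the band, preserving acoustic complexity while destroying intelligibility. A minimal Python sketch of the general technique (the 4 kHz band, filter order, and file names are illustrative assumptions, not the study's actual parameters):

```python
# Sketch of spectral rotation: low-pass the signal to a band, then multiply
# by a sinusoid at the band edge so each component at f maps to (band - f),
# mirroring the spectrum. Assumes a mono 16-bit WAV; all values illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, sosfiltfilt

def spectrally_rotate(x, fs, band_hz=4000.0):
    """Mirror the spectrum of x about band_hz / 2."""
    sos = butter(8, band_hz, btype="low", fs=fs, output="sos")
    x_lp = sosfiltfilt(sos, x)                   # keep only 0..band_hz
    t = np.arange(len(x_lp)) / fs
    y = x_lp * np.cos(2 * np.pi * band_hz * t)   # f maps to band_hz +/- f
    y = sosfiltfilt(sos, y)                      # drop the upper image
    return y / (np.max(np.abs(y)) + 1e-12)       # normalize

fs, x = wavfile.read("sentence.wav")             # hypothetical input file
rotated = spectrally_rotate(x.astype(float), fs)
wavfile.write("sentence_rotated.wav", fs, (rotated * 32767).astype(np.int16))
```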


Cue-weighting for speech categorization changes based on regularities in short-term speech input.
Ran Liu, Howard Soh, & Lori L. Holt

Poster available in PDF format

The ability to flexibly adapt long-term speech category representations to informative regularities in short-term input is critical for online speech perception. The present experiment investigates how short-term changes in the variability of two distinct acoustic cues affect the relative weighting of the cues for speech categorization. Native English adults distinguish the vowel categories /æ/ and /ɛ/ using both spectral and duration cues. A spectral continuum from /æ/ to /ɛ/ was crossed with a duration continuum between the same vowels to synthesize a 2D grid of words ranging between “set” and “sat.” Baseline categorization data were collected from trials sampling the full grid of words; for each trial, listeners selected whether they heard set or sat. Then, they received short-term exposure to trials drawn only from subsets of stimuli for which one cue was held constant while the other cue exhibited the full range of variability. Results reveal that listeners shift their relative cue weights, compared to baseline, to rely more on the highly variable cue for categorization. This suggests that listeners track the variability present across multiple acoustic cues and dynamically adjust speech categorization to reflect this short-term statistical regularity.
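One common way to quantify relative cue weighting (offered as a sketch of the general analysis idea, not necessarily the authors' method) is to regress binary categorization responses on standardized cue values and compare coefficient magnitudes. A hypothetical sketch with simulated data:

```python
# Sketch: estimate relative cue weights by logistic regression of "sat"
# responses on standardized spectral and duration cue values. The simulated
# listener below weights the spectral cue 3:1 over duration; all numbers
# are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
spectral = rng.uniform(0, 1, 500)   # spectral cue value on each trial
duration = rng.uniform(0, 1, 500)   # duration cue value on each trial
p_sat = 1 / (1 + np.exp(-(6 * (spectral - 0.5) + 2 * (duration - 0.5))))
responses = rng.random(500) < p_sat            # True = "sat" response

X = np.column_stack([spectral, duration])
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize the two cues
coef = np.abs(LogisticRegression().fit(X, responses).coef_.ravel())
w = coef / coef.sum()
print(f"spectral weight = {w[0]:.2f}, duration weight = {w[1]:.2f}")
```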


Long-Term Spectral Average Predicts Accent Normalization
Jingyuan Huang, Lori L. Holt & Andrew J. Lotto

Poster available in PDF format

Several studies have manipulated the accent of a spoken passage, observing shifts in subsequent speech perception. Such “accent normalization” may be related to remapping phonetic space or, alternatively, to tuning auditory representations via context. In the current experiments, artificial “accents” with precisely specified acoustic vowel distributions isolated the influence of these two potential mechanisms. Participants were exposed to a spoken passage drawn from Dr. Seuss’ The Foot Book with /i/ shifted to /I/ (higher F1), or /U/ shifted to /u/ (lower F1), or with both manipulations, and then categorized /i/-/I/ or /U/-/u/ series. The third context had shifted phonetic categories, but little change in the context’s long-term average spectrum (LTAS), known to influence subsequent speech categorization. As expected, higher-F1 contexts led to more low-F1 (/i/, /u/) target responses whereas lower-F1 contexts predicted more high-F1 (/I/, /U/) target responses. However, we did not observe shifted vowel categorization after the context with little LTAS change (/i/ and /U/ shifted in opposing directions). Thus, although lexical information suggested shifted phonetic categories in this condition, there was no accent normalization. General auditory mechanisms sensitive to LTAS may play an important role in what has been thought to be accent-dependent remapping of phonetic space.
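The LTAS itself is simply the power spectrum averaged over an entire passage; comparing LTAS across context passages quantifies the spectral difference available to a general auditory mechanism. A minimal sketch (file names and analysis settings are hypothetical):

```python
# Sketch: compute a passage's long-term average spectrum (LTAS) by averaging
# short-time power spectra (Welch's method), then compare two contexts.
import numpy as np
from scipy.io import wavfile
from scipy.signal import welch

def ltas_db(path):
    fs, x = wavfile.read(path)                 # hypothetical file path
    f, pxx = welch(x.astype(float), fs=fs, nperseg=2048)
    return f, 10 * np.log10(pxx + 1e-12)       # power (dB) per frequency bin

f, baseline = ltas_db("context_unshifted.wav")
_, shifted = ltas_db("context_both_vowels_shifted.wav")
# A near-zero dB difference across f indicates little LTAS change despite
# the phonetic manipulation, as in the third context condition above.
print(np.round(shifted - baseline, 2))
```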




Carnegie Mellon Meeting of the Minds, 2011
Pittsburgh, PA, USA
May 4, 2011

The Zen Mozart: Effects of Mindfulness Meditation and Classical Music on Visuospatial Skills
Cynthia S. Peng and Lori L. Holt, PhD

First place in the Psychology Department Poster Competition

Poster available in PDF format


Cue Weighting in Speech Categorization Based on Short-Term Cue Variability: "Set" vs. "Sat"
Howard Soh, Ran Liu, and Lori L. Holt

Runner up in the Psychology Department Poster Competition

Poster available in PDF format






2010

35th Boston University Conference on Language Development
Boston, MA, USA
November 5-7, 2010

Development of adult-like speech categorization in 3- to 5-year-olds.
Dan Hufnagle and Lori L. Holt

Poster available in PDF format

Auditory context influences listeners' perception of speech, even when context consists of nonspeech tones. Explanations of speech context effects that rely on the recovery of speech-specific information make no predictions about the development of nonspeech context effects, while general auditory accounts of nonspeech context effects that rely on general characteristics of audition would predict strong effects regardless of the developmental level of the speech system. 3- and 5-year-olds categorized 7 speech targets from a /da/ to /ga/ continuum to gather baseline data on which targets were ambiguous for each child. Children then categorized ambiguous speech sounds that followed nonspeech contexts that consisted of a 22-tone melody. In adults, lower-frequency contexts shift perception towards higher-energy /da/, while higher-frequency contexts shift perception to /ga/. Children exhibited strong, adult-like context effects, even though their categorization without context was not adult-like, providing support for general auditory explanations of context effects.
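The nonspeech contexts in such studies are sequences of brief pure tones sampled from a frequency band; a sketch of how they might be synthesized (band edges, tone duration, and sampling rate are illustrative, not the study's values):

```python
# Sketch: build a nonspeech tone-melody context, then follow it with the
# speech target. All frequencies and durations are made-up placeholders.
import numpy as np

FS = 22050  # sampling rate (Hz), illustrative

def tone(freq_hz, dur_s, ramp_s=0.005):
    """A pure tone with brief linear onset/offset ramps to avoid clicks."""
    t = np.arange(int(dur_s * FS)) / FS
    y = np.sin(2 * np.pi * freq_hz * t)
    n = int(ramp_s * FS)
    y[:n] *= np.linspace(0, 1, n)
    y[-n:] *= np.linspace(1, 0, n)
    return y

def tone_melody(lo_hz, hi_hz, n_tones=22, dur_s=0.07, seed=1):
    """A melody of n_tones pure tones drawn from the band [lo_hz, hi_hz]."""
    rng = np.random.default_rng(seed)
    return np.concatenate([tone(f, dur_s) for f in rng.uniform(lo_hz, hi_hz, n_tones)])

# Per the result above: low-band melodies bias ambiguous targets toward /da/,
# high-band melodies toward /ga/. Band edges here are illustrative only.
low_context = tone_melody(1000, 1500)
high_context = tone_melody(2300, 2800)
```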




2009

158th Meeting of the Acoustical Society of America
San Antonio, TX, USA
October 26-30, 2009

Autistic traits predict individual differences in speech categorization.
Dan Hufnagle, Lori L. Holt, and Erik D. Thiessen

Poster available in PDF format

Investigating individual differences in speech perception using measures of “autistic” traits in neurotypicals can gauge natural variability in speech processing [M. Stewart and M. Ota, Cognition 109, 157–162 (2008)]. Using the autism-spectrum quotient (AQ) [Baron-Cohen et al., J. Autism & Dev. Disord. 31, 5–25 (2001)], which measures autistic traits in neurotypicals, we investigated individual differences in context-dependent speech processing. Twenty-eight neurotypicals categorized a nine-step da/ga series in the context of non-speech tone precursors [following L. Holt, Psychol. Sci. 16, 305–312 (2005)] and completed the AQ. Context included three tone groups, including relatively high (shift toward ga), medium, and low (shift toward da) tones. Overall, the temporally adjacent tone grouping shifted perception more than distant context (p<0.001). Effects correlated with AQ (r<0.53). Lower AQ (fewer autistic traits) is associated with near-zero context dependence for endpoint categorization and large context-dependence for ambiguous speech-target categorization. Higher AQ is associated with intermediate influence of context across the series. Individual differences in context-dependent phonetic processing can be predicted from a personality trait scale, suggesting that phonetic processing is not immune from the influence of higher-order cognitive processes associated with these traits or that lower-level perceptual processing varies with these traits. [Work supported by NIH].


Production and perception of English /l/ and /r/ by native-speaking children.
Kaori Idemaru & Lori L. Holt

Poster available in PDF format

The English /l-r/ distinction is a difficult contrast to learn for L2 learners as well as for native-speaking children. In this study, we examine the use of the second (F2) and third (F3) formants and the relative weighting of these cues in production and perception of /l/ and /r/ sounds in native English-speaking children. 3-, 4-, 5-, and 8-yr-old children produced words containing /l/ and /r/, and they were also tested on identification of words containing the contrast. The results indicated that whereas young children’s productions of /l/s and /r/s were well distinguished acoustically, the children were still developing in how they integrate F3 and F2 in both production and perception. Specifically, children’s production indicated an intriguing developmental change such that F2 frequency increased in older children’s /r/ productions, thus diminishing the F2-F3 frequency difference across development. In perception, although a sharper pattern of perception of the contrast was found in older children, they were not using F2 (a secondary cue signaling the categories) in the same manner as adults even at age 8. These data are consistent with a rather long trajectory of phonetic development whereby native categories are refined and tuned well into childhood. [Work supported by NSF Grant No. BCS-0746067]


Perceptual adaptation to a spoken passage’s long-term spectral average.
Jingyuan Huang, Lori L. Holt, & Andrew J. Lotto

Poster available in PDF format

The current study demonstrates that listeners adapt to a passage of speech such that subsequent speech categorizations are made relative to the passage’s long-term average spectrum (LTAS). Native-English participants listened to a passage from Harry Potter for about 2 min. Next, they completed a categorization task across a series of natural speech tokens from the same talker, manipulated to vary perceptually from /ga/ to /da/. The passage was filtered to emphasize or de-emphasize regions of the LTAS without altering perceived talker identity or intelligibility. Following exposure to a passage with greater high-frequency energy, listeners more often categorized targets as /ga/ compared to target categorization following the same passage with lower high-frequency energy. Thus, listeners exhibit sensitivity to long-term spectral distributions and categorize subsequent speech relative to the LTAS of the exposure context. The spectrally contrastive directionality of the effect is consistent with earlier work demonstrating the influence of adjacent context on speech categorization, but this study extends the findings to the LTAS of a passage (across minutes) and demonstrates that context need not be adjacent to influence speech categorization. The implications of this work for talker and accent normalization will be discussed. [Work supported by NIH R01DC004674].


Sensitivity to input distributions and decision boundaries in auditory category learning.
Sung-joo Lim & Lori L. Holt

Poster available in PDF format

Previous research demonstrates the sensitivity of adults and infants to the statistical regularity of input distributions defining speech categories [D. L. Grieser and P. K. Kuhl, Dev. Psychol. 25, 577–588 (1989)] and even nonhuman animals exhibit such sensitivity [Kluender et al., J. Acoust. Soc. Am. 104, 3568–3580 (1998)]. Speech categories’ structure also possesses information to support the use of decision boundaries in categorization. To investigate the interaction of distribution versus decision-boundary information in auditory category learning, the current research tracked listeners learning novel non-speech categories defined by two acoustic dimensions, the center frequency and modulation frequency, via explicit training with feedback. Early in learning, listeners exhibited sensitivity to distributional regularities of categories by robustly responding to more densely sampled regions of acoustic space. However, evidence of listeners’ reliance on a decision boundary emerged once learning plateaued. The point in learning at which decision boundaries predicted categorization response was determined not only by the type of listeners’ prior experience with the sounds but also by the perceptual salience of the acoustic dimensions. [Work supported by NSF].


Investigation of the neural bases of context-dependent speech categorization.
Erika J. C. Laing, Lori L. Holt, & Anto Bagic

Poster available in PDF format

Previous research has demonstrated that simple sequences of preceding sine-wave tones affect speech categorization in a spectrally-contrastive manner [L. Holt, Psychol. Sci. 16, 305–312 (2005)]. The current research explicitly links these effects to effects commonly thought to be instances of talker normalization [P. Ladefoged and D. E. Broadbent, J. Acoust. Soc. Am. 29, 98–103 (1957)]. Synthesized sentences manipulated to sound like different talkers influence categorization of a subsequent speech target only when sentences’ long-term average spectra (LTAS) predict spectral contrast. Likewise, sequences of tones modeling these LTAS differences produce parallel context-dependent speech categorization effects. The predictiveness of LTAS, rather than perceived talker, suggests that general auditory rather than speech-specific or articulatorily-driven mechanisms may play a role in effects considered to be instances of talker normalization. The behavioral measures are paired with magnetoencephalography (MEG) to investigate the neural bases of these parallel effects. Listeners categorized a /ga/-/da/ series in the context of preceding sentences and nonspeech tone sequences varying in their LTAS while MEG signals were acquired. Analyses focus on how the speech target is encoded as a function of preceding LTAS and the status of the context as speech or nonspeech. [Work supported by NIH and NOHR.]


Examining the Neural Basis of Adaptation in Speech Perception Without Feedback
Sara Guediche, Lori L. Holt, & Julie A. Fiez

Poster available in PDF format






Society for Neuroscience 2009 Annual Meeting
Chicago, IL, USA
October 17-21, 2009

Changes in fMRI activation after naturalistic training on ringtone stimuli
Wood, T., Leech, R., Apel, T., Holt, L. L. & Dick, F.

Poster available in PDF format






Cognitive Science Society
Amsterdam, Netherlands
July 29 - August 1, 2009

Perceptual Learning of Distorted Speech with and without Feedback
Sara Guediche, Julie A. Fiez, & Lori L. Holt

Poster available in PDF format






2008

38th Annual Meeting of the Society for Neuroscience
Washington, DC, USA
November 15-19, 2008

Effects of short-term experience on auditory categorization: An event-related potential study
Liu, R. & Holt, L. L.

Poster available in PDF format

Categorization plays a major role in shaping speech perception, but the underlying mechanisms by which speech categories are learned are not well understood. Recent behavioral research has shown that listeners can acquire complex auditory categories implicitly through playing a video game in which characters' identities are linked to categories of novel non-speech sounds. This method provides a means of investigating general auditory categorization mechanisms available to speech perception. We adapted this behavioral paradigm to investigate the neural consequences of auditory category learning across five consecutive days of video-game-based training using the mismatch negativity (MMN) component of the auditory event-related potential. Before and after training, participants' MMN responses were recorded during passive listening to stimuli drawn from within a trained category or from across two distinct trained categories. Comparison of pre- and post-training MMN responses revealed a general trend of acquired similarity to within-category sounds and acquired distinctiveness to across-category sounds. Post-training MMN patterns resembled those observed for speech categories, supporting the domain-generality of processes underlying speech category learning. The results also indicate that the multi-modal video game training paradigm is effective in eliciting measurable auditory category learning at both the behavioral and neural levels.




49th Annual Meeting of the Psychonomic Society
Chicago, IL, USA
November 13-16, 2008

Learning Non-Native Speech Categories with a Video Game
Lim, Sung-joo & Holt, Lori L.

Poster available in PDF format

Various studies have attempted to train adult listeners to categorize non-native speech sounds but even with extensive response-feedback training, observed learning has been modest. The current study exploits a video game shown to be effective in training adults to categorize novel non-speech auditory categories (Wade & Holt, 2005) to train native Japanese adults to categorize English /r/ and /l/. Game characters were associated with distinct speech-sound categories and the game environment provided participants with rich visual, spatial, motor and auditory correlations with the speech categories, but no explicit feedback. Participants evidenced more native-like perception following five days of training, suggesting that experience with non-native speech categories in an immersive environment with rich cue correlations may promote speech category learning, even without explicit feedback. Learning was observed in overall pre-test versus post-test performance and also in listeners’ fine-grain use of acoustic cues in speech categorization.




156th Meeting of the Acoustical Society of America
Miami, FL, USA
November 10-14, 2008

Retuning speech sound categories: An eyetracking study
Idemaru, K. & Holt, L. L.

Poster available in PDF format

Speech categories are defined by multiple probabilistic acoustic cues. Fundamental frequency (F0) and voice onset time (VOT) are correlated in the English stop voicing contrast, for example. However, such correlations are often imperfect—especially in cases of non-native or disordered speech. The present experiments investigate listeners’ ability to adjust perceptual cue weighting in online perception in response to changes in the cue correlations experienced across time. Native-English listeners heard minimal-pair words beginning with stop consonants varying along a VOT series. The F0 of the words was gradually shifted over the course of the experiment from the canonical English correlation (higher F0 for voiceless stops) to the opposite pattern (lower F0 for voiceless stops). Categorization was assessed via explicit responses while eye gaze data were simultaneously recorded using the visual world paradigm. Both data types revealed that the influence of F0 on voicing categorization changed in response to changes in the F0-VOT correlation. Some listeners’ use of F0 reversed such that higher F0 led to more voiced responses; other listeners discontinued use of F0 in voicing categorization. These patterns suggest that listeners are continually monitoring the input for regularity and retuning acoustic cue use in an online manner to accommodate these regularities.


Investigating the influence of context frequency on lexical tone perception.
Huang, J. & Holt, L. L.

Poster available in PDF format

Tone languages such as Mandarin use pitch variations to contrast meaning. Within tone languages, large variability exists in the pitch of tones produced by different speakers. However, previous studies of speaker normalization for contour tones have produced inconsistent results; whether speakers rely on context information in tone perception is unclear. The present study intended to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms and sources of information. In four experiments, Mandarin listeners’ perceptions of Mandarin first and second (level and rising) tones were investigated with preceding speech and nonspeech contexts. Results indicate that (1) the mean fundamental frequency (f0) of a preceding sentence affects the perception of contour lexical tones and the effect is spectrally contrastive: Following a sentence with a higher-frequency mean f0, a following word is more likely to be perceived as a low-frequency tone and vice versa; (2) nonspeech precursors also elicit this effect, suggesting general perceptual rather than articulatory-based mechanisms; (3) listeners can use information from both fundamental frequency and periodicity to normalize tone perception. [Work supported by NIH NIDCD 2 R01DC004674-04A2].




Workshop on Linguistic Variation Across the Lifespan
Columbus, OH, USA
May 2-3, 2008

6-Month-Olds Are More Sensitive to Variations in F1 than F2 in Vowels
Hufnagle, D., Curtin, S., Holt, L. L. & Kiesling, S. F.

Infants at six months have already begun to organize vowel categories according to their native language, based on the fact that they are less able to perceive non-native vowel contrasts (Kuhl et al., 1992). However, the details of infants' native vowel organization are not well-characterized. We test infants' sensitivity to two prominent cues to vowel identity, first and second formant (F1 and F2). Ultimately we are pursuing an understanding of whether and how infants perceive variation within vowel categories such as those that occur across dialects.

Adults who were native to an English-speaking area in North America were recorded to determine the parameters of the local vowel space. We determined the center of the vowel space for each category in terms of F1 and F2 based on these measurements. We then synthesized nine /u/ and nine /I/ vowels that varied along F1 and F2: the vowel from the center of that space (centroid), four vowels that varied along F1 (with centroid F2), and four that varied along F2 (with centroid F1). Ten 6-month-old infants from the same area heard trials that contained four tokens of the /u/ or /I/ centroid vowel, followed by four instances of a variant, followed by four more centroid tokens (task adapted from Best & Jones, 1998). All infants heard two pretest trials (a non-varying trial for each phoneme) followed by all 16 variable trial types and the two non-varying trials for comparison in random order. Looking time after the first switch was recorded using the Sequential Looking Preference procedure (Cooper & Aslin, 1990). The assumption is that infants who notice vowel changes in a given trial (and thus, discriminate among the tokens) will look longer. Other vowels were not tested due to experimental constraints stemming from the attention span of infants.

A repeated-measures ANOVA revealed a main effect of formant, with infants listening significantly longer to F1 variations than to F2 variations. The formant effect was significant for both vowels, and the identity of the vowel did not otherwise modulate the formant effect. Other effects and their interactions were not significant. These results suggest that infants are likely to be more sensitive to differences in F1. The explanation for this result could be that no vowels closely border /I/ or /u/ along F2 in this English variety, so infants should be more tolerant of this variation. On the other hand, both /I/ and /u/ have neighboring vowels along F1, making it more likely that infants would be sensitive to this dimension. Our results show that infants are quite sensitive to variation within vowel categories, even at six months of age.

These results also suggest that community variation in the F2 dimension will likely be greater than in the F1 dimension for both peripheral and non-peripheral vowels (in Labov's 1994 terminology). However, the results also support a view of sound change in which closely-realized vowels produce misunderstandings or mischaracterizations that lead to linguistic change (see Labov 1994:586-588).




15th Annual Cognitive Neuroscience Meeting
San Francisco, CA, USA
April 12-15, 2008

Changes in functional organization following naturalistic training on complex auditory categories
R. Leech, L. Holt, J. Devlin, & F. Dick

Poster available in PDF format






International Society on Infant Studies
Vancouver, BC, Canada
March 27-29, 2008

Six-month olds’ acquisition of variable and stable vowel categories
Hufnagle, D., Curtin, S., Holt, L. L., & Kiesling, S. F.

Poster available in PDF format






2nd Annual Auditory Cognitive Science Society
Tucson, AZ, USA
January 11, 2008

Cue weighting in speech perception
Lori L. Holt

On Motor Theory and Mirror Neurons
Lori L. Holt




2007

Auditory Cognitive Science Society - Inaugural Meeting
Tucson, AZ, USA
January 8, 2007

Speech Perception Cognition
Lori L. Holt

Talk slides available in PDF format





2006

151st Meeting of the Acoustical Society of America
Providence, Rhode Island, USA
June 5-9, 2006

A theoretical model of cochlear processing improves spectrally degraded speech perception
Evan C. Smith & Lori L. Holt

Poster available in PDF format

Smith and Lewicki, Neural Comp. 17, 19–45, 2005a; Adv. Neural Inf. Process. Syst. 17, 1289–1296, 2005b; Nature, 439, 978–982, 2006, demonstrated that mammalian hearing follows an efficient coding principle (Barlow, Sensory Communications, 217–234, 1961; Atick, Network, 3(2), 213–251, 1992; Simoncelli and Olshausen, Ann. Rev. Neurosci., 24, 1193–1216, 2001; Laughlin and Sejnowski, “Communication in Neuronal Networks,” Science, 301, 1870–1874, 2003). Auditory neurons efficiently code for natural sounds in the environment, maximizing information rate while minimizing coding cost (Shannon, Bell Syst. Tech. J., 27, 379–423, 623–656, 1948). Applying the same analysis to speech coding suggests that speech acoustics are optimally adapted to the mammalian auditory code (Smith and Lewicki, Neural Comp. 17, 19–45, 2005a; Adv. Neural Inf. Process. Syst. 17, 1289–1296, 2005b; Nature, 439, 978–982, 2006). The present work applies this efficient coding theory to the problem of speech perception in individuals using cochlear implants (CI), for which there exist vast individual differences in speech perception and spectral resolution (Zeng et al., Auditory Prostheses and Electric Hearing, 20, 1–14, 2004). A machine-learning method for CI filterbank design based on the efficient-coding hypothesis is presented. Further, a pair of experiments to evaluate this approach using noise-excited vocoder speech (Shannon et al., Science, 270, 303–304, 1995) is described. Participants' recognition of continuous speech and isolated syllables is significantly more accurate for speech filtered through the theoretically-motivated efficient-coding filterbank relative to the standard cochleotopic filterbank, particularly for speech transients. These findings offer insight into CI design and provide behavioral evidence for efficient coding in human perception.
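For context, a noise-excited vocoder in the Shannon et al. (1995) style divides speech into bands, extracts each band's amplitude envelope, and uses the envelopes to modulate band-limited noise; comparing filterbanks amounts to swapping band edges. A minimal sketch (the band edges shown are illustrative placeholders, standing in for either the cochleotopic or the efficient-coding filterbank):

```python
# Sketch of a noise-excited channel vocoder: per band, band-pass the speech,
# take the Hilbert envelope, and impose it on band-limited noise. Swapping
# `edges` for learned vs. cochleotopic band edges would implement the
# comparison described above. Edges below are illustrative only.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(x, fs, edges, seed=0):
    rng = np.random.default_rng(seed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                  # amplitude envelope
        noise = sosfiltfilt(sos, rng.standard_normal(len(x)))
        out += env * noise                           # envelope on noise
    return out / (np.max(np.abs(out)) + 1e-12)

edges = [100, 400, 900, 1800, 3200, 6000]            # five channels
# vocoded = vocode(speech_samples, fs, edges)
```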


Experience-driven effects of visual cues in speech perception
Joseph D. W. Stephens & Lori L. Holt

Poster available in PDF format

The integration of information across modalities is a key component of behavior in everyday settings. The current study examined the extent to which experience drives multimodal speech integration. Two groups of participants were trained on combinations of speech sounds with corresponding videos of an animated robot, whose movements and features bore no resemblance to speech articulators. Participants' identification of acoustically presented consonants was influenced by simultaneous presentation of learned visual stimuli in a manner that reflected the correlation structure of auditory and visual cues in training. The influence of novel non-face visual cues on speech perception developed over the course of training, suggesting that experience altered the perceptual mechanisms used in combining this cross-modal information. Pairings of auditory and visual cues given to two groups of participants resulted in patterns of bimodal perception that differed in systematic ways. Perceptual integration of the newly learned visual cues with auditory speech was not optimal, reflecting a bias toward the sound cue. The findings are discussed in reference to current models of speech perception and multimodal integration.




2005

46th Annual Meeting of the Psychonomic Society
Toronto, ON, Canada
November 10-13, 2005

Learned cross-modal integration of novel visual cues with auditory speech
J.D.W. Stephens & L.L. Holt

Poster available in PDF format

The integration of information across modalities is a key component of behavior in everyday settings. However, little is known about the extent to which experience affects mechanisms of multimodal integration. In the current study, participants were trained for more than ten sessions on audiovisual combinations of speech sounds and corresponding movements of an animated robot, whose features bore no resemblance to speech articulators. Participants' use of auditory and visual information was tested periodically throughout the experiment. During training, participants' identification of acoustically-presented consonants began to be influenced by simultaneous presentation of trained visual stimuli. The nature of this influence changed by the end of training, suggesting that further experience altered perceptual mechanisms for combining information. A subsequent experiment manipulated relations between the trained visual stimuli, such that they were more incompatible with the structure of natural visual speech. The findings are relevant to theories of speech perception and multimodal integration.




149th Meeting of the Acoustical Society of America
Vancouver, BC, Canada
May 16-20, 2005

Perception of coarticulated speech with contrastively enhanced spectrotemporal patterns
L.L. Holt & T. Wade

Poster available in PDF format

High-level contrastive mechanisms cause perception of auditory events to be influenced by spectral and temporal properties of surrounding acoustic context, and may play a role in perceptual compensation for coarticulation in human speech. However, it is unknown whether auditory contrast is incorporated optimally to compensate for different speakers, languages and situations or whether amplification of the processes involved would provide additional benefit, for example, in the perception of hypoarticulated speech, under adverse listening conditions, or in an incompletely acquired language. This study examines effects of artificial contrastive modification of spectrotemporal trajectories on the intelligibility of connected speech in noise by native and non-native listeners. Adopting methods known to improve automatic classification of speech sounds, we model contrast-providing context as an averaged estimated vocal tract function (LPC-derived log area ratio coefficient vector) over a Gaussian-weighted temporal window. Local coefficient values are adjusted from this context based on previously observed contrastive perceptual tendencies, and the intelligibility of the resulting speech is compared with that of unmodified trajectories across listener language backgrounds. Results are discussed with respect to implementation and applicability of general auditory processes.
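Concretely, log area ratios (LARs) are computed from the reflection coefficients that fall out of Levinson-Durbin LPC analysis, and the context vector described above is a Gaussian-weighted average of per-frame LARs. A sketch under assumed settings (LPC order, window, and sign convention are illustrative, not the study's):

```python
# Sketch: per-frame log area ratios (LARs) via Levinson-Durbin, then a
# Gaussian-weighted average over frames as the "context" estimate. The
# LPC order, Hamming window, and Gaussian width are assumptions.
import numpy as np

def frame_lar(frame, order=12):
    x = frame * np.hamming(len(frame))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1); a[0] = 1.0
    e, k = r[0] + 1e-12, np.zeros(order)
    for i in range(1, order + 1):              # Levinson-Durbin recursion
        acc = r[i] + np.dot(a[1:i], r[1:i][::-1])
        ki = -acc / e
        k[i - 1] = ki
        prev = a[:i + 1].copy()
        for j in range(1, i + 1):
            a[j] = prev[j] + ki * prev[i - j]
        e *= 1.0 - ki * ki
    k = np.clip(k, -0.999, 0.999)
    return np.log((1.0 - k) / (1.0 + k))       # reflection coeffs -> LARs

def gaussian_context(lar_frames, center, sigma=20.0):
    """Gaussian-weighted mean of per-frame LAR vectors around frame `center`."""
    idx = np.arange(len(lar_frames))
    w = np.exp(-0.5 * ((idx - center) / sigma) ** 2)
    return (w[:, None] * np.asarray(lar_frames)).sum(0) / w.sum()
```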


Categorization of spectrally complex non-invariant auditory stimuli in a computer game task
T. Wade & L.L. Holt

Poster available in PDF format

This study examined perceptual learning of spectrally complex nonspeech auditory categories in an interactive multi-modal training paradigm. Participants played a computer game in which they navigated through a three-dimensional space while responding to animated characters encountered along the way. Characters’ appearances in the game correlated with distinctive sound category distributions, exemplars of which repeated each time the characters were encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became harder to identify visually, so quick identification of approaching characters by sound patterns was, although never required or encouraged, of gradually increasing benefit. After thirty minutes of play, participants performed a categorization task, matching sounds to characters. Despite not being informed of audio-visual correlations, participants exhibited reliable learning of these patterns at post-test. Categorization accuracy was related to several measures of game performance and category learning was sensitive to category distribution differences modeling acoustic structures of speech categories. Category knowledge resulting from the game was qualitatively different from that gained from an explicit unsupervised categorization task involving the same stimuli. Results are discussed with respect to information sources and mechanisms involved in acquiring complex, context-dependent auditory categories, including phonetic categories, and to multi-modal statistical learning.


How auditory discontinuities and linguistic experience affect the perception of speech and non-speech in English- and Spanish-speaking listeners.
Jessica F. Hay, Lori L. Holt, Andrew J. Lotto & Randy L. Diehl

Poster available in PDF format

The present study was designed to investigate the effects of long-term linguistic experience on the perception of non-speech sounds in English and Spanish speakers. Research using tone-onset-time (TOT) stimuli, a type of non-speech analogue of voice-onset-time (VOT) stimuli, has suggested that there is an underlying auditory basis for the perception of stop consonants based on a threshold for detecting onset asynchronies in the vicinity of +20 ms. For English listeners, stop consonant labeling boundaries are congruent with the positive auditory discontinuity, while Spanish speakers place their VOT labeling boundaries and discrimination peaks in the vicinity of 0 ms VOT. The present study addresses the question of whether long-term linguistic experience with different VOT categories affects the perception of non-speech stimuli that are analogous in their acoustic timing characteristics. A series of synthetic VOT stimuli and TOT stimuli were created for this study. Using language appropriate labeling and ABX discrimination tasks, labeling boundaries (VOT) and discrimination peaks (VOT and TOT) are assessed for 24 monolingual English speakers and 24 monolingual Spanish speakers. The interplay between language experience and auditory biases is discussed. [Work supported by NIDCD.]




Annual Meeting of the Cognitive Neuroscience Society
New York, NY
April 9-12, 2005

Experience-driven audio-visual integration in speech perception
J. D. W. Stephens & L. L. Holt

Poster available in PDF format

Integration of auditory and visual cues greatly affects speech perception. Current theories of speech perception make different assumptions about the perceptual processes underlying cross-modal integration. A method was developed for testing these theoretical assumptions by training participants on novel visual speech cues that can be controlled and manipulated in ways that normal visual speech cues (i.e., speakers' faces) cannot. In a preliminary experiment, participants played an hour-long video game on each of five consecutive days. During the game they learned to identify consonants based on the movements of an animated robot, whose features bore no resemblance to speech articulators. Subsequent to training, participants accurately identified consonants based solely on the newly-learned visual cues, and their accuracy in identifying consonants presented in noise was improved when the visual cues were present. Additionally, participants' identification of acoustically ambiguous consonants was influenced by simultaneous presentation of the trained visual stimuli. A subsequent experiment studied the development of these audiovisual effects over longer time frames. The data from both studies are relevant to current theoretical issues and provide a basis for investigating the development of cross-modal integration through learning.




28th Midwinter Meeting of the Association for Research in Otolaryngology
New Orleans, LA
February 19-24, 2005

Incidental Complex Auditory Category Learning in a Computer Game Task
Travis Wade, Lori Holt

Poster available in PDF format

This study examined the perceptual learning of spectrally complex non-speech auditory categories in a novel incidental, interactive multi-modal training paradigm. Participants played a computer game in which they were required to navigate through a three-dimensional space while responding appropriately to animated characters encountered along the way. Each character's appearance in the game correlated with a sound category distribution, a randomly selected member of which was repeated each time the character was encountered. As the game progressed, the speed and difficulty of required tasks increased and characters became gradually more difficult to identify by visual patterns alone. As a result, quick identification of approaching characters by means of sound patterns was, while never required or explicitly encouraged, of gradually increasing benefit. After a thirty minute session, participants performed a categorization task, matching sounds to characters encountered during game play. Despite not being informed of audio-visual correlations beforehand, participants showed reliable learning of these patterns at post-test. Post-test performance was shown to be related to several measures of success at the game task, and learning was also sensitive to differences in category structure analogous to patterns seen in speech categories. Category knowledge resulting from the game was shown to be quantitatively different from that gained from an explicit grouping task involving the same categories. Results are discussed with respect to the mechanisms and information sources involved in the acquisition of complex, context-dependent phonetic categories.


Factors Affecting Perceptual Weighting of Acoustic Cues in a Categorization Task
Lori L. Holt, Andrew J. Lotto

The perception of complex sounds, such as speech, often requires the integration of information across multiple dimensions. The present experiments investigate the perceptual effectiveness or "weighting" of acoustic dimensions in a categorization task. Human listeners categorized sounds drawn from two input distributions lying within a two-dimensional acoustic space defining the center frequency (CF) and modulation frequency (MF) of frequency-modulated sinewaves. The 2-d acoustic space was scaled such that each dimension was psychophysically matched to be equally discriminable and, in the first experiment, equally informative for accurate categorization. Despite this normalization, listeners' category responses reflected a bias for use of CF. The CF bias was moderated when training distribution overlap was increased along the CF dimension, thereby decreasing the informativeness of CF for the task. A reversal of weighting (MF over CF) was obtained when distribution variance was increased along the CF dimension. These results demonstrate that even when equally informative and equally discriminable, acoustic cues are not necessarily equivalently weighted in perception; listeners exhibit biases when integrating multiple acoustic dimensions. Drastic changes in cue weighting strategies can be effected by changes in input distribution parameters. Moreover, a final experiment demonstrates that listeners can be encouraged to re-weight acoustic dimensions by mere exposure to acoustic tokens varying along the less preferred acoustic dimension. These methods provide potential insights into acquisition of speech sound categories, particularly second language categories for which cue weighting is a critical issue.
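A sketch of how such stimuli might be generated: a sinewave carrier at center frequency CF, frequency-modulated at rate MF, with (CF, MF) pairs drawn from a category's input distribution (all numeric values below are placeholders, not the study's calibrated parameters):

```python
# Sketch: synthesize an FM sinewave with a given center frequency (CF) and
# modulation frequency (MF), then sample one category as a Gaussian cloud
# in the CF x MF plane. Means, SDs, depth, and duration are made up.
import numpy as np

def fm_tone(cf, mf, dur=0.3, depth=100.0, fs=22050):
    t = np.arange(int(dur * fs)) / fs
    # Phase integral of instantaneous frequency cf + depth*sin(2*pi*mf*t).
    phase = 2 * np.pi * cf * t - (depth / mf) * np.cos(2 * np.pi * mf * t)
    return np.sin(phase)

rng = np.random.default_rng(0)
cfs = rng.normal(1000.0, 80.0, 25)    # category's CF distribution (Hz)
mfs = rng.normal(20.0, 3.0, 25)       # category's MF distribution (Hz)
stimuli = [fm_tone(cf, mf) for cf, mf in zip(cfs, mfs)]
```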




2004

45th Annual Meeting of the Psychonomic Society
Minneapolis, MN
November 18-21, 2004

Learning to integrate auditory and visual information in speech perception
Joseph D. Stephens & Lori L. Holt

Poster available in PDF format

Integration of auditory and visual cues greatly affects speech perception. Current theories of speech perception make different assumptions about the perceptual representations underlying cross-modal integration. A method was developed for testing these theoretical assumptions by training participants on novel visual speech cues that can be controlled and manipulated in ways that normal visual speech cues (i.e., speakers' faces) cannot. Participants played an hour-long video game on each of five consecutive days. During the game they learned to identify consonants based on the movements of an animated robot, whose features bore no resemblance to speech articulators. Subsequent to training, participants accurately identified consonants based solely on the newly-learned visual cues. Additionally, participants' identification of acoustically-presented consonants was influenced by simultaneous presentation of the trained visual stimuli. The data are relevant to current theoretical issues and provide a basis for investigating the development of cross-modal integration through learning.




148th Meeting of the Acoustical Society of America
San Diego, CA
November 15-19, 2004

Contrastive backward effects of nonspeech tones on speech perception
Travis Wade & Lori L. Holt

Poster available in PDF format

Nonspeech stimuli influence phonetic categorization, but effects observed so far have been limited to precursors' influence on perception of following sounds. However, both preceding and following speech affect phonetic categorization. This asymmetry in nonspeech and speech effects raises questions about whether general auditory processes play a role in context-dependent speech perception. To determine whether backward effects of nonspeech on speech may be achieved when listeners are sufficiently encouraged to incorporate later-occurring acoustic events, a series of experiments examined perception of CVC words with [da]-[ga] series onsets followed by embedded tones and one of two possible final consonants. When the final consonant was required for word identification, subjects showed clear contrastive effects; more [d]-initial words were heard with higher-frequency tones approximating a [g] third formant location, and vice versa. More limited effects were observed when subjects identified only the initial consonant and when no final consonant was present. Results are discussed with respect to a contrastive mechanism not speech specific but operating at a relatively high level, taking into account spectral patterns occurring over extended periods before and after a target event. [Work supported by NIH.]


Auditory categorization: Cue weighting and dimension bias
Lori L. Holt & Andrew J. Lotto

Slides available in PDF format

The ability to integrate and weight information across dimensions is central to perception. The present experiments investigate this issue by training participants to categorize sounds drawn from two input distributions in a two-dimensional acoustic space defined by frequency-modulated sine waves’ center frequency (CF) and modulation frequency (MF). These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias was moderated when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, drastic changes in weighting strategies can be effected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by adding variance to noninformative dimensions in training stimuli. [Work supported by NIH.]




26th Annual Meeting of the Cognitive Science Society
Chicago, IL
August 2004

Attentional modulation of lexical effects in an interactive model of speech perception
Mirman, D., McClelland, J. L., & Holt, L. L.

Abstract available in PDF format
Poster available in PDF format



From Sound to Sense: Fifty+ Years of Discoveries in Speech Communication
Massachusetts Institute of Technology
Cambridge, MA
June 2004

Non-linguistic sentence-length precursors affect speech perception: Implications for speaker and rate normalization
Holt, L. L. & Wade, T.

Proceedings paper available in PDF format
Poster available in PDF format

Speech contexts can influence phonetic perception considerably, even across extended temporal windows. For example, manipulating spectral or temporal characteristics of precursor sentences leads to dramatic changes in categorization of subsequent vowels and consonants (e.g., Ladefoged & Broadbent, 1957; Summerfield, 1981). These findings often have been discussed in terms of speaker and rate normalization. The present study aimed to discover precisely which types of information in the speech signal subserve such shifts in speech categorization. A series of experiments examined the influence of sentence-length non-speech precursors—series of brief pure tones—on the perception of speech segments with which they shared critical spectral and temporal properties. Across multiple experimental manipulations, the non-speech precursors affected the perceived place (alveolar, velar) and manner (stop, glide) of articulation of synthesized English consonants. Effects were observed even when non-speech precursor series were temporally-nonadjacent to the speech categorization targets and even when multiple interrupting acoustic events separated precursor and target. Both category boundary shifts and changes in graded internal category structure were observed. These results indicate that the auditory system is sensitive to both spectral and temporal information conveyed by non-linguistic sounds across sentence-length temporal windows. Moreover, this sensitivity influences speech categorization, highlighting that general auditory processing may play a role in the speech categorization shifts described as rate and speaker normalization.



147th Meeting of the Acoustical Society of America
New York, NY
May 2004

Perception of correlations between acoustic cues in category tuning and speaker adaptation
Lori L. Holt & Travis Wade

Poster available in PDF format

In English and many other languages, fundamental frequency (f0) varies with voicing such that voiced consonants are produced with lower f0s than their voiceless counterparts. This regularity robustly influences perception, such that sounds synthesized or spoken with a low f0 are more often perceived as voiced than are sounds with a higher f0. This series of studies exploited these observations to investigate category tuning as a function of incidental exposure to correlations among speech cues and adaptation to speaker idiosyncrasies or accent. Manipulation of f0 across sets of natural speech utterances produced stimulus sets varying in their inherent f0/voicing relationship. Listeners were exposed to these different f0/voicing patterns via spoken word and non-word items in a lexical decision task, and their resulting categorization of ambiguous consonants varying in f0 and voice onset time (VOT) was measured. The results suggest listeners adapt quickly to speaker-specific cues but also remain influenced by more global, naturally-occurring covariance patterns of f0 and voicing in English. This pattern contrasts somewhat with studies where idiosyncrasy is represented instead by manipulation of primary, first-order cues to speech sounds, in which listeners are seen to adapt more straightforwardly to the cues they are presented.
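The manipulation amounts to controlling the sign and strength of the f0/VOT correlation in the exposure set; a hypothetical sketch of how such stimulus sets could be specified (all means, SDs, and correlation magnitudes are placeholders, not the study's values):

```python
# Sketch: draw (VOT, f0) pairs with a controllable correlation, mirroring
# the manipulation described above (canonical positive f0/voicing relation
# vs. a reversed "accent" pattern). All numbers are illustrative.
import numpy as np

def sample_cues(n, r, vot_mean=30.0, vot_sd=20.0, f0_mean=180.0, f0_sd=25.0):
    cov = [[vot_sd**2, r * vot_sd * f0_sd],
           [r * vot_sd * f0_sd, f0_sd**2]]
    rng = np.random.default_rng(0)
    return rng.multivariate_normal([vot_mean, f0_mean], cov, size=n)

canonical = sample_cues(200, r=+0.6)   # longer VOT paired with higher f0
reversed_ = sample_cues(200, r=-0.6)   # the artificial idiosyncratic pattern
```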


Perceptual effects of preceding non-speech rate information on temporal properties of speech categories
Travis Wade & Lori L. Holt

Poster available in PDF format

The rate of context speech presentation can influence speech perception. This study investigated the bounds of rate-dependent speech categorization, observing influences of non-speech precursor rate on speech perception. Two experiment sets tested effects of pure-tone presentation rate on perception of following speech continua involving duration-varying formant transitions that shared critical temporal and spectral characteristics with the tones. Results showed small but consistent shifts in the stop-continuant boundary distinguishing /ba/ and /wa/ syllables based on the rate of precursor tones, across differences in amplitude of tones and despite variability in their duration. Additionally, the shift was shown to involve the entire graded structure of the [w] category and was not limited to an ambiguous boundary region, affecting goodness judgments on both sides of an estimated best exemplar range. These results are problematic for accounts of rate-dependent processing that explicitly reference speech categories or articulatory events and are consistent with a contrast account.





2003

146th Meeting of the Acoustical Society of America
Austin, TX
November, 2003

What are the statistics in statistical learning?
Lori L. Holt & Andrew J. Lotto

Lay Person Press Article

The idea that speech perception is shaped by the statistical structure of the input is gaining wide enthusiasm and growing empirical support. Nonetheless, statistics and statistical learning are broad terms with many possible interpretations and, perhaps, many potential underlying mechanisms. In order to define the role of statistics in speech perception mechanistically, we will need to more precisely define the statistics of statistical learning and examine similarities and differences across subgroups. In this talk, we examine learning of four types of information: (1) acoustic variance that is defining for contrastive categories, (2) the correlation between acoustic attributes or linguistic features, (3) the probability or frequency of events or a series of events, (4) the shape of input distributions. We present representative data from online speech perception and speech development and discuss inter-relationships among the subgroups. [Work supported by NSF, NIH and the James S. McDonnell Foundation.]



James S. McDonnell Foundation
21st Century Science Initiative
Tarrytown, NY
June, 2003

The Role of Experience in Speech Perception
Lori L. Holt

Poster available in PDF format



8th Annual Carnegie Mellon Meeting of the Minds
Pittsburgh, PA
May, 2003

SIMON SOUNDS: A new paradigm for studying auditory categories
Lori L. Holt & Seth A. Liber

Poster available in PDF format

Categorization refers to a perceiver's ability to treat discriminably different stimuli equivalently. The goal of this project was to develop a new method for studying categorization in the auditory domain. Most auditory categories are learned incidentally, without an explicit teacher. We developed a new experimental protocol for incidentally teaching listeners novel auditory categories. Using the framework of the memorization game Simon, we have created adaptable hardware and software to incidentally teach subjects two to four categories. As participants play Simon, they are exposed to underlying auditory categories associated as warning sounds with Simon response buttons. This new protocol will open new arenas of research in speech and non-speech category learning.



10th Annual Meeting of the Cognitive Neuroscience Society
New York, NY
March, 2003

Perceptual discontinuities and categorization: Implications for speech perception
Lori L. Holt, Andrew J. Lotto, & Randy L. Diehl

Poster available in PDF format

Behavioral experiments with infants, adults and nonhuman animals converge with findings from neurophysiological investigations of the VIIIth nerve, inferior colliculus and primary auditory cortex to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT, e.g., /ba/ versus /pa/). Here, we investigate how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to respond based on a non-speech acoustic cue that mimics the temporal distinction of VOT. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to discrimination and generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote perceptual distinctiveness among sounds within a language inventory. The present data suggest that these influences interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories. Moreover, since some categories are more easily learned than others, it may be possible to observe predictable learning effects in infant speech perception. Finally, the data have implications for the neural processing of speech.
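Tone-onset-time stimuli are concrete to construct: two tones whose onsets are offset by a variable lag, with the ~20 ms region forming the auditory discontinuity. A sketch (frequencies, duration, and step size are illustrative, not the study's values):

```python
# Sketch: a tone-onset-time (TOT) stimulus, the nonspeech analogue of VOT:
# a lower tone whose onset lags a higher tone by tot_ms milliseconds.
import numpy as np

def tot_stimulus(tot_ms, fs=22050, dur=0.25, f_low=500.0, f_high=1500.0):
    n = int(dur * fs)
    t = np.arange(n) / fs
    high = np.sin(2 * np.pi * f_high * t)       # leading component
    low = np.sin(2 * np.pi * f_low * t)
    lag = int(abs(tot_ms) / 1000 * fs)
    low_delayed = np.zeros(n)
    low_delayed[lag:] = low[:n - lag]           # delay the low tone's onset
    return (high + low_delayed) / 2

# A continuum straddling the ~20 ms auditory discontinuity:
continuum = [tot_stimulus(ms) for ms in range(0, 55, 5)]
```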


-------------- top | Research --------------



2002

1st Annual Auditory Perception, Cognition, and Action Meeting
Kansas City, MO
November, 2002

Speech Perception as a Paradigmatic Case of Auditory Cognition
Andrew J. Lotto & Lori L. Holt

Talk available in PDF format

Traditionally, the perception of speech sounds has been described as a special process that is different in kind from general auditory cognition. This speech-is-special approach has robbed auditory cognitive science of an important theoretical testing ground. Just as text recognition has been essential to the development of visual cognitive science, the study of speech perception has potential to significantly advance auditory cognitive science. Recent evidence suggests that much of the perception of speech can be explained by appealing to general processes of audition and learning. We will present data from new work on auditory category formation and statistical learning that demonstrates the proposed symbiosis between speech and general audition research. Results from categorization tasks using speech and non-speech sounds and human and animal subjects reveal patterns of responses that are consistent with computational models of learning and recent theoretical proposals from visual categorization.


-------------- top | Research --------------

43rd Annual Meeting of the Psychonomic Society
Kansas City, MO
November, 2002

Are context effects in speech perception modulated by visual information?
Joseph D. Stephens & Lori L. Holt

Poster available in PDF format

An important goal in speech perception research is to understand how the perception of speech sounds is influenced by surrounding context. Fowler, Brown, & Mann (2000) reported a shift in perception of a consonant-vowel syllable as a function of visual speech information accompanying a preceding syllable. That finding was interpreted as contradicting a general auditory account of such context effects (e.g., Lotto & Kluender, 1998). The present study attempted to replicate that finding. Replication was possible only with the stimulus materials of the original study, and data from a modification of the original experiment suggest the effect may have been caused by unintended variation in the visual tracks of the original stimuli. Results will be described in terms of a general perceptual account of context effects in speech perception.


-------------- top | Research --------------

14th Annual Beckman Frontiers of Science Symposium
Los Angeles, CA
November, 2002

Perceptual context effects in speech perception
Lori L. Holt

Poster available in PDF format


-------------- top | Research --------------

Acoustical Society of America Special Session
Pittsburgh, PA
2002

Models of phonetic category formation and structure
Organized by Andrew J. Lotto & Lori L. Holt

See coverage in Scientific American.


-------------- top | Research --------------

Acoustical Society of America
Pittsburgh, PA
2002

Speech perception as complex auditory categorization
Lori L. Holt

Despite a long and rich history of categorization research in cognitive psychology, very little work has addressed the issue of complex auditory category formation. This is especially unfortunate because the general underlying cognitive and perceptual mechanisms that guide auditory category formation are of great importance to understanding speech perception. I will discuss a new methodological approach to examining complex auditory category formation that specifically addresses issues relevant to speech perception. This approach utilizes novel nonspeech sound stimuli to gain full experimental control over listeners' history of experience. As such, the course of learning is readily measurable. Results from this methodology indicate that the structure and formation of auditory categories are a function of the statistical input distributions of sound that listeners hear, aspects of the operating characteristics of the auditory system, and characteristics of the perceptual categorization system. These results have important implications for phonetic acquisition and speech perception.
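
One way to picture how statistical input distributions could determine category structure is the ideal-observer toy model below; the means, spread, and acoustic dimension are assumed for illustration and are not the stimulus parameters used in the study:

    import numpy as np
    from scipy.stats import norm

    # Two hypothetical input distributions along one acoustic dimension (Hz);
    # the means and spread are invented for illustration.
    mu_a, mu_b, sd = 1000.0, 1600.0, 120.0

    # For equal-variance, equal-frequency Gaussians, the likelihoods cross
    # midway between the means, predicting a category boundary there.
    boundary = (mu_a + mu_b) / 2
    print("predicted boundary:", boundary, "Hz")

    for x in np.linspace(600, 2000, 8):
        label = "A" if norm.pdf(x, mu_a, sd) > norm.pdf(x, mu_b, sd) else "B"
        print(f"{x:7.1f} Hz -> category {label}")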


-------------- top | Research --------------

Acoustical Society of America
Pittsburgh, PA
2002

Formation of categories for complex novel auditory stimuli
Daniel Mirman, Lori L. Holt, and James L. McClelland

Categorization of complex sounds with multiple, imperfectly valid cues is fundamental to phonetic perception. To study the general perceptual and cognitive processes that support complex sound categories, a novel stimulus set was created that allows tight control of category structure and input distributions. Stimuli were created from 300-ms noise bursts by applying bandstop filters at varying center frequencies and manipulating the rise/fall time of stimulus onset and offset. Stimuli were assigned to one of two categories and presented to participants in a category identification task and an AX discrimination task. Feedback was provided during identification trials, but not during discrimination trials. Participants quickly learned to apply the category labels with high accuracy. Identification reaction times followed a pattern typical of speech stimuli, with a peak in reaction time at the category boundary. These results are consistent with the formation of new auditory categories. Preliminary results indicate that discrimination performance is not tightly coupled with the development of sharp identification functions and response-time peaks at category boundaries. Implications for mechanisms of speech categorization and category formation will be discussed. [Work supported by CNBC, NIH, and NSF.]
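
A rough sketch of the stimulus-generation recipe described above; the sample rate, filter order, and stopband width are assumptions, since the abstract specifies only the 300-ms duration and the two manipulated dimensions:

    import numpy as np
    from scipy.signal import butter, lfilter

    fs = 22050          # sample rate (Hz); assumed, not stated in the abstract
    dur = 0.300         # 300-ms noise burst

    def make_stimulus(stop_center_hz, ramp_ms, rng=np.random.default_rng(2)):
        # White-noise burst.
        x = rng.standard_normal(int(fs * dur))
        # Bandstop filter around a variable center frequency (+/-200-Hz band,
        # an assumed width; the abstract does not specify one).
        low, high = stop_center_hz - 200, stop_center_hz + 200
        b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="bandstop")
        x = lfilter(b, a, x)
        # Linear onset/offset ramps implement the rise/fall-time manipulation.
        n = int(fs * ramp_ms / 1000)
        ramp = np.linspace(0.0, 1.0, n)
        x[:n] *= ramp
        x[-n:] *= ramp[::-1]
        return x

    stim = make_stimulus(stop_center_hz=2000, ramp_ms=40)
    print(stim.shape)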


-------------- top | Research --------------

Acoustical Society of America
Pittsburgh, PA
2002

Effect of preceding speech on nonspeech sound perception
Joseph D. Stephens and Lori L. Holt

Poster available in PDF format

Data from Japanese quail suggest that the effect of preceding liquids (/l/ or /r/) on response to subsequent stops (/g/ or /d/) arises from general auditory processes sensitive to the spectral structure of sound [A. J. Lotto, K. R. Kluender, and L. L. Holt, J. Acoust. Soc. Am. 102, 1134-1140 (1997)]. If spectral content is key, appropriate nonspeech sounds should influence perception of speech sounds and vice versa. The former effect has been demonstrated [A. J. Lotto and K. R. Kluender, Percept. Psychophys. 60, 602-619 (1998)]. The current experiment investigated the influence of speech on the perception of nonspeech sounds. Nonspeech stimuli were 80-ms chirps modeled after the F2 and F3 transitions in /ga/ and /da/. F3 onset was increased in equal steps from 1800 Hz (/ga/ analog) to 2700 Hz (/da/ analog) to create a ten-member series. During AX discrimination trials, listeners heard chirps that were three steps apart on the series. Each chirp was preceded by a synthesized /al/ or /ar/. Results showed context effects predicted from differences in spectral content between the syllables and chirps. These results are consistent with the hypothesis that spectral contrast influences context effects in speech perception. [Work supported by ONR, NOHR, and CNBC.]
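
The chirp series lends itself to a compact sketch; the onset values, series size, and AX spacing below come from the abstract, while the sample rate, the F2 transition, and the common F3 offset are invented placeholders:

    import numpy as np
    from scipy.signal import chirp

    fs = 22050                # assumed sample rate
    t = np.linspace(0, 0.080, int(fs * 0.080), endpoint=False)  # 80-ms chirps

    # Ten-member series: F3 onset steps from 1800 Hz (/ga/ analog)
    # to 2700 Hz (/da/ analog); the other endpoints are assumed values.
    f3_onsets = np.linspace(1800, 2700, 10)
    F3_OFFSET = 2500.0                    # assumed common F3 endpoint
    F2_ONSET, F2_OFFSET = 1200.0, 1100.0  # assumed fixed F2 transition

    series = []
    for f3_on in f3_onsets:
        f2 = chirp(t, f0=F2_ONSET, f1=F2_OFFSET, t1=t[-1], method="linear")
        f3 = chirp(t, f0=f3_on, f1=F3_OFFSET, t1=t[-1], method="linear")
        series.append(f2 + f3)  # sum the two frequency-glide components

    # AX discrimination pairs are drawn three steps apart on the series.
    pairs = [(i, i + 3) for i in range(len(series) - 3)]
    print(pairs)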


-------------- top | Research --------------



2001

Cognitive Neuroscience Society
New York, NY
March 2001

Perceptual Context Effects in Speech Perception
Lori L. Holt, Carnegie Mellon Univ., Dept. Psych., Pittsburgh, PA
Andrew J. Lotto, Washington State University, Dept. Psych., Pullman, WA

Poster available in PDF format

Perceptual identification of speech sounds is greatly influenced by the spectral characteristics of adjacent sound. For example, listeners will label an ambiguous syllable as /da/ when it is preceded by /ar/ and as /ga/ when it is preceded by /al/. Recent work demonstrates that context effects in speech identification can be induced by adjacent non-speech sounds that mimic the spectral characteristics of /al/ and /ar/ (Lotto & Kluender, 1998). Non-human animals trained to “label” syllables as /ga/ or /da/ also exhibit context-dependent response shifts (Lotto, Kluender & Holt, 1997). These results have led to speculation that speech context effects arise from general perceptual mechanisms. To examine the nature of these putative mechanisms, the temporal course and frequency range of this context effect were examined. Identification functions for varied context conditions were collected from adult native-English-speaking listeners. Results suggest peripheral sensory processes play little or no role: shifts in identification are present even when context and target syllable are separated by 400 ms, and nearly identical shifts are obtained for dichotic presentation of target and context. Moreover, the context effect is strongly related to the spectral content of the context and not to the phonemic label assigned to the context, suggesting that the effect is due to general auditory mechanisms and not to cognitive or speech-specific processes. These data will be discussed in terms of possible physiological explanations.


-------------- top | Research --------------



2000

The 140th Meeting of the Acoustical Society of America
Newport Beach, CA
December 4-8, 2000

Context-dependent neural coding in the chinchilla cochlear nucleus.
Holt, L. L., Ventura, V., Rhode, W. R., Behesta, S., & Rinaldo, A.

One of the most challenging dilemmas for theories of speech perception is the lack of invariance between acoustic signal and perception. Due to physical constraints upon articulators, there is a good deal of context-dependency both in speech production and in the resulting acoustic speech signal. Consequently, the acoustic pattern most closely related to a given speech sound varies dramatically depending on context. Yet, by some means, the perceptual system perceives these unique acoustic events as linguistically equivalent. This phenomenon is observed experimentally as perceptual context effects whereby adjacent speech can modulate perceived identity of a given speech sound. Recent perceptual results suggest that this phenomenon may be governed by general auditory mechanisms [Holt, Lotto, & Kluender, 1996; Lotto, Kluender, & Holt, 1997; Lotto & Kluender, 1998; Holt, 1999] rather than speech-specific processes. In the present study, we sought to explore how such context-dependencies might be encoded. We recorded responses of ventral cochlear nucleus (VCN) neurons of anesthetized chinchillas to nonspeech stimulus targets with adjacent context stimuli that varied in spectral content. Results demonstrate context-dependent effects of frequency and intensity.


-------------- top | Research --------------

Association for Research in Otolaryngology
St. Petersburg Beach, FL
February 20-24, 2000

Examining Context-Dependent Speech Perception in the Chinchilla Cochlear Nucleus
Lori L. Holt & William R. Rhode

One of the most challenging dilemmas for theories of speech perception is the lack of invariance between acoustic signal and perception. Due to physical constraints upon articulators, there is a good deal of context-dependency both in speech production and in the resulting acoustic speech signal. Consequently, the acoustic pattern most closely related to a given speech sound varies dramatically depending on context. Yet, by some means, the perceptual system perceives these unique acoustic events as linguistically equivalent. This phenomenon is observed experimentally as perceptual context effects whereby adjacent speech can modulate perceived identity of a given speech sound. Recent perceptual results suggest that this phenomenon may be governed by general auditory mechanisms rather than speech-specific processes. In the present study, we sought to explore how such context-dependencies might be encoded. We recorded responses of ventral cochlear nucleus (VCN) neurons of anesthetized chinchillas to speech and non-speech stimulus targets with adjacent context stimuli that varied in spectral content. Though neural responses captured properties of the target stimulus, there was little evidence that adjacent context critically influenced how the target was encoded by VCN neurons. These results, along with recent perceptual findings (Holt, 1999), suggest more central levels of the auditory system may be responsible for encoding context-dependencies in speech perception.


-------------- top | Research --------------



1999

The 138th Meeting of the Acoustical Society of America
Columbus, OH
November 1-5, 1999

Influence of fundamental frequency on stop-consonant voicing perception: A case of learned covariation or auditory enhancement?
Lori L. Holt, Andrew J. Lotto, & Keith R. Kluender

Poster available in PDF format

Listeners labeling members of an acoustic series modeling VOT (e.g., /ba/-/pa/) are more likely to identify tokens with higher f0s as voiceless than otherwise-identical tokens with lower f0s. This pattern of results may arise because a high f0 enhances the perception of voicelessness, in line with auditory enhancement accounts of speech perception. Alternatively, because f0 and VOT covary in English production, it is possible that listeners respond in this manner due to experience with VOT/f0 covariation in the speech signal. The present investigation was designed to tease apart the relative contributions of these two potential mechanisms. Japanese quail (Coturnix coturnix japonica) were trained to "label" stimuli drawn from VOT series by pecking a key. During training, each quail experienced one of three styles of VOT/f0 covariation. For one group of quail, VOT and f0 covaried naturally, with voiceless series members having higher f0s than voiced members. Another group of quail heard the inverse, "unnatural" covariation. A final group experienced stimuli for which there was no covariation between VOT and f0. Results indicate that experience with VOT/f0 covariation is the predominant force in shaping perception. Thus, general learning mechanisms may account for this correspondence between perception and production.
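
A toy sketch of the three training regimes; the continuum values, slopes, and noise level are invented for illustration, since the abstract does not report the stimulus parameters:

    import numpy as np

    rng = np.random.default_rng(3)
    vot_series = np.linspace(0, 60, 7)   # hypothetical VOT continuum (ms)

    def f0_for(vot, condition):
        # Assign an f0 (Hz) to each token under the three training regimes.
        if condition == "natural":   # voiceless (long-VOT) tokens get higher f0s
            return 100 + 0.5 * vot + rng.normal(0, 3)
        if condition == "inverse":   # "unnatural", reversed covariation
            return 130 - 0.5 * vot + rng.normal(0, 3)
        return 115 + rng.normal(0, 3)  # no covariation: f0 independent of VOT

    for cond in ("natural", "inverse", "none"):
        f0s = [f"{f0_for(v, cond):.0f}" for v in vot_series]
        print(f"{cond:8s} f0s along the VOT series:", f0s)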


-------------- top | Research --------------

The 138th Meeting of the Acoustical Society of America
Columbus, OH
November 1-5, 1999

Structure of phonetic categories produced by general learning mechanisms.
Andrew J. Lotto, Lori L. Holt, & Keith R. Kluender

The development of categories for complex auditory stimuli is of interest both to studies of general category learning and to language acquisition. Previous work [Kluender et al., J. Acoust. Soc. Am. 104, 3568–3582 (1998)] demonstrated that avian species can learn to respond differentially to sounds from two vowel categories, and that the structure of their responses correlates well with human adult ratings of the vowels. In the current study, Japanese quail (Coturnix japonica) were trained to respond to members of either an /i/ or an /E/ distribution and to refrain, in both cases, from responding to members of an /I/ and an /æ/ distribution. Birds responding to /E/ (surrounded by /I/ and /æ/ in the vowel space) showed a prominent peak or "prototype" in their responses. Birds responding to /i/ (extreme in the vowel space) showed a weak "prototype" or none at all, but showed a strong gradient, with response rate increasing for tokens farther away from the other vowel distributions in the F1–F2 space. These data demonstrate that the internal structure of (phonetic) categories is strongly influenced by relations to the competing stimulus set (vowel space). This is particularly important for theories of categorization or language acquisition that rely heavily on the existence of a "prototype."


-------------- top | Research --------------

