Relevant Degree Programs
Graduate Student Supervision
Doctoral Student Supervision (Jan 2008 - Nov 2019)
Perceivers receive a constant influx of information about the natural world via their different senses. In recent years, speech researchers have begun to situate speech more firmly within this multisensory experience, moving progressively away from the traditional focus on audition toward a more multisensory approach. In doing so, speech researchers have discovered that, in addition to audition and vision, many somatosenses are also highly relevant modalities for experiencing and/or conveying speech. The current dissertation focuses on the integration of aerotactile somatosensation—the feeling of speech-related airflow on the skin—and whether prior experience with specific speech information modulates aerotactile influence on visual and auditory speech cues to English stops. In Chapter 2, I used a two-alternative forced-choice visuo-aerotactile perception task to show that adult English perceivers can integrate aerotactile speech information from a novel visual source. In Chapter 3, I used a two-alternative forced-choice audio-aerotactile perception task to demonstrate that integration occurs for this population even when the auditory and aerotactile speech cues are presented in a way that does not conform with prior experience in the natural world. Finally, in Chapter 4, I used a looking-time procedure to test prelinguistic infants on their sensitivity to speech-related airflow during auditory perception and found no evidence that infant stop perception can be influenced by airflow before infants begin babbling. Taken together, these three experiments suggest that while adult perceivers can integrate aerotactile speech information with speech information from other modalities without specific prior experience with the cues, some developmental experience may be required for this ability to emerge.
While speech planning has long been a topic of discussion in the literature, the specific content of speech plans has remained largely conjectural. The present dissertation brings to this problem a methodology using a startling auditory stimulus (SAS, > 120 dB) to examine the contents of prepared movement plans unaltered by feedback regulation. A SAS has been found to elicit rapid release of prepared movements with high accuracy and largely unaltered EMG muscle activity patterns. Because the response latency of these SAS-triggered movements is too short to allow for feedback or correction processes, the executed movements can be used to reveal the contents of movement plans with little or no feedback information influencing the prepared motor behaviours. In the present dissertation, the first experiment applied this methodology to CV syllable production to test whether English CV syllables can be elicited in the same manner as limb movements. Results show that a SAS can trigger an early release of a well-formed prepared English CV syllable, including intact lip kinematics and vowel formants. The second experiment investigated whether the observed short latency and additional lip compression are speech-specific or generic to any oral movement. Results show that while both prepared speech-like and non-speech movements are subject to early release by a SAS, lip compression does not occur as frequently in non-speech movements as it does in Spoken speech, suggesting that this preparatory compression may be speech-specific, likely relating to aerodynamic factors. The third experiment further tested whether lip compression independent of aerodynamic factors is observed in all speech-related tasks and is elicited at a short latency by a SAS. Results show that comparable lip compression resulting from movement overshoot was observed for both Spoken and Mouthed speech. The fourth experiment looked into the level of suprasegmental gestures in speech planning.
The results show that while both pitch contour and formants were maintained in the SAS-induced responses, pitch levels were compromised, suggesting that a prepared syllable minimally includes phonemic contrasts. SAS thus provides a useful tool for observing the contents of speech plans.
No abstract available.
There are many ways that cultures with tone languages may deal with the interaction of linguistic tone and music. Contemporary vocal music in both Mandarin and Cantonese stems from a common source, a new style of Chinese music that developed in the late nineteenth and early twentieth centuries, but the two traditions realize linguistic tone differently. This thesis examines experimentally the differences in the phonetic manifestation of tone in Cantonese and Mandarin singing, as well as the comprehensibility of the sung words. One set of experiments asked native speakers to sing songs containing minimal sets by tone. The second set of experiments had native speakers try to recognize the words extracted from the songs. Cantonese singers included a rising contour when singing words with rising tones, and Cantonese listeners were attuned to this. Mandarin singers did not add contour information, and Mandarin listeners had difficulty recognizing the words out of context. The thesis also expands the discussion of singing in tone languages by examining some of the sociological and political factors which appear to have influenced the ways in which tone is expressed (or not) in these two varieties of Chinese.
This thesis tests the theory that the sensory content of inner speech is constituted by corollary discharge. Corollary discharge is a signal generated by the motor system and is a “prediction” of the sensory consequences of the motor system’s actions. Corollary discharge normally functions in the nervous system to segregate self-caused sensations from externally caused sensations. It does this, in part, by attenuating the nervous system’s response to self-caused sensations. This thesis argues that corollary discharge has been co-opted in humans to provide the sensory content of speech imagery. The thesis further tests the claim that the sensory detail contained in speech imagery is sufficiently rich and sufficiently similar to the representations of external speech sounds that the perception of external speech sounds can be influenced by inner speech. This thesis claims that the perception of external speech is altered because corollary discharge prepares the auditory system to hear those sensory features which the corollary-discharge signal carries. These claims were tested experimentally by having participants engage in specific forms of speech imagery while categorizing external sounds. In one set of experiments, when external sound and speech imagery were in synchrony and were similar in content, the perception of the external sound was altered — the external sound came to be heard as matching the content of the speech imagery. In a second set of experiments, the presence of corollary discharge in speech imagery was tested. When a sensation matches a corollary discharge signal, the sensation tends to have an attenuated impact. This attenuation is a hallmark of corollary discharge. In this set of experiments, when participants’ speech imagery matched an external sound, the perceptual impact of the external sound was attenuated.
Proper controls ensured that it was the degree of match between the speech imagery and the external sound that was responsible for this attenuation, rather than some extraneous factor.
Psychological researchers have found evidence for speech planning down to the syllable, with some evidence for planning at the level of the phoneme (Levelt, 1989) or feature (Bernhardt and Stemberger, 1998). Speech scientists who examine coarticulation argue for no speech planning (Saltzman and Munhall, 1989), or limited planning (Whalen, 1990). I provide evidence for subphonemic speech planning based on B/M ultrasound measurement of tongue shape and motion, identifying four categorical variants of flaps/taps (‘T’) in North American English [alveolar taps ([ɾ↕]), down-flaps ([ɾ↘]), up-flaps ([ɾ↖]), and postalveolar taps ([ɾ⃡])], and two broad categories of rhotic vowels (‘R’) [tongue tip-up rhotics ([ɻ̩]) and tongue tip-down rhotics ([ɹ̩])], even across repetitions of the same utterance in identical phonetic contexts.

I explain the pattern of variation in terms of hypothesized constraints on rapid articulation. These include articulatory conflicts (Gick and Wilson, 2006) between segments that require an articulator to be in two places at once, and the end-state comfort effect (Rosenbaum et al., 1992), where an articulator begins a complex sequence in an awkward position in order to end comfortably.

Speakers who can repeat syllables quickly are more likely to avoid articulatory conflicts during normal speech production. Speakers who repeat syllables more slowly produce ‘T’ variants involving fewer changes in motion, sometimes forcing non-rhotic vowels in the middle of ‘T’ sequences to become rhotacized in exchange for canonical vowels at the end.
These results provide evidence for planning across syllable, morpheme, and word boundaries. Other hypothesized constraints on speech planning, such as gravity and tissue elasticity, are also examined; these demonstrate a mismatch between the number of distinct articulatory actions and the number of phoneme units in a given speech sequence. The results support a theory of subphonemic speech planning that takes into account potential upcoming articulatory conflicts, a person’s motor skills, and the effects of gravity and elasticity.
Master's Student Supervision (2010 - 2018)
Integration of speech information is evident in audio-visual (McGurk & MacDonald, 1976) and audio-tactile (Gick & Derrick, 2009) combinations, and an asymmetric window of multimodal integration exists which is consistent with the relative speeds of the various signals (Munhall et al., 1996; Gick et al., 2010). It is presently unclear whether integration is possible if the audio speech signal is removed. The current thesis utilizes synchronous and asynchronous visual and aero-tactile speech stimuli to investigate potential integration effects of this modality combination and explores the shape of the potential window of visual-tactile integration. Results demonstrate that the aero-tactile stimulus significantly affects categorization of speech segments, such that individuals are more likely to perceive a voiceless aspirated stop when they experience a combination of visual-tactile stimuli than when they experience a visual stimulus in isolation. A window of visual-tactile integration which reflects the relative speeds of light and speech airflow is also evident. These results add to our knowledge of multimodal speech integration and support the notion that speech is perceived as a holistic, modality-neutral event.

Children with Autism Spectrum Disorder (ASD) have exhibited differential multimodal integration behaviour (Gelder et al., 1991; Mongillo et al., 2008; Irwin et al., 2011; Stevenson et al., 2014) and differences in temporal acuity (Stevenson et al., 2014) compared to typically developing children; however, it is unclear whether these findings are specific to this clinical population or can be considered part of a continuum of multimodal integration behaviour which includes typically developed adults. The current thesis examines individual differences in visual-tactile integration based on temporal acuity and behavioural traits associated with ASD in a typically developed adult population.
Results show that temporal acuity and behavioural traits associated with ASD, especially the trait of imagination, significantly influence the range of asynchronous stimuli over which visual-tactile integration occurs and also affect individuals’ abilities to differentiate visually similar speech stimuli. These results reveal a relationship between visual-tactile integration rates, traits associated with ASD, and temporal acuity, and suggest that the differential behaviour observed in child ASD populations forms part of a continuum which extends to typically developed adults.
This study investigates the ability of observers to discriminate between French and English using visual-only stimuli. It differs from prior studies in that it specifically uses inter-speech posture (ISP) and speech-ready tokens rather than full sentences. The main purpose of this research was to determine whether observers could successfully discriminate French from English by watching video clips of speakers in ISP and speech-ready positions with the audio removed. Two experiments were conducted: the first compares native English and non-native English speakers, and the second compares native English and native French speakers, expanding on the data from the first. The results support the view that observers can visually distinguish their native language even in the absence of segmental information.