Doctor of Philosophy in Linguistics (PhD)
Investigating how Cantonese-English bilinguals produce and perceive speech sounds
G+PS regularly provides virtual sessions that focus on admission requirements and procedures, and offer tips on how to improve your application.
A fundamental task of linguistics is to accurately describe the sound patterns of a language. In the field of phonology, this often starts with identifying the set of contrastive sounds in the language, its phoneme inventory. If the language under investigation is a tone language, then identifying the contrastive tones in the language, its tone inventory, is also needed. Historically, phonologists have identified phoneme and tone inventories through lengthy elicitation sessions in order to determine contrasting units. Yet, given the recent advances in machine learning, there may be another way. In this thesis, I argue, by way of demonstration, that machine learning has become a valuable tool for field and theoretical linguists in the description of language and in the development of linguistic theory. Specifically, I present empirical support, using machine learning methods, for the theory of Emergent Phonology, which holds that phonology emerges as the "consequence of accumulated phonetic experience" (Lindblom, 1999, p. 195). This support comes in the form of hypothesized tone inventories (part of one's phonology) that emerge, via an unsupervised learning model, from acoustic-phonetic data for a given language. Since the hypothesized inventories match fairly well with the tone inventories standardly reported in the literature, an aspect of phonology is shown to have emerged from phonetics and support for Emergent Phonology is achieved. To test the robustness of the unsupervised learning method, it is applied to four languages: Mandarin, Cantonese, Fungwa and English. Finally, since the identification of tone inventories has hitherto been under the purview of human linguists, success in this project provides a first step towards creating a grammaticus ex machina -- a linguist (grammarian) from the machine.
Psychophysical studies of perceptual learning find that perceivers only improve the accuracy of their perception on stimuli similar to what they were trained on. In contrast, speech perception studies of perceptual learning find generalization to novel contexts when words contain a modified ambiguous sound. This dissertation seeks to resolve the apparent conflict between these findings by framing the results in terms of attentional sets. Attention can be oriented towards comprehension of the speaker’s intended meaning or towards perception of a speaker’s pronunciation. Attention is proposed to affect perceptual learning as follows. When attention is oriented towards comprehension, more abstract and less context-dependent representations are updated and the perceiver shows generalized perceptual learning, as seen in the speech perception literature. When attention is oriented towards perception, more finely detailed and more context-dependent representations are updated and the perceiver shows less generalized perceptual learning, similar to what is seen in the psychophysics literature. This proposal is supported by three experiments. The first two implement a standard paradigm for perceptual learning in speech perception. In these experiments, promoting a more perception-oriented attentional set causes less generalized perceptual learning. The final experiment uses a novel paradigm where modified sounds are embedded in sentences during exposure. Perceptual learning is found only when the modified sound is embedded in words that are not predictable from the sentence. When modified sounds are in predictable words, no perceptual learning is observed. To account for this lack of perceptual learning, I hypothesize that sounds in predictable sentences are less reliable than sounds in words in isolation or unpredictable sentences. 
In the cases where perceptual learning is present, contexts which support comprehension-oriented attentional sets show larger perceptual learning effects than contexts promoting perception-oriented attentional sets. I argue that attentional sets are a key component to the generalization of perceptual learning to new contexts.
Speech convergence is the tendency of talkers to become more similar to someone they are listening or talking to, whether that person is a conversational partner or merely a voice heard repeating words. The cause of this phenomenon is unknown: it may be related to a general link between perception and behaviour (Dijksterhuis & Bargh, 2001), a coupling between speech production and speech perception systems (Pickering & Garrod, 2013), or an effort to minimize social distance between interlocutors (Giles et al., 1991). How convergence is facilitated or inhibited by various factors (e.g., gender, dialect, level of attention) can help pinpoint the reasons behind it. One as-yet unexamined factor in this regard is cognitive workload, i.e., the information processing load a person experiences when performing a task. The harder the task, the greater the cognitive workload. This study examines the effect of different levels of task difficulty on speech convergence within dyads collaborating on a task. Dyad members had to build identical LEGO® constructions without being able to see each other’s construction, and with each member having half of the instructions required to complete the construction. Three levels of task difficulty were created, with five dyads at each level (30 participants total). Listeners (n = 62) who heard pairs of utterances from each dyad judged convergence to be occurring in the Easy condition and to a lesser extent in the Medium condition, but not in the Hard condition. Acoustic similarity analyses of the same utterance pairs using amplitude envelopes and mel-frequency cepstral coefficients showed convergence on the part of some dyads but divergence on the part of others, with no clear effect of difficulty. 
Speech rate and pausing behaviour, both of which can demonstrate convergence (e.g., Pardo et al., 2013a) and be affected by workload (e.g., Lively et al., 1993; Khawaja, 2010), also showed both convergence and divergence, with difficulty possibly playing a role. The results suggest that difficulty affects speech convergence, but that it may do so differently for different talkers. Factors such as whether talkers are giving or receiving instructions also seem to interact with difficulty in affecting convergence.
Psycholinguistic studies on bilingualism generally investigate how linguistic information is shared between a listener's first language (L1) and second language (L2) at the conceptual level and in the lexicon. At the same time, speech perception studies examine how social information affects language processing and representation. This dissertation brings these two lines of research together and demonstrates that the L1 and L2 are connected through a social category activation link, in addition to previously proposed conceptual and lexical links. In particular, I show that the activation of ethnicity operates under a shared system across the L1 and L2 during both immediate speech processing and long-term abstract representations. This claim is supported by sensitivity and reaction time results from two priming experiments. In a novel cross-language / cross-dialect paradigm, English (L1) - Maori (L2) bilingual New Zealanders participated in a short-term and a long-term auditory lexical decision task (72 and 45 subjects respectively), where critical prime and target pairs were made up of English-to-Maori and Maori-to-English translation equivalents. Half of the English target words were pronounced by standard New Zealand Pakeha English speakers and half by Maori English speakers, thus creating nine test conditions: four bilingual conditions, four English-only conditions, and a within-Maori repetition priming condition. Each critical English word contained one of four sociophonetic variables: theta, final /z/, and the GOOSE or GOAT vowels. The results reveal a stronger connection between Maori and Maori English representations than between Maori and Pakeha English representations both in short-term processing and long-term mental representations. I argue for the existence of an ethnicity activation link between the L1 and L2.
The strength of this link varies based on the directionality and time-course of activation, the sociophonetic variable in the word, and the listener's previous experience with the social category.