Doctor of Philosophy in Linguistics (PhD)
Investigating the psycholinguistic connections between Pidgin and ʻŌlelo Hawaiʻi
G+PS regularly provides virtual sessions that focus on admission requirements and procedures, along with tips on how to improve your application.
These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
Spoken language presents listeners with a range of phonetic variation. Systematic categorical variation within and across languages and dialects exposes listeners to different pronunciation variants. This dissertation examines the pronunciation variants of a Cantonese sound change in which syllable-initial /n/ (nou5 腦 “brain”) is pronounced with [l], occasionally producing homophones (lou5 腦 “brain” / 老 “old”). Sociolinguistic work suggests that historical [n]-initial pronunciations are prestige variants, used in more formal contexts, while innovative [l]-initial pronunciations, though socially stigmatized, are more frequent and used in more casual contexts. Little work has examined the consequences of this sound change for speech perception and lexical processing. I test Cantonese listeners on the perception, recognition, and encoding of these sound change pronunciation variants across six experiments. An immediate repetition priming paradigm with [l]-initial targets (Experiment 1) demonstrates recognition equivalence between [n] and [l] forms, in spite of phonetic sensitivity to [n] and [l] evidenced in AX discrimination (Experiments 2a, 6a) and categorization tasks (Experiments 2b, 6b). A long-distance repetition priming task (Experiment 3) establishes equivalence between [n] and [l] forms in long-term recognition as well, with slightly more priming by historical [n], which I examine further in an old-new recognition task (Experiment 4). The recognition task data with [l]-initial targets suggest that listeners map both [n]- and [l]-initial pronunciation variants onto a single lexical representation. An immediate priming task with [n]-initial targets (Experiment 5) demonstrates the same overall recognition equivalence, though with slightly less priming across the board. This provides further evidence in favour of dual mapping, as [n] and [l] each facilitate the recognition of the other.
This work contributes to our understanding of the [n]-[l] sound change and uniquely situates the study of phonetic variation, traditionally examined through within- and cross-language or cross-dialect pronunciation variants, in the context of diachronic sound change variants.
Bilingual speech production is highly variable. This variability arises from numerous sources, ranging from the heterogeneity of linguistic experiences to crosslinguistic influence. This area has historically been challenging to study, given the relative lack of high-quality bilingual speech corpora and of the scientific inquiry that such resources enable. This dissertation introduces the SpiCE corpus of bilingual Speech in Cantonese and English and reports on two studies with the corpus. Chapter 2 describes how SpiCE was designed, collected, transcribed, and annotated. Broadly, it comprises recordings of 34 early Cantonese-English bilinguals conversing in both languages, hand-corrected orthographic transcripts, and force-aligned phone-level annotations. Chapters 3 and 4 are motivated by a desire to understand how crosslinguistic similarity shapes phonetic variation in speech production. Chapter 3 addresses this question at the level of voice. Using 24 filter- and source-based acoustic measurements over all voiced speech in the interviews, principal components and canonical redundancy analyses demonstrate that while talkers vary in the degree to which they have the same “voice” across languages, all talkers show strong similarity with themselves. To a lesser extent, talkers exhibit similarities with one another, providing further support for prototype models of voice. Chapter 4 pivots to the level of sound categories. Prior work in this area emphasizes detecting crosslinguistic influence for phonetically distinct yet phonologically similar sounds. This chapter leverages the uniformity framework to assess underlying phonetic similarity for the long-lag stop series in Cantonese and English. Results indicate moderate patterns of uniformity within and across languages but suggest that a slightly coarser view of uniformity is more appropriate. Additionally, there was a clear difference across languages, supporting simultaneous roles for talker and language.
Together, Chapters 3 and 4 give shape to how crosslinguistic similarity is structured and offer a solid ground for generating perceptual hypotheses for areas like multilingual talker identification. Altogether, this dissertation provides a novel resource and highlights the importance of corpus research, both for understanding production processes and for guiding perception research.
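The voice-similarity logic of Chapter 3 can be sketched with a deliberately simplified toy example (not the dissertation's actual 24-measure pipeline): represent each talker in each language by a vector of hypothetical acoustic measurements, then compare a talker's within-talker, cross-language similarity against similarity between different talkers. All feature values and names below are invented for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two acoustic feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical mean acoustic profiles per language for two talkers
# (e.g., mean F0 in Hz, harmonics-to-noise ratio, spectral tilt).
talker_a = {"cantonese": [200.0, 12.5, -8.0], "english": [195.0, 12.0, -7.5]}
talker_b = {"cantonese": [120.0, 9.0, -12.0], "english": [118.0, 9.5, -11.0]}

# Within-talker, cross-language similarity vs. between-talker similarity:
# the "same voice across languages" pattern predicts the former is higher.
within_a = cosine_similarity(talker_a["cantonese"], talker_a["english"])
between = cosine_similarity(talker_a["cantonese"], talker_b["cantonese"])
print(f"within-talker: {within_a:.4f}, between-talker: {between:.4f}")
```

In this toy setup the within-talker similarity exceeds the between-talker similarity, mirroring the finding that all talkers show strong similarity with themselves across languages.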
A fundamental task of linguistics is to accurately describe the sound patterns of a language. In the field of phonology, this often starts with identifying the set of contrastive sounds in the language, its phoneme inventory. If the language under investigation is a tone language, then identifying the contrastive tones in the language, its tone inventory, is also needed. Historically, phonologists have identified phoneme and tone inventories through lengthy elicitation sessions in order to determine contrasting units. Yet, given the recent advances in machine learning, there may be another way. In this thesis, I argue, by way of demonstration, that machine learning has become a valuable tool for field and theoretical linguists in the description of language and in the development of linguistic theory. Specifically, I present empirical support, using machine learning methods, for the theory of Emergent Phonology, which holds that phonology emerges as the “consequence of accumulated phonetic experience” (Lindblom, 1999, p. 195). This support comes in the form of hypothesized tone inventories (part of one's phonology) that emerge, via an unsupervised learning model, from acoustic-phonetic data for a given language. Since the hypothesized inventories match fairly well with the tone inventories standardly reported in the literature, an aspect of phonology is shown to have emerged from phonetics and support for Emergent Phonology is achieved. To test the robustness of the unsupervised learning method, it is applied to four languages: Mandarin, Cantonese, Fungwa, and English. Finally, since the identification of tone inventories has hitherto been under the purview of human linguists, success in this project provides a first step towards creating a grammaticus ex machina, a linguist (grammarian) from the machine.
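The idea of tone inventories emerging from acoustic data via unsupervised learning can be illustrated with a minimal sketch (not the model used in the thesis): a toy k-means clusterer over made-up, normalized f0 contours, where the learned centroids play the role of hypothesized tone categories. The contours and the choice of k=3 are assumptions for illustration only.

```python
import random

def kmeans(contours, k, iters=50, seed=0):
    """Toy k-means over f0 contours (equal-length lists of pitch samples)."""
    rng = random.Random(seed)
    centroids = rng.sample(contours, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign each contour to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for c in contours:
            dists = [sum((a - b) ** 2 for a, b in zip(c, m)) for m in centroids]
            clusters[dists.index(min(dists))].append(c)
        # Recompute each centroid as the mean contour of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(vals) / len(members) for vals in zip(*members)]
    return centroids, clusters

# Hypothetical normalized f0 contours: high-level, rising, and falling "tones".
data = (
    [[5.0, 5.0, 5.0], [4.9, 5.1, 5.0], [5.1, 4.9, 5.0]] +   # high level
    [[2.0, 3.0, 4.0], [2.1, 3.1, 4.2], [1.9, 2.9, 3.9]] +   # rising
    [[5.0, 3.5, 2.0], [5.1, 3.4, 1.9], [4.9, 3.6, 2.1]]     # falling
)
centroids, clusters = kmeans(data, k=3)
print("hypothesized tone centroids:",
      [[round(x, 1) for x in c] for c in centroids])
```

The centroids stand in for the hypothesized inventory; the dissertation's contribution is showing that, on real acoustic-phonetic data, such emergent categories match reported tone inventories fairly well.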
Psychophysical studies of perceptual learning find that perceivers improve their perceptual accuracy only for stimuli similar to those they were trained on. In contrast, speech perception studies of perceptual learning find generalization to novel contexts when words contain a modified ambiguous sound. This dissertation seeks to resolve the apparent conflict between these findings by framing the results in terms of attentional sets. Attention can be oriented towards comprehension of the speaker’s intended meaning or towards perception of a speaker’s pronunciation. Attention is proposed to affect perceptual learning as follows. When attention is oriented towards comprehension, more abstract and less context-dependent representations are updated and the perceiver shows generalized perceptual learning, as seen in the speech perception literature. When attention is oriented towards perception, more finely detailed and more context-dependent representations are updated and the perceiver shows less generalized perceptual learning, similar to what is seen in the psychophysics literature. This proposal is supported by three experiments. The first two implement a standard paradigm for perceptual learning in speech perception. In these experiments, promoting a more perception-oriented attentional set causes less generalized perceptual learning. The final experiment uses a novel paradigm in which modified sounds are embedded in sentences during exposure. Perceptual learning is found only when the modified sound is embedded in words that are not predictable from the sentence. When modified sounds are in predictable words, no perceptual learning is observed. To account for this lack of perceptual learning, I hypothesize that sounds in predictable sentences are treated as less reliable than sounds in isolated words or in unpredictable sentences.
In the cases where perceptual learning is present, contexts which support comprehension-oriented attentional sets show larger perceptual learning effects than contexts promoting perception-oriented attentional sets. I argue that attentional sets are a key component to the generalization of perceptual learning to new contexts.
Speech convergence is the tendency of talkers to become more similar to someone they are listening or talking to, whether that person is a conversational partner or merely a voice heard repeating words. The cause of this phenomenon is unknown: it may be related to a general link between perception and behaviour (Dijksterhuis & Bargh, 2001), a coupling between speech production and speech perception systems (Pickering & Garrod, 2013), or an effort to minimize social distance between interlocutors (Giles et al., 1991). How convergence is facilitated or inhibited by various factors (e.g., gender, dialect, level of attention) can help pinpoint the reasons behind it. One as-yet unexamined factor in this regard is cognitive workload, i.e., the information processing load a person experiences when performing a task. The harder the task, the greater the cognitive workload. This study examines the effect of different levels of task difficulty on speech convergence within dyads collaborating on a task. Dyad members had to build identical LEGO® constructions without being able to see each other’s construction, and with each member having half of the instructions required to complete the construction. Three levels of task difficulty were created, with five dyads at each level (30 participants total). Listeners (n = 62) who heard pairs of utterances from each dyad judged convergence to be occurring in the Easy condition and to a lesser extent in the Medium condition, but not in the Hard condition. Acoustic similarity analyses of the same utterance pairs using amplitude envelopes and mel-frequency cepstral coefficients showed convergence on the part of some dyads but divergence on the part of others, with no clear effect of difficulty. 
Speech rate and pausing behaviour, both of which can demonstrate convergence (e.g., Pardo et al., 2013a) and be affected by workload (e.g., Lively et al., 1993; Khawaja, 2010), also showed both convergence and divergence, with difficulty possibly playing a role. The results suggest that difficulty affects speech convergence, but that it may do so differently for different talkers. Factors such as whether talkers are giving or receiving instructions also seem to interact with difficulty in affecting convergence.
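One of the acoustic similarity measures mentioned above, amplitude-envelope similarity, can be sketched as follows (a toy illustration, not the study's analysis): compute a frame-wise RMS envelope for each utterance and correlate the envelopes, with convergence showing up as higher similarity in later utterance pairs than in earlier ones. The waveforms below are invented.

```python
import math

def amplitude_envelope(samples, frame_len=2):
    """Frame-wise RMS amplitude envelope of a waveform."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def pearson(x, y):
    """Pearson correlation between two equal-length envelopes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical waveforms: talker B's late utterance tracks talker A's
# loudness contour more closely than B's early utterance (toy convergence).
talker_a = [0.1, 0.9, 0.8, 0.2] * 8
b_early = [0.5, 0.4, 0.6, 0.5] * 8
b_late = [0.2, 0.8, 0.7, 0.3] * 8

env_a = amplitude_envelope(talker_a)
early_sim = pearson(env_a, amplitude_envelope(b_early))
late_sim = pearson(env_a, amplitude_envelope(b_late))
print(f"early similarity: {early_sim:.2f}, late similarity: {late_sim:.2f}")
```

A rising similarity from early to late pairs would indicate convergence; falling similarity, divergence, which is the mixed pattern the acoustic analyses found across dyads.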
Psycholinguistic studies on bilingualism generally investigate how linguistic information is shared between a listener's first language (L1) and second language (L2) at the conceptual level and in the lexicon. At the same time, speech perception studies examine how social information affects language processing and representation. This dissertation brings these two lines of research together and demonstrates that the L1 and L2 are connected through a social category activation link, in addition to previously proposed conceptual and lexical links. In particular, I show that the activation of ethnicity operates under a shared system across the L1 and L2 during both immediate speech processing and long-term abstract representation. This claim is supported by sensitivity and reaction time results from two priming experiments. In a novel cross-language / cross-dialect paradigm, English (L1)–Māori (L2) bilingual New Zealanders participated in a short-term and a long-term auditory lexical decision task (72 and 45 subjects respectively), where critical prime and target pairs were made up of English-to-Māori and Māori-to-English translation equivalents. Half of the English target words were pronounced by standard New Zealand Pākehā English speakers and half by Māori English speakers, thus creating nine test conditions: four bilingual conditions, four English-only conditions, and a within-Māori repetition priming condition. Each critical English word contained one of four sociophonetic variables: theta, final /z/, and the GOOSE or GOAT vowels. The results reveal a stronger connection between Māori and Māori English representations than between Māori and Pākehā English representations, both in short-term processing and in long-term mental representations. I argue for the existence of an ethnicity activation link between the L1 and L2.
The strength of this link varies based on the directionality and time-course of activation, the sociophonetic variable in the word, and the listener's previous experience with the social category.