Doctor of Philosophy in Linguistics (PhD)
Inclusive by Design: Natural Language Technology for Africa
Deep learning. Deep learning of natural language. Natural language processing. Computational linguistics. Natural language inference. Machine translation. Misinformation. Detection of negative and abusive content online. Applications of deep learning in health and well-being.
Theses completed in 2010 or later are listed below. Please note that there is a delay of 6 to 12 months before the latest theses are added.
Arabic dialect identification (ADI) is an important aspect of the Arabic speech processing pipeline, and in particular of dialectal Arabic automatic speech recognition (ASR) models. In this work, we present an overview of corpora and methods applicable to both ADI and dialectal Arabic ASR, and then benchmark two approaches to using pre-trained speech representation models for ADI. Namely, we first employ direct fine-tuning, and then use fixed representations extracted from pre-trained models as an intermediate step in the ADI process. We train and evaluate our models on the granular ADI-17 Arabic dialect corpus (92% F1 for our fine-tuned HuBERT model), and further probe generalization by evaluating our trained models on the coarse-grained ADI-5 corpus (80% F1 for fine-tuned HuBERT).
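The fixed-representation approach described above can be sketched as: pool frame-level features from a pre-trained speech model into one vector per utterance, then train a lightweight classifier on those vectors. The sketch below is illustrative only; it stands in for real HuBERT hidden states with synthetic features, and uses a simple nearest-centroid classifier rather than the models evaluated in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool_utterance(frame_feats):
    """Mean-pool frame-level representations of shape (T, D) into one (D,) vector."""
    return frame_feats.mean(axis=0)

def fake_features(dialect_id, n_frames=50, dim=16):
    """Stand-in for frame features from a pre-trained model (e.g. HuBERT hidden states).
    Each hypothetical dialect class is given a distinct mean so the sketch is learnable."""
    centre = np.zeros(dim)
    centre[dialect_id] = 3.0
    return centre + rng.normal(size=(n_frames, dim))

dialects = ["EGY", "GLF", "LEV"]  # illustrative labels, not the ADI-17 label set
train = [(pool_utterance(fake_features(i)), i) for i in range(3) for _ in range(20)]

# Nearest-class-centroid classifier over the pooled utterance vectors.
centroids = np.stack([
    np.mean([x for x, y in train if y == i], axis=0) for i in range(3)
])

def predict(frame_feats):
    v = pool_utterance(frame_feats)
    return dialects[int(np.argmin(np.linalg.norm(centroids - v, axis=1)))]
```

In a real pipeline the frozen pre-trained model would replace `fake_features`, and a stronger classifier (e.g. a small feed-forward network) would replace the centroid rule; the structure of the two-stage approach is the same.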
Natural language processing (NLP) has pervasive applications in everyday life, and has recently witnessed rapid progress. Incorporating latent variables in NLP systems allows for explicit representations of certain types of information. In neural machine translation systems, for example, latent variables have the potential to enhance semantic representations, which could help improve general translation quality. Previous work has focused on using variational inference with diagonal-covariance Gaussian distributions, which we hypothesize cannot sufficiently encode latent factors of language, as these may exhibit multimodal distributions. Normalizing flows enable more flexible posterior distribution estimates by introducing a change of variables with invertible functions; they have previously been used successfully in computer vision to obtain more flexible posterior distributions over image data. In this work, we investigate the impact of normalizing flows in autoregressive neural machine translation systems. We do so in the context of two currently successful approaches: attention mechanisms and language models. Our results suggest that normalizing flows can improve translation quality in some scenarios, but require certain modelling assumptions to achieve such improvements.
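The core mechanism behind normalizing flows is the change-of-variables formula: applying an invertible function f to a sample z from a simple base distribution gives log q(f(z)) = log q0(z) - log|det ∂f/∂z|. As a minimal illustration (not the specific flows used in the thesis), the sketch below implements a single planar flow step, f(z) = z + u·tanh(wᵀz + b), whose log-determinant has a closed form via the matrix determinant lemma; the parameter values are arbitrary assumptions chosen to satisfy the invertibility condition wᵀu > -1.

```python
import numpy as np

def planar_flow(z, u, w, b):
    """One planar flow step f(z) = z + u * tanh(w.z + b).
    Returns the transformed sample and log|det Jacobian|."""
    a = np.tanh(w @ z + b)
    f_z = z + u * a
    psi = (1.0 - a ** 2) * w                 # tanh'(w.z + b) * w
    log_det = np.log(np.abs(1.0 + u @ psi))  # matrix determinant lemma
    return f_z, log_det

# Push a sample from a standard-normal base distribution through the flow
# and track its log-density: log q1(f(z)) = log q0(z) - log|det J|.
rng = np.random.default_rng(0)
dim = 4
z0 = rng.normal(size=dim)
log_q0 = -0.5 * (dim * np.log(2 * np.pi) + z0 @ z0)

u = np.array([0.5, -0.2, 0.1, 0.3])   # illustrative parameters (w.u = 0.17 > -1)
w = np.array([0.4, 0.4, -0.1, 0.2])
b = 0.1

z1, log_det = planar_flow(z0, u, w, b)
log_q1 = log_q0 - log_det
```

Stacking several such steps yields a posterior family far more flexible than a diagonal Gaussian, which is what allows the multimodal latent behaviour hypothesized above to be captured.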