Relevant Thesis-Based Degree Programs
Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
In this thesis, we investigate the ability of neural networks, particularly Transformers, to reason and memorize. First, we focus on graph neural networks and Transformers, and analyze their performance on algorithmic reasoning tasks. We show that while models can achieve high accuracy on data from the same distribution as their training data, their performance drops significantly when faced with new, out-of-distribution data. We further show that even strong benchmark numbers may be misleading, and that the true reasoning capability of these models remains limited. We identify several challenges involved in achieving true reasoning abilities and generalization to new data, and propose solutions to some of them, including fixing input representation issues, using hybrid models, and enlarging the training dataset. We also examine the expressivity of Transformers, providing a theoretical analysis of their ability to memorize data points. The results show a linear relationship between a Transformer's memory capacity and both the number of its attention heads and the input context size.
Machine learning has trended toward training over-parameterized networks that can exactly fit the training set. In this regime, when the training data is class-imbalanced, traditional methods for mitigating imbalances are not effective and yield poor performance on minorities. Tailored to the over-parameterized setting, one successful technique to combat imbalances involves adjusting the logits during training by introducing several hyper-parameters in the cross-entropy loss. The idea behind these adjustments is rooted in implicit bias analysis, which, for linear models, explains why they successfully bias the optimization path toward solutions that favor minorities. However, the impact of these adjustments is not well understood for deep, non-linear models that simultaneously learn features from the data and fit classifiers on them. In this work, we take a step toward formalizing the impact of data imbalances and the choice of loss function when learning in over-parameterized setups. At the core of our analysis is the unconstrained features model (UFM), which recently provided partial theoretical justification for the empirical finding known as neural collapse. Limited to the balanced setting, neural collapse suggests that over-parameterized models learn features and classifiers that are arranged in a perfectly symmetric and maximally separated geometry. Leveraging the UFM in our analysis, we analytically characterize the changes in the learned geometry as data becomes imbalanced or the loss function is modified. Our analysis characterizes the properties of the learned features and classifiers, justifying previously observed empirical findings on imbalanced training. To verify the accuracy of our theoretical predictions, we conduct experiments on benchmark networks and vision datasets. Additionally, we adopt the UFM to conduct a preliminary study on the supervised contrastive loss, a recently proposed alternative to the cross-entropy.
We observe that these two losses, under the UFM, exhibit certain similarities. Furthermore, with a slight modification to the model, we discover that we can make the training geometry invariant to imbalances in this case. Overall, despite its simplicity, the UFM uncovers certain biases of over-parameterized models at the training stage. That said, this simplification is not without limitations, which will become clear throughout our analysis.
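The logit-adjustment idea described in this abstract can be sketched in a few lines. The additive form below (shifting each class logit by a temperature-scaled log class prior before the softmax) is one common instance of such hyper-parameterized cross-entropy adjustments; the specific parameterization, function names, and the single temperature `tau` are illustrative assumptions, not the exact family analyzed in the thesis.

```python
import numpy as np

def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Cross-entropy with an additive logit adjustment for class imbalance.

    Adds tau * log(prior_c) to each class logit before the softmax, so
    minority classes (small priors) receive an effectively larger margin.
    With tau = 0 this reduces to the ordinary cross-entropy.
    """
    adjusted = logits + tau * np.log(class_priors)            # shape (n, C)
    # Numerically stable log-softmax.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy example: 3 classes, the last one a minority (10% of the data).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 3))
labels = np.array([0, 1, 2, 2])
priors = np.array([0.45, 0.45, 0.10])

plain = logit_adjusted_ce(logits, labels, priors, tau=0.0)    # ordinary CE
adjusted = logit_adjusted_ce(logits, labels, priors, tau=1.0)
```

Because minority-class logits are shifted down by a larger amount, samples from minority classes incur a larger loss during training, which biases the learned classifier toward the maximally separated geometries discussed above.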
In many machine learning problems, an objective must be optimized with respect to some constraints. In many cases, these constraints are unknown to us, but we can search and take measurements within an exploration radius; in the machine learning community, this is called safe optimization. One important consideration for safe optimization algorithms is how fast they converge to a neighborhood of the optimal solution. In this thesis, we introduce a novel safe optimization algorithm that converges quickly. The experiments are performed through a software package we developed, called ASFW, which is available for download. On the application side, we demonstrate how safe optimization techniques can be applied to a real-world problem and propose a framework that may be applied in various industrial settings.
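The "measurements within an exploration radius" idea can be illustrated with a minimal sketch: from a known safe point, a Lipschitz bound on the unknown constraint certifies a ball of safe points, and the descent step is truncated to stay inside it. The function names, the Lipschitz certificate, and the toy problem below are assumptions for illustration only; they are not the ASFW algorithm from the thesis.

```python
import numpy as np

def safe_descent_step(x, grad_f, g_measure, L, threshold, step=0.5):
    """One step of a generic 'safe' descent scheme (illustrative only).

    The constraint g(x) <= threshold is unknown analytically; we can only
    measure g at the current point. Assuming g is L-Lipschitz, every point
    within radius r = (threshold - g(x)) / L of a safe x is also safe, so
    we truncate the gradient step to remain inside that certified ball.
    """
    slack = threshold - g_measure(x)
    if slack <= 0:
        return x                      # no certified room to move safely
    r = slack / L                     # certified safe exploration radius
    d = -step * grad_f(x)
    n = np.linalg.norm(d)
    if n > r:
        d = d * (r / n)               # truncate the step to the safe ball
    return x + d

# Toy problem: minimize ||x - target||^2 subject to g(x) = ||x|| <= 1,
# where the optimum lies on the constraint boundary at (1, 0).
target = np.array([2.0, 0.0])
grad_f = lambda x: 2.0 * (x - target)
g = lambda x: np.linalg.norm(x)       # 1-Lipschitz constraint

x = np.zeros(2)                       # known safe starting point
for _ in range(200):
    x = safe_descent_step(x, grad_f, g, L=1.0, threshold=1.0)
```

Every iterate produced this way is feasible, which is the defining property of safe optimization: the method never evaluates the objective at a point that could violate the (unknown) constraint.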