Doctor of Philosophy in Computer Science (PhD)
Optimization for Machine Learning
This thesis makes contributions to two problems in learning theory: prediction with expert advice and learning mixtures of Gaussians.The problem of prediction with expert advice can be cast as a sequential game between an algorithm and an adversary as follows. At each time step, an algorithm chooses one of n options (or experts) and the adversary sets a cost for each expert. The algorithm's goal is to minimize its regret, i.e. its cost relative to the best expert in hindsight. The celebrated multiplicative weights algorithm is known to be optimal if the the game is terminated at a fixed, known time and the number of experts is large. Optimal algorithms are also known when the number of experts is 2, 3, or 4.If the game does not terminate at a known time or is run indefinitely, the optimal algorithm is not known for any number of experts. We contribute to this problem by giving the optimal algorithm when there are 2 experts. Our algorithm is designed by considering a continuous analogue of the problem, which is solved using ideas from stochastic calculus.In the second part of the thesis, we look at distribution learning, which is a fundamental task in statistics that has been studied for over a century. We consider such a problem where the distribution is a mixture of k Gaussians in d dimensions. The objective is density estimation: given i.i.d. samples from the unknown distribution, produce a distribution whose total variation from the unknown distribution is within some desired accuracy. We contribute to this problem by designing an algorithm with near-optimal sample complexity.
Low-stretch trees are spanning trees which provide approximate distance preservation for edges in the original graph by minimizing stretch. We explore the application of these trees to network visualization. In particular, we present a novel edge bundling technique, LSTB, that computes edge bundles explicitly and efficiently and does not rely on fixed vertex positions. This approach is in contrast to previous methods, which require the user to provide a layout of the input graph. We introduce an abstract framework for edge bundling methods, which provides a unifying formalization of bundling terminology and techniques, as well as a classification of such methods. Based on this framework, LSTB provides algorithmic support for sophisticated visual encodings, including dynamic layout adjustment and interactive bundle querying.In addition, we explore the use of the multiplicative weights update method to compute a distribution over low-stretch trees in order to achieve low stretch for all edges in expectation, rather than on average. We present the results of using this distribution in place of a single low-stretch tree as a routing graph for LSTB. While the distribution provides better stretch guarantees, we find that from a visual perspective a single low-stretch tree provides a better routing graph for the LSTB edge bundling application.
Herding is an algorithm of recent interest in the machine learning community, motivated by inference in Markov random fields. It solves the following Sampling Problem: given a set Χ \subset R^d with mean μ, construct an infinite sequence of points from Χ such that, for every t ≥ 1, the mean of the first t points in that sequence lies within Euclidean distance O(1/t) of μ. The error of a solution to Sampling Problem is defined to be the distance between the empirical mean of the first t samples and the original mean μ.The O(1/t) error bound suppresses the dependence on d and Χ. In this thesis, we study the best dependence on d and |Χ| that can be achieved for the error in Sampling Problem. Known analysis of the Herding algorithm give an error bound that depends on geometric properties of Χ but, even under favorable conditions, this bound depends linearly on d.We first show that any algorithm for the Sampling Problem must have error Ω(√d/t). Afterward, we present a new polynomial-time algorithm that solves the Sampling Problem with error O(√d log^2.5|Χ|/t) assuming that Χ is finite. This implies that our algorithm is optimal to within logarithmic factors. Finally, we prove that the actual error of the Herding Algorithm is strictly worse than the error of our algorithm if we measure the error in the infinity-norm. Our algorithm is randomized and based on recent algorithmic results in discrepancy theory. We implement our algorithm and other potential solutions for the Sampling Problem and evaluate them on various inputs.