Relevant Thesis-Based Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Graduate Student Supervision
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
A Bayesian coreset is a small, weighted subset of data that replaces the full dataset during Bayesian inference, with the goal of reducing computational cost. Although past work has shown empirically that there often exists a coreset of high quality, efficiently constructing such a coreset remains a challenge. Current methods tend to be slow, require a secondary inference step after coreset construction, and do not provide bounds on the data marginal evidence. In this work, we introduce a new method—sparse Hamiltonian flows—that addresses all three of these challenges. The method involves first subsampling the data uniformly, and then optimizing a Hamiltonian flow parametrized by coreset weights and including periodic momentum quasi-refreshment steps. Theoretical results show that the method enables an exponential compression of the dataset in a representative model, and that the quasi-refreshment steps reduce the KL divergence to the target. Real and synthetic experiments demonstrate that sparse Hamiltonian flows provide accurate posterior approximations with significantly reduced runtime compared with competing dynamical-system-based inference methods.
Boosting variational inference (BVI) approximates Bayesian posterior distributions by iteratively building a mixture of components. However, BVI requires greedily optimizing the next component—an optimization problem that becomes increasingly computationally expensive as more components are added to the mixture. Furthermore, previous work has only used simple (i.e., Gaussian) component distributions; in practice, many of these components are needed to obtain a reasonable approximation. These shortcomings can be addressed by considering components that adapt to the target density. However, natural choices such as MCMC chains do not have tractable densities and thus require a density-free divergence for training. As a first contribution, we show that the kernelized Stein discrepancy—which to the best of our knowledge is the only density-free divergence feasible for VI—cannot detect when an approximation is missing modes of the target density. Hence, it is not suitable for boosting components with intractable densities. As a second contribution, we develop locally-adaptive boosting variational inference (LBVI), in which each component distribution is a Sequential Monte Carlo (SMC) sampler, i.e., a tempered version of the posterior initialized at a given simple reference distribution. Instead of greedily optimizing the next component, we greedily choose to add components to the mixture and perturb their adaptivity, thereby causing them to locally converge to the target density; this results in refined approximations with considerably fewer components. Moreover, because SMC components have tractable density estimates, LBVI can be used with common divergences (such as the Kullback–Leibler divergence) for model learning. Experiments show that, when compared to previous BVI methods, LBVI produces reliable inference with fewer components and in less computation time.
Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Due to the infinity the latent features, posterior inferencerequires either marginalization---resulting in dependence structures that prevent efficient computation via parallelization and conjugacy---or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. In contrast to past work that achieved this on a model-by-model basis, we provide a general recipe that is applicable to the broad class of completely random measure-based priors. The efficacy of the proposed algorithmis evaluated on several popular nonparametric models, demonstrating a higher effective sample size per second compared to algorithms using marginalization as well as a higher predictive performance compared to models employing fixed truncations.
Variational inference is a popular alternative to Markov chain Monte Carlo methodsthat constructs a Bayesian posterior approximation by minimizing a discrepancy tothe true posterior within a pre-specified family. This converts Bayesian inferenceinto an optimization problem, enabling the use of simple and scalable stochas-tic optimization algorithms. However, a key limitation of variational inferenceis that the optimal approximation is typically not tractable to compute; even insimple settings the problem is nonconvex. Thus, recently developed statisticalguarantees—which all involve the (data) asymptotic properties of the optimal varia-tional distribution—are not reliably obtained in practice. In this work, we providetwo major contributions: a theoretical analysis of the asymptotic convexity prop-erties of variational inference in the popular setting with a Gaussian family; andconsistent stochastic variational inference (CSVI), an algorithm that exploits theseproperties to find the optimal approximation in the asymptotic regime. CSVI con-sists of a tractable initialization procedure that finds the local basin of the optimalsolution, and a scaled gradient descent algorithm that stays locally confined to thatbasin. Experiments on nonconvex synthetic examples show that compared withstandard stochastic gradient descent, CSVI improves the likelihood of obtaining theglobally optimal posterior approximation.