Graduate Student Supervision
Master's Student Supervision (2010 - 2021)
Boosting variational inference (BVI) approximates Bayesian posterior distributions by iteratively building a mixture of components. However, BVI requires greedily optimizing the next component—an optimization problem that becomes increasingly computationally expensive as more components are added to the mixture. Furthermore, previous work has only used simple (i.e., Gaussian) component distributions; in practice, many of these components are needed to obtain a reasonable approximation. These shortcomings can be addressed by considering components that adapt to the target density. However, natural choices such as MCMC chains do not have tractable densities and thus require a density-free divergence for training. As a first contribution, we show that the kernelized Stein discrepancy—which to the best of our knowledge is the only density-free divergence feasible for VI—cannot detect when an approximation is missing modes of the target density. Hence, it is not suitable for boosting components with intractable densities. As a second contribution, we develop locally-adaptive boosting variational inference (LBVI), in which each component distribution is a Sequential Monte Carlo (SMC) sampler, i.e., a tempered version of the posterior initialized at a given simple reference distribution. Instead of greedily optimizing the next component, we greedily choose to add components to the mixture and perturb their adaptivity, thereby causing them to locally converge to the target density; this results in refined approximations with considerably fewer components. Moreover, because SMC components have tractable density estimates, LBVI can be used with common divergences (such as the Kullback–Leibler divergence) for model learning. Experiments show that, when compared to previous BVI methods, LBVI produces reliable inference with fewer components and in less computation time.
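To make the greedy loop concrete, here is a minimal, hypothetical sketch of standard boosting VI with simple Gaussian components (the baseline the abstract improves on, not LBVI's SMC components). The bimodal target, candidate grid, and step sizes are invented for illustration; each boosting round scores candidates by a Monte Carlo estimate of the reverse KL and appends the best one with the usual 1/(t+1) mixture weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def norm_logpdf(x, mu, sigma):
    # Log density of N(mu, sigma^2), vectorized over x.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

def log_target(x):
    # Invented bimodal target: 0.5*N(-3,1) + 0.5*N(3,1).
    return np.logaddexp(norm_logpdf(x, -3.0, 1.0),
                        norm_logpdf(x, 3.0, 1.0)) - np.log(2.0)

def mix_logpdf(x, comps, weights):
    lps = np.stack([norm_logpdf(x, m, s) for m, s in comps])
    return np.logaddexp.reduce(lps + np.log(weights)[:, None], axis=0)

def mix_sample(n, comps, weights, rng):
    idx = rng.choice(len(comps), size=n, p=weights)
    mus = np.array([comps[i][0] for i in idx])
    sds = np.array([comps[i][1] for i in idx])
    return rng.normal(mus, sds)

comps, weights = [], np.array([])
candidates = [(m, 1.0) for m in np.linspace(-5, 5, 21)]  # fixed-scale Gaussians
for t in range(3):
    gamma = 1.0 if t == 0 else 1.0 / (t + 1)  # standard boosting step size
    best, best_kl = None, np.inf
    for cand in candidates:
        trial = comps + [cand]
        w = np.append((1 - gamma) * weights, gamma)
        x = mix_sample(2000, trial, w, rng)
        # Monte Carlo estimate of reverse KL(q || p), up to the true normalizer.
        kl = np.mean(mix_logpdf(x, trial, w) - log_target(x))
        if kl < best_kl:
            best, best_kl = cand, kl
    comps.append(best)
    weights = np.append((1 - gamma) * weights, gamma)

means = [m for m, _ in comps]
```

Note how the per-round cost grows with the mixture size (every candidate score requires sampling and evaluating the whole mixture), which is the expense the abstract identifies.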
Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Because of the infinite number of latent features, posterior inference requires either marginalization, resulting in dependence structures that prevent efficient computation via parallelization and conjugacy, or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. In contrast to past work that achieved this on a model-by-model basis, we provide a general recipe that is applicable to the broad class of completely random measure-based priors. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models, demonstrating a higher effective sample size per second compared to algorithms using marginalization as well as higher predictive performance compared to models employing fixed truncations.
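The slice-variable idea can be illustrated in its simplest special case, Dirichlet process stick-breaking (a sketch only; the thesis's recipe covers the much broader completely-random-measure class). Given a slice threshold u, atoms are instantiated lazily until the leftover stick mass drops below u, so every atom with weight above u is represented and the truncation level adapts per iteration rather than being fixed in advance.

```python
import numpy as np

def adaptive_sticks(alpha, u, rng):
    # Lazily instantiate DP stick-breaking weights: stop once the leftover
    # stick mass falls below the slice threshold u, guaranteeing that every
    # atom with weight > u is represented. Illustrative special case only.
    weights, remaining = [], 1.0
    while remaining > u:
        v = rng.beta(1.0, alpha)      # stick-breaking proportion
        weights.append(remaining * v)  # weight of the next atom
        remaining *= 1.0 - v           # mass left on the stick
    return weights

rng = np.random.default_rng(1)
w = adaptive_sticks(alpha=2.0, u=1e-3, rng=rng)
leftover = 1.0 - sum(w)
```

A smaller threshold u instantiates more atoms on average (roughly alpha * log(1/u) for the DP), so the representation grows exactly when the sampler needs it to.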
Variational inference is a popular alternative to Markov chain Monte Carlo methods that constructs a Bayesian posterior approximation by minimizing a discrepancy to the true posterior within a pre-specified family. This converts Bayesian inference into an optimization problem, enabling the use of simple and scalable stochastic optimization algorithms. However, a key limitation of variational inference is that the optimal approximation is typically not tractable to compute; even in simple settings the problem is nonconvex. Thus, recently developed statistical guarantees—which all involve the (data) asymptotic properties of the optimal variational distribution—are not reliably obtained in practice. In this work, we provide two major contributions: a theoretical analysis of the asymptotic convexity properties of variational inference in the popular setting with a Gaussian family; and consistent stochastic variational inference (CSVI), an algorithm that exploits these properties to find the optimal approximation in the asymptotic regime. CSVI consists of a tractable initialization procedure that finds the local basin of the optimal solution, and a scaled gradient descent algorithm that stays locally confined to that basin. Experiments on nonconvex synthetic examples show that compared with standard stochastic gradient descent, CSVI improves the likelihood of obtaining the globally optimal posterior approximation.
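The nonconvexity the abstract describes is easy to exhibit with plain reparameterized Gaussian SVI (this is the baseline, not CSVI itself: no smoothed initialization or scaled gradient steps are reproduced here, and the bimodal target and hyperparameters are invented). Two initializations of the same stochastic gradient descent land in different local optima, and only the one started in the right basin reaches the better approximation.

```python
import numpy as np

rng = np.random.default_rng(2)

def norm_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

# Invented nonconvex target: 0.7*N(-3,1) + 0.3*N(3,1).
LW1, LW2 = np.log(0.7), np.log(0.3)

def log_p(x):
    return np.logaddexp(LW1 + norm_logpdf(x, -3, 1), LW2 + norm_logpdf(x, 3, 1))

def score(x):
    # d/dx log p(x), computed via component responsibilities.
    r1 = np.exp(LW1 + norm_logpdf(x, -3, 1) - log_p(x))
    return r1 * (-(x + 3)) + (1 - r1) * (-(x - 3))

def fit_gaussian(mu, log_sig, steps=400, batch=200, lr=0.05):
    # Stochastic gradient descent on KL(q || p) for q = N(mu, sigma^2),
    # using the reparameterization x = mu + sigma * eps.
    for _ in range(steps):
        sig = np.exp(log_sig)
        eps = rng.normal(size=batch)
        s = score(mu + sig * eps)
        mu -= lr * (-np.mean(s))                       # d KL / d mu
        log_sig -= lr * (-1.0 - sig * np.mean(eps * s))  # d KL / d log sigma
    return mu, np.exp(log_sig)

mu_a, sig_a = fit_gaussian(-2.0, 0.0)  # started in the dominant mode's basin
mu_b, sig_b = fit_gaussian(2.0, 0.0)   # started in the minor mode's basin

def mc_kl(mu, sig, n=20000):
    # Monte Carlo estimate of KL(q || p) (p is normalized here).
    x = mu + sig * rng.normal(size=n)
    return np.mean(norm_logpdf(x, mu, sig) - log_p(x))

kl_a, kl_b = mc_kl(mu_a, sig_a), mc_kl(mu_b, sig_b)
```

Both runs converge, but to different stationary points (roughly KL of -log 0.7 versus -log 0.3), which is exactly why an initialization procedure that locates the correct basin, as in CSVI, matters.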