David Poole: Professor at Department of Computer Science, UBC Faculty of Science

Professor

Faculty of Science

Research Classification

Computer and information sciences

Research Interests

Artificial Intelligence

Decision Analysis

Knowledge Representation

Machine Learning

Preference Elicitation

Probabilistic Graphical Models

Reasoning under Uncertainty

Relational Learning

Relevant Thesis-Based Degree Programs

View all programs

Affiliations to Research Centres, Institutes & Clusters

CAIDA: UBC ICICS Centre for Artificial Intelligence Decision-making and Action

Institute for Computing, Information and Cognitive Systems (ICICS)

Research Methodology

Machine Learning

Probabilistic Inference

Multiattribute Utility

Recruitment

Looking to recruit:

Master's students

Doctoral students

Desired start dates: Any time / year round

Other options:

I support public scholarship, e.g. through the Public Scholars Initiative, and am available to supervise students and Postdocs interested in collaborating with external partners as part of their research.

I support experiential learning experiences, such as internships and work placements, for my graduate students and Postdocs.

Complete these steps before you reach out to a faculty member!

Check requirements

Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.

Focus your search

Identify specific faculty members who are conducting research in your specific area of interest.
Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.

Make a good impression

Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Be sure to craft compelling answers to these questions.
Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
Demonstrate that you are familiar with their research:
- Convey the specific ways you are a good fit for the program.
- Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
Be enthusiastic, but don’t overdo it.

Attend an information session

G+PS regularly provides virtual sessions that focus on admission requirements and procedures, and offer tips on how to improve your application.

ADVICE AND INSIGHTS FROM UBC FACULTY ON REACHING OUT TO SUPERVISORS

These videos contain some general advice from faculty across UBC on finding and reaching out to a potential thesis supervisor.

Supervision Enquiry

If you have reviewed some of this faculty member's publications, understand their research interests and have reviewed the admission requirements, you may submit a supervision enquiry.

Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Representation learning with explicit and implicit graph structures (2023)

The world around us is composed of objects each having relations with other objects. The objects and relations form a (hyper)graph with objects as nodes and relations between objects as (hyper)edges. When learning, the underlying structure representing the relations between the nodes is either given explicitly in the training set or is implicit and needs to be inferred. This dissertation studies graph representation learning with both explicit and implicit structures. For explicit structure, we first tackle the challenge of enforcing taxonomic information while embedding entities and relations. We prove that some fully expressive models cannot respect subclass and subproperty information. With minimal modifications to an existing knowledge graph completion method, we enable the injection of taxonomic information. A second challenge is in representing explicit structures in relational hypergraphs that contain relations defined on an arbitrary number of entities. While techniques, such as reification, exist that convert non-binary relations into binary ones, we show that current embedding-based methods do not work well out of the box for knowledge graphs obtained through these techniques. We introduce embedding-based methods that work directly with relations of arbitrary arity. We also develop public datasets, benchmarks, and baselines and show experimentally that the proposed models are more effective than the baselines. We further bridge the gap between relational algebra and knowledge hypergraphs by proposing an embedding-based model that can represent relational algebra operations. Having introduced novel architectures for explicitly graph-structured data, we further investigate how models with relational inductive biases can be developed and applied to problems with implicit structures. Graph representation learning models work well when the structure is explicit. However, this structure may not always be available in real-world applications. 
We propose the Simultaneous Learning of Adjacency and graph neural network Parameters with Self-supervision, or SLAPS, a method that provides more supervision for inferring a graph structure through self-supervision. An experimental study demonstrates that SLAPS scales to large graphs with hundreds of thousands of nodes and outperforms several baselines on established benchmarks.

Representing and learning relations and properties under uncertainty (2019)

The world around us is composed of entities, each having various properties and participating in relationships with other entities. Consequently, data is often inherently relational. This dissertation studies probabilistic relational representations, reasoning and learning with a focus on three common prediction problems for relational data: link prediction, property prediction, and joint prediction. For link prediction, we develop a tensor factorization model called SimplE which is simple, interpretable, fully expressive, and can be integrated with certain types of domain expert knowledge. On two standard benchmarks for knowledge graph completion, we show how SimplE outperforms the state-of-the-art models. For property prediction, we first study the limitations of the existing StaRAI models when used for property prediction. Based on this study, we develop relational neural networks which combine ideas from lifted relational models with deep learning and perform well empirically. We base the joint prediction on lifted relational models for which parameter learning typically requires inference over a highly connected graphical model. The inference step is usually the bottleneck for learning. We study a class of inference algorithms known as lifted inference which makes inference tractable by exploiting both conditional independence and symmetries. We study two ways of speeding up lifted inference algorithms: (1) proposing heuristics for elimination ordering and (2) compiling the lifted operations to low-level languages. We also expand the largest known class of models for which we know how to do efficient lifted inference. Thus, structure learning algorithms for lifted relational models that restrict the search space to models for which efficient inference algorithms exist can perform their search over a larger space.
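As an illustration of the link-prediction part, here is a minimal sketch of a SimplE-style scoring function. The abstract does not state the scoring function; the form below (two embeddings per entity, a forward and an inverse embedding per relation, and the average of two three-way products) is assumed from the published SimplE formulation. The entity names, the relation name and the hand-set two-dimensional embeddings are hypothetical.

```python
def triple_product(a, b, c):
    """Sum of element-wise products of three equal-length vectors."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def simple_score(head, rel, tail, ent_h, ent_t, rel_fwd, rel_inv):
    """SimplE-style score of the triple (head, rel, tail).

    ent_h / ent_t: embeddings used when an entity appears as head / tail.
    rel_fwd / rel_inv: embeddings of the relation and of its inverse.
    Assumed form: average of the forward and inverse three-way products.
    """
    forward = triple_product(ent_h[head], rel_fwd[rel], ent_t[tail])
    backward = triple_product(ent_h[tail], rel_inv[rel], ent_t[head])
    return 0.5 * (forward + backward)


# Hypothetical hand-set embeddings (a trained model would learn these).
ENT_H = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
ENT_T = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
REL_FWD = {"likes": [2.0, 0.0]}
REL_INV = {"likes": [0.0, 2.0]}

score_ab = simple_score("alice", "likes", "bob", ENT_H, ENT_T, REL_FWD, REL_INV)
score_ba = simple_score("bob", "likes", "alice", ENT_H, ENT_T, REL_FWD, REL_INV)
print(score_ab, score_ba)
```

With these toy values the two directions of the same relation receive different scores, which shows that a model of this form can represent asymmetric relations.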

Equilibrium policy gradients for spatiotemporal planning (2012)

In spatiotemporal planning, agents choose actions at multiple locations in space over some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actions for thousands of locations in the forest each year. The actions at each location could include harvesting trees, treating trees against disease and pests, or doing nothing. A utility model could place value on sale of forest products, ecosystem sustainability or employment levels, and could incorporate legal and logistical constraints such as avoiding large contiguous areas of clearcutting and managing road access. Planning requires a model of the dynamics. Existing simulators developed by forestry researchers can provide detailed models of the dynamics of a forest over time, but these simulators are often not designed for use in automated planning. This thesis presents spatiotemporal planning in terms of factored Markov decision processes. A policy gradient planning algorithm optimizes a stochastic spatial policy using existing simulators for dynamics. When a planning problem includes spatial interaction between locations, deciding on an action to carry out at one location requires considering the actions performed at other locations. This spatial interdependence is common in forestry and other environmental planning problems and makes policy representation and planning challenging. We define a spatial policy in terms of local policies defined as distributions over actions at one location conditioned upon actions at other locations. A policy gradient planning algorithm using this spatial policy is presented which uses Markov Chain Monte Carlo simulation to sample the landscape policy, estimate its gradient and use this gradient to guide policy improvement.
Evaluation is carried out on a forestry planning problem with 1880 locations using a variety of value models and constraints. The distribution over joint actions at all locations can be seen as the equilibrium of a cyclic causal model. This equilibrium semantics is compared to Structural Equation Models. We also define an algorithm for approximating the equilibrium distribution for cyclic causal networks which exploits graphical structure and analyse when the algorithm is exact.

Learning Latent Theories of Relations and Individuals (2011)

Inductive learning of statistical models from relational data is a key problem in artificial intelligence. Two main approaches exist for learning with relational data, and this thesis shows how they can be combined in a uniform framework. The first approach aims to learn dependencies amongst features (relations and properties), e.g. how users' purchases of products depend on users' preferences of the products and associated properties of users and products. Such models abstract over individuals, and are compact and easy to interpret. The second approach learns latent properties of individuals that explain the observed features, without modelling interdependencies amongst features. Latent-property models have demonstrated good predictive accuracy in practice, and are especially useful when few properties and relations are observed. Interesting latent groupings of individuals can be discovered. Our approach aims to learn a unified representation for dependency structures for both observed features and latent properties. We develop a simple approximate EM algorithm for learning the unified representation, and experiments demonstrate cases when our algorithm can generate models that predict better than dependency-based models of observed features as well as a state-of-the-art latent-property model. We extend our approximate EM algorithm to handle uncertainty about the number of values for latent properties. We search over the number of values and return error bounds, as an alternative to existing proposals based on sampling in the posterior distribution over the number of values. We also solve a specific case where dependencies involve functional relations, which induces a verbose model with many parameters. In comparison, the standard solution of aggregating over all values of the function yields a simple model that predicts poorly. We show how to learn an optimal intermediate-size representation efficiently by clustering the values of the function.
The proposed method generates models that capture interesting clusters of function values, dominates the simple model in prediction, and can surpass the verbose model using far fewer parameters.

Aggregation and constraint processing in lifted probabilistic inference (2010)

Representations that mix graphical models and first-order logic, called either first-order or relational probabilistic models, were proposed nearly twenty years ago and many more have since emerged. In these models, random variables are parameterized by logical variables. One way to perform inference in first-order models is to propositionalize the model, that is, to explicitly consider every element from the domains of logical variables. This approach might be intractable even for simple first-order models. The idea behind lifted inference is to carry out as much inference as possible without propositionalizing. An exact lifted inference procedure for first-order probabilistic models was developed by Poole [2003] and later extended to a broader range of problems by de Salvo Braz et al. [2007]. The C-FOVE algorithm by Milch et al. [2008] expanded the scope of lifted inference and is currently the state of the art in exact lifted inference. In this thesis we address two problems related to lifted inference: aggregation in directed first-order probabilistic models and constraint processing during lifted inference. Recent work on exact lifted inference focused on undirected models. Directed first-order probabilistic models require an aggregation operator when a parent random variable is parameterized by logical variables that are not present in a child random variable. We introduce a new data structure, aggregation parfactors, to describe aggregation in directed first-order models. We show how to extend the C-FOVE algorithm to perform lifted inference in the presence of aggregation parfactors. There are cases where the polynomial time complexity (in the domain size of logical variables) of the C-FOVE algorithm can be reduced to logarithmic time complexity using aggregation parfactors. First-order models typically contain constraints on logical variables. Constraints are important for capturing knowledge regarding particular individuals.
However, the impact of constraint processing on computational efficiency of lifted inference has been largely overlooked. In this thesis we develop an efficient algorithm for counting the number of solutions to the constraint satisfaction problems encountered during lifted inference. We also compare, both theoretically and empirically, different ways of handling constraints during lifted inference.

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Transporting and evaluating predictive models in different environments (2022)

In many data-driven applications, transporting a model from one environment to another has always been challenging. When experiments are conducted on a specific location or population, the problem of whether the results can be applied to a new location or population arises. The majority of the work in this area assumes that some conditional probabilities are transportable, even though they can be affected by unmeasured variables. We propose methods for transporting predictive models learned in a source population to a target population when we have limited information about the target population. These methods can cancel out the effect of the unmeasured variables using the ratio of conditional probabilities. We test the proposed methods using forest fire and stroke datasets.

Hierarchical structure and ordinal features in class-based linear models (2021)

In many real-world datasets, we seek to make predictions about entities, where the entities are in classes that are interrelated. A commonly studied problem, known as the reference class problem, is how to combine information from relevant classes to make predictions about entities. The intersection of all classes that an entity is a member of constitutes the most specific class for that entity. When seeking to make predictions about such intersection classes for which we have not observed much (or any) data, we would like to combine information from more general classes to create a prior. If there is no data for the intersection, we would have to rely entirely on the prior. However, if data exists but is scarce, we seek to balance the prior with the available data. We first investigate a model where we assign weights to classes, and additively combine weights to make predictions. The use of regularisation forces generalisation; the signal gets pushed up to more general classes. To make a prediction for an unobserved intersection of classes, we would use the weights from the individual classes that comprise the intersection. We introduce several variants that average the predictions, as well as a probabilistic mix of these variants. We then propose a bounded ancestor method, which balances the creation of an informed prior with observed data for classes with varying amounts of observations. When dealing with ordinal properties, such as shoe size, we can dynamically create new classes and subclasses in ways that are conducive to creating more informative priors. We do this by splitting the ordinal properties. Throughout, we test on the MovieLens and UCSD Fashion datasets. We found that a combination of the three bounded ancestor method variants resulted in the best performance, and the best combination varied between datasets.
We found that a simple model that assigns weights to classes and additively makes predictions slightly outperformed the bounded ancestor method for supervised classification. For the bounded ancestor method, we found that splitting ordinal properties in different ways had minimal impact on the error metrics we used.

Predicting landslides using contour aligning convolutional neural networks (2020)

Landslides are movements of soil and rock under the influence of gravity. They are common phenomena that cause significant human and economic losses every year. To reduce the impact of landslides, experts have developed tools to identify areas that are more likely to generate landslides. We propose a novel statistical approach for predicting landslides using deep convolutional networks. Using a standardized dataset of georeferenced images consisting of slope, elevation, land cover, lithology, rock age, and rock family as inputs, we deliver a landslide susceptibility map as output. We call our model a Locally Aligned Convolutional Neural Network, LACNN, as it follows the ground surface at multiple scales to predict possible landslide occurrence for a single point. To validate our method, we compare it to several baselines, including linear regression, a neural network, and a convolutional network, using log-likelihood error and Receiver Operating Characteristic curves on the test set. We show that our model performs better than the other proposed baselines, suggesting that such deep convolutional models are effective in heterogeneous datasets for improving landslide susceptibility maps, which has the potential to reduce the human and economic cost of these events.

Prediction and anomaly detection in water quality with explainable hierarchical learning through parameter sharing (2020)

Decisions about water quality have significant implications for diverse industries and the general population. In a 2020 study, Guo et al. report that the current literature on modeling spatiotemporal variabilities in surface water quality at large scales across multiple catchments is very poor. In this thesis, we introduce a simple, explainable, and transparent machine learning model that is derived from linear regression with hierarchical features for efficient prediction and for anomaly detection on large-scale spatiotemporal datasets. Our model learns offsets for various features in the dataset while utilizing a hierarchy among the features. These offsets can enable generalization and be used in anomaly detection. We show some interesting theoretical results on such hierarchical models. We built a water pollution platform for exploratory data analysis of water quality data at large scales. We evaluate the predictions of our model on the Waterbase - Water Quality dataset by the European Environment Agency. We also investigate the explainability of our model. Finally, we investigate the performance of our model in classification tasks while analyzing its ability to do regularization and smoothing as the number of observations grows in the dataset.

Machine learning of lineaments from magnetic, gravity and elevation maps (2019)

Minerals exploration is becoming more difficult, particularly because most mineral deposits at the surface of the earth have been found. While there may be a lot of sensing data, there is a shortage of expertise to interpret that data. This thesis aims to bring some of the recent advances in AI to the interpretation of sensing data. Our AI model learns one-dimensional features (lineaments) from two-dimensional data (in particular, magnetics surveys, maps of gravity and digital elevation maps), which surprisingly has not had a great deal of attention (whereas getting two-dimensional or zero-dimensional features is very common). We define a convolutional neural network to predict the probability that a lineament passes through each location on the map. Then, using these probabilities, cluster analysis, and regression models, we develop a post-processing method to predict lineaments. We train and evaluate our model on large real-world datasets in BC and Australia.

Finding a record in a database (2017)

Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem because people do not consistently use the official title of a company, but use abbreviations, synonyms, different orders of terms, and the title can contain typos. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s). Our model addresses many of the challenges of the record linkage problem and provides good results when exact term matching search algorithms fail. We evaluate our model on a large real-world data set. The results show that the model is a promising probabilistic record linkage model.

Interactive Visualization for Group Decision-Making (2014)

In infrastructure planning, identifying ‘the best solution’ out of a given set of alternatives is a context-dependent multi-dimensional multi-stakeholder challenge in which competing criteria must be identified and trade-offs made. In a recent study, colleagues from Institute of Resources, Sustainability and Environment found that there is a need for a visualization tool that enables planners and decision makers to collectively explore individual preferences among those involved in the decision. This thesis concerns designing and evaluating an interactive visualization tool that facilitates group decisions by making the problem analysis more participatory, transparent, and comprehensible. To do so, we extend the interactive visualization tool ValueCharts to create Group ValueCharts. We conducted studies with two different groups to evaluate the effectiveness of Group ValueCharts in group decision-making. The first group was university staff in leading positions in different departments, presently engaged in and responsible for water infrastructure planning. The second group was employees of an analytics company who are involved in buying scientific software licenses. Each group was instructed on how to use the tool in application to their current decision problem. The discussions were audio recorded and the participants were surveyed to evaluate usability. The results indicate that participants felt the tool improved group interaction and information exchange, and made the discussion more participatory. Additionally, the participants strongly concur that the tool reveals disagreements and agreements within the group. These results suggest that Group ValueCharts has the ability to enhance transparency and comprehensibility in group decision-making.

Modeling Ordinal Data for Recommendation System (2014)

In this work we investigate the problem of making personalized recommendations by creating models for predicting user-item ratings, such as in movie recommendations. The study is based on the MovieLens data set, which has ratings on an ordinal scale. In the past, partly due to motivation gained by the Netflix challenge, researchers have constructed models that make point predictions to minimize the root mean square error (RMSE) on test sets, typically by learning latent user and movie feature structure. In such models, the difference between ratings of 2 and 3 stars is the same as the difference between ratings of 4 and 5 stars, etc., which is a strong prior assumption. We construct probabilistic models which also learn latent user and movie feature structure but do not make this assumption. These models interpret the ratings as categories (nominal and ordinal) and return a probability distribution over the ratings for each user-movie pair instead of making a point prediction. We evaluate our models against other models for making personalized recommendations on the top-n task, comparing precision vs. recall, receiver operating characteristic, and cost curves. Our results show that our ordinal data model performs better than a nominal data model, a state-of-the-art point prediction model, and other baselines.

Relational Logistic Regression (2014)

Aggregation is a technique for representing conditional probability distributions as an analytic function of parents. Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this thesis, we consider extending logistic regression to directed relational models, where there are objects and relations among them, and we want to model varying populations and interactions among parents. We first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. We compare our relational logistic regression to Markov logic networks and represent their analogies and differences. Finally, we show how other well-known aggregators can be represented using relational logistic regression.
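The point about population-dependent decision thresholds can be sketched as follows. This is an assumed, simplified reading of the single-parent case, not the thesis's exact formulation: the standard model uses 0/1 parent values with one shared weight, so it depends only on the number of true parent instances, while the linear relational variant also weights the number of false instances. All weights and counts below are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def tied_logistic(n_true, w0, w1):
    """Logistic regression over 0/1 parent values with a shared weight.

    Only the number of true parent instances matters, so the decision
    threshold is a fixed count regardless of population size.
    """
    return sigmoid(w0 + w1 * n_true)


def linear_rlr(n_true, n_false, w0, w_true, w_false):
    """Assumed linear relational logistic regression for one parent.

    Weighting both counts lets the threshold move linearly with the
    population size n_true + n_false.
    """
    return sigmoid(w0 + w_true * n_true + w_false * n_false)


# Hypothetical "majority of parent instances are true" rule.
MAJORITY = dict(w0=0.0, w_true=1.0, w_false=-1.0)

for n_true, n_false in [(6, 4), (6, 94)]:
    p_rlr = linear_rlr(n_true, n_false, **MAJORITY)
    p_tied = tied_logistic(n_true, w0=-5.5, w1=1.0)
    print(n_true + n_false, round(p_rlr, 3), round(p_tied, 3))
```

Six true instances form a majority in a population of 10 but not in a population of 100. The relational form tracks that change, whereas the tied-weight model returns the same probability in both cases.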

Sensing and Sorting Ore Using a Relational Influence Diagram (2014)

Mining companies typically process all the material extracted from a mine site using processes which are extremely consumptive of energy and chemicals. Sorting this material more effectively would reduce the resources required. A high-throughput rock-sorting machine developed by MineSense™ Technologies Ltd. provides the sensors and diverting equipment. After receiving noisy sensor data, the sorting system has 400 ms to decide whether to activate the diverters which will divert the rocks into either a keep or a discard bin. The problem tackled in this thesis is to sort an unknown number of rocks by sensing their mineralogy, position, and size using electromagnetic sensors and diverting them according to how valuable the mineral is to the mine. In real-time we must interpret the sensor data and compute the best action to take. We model the problem with a relational influence diagram which shows relations between random variables, decision variables, and utility nodes. We learn the model offline and do online inference. Inference is achieved using a combination of exhaustive and random search. The model parameters are learned using Sequential Model-based Algorithm Configuration (SMAC). We simulate the diverters for offline evaluation and evaluate our solution on recorded sensor data. Our result improves over the current state-of-the-art across the entire range of utility.

Probabilistic reasoning with undefined properties in ontologically-based belief networks (2013)

This thesis concerns building probabilistic models with an underlying ontology that defines the classes and properties used in the model. In particular, it considers the problem of reasoning with properties that may not always be defined. Furthermore, we may even be uncertain about whether a property is defined for a given individual. One approach is to explicitly add a value "undefined" to the range of random variables, forming (what we call) extended belief networks. Adding an extra value to a random variable's range, however, has a large computational overhead. In this work, we propose an alternative, ontologically-based belief networks, where properties are only used when they are defined. We show how probabilistic reasoning can be carried out without explicitly using the value "undefined" during inference. This, in general, requires that we perform two probabilistic queries to determine (1) the probability that the hypothesis is defined and (2) the probabilities of the hypothesis given it is defined. We prove this is equivalent to reasoning with the corresponding extended belief network and empirically demonstrate on synthetic models that inference becomes more efficient.
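The two-query idea can be illustrated with a short sketch. It assumes only the chain rule: the probability of a value in the extended network equals the probability that the property is defined times the probability of the value given that it is defined. The function name and the numbers are hypothetical, and the sketch does not implement the inference procedure of the thesis.

```python
def combine_queries(p_defined, dist_given_defined):
    """Rebuild a distribution that includes the value "undefined".

    p_defined: result of query (1), probability the property is defined.
    dist_given_defined: result of query (2), a distribution over the
        property's ordinary values given that it is defined.
    """
    extended = {value: p_defined * p for value, p in dist_given_defined.items()}
    extended["undefined"] = 1.0 - p_defined
    return extended


# Hypothetical answers to the two queries.
result = combine_queries(0.8, {"yes": 0.25, "no": 0.75})
print(result)
```

The combined distribution sums to one and matches what a query against a network with an explicit "undefined" value would return, without that value taking part in inference.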
