Relevant Degree Programs
Affiliations to Research Centres, Institutes & Clusters
Complete these steps before you reach out to a faculty member!
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs this is an essential step while others match successful applicants with faculty members within the first year of study. This is either indicated in the program profile under "Admission Information & Requirements" - "Prepare Application" - "Supervision" or on the program website.
- Identify specific faculty members who are conducting research in your specific area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Ensure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
- Convey the specific ways you are a good fit for the program.
- Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in/already conducting.
- Be enthusiastic, but don’t overdo it.
G+PS regularly provides virtual sessions that focus on admission requirements and procedures and tips how to improve your application.
Graduate Student Supervision
Doctoral Student Supervision
Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.
The world around us is composed of entities, each having various properties and participating in relationships with other entities. Consequently, data is often inherently relational. This dissertation studies probabilistic relational representations, reasoning and learning with a focus on three common prediction problems for relational data: link prediction, property prediction, and joint prediction. For link prediction, we develop a tensor factorization model called SimplE which is simple, interpretable, fully-expressive, and integratable with certain types of domain expert knowledge. On two standard benchmarks for knowledge graph completion, we show how SimplE outperforms the state-of-the-art models. For property prediction, first we study the limitations of the existing StaRAI models when being used for property prediction. Based on this study, we develop relational neural networks which combine ideas from lifted relational models with deep learning and perform well empirically. We base the joint prediction on lifted relational models for which parameter learning typically requires inference over a highly-connected graphical model. The inference step is usually the bottleneck for learning. We study a class of inference algorithms known as lifted inference which makes inference tractable by exploiting both conditional independence and symmetries. We study two ways of speeding up lifted inference algorithms: 1- through proposing heuristics for elimination ordering and 2- through compiling the lifted operations to low-level languages. We also expand the largest known class of models for which we know how to do efficient lifted inference. Thus, structure learning algorithms for lifted relational models that restrict the search space to models for which efficient inference algorithms exist can perform their search over a larger space.
In spatiotemporal planning, agents choose actions at multiple locations in spaceover some planning horizon to maximize their utility and satisfy various constraints. In forestry planning, for example, the problem is to choose actionsfor thousands of locations in the forest each year. The actions at each locationcould include harvesting trees, treating trees against disease and pests, ordoing nothing. A utility model could place value on sale of forest products,ecosystem sustainability or employment levels, and could incorporate legal andlogistical constraints such as avoiding large contiguous areas of clearcuttingand managing road access. Planning requires a model of the dynamics. Existingsimulators developed by forestry researchers can provide detailed models of thedynamics of a forest over time, but these simulators are often not designed foruse in automated planning.This thesis presents spatiotemoral planning in terms of factored Markov decision processes. A policy gradient planning algorithm optimizes a stochastic spatial policy using existing simulators for dynamics.When a planning problem includes spatial interaction between locations, deciding on an action to carry out at one location requires considering the actions performed at other locations. This spatial interdependence is common in forestry and other environmental planning problems and makes policy representation and planning challenging. We define a spatial policy in terms of local policies defined as distributions over actions at one location conditioned upon actions at other locations.A policy gradient planning algorithm using this spatial policy is presented whichuses Markov Chain Monte Carlo simulation to sample the landscape policy, estimate its gradient and use this gradient to guide policy improvement. Evaluation is carried out on a forestry planning problem with 1880 locations using a variety of value models and constraints.The distribution over joint actions at all locations can be seen as the equilibrium of a cyclic causal model. This equilibrium semantics is compared to Structural Equation Models. We also define an algorithm for approximating the equilibrium distribution for cyclic causal networks which exploits graphical structure and analyse when the algorithm is exact.
Inductive learning of statistical models from relational data is a key problem in artificial intelligence. Two main approaches exist for learning with relational data, and this thesis shows how they can be combined in a uniform framework. The first approach aims to learn dependencies amongst features (relations and properties), e.g. how users' purchases of products depend on users' preferences of the products and associated properties of users and products. Such models abstract over individuals, and are compact and easy to interpret.The second approach learns latent properties of individuals that explain the observed features, without modelling interdependencies amongst features. Latent-property models have demonstrated good predictive accuracy in practise, and are especially useful when few properties and relations are observed. Interesting latent groupings of individuals can be discovered.Our approach aims to learn a unified representation for dependency structures for both observed features and latent properties. We develop a simple approximate EM algorithm for learning the unified representation, and experiments demonstrate cases when our algorithm can generate models that predicts better than dependency-based models of observed features as well as a state-of-the-art latent-property model. We extend our approximate EM algorithm to handle uncertainty about the number of values for latent properties. We search over the number of values and return error bounds, as an alternative to existing proposals based on sampling in the posterior distribution over the number of values.We also solve a specific case where dependencies involve functional relations, which induces a verbose model with many parameters. In comparison, the standard solution of aggregating over all values of the function yields a simple model that predicts poorly. We show how to learn an optimal intermediate-size representation efficiently by clustering the values of the function. The proposed method generates models that capture interesting clusters of function values, dominates the simple model in prediction, and can surpass the verbose model using much fewer parameters.
Representations that mix graphical models and first-order logic - called either first-orderor relational probabilistic models — were proposed nearly twenty years ago and many more have since emerged. In these models, random variables are parameterized by logical variables.One way to perform inference in first-order models is to propositionalize the model, that is, to explicitly consider every element from the domains of logical variables. This approach might be intractable even for simple first-order models.The idea behind lifted inference is to carry out as much inference as possible without propositionalizing.An exact lifted inference procedure for first-order probabilistic models was developed by Poole  and later extended to a broader range of problems by de Salvo Braz et al. . The C-FOVE algorithm by Milch et al.  expanded the scope of lifted inference and is currently the state of the art in exact lifted inference.In this thesis we address two problems related to lifted inference: aggregation in directed first-order probabilistic models and constraint processing during lifted inference.Recent work on exact lifted inference focused on undirected models. Directed first-order probabilistic models require an aggregation operator when a parent random variable is parameterized by logical variables that are not present in a child random variable. We introduce a new data structure, aggregation parfactors, todescribe aggregation in directed first-order models. We show how to extend the C-FOVE algorithm to perform lifted inference in the presence of aggregation parfactors. There are cases where the polynomial time complexity (in the domain size of logical variables) of the C-FOVE algorithm can be reduced to logarithmic time complexity using aggregation parfactors.First-order models typically contain constraints on logical variables. Constraints are important for capturing knowledge regarding particular individuals. However, the impact of constraint processing on computational efficiency of liftedinference has been largely overlooked. In this thesis we develop an efficient algorithmfor counting the number of solutions to the constraint satisfaction problems encountered during lifted inference. We also compare, both theoretically and empirically, different ways of handling constraints during lifted inference.
Master's Student Supervision
Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.
In many real world datasets, we seek to make predictions about entities, where the entities are in classes that are interrelated. A commonly studied problem, known as the reference class problem, is how to combine information from relevant classes to make predictions about entities.The intersection of all classes that an entity is a member of constitutes the most specific class for that entity. When seeking to make predictions about such intersection classes for which we have not observed much (or any) data, we would like to combine information from more general classes to create a prior. If there is no data for the intersection, we would have to rely entirely on the prior. However, if data exists but is scarce, we seek to balance the prior with the available data. We first investigate a model where we assign weights to classes, and additively combine weights to make predictions. The use of regularisation forces generalisation; the signal gets pushed up to more general classes. To make a prediction for an unobserved intersection of classes, we would use the weights from the individual classes that comprise the intersection.We introduce several variants that average the predictions, as well as a probabilistic mix of these variants.We then propose a bounded ancestor method, which balances the creation of an informed prior with observed data for classes varying amounts of observations.When dealing with ordinal properties, such as shoe size, we can dynamically create new classes and subclasses in ways that are conducive to creating more informative priors. We do this by splitting the ordinal properties. Throughout, we test on the MovieLens and UCSD Fashion datasets.We found that a combination of the three bounded ancestor method variants resulted in the best performance, and the best combination varied between datasets. We found that a simple model that assigns weights to classes and additively makes predictions slightly outperformed the bounded ancestor method for supervised classification. For the bounded ancestor method, we found that splitting ordinal properties in different ways had minimal impact on the error metrics we used.
Landslides are movement of soil and rock under the influence of gravity. They are common phenomena that cause significant human and economic losses every year. To reduce the impact of landslides, experts have developed tools to identify areas that are more likely to generate landslides. We propose a novel statistical approach for predicting landslides using deep convolutional networks. Using a standardized dataset of georeferenced images consisting of slope, elevation, land cover, lithology, rock age, and rock family as inputs, we deliver a landslide susceptibility map as output. We call our model a Locally Aligned Convolutional Neural Network, LACNN, as it follows the ground surface at multiple scales to predict possible landslide occurrence for a single point. To validate our method, we compare it to several baselines, including linear regression, a neural network, and a convolutional network, using log-likelihood error and Receiver Operating Characteristic curves on the test set. We show that our model performs better than the other proposed baselines, suggesting that such deep convolutional models are effective in heterogenous datasets for improving landslide susceptibility maps, which has the potential to reduce the human and economic cost of these events.
Decisions made on water quality have high implications for diverse industries and general population. In a 2020 study, Guo et al. report that the current literature on modeling spatiotemporal variabilities in surface water quality at large scales across multiple catchments is very poor. In this thesis, we introduce a simple, explainable, and transparent machine learning model that is derived from linear regression with hierarchical features for efficient prediction and for anomaly detection on large scale spatiotemporal datasets. Our model learns offsets for various features in the dataset while utilizing a hierarchy among the features. These offsets can enable generalization and be used in anomaly detection. We show some interesting theoretical results on such hierarchical models. We built a water pollution platform for exploratory data analysis of water quality data in large scales. We evaluate the predictions of our model on the Waterbase - Water Quality dataset by the European Environmental Agency. We also investigate the explainability of our model. Finally, we investigate the performance of our model in classification tasks while analyzing its ability to do regularization and smoothing as the number of observations grows in the dataset.
Consider the following problem: given a database of records indexed by names (e.g., of companies or restaurants) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is called record linkage. Record linkage is a challenging problem because people do not consistently use the official title of a company, but use abbreviations, synonyms, different orders of terms, and the title can contain typos. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query, and find the best record(s). Our model addresses many of challenges of the record linkage problem and provides good results when exact term matching search algorithms fail. We evaluate our model on a large real-world data set. Obtained results show that the model is a promising probabilistic record linkage model.
In infrastructure planning, identifying ‘the best solution’ out of a given set of alternatives is a context-dependent multi-dimensional multi-stakeholder challenge in which competing criteria must be identified and trade-offs made. In a recent study, colleagues from Institute of Resources, Sustainability and Environment found that there is a need for a visualization tool that enables planners and decision makers to collectively explore individual preferences among those involved in the decision. This thesis concerns designing and evaluating an interactive visualization toolthat facilitates group decisions by making the problem analysis more participatory, transparent, and comprehensible. To do so, we extend the interactive visualization tool ValueCharts to create Group ValueCharts. We conducted studies withtwo different groups to evaluate the effectiveness of Group ValueCharts in group decision-making. The first group was university staff in leading positions in different departments, presently engaged in and responsible for water infrastructureplanning. The second group was employees of an analytics company who are involved in buying scientific software licenses. Each group was instructed on how to use the tool in application to their current decision problem. The discussions were audio recorded and the participants were surveyed to evaluate usability. The results indicate that participants felt the tool improved group interaction and information exchange, and made the discussion more participatory. Additionally, the participants strongly concur that the tool reveals disagreements and agreements within the group. These results suggest that Group ValueCharts has the ability to enhance transparency and comprehensibility in group decision-making.
In this work we investigate the problem of making personalized recommendations by creating models for predicting user-item rating, such as in movie recommendations. The study is based on the Movielens data set which has ratings on an ordinal scale. In the past, partly due to motivation gained by the Netflix challenge, researchers have constructed models that make point predictions to minimize the root mean square error (RMSE) on test sets, typically by learning latent user and movie feature structure. In such models, the difference between ratings of 2 and 3 stars is the same as the difference between ratings of 4 and 5 stars, etc., which is a strong prior assumption. We construct probabilistic models which also learn latent user and movie feature structure but do not make this assumption. These models interpret the ratings as categories (nominal and ordinal) and return a probability distribution over the ratings for each user-movie pair instead of making a point prediction. We evaluate and compare our models with other models for making personalized recommendations for the top-n task and comparing the precision vsrecall, receiver operating characteristic and cost curves. Our results show that our ordinal data model performs better than a nominal data model, a state-of-the-art point prediction model, and other baselines.
Aggregation is a technique for representing conditional probability distributions as an analytic function of parents. Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this thesis, we consider extending logistic regression to directed relational models, where there are objects and relations among them, and we want to model varying populations and interactions among parents. We first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. We compare our relational logistic regression to Markov logic networks and represent their analogies and differences. Finally, we show how other well-known aggregators can be represented using relational logistic regression.
Mining companies typically process all the material extracted from a mine site using processes which are extremely consumptive of energy and chemicals. Sorting this material more effectively would reduce the resources required. A high-throughput rock-sorting machine developed by MineSense™ Technologies Ltd. provides the sensors and diverting equipment. After receiving noisy sensor data, the sorting system has 400 ms to decide whether to activate the diverters which will divert the rocks into either a keep or a discard bin. The problem tackled in this thesis is to sort an unknown number of rocks by sensing their mineralogy, position, and size using electromagnetic sensors and diverting them according to how valuable the mineral is to the mine. In real-time we must interpret the sensor data and compute the best action to take. We model the problem with a relational influence diagram which shows relations between random variables, decision variables, and utility nodes. We learn the model offline and do online inference. Inference is achieved using a combination of exhaustive and random search. The model parameters are learned using Sequential Model-based Algorithm Configuration (SMAC). We simulate the diverters for offline evaluation and evaluate our solution on recorded sensor data. Our result improves over the current state-of-the-art across the entire range of utility.
This thesis concerns building probabilistic models with an underlying ontology that defines the classes and properties used in the model. In particular, it considers the problem of reasoning with properties that may not always be defined. Furthermore, we may even be uncertain about whether a property is defined for a given individual. One approach is to explicitly add a value "undefined" to the range of random variables, forming (what we call) extended belief networks. Adding an extra value to a random variable's range, however, has a large computational overhead. In this work, we propose an alternative, ontologically-based belief networks, where properties are only used when they are defined. We show how probabilistic reasoning can be carried out without explicitly using the value "undefined" during inference. This, in general, requires that we perform two probabilistic queries to determine (1) the probability that the hypothesis is defined and (2) the probabilities of the hypothesis given it is defined. We prove this is equivalent to reasoning with the corresponding extended belief network and empirically demonstrate on synthetic models that inference becomes more efficient.