Harry Sue Wah Joe
Relevant Degree Programs
Complete these steps before you reach out to a faculty member!
- Familiarize yourself with program requirements. You want to learn as much as possible from the information available to you before you reach out to a faculty member. Be sure to visit the graduate degree program listing and program-specific websites.
- Check whether the program requires you to seek commitment from a supervisor prior to submitting an application. For some programs, this is an essential step, while others match successful applicants with faculty members within the first year of study. This is indicated either in the program profile under "Requirements" or on the program website.
- Identify specific faculty members who are conducting research in your area of interest.
- Establish that your research interests align with the faculty member’s research interests.
- Read up on the faculty members in the program and the research being conducted in the department.
- Familiarize yourself with their work, read their recent publications and past theses/dissertations that they supervised. Be certain that their research is indeed what you are hoping to study.
- Compose an error-free and grammatically correct email addressed to your specifically targeted faculty member, and remember to use their correct titles.
- Do not send non-specific, mass emails to everyone in the department hoping for a match.
- Address the faculty members by name. Your contact should be genuine rather than generic.
- Include a brief outline of your academic background, why you are interested in working with the faculty member, and what experience you could bring to the department. The supervision enquiry form guides you with targeted questions. Make sure to craft compelling answers to these questions.
- Highlight your achievements and why you are a top student. Faculty members receive dozens of requests from prospective students and you may have less than 30 seconds to pique someone’s interest.
- Demonstrate that you are familiar with their research:
  - Convey the specific ways you are a good fit for the program.
  - Convey the specific ways the program/lab/faculty member is a good fit for the research you are interested in or already conducting.
- Be enthusiastic, but don’t overdo it.
G+PS regularly provides virtual sessions that focus on admission requirements and procedures and on tips for improving your application.
Great Supervisor Week Mentions
My supervisors are extremely supportive. Natalia is a wonderful supervisor, as she has a level of emotional intelligence that I have not seen in most people. Both she and Harry have always put my goals at the forefront and helped me achieve them. Their doors have always been open to me, and they have been extremely generous with their time and their advice. I have learned so much from working with them throughout my master's program. It has been an immense pleasure, and I look forward to having an opportunity to collaborate with them in the future.
Graduate Student Supervision
Doctoral Student Supervision (Jan 2008 - Nov 2019)
Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian-distributed variables does not hold. Vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence. The vine structures consist of a hierarchy of trees that express conditional dependence. The contributions of this thesis are (a) improved methods for finding parsimonious truncated vine structures when the number of variables is moderate to large; (b) diagnostic methods to help choose the bivariate copulas in the vine; and (c) applications to predictions based on conditional distributions of the vine copula. The vine structure learning problem has been challenging due to the large search space. Existing methods are based on greedy algorithms and do not in general produce a solution that is near the global optimum. It is an open problem to choose a good truncated vine structure when there are many variables. We propose a novel approach to learning truncated vine structures using Monte Carlo tree search, a method that has been widely adopted in game and planning problems. The proposed method performs significantly better than the existing methods under various experimental setups. Moreover, diagnostic methods based on measures of dependence and tail asymmetry are proposed to guide the choice of parametric bivariate copula families assigned to the edges of the trees in the vine and to assess whether a copula is constant over the conditioning value(s) for trees 2 and higher. If the diagnostic methods suggest the existence of reflection asymmetry, permutation asymmetry, or asymmetric tail dependence, then three- or four-parameter bivariate copula families might be needed.
If the conditional dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas with parameters varying over conditioning values should be considered. Finally, for data from an observational study, we propose a vine copula regression method that uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables.
Forecasts of extreme events are useful for disaster preparation. Such forecasts are usefully communicated as an upper quantile function and, in the presence of predictors, can be estimated using quantile regression techniques. This dissertation proposes methodology that seeks to produce forecasts that (1) are consistent in the sense that the quantile functions are valid (non-decreasing); (2) are flexible enough to capture the dependence between the predictors and the response; and (3) can reliably extrapolate into the tail of the upper quantile function. To address these goals, a family of proper scoring rules that measure the goodness of upper quantile function forecasts is first established. To build a model of the conditional quantile function, a method that uses pair-copula Bayesian networks or vine copulas is proposed. This model is fit using a new class of estimators called the composite nonlinear quantile regression (CNQR) family of estimators, which optimize the scores from these scoring rules. In addition, a new parametric copula family is introduced that allows for a non-constant conditional extreme value index, and another parametric family is introduced that reduces a heavy-tailed response to a light tail upon conditioning. Taken together, this work produces forecasts that satisfy all three goals. As a result, the forecasts of extremes are more reliable than those from other methods, because they more adequately capture the information that predictors carry about extreme outcomes. This work is applied to forecasting extreme flows of the Bow River at Banff, Alberta, for flood preparation, but can be used to forecast extremes of any continuous response when predictors are present.
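The scoring and validity ideas above can be made concrete with the standard check (pinball) loss, which is a well-known proper scoring rule for a single quantile. The sketch below is a minimal illustration that assumes a composite score is simply the average check loss over several quantile levels. The dissertation's own family of scoring rules and the CNQR estimator may differ, and all function names are hypothetical.

```python
def check_loss(y, q, tau):
    """Check (pinball) loss of the tau-quantile forecast q for outcome y."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def composite_score(ys, forecasts, taus):
    """Mean check loss over observations and quantile levels (lower is better).

    forecasts[i][k] is the forecast of the taus[k]-quantile for outcome ys[i].
    """
    total = 0.0
    for y, qs in zip(ys, forecasts):
        for q, tau in zip(qs, taus):
            total += check_loss(y, q, tau)
    return total / (len(ys) * len(taus))


def is_valid_quantile_forecast(qs):
    """A forecast quantile function (listed by increasing tau) must be non-decreasing."""
    return all(a <= b for a, b in zip(qs, qs[1:]))


# Usage: the best constant 0.9-quantile forecast for the outcomes 1..101.
ys = list(range(1, 102))
best = min(range(1, 102), key=lambda q: composite_score(ys, [[q]] * len(ys), [0.9]))
```

Because the check loss is proper for quantiles, the minimizer in the usage line is 91, which is the empirical 0.9-quantile of the outcomes. A model fit by minimizing such a composite score over several upper levels targets the whole upper quantile function. `is_valid_quantile_forecast` checks the first goal (non-decreasing quantiles).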
Statistical models with parsimonious dependence are useful for high-dimensional modelling, as they offer interpretations relevant to the data being fitted and may be computationally more manageable. We propose parsimonious models for multivariate extremes; in particular, extreme value (EV) copulas with factor and truncated vine structures are developed through (a) taking the EV limit of a factor copula, or (b) structuring the underlying correlation matrix of existing multivariate EV copulas. Through data examples, we demonstrate that these models allow interpretation of the respective structures and offer insight into the dependence relationships among variables. The strength of pairwise dependence for extreme value copulas can be described using the extremal coefficient. We consider a generalization of the F-madogram estimator for the bivariate extremal coefficient to the estimation of tail dependence of an arbitrary bivariate copula. This estimator is tail-weighted in the sense that the joint upper or lower portion of the copula is given a higher weight than the middle, thereby emphasizing tail dependence. The proposed estimator is useful when tail heaviness plays an important role in inference, so that choosing a copula with matching tail properties is essential. Before a fitted parsimonious model is used for further analysis, diagnostic checks should be done to ensure that the model is adequate. Bivariate extremal coefficients have been used for diagnostic checking of multivariate extreme value models. For this purpose, we investigate the use of an adequacy-of-fit statistic based on the difference between low-order empirical and model-based features (dependence measures), including the extremal coefficient. The difference is computed for each of the bivariate margins and a quadratic form statistic is obtained, with large values relative to a high quantile of the reference distribution suggesting model inadequacy.
We develop methods to determine the appropriate cutoff values for various parsimonious models, dimensions, dependence measures and methods of model fitting that reflect practical situations. Data examples show that these diagnostic checks are handy complements to existing model selection criteria such as the AIC and BIC, and provide the user with some idea about the quality of the fitted models.
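For reference, the classical F-madogram estimator of the bivariate extremal coefficient, which the dissertation generalizes, can be sketched as follows. Rank-based pseudo-observations $\hat U_i, \hat V_i$ give $\hat\nu = \frac{1}{2n}\sum_i |\hat U_i - \hat V_i|$ and $\hat\theta = (1 + 2\hat\nu)/(1 - 2\hat\nu)$. For extreme value copulas, $\theta$ ranges from 1 (complete dependence) to 2 (independence). This sketch does not reproduce the tail-weighted generalization or the quadratic-form adequacy statistic, and the function names are hypothetical.

```python
import random


def pseudo_observations(xs):
    """Rank-based pseudo-observations r_i / (n + 1), an estimate of F(X_i)."""
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for r, i in enumerate(order, start=1):
        u[i] = r / (n + 1)
    return u


def f_madogram_extremal_coefficient(x, y):
    """F-madogram estimate of the bivariate extremal coefficient theta.

    nu = (1 / (2n)) * sum |U_i - V_i|,  theta = (1 + 2 nu) / (1 - 2 nu).
    """
    u, v = pseudo_observations(x), pseudo_observations(y)
    nu = sum(abs(a - b) for a, b in zip(u, v)) / (2 * len(u))
    return (1 + 2 * nu) / (1 - 2 * nu)


# Usage on hypothetical data: identical samples vs. independent samples.
rng = random.Random(0)
x = [rng.random() for _ in range(5000)]
y_ind = [rng.random() for _ in range(5000)]
theta_same = f_madogram_extremal_coefficient(x, x)     # complete dependence
theta_ind = f_madogram_extremal_coefficient(x, y_ind)  # should be close to 2
```

The estimator depends on the data only through ranks, so it is unaffected by the marginal distributions. This is the property that makes it convenient for copula-based diagnostics.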
In this dissertation, we propose factor copula models where dependence is modeled via one or several common factors. These are general conditional independence models for $d$ observed variables in terms of $p$ latent variables, and the classical multivariate normal model with a correlation matrix having a factor structure is a special case. We also propose and investigate dependence properties of extended models that we call structured factor copula models. The extended models are suitable for modeling large data sets when variables can be split into non-overlapping groups such that there is homogeneous dependence within each group. The models allow for different types of dependence structure, including tail dependence and asymmetry. With appropriate numerical methods, efficient estimation of dependence parameters is possible for data sets with over 100 variables. The choice of copula is essential in these models to get correct inferences in the tails. We propose lower and upper tail-weighted bivariate measures of dependence as additional scalar measures to distinguish bivariate copulas with roughly the same overall monotone dependence. These measures allow efficient estimation of the strength of dependence in the joint tails and can be used as a guide for selecting bivariate linking copulas in factor copula models, as well as for assessing the adequacy of fit of multivariate copula models. We apply the structured factor copula models to analyze financial data sets and compare them with other copula models for tail inference. Using model-based interval estimates, we find that some commonly used risk measures may not be well discriminated by copula models, but tail-weighted dependence measures can discriminate copula models with different dependence and tail properties.
Overlooking non-Gaussian and tail dependence phenomena has emerged as an important reason for underestimating aggregate financial or insurance risks. For modeling the dependence structures between non-Gaussian random variables, the concept of copula plays an important role and provides practitioners with promising quantitative tools. In order to study copula families that have tail patterns and tail asymmetry different from those of multivariate Gaussian and t copulas, we introduce the concepts of tail order and tail order functions. These provide a unified way to study three types of dependence in the tails: tail dependence, intermediate tail dependence and tail orthant independence. Some fundamental properties of tail order and tail order functions are obtained. For multivariate Archimedean copulas, we relate the tail heaviness of a positive random variable to the tail behavior of the Archimedean copula constructed from the Laplace transform of the random variable. Quantitative risk measurement pays more attention to large losses, and a statistical approach that fits the whole data set well does not guarantee a good risk assessment. We use tail comonotonicity as a conservative dependence structure for modeling multivariate dependent losses. In this way, we do not lose much accuracy but gain reasonably conservative risk measures, especially when we consider high-risk scenarios. We have conducted a thorough investigation of the properties and constructions of tail comonotonicity, and found interesting properties such as asymptotic additivity of risk measures. Sufficient conditions have also been obtained to justify the conservativeness of tail comonotonicity. For large losses, the tail behavior of loss distributions is more critical than the distributions as a whole. An asymptotic study assuming that each marginal risk goes to infinity is more mathematically tractable.
However, an asymptotic study that leads to a first-order approximation is only crude and may not be sufficient. To address this, we study second-order conditions for risk measures of sub-extremal multiple risks. Some relationships between Value at Risk and Conditional Tail Expectation have been obtained under the condition of second-order regular variation. We also find that the second-order parameter determines whether a higher-order approximation is necessary.
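The sketch below illustrates why a comonotonic dependence structure is conservative. It computes empirical Value at Risk and Conditional Tail Expectation for two losses that are increasing functions of the same uniform variable. This is full comonotonicity, used here as a simpler stand-in for the tail comonotonicity studied in the dissertation. Under comonotonicity, VaR and CTE of the sum equal the sums of the marginal values. For any other dependence, the empirical CTE of the sum cannot exceed the sum of the marginal CTEs, because the average of the largest values is subadditive. The exponential and Lomax margins and all function names are hypothetical choices.

```python
import math
import random


def var(losses, p):
    """Empirical Value at Risk: the ceil(n*p)-th smallest loss."""
    s = sorted(losses)
    k = max(1, math.ceil(len(s) * p))
    return s[k - 1]


def cte(losses, p):
    """Empirical Conditional Tail Expectation: mean of the sorted losses
    from the ceil(n*p)-th smallest upward (the average loss at or beyond VaR)."""
    s = sorted(losses)
    k = max(1, math.ceil(len(s) * p))
    tail = s[k - 1:]
    return sum(tail) / len(tail)


def exp_loss(u):
    """Exponential(1) loss by inverse transform (assumed margin)."""
    return -math.log(1.0 - u)


def lomax_loss(u, alpha=3.0):
    """Heavy-tailed Lomax (Pareto II) loss by inverse transform (assumed margin)."""
    return (1.0 - u) ** (-1.0 / alpha) - 1.0


def simulate(n, seed=0):
    """Return x, a comonotone partner of x, and an independent partner of x."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n)]  # shared driver -> comonotone losses
    ws = [rng.random() for _ in range(n)]  # separate driver -> independent loss
    x = [exp_loss(u) for u in us]
    y_como = [lomax_loss(u) for u in us]
    y_ind = [lomax_loss(w) for w in ws]
    return x, y_como, y_ind


# Usage: compare the aggregate risk under the two dependence structures.
x, y_como, y_ind = simulate(10000, seed=3)
p = 0.99
s_como = [a + b for a, b in zip(x, y_como)]
s_ind = [a + b for a, b in zip(x, y_ind)]
```

With these definitions, `var(s_como, p)` equals `var(x, p) + var(y_como, p)` and `cte(s_como, p)` equals `cte(x, p) + cte(y_como, p)`, up to floating-point rounding. By contrast, `cte(s_ind, p)` is at most `cte(x, p) + cte(y_ind, p)`.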
Master's Student Supervision (2010 - 2018)
Credit rating is an ordinal categorical label that serves as an important measure of a financial institution’s creditworthiness. It is frequently used to decide whether or not to grant loans, as well as how much interest to charge. Companies with higher credit ratings often enjoy lower interest rates and more flexibility in obtaining loans. Due to increased competition in the lending market, there is renewed interest in the business community in applying statistical and machine learning methods to assign credit ratings. The challenge of adapting and generalizing these methods often lies in understanding and interpreting them, in addition to matching ratings accurately. Our goal is to compare the classification performance and interpretability of four statistical learning methods on a credit rating dataset from industry, where the rating variable comes from human expert opinions. We fit ordinal regression, ordinal gradient boosting, multinomial gradient boosting and random forest methods, with the goal of finding an interpretable method that can replicate the human expert ratings as closely as possible. We find that while linear ordinal regression is the most interpretable, it fails to achieve high classification accuracy during cross-validation. Furthermore, the ordinal models (ordinal regression and ordinal gradient boosting) produce a significant number of negative fitted probabilities in practice due to the lack of numerical constraints. While ordinal gradient boosting and random forest perform best on our three measures of classification accuracy (perfect match rate, within-one-class match rate and 80% prediction intervals), ordinal gradient boosting produces high proportions of negative values and non-unimodality in the fitted probability mass function.
We therefore choose random forest as the preferred method and focus on its interpretation using variable importance ranking, partial derivatives of the probability mass function and cumulative probability function, and local interpretable model-agnostic explanation (LIME) plots.
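The measures and defects named in this abstract are easy to compute once ratings are coded as ordered integers, which is an assumption made here for illustration. The sketch below computes two of the accuracy measures, the perfect match rate and the within-one-class match rate. It also flags the two defects reported for the ordinal models: negative entries and non-unimodality in a fitted probability mass function over the rating classes. It is not the thesis's code, the 80% prediction interval measure is omitted, and the function names are hypothetical.

```python
def perfect_match_rate(true, pred):
    """Fraction of cases where the predicted class equals the expert rating."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def within_one_rate(true, pred):
    """Fraction of cases where the prediction is at most one class away."""
    return sum(abs(t - p) <= 1 for t, p in zip(true, pred)) / len(true)


def pmf_defects(probs, tol=1e-12):
    """Return (has_negative, not_unimodal) for a fitted pmf over ordered classes.

    Unimodal here means non-decreasing up to the (first) mode and
    non-increasing after it.
    """
    has_negative = any(p < -tol for p in probs)
    mode = max(range(len(probs)), key=probs.__getitem__)
    rising = all(probs[i] <= probs[i + 1] + tol for i in range(mode))
    falling = all(probs[i] >= probs[i + 1] - tol
                  for i in range(mode, len(probs) - 1))
    return has_negative, not (rising and falling)
```

For example, the fitted pmf `[0.4, 0.1, 0.4, 0.1]` is flagged as non-unimodal, and `[-0.05, 0.6, 0.45]` is flagged as having a negative probability. Checks like these make the numerical problems of unconstrained ordinal models visible before their accuracy is compared.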