Harry Sue Wah Joe

 
Prospective Graduate Students / Postdocs

This faculty member is currently not looking for graduate students or Postdoctoral Fellows. Please do not contact the faculty member with any such requests.

Professor

Research Classification

Research Interests

Statistics and Probabilities
copula construction
dependence modelling
extreme value inference
non-normal time series
parsimonious high-dimensional dependence

Relevant Thesis-Based Degree Programs

 
 

Research Methodology

Statistics: modern multivariate statistics beyond multivariate normal and exponential family, estimating equations
Mathematics: probability (working with stochastic representations and asymptotics), linear algebra, logical reasoning for proofs.
Computing: numerical optimization and integration, simulation, pseudo-code, algorithms; R, C++, Fortran90 for scientific computing
Application areas such as statistical finance, insurance, psychometrics.

Great Supervisor Week Mentions

Each year graduate students are encouraged to give kudos to their supervisors through social media and our website as part of #GreatSupervisorWeek. Below are students who mentioned this supervisor since the initiative was started in 2017.

 

My supervisors are extremely supportive. Natalia is a wonderful supervisor as she has a level of emotional intelligence that I have not seen in most people. Both her and Harry have always put my goals at the forefront and helped me achieve them. Their doors have always been open to me and they have been extremely generous with their time and their advice. I have learned so much from working with them throughout my time in my master's program. It has been an immense pleasure and I look forward to having an opportunity to collaborate with them in the future.

Abdullah Farouk (2019)

 

Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Conditional inferences and predictions based on copula models (2023)

Copulas combined with univariate distributions are a flexible tool for modeling distributions beyond Gaussian. Vine copulas based on a nested sequence of trees and a sequence of bivariate copulas can be used to construct high-dimensional copula models with flexible dependence structures. A multivariate model based on vine copulas assumes that variables are observed simultaneously in a sample. The contributions of this thesis are the new conditional inference and prediction methods based on vine copulas, including: (a) conditional distribution of one variable given others, (b) conditional distribution when the response variable is right-censored, (c) conditional distribution when some explanatory variables are nominal categorical. For (a), an algorithm is developed to compute arbitrary conditional distributions of one variable given the others for cross prediction from a single joint distribution fitted by vine copulas. An existing algorithm is also modified to simulate data from a vine copula given that one variable takes extreme values. For (b), in time-to-event and survival studies, the goal is to model the right-censored response variable with the explanatory variables to obtain point and interval predictions. The existing vine copula regression methodology is extended with a censored response variable and a set of discrete or continuous explanatory variables. For (c), for a nominal variable with three or more unordered categories, there is a PMF but no CDF. For use within vine copulas, the nominal variable is either converted to an ordinal variable, or encoded as binary dummy variables, similar to other regression models. The existing vine copula regression method is extended to allow some of the explanatory variables to be binary dummy variables with positive dependence converted from nominal variables. When fitting copula models with previous settings, there can be pairs of mixed continuous-discrete variables on the edges of a vine. 
The existing diagnostic methods for two continuous variables are not valid, and new diagnostic methods are developed. When parametric copula families do not provide adequate fits, nonparametric copulas can be used with adaptations for mixed continuous-ordinal variables. Allowing nonparametric copulas for mixed continuous-ordinal variables can improve the performance of vine copulas when applied to conditional inference.


Vine copulas: dependence structure learning, diagnostics, and applications to regression analysis (2019)

Copulas are widely used in high-dimensional multivariate applications where the assumption of Gaussian distributed variables does not hold. Vine copulas are a flexible family of copulas built from a sequence of bivariate copulas to represent bivariate dependence and bivariate conditional dependence. The vine structures consist of a hierarchy of trees to express conditional dependence. The contributions of this thesis are (a) improved methods for finding parsimonious truncated vine structures when the number of variables is moderate to large; (b) diagnostic methods to help in decisions for bivariate copulas in the vine; (c) applications to predictions based on conditional distributions of the vine copula. The vine structure learning problem has been challenging due to the large search space. Existing methods are based on greedy algorithms and do not in general produce a solution that is near the global optimum. It is an open problem to choose a good truncated vine structure when there are many variables. We propose a novel approach to learning truncated vine structures using Monte Carlo tree search, a method that has been widely adopted in game and planning problems. The proposed method has significantly better performance over the existing methods under various experimental setups. Moreover, diagnostic methods based on measures of dependence and tail asymmetry are proposed to guide the choice of parametric bivariate copula families assigned to the edges of the trees in the vine and to assess whether a copula is constant over the conditioning value(s) for trees 2 and higher. If the diagnostic methods suggest the existence of reflection asymmetry, permutation asymmetry, or asymmetric tail dependence, then three- or four-parameter bivariate copula families might be needed. 
If the conditional dependence measures or asymmetry measures in trees 2 and up are not constant over the conditioning value(s), then non-constant copulas with parameters varying over conditioning values should be considered. Finally, for data from an observational study, we propose a vine copula regression method that uses regular vines and handles mixed continuous and discrete variables. This method can efficiently compute the conditional distribution of the response variable given the explanatory variables.


Forecasting of nonlinear extreme quantiles using copula models (2017)

Forecasts of extreme events are useful in order to prepare for disaster. Such forecasts are usefully communicated as an upper quantile function, and in the presence of predictors, can be estimated using quantile regression techniques. This dissertation proposes methodology that seeks to produce forecasts that (1) are consistent in the sense that the quantile functions are valid (non-decreasing); (2) are flexible enough to capture the dependence between the predictors and the response; and (3) can reliably extrapolate into the tail of the upper quantile function. To address these goals, a family of proper scoring rules is first established that measure the goodness of upper quantile function forecasts. To build a model of the conditional quantile function, a method that uses pair-copula Bayesian networks or vine copulas is proposed. This model is fit using a new class of estimators called the composite nonlinear quantile regression (CNQR) family of estimators, which optimize the scores from the previous scoring rules. In addition, a new parametric copula family is introduced that allows for a non-constant conditional extreme value index, and another parametric family is introduced that reduces a heavy-tailed response to a light tail upon conditioning. Taken together, this work is able to produce forecasts satisfying the three goals. This means that the resulting forecasts of extremes are more reliable than other methods, because they more adequately capture the insight that predictors hold on extreme outcomes. This work is applied to forecasting extreme flows of the Bow River at Banff, Alberta, for flood preparation, but can be used to forecast extremes of any continuous response when predictors are present.


Models and Diagnostics for Parsimonious Dependence with Applications to Multivariate Extremes (2017)

Statistical models with parsimonious dependence are useful for high-dimensional modelling as they offer interpretations relevant to the data being fitted and may be computationally more manageable. We propose parsimonious models for multivariate extremes; in particular, extreme value (EV) copulas with factor and truncated vine structures are developed, through (a) taking the EV limit of a factor copula, or (b) structuring the underlying correlation matrix of existing multivariate EV copulas. Through data examples, we demonstrate that these models allow interpretation of the respective structures and offer insight on the dependence relationship among variables. The strength of pairwise dependence for extreme value copulas can be described using the extremal coefficient. We consider a generalization of the F-madogram estimator for the bivariate extremal coefficient to the estimation of tail dependence of an arbitrary bivariate copula. This estimator is tail-weighted in the sense that the joint upper or lower portion of the copula is given a higher weight than the middle, thereby emphasizing tail dependence. The proposed estimator is useful when tail heaviness plays an important role in inference, so that choosing a copula with matching tail properties is essential. Before using a fitted parsimonious model for further analysis, diagnostic checks should be done to ensure that the model is adequate. Bivariate extremal coefficients have been used for diagnostic checking of multivariate extreme value models. We investigate the use of an adequacy-of-fit statistic based on the difference between low-order empirical and model-based features (dependence measures), including the extremal coefficient, for this purpose. The difference is computed for each of the bivariate margins and a quadratic form statistic is obtained, with large values relative to a high quantile of the reference distribution suggesting model inadequacy. 
We develop methods to determine the appropriate cutoff values for various parsimonious models, dimensions, dependence measures and methods of model fitting that reflect practical situations. Data examples show that these diagnostic checks are handy complements to existing model selection criteria such as the AIC and BIC, and provide the user with some idea about the quality of the fitted models.


Structured Factor Copulas and Tail Inference (2014)

In this dissertation we propose factor copula models where dependence is modeled via one or several common factors. These are general conditional independence models for $d$ observed variables, in terms of $p$ latent variables and the classical multivariate normal model with a correlation matrix having a factor structure is a special case. We also propose and investigate dependence properties of the extended models that we call structured factor copula models. The extended models are suitable for modeling large data sets when variables can be split into non-overlapping groups such that there is homogeneous dependence within each group. The models allow for different types of dependence structure including tail dependence and asymmetry. With appropriate numerical methods, efficient estimation of dependence parameters is possible for data sets with over 100 variables. The choice of copula is essential in the models to get correct inferences in the tails. We propose lower and upper tail-weighted bivariate measures of dependence as additional scalar measures to distinguish bivariate copulas with roughly the same overall monotone dependence. These measures allow the efficient estimation of strength of dependence in the joint tails and can be used as a guide for selection of bivariate linking copulas in factor copula models as well as for assessing the adequacy of fit of multivariate copula models. We apply the structured factor copula models to analyze financial data sets, and compare with other copula models for tail inference. Using model-based interval estimates, we find that some commonly used risk measures may not be well discriminated by copula models, but tail-weighted dependence measures can discriminate copula models with different dependence and tail properties.


Multivariate Extremal Dependence and Risk Measures (2012)

Overlooking non-Gaussian and tail dependence phenomena has emerged as an important reason of underestimating aggregate financial or insurance risks. For modeling the dependence structures between non-Gaussian random variables, the concept of copula plays an important role and provides practitioners with promising quantitative tools. In order to study copula families that have different tail patterns and tail asymmetry than multivariate Gaussian and t copulas, we introduce the concepts of tail order and tail order functions. These provide a unified way to study three types of dependence in the tails: tail dependence, intermediate tail dependence and tail orthant independence. Some fundamental properties of tail order and tail order functions are obtained. For multivariate Archimedean copulas, we relate the tail heaviness of a positive random variable to the tail behavior of the Archimedean copula constructed by the Laplace transform of the random variable. Quantitative risk measurements pay more attention on large losses. A good statistical approach for the whole data does not guarantee a good way for risk assessments. We use tail comonotonicity as a conservative dependence structure for modeling multivariate dependent losses. By this way, we do not lose too much accuracy but gain reasonable conservative risk measures, especially when we consider high-risk scenarios. We have conducted a thorough investigation on the properties and constructions of tail comonotonicity, and found interesting properties such as asymptotic additivity properties of risk measures. Sufficient conditions have also been obtained to justify the conservativity of tail comonotonicity. For large losses, tail behavior of loss distributions is more critical than the whole distributions. Asymptotic study assuming that each marginal risk goes to infinity is more mathematically tractable. 
However, the asymptotic study that leads to a first order approximation is only a crude way and may not be sufficient. To this end, we study the second order conditions for risk measures of sub-extremal multiple risks. Some relationships between Value at Risk and Conditional Tail Expectation have been obtained under the condition of Second Order Regular Variation. We also find that the second order parameter determines whether a higher order approximation is necessary.


Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Statistical and machine learning classification methods for credit ratings (2018)

Credit rating is an ordinal categorical label that serves as an important measure of a financial institution’s credit worthiness. It is frequently used to decide whether or not to grant loans as well as how much interest to charge. Companies with higher credit ratings often enjoy lower interest rate and more flexibility in obtaining loans. Due to the increased competition in the lending market, there is renewed interest in the business community in applying statistical and machine learning methods to assign credit ratings. The challenge of adapting and generalizing these methods often lies in understanding and interpreting them in addition to matching ratings accurately. Our goal is to compare the classification performance and interpretability of four statistical learning methods on a credit rating dataset from the industry, where the rating variable comes from human expert opinions. We fit the ordinal regression, ordinal gradient boosting, multinomial gradient boosting and random forest methods with the goal of finding an interpretable method that can replicate the human expert ratings as closely as possible. We find that while the linear ordinal regression is the most interpretable, it fails to achieve high classification accuracy during cross-validation. Furthermore, the ordinal models (ordinal regression and ordinal gradient boosting) produce significant amount of negative fitted probabilities in practice due to the lack of numerical constraints. While ordinal gradient boosting and random forest perform the best in our three measures of classification accuracy: perfect match rate, within one-class match rate and 80% prediction intervals, ordinal gradient boosting produces high proportions of negative values and non-unimodality in the fitted probability mass function. 
Thus we choose random forest as the most preferred method and focus on its interpretation using variable importance ranking, partial derivative of the probability mass function and cumulative probability function, as well as local interpretable model-agnostic explanation plots.


 

Membership Status

Member of G+PS

Program Affiliations

Academic Unit(s)

 


 
 
