Lang Wu

Prospective Graduate Students / Postdocs

This faculty member is currently not looking for graduate students or Postdoctoral Fellows. Please do not contact the faculty member with any such requests.


Research Interests

Longitudinal data analysis, mixed effects models, missing data, hypothesis testing, biostatistics

Research Options

I am available and interested in collaborations (e.g. clusters, grants).
I am interested in and conduct interdisciplinary research.
I am interested in working with undergraduate students on research projects.

Research Methodology

statistical methods, computer simulations, data analysis

Graduate Student Supervision

Doctoral Student Supervision

Dissertations completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest dissertations.

Joint modelling of complex longitudinal and survival data, with applications to HIV studies (2019)

In HIV vaccine studies, a major research objective is to identify immune response biomarkers measured longitudinally that may be associated with risk of HIV infection. This objective can be assessed via joint modelling of longitudinal and survival data. Joint models for HIV vaccine data are complicated by the following issues: (i) left censoring of some longitudinal data due to lower limits of quantification; (ii) mixed types of longitudinal variables; (iii) measurement errors, missing values, and outliers in longitudinal data; and (iv) computational challenges associated with likelihood inference. In this thesis, we propose innovative joint models and methods for complex longitudinal and survival data to address the foregoing issues simultaneously. Specifically, we consider two approaches to handle left censored data and a robust method to address b-outliers and e-outliers in longitudinal data. For parameter estimation, we propose approximate likelihood estimation methods based on the so-called h-likelihood, which are computationally much more efficient than “exact” or Monte Carlo methods such as Monte Carlo EM algorithms. We evaluate the performances of the models and methods via comprehensive simulation studies. Real data analyses are carried out in depth for an HIV vaccine study.

Multivariate one-sided tests for multivariate normal and mixed effects regression models with missing data, semi-continuous data and censored data (2018)

In many applications, statistical models for real data have natural constraints or restrictions on some model parameters. For example, the growth rate of a child is expected to be positive, and patients receiving anti-HIV treatments are expected to exhibit a decline in their viral loads. Hypothesis testing for certain model parameters incorporating the natural constraints is expected to be more powerful than testing ignoring the constraints. Although constrained statistical inference, especially multi-parameter order-restricted hypothesis testing, has been studied in the literature for several decades, methods for models for complex longitudinal data are still very limited. In this thesis, we develop innovative multi-parameter order-restricted (or one-sided) hypothesis testing methods for modelling the following complex data: (1) multivariate normal data with non-ignorable missing values; (2) semi-continuous longitudinal data; and (3) left censored or truncated longitudinal data due to detection limits. We focus on testing mean parameters in the models, and the approaches are based on likelihood methods. Some asymptotic results are obtained, and some computational challenges are discussed. Simulation studies are conducted to evaluate the proposed methods. Several real datasets are analyzed to illustrate the power advantages of the proposed tests.

Joint Inference of NLME and GLMM Models with Informative Censoring (2015)

Non-linear mixed effects models (NLME) and generalized linear mixed effects models (GLMM) are commonly used to model longitudinal processes. This thesis goes beyond single-process modelling and focuses on jointly modelling multiple longitudinal processes with different types of variables. In particular, we investigate methods for joint inference of NLME and GLMM models for the following three problems: (1) joint models of NLME and GLMM for complete data, with NLME for the time-dependent mis-measured covariate and GLMM for the discrete longitudinal response; (2) joint models with a covariate subject to informative left censoring; and (3) joint models with informative right censoring with respect to both response and covariate. For each problem, we propose two joint modelling methods to obtain "exact" and approximate maximum likelihood estimates (MLEs) of all model parameters. Measurement errors and missing data are addressed simultaneously in a unified way. Some asymptotic results are also developed. The proposed methods are illustrated with an HIV dataset. Simulation results show that the joint modelling methods perform better than the commonly used naive method and two-step method.

Multivariate one-sided tests for multivariate normal and nonlinear mixed effects models with complete and incomplete data (2011)

Multivariate one-sided hypothesis testing problems arise frequently in practice. Various tests have been developed for multivariate normal data. However, only limited literature is available for multivariate one-sided testing problems in regression models. In particular, one-sided tests for nonlinear mixed effects (NLME) models, which are popular in many longitudinal studies, have not been studied yet, even in the case of complete data. In practice, there are often missing values in multivariate data and longitudinal data. In this case, standard testing procedures based on complete data may not be applicable or may perform poorly if the observations that contain missing data are discarded. In this thesis, we propose testing methods for multivariate one-sided testing problems in multivariate normal distributions with missing data and for NLME models with complete and incomplete data. In the missing data case, testing methods are based on multiple imputations. Some theoretical results are presented. The proposed methods are evaluated using simulations. Real data examples are presented to illustrate the methods.

Master's Student Supervision

Theses completed in 2010 or later are listed below. Please note that there is a 6-12 month delay to add the latest theses.

Nonlinear mixed-effects models for HIV viral load trajectories before and after antiretroviral therapy interruption, incorporating left censoring (2021)

Longitudinal studies are common in biomedical research, such as HIV studies. In an HIV study, the viral decay during an anti-HIV treatment and the viral rebound after the treatment is interrupted can be viewed as two longitudinal processes, and they may be related to each other. In this thesis, we investigate whether key features of HIV viral decay and CD4 trajectories during antiretroviral therapy (ART) are associated with characteristics of HIV viral rebound following ART interruption. Motivated by a real AIDS dataset, two non-linear mixed effects (NLME) models are used to model the viral load trajectories before and following ART interruption, respectively, incorporating left censoring due to lower detection limits of viral load assays. A linear mixed effects (LME) model is used to model CD4 trajectories. The models may be linked through shared random effects, since these random effects reflect individual characteristics of the longitudinal processes. A stochastic approximation EM (SAEM) method is used for parameter estimation and inference. To reduce the computational burden associated with maximizing the joint likelihood, an easy-to-implement three-step (TS) method is proposed, using the SAEM algorithm and the bootstrap. Data analysis results show that some key features of viral load and CD4 trajectories during ART (e.g., viral decay rate) are significantly associated with important characteristics of viral rebound following ART interruption (e.g., viral set point). Simulation studies are conducted to evaluate the performances of the proposed TS method and the naive method, which still uses the SAEM algorithm but substitutes half the detection limit for the censored viral load values and omits the bootstrap. It is concluded that the proposed TS method outperforms the naive method.

Exploring inverse probability weighted per-protocol estimates to adjust for non-adherence using post-randomization covariates: a simulation study (2020)

In pragmatic trials, treatment strategies are randomly assigned at baseline, but patients may not adhere to their assigned treatments during follow-up. In the presence of non-adherence, we aim to compare the conventionally used analyses (e.g., intention-to-treat (ITT) and naive per-protocol (PP)) with inverse probability weighted (IPW) and baseline adjusted PP analyses. We have conducted comprehensive simulation studies to generate realistic two-armed pragmatic trial data with a baseline covariate and post-randomization time-varying covariates. Our simulation was applied to understand the impact of trial characteristics (e.g., non-adherence rates, event rates, trial size), varying the causal relationships (e.g., if the baseline covariate is unmeasured or a risk factor), and varying the measurement schedule for adherence rates and time-varying covariates in the follow-up period. We also assessed the key statistical properties of these estimators. In the presence of non-adherence, our results suggest that ITT, IPW-PP and baseline adjusted PP estimates can recover the true null treatment effect. For non-null treatment effects, only the IPW-PP and baseline adjusted estimates were reasonably unbiased. If adherence and time-varying covariates are assessed less frequently, the bias and variability of effect estimates increase. This study demonstrates the feasibility of using adjusted PP estimates to recover the true effect of treatment in the presence of non-adherence and the necessity of designing pragmatic trials that measure both pre- and post-randomization covariates to reduce bias in the estimation of the treatment effect.

Jointly modeling longitudinal process with measurement errors, missing data, and outliers (2013)

In many longitudinal studies, several longitudinal processes may be associated. For example, a time-dependent covariate in a longitudinal model may be measured with errors or have missing data, so it needs to be modeled together with the response process in order to address the measurement errors and missing data. In such cases, a joint inference is appealing since it can incorporate information of all processes simultaneously. The joint inference is not only more efficient than separate inferences but it may also avoid possible biases. In addition, longitudinal data often contain outliers, so robust methods for the joint models are necessary. In this thesis, we discuss joint models for two correlated longitudinal processes with measurement errors, missing data, and outliers. We consider two-step methods and joint likelihood methods for joint inference, and propose robust methods based on M-estimators to address possible outliers for joint models. Simulation studies are conducted to evaluate the performances of the proposed methods, and a real AIDS dataset is analyzed using the proposed methods.

Two-Step and Joint Likelihood Methods for Joint Models (2012)

Survival data often arise in longitudinal studies, and the survival process and the longitudinal process may be related to each other. Thus, it is desirable to jointly model the survival process and the longitudinal process to avoid the possibly biased and inefficient inferences that result from separate analyses. We consider mixed effects models (LME, GLMM, and NLME models) for the longitudinal process, and Cox models and accelerated failure time (AFT) models for the survival process. The survival model and the longitudinal model are linked through shared parameters or unobserved variables. We consider the joint likelihood method and two-step methods to make joint inference for the survival model and the longitudinal model. We propose linear approximation methods for joint models with GLMM and NLME submodels to reduce the computational burden and to allow the use of existing software. Simulation studies are conducted to evaluate the performances of the joint likelihood method and two-step methods. It is concluded that the joint likelihood method outperforms the two-step methods.

Approximate methods for joint models in longitudinal studies (2010)

Longitudinal studies often involve several related processes, such as a longitudinal process and a time-to-event process, whose association calls for a joint modeling strategy. We first review recent research on joint modeling. After that, four popular inference methods are introduced for jointly analyzing longitudinal data and time-to-event data based on a combination of typical parametric models. However, some of them may suffer from non-ignorable bias of the estimators. Others may be computationally intensive or even lead to convergence problems. In this thesis, we propose an approximate likelihood-based simultaneous inference method for jointly modeling a longitudinal process and a time-to-event process with covariate measurement errors. By linearizing the joint model, we design a strategy for updating the random effects that connect the two processes, and propose two algorithm frameworks for different scenarios of the joint likelihood function. Both frameworks approximate the multidimensional integral in the observed-data joint likelihood by analytic expressions, which greatly reduces the computational intensity of the complex joint modeling problem. We apply this new method to a real dataset along with some available methods. The inference results provided by our new method agree with those from other popular methods and have a sensible biological interpretation. We also conduct a simulation study to compare these methods. Our new method looks promising in terms of estimation precision, as well as computational efficiency, especially when more subjects are available. Conclusions and directions for future research are given at the end.

Joint inference for longitudinal and survival data with incomplete time-dependent covariates (2010)

In many longitudinal studies, individual characteristics associated with their repeated measures may be covariates for the time to an event of interest. Thus, it is desirable to model both the survival process and the longitudinal process together. Statistical analysis may be complicated with missing data or measurement errors in the time-dependent covariates. This thesis considers a nonlinear mixed-effects model for the longitudinal process and the Cox proportional hazards model for the survival process. We provide a method based on the joint likelihood for nonignorable missing data, and we extend the method to the case of time-dependent covariates. We adapt a Monte Carlo EM algorithm to estimate the model parameters. We compare the method with the existing two-step method with some interesting findings. A real example from a recent HIV study is used as an illustration.

Wood Property Relationships and Survival Models in Reliability (2010)

It has been a topic of great interest in wood engineering to understand the relationships between the different strength properties of lumber and the relationships between the strength properties and covariates such as visual grading characteristics. In our mechanical wood strength tests, each piece fails (breaks) after surviving a continuously increasing load up to a certain level. The response of the test is the wood strength property, load-to-failure, which is in a very different context from the standard time-to-failure data in biostatistics. This topic is also called reliability analysis in engineering. In order to describe the relationships among strength properties, we develop joint and conditional survival functions by both a parametric method and a nonparametric approach. However, each piece of lumber can only be tested to destruction with one method, which makes modeling these joint strength distributions challenging. In the past, this kind of problem has been solved by subjectively matching pieces of lumber, but the quality of this approach is then an issue. We apply the methodologies in survival analysis to the wood strength data collected in the FPInnovations (FPI) laboratory. The objective of the analysis is to build a predictive model that relates the strength properties to the recorded characteristics (i.e., a survival model in reliability). Our conclusion is that a type of wood defect (knot), a lumber grade status (off-grade: Yes/No) and a lumber's modulus of elasticity (moe) have statistically significant effects on wood strength. These significant covariates can be used to match pieces of lumber. This thesis also supports use of the accelerated failure time (AFT) model as an alternative to the Cox proportional hazards (Cox PH) model in the analysis of survival data. Moreover, we conclude that the Weibull AFT model provides a much better fit than the Cox PH model in our data set, with satisfactory predictive accuracy.


Membership Status

Member of G+PS
