Jiahua Chen

Prospective Graduate Students / Postdocs

This faculty member is currently not looking for graduate students or Postdoctoral Fellows. Please do not contact the faculty member with any such requests.


Research Classification

Research Interests

asymptotic theory
empirical likelihood
finite mixture model
sample survey

Relevant Degree Programs


Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - May 2021)
Sequential ED-design for binary dose-response experiments (2018)

Dose-response experiments and subsequent data analyses are often carried out according to optimal designs for the purpose of accurately determining a specific effective dose (ED) level. If interest lies in the dose-response relationship over a range of ED levels, many existing optimal designs are inaccurate. In this dissertation, we propose a new design procedure, the two-stage sequential ED-design, which directly and simultaneously targets several ED levels. We use a small number of trials to obtain a tentative estimate of the model parameters. The doses of the subsequent trials are then selected sequentially, based on the latest model information, to maximize the efficiency of the ED estimation over several ED levels.

Although the commonly used logistic and probit models are convenient summaries of the dose-response relationship, they can be too restrictive. We introduce and study a more flexible, albeit slightly more complex, three-parameter logistic dose-response model. We explore the effectiveness of the sequential ED-design and the D-optimal design under this model, and develop an effective model-fitting strategy. We develop a two-step iterative algorithm to compute the maximum likelihood estimate of the model parameters, and prove that each iteration increases the likelihood value, so the algorithm converges to at least a local maximum of the likelihood function. We also study the numerical solution of the D-optimal design for the three-parameter logistic model. Interestingly, all our numerical solutions to the D-optimal design are three-point-support distributions.

We also discuss the use of the ED-design when experimental subjects become available in groups. We introduce the group sequential ED-design and demonstrate how to construct it. The ED-design extends naturally to more complex models and can satisfy a broad range of demands that arise in applications.
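The abstract does not give the exact three-parameter parameterization; one common form adds an upper-asymptote parameter to the usual logistic curve. A minimal sketch under that assumption, fitted by generic numerical maximum likelihood (scipy) rather than the thesis's two-step iterative algorithm; all data here are simulated for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def prob(x, a, b, g):
    # Assumed form: logistic curve with upper asymptote g in (0, 1];
    # the thesis's exact three-parameter parameterization may differ.
    return g / (1.0 + np.exp(-(a + b * x)))

def neg_log_lik(theta, dose, resp):
    # Binomial negative log-likelihood for binary responses
    a, b, g = theta
    p = np.clip(prob(dose, a, b, g), 1e-10, 1.0 - 1e-10)
    return -np.sum(resp * np.log(p) + (1.0 - resp) * np.log(1.0 - p))

rng = np.random.default_rng(0)
dose = rng.uniform(-2.0, 2.0, 500)          # hypothetical dose levels
true = (-0.5, 1.5, 0.9)
resp = rng.binomial(1, prob(dose, *true))   # simulated binary outcomes

# Generic numerical MLE; the bound keeps the asymptote in (0, 1]
fit = minimize(neg_log_lik, x0=[0.0, 1.0, 0.8], args=(dose, resp),
               bounds=[(None, None), (None, None), (1e-3, 1.0)])
```

With the extra asymptote parameter the likelihood surface can be flat, which is one reason a dedicated fitting strategy, as developed in the thesis, is useful.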


On Dual Empirical Likelihood Inference under Semiparametric Density Ratio Models in the Presence of Multiple Samples: With Applications to Long Term Monitoring of Lumber Quality (2014)

Maintaining a high quality of lumber products is of great social and economic importance. This thesis develops theory for a research program aimed at long-term monitoring of change in the strength of lumber. The work is motivated by two important tasks of the monitoring program: testing for change in the strength populations of lumber produced over the years, and drawing statistical inferences on strength populations from Type I censored lumber samples. Statistical methods for these tasks should ideally be both efficient and nonparametric. These desiderata lead us to adopt a semiparametric density ratio model to pool information across multiple samples and to use the nonparametric empirical likelihood as the tool for inference.

We develop a dual empirical likelihood ratio test for composite hypotheses about the parameter of the density ratio model based on independent samples from different populations; testing for differences among population distributions is a special case. The proposed test statistic has a classical chi-square null limiting distribution. We also derive the power function of the test under a class of local alternatives, which reveals that the local power is often increased when strength is borrowed from additional samples, even when their underlying distributions are unrelated to the hypothesis of interest. Simulation studies show that this test has better power properties than all potential competitors adapted to the multiple-sample problem under investigation, and that it is robust to model misspecification. The proposed test is then applied to assess the strength properties of lumber, with intuitively reasonable implications for the forest industry.

We also establish a powerful framework for empirical likelihood inference under the density ratio model when Type I censored samples are present. This framework centers on the maximization of a concave dual partial empirical likelihood function and is computationally straightforward. We study the properties of this dual partial empirical likelihood and find that its likelihood ratio test has a simple chi-square limiting distribution under the null model and a noncentral chi-square limiting distribution under local alternatives.
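In the simplest two-sample, uncensored case, maximizing the dual empirical likelihood under a density ratio model is equivalent to fitting a logistic regression of the sample label on the basis function q(x), a standard result; the thesis's multi-sample, censored setting is more general. A hedged sketch of the resulting chi-square likelihood ratio test, assuming the basis q(x) = (x, x²) and simulated normal samples:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(1)
x0 = rng.normal(0.0, 1.0, 300)   # baseline sample
x1 = rng.normal(0.5, 1.0, 300)   # second sample, shifted mean

x = np.concatenate([x0, x1])
lab = np.concatenate([np.zeros(300), np.ones(300)])
Q = np.column_stack([np.ones_like(x), x, x ** 2])  # basis: 1, x, x^2

def neg_loglik(theta):
    # Negative dual (profile) log-likelihood; numerically identical to
    # logistic regression of the sample label on q(x).
    eta = Q @ theta
    return np.logaddexp(0.0, eta).sum() - eta @ lab

full = minimize(neg_loglik, np.zeros(3), method="BFGS")
null = minimize(lambda a: neg_loglik(np.array([a[0], 0.0, 0.0])),
                np.zeros(1), method="BFGS")   # beta = 0: equal distributions
lr = 2.0 * (null.fun - full.fun)              # chi-square, 2 df, under H0
pval = chi2.sf(lr, df=2)
```

Here a significant statistic indicates that the two population distributions differ, mirroring the special case of the test described in the abstract.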


Applications of penalized likelihood methods for feature selection in statistical modeling (2012)

Feature selection plays a pivotal role in knowledge discovery and contemporary scientific research. Traditional best-subset selection or stepwise regression can be computationally expensive or unstable in the selection process, and so various penalized likelihood methods (PLMs) have received much attention in recent decades. In this dissertation, we develop approaches based on PLMs to deal with feature-selection issues arising in several application fields.

Motivated by genomic association studies, we first address feature selection in ultra-high-dimensional situations, where the number of candidate features can be huge. Reducing the dimension of the data is essential in such situations. We propose a novel screening approach via the sparsity-restricted maximum likelihood estimator that removes most of the irrelevant features before the formal selection. The model after screening serves as an excellent starting point for the use of PLMs. We establish the screening and selection consistency of the proposed method and develop efficient algorithms for its implementation.

We next turn our attention to the analysis of complex survey data, where identifying influential factors for certain behavioral, social, and economic indices forms a variable-selection problem. When data are collected through survey sampling from a finite population, they have an intrinsic dependence structure and may provide a biased representation of the target population. To avoid distorted conclusions, survey weights are usually adopted in these analyses. We use a pseudo-likelihood to account for the survey weights and propose a penalized pseudo-likelihood method for variable selection with survey data. The consistency of the proposed approach is established under the joint randomization framework.

Lastly, we address order selection for finite mixture models, which provide a flexible tool for modeling data from a heterogeneous population. PLMs are attractive for such problems. However, this application requires maximization of nonsmooth and nonconcave objective functions, which is computationally challenging. We transform the original multivariate objective function into a sum of univariate functions and design an iterative thresholding-based algorithm to solve the sparse maximization efficiently without ad hoc steps. We establish the convergence of the new algorithm and illustrate its efficiency through both simulations and real-data examples.
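The "sum of univariate functions plus thresholding" idea can be illustrated in the simpler lasso-penalized least-squares setting (an assumption made for illustration; the thesis applies it to mixture order selection): each proximal-gradient update decouples into independent univariate soft-thresholding problems.

```python
import numpy as np

def soft_threshold(z, t):
    # Univariate solution of min_b 0.5*(b - z)^2 + t*|b|
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(X, y, lam, n_iter=500):
    # Iterative soft-thresholding for 0.5*||y - Xb||^2 + lam*||b||_1;
    # each update solves p independent univariate thresholding problems.
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 20))
beta = np.zeros(20); beta[:3] = [3.0, -2.0, 1.5]   # sparse truth
y = X @ beta + rng.normal(scale=0.5, size=100)
bhat = ista(X, y, lam=10.0)   # nonzero entries flag selected features
```

The appeal, as in the thesis, is that no ad hoc search over subsets is needed: sparsity emerges directly from the thresholding step.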


Master's Student Supervision (2010 - 2020)
An R package for monitoring test under density ratio model and its applications (2018)

Quantiles and their functions are important population characteristics in many applications. In forestry, lower quantiles of the modulus of rupture and other mechanical properties of wood products are important quality indices, and it is important to ensure that the wood products on the market meet the established industrial standards over the years. Two well-known risk measures in finance and hydrology, value at risk (VaR) and median shortfall (MS), are quantiles of their corresponding marginal distributions. Developing effective statistical inference methods and tools for quantiles of interest is thus an important task in both theory and applications.

When samples from multiple populations of a similar nature are available, Chen et al. [2016] proposed using a density ratio model (DRM) to characterize potential latent structures in these populations. The DRM enables us to fully utilize the information contained in the data from connected populations. They further proposed a composite empirical likelihood (CEL) to avoid a parametric model assumption, which carries a risk of model misspecification, and to accommodate a clustered data structure. A cluster-based bootstrap procedure was also investigated for variance estimation, construction of confidence intervals, and tests of various hypotheses.

This thesis contains developments complementary to Chen et al. [2016]. First, a user-friendly R package is developed to make their methods easy to use for practitioners; we also include diagnostic tools that allow users to investigate the goodness of fit of the density ratio model. Second, we use simulation to compare the performance of the DRM-CEL-based test with the well-known Wilcoxon rank test for clustered data. Third, we study the performance of DRM-CEL-based inference when the data set contains observations with different cluster sizes. The simulation results show that the DRM-CEL method works well in common situations.


Small area quantile estimation under unit-level models (2017)

Sample surveys are widely used as a cost-effective way to collect information on variables of interest in target populations. In applications, we are generally interested in parameters such as population means, totals, and quantiles. Similar parameters for subpopulations, or areas, formed by geographic regions and socio-demographic groups are also of interest. However, the sample size may be small or even zero in some subpopulations because of probability sampling and budget limitations. There has been intensive research on how to produce reliable estimates of characteristics of interest for subpopulations whose sample size is small or even zero; this line of research is called small area estimation (SAE).

In this thesis, we study the performance of a number of small area quantile estimators based on a popular unit-level model and its variations. When a finite population can be regarded as a sample from some model, we may use the whole sample from the finite population to determine the model structure with good precision. That information can then be used to produce more reliable estimates for small areas. However, if the model assumption is wrong, the resulting estimates can be misleading and their mean squared errors can be underestimated. It is therefore critical to check the robustness of estimators under various model misspecification scenarios. We first conduct simulation studies to investigate the performance of three small area quantile estimators in the literature; they are found not to be very robust in some likely situations. Based on these observations, we propose an approach to obtain more robust small area quantile estimators. Simulation results show that the proposed methods have superior performance either when the error distribution in the model is non-normal or when the data set contains many outliers.


Generalized method of moments - theoretical, econometric and simulation studies (2011)

The GMM estimator is widely used in the econometrics literature. This thesis focuses on three aspects of the GMM technique. First, I derive proofs of the asymptotic properties of the GMM estimator under certain conditions. To the best of my knowledge, the original complete proofs proposed by Hansen (1982) are not easily available; in this thesis, I provide complete proofs of the consistency and asymptotic normality of the GMM estimator under assumptions somewhat stronger than those in Hansen (1982). Second, I illustrate the application of the GMM estimator in linear models. Specifically, I emphasize the economic reasoning underlying the linear statistical models in which the GMM estimator (also referred to as the instrumental variable estimator) is widely used. Third, I perform several simulation studies to investigate the performance of the GMM estimator in different situations.
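The two-step GMM estimator for a linear instrumental-variable model, the setting the abstract describes, can be sketched as follows; the simulated design and variable names are illustrative, not taken from the thesis:

```python
import numpy as np

def gmm_iv(y, X, Z):
    # Two-step GMM for the moment conditions E[z_i (y_i - x_i' beta)] = 0.
    n = len(y)
    def solve(W):
        # beta(W) = (X'Z W Z'X)^{-1} X'Z W Z'y
        A = X.T @ Z @ W @ Z.T @ X
        b = X.T @ Z @ W @ Z.T @ y
        return np.linalg.solve(A, b)
    beta1 = solve(np.eye(Z.shape[1]))        # step 1: identity weight
    u = y - X @ beta1                        # first-step residuals
    S = (Z * u[:, None]).T @ (Z * u[:, None]) / n
    return solve(np.linalg.inv(S))           # step 2: optimal weight

rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=(n, 2))                  # instruments
v = rng.normal(size=n)                       # shared shock -> endogeneity
x = z @ np.array([1.0, 0.5]) + v             # endogenous regressor
y = 2.0 * x + v + rng.normal(size=n)         # true beta = 2
beta = gmm_iv(y, x[:, None], z)
```

Because x is correlated with the error through v, OLS would be biased here, while the instrument-based GMM estimator remains consistent, which is the core point of the linear-model application discussed above.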


Properties of Empirical and Adjusted Empirical Likelihood (2010)

Likelihood-based statistical inference has been advocated by generations of statisticians. As an alternative to the traditional parametric likelihood, empirical likelihood (EL) is appealing for its nonparametric setting and desirable asymptotic properties. In this thesis, we first review and investigate the asymptotic and finite-sample properties of the empirical likelihood, particularly its implications for constructing confidence regions for a population mean. We then study the properties of the adjusted empirical likelihood (AEL) proposed by Chen et al. (2008). The adjusted empirical likelihood was introduced to overcome the shortcomings of the empirical likelihood when it is applied to statistical models specified through general estimating equations. It preserves the first-order asymptotic properties of the empirical likelihood while substantially simplifying the numerical problem.

A major application of the empirical likelihood or adjusted empirical likelihood is the construction of confidence regions for the population mean. In addition, we discover that the adjusted empirical likelihood, like the empirical likelihood, has an important monotonicity property. One major discovery of this thesis is that the adjusted empirical likelihood ratio statistic is always smaller than the empirical likelihood ratio statistic. This implies that the AEL-based confidence regions always contain the corresponding EL-based confidence regions and hence have higher coverage probability. This result has been observed in many empirical studies, and we prove it rigorously.

We also find that the original adjusted empirical likelihood, as specified by Chen et al. (2008), has a bounded likelihood ratio statistic. This may result in confidence regions of infinite size, particularly when the sample size is small. We further investigate approaches to modify the adjusted empirical likelihood so that the resulting confidence regions for the population mean are always bounded.
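The empirical likelihood ratio for a population mean, the baseline construction (Owen's EL) that the AEL modifies, reduces to solving a one-dimensional equation for a Lagrange multiplier. A minimal sketch of that computation, not of the adjusted version studied in the thesis:

```python
import numpy as np
from scipy.optimize import brentq

def el_ratio_stat(x, mu):
    # -2 log empirical likelihood ratio for H0: E[X] = mu (Owen's EL).
    d = x - mu
    if d.min() >= 0 or d.max() <= 0:
        return np.inf   # mu outside the convex hull: EL ratio is zero
    # Solve sum d_i / (1 + lam * d_i) = 0 for the multiplier lam;
    # the root lies in the interval (-1/max(d), -1/min(d)).
    g = lambda lam: np.sum(d / (1.0 + lam * d))
    lam = brentq(g, -1.0 / d.max() + 1e-8, -1.0 / d.min() - 1e-8)
    return 2.0 * np.sum(np.log1p(lam * d))

rng = np.random.default_rng(4)
x = rng.normal(1.0, 1.0, 200)
stat_at_mean = el_ratio_stat(x, x.mean())      # essentially zero
stat_off = el_ratio_stat(x, x.mean() + 0.5)    # large: H0 implausible
```

The infinite value returned outside the convex hull of the data is exactly the boundedness problem for small samples that motivates the adjusted empirical likelihood discussed above.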




