Jiahua Chen

 
Prospective Graduate Students / Postdocs

This faculty member is currently not looking for graduate students or Postdoctoral Fellows. Please do not contact the faculty member with any such requests.

Professor

Research Interests

finite mixture model
empirical likelihood
asymptotic theory
sample survey

Graduate Student Supervision

Doctoral Student Supervision (Jan 2008 - May 2019)
Sequential ED-design for binary dose-response experiments (2018)

Dose-response experiments and subsequent data analyses are often carried out according to optimal designs for the purpose of accurately determining a specific effective dose (ED) level. If the interest is the dose-response relationship over a range of ED levels, many existing optimal designs are not accurate. In this dissertation, we propose a new design procedure, called two-stage sequential ED-design, which directly and simultaneously targets several ED levels. We use a small number of trials to provide a tentative estimate of the model parameters. The doses of the subsequent trials are then selected sequentially, based on the latest model information, to maximize the efficiency of the ED estimation over several ED levels. Although the commonly used logistic and probit models are convenient summaries of the dose-response relationship, they can be too restrictive. We introduce and study a more flexible, albeit slightly more complex, three-parameter logistic dose-response model. We explore the effectiveness of the sequential ED-design and the D-optimal design under this model, and develop an effective model fitting strategy. We develop a two-step iterative algorithm to compute the maximum likelihood estimate of the model parameters. We prove that each iteration of the algorithm increases the likelihood value, and therefore will lead to at least a local maximum of the likelihood function. We also study the numerical solution to the D-optimal design for the three-parameter logistic model. Interestingly, all our numerical solutions to the D-optimal design are three-point-support distributions. We also discuss the use of the ED-design when experimental subjects become available in groups. We introduce the group sequential ED-design, and demonstrate how to construct this design. The ED-design has a natural extension to more complex models and can satisfy a broad range of the demands that may arise in applications.

On dual empirical likelihood inference under semiparametric density ratio models in the presence of multiple samples with applications to long term monitoring of lumber quality (2014)

Maintaining a high quality of lumber products is of great social and economic importance. This thesis develops theories as part of a research program aimed at developing a long term program for monitoring change in the strength of lumber. These theories are motivated by two important tasks of the monitoring program: testing for change in strength populations of lumber produced over the years and making statistical inference on strength populations based on Type I censored lumber samples. Statistical methods for these inference tasks should ideally be efficient and nonparametric. These desiderata lead us to adopt a semiparametric density ratio model to pool the information across multiple samples and use the nonparametric empirical likelihood as the tool for statistical inference. We develop a dual empirical likelihood ratio test for composite hypotheses about the parameter of the density ratio model based on independent samples from different populations. This test encompasses testing differences in population distributions as a special case. We find the proposed test statistic to have a classical chi-square null limiting distribution. We also derive the power function of the test under a class of local alternatives. It reveals that the local power is often increased when strength is borrowed from additional samples even when their underlying distributions are unrelated to the hypothesis of interest. Simulation studies show that this test has better power properties than all potential competitors adapted to the multiple-sample problem under investigation, and is robust to model misspecification. The proposed test is then applied to assess strength properties of lumber with intuitively reasonable implications for the forest industry. We also establish a powerful inference framework for performing empirical likelihood inference under the density ratio model when Type I censored samples are present.
This inference framework centers on the maximization of a concave dual partial empirical likelihood function, and features an easy computation. We study the properties of this dual partial empirical likelihood, and find its corresponding likelihood ratio test to have a simple chi-square limiting distribution under the null model and a non-central chi-square limiting distribution under local alternatives.

Applications of penalized likelihood methods for feature selection in statistical modeling (2012)

Feature selection plays a pivotal role in knowledge discovery and contemporary scientific research. Traditional best subset selection or stepwise regression can be computationally expensive or unstable in the selection process, and so various penalized likelihood methods (PLMs) have received much attention in recent decades. In this dissertation, we develop approaches based on PLMs to deal with the issues of feature selection arising from several application fields. Motivated by genomic association studies, we first address feature selection in ultra-high-dimensional situations, where the number of candidate features can be huge. Reducing the dimension of the data is essential in such situations. We propose a novel screening approach via the sparsity-restricted maximum likelihood estimator that removes most of the irrelevant features before the formal selection. The model after screening serves as an excellent starting point for the use of PLMs. We establish the screening and selection consistency of the proposed method and develop efficient algorithms for its implementation. We next turn our attention to the analysis of complex survey data, where the identification of influential factors for certain behavioral, social, and economic indices forms a variable selection problem. When data are collected through survey sampling from a finite population, they have an intrinsic dependence structure and may provide a biased representation of the target population. To avoid distorted conclusions, survey weights are usually adopted in these analyses. We use a pseudo-likelihood to account for the survey weights and propose a penalized pseudo-likelihood method for the variable selection of survey data. The consistency of the proposed approach is established for the joint randomization framework. Lastly, we address order selection for finite mixture models, which provide a flexible tool for modeling data from a heterogeneous population. PLMs are attractive for such problems.
However, this application requires maximizations over nonsmooth and nonconcave objective functions, which are computationally challenging. We transform the original multivariate objective function into a sum of univariate functions and design an iterative thresholding-based algorithm to efficiently solve the sparse maximization without ad hoc steps. We establish the convergence of the new algorithm and illustrate its efficiency through both simulations and real-data examples.

Master's Student Supervision (2010 - 2018)
An R package for monitoring test under density ratio model and its applications (2018)

Quantiles and their functions are important population characteristics in many applications. In forestry, lower quantiles of the modulus of rupture and other mechanical properties of wood products are important quality indices. It is important to ensure that the wood products in the market over the years meet the established industrial standards. Two well-known risk measures in finance and hydrology, value at risk (VaR) and median shortfall (MS), are quantiles of their corresponding marginal distributions. Developing effective statistical inference methods and tools for quantiles of interest is an important task in both theory and applications. When samples from multiple populations of a similar nature are available, Chen et al. [2016] proposed to use a density ratio model (DRM) to characterize potential latent structures in these populations. The DRM enables us to fully utilize the information contained in the data from connected populations. They further proposed a composite empirical likelihood (CEL) to avoid a parametric model assumption that is subject to the risk of model misspecification and to accommodate clustered data structure. A cluster-based bootstrap procedure was also investigated for variance estimation, construction of confidence intervals, and tests of various hypotheses. This thesis contains complementary developments to Chen et al. [2016]. First, a user-friendly R package is developed to make their methods easy to use for practitioners. We also include some diagnostic tools to allow users to investigate the goodness of fit of the density ratio model. Second, we use simulation to compare the performance of the DRM-CEL-based test and the well-known Wilcoxon rank test for clustered data. Third, we study the performance of DRM-CEL-based inference when the data set contains observations with different cluster sizes. The simulation results show that the DRM-CEL method works well in common situations.

Generalized method of moments : theoretical, econometric and simulation studies (2011)

The GMM estimator is widely used in the econometrics literature. This thesis mainly focuses on three aspects of the GMM technique. First, I derive proofs of the asymptotic properties of the GMM estimator under certain conditions. To the best of my knowledge, the original complete proofs proposed by Hansen (1982) are not easily available. In this thesis, I provide complete proofs of consistency and asymptotic normality of the GMM estimator under somewhat stronger assumptions than those in Hansen (1982). Second, I illustrate the application of the GMM estimator in linear models. Specifically, I emphasize the economic reasoning underlying the linear statistical models where the GMM estimator (also referred to as the instrumental variable estimator) is widely used. Third, I perform several simulation studies to investigate the performance of the GMM estimator under different situations.

Properties of Empirical and Adjusted Empirical Likelihood (2010)

No abstract available.

 
 
