PhD Defence: Association Tests and Stepwise Variable Selection in Finite Mixture Regression Models (Chong Gan)
Date and Time
Location
SSC 1303/MS Teams (send request to Tricia, gradms@uoguelph.ca, for meeting link)
Details
CANDIDATE: Chong Gan
ABSTRACT:
Mixture regression models are widely used to cluster subjects from a suspected heterogeneous population in which the relationship between the response and the covariates differs across unobserved subpopulations. In such applications, statistical evidence for the significance of a hypothesis is an important yet often missing component needed to substantiate the findings. Three scenarios describe the contribution of a covariate to the response: (1) the covariate has no effect on the response, (2) the covariate has a homogeneous effect on the response across all subpopulations, and (3) the covariate has heterogeneous effects on the response depending on the subpopulation. We can determine the effect type of the covariate by answering the following two questions: (a) Does the covariate have a significant effect? (b) Do the effects of the covariate differ significantly from one subpopulation to another? We call test (a) the overall effect test and test (b) the heterogeneous effect test. These two hypotheses can be tested one after the other, forming a sequential test.
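As an illustrative sketch only, the two tests can be written as hypotheses on component-specific slopes. The notation below (K components, mixing weights \pi_k, intercepts \beta_{0k}, slopes \beta_{1k}, variances \sigma_k^2) and the choice of Gaussian components with a single covariate x are assumptions made for illustration; the thesis itself may use a different parameterization.

```latex
% Assumed notation (not taken from the abstract): K components, mixing
% weights \pi_k, one covariate x, component-specific intercepts \beta_{0k},
% slopes \beta_{1k}, variances \sigma_k^2; \phi is the normal density.
\[
  f(y \mid x) = \sum_{k=1}^{K} \pi_k \,
    \phi\!\left(y;\ \beta_{0k} + \beta_{1k} x,\ \sigma_k^2\right),
  \qquad \pi_k > 0,\quad \sum_{k=1}^{K} \pi_k = 1 .
\]
% (a) Overall effect test: does the covariate matter in any subpopulation?
\[
  H_0^{(a)}:\ \beta_{11} = \cdots = \beta_{1K} = 0
  \quad \text{vs.} \quad
  H_1^{(a)}:\ \beta_{1k} \neq 0 \ \text{for some } k .
\]
% (b) Heterogeneous effect test: do the slopes differ across subpopulations?
\[
  H_0^{(b)}:\ \beta_{11} = \cdots = \beta_{1K}
  \quad \text{vs.} \quad
  H_1^{(b)}:\ \beta_{1j} \neq \beta_{1k} \ \text{for some } j \neq k .
\]
```

Under this sketch, scenario (1) corresponds to not rejecting the null in (a), scenario (2) to rejecting it in (a) but not in (b), and scenario (3) to rejecting it in both.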
In real applications, the order of the mixture model is usually unknown, which poses technical challenges to the validity of hypothesis tests. In this thesis, we first propose four testing procedures for hypotheses regarding the covariate effect. Our proposed tests are all variations of the classical likelihood ratio test. As a proof of concept, this study considers only one covariate under the finite Gaussian mixture regression (GMR) framework. These testing procedures are evaluated through simulation studies and applications to the Chiroptera and the diabetes data. The weighted significance test (WEST) procedure is found to outperform the others in general.
Based on the WEST procedure, we develop two stepwise variable selection procedures, a forward variable selection procedure and a backward variable elimination procedure, to identify the significant covariates from a given set of covariates. Furthermore, we generalize the WEST procedure to test the effect type of a covariate given that there is another covariate in the model whose effect can be homogeneous or heterogeneous across subpopulations. In the analysis of the Chiroptera data, we explore which of the eight environmental and biological covariates influence the forearm length development of bat species across the world. For the two most significant covariates identified in the variable selection, we test their types of effects: homogeneous or heterogeneous across all subpopulations. Simulation studies are conducted to assess the performance of our proposed procedures.
Finally, we generalize our methods from GMR models to finite Poisson mixture regression models. Our study is motivated by the modelling and analysis of wildfire occurrences in British Columbia, Canada. In particular, our focus is on finding environmental factors and human activities that affect the risk of human-caused wildfire occurrences. We apply our proposed methods to the human-caused wildfire data and assess their performance through a simulation study.
In summary, this thesis proposes a novel procedure, the WEST procedure, for testing the association between a covariate and the response and for testing the effect type of the covariate in mixture regression models. We also develop two novel stepwise variable selection procedures for mixture regression models based on the WEST procedure.
Examining Committee
- Dr. Rajesh Pereira, Chair
- Dr. Zeny Feng, Advisor
- Dr. Khurram Nadeem, Advisory Committee Member
- Dr. Jeremy Balka, Department Member
- Dr. Utkarsh Dang, Carleton University, External Examiner