Summerlee Science Complex Room 1511
CANDIDATE: ALI ALATTAS
ABSTRACT:
Many studies handle binary longitudinal data with model-based classifiers, such as logistic regression or Generalized Estimating Equations, although the assumptions of these models are often not satisfied by real data. In contrast, tree-based classifiers are free of distributional assumptions, and several of them use bootstrap samples to construct multiple trees and combine them in order to reduce prediction error. One example of such an ensemble method is bagging. For paired data, Adler et al. (2011a) compared two types of bagging, namely subject-based strategies and observation-based strategies. In this thesis, we extended these strategies to longitudinal data.
The subject-based strategies are one (bootstrap), all (bootstrap), and random (bootstrap). Each takes a bootstrap sample of N subjects but then selects the measurements within subjects differently. The observation-based strategies are one and all. Strategy one samples one observation from each subject. Strategy all takes all the observations from each subject; in other words, it uses the data set without modification.
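The abstract does not include code, but a minimal Python sketch may help make the five resampling schemes concrete. It assumes a pandas DataFrame with one row per observation and a subject identifier column named id; the rule used here for random (bootstrap), keeping a random nonempty subset of each sampled subject's observations, is an assumed interpretation rather than the thesis's definition.

    import numpy as np
    import pandas as pd

    def resample(data, strategy, id_col="id", rng=None):
        """Return one resampled data set on which a single tree would be grown."""
        rng = np.random.default_rng() if rng is None else rng
        subjects = data[id_col].unique()

        def pick_rows(obs, n):
            # draw n distinct rows from one subject's observations
            take = rng.choice(len(obs), size=n, replace=False)
            return obs.iloc[np.sort(take)]

        if strategy in ("one (bootstrap)", "all (bootstrap)", "random (bootstrap)"):
            # subject-based: first take a bootstrap sample of the N subjects ...
            sampled = rng.choice(subjects, size=len(subjects), replace=True)
            pieces = []
            for copy, subj in enumerate(sampled):
                obs = data[data[id_col] == subj]
                if strategy == "one (bootstrap)":
                    # ... then keep one randomly chosen observation per subject
                    obs = pick_rows(obs, 1)
                elif strategy == "random (bootstrap)":
                    # ... then keep a random nonempty subset (assumed interpretation)
                    obs = pick_rows(obs, int(rng.integers(1, len(obs) + 1)))
                # "all (bootstrap)": keep every observation of the sampled subject
                pieces.append(obs.assign(**{id_col: f"{subj}_{copy}"}))  # keep repeats distinct
            return pd.concat(pieces, ignore_index=True)

        if strategy == "one":
            # observation-based: one observation from each subject, no subject bootstrap
            return pd.concat(
                [pick_rows(data[data[id_col] == s], 1) for s in subjects],
                ignore_index=True,
            )

        if strategy == "all":
            # observation-based: the original data set, unmodified
            return data.copy()

        raise ValueError(f"unknown strategy: {strategy!r}")

In a bagging run, this function would be called once per tree for the chosen strategy, and the resulting trees' predictions combined by majority vote.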
To evaluate the performance of the five strategies, we compared the subject-based strategies to the observation-based strategies in two ways. First, all five strategies were used with bagging. Second, the subject-based strategies were used with bagging while the observation-based strategies were used with a single Classification Tree (CT). We found random (bootstrap) to be the best strategy when the covariates are fixed over time, and all three subject-based strategies performed well when the covariates are time-varying and the sample size is large. We illustrated the five strategies on a subset of a well-known dataset on mothers' stress and children's morbidity.
Advisory Committee
- J. Horrocks, Advisor
- P. Kim
Examining Committee
- Z. Feng, Chair
- J. Horrocks
- P. Kim
- A. Ali