Design and Analysis Considerations for Complex Longitudinal and Survey Sampling Studies
Mercaldo, Nathaniel David
Pre-existing cohort data (e.g., electronic health records) are increasingly available, and the need for novel and efficient uses of these data is paramount due to resource constraints. This dissertation consists of three chapters relating to the design and analysis of longitudinal and survey sampling studies when utilizing these types of data. In chapter one, we extend outcome-dependent sampling (ODS) designs for longitudinal binary data to permit data collection in two stages. We consider two subclasses of designs: fixed designs, where the designs at each stage are pre-specified, and adaptive designs, which utilize stage one data to improve design choice at stage two. We demonstrate that data from both stages can be aggregated to generate valid parameter estimates using ascertainment-corrected maximum likelihood methods. Efficiency gains are observed relative to random sampling and, in certain situations, relative to single-stage ODS designs. In chapter two, we investigate the effects of utilizing an imperfect sampling frame on the design and analysis of complex survey data. We explore the impact of stratum misclassification on the choice of study design, on the operating characteristics of survey estimators, and on the appropriateness of two common approaches to survey design analysis. Stratified sampling is recommended over random sampling if interest lies in making inferential statements regarding rare subgroups. In the presence of misclassification, the relative efficiency depends on the subgroup prevalence, and analytic methods that account for the design are still required for valid inferences. In chapter three, we introduce the MMLB R package, which is used to estimate parameters from marginalized regression models for longitudinal binary data. These models are described, and estimation procedures are outlined under random sampling and ODS schemes.
We provide examples to demonstrate how to fit these models and how to generate data under a pre-specified marginal mean model. We hope these chapters provide specific and general insights that will improve our ability to conduct efficient research studies under resource constraints.