On Optimal Prediction Rules With Prospective Missingness and Bagged Empirical Null Inference in Large-Scale Data

dc.creator: Mercaldo, Sarah Fletcher
dc.date.accessioned: 2020-08-22T20:58:45Z
dc.date.available: 2018-09-18
dc.date.issued: 2017-09-18
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-09062017-101455
dc.identifier.uri: http://hdl.handle.net/1803/14094
dc.description.abstract: This dissertation consists of three papers related to missing data, prediction, and large-scale inference. The first paper defines the problem of obtaining predictions from an existing clinical risk prediction model when covariates are missing. We introduce Pattern Mixture Kernel Submodels (PMKS), submodels fit within each missing data pattern, which minimize prediction error in the presence of missingness. PMKS is explored in simulations and a case study, where it outperforms standard simple and multiple imputation techniques. The second paper introduces the Bagged Empirical Null p-value, a new algorithm that combines the existing methodology of Bagging and Empirical Null techniques to identify important effects in massive high-dimensional data. We illustrate the approach using a famous leukemia gene example, where we uncovered new findings that are supported by previously published benchwork, and we evaluate the algorithm's performance in novel pseudo-simulations. The third paper gives recommendations for including the outcome in the imputation model during construction, validation, and application. We suggest including the outcome for imputation of missing covariate values only during model construction, to obtain unbiased parameter estimates. When the outcome is used in the imputation algorithm during the validation step, we show through simulation that the model prediction metrics are optimistically inflated and that the actual pragmatic model performance would be inferior to the validated results. While the three papers presented here provide foundations for missing data and large-scale inferential techniques, these ideas are applicable to a wide range of biomedical settings.
dc.format.mimetype: application/pdf
dc.subject: missing data
dc.subject: imputation
dc.subject: prediction models
dc.subject: large-scale inference
dc.subject: p-values
dc.title: On Optimal Prediction Rules With Prospective Missingness and Bagged Empirical Null Inference in Large-Scale Data
dc.type: dissertation
dc.contributor.committeeMember: Jeffrey D. Blume
dc.contributor.committeeMember: Matthew S. Shotwell
dc.contributor.committeeMember: Thomas G. Stewart
dc.contributor.committeeMember: Melinda C. Aldrich
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Biostatistics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2018-09-18
local.embargo.lift: 2018-09-18
dc.contributor.committeeChair: Robert A. Greevy

