    On Optimal Prediction Rules With Prospective Missingness and Bagged Empirical Null Inference in Large-Scale Data

    Mercaldo, Sarah Fletcher
    https://etd.library.vanderbilt.edu/etd-09062017-101455
    http://hdl.handle.net/1803/14094
    Date: 2017-09-18

    Abstract

    This dissertation consists of three papers on missing data, prediction, and large-scale inference. The first paper defines the problem of obtaining predictions from an existing clinical risk prediction model when covariates are missing. We introduce the Pattern Mixture Kernel Submodel (PMKS), in which submodels are fit within each missing data pattern to minimize prediction error in the presence of missingness. PMKS is explored in simulations and a case study, where it outperforms standard simple and multiple imputation techniques. The second paper introduces the Bagged Empirical Null p-value, a new algorithm that combines the existing methodologies of bagging and empirical null techniques to identify important effects in massive high-dimensional data. We illustrate the approach using a famous leukemia gene example, where we uncovered new findings that are supported by previously published benchwork, and we evaluate the algorithm's performance in novel pseudo-simulations. The third paper gives recommendations for including the outcome in the imputation model during model construction, validation, and application. We suggest including the outcome when imputing missing covariate values only during model construction, in order to obtain unbiased parameter estimates. When the outcome is used in the imputation algorithm during the validation step, we show through simulation that the model prediction metrics are optimistically inflated and that the actual pragmatic model performance would be inferior to the validated results. While the three papers presented here provide foundations for missing data and large-scale inferential techniques, these ideas are applicable to a wide range of biomedical settings.
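
The idea of fitting submodels within each missing data pattern can be illustrated with a minimal sketch. This is not the dissertation's implementation: it assumes a plain linear model fit by ordinary least squares, marks missing covariates with `None`, and uses hypothetical helper names (`solve`, `least_squares`, `fit_pattern_submodels`, `predict`). Each pattern needs enough rows for its normal equations to be non-singular, and a pattern never seen in training raises a `KeyError`.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def least_squares(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_pattern_submodels(X, y):
    """Fit one submodel per missingness pattern (assumed linear here)."""
    groups = defaultdict(list)
    for row, target in zip(X, y):
        pattern = tuple(v is None for v in row)  # True where missing
        groups[pattern].append((row, target))
    models = {}
    for pattern, members in groups.items():
        # Use only the covariates that are observed in this pattern.
        observed = [[v for v in row if v is not None] for row, _ in members]
        targets = [t for _, t in members]
        models[pattern] = least_squares(observed, targets)
    return models


def predict(models, row):
    """Predict with the submodel that matches the row's own pattern."""
    pattern = tuple(v is None for v in row)
    coef = models[pattern]
    observed = [v for v in row if v is not None]
    return coef[0] + sum(c * v for c, v in zip(coef[1:], observed))


# Toy data: complete rows follow y = 1 + 2*x1 + 3*x2, while rows
# missing x2 follow y = 5 + 2*x1.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (0, None), (1, None), (2, None)]
y = [1, 3, 4, 6, 5, 7, 9]
models = fit_pattern_submodels(X, y)
```

At prediction time no imputation is performed in this sketch: `predict(models, (3, None))` simply uses the submodel that was fit on rows where the second covariate was missing.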

    Files in this item

    Name: Mercaldo.pdf
    Size: 18.67Mb
    Format: PDF

    This item appears in the following collection(s):

    • Electronic Theses and Dissertations
