
Identification and Prediction of Incompletely Ascertainable, Rare Healthcare Outcomes

dc.contributor.advisor: Matheny, Michael E
dc.contributor.advisor: Reeves, Ruth
dc.contributor.advisor: Fabbri, Daniel
dc.creator: Jeffery, Alvin Dean
dc.date.accessioned: 2022-01-10T16:47:14Z
dc.date.created: 2021-12
dc.date.issued: 2021-11-19
dc.date.submitted: December 2021
dc.identifier.uri: http://hdl.handle.net/1803/16994
dc.description.abstract: Clinical prediction models are increasingly common, particularly with advances in machine learning. Assigning outcome labels on which to train these models is challenging because manual chart reviews are time-consuming and resource-intensive. Our overall objective in this work was to examine whether noisy labels generated from subject matter experts’ heuristics using heterogeneous data types could be used to provide outcome labels for large observational datasets to support predictive modeling. We used the clinical condition of opioid-induced respiratory depression as our use case. We applied a data programming paradigm with labeling functions that served as weak learners to identify opioid-induced respiratory depression in a cohort of 44,999 post-operative adult patients (52,861 visits) in a de-identified electronic health record database. We then used this method to provide outcome labels for: (a) building a clinical prediction model and (b) conducting a genome-wide association study. We found that the data programming method, using 14 labeling functions created by a dually trained biomedical informaticist and critical care nurse, could identify all individuals who had the condition of interest with only a moderate number of false positives (sensitivity = 1.0, positive predictive value = 0.263). For rare outcomes, this method has the potential to significantly reduce the number of manual chart reviews required for outcome labeling. Due to limited data availability in our data source, our clinical prediction models did not perform better than random chance. Our genome-wide association study, while underpowered (14 cases and 1,877 controls among those with genetic data of European ancestry), resulted in some statistically significant associations, particularly when using the quantitative (rather than traditional binary) trait generated from probabilistic outcome labels of the data programming method.
In sum, we have applied and evaluated an approach to generate outcome labels for a rare outcome in a large dataset without the need to manually review every record. In addition to this primary contribution, we also demonstrated how those outcome labels can be used for downstream tasks, such as clinical prediction model development and genetic association studies.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: informatics
dc.subject: phenotyping
dc.subject: respiratory depression
dc.subject: predictive modeling
dc.subject: genetics
dc.title: Identification and Prediction of Incompletely Ascertainable, Rare Healthcare Outcomes
dc.type: Thesis
dc.date.updated: 2022-01-10T16:47:14Z
dc.type.material: text
thesis.degree.name: MS
thesis.degree.level: Masters
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University Graduate School
local.embargo.terms: 2022-12-01
local.embargo.lift: 2022-12-01
dc.creator.orcid: 0000-0003-2797-6508
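
The abstract describes labeling functions acting as weak learners whose combined votes yield probabilistic outcome labels, evaluated by sensitivity and positive predictive value. The following is a minimal illustrative sketch of that general idea, not the thesis's implementation. The three labeling functions, their thresholds, the field names (`naloxone_doses`, `min_resp_rate`, `min_spo2`), and the toy visits are all hypothetical; the record does not list the 14 functions actually used. The votes are combined with a simple unweighted average, which stands in for the learned label model that data programming frameworks normally fit.

```python
from typing import Callable, Dict, List, Optional, Tuple

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1
Visit = Dict[str, Optional[float]]


# Hypothetical labeling functions: each encodes one expert heuristic and
# may abstain when the data it needs are missing or uninformative.
def lf_naloxone_given(visit: Visit) -> int:
    doses = visit.get("naloxone_doses") or 0
    return POSITIVE if doses > 0 else ABSTAIN


def lf_low_resp_rate(visit: Visit) -> int:
    rate = visit.get("min_resp_rate")
    if rate is None:
        return ABSTAIN
    return POSITIVE if rate < 8 else NEGATIVE  # threshold is an assumption


def lf_low_spo2(visit: Visit) -> int:
    spo2 = visit.get("min_spo2")
    if spo2 is None:
        return ABSTAIN
    return POSITIVE if spo2 < 88 else NEGATIVE  # threshold is an assumption


LABELING_FUNCTIONS: List[Callable[[Visit], int]] = [
    lf_naloxone_given,
    lf_low_resp_rate,
    lf_low_spo2,
]


def probabilistic_label(visit: Visit) -> float:
    """Fraction of non-abstaining votes that are positive (0.0 if all abstain)."""
    votes = [lf(visit) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    return sum(votes) / len(votes) if votes else 0.0


def sensitivity_and_ppv(flagged: List[bool], reference: List[bool]) -> Tuple[float, float]:
    """Compare flagged records against reference (e.g. chart review) labels."""
    tp = sum(f and r for f, r in zip(flagged, reference))
    fp = sum(f and not r for f, r in zip(flagged, reference))
    fn = sum(r and not f for f, r in zip(flagged, reference))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy data, invented purely for illustration.
VISITS: List[Visit] = [
    {"naloxone_doses": 1, "min_resp_rate": 6, "min_spo2": 85},
    {"naloxone_doses": 0, "min_resp_rate": 14, "min_spo2": 86},
    {"naloxone_doses": 0, "min_resp_rate": 16, "min_spo2": 97},
    {"naloxone_doses": 0, "min_resp_rate": None, "min_spo2": None},
]
REFERENCE = [True, False, False, False]

scores = [probabilistic_label(v) for v in VISITS]
flagged = [s >= 0.5 for s in scores]  # records sent for manual review
sensitivity, ppv = sensitivity_and_ppv(flagged, REFERENCE)
print(scores, sensitivity, ppv)
```

Under this reading, a positive predictive value of 0.263 at sensitivity 1.0, as reported in the abstract, would correspond to reviewing roughly 1 / 0.263 ≈ 3.8 flagged records per confirmed case, rather than every record in the cohort.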

