Learning Clinical Data Representations for Machine Learning

dc.creator: Sulieman, Lina Mahmoud
dc.date.accessioned: 2020-08-23T15:48:16Z
dc.date.available: 2019-06-17
dc.date.issued: 2018-11-29
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-11192018-165929
dc.identifier.uri: http://hdl.handle.net/1803/14642
dc.description.abstract: The use of machine learning in healthcare has increased in recent years. Representing clinical data is the crux of machine learning. Learning informative features can improve the trained models’ performance. This dissertation describes methods to learn representations for temporal and text data to improve machine learning results. Three data representations are discussed across three aims to tackle three biomedical informatics problems: 1) identifying patients at high risk of suffering from a negative outcome (readmission or death) to allocate intervention resources efficiently; 2) triaging patients’ messages and identifying their needs, which requires human and time resources; 3) locating information about a phenotype in clinical documents, which requires human resources and increases information overload on healthcare providers. In the first aim, a representation leveraged the post-discharge data to predict the patients’ outcome over one year after discharge. Training the outcome prediction model on post-discharge and before-discharge data improved performance significantly compared with the model trained on before-discharge clinical data only. In the second aim, the dissertation describes methods to learn representations that incorporate the semantics and the context of the words. These representations outperformed traditional features in identifying the patients’ needs in portal messages sent to healthcare providers. The results demonstrate that training machine learning models on these learned representations performs better than training on representations that lack those features. In the third aim, a deep learning model leveraged the clinical documents’ contents and the billing codes to learn representations for sentences. The model applied the representations to extract the sentences that include phenotype information (i.e., relevant sentences) without using an annotated dataset. The extraction model achieved higher performance than a similar keyword-based extraction and KnowledgeMap, a clinical concept extraction tool. The representations described in this dissertation are extensible to other electronic medical records. The proposed models can learn new representations that improve clinical machine learning performance and can be applied to other medical informatics problems.
dc.format.mimetype: application/pdf
dc.subject: electronic health records
dc.subject: EMR
dc.subject: prediction models
dc.subject: clinical models
dc.subject: NLP
dc.subject: feature representation
dc.subject: text mining
dc.subject: natural language processing
dc.subject: outcome prediction
dc.subject: dynamic features
dc.subject: readmission
dc.subject: text features
dc.subject: information extraction
dc.subject: clinical documents
dc.subject: deep learning
dc.subject: machine learning
dc.title: Learning Clinical Data Representations for Machine Learning
dc.type: dissertation
dc.contributor.committeeMember: Colin Walsh
dc.contributor.committeeMember: Tom Lasko
dc.contributor.committeeMember: Christopher Fonnesbeck
dc.contributor.committeeMember: Bradley Malin
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2019-06-17
local.embargo.lift: 2019-06-17
dc.contributor.committeeChair: Daniel Fabbri