dc.creator | VanHouten, Jacob Paul | |
dc.date.accessioned | 2020-08-22T20:35:17Z | |
dc.date.available | 2018-07-29 | |
dc.date.issued | 2016-07-29 | |
dc.identifier.uri | https://etd.library.vanderbilt.edu/etd-07252016-100314 | |
dc.identifier.uri | http://hdl.handle.net/1803/13588 | |
dc.description.abstract | Electronic health records (EHRs) are rich data sources that can be analyzed to discover new, clinically relevant patterns of disease manifestations. However, sparsity, irregularity, and asynchrony in health records pose challenges for their use in such discovery tasks, as standard statistical and machine learning techniques possess limited ability to handle these complications. Abstracting the clinical data into models and then using elements of those models as input to statistical and machine learning algorithms is one approach to overcoming these challenges. This dissertation provides insight into the use of different models for this purpose.
First, I examine the effect of model complexity on algorithm performance. Specifically, I examine how well different models capture the low-specificity information distributed throughout electronic health data. For several predictive algorithms, low-complexity models turn out to be nearly as powerful as high-complexity models and much less costly.
I then explore the use of continuous longitudinal models of laboratory results and diagnosis billing codes to discover clinically relevant patterns between and among these data. I look for associations between clusters of specific laboratory values and single billing codes, and identify known associations as well as others that are consistent with current medical knowledge but not expected a priori.
Finally, I use the same longitudinal abstraction models as inputs into more complex probabilistic models that adjust for indirect associations, and find that diagnosis codes can be used to predict the laboratory status of a patient. | |
dc.format.mimetype | application/pdf | |
dc.subject | clinical informatics | |
dc.subject | data mining | |
dc.subject | medical records | |
dc.subject | data representation | |
dc.subject | machine learning | |
dc.title | Using Abstraction to Overcome Problems of Sparsity, Irregularity, and Asynchrony in Structured Medical Data | |
dc.type | dissertation | |
dc.contributor.committeeMember | Katherine E. Hartmann | |
dc.contributor.committeeMember | Nancy M. Lorenzi | |
dc.contributor.committeeMember | Michael E. Matheny | |
dc.contributor.committeeMember | Christopher J. Fonnesbeck | |
dc.type.material | text | |
thesis.degree.name | PHD | |
thesis.degree.level | dissertation | |
thesis.degree.discipline | Biomedical Informatics | |
thesis.degree.grantor | Vanderbilt University | |
local.embargo.terms | 2018-07-29 | |
local.embargo.lift | 2018-07-29 | |
dc.contributor.committeeChair | Thomas A. Lasko | |