Automated Mapping of Laboratory Tests to LOINC Codes using Noisy Labels in a National Electronic Health Record System Database

dc.creator: Parr, Sharidan Kristen
dc.date.accessioned: 2020-08-22T17:25:37Z
dc.date.available: 2019-07-16
dc.date.issued: 2018-07-16
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-07142018-114155
dc.identifier.uri: http://hdl.handle.net/1803/13005
dc.description.abstract: Standards such as the Logical Observation Identifiers Names and Codes (LOINC®) are critical for interoperability and for integrating data into common data models, but they are inconsistently used. Without consistent mapping to standards, clinical data cannot be harmonized, shared, or interpreted in a meaningful context. We sought to develop an automated machine learning pipeline that leverages noisy labels to map laboratory data to LOINC codes. Across 130 sites in the Department of Veterans Affairs Corporate Data Warehouse, we selected the 150 most commonly used laboratory tests with numeric results per site from 2000 through 2016. Using source data text and numeric fields, we developed a machine learning model and manually validated random samples from both labeled and unlabeled datasets. The raw laboratory data consisted of more than 6.5 billion test results, with 2,215 distinct LOINC codes. The model predicted the correct LOINC code in 85% of the unlabeled data and 96% of the labeled data by test frequency. In the subset of labeled data where the original and model-predicted LOINC codes disagreed, the model-predicted LOINC code was correct in 83% of the data by test frequency. Using a completely automated process, we were able to assign LOINC codes to unlabeled data with high accuracy. When the model-predicted LOINC code differed from the original LOINC code, the model prediction was correct in the vast majority of cases. This scalable, automated algorithm may improve data quality and interoperability while substantially reducing the manual effort currently needed to map laboratory data accurately.
dc.format.mimetype: application/pdf
dc.subject: Laboratory
dc.subject: Data Quality
dc.subject: Machine Learning
dc.title: Automated Mapping of Laboratory Tests to LOINC Codes using Noisy Labels in a National Electronic Health Record System Database
dc.type: thesis
dc.contributor.committeeMember: Thomas Lasko
dc.contributor.committeeMember: Matthew Shotwell
dc.type.material: text
thesis.degree.name: MS
thesis.degree.level: thesis
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2019-07-16
local.embargo.lift: 2019-07-16
dc.contributor.committeeChair: Michael Matheny

