Extracting Detailed Tobacco Exposure From The Electronic Health Record

Osterman, Travis John

dc.creator	Osterman, Travis John
dc.date.accessioned	2020-08-22T17:25:36Z
dc.date.available	2019-08-09
dc.date.issued	2017-08-09
dc.identifier.uri	https://etd.library.vanderbilt.edu/etd-07142017-193150
dc.identifier.uri	http://hdl.handle.net/1803/13004
dc.description.abstract	Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status (ever-smoker vs. never-smoker) from electronic health record data, but no system to date extracts the detailed smoking data needed to assess a patient’s eligibility for lung cancer screening. Here we describe the Smoking History And Pack-year Extraction System (SHAPES), a rule-based NLP system to quantify tobacco exposure from electronic clinical notes. SHAPES was developed on 261 patient records with 9,573 clinical notes and validated on 352 randomly selected patient records with 4,040 notes. F-measures for never-smoking status, ever-smoking status, rate of smoking, duration of smoking, quantity of cigarettes, and years quit were 0.86, 0.82, 0.79, 0.62, 0.64, and 0.61, respectively. Sixteen of 22 individuals eligible for lung cancer screening were identified (precision = 0.94, recall = 0.73). SHAPES was compared to a previously validated smoking classification system using a phenome-wide association study (PheWAS). SHAPES predicted similar significant associations with a 66% smaller sample (10,000 vs. 35,788) and detected 411 (268%) more associations in the full dataset than when using ever/never smoking status alone. Using smoking data from SHAPES, a smoking genome-by-environment interaction study found 57 statistically significant interactions between smoking and diseases, including previously described interactions between ischemic heart disease and rs1746537, obesity and rs10871777, and type 2 diabetes and rs2943641. These studies support the use of SHAPES for lung cancer screening and other research requiring quantitative smoking history. External validation needs to be performed prior to implementation at other medical centers.
dc.format.mimetype	application/pdf
dc.subject	data extraction
dc.subject	lung cancer screening
dc.subject	phewas
dc.subject	gxe
dc.subject	smoking
dc.subject	natural language processing
dc.title	Extracting Detailed Tobacco Exposure From The Electronic Health Record
dc.type	thesis
dc.contributor.committeeMember	Mia Levy, M.D., Ph.D.
dc.contributor.committeeMember	Pierre Massion, M.D.
dc.type.material	text
thesis.degree.name	MS
thesis.degree.level	thesis
thesis.degree.discipline	Biomedical Informatics
thesis.degree.grantor	Vanderbilt University
local.embargo.terms	2019-08-09
local.embargo.lift	2019-08-09
dc.contributor.committeeChair	Josh Denny, M.D., M.S.
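
The screening-eligibility figures quoted in the abstract (16 of 22 eligible individuals identified, precision = 0.94, recall = 0.73) can be checked with a short calculation. The sketch below is illustrative only and is not taken from the thesis: the function names are hypothetical, and it assumes the reported F-measures are the balanced F1 score (harmonic mean of precision and recall), which the abstract does not state. The abstract reports no F-measure for screening eligibility, so the value computed here is a derived figure, not a published result.

```python
def recall_from_counts(true_positives: int, total_positives: int) -> float:
    """Recall = correctly identified positives / all actual positives."""
    if total_positives <= 0:
        raise ValueError("total_positives must be greater than zero")
    return true_positives / total_positives


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the balanced F1 (an assumption here)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Counts and precision as reported in the abstract.
eligible_identified = 16
eligible_total = 22
reported_precision = 0.94

recall = recall_from_counts(eligible_identified, eligible_total)
f1 = f_measure(reported_precision, recall)

print(f"recall = {recall:.2f}")  # matches the reported recall of 0.73
print(f"F1 (derived, assuming balanced F-measure) = {f1:.2f}")
```

The recall computed from the counts agrees with the value reported in the abstract; the F1 line shows how a per-field F-measure such as those listed for smoking rate or duration would follow from a precision and recall pair under the same assumption.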

Files in this item

Name: TOsterman.pdf
Size: 2.828Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.
