
Active Learning for Named Entity Recognition in Clinical Text

dc.creator: Chen, Yukun
dc.date.accessioned: 2020-08-22T17:04:59Z
dc.date.available: 2016-06-25
dc.date.issued: 2015-06-25
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-06122015-162419
dc.identifier.uri: http://hdl.handle.net/1803/12541
dc.description.abstract: Named entity recognition (NER) is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance. However, they often require large numbers of annotated samples, which are expensive to build because annotation requires domain experts. Active learning (AL), a sample selection approach that can be integrated with supervised ML, has shown promising potential to minimize annotation cost while maximizing the performance of ML-based models in various NLP tasks. However, very few studies have investigated AL for clinical NER in a real-life setting. In this dissertation research, I systematically studied AL in a clinical NER task to identify medical problems, treatments, and lab tests in clinical notes. Novel AL algorithms were developed to query the most informative and least costly sentences based on three properties: uncertainty, representativeness, and annotation time. I also developed the first AL-enabled annotation system for clinical NER. Using this system, I further conducted user studies to assess the performance of AL in real-world annotation processes for building clinical NER systems. The initial user study shows that conventional AL methods that do not consider annotation time did not always perform better than random sampling for different users. However, our newly developed AL algorithms with cost models for estimating annotation time were more promising in practice. To achieve an NER model with an F-measure of 0.70, simulated results show that the new AL method saved ~33.3% in estimated annotation time compared to random sampling. In the user study, the new AL algorithm achieved better performance than random sampling and saved up to ~26.5% of real annotation time for one of the users. To the best of our knowledge, this is the first study examining practical AL systems for clinical NER.
Our study demonstrates that AL has the potential to save annotation time and improve model quality for building ML-based NER systems when novel querying algorithms are implemented. Our future work includes developing better querying algorithms and evaluating the system with a larger number of users.
dc.format.mimetype: application/pdf
dc.subject: natural language processing
dc.subject: named entity recognition
dc.subject: machine learning
dc.subject: Active learning
dc.subject: clinical NLP
dc.title: Active Learning for Named Entity Recognition in Clinical Text
dc.type: dissertation
dc.contributor.committeeMember: Thomas A. Lasko
dc.contributor.committeeMember: Qiaozhu Mei
dc.contributor.committeeMember: Qingxia Chen
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2016-06-25
local.embargo.lift: 2016-06-25
dc.contributor.committeeChair: Joshua C. Denny
dc.contributor.committeeChair: Hua Xu
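
Illustrative note: the abstract describes querying sentences that are both informative and cheap to annotate. The record does not give the dissertation's actual algorithms or cost model, so the following is only a minimal sketch of one generic way to combine uncertainty with estimated annotation time. Every name and constant in it (token_entropy, sentence_uncertainty, estimated_cost, pick_next, the per-token and overhead seconds) is a hypothetical assumption, and the representativeness property is left out for brevity.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sentence_uncertainty(token_probs):
    """Mean token entropy over a sentence (one common uncertainty measure)."""
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def estimated_cost(tokens, seconds_per_token=1.0, overhead=2.0):
    """Hypothetical linear cost model: fixed overhead plus time per token.

    The constants are placeholders, not values from the dissertation.
    """
    return overhead + seconds_per_token * len(tokens)


def pick_next(pool):
    """Return the index of the sentence with the most uncertainty per second.

    pool is a list of (tokens, token_probs) pairs, where token_probs holds
    one label distribution per token from the current NER model.
    """
    return max(
        range(len(pool)),
        key=lambda i: sentence_uncertainty(pool[i][1]) / estimated_cost(pool[i][0]),
    )


# Toy unlabeled pool: a confidently tagged sentence and an uncertain one.
pool = [
    (["no", "fever"], [[0.98, 0.01, 0.01], [0.97, 0.02, 0.01]]),
    (["started", "metformin"], [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2]]),
]
next_index = pick_next(pool)
```

In this toy pool both sentences have the same estimated cost, so the ratio favors the sentence whose label distributions are flatter. With equally uncertain sentences, the shorter one is preferred because its estimated annotation time is lower.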

