Assisted annotation of biomedical text using RapTAT, an online learning-based tool
Gobbel, Glenn Temple
Clinical providers frequently document patient encounters using unstructured text. Although such text contains a wealth of data useful for improving healthcare, extracting those data can be difficult. Natural language processing systems can help with data extraction, but system training and testing still require the costly generation of an annotated document corpus to serve as a reference standard. To reduce this cost, we developed the Rapid Text Annotation Tool (RapTAT), which pre-annotates documents based on user feedback and online machine learning. The tool gradually shifts the manual annotation task from identifying and labeling text phrases to reviewing and correcting pre-annotations. The tool was evaluated in a use case involving manual or assisted annotation of 404 clinical notes for concepts related to quality of care during heart failure treatment. Notes were divided into 20 batches of 19-21 documents for iterative annotation and training. The annotation rate increased significantly during the review process for RapTAT-assisted but not manual reviewers. The F-measure of RapTAT pre-annotations increased from 0.5-0.6 to above 0.80 (relative to both assisted reviewer and reference annotations) over the first 3 batches and more slowly thereafter. The tool reduced overall workload by approximately 50%, both by decreasing the number of annotations that had to be added manually and by helping reviewers annotate at a faster rate. Assisted annotation did not appear to reduce annotation quality; overall inter-annotator agreement was significantly higher between RapTAT-assisted reviewers than between manual reviewers. This study demonstrates that online machine learning can substantially decrease the burden of annotating medical text.
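The pre-annotate/review/retrain cycle described above can be sketched as a minimal loop. This is an illustrative toy, not the actual RapTAT implementation: the `OnlinePhraseAnnotator` class, its lexicon-based labeling, and the simulated batch data are all assumptions introduced for demonstration, and the F-measure here is a simple micro-averaged F1 over token-level annotations.

```python
# Minimal sketch (assumed, not RapTAT's actual algorithm) of an online
# learning annotation loop: the model pre-annotates each batch, a reviewer
# corrects the output, and the corrections are folded back into the model.

class OnlinePhraseAnnotator:
    """Toy online learner that memorizes phrase -> concept-label mappings."""

    def __init__(self):
        self.lexicon = {}  # phrase -> concept label learned from corrections

    def pre_annotate(self, tokens):
        # Propose a label for every token the model has seen before.
        return {t: self.lexicon[t] for t in tokens if t in self.lexicon}

    def learn(self, corrected):
        # Online update: incorporate the reviewer's corrected annotations.
        self.lexicon.update(corrected)


def f_measure(predicted, reference):
    """Micro-averaged F1 of predicted annotations against a reference."""
    pred, ref = set(predicted.items()), set(reference.items())
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    precision, recall = tp / len(pred), tp / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Simulated batches: each document is (tokens, reference annotations).
batches = [
    [(["ace", "inhibitor", "dose"], {"ace": "MED", "inhibitor": "MED"})],
    [(["ace", "inhibitor", "held"], {"ace": "MED", "inhibitor": "MED"})],
]

annotator = OnlinePhraseAnnotator()
for batch in batches:
    for tokens, reference in batch:
        proposed = annotator.pre_annotate(tokens)  # model's pre-annotations
        print(round(f_measure(proposed, reference), 2))
        annotator.learn(reference)  # reviewer-corrected labels fed back
```

Running the sketch shows the pattern the abstract reports at full scale: the first batch is annotated from scratch (F-measure 0.0 here), while later batches benefit from the accumulated corrections, shifting the reviewer's work toward verification.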