Defining Phenotypes, Predicting Drug Response, and Discovering Genetic Associations in the Electronic Health Record with Applications in Rheumatoid Arthritis
Carroll, Robert James
Electronic Health Records (EHRs) allow for the digital capture of patient information and have proven to be a valuable tool for patient treatment. In this dissertation, I explore the reuse of EHR data for clinical and genomic research, with a focus on rheumatoid arthritis (RA). RA is a chronic autoimmune disorder that primarily affects the joints, causing swelling, stiffness, and pain; if left untreated, it can lead to permanent joint damage. Phenome-wide association studies (PheWAS) leverage the breadth of codified diagnostic information about patients in the EHR to find disease associations. I present a package for the R statistical language that includes the tools needed to perform EHR-based or observational trial PheWAS, from ICD-9 code translation to association testing and meta-analysis. It also includes a versatile plotting system for phenotype-related information that follows the Manhattan plot paradigm. This methodology is applied in conjunction with genetic risk scores (GRSs) to assess pleiotropy and shared genetic risk among phenotypes. An investigation of 99 known RA risk variants and three GRS formulations shows that the GRSs are more specific to RA than the individual single nucleotide polymorphisms (SNPs), although the GRSs also have clinically interesting associations with hypothyroidism. Next, I develop an algorithm to retrospectively identify response to etanercept in the EHR. Using chart reviews and a variety of input data, including billing codes, processed free text, and medication entries, I built a support vector machine and a random forest classifier that discriminate between drug responders and non-responders with areas under the receiver operating characteristic curve of 0.939 and 0.923, respectively. The drug response algorithm was then applied to create a case-control cohort.
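As a minimal illustration of what a genetic risk score is, the sketch below sums risk-allele dosages (0, 1, or 2 copies) across variants, either counting every risk allele equally or weighting each by the natural log of its published odds ratio. These two formulations are common conventions and are assumptions here; they are not necessarily the three GRS formulations evaluated in this work. The function name `genetic_risk_score`, the variant IDs, and the odds ratios are hypothetical.

```python
import math
from typing import Dict, Optional


def genetic_risk_score(dosages: Dict[str, int],
                       odds_ratios: Optional[Dict[str, float]] = None) -> float:
    """Hypothetical GRS helper: sum risk-allele dosages across variants.

    dosages maps a variant ID to the number of risk alleles carried (0, 1, or 2).
    If odds_ratios is given, each dosage is weighted by ln(OR) for that variant
    (a weighted GRS); otherwise every risk allele counts equally (unweighted GRS).
    """
    score = 0.0
    for snp, dosage in dosages.items():
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage for {snp} must be 0, 1, or 2")
        weight = 1.0 if odds_ratios is None else math.log(odds_ratios[snp])
        score += weight * dosage
    return score


# Hypothetical variant IDs and odds ratios, for illustration only.
patient = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
ors = {"snp_a": 1.5, "snp_b": 1.2, "snp_c": 2.0}
print(genetic_risk_score(patient))       # unweighted: 3.0
print(genetic_risk_score(patient, ors))  # weighted: 2*ln(1.5) + ln(2.0), about 1.504
```

In a PheWAS setting, a score like this would be computed per patient and then tested for association with each phenotype in place of a single SNP genotype.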
Using these records, the final study identifies phenotypes associated with etanercept response, including fibromyalgia and several axial skeleton disease phenotypes: intervertebral disc disorders, degeneration of intervertebral disc, and spinal stenosis. Taken together, these studies demonstrate that EHR data can be an important tool for clinical and genomic research, and offer particular promise for the study of RA.
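The phenotype scan over a responder/non-responder cohort can be sketched as a PheWAS-style loop: for each phenotype, build a 2x2 table (responder status by phenotype presence), test it, and apply a Bonferroni threshold across all phenotypes tested. This is a simplified stand-in, not the method used in the study: it swaps in a two-sided Fisher's exact test with no covariate adjustment, whereas a full analysis would typically use regression models with covariates. The functions `fisher_exact_two_sided` and `phewas_scan`, the phenotype names, and the toy cohort are all hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Probability that cell a equals x, given fixed row and column totals.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    total = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance so ties are counted
            total += p
    return min(1.0, total)


def phewas_scan(cohort: List[Tuple[bool, Set[str]]], phenotypes: List[str],
                alpha: float = 0.05) -> Dict[str, Tuple[float, bool]]:
    """Test each phenotype against responder status (hypothetical helper).

    cohort holds (is_responder, set_of_phenotypes) pairs; returns, per
    phenotype, the p-value and whether it passes a Bonferroni threshold.
    """
    threshold = alpha / len(phenotypes)
    results = {}
    for ph in phenotypes:
        a = sum(1 for resp, phs in cohort if resp and ph in phs)
        b = sum(1 for resp, phs in cohort if resp and ph not in phs)
        c = sum(1 for resp, phs in cohort if not resp and ph in phs)
        d = sum(1 for resp, phs in cohort if not resp and ph not in phs)
        p = fisher_exact_two_sided(a, b, c, d)
        results[ph] = (p, p < threshold)
    return results


# Toy cohort with made-up phenotypes: 5 responders, 5 non-responders.
cohort = [(True, set()), (True, set()), (True, {"phenotype_y"}),
          (True, set()), (True, set()),
          (False, {"phenotype_x"}), (False, {"phenotype_x"}),
          (False, {"phenotype_x", "phenotype_y"}),
          (False, {"phenotype_x"}), (False, {"phenotype_x"})]
for name, (p, significant) in phewas_scan(cohort, ["phenotype_x", "phenotype_y"]).items():
    print(name, round(p, 4), significant)
```

Testing every phenotype with the same procedure and correcting for the number of tests is what lets a scan like this surface unexpected associations without inflating false positives.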