Conditional Associations with Big Data: Estimating Adjusted Rank Correlations in the Electronic Health Record
In this thesis, we apply and adapt a new method to assess conditional associations in a large dataset from the Vanderbilt University Medical Center Electronic Health Record (EHR). We estimate pairwise rank correlations among disease status and lab values in the EHR after adjusting for demographic information. Estimating these covariate-adjusted rank correlations involves fitting cumulative probability models (CPMs), extracting probability-scale residuals (PSRs) from these models, and computing the sample correlation between the PSRs for different outcomes. This approach is rank-based, robust, and applicable to a variety of data types. Computational challenges arise with large datasets, particularly when we apply these methods to continuous outcome variables such as most lab values; we propose workarounds for these challenges. We present estimates and confidence intervals for the partial Spearman’s rank correlations among all pairwise combinations of the 250 most frequent ICD codes and 50 lab results among 472,570 patients with data in the EHR. We also present results stratified by sex and diabetes status, demonstrating how to assess differences in correlations between population strata.
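To make the three steps concrete (model each outcome given covariates, extract PSRs, correlate the PSRs), the following is a minimal sketch rather than the implementation used in the thesis. It assumes a single categorical covariate and replaces the CPM with the empirical distribution of the outcome within each covariate level, so that the PSR of an observation y is P(Y < y) - P(Y > y) estimated within its stratum. The function names `empirical_psr`, `stratified_psr`, and `partial_rank_correlation` are hypothetical and chosen only for illustration.

```python
import numpy as np


def empirical_psr(y):
    """PSR of each observation under the empirical distribution of y.

    For observation y_i this is P(Y < y_i) - P(Y > y_i), with both
    probabilities estimated by sample proportions.
    """
    y = np.asarray(y, dtype=float)
    sorted_y = np.sort(y)
    n = y.size
    # Number of sample values strictly less than each y_i.
    n_less = np.searchsorted(sorted_y, y, side="left")
    # Number of sample values strictly greater than each y_i.
    n_greater = n - np.searchsorted(sorted_y, y, side="right")
    return (n_less - n_greater) / n


def stratified_psr(y, z):
    """PSRs computed separately within each level of a categorical covariate z.

    Assumption: this within-stratum empirical distribution stands in for a
    fitted cumulative probability model; it is not a CPM.
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    residuals = np.empty_like(y)
    for level in np.unique(z):
        mask = z == level
        residuals[mask] = empirical_psr(y[mask])
    return residuals


def partial_rank_correlation(x, y, z):
    """Sample (Pearson) correlation between the PSRs of x and y, each adjusted for z."""
    return np.corrcoef(stratified_psr(x, z), stratified_psr(y, z))[0, 1]


# Toy data: x and y both shift upward with z, but move in opposite
# directions within each level of z.
z = np.array([0, 0, 0, 1, 1, 1])
x = np.array([1, 2, 3, 11, 12, 13])
y = np.array([3, 2, 1, 13, 12, 11])
adjusted = partial_rank_correlation(x, y, z)
unadjusted = partial_rank_correlation(x, y, np.zeros_like(z))
```

With no covariates (a single stratum), the empirical PSR is a linear function of the midrank, so the correlation of PSRs reduces to the ordinary Spearman rank correlation. In the toy data, the adjusted and unadjusted values differ in sign because the covariate drives both outcomes in the same direction.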