Outcome Misclassification in Logistic Regression: Examining Hospitalization Risk and its Association with Health Literacy
The Mid-South Coronary Heart Disease Cohort Study (MSCHDCS) examined the association between health literacy and hospitalization risk in the year following enrollment in a cohort of patients seeking primary care. Hospital admission outcome data were originally collected only from the Vanderbilt University Medical Center (VUMC); researchers later realized that hospitalizations might be missed and expanded the admissions data to include the surrounding Vanderbilt Health Affiliated Network (VHAN). Including admissions to non-VUMC hospitals identified new hospitalizations, which means that many outcomes were misclassified when only VUMC admissions data were used. The goal of this research is to explore the potential impact of outcome misclassification in settings similar to MSCHDCS, where hospitalizations might be missed due to inadequate outcome measurement. We explore the impact of non-differential and differential misclassification on naïve analyses and demonstrate how the results can be affected when misclassification depends on the variables of interest, as in the MSCHDCS data. When differential misclassification is suspected, we propose collecting validation data on a subset of patients and describe methods that can be used with these data to obtain unbiased coefficient estimates. While older methods depend on prior knowledge of the sensitivity and specificity of the misclassification, validation data remove this limitation and allow for adjustment with more flexible, non-parametric methods. With the highlighted adjustment methods, unbiased estimates were obtained in a simulation study as well as in the example data.
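The distinction between non-differential and differential misclassification can be illustrated with a small calculation. The Python sketch below is not the study's analysis: it assumes a binary exposure (low versus adequate health literacy), hypothetical true risks of 0.20 and 0.30, hypothetical capture rates (sensitivities), and perfect specificity, on the reasoning that a missed hospitalization is a false negative while a recorded one is real. All function names and numbers are illustrative assumptions.

```python
def observed_risk(true_risk, sensitivity, specificity=1.0):
    """Probability that an outcome is recorded, given the true risk.

    Recorded events are true events that were captured plus
    non-events that were wrongly recorded as events.
    """
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def odds_ratio(risk_exposed, risk_unexposed):
    """Odds ratio comparing two outcome probabilities."""
    odds_exposed = risk_exposed / (1.0 - risk_exposed)
    odds_unexposed = risk_unexposed / (1.0 - risk_unexposed)
    return odds_exposed / odds_unexposed


# Hypothetical true one-year hospitalization risks (NOT MSCHDCS estimates).
RISK_ADEQUATE = 0.20  # adequate health literacy (assumed)
RISK_LOW = 0.30       # low health literacy (assumed)

true_or = odds_ratio(RISK_LOW, RISK_ADEQUATE)

# Non-differential: 70% of hospitalizations captured in both groups (assumed).
nondiff_or = odds_ratio(
    observed_risk(RISK_LOW, sensitivity=0.7),
    observed_risk(RISK_ADEQUATE, sensitivity=0.7),
)

# Differential: capture depends on the exposure (assumed 60% vs. 90%).
diff_or = odds_ratio(
    observed_risk(RISK_LOW, sensitivity=0.6),
    observed_risk(RISK_ADEQUATE, sensitivity=0.9),
)

print(f"true OR:             {true_or:.2f}")
print(f"non-differential OR: {nondiff_or:.2f}")
print(f"differential OR:     {diff_or:.2f}")
```

Under these assumed values the true odds ratio is about 1.71, the non-differential case attenuates it to about 1.63, and the differential case drives it to 1.00, so a naïve analysis would see no association at all. Other assumed capture rates would move the naïve estimate in other directions, which is why sensitivity that varies with the exposure cannot be handled by assuming a fixed direction of bias.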