Impact of Delayed Event Time on Cox and Logistic Regression Models and Its Application to GWAS
Irlmeier, Rebecca Terese
Most genomic studies that analyze genetic data linked to electronic health records (EHRs) use logistic regression models, which do not make full use of the time-to-event information available in EHRs. Previous work has shown that Cox (proportional hazards) regression, which can account for the left truncation and right censoring that occur in EHR data, increases the power to detect genotype-phenotype associations compared with logistic regression. Here we extend that work to evaluate the relative performance of Cox regression and various logistic regression models in the presence of delayed event time, which concerns the accuracy of the recorded time of the event of interest. One Cox model and three logistic regression models were considered under different scenarios of delayed event time. Extensive simulation studies and a genomic study application were used to evaluate the impact of delayed event time. We found that, although logistic regression does not model the time to event directly, the logistic regression models used in the literature were more sensitive to delayed event time than Cox regression. The simulations showed that Cox regression had similar or modestly higher statistical power than the logistic regression models at a controlled type I error rate. This was supported by the empirical data, where the Cox models consistently had the highest sensitivity for detecting known genotype-phenotype associations under all scenarios of delayed event time. Under the delayed event time scenarios that might exist in EHRs, Cox regression outperformed the logistic regression models commonly used in genomic studies.
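To make the notion of delayed event time concrete, the following is a minimal simulation sketch, not the design used in this work. It generates subjects with left truncation, right censoring, and a recording delay added to the true event time. Every name and value in it (`simulate_subject`, `simulate_cohort`, the hazard, the uniform delay distribution, the 30-year follow-up, the allele frequency, the effect size) is an illustrative assumption. The output rows hold what a Cox model would use (`entry`, `time`, `event`) and what a logistic model would use (`event` alone).

```python
import math
import random


def simulate_subject(rng, base_hazard=0.05, beta=0.3, maf=0.3,
                     max_delay=5.0, followup_end=30.0):
    """Simulate one subject. All parameter values are illustrative assumptions."""
    # Additive genotype coded 0/1/2 from two independent allele draws.
    genotype = int(rng.random() < maf) + int(rng.random() < maf)
    entry = rng.uniform(0.0, 10.0)                 # left-truncation time (assumed: first EHR record)
    hazard = base_hazard * math.exp(beta * genotype)
    true_event = rng.expovariate(hazard)           # true event time under a constant hazard
    delay = rng.uniform(0.0, max_delay)            # assumed lag between the event and its recording
    recorded_event = true_event + delay
    if recorded_event <= entry:
        return None                                # event recorded before entry: excluded
    observed = recorded_event <= followup_end      # otherwise right-censored at end of follow-up
    return {
        "genotype": genotype,
        "entry": entry,
        "time": recorded_event if observed else followup_end,
        "event": int(observed),
        "true_event": true_event,
    }


def simulate_cohort(n, seed=0, **kwargs):
    """Simulate n subjects and drop those excluded by left truncation."""
    rng = random.Random(seed)
    subjects = (simulate_subject(rng, **kwargs) for _ in range(n))
    return [s for s in subjects if s is not None]


if __name__ == "__main__":
    for max_delay in (0.0, 5.0):
        cohort = simulate_cohort(1000, seed=1, max_delay=max_delay)
        case_fraction = sum(s["event"] for s in cohort) / len(cohort)
        print(f"max_delay={max_delay}: n={len(cohort)}, case fraction={case_fraction:.3f}")
```

In this sketch a delay can shift a recorded event time later, and it can also change case status when the delayed record falls outside the follow-up window. Fitting the Cox and logistic models to such data would require a survival analysis library and is left out here.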