Predicting Colorectal Cancer Recurrence by Utilizing Multiple-View Multiple-Learner Supervised Learning
Castellanos, Jason Alfred
Colorectal cancer (CRC) remains a leading cause of cancer-related mortality in the United States. A key therapeutic dilemma in the treatment of CRC is whether patients with stage II and stage III disease require adjuvant chemotherapy after surgical resection. Attempts to better identify patients at increased risk of recurrence have yielded many predictive models based on gene expression data, but none is FDA-approved and none is used in standard clinical practice. To improve recurrence prediction, we use an ensemble learning approach to predict recurrence status at 3 years after diagnosis. Multiple views of a microarray dataset were generated and then used to train a diverse pool of base learners with 10 repetitions of 10-fold cross-validation. Stacked generalization was then used to train an ensemble model. Our results demonstrate that molecular data predict recurrence significantly better than basic clinical data. We also demonstrate that the performance of the multiple-view multiple-learner (MVML) supervised learning framework matches or exceeds that of the best base learners across all performance metrics.
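The pipeline summarized in the abstract (several feature views, a pool of base learners, repeated k-fold cross-validation, and a stacked meta-learner) can be illustrated with a minimal sketch. This is not the thesis implementation. The data are synthetic, and every name here (`fit_centroid`, `fit_logistic`, `out_of_fold`, `fit_stack`, `make_toy_data`) is hypothetical. The two toy learners stand in for the unspecified pool of base learners, a view is assumed to be a subset of feature columns, and averaging out-of-fold scores across repeats is an assumption about how repeated cross-validation feeds the meta-learner.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


def fit_centroid(X, y):
    """Toy base learner 1: nearest centroid, returning a score in (0, 1)."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    # Closer to the class-1 centroid than to class 0 gives a score above 0.5.
    return lambda x: sigmoid(math.dist(x, c0) - math.dist(x, c1))


def fit_logistic(X, y, epochs=200, lr=0.1):
    """Toy base learner 2: logistic regression trained by plain SGD."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return lambda x: sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def out_of_fold(fit, X, y, k, repeats, rng):
    """Out-of-fold scores, averaged over `repeats` stratified k-fold splits."""
    n = len(X)
    totals = [0.0] * n
    for _ in range(repeats):
        order = []
        for label in (0, 1):  # shuffle within each class to keep folds stratified
            members = [i for i in range(n) if y[i] == label]
            rng.shuffle(members)
            order += members
        for f in range(k):
            held_out = set(order[f::k])
            train = [i for i in range(n) if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            for i in held_out:
                totals[i] += model(X[i])
    return [t / repeats for t in totals]


def take(X, cols):
    """Project every sample onto one view (a list of column indices)."""
    return [[row[c] for c in cols] for row in X]


def fit_stack(X, y, views, learners, k=5, repeats=2, seed=0):
    """Stacked generalization over every (view, learner) pair."""
    rng = random.Random(seed)
    pairs = [(cols, fit) for cols in views for fit in learners]
    # Level-1 features: one column of out-of-fold scores per (view, learner) pair.
    columns = [out_of_fold(fit, take(X, cols), y, k, repeats, rng)
               for cols, fit in pairs]
    meta = fit_logistic([list(row) for row in zip(*columns)], y)
    # Refit each base learner on all samples for use at prediction time.
    base = [(cols, fit(take(X, cols), y)) for cols, fit in pairs]
    return lambda x: meta([model([x[c] for c in cols]) for cols, model in base])


def make_toy_data(n=60, seed=1):
    """Synthetic stand-in data: columns 0-1 are informative, 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0),
                  rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
views = [[0, 1], [2, 3], [0, 1, 2, 3]]
stack = fit_stack(X, y, views, [fit_centroid, fit_logistic])
scores = [stack(x) for x in X]
```

The meta-learner is trained only on out-of-fold scores, so it never sees a base-learner prediction for a sample that learner was fitted on. The `scores` computed at the end are resubstitution scores on the training samples and are not an estimate of predictive performance; an honest estimate would need an outer cross-validation loop or a held-out set.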