Random Forest Classification of Acute Coronary Syndrome
VanHouten, Jacob Paul
Coronary artery disease (CAD) is the leading cause of death worldwide. Acute coronary syndromes (ACS), a subset of CAD, account for 1.4 million hospitalizations and $165 billion in costs in the United States alone. A major challenge for the physician diagnosing and treating patients with suspected ACS is the significant overlap between patients with and without ACS. Missing a diagnosis of ACS carries a high cost, but so does inappropriate treatment of patients without ACS. American College of Cardiology/American Heart Association guidelines recommend early risk stratification of patients to determine their likelihood of major adverse events, but many individual tests and prognostic indices lack sufficient performance characteristics for use in clinical practice. Prognostic indices in particular are often derived from populations that are not representative of those on which they are used, and they rely on complete and accurate data. We explored the use of two state-of-the-art machine learning techniques, random forest and elastic net, on 23,576 records from the Synthetic Derivative to develop models with better performance characteristics than previously established prognostic indices in determining the risk of ACS for patients presenting with suspicious symptoms. We bootstrapped the process of model creation and found that the random forest significantly outperformed elastic net, L2-regularized regression, and the previously developed TIMI and GRACE scores. We also assessed the calibration of the random forest model and explored methods of correction. Our preliminary findings suggest that machine learning applied to noisy data with many missing values can still perform as well as or better than previously developed scoring metrics.
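The abstract does not state which performance metric or resampling scheme was used for the model comparison. As an illustration only, the sketch below assumes the area under the ROC curve (AUC) as the discrimination metric and a percentile bootstrap over patients with fixed predictions, which is simpler than refitting each model on every resample. The function names, the synthetic labels, and the two synthetic risk scores are all hypothetical and are not drawn from the study data.

```python
import random


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_difference(labels, scores_a, scores_b, n_boot=200, seed=0):
    """Percentile 95% interval for AUC(a) - AUC(b), resampling patients."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample with a single class has no defined AUC
        a = [scores_a[i] for i in idx]
        b = [scores_b[i] for i in idx]
        diffs.append(auc(y, a) - auc(y, b))
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Synthetic stand-in data (assumption): roughly 30% positives, one sharper
# score and one noisier score playing the roles of two competing models.
data_rng = random.Random(42)
labels = [1 if data_rng.random() < 0.3 else 0 for _ in range(300)]
scores_sharp = [y + data_rng.gauss(0, 0.5) for y in labels]
scores_noisy = [y + data_rng.gauss(0, 2.0) for y in labels]

low, high = bootstrap_auc_difference(labels, scores_sharp, scores_noisy)
print(f"AUC sharp: {auc(labels, scores_sharp):.3f}")
print(f"AUC noisy: {auc(labels, scores_noisy):.3f}")
print(f"95% bootstrap interval for the difference: [{low:.3f}, {high:.3f}]")
```

An interval that excludes zero is one common way to support a claim that one score discriminates better than another. Bootstrapping the whole model-creation process, as described above, would additionally refit each model inside the resampling loop.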