Stabilizing Calibration of Clinical Prediction Models in Non-Stationary Environments: Methods Supporting Data-Driven Model Updating
Davis, Sharon Elizabeth
Risk prediction models are increasingly employed in clinical environments to support population health management, quality assessment, and clinical decision support tools directly integrated with electronic health records. However, patient populations, clinical practice, workflows, and information systems evolve continuously. Calibration of prediction models often deteriorates as training and application populations become increasingly disparate, and misidentification of appropriate patients for a given use case can result in sub-optimal care and potential patient safety issues. As a result, updating strategies to sustain performance over time are becoming critical components of model implementations. Variations in the response of models to changes in clinical environments may affect the timing, extent, and form of drift in accuracy, yet common pre-defined updating plans fail to account for such differences. As an alternative, we develop a suite of methods supporting data-driven model updating strategies to consistently retain model calibration over time for both regression and machine learning models. We first develop the notion of dynamic calibration curves to maintain an evolving assessment of model performance. These curves bring together methods of stringent model validation and online learning from data streams to continuously evaluate performance as new patient data are observed. Leveraging these dynamic calibration curves, we construct a calibration drift detection system that triggers model updating as performance declines and informs the selection of updating datasets. Finally, we define a non-parametric testing procedure to select between available updating methods, including recalibration and model refitting. This test aims to recommend simple updating methods without compromising the performance achievable with more complex adjustments.
All methods are designed to be widely applicable to dichotomous outcome models regardless of the underlying learning algorithm and customizable to meet the needs of clinical use cases. This work promotes a shift away from insufficient “one-size-fits-all” strategies. Individually and in concert, these methods tailor updating to the requirements of specific prediction models and use cases. They lay the groundwork for the design of automated, electronic health record-embedded prediction model surveillance tools to promote long-term performance and utility of prediction models underlying a variety of informatics applications.
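
To make the idea of tracking calibration over a stream of predictions concrete, the sketch below may help. It is not the dynamic calibration curve method developed in this work. It substitutes a much simpler technique: refitting a logistic recalibration model (calibration intercept and slope) on a fixed sliding window of recent observations. The function names, window size, step size, and the simulated data with an abrupt shift in outcome prevalence are all assumptions made for illustration.

```python
import numpy as np


def logit(p, eps=1e-6):
    """Log-odds of probabilities, clipped away from 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    return np.log(p / (1 - p))


def fit_calibration(pred, outcome, n_iter=25):
    """Fit outcome ~ sigmoid(a + b * logit(pred)) by Newton's method.

    Returns (a, b). A well-calibrated model has a near 0 and b near 1.
    """
    x = np.column_stack([np.ones(len(pred)), logit(pred)])
    y = np.asarray(outcome, dtype=float)
    beta = np.array([0.0, 1.0])  # start from perfect calibration
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
        w = mu * (1.0 - mu)
        grad = x.T @ (y - mu)
        hess = x.T @ (x * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta[0], beta[1]


def windowed_calibration(pred, outcome, window=2000, step=500):
    """Refit the calibration model on a sliding window (hypothetical helper).

    Returns a list of (index of last observation + 1, intercept, slope).
    """
    results = []
    for end in range(window, len(pred) + 1, step):
        a, b = fit_calibration(pred[end - window:end], outcome[end - window:end])
        results.append((end, a, b))
    return results


# Simulated stream (illustrative only): predictions are calibrated for the
# first 4000 patients, then true risk drops by 1 on the log-odds scale.
rng = np.random.default_rng(0)
n = 8000
pred = rng.uniform(0.05, 0.95, n)
true_logit = logit(pred)
true_logit[n // 2:] -= 1.0
true_risk = 1.0 / (1.0 + np.exp(-true_logit))
outcome = (rng.random(n) < true_risk).astype(int)

results = windowed_calibration(pred, outcome)
for end, a, b in results:
    print(f"through obs {end:5d}: intercept={a:+.2f} slope={b:.2f}")
```

In this simulation the estimated intercept stays near zero until windows begin to include post-shift observations, then moves toward -1 while the slope stays near 1. A fixed window trades responsiveness against noise through a single tuning parameter, which is one limitation that motivates more adaptive, data-driven approaches.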