Machine Learning on Single-Cell RNA-Seq to Advance our Understanding of Clonal Hematopoiesis
Sharber, Brian
0009-0005-9598-8211
2023-11-16
Abstract
Clonal Hematopoiesis of Indeterminate Potential (CHIP) is characterized by genetic mutations in blood-forming stem cells that give rise to mutated blood cell populations. CHIP is associated with elevated risk of several diseases, including malignancies and cardiovascular disease, and this link between genetics and health calls for closer investigation. Current methods for identifying CHIP cells are time-consuming and costly. Using single-cell RNA sequencing (scRNA-seq) data, this study applies machine learning classifiers to build a more cost-effective pipeline for identifying CHIP cells and to improve our understanding of the genomic landscape of CHIP. Within the pipeline, a machine learning classifier is tailored to CHIP-related single-cell RNA expression data. Model-agnostic methods such as permutation importance are then used to reduce hundreds of features to the most critical ones while maintaining a high level of accuracy. On the TET2 dataset, the pipeline identifies 13 key genes that distinguish CHIP cells from non-CHIP cells, using a model that correctly classifies CHIP cells 91% of the time. On the DNMT3A dataset, it identifies 3 key features, using a model that correctly classifies CHIP cells 81% of the time. The classifier developed within the pipeline has the potential to assist in the precise identification of CHIP cells and to reveal distinct RNA expression profiles. This research aims to clarify the genetic and functional characteristics of CHIP cells and to support advances in disease prediction, diagnostics, and potential therapeutic interventions.
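The feature-reduction step described above can be illustrated with a minimal sketch of permutation importance. Everything here is an assumption made for illustration rather than the pipeline used in this work: the data are synthetic (a random cells-by-genes matrix in which only the first three columns carry signal), the `NearestCentroid` class is a simple stand-in for the actual classifier, and the names `permutation_importance`, `n_repeats`, and `top` are hypothetical. The idea is to shuffle one feature at a time on held-out data and record how much accuracy drops; features whose shuffling barely changes accuracy are candidates for removal.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    `predict` is any function mapping a feature matrix to labels, so the
    procedure does not depend on the kind of model behind it.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between feature j and the labels.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return baseline, importances


class NearestCentroid:
    """Stand-in classifier (assumption): assign each cell to the closest class mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[dists.argmin(axis=1)]


# Synthetic stand-in for an expression matrix: 600 cells x 50 genes,
# where only genes 0, 1 and 2 differ between the two classes.
rng = np.random.default_rng(42)
n_cells, n_genes = 600, 50
y = rng.integers(0, 2, size=n_cells)
X = rng.normal(size=(n_cells, n_genes))
X[:, :3] += 2.0 * y[:, None]

train, test = slice(0, 400), slice(400, None)
model = NearestCentroid().fit(X[train], y[train])

# Importance is measured on held-out cells, not on the training cells.
baseline, imp = permutation_importance(model.predict, X[test], y[test])

# Keep the three highest-ranked features and refit on them alone.
top = np.argsort(imp)[::-1][:3]
reduced = NearestCentroid().fit(X[train][:, top], y[train])
reduced_acc = np.mean(reduced.predict(X[test][:, top]) == y[test])

print("baseline accuracy:", baseline)
print("top features:", sorted(top.tolist()))
print("accuracy with top features only:", reduced_acc)
```

Because the procedure only calls `predict`, the same function can wrap any trained classifier. In practice a library implementation such as scikit-learn's `sklearn.inspection.permutation_importance` would be a reasonable substitute for the hand-written loop.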