Strategies for Improving Multiomic Metabolite Identifications Using Compound Libraries, Machine Learning, and Structural Mass Spectrometry
Picache, Jaqueline-Mae Arenas
One major challenge in biomedical research is identifying unknown analytes in untargeted experiments. This work addresses that challenge using ion mobility-mass spectrometry (IM-MS). Specifically, IM-MS, in conjunction with the other technologies described below, can improve the confidence of small molecule identifications compared with traditional workflows. IM-MS experiments provide an additional separation dimension as well as the derivation of the collision cross section (CCS). The CCS is a characteristic molecular property that can be used to putatively identify molecules. Initial studies in this work focused on building compound libraries of CCS values to enable exact matching and the discrimination of isomeric compounds. The first major effort toward this goal was the Mass Spectrometry Metabolite Library of Standards, which contains primary metabolites. The next major effort was the development of a larger, crowd-sourced repository of CCS values called the Unified CCS Compendium. Developing this compendium included standardizing the calculation and reporting of CCS values, a practice that has since been adopted internationally. Curating this high-precision, quality-assured repository of experimental values enabled the development of informatics tools that, in turn, improve the confidence of small molecule identifications. The first of these tools is a regression-based filtering system that uses confidence and prediction intervals to assign chemical classifications to analytes that have tentative identifications from a traditional workflow. A proof-of-concept experiment using human serum demonstrated this process. The second tool is a machine learning-based algorithm called the Supervised Inference of Feature Taxonomy from Ensemble Randomization (SIFTER).
SIFTER assigns chemical classifications to unknown molecules that otherwise receive no assignment from traditional workflows. The remainder of this work discusses how crowd-sourced databases can propel a field forward. One example is the Unified CCS Compendium, which increased community dialogue about CCS standardization protocols and led to the development of open-source tools that anyone can use to improve multiomic metabolite identifications in untargeted studies.
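The regression-based filtering idea can be illustrated with a minimal sketch. It assumes a straight-line fit of CCS against m/z for a single chemical class and a two-sided 95% prediction interval; the fitting functions, confidence levels, and reference data actually used in this work may differ. The helper names (fit_class, intervals, consistent_with_class) are hypothetical, and the reference values in the example are illustrative placeholders, not measurements. A tentative class assignment is kept only when the analyte's observed CCS falls inside that class's prediction interval.

```python
import math
from dataclasses import dataclass


@dataclass
class LinearFit:
    """Ordinary least-squares fit of CCS (y) against m/z (x) for one chemical class."""
    slope: float
    intercept: float
    n: int
    x_mean: float
    sxx: float  # sum of squared deviations of x from its mean
    s: float    # residual standard error, sqrt(SSE / (n - 2))


def fit_class(mz, ccs):
    """Fit CCS = intercept + slope * m/z (assumed linear model, for illustration only)."""
    n = len(mz)
    if n < 3:
        raise ValueError("need at least 3 points for an interval estimate")
    x_mean = sum(mz) / n
    y_mean = sum(ccs) / n
    sxx = sum((x - x_mean) ** 2 for x in mz)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(mz, ccs))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(mz, ccs))
    return LinearFit(slope, intercept, n, x_mean, sxx, math.sqrt(sse / (n - 2)))


def intervals(fit, x0, t_crit):
    """Return (confidence_interval, prediction_interval) for the CCS at m/z x0.

    t_crit is the two-sided Student's t quantile for n - 2 degrees of freedom,
    passed in by the caller because the standard library has no t distribution.
    """
    y0 = fit.intercept + fit.slope * x0
    leverage = 1 / fit.n + (x0 - fit.x_mean) ** 2 / fit.sxx
    ci_half = t_crit * fit.s * math.sqrt(leverage)      # uncertainty of the mean trend
    pi_half = t_crit * fit.s * math.sqrt(1 + leverage)  # adds scatter of a single new value
    return (y0 - ci_half, y0 + ci_half), (y0 - pi_half, y0 + pi_half)


def consistent_with_class(fit, mz0, ccs0, t_crit):
    """Keep a tentative class assignment only if the observed CCS lies in the prediction interval."""
    _, (lo, hi) = intervals(fit, mz0, t_crit)
    return lo <= ccs0 <= hi


if __name__ == "__main__":
    # Hypothetical reference points for one chemical class (m/z, CCS in square angstroms).
    fit = fit_class([100, 200, 300, 400, 500], [120, 141, 159, 181, 199])
    t_95_df3 = 3.182  # two-sided 95% t quantile for 5 - 2 = 3 degrees of freedom
    print(intervals(fit, 350, t_95_df3))
    print(consistent_with_class(fit, 350, 170.0, t_95_df3))  # True
    print(consistent_with_class(fit, 350, 185.0, t_95_df3))  # False
```

The prediction interval, rather than the confidence interval, is used as the filter because it describes where a single new measurement is expected to fall, whereas the confidence interval only bounds the fitted mean trend and is therefore always narrower.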