Basophile: accurate fragment charge state prediction improves peptide identification rates
In shotgun proteomics, database search algorithms rely on fragmentation models to predict the fragment ions that should be observed for a given peptide sequence. The most widely used strategy, the Naïve model, is oversimplified: it breaks all peptide bonds with equal probability and produces fragments of every charge below that of the precursor ion. More accurate models are too computationally intensive for on-the-fly use in database search algorithms. We created an ordinal-regression-based model called Basophile that reflects the relative importance of basic residues and fragment length in charge retention during CID/HCD fragmentation of charged peptides. The model improves prediction accuracy by reducing the number of unnecessary fragments that are routinely predicted for highly charged precursors. Compared with the Naïve model and Protein Prospector’s prediction model, Basophile yielded on average 26% and 28% more identifications, respectively, for triply charged precursors on ion trap data. Basophile achieves simplicity and speed by reducing the prediction problem to a single ordinal regression equation, which can be easily incorporated into any database search software for shotgun proteomic identification.
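The contrast between the Naïve model and an ordinal-regression predictor can be illustrated with a minimal Python sketch. It is not the published Basophile implementation. The feature definitions, the weights `w_basic` and `w_length`, the thresholds, the probability cutoff `min_prob`, and the choice of R, K and H as basic residues are all hypothetical values chosen for illustration. The sketch assumes a proportional-odds model in which the ordered outcome is the number of charges retained by the N-terminal (b) fragment.

```python
import math

BASIC_RESIDUES = set("RKH")  # assumption: Arg, Lys and His count as basic


def naive_fragments(peptide, precursor_charge):
    """Naive model: break every bond, emit b/y ions at every charge below the precursor's."""
    max_charge = max(1, precursor_charge - 1)
    fragments = []
    for cut in range(1, len(peptide)):
        for charge in range(1, max_charge + 1):
            fragments.append(("b", cut, charge))
            fragments.append(("y", len(peptide) - cut, charge))
    return fragments


def ordinal_probs(score, thresholds):
    """Proportional-odds model: P(Y <= k) = sigmoid(threshold_k - score)."""
    cumulative = [1.0 / (1.0 + math.exp(score - t)) for t in thresholds]
    cumulative.append(1.0)  # the highest category closes the distribution
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def filtered_fragments(peptide, precursor_charge, w_basic, w_length,
                       thresholds, min_prob=0.35):
    """Keep only fragment charge states the ordinal model considers likely.

    Category k means the b fragment keeps k charges and the y fragment keeps
    the remaining precursor_charge - k (an illustrative assumption).
    """
    if len(thresholds) != precursor_charge:
        raise ValueError("need one threshold per precursor charge")
    n = len(peptide)
    fragments = []
    for cut in range(1, n):
        b_seq, y_seq = peptide[:cut], peptide[cut:]
        # Hypothetical features: basic-residue imbalance and length imbalance.
        basic_diff = (sum(r in BASIC_RESIDUES for r in b_seq)
                      - sum(r in BASIC_RESIDUES for r in y_seq))
        length_diff = (len(b_seq) - len(y_seq)) / n
        score = w_basic * basic_diff + w_length * length_diff
        for b_charge, prob in enumerate(ordinal_probs(score, thresholds)):
            if prob < min_prob:
                continue
            y_charge = precursor_charge - b_charge
            if b_charge >= 1:
                fragments.append(("b", cut, b_charge))
            if y_charge >= 1:
                fragments.append(("y", n - cut, y_charge))
    return fragments


if __name__ == "__main__":
    peptide = "ACDKEFGHIR"  # made-up sequence
    naive = naive_fragments(peptide, 3)
    kept = filtered_fragments(peptide, 3, w_basic=1.5, w_length=1.0,
                              thresholds=[-2.0, 0.0, 2.0])
    print(len(naive), len(kept))
```

For a 10-residue triply charged peptide, the naive enumeration always yields 36 candidate fragments (9 bonds, 2 ion types, 2 charge states). The filtered list depends entirely on the illustrative parameters; in a real setting the weights and thresholds would be fitted to annotated spectra.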