Improvements to BCL::Fold de novo Protein Structure Prediction
Proteins play a pivotal role in the functions of a cell. Structural information about proteins facilitates the understanding of their function. Limitations of experimental methods make it difficult to determine the structure of certain proteins experimentally. Structure prediction methods and other approaches that derive information from existing experimental data can aid in elucidating protein structure and guide further experiments. BCL::Align is a multiple sequence alignment tool that uses a dynamic programming algorithm. Its customizable scoring function is a weighted sum of scoring terms that can be tailored to different applications. Optimal weights for sequence alignment and fold recognition were determined by Monte Carlo optimization. In a comparison with other methods, BCL::Align ranked best in alignment accuracy, with a Cline score of 22.90 on the SABmark Twilight Zone reference set. The de novo protein structure prediction method BCL::Fold was tested in the Critical Assessment of Structure Prediction (CASP) experiment. Of 18 proteins, BCL::Fold sampled the correct topology for 11 proteins as measured by a topology score > 0.8, and for 12 proteins as measured by GDT_TS > 33%. Analysis of models at the different steps identified non-native conformations of loops and beta sheets. Filtering models by the new loop angle scoring term enriched for native-like models. Small-angle X-ray scattering (SAXS) experiments provide low-resolution information about a protein’s shape. While insufficient to determine an atomic-resolution protein structure, this information can be used in BCL::Fold to focus sampling on models that agree with the SAXS data. Receiver operating characteristic (ROC) curve analysis yielded an area under the curve (AUC) of 0.9264, showing that topologies can be discriminated even for BCL models with a reduced representation. In a folding benchmark, the use of SAXS data improved the models.
Filtering models by agreement with SAXS data enriches for correct models, and building models with SAXS data yields models that are more native-like.
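
As an illustration of the ROC analysis mentioned above, the following minimal Python sketch computes an AUC from per-model scores using the pairwise (Mann-Whitney) formulation. It is not the BCL implementation: the function name `roc_auc`, the boolean native-like labels, and the convention that a higher score means better agreement with the SAXS data are all assumptions made for this example.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], is_native_like: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Assumption for this sketch: a HIGHER score means better agreement
    with the experimental data. Negate the scores first if the scoring
    term is a deviation where lower is better.
    """
    if len(scores) != len(is_native_like):
        raise ValueError("scores and labels must have the same length")

    # Split the scores by class label.
    pos = [s for s, y in zip(scores, is_native_like) if y]
    neg = [s for s, y in zip(scores, is_native_like) if not y]
    if not pos or not neg:
        raise ValueError("need at least one model of each class")

    # The AUC equals the probability that a randomly chosen native-like
    # model outscores a randomly chosen non-native model (ties count 0.5).
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for four models, two of them native-like.
    print(roc_auc([0.9, 0.2, 0.8, 0.1], [True, True, False, False]))
```

An AUC of 1.0 corresponds to perfect separation of native-like from non-native models, and 0.5 to no discrimination. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable for large model sets.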