Structure prediction and variant interpretation of membrane proteins aided by machine learning algorithms
Helical membrane proteins (HMPs) play essential roles in various biological processes. Despite their prevalence in the genome, only a very small portion (~2%) of the structures in the Protein Data Bank are HMPs, partly because of the experimental difficulties in determining the structures of HMPs and their complexes. Accurate computational methods for predicting the structure and interpreting variants of HMPs are therefore particularly desirable. We developed a method, using state-of-the-art machine learning techniques, that accurately predicts residue weighted contact numbers (WCNs) from amino acid sequences. We demonstrated that the WCNs predicted by this method are not only effective restraints for improving the fraction of native contacts in tertiary structure prediction of HMPs but can also be used to derive a powerful score for selecting native-like docking candidates of HMP complexes. We also developed a machine learning-based, protein-specific method capable of accurately predicting the functional consequences of variants of the KCNQ1 potassium channel. The success of our methods suggests that using structural properties predicted by machine learning algorithms as restraints can be an effective approach to improving sampling and scoring in membrane protein structure prediction. It also suggests a promising pipeline in which a machine learning model is tailored to a specific protein target and trained with a functionally validated data set to calibrate informatics tools.
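To make the quantity being predicted concrete, the following is a minimal sketch of how a residue's WCN might be computed from a known structure. The abstract does not state the exact weighting function used in this work, so the sketch assumes one definition that is common in the literature: the sum of inverse squared distances from a residue's representative atom to those of all other residues. The function name `weighted_contact_numbers` and the toy coordinates are hypothetical and serve only as illustration.

```python
import numpy as np

def weighted_contact_numbers(coords):
    """Return one WCN per residue from an (N, 3) array of coordinates.

    Assumption: WCN_i = sum over j != i of 1 / r_ij**2, where r_ij is the
    distance between the representative atoms (for example C-alpha) of
    residues i and j. The method summarized above may use a different
    weighting function.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise displacement vectors via broadcasting: shape (N, N, 3).
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Exclude self-pairs: an infinite squared distance contributes 0.
    np.fill_diagonal(sq_dist, np.inf)
    return np.sum(1.0 / sq_dist, axis=1)

# Toy example (hypothetical data): three points spaced 1 unit apart on a line.
toy = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [2.0, 0.0, 0.0]])
wcn = weighted_contact_numbers(toy)
```

In this toy case the middle point receives a WCN of 2.0 and each end point 1.25, which reflects the intuition that buried residues with many close neighbors have higher WCNs than exposed ones.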