Comprehensive Analysis of the Spatial Distribution of Missense Variants in Protein Structures Reveals Patterns Predictive of Pathogenicity
Sivley, Robert Michael
The spatial distribution of genetic variation within proteins is shaped by evolutionary constraint and can thus provide insights into the functional importance of protein regions and the potential pathogenicity of protein alterations. To facilitate the spatial analysis of coding variation in protein structure, we develop PDBMap, an automated pipeline for mapping genetic variants into all solved and predicted protein structures. We then comprehensively evaluate the 3D spatial patterns of constraint on human germline and somatic variation in 4,568 solved protein structures. Different classes of coding variants have significantly different spatial distributions. Neutral missense variants exhibit a range of 3D constraint patterns, with a general trend of spatial dispersion driven by constraint on core residues. In contrast, disease-associated germline variants and somatic variants are significantly more likely to be clustered in protein structure space. Finally, we demonstrate that this difference in the spatial distributions of disease-associated and benign germline variants provides a signature for accurately classifying variants of unknown significance (VUS) that is complementary to current approaches for VUS classification.