Constraint on Rare Protein-Coding Variation: Pathogenicity Prediction and Phenotypic Discovery
Sivley, Robert Michael
Patterns of genetic variation along the human genome provide insight into the functional and evolutionary constraints acting on different loci. Quantifying these patterns of constraint improves our ability to identify functional regions and interpret the phenotypic effects of genetic mutations. With exome-sequencing data now available from tens of thousands of individuals, constraint can be quantified on a large scale. In this work, we explore three avenues by which constraint on rare protein-coding variation can be used to better understand human biology and elucidate the genetic drivers of disease. First, we present a novel algorithm that classifies variants of unknown significance (VUS) using patterns of spatial constraint on disease-causing variation in protein structure, and we demonstrate its utility by classifying VUS in RTEL1, a helicase, from patients with familial interstitial pneumonia. Second, we quantify spatial constraint on somatic mutations in 3D protein structures and identify patterns indicative of driver mutations in several proteins. Third, we perform phenome-wide association studies (PheWAS) to interrogate the phenotypic impact of rare protein-coding variants in genes intolerant to loss-of-function mutations. Together, these studies advance our understanding of how evolutionary constraint on protein-coding genetic variants relates to their contribution to human disease. In particular, they yield approaches to variant pathogenicity prediction, the detection of putative driver mutations in cancer, and the identification of novel phenotype associations for highly constrained genes.