Confronting Complexity: A Comprehensive Statistical and Computational Strategy for Identifying the Missing Link between Genotype and Phenotype
Thornton-Wells, Tricia Ann
Common diseases with a genetic basis are likely to have a very complex etiology, in which the mapping between genotype and phenotype is far from straightforward. A new comprehensive statistical and computational strategy for identifying the missing link between genotype and phenotype is proposed, one that emphasizes the need to address heterogeneity in the first stage of any analysis. A simulation study comparing three ‘unsupervised’ clustering methods was conducted, and the best-performing method, Bayesian Classification, was evaluated further for its performance and applicability to real data under a wide range of simulation conditions. The proposed two-stage analysis strategy was then applied to late-onset Alzheimer disease (LOAD) data. Bayesian Classification found statistically significant clusterings in independent family-based and case-control datasets, and in both clusterings the same five LRRTM3 markers were the most influential in determining cluster assignment. In subsequent analyses to detect main effects and gene-gene interactions, markers in four genes (PLAU, IDE, CDC2 and ACE) were found to be associated with LOAD in particular subsets of the data defined by LRRTM3 haplotype. While each of these genes is a viable candidate for LOAD based on its known biological function, further studies are needed to replicate these statistical findings and to elucidate possible biological mechanisms of interaction between LRRTM3 and these genes. Going forward, genetic studies will increasingly devote time and resources to collecting phenotypic data that can refine the definitions of traits or diseases or define their subcategories, and that can serve as endophenotypes, which are more likely to have simple etiologies and to map directly to specific genetic markers. In the case of neurological diseases, one collection of phenotyping technologies that has matured considerably over the past five to ten years is neuroimaging.
In addition, an emphasis on possible biological mechanisms of disease has positively influenced the design of behavioral assessment tools, increasing their utility for phenotyping, since they provide endophenotypes that can be mapped to genotypic data. Methodologies enabling the integration of disparate data sources (genotypic data together with neuroimaging or behavioral data) must be investigated in order to harness the power inherent in their complexity.
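The two-stage idea described above (first partition subjects into more homogeneous subsets using genotype data, then test for association within each subset) can be illustrated with a minimal Python sketch. It is not the method used in this work: Bayesian Classification is replaced here by a plain latent class model (a mixture of independent per-marker genotype distributions) fitted by expectation-maximization with a fixed number of classes, and the second stage is reduced to a single-marker allelic chi-square test rather than the main-effect and gene-gene interaction analyses. The function names `fit_latent_classes`, `allelic_chi_square` and `two_stage`, the 0/1/2 genotype coding, and the use of NumPy are all assumptions for illustration.

```python
import math

import numpy as np


def fit_latent_classes(G, k, n_iter=200, seed=0):
    """Hypothetical stage 1: fit a latent class model by EM, return hard labels.

    G is an (n_subjects, n_markers) integer array of genotypes coded 0/1/2
    (copies of the minor allele). Within each class, markers are treated as
    independent categorical variables. This is an assumed stand-in for
    Bayesian Classification, not a reimplementation of it.
    """
    rng = np.random.default_rng(seed)
    # One-hot genotype indicators, shape (n, m, 3).
    onehot = np.stack([G == g for g in range(3)], axis=-1).astype(float)
    # Random soft initial assignments break the symmetry between classes.
    resp = rng.dirichlet(np.ones(k), size=G.shape[0])
    for _ in range(n_iter):
        # M-step: class weights and per-class genotype frequencies,
        # with a pseudocount of 1 so no frequency is exactly zero.
        weights = resp.mean(axis=0)
        counts = np.einsum("nk,nmg->kmg", resp, onehot) + 1.0
        theta = counts / counts.sum(axis=-1, keepdims=True)
        # E-step: log posterior class membership for each subject.
        log_post = np.einsum("nmg,kmg->nk", onehot, np.log(theta)) + np.log(weights)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 df) for a 2x2 table of (minor, major) allele counts."""
    a, b = case_counts
    c, d = control_counts
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # Degenerate table (an empty row or column): report no evidence.
        return 0.0, 1.0
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def two_stage(cluster_markers, test_genotypes, status, k):
    """Hypothetical two-stage analysis: cluster, then test within each cluster.

    status is 1 for cases and 0 for controls.
    """
    clusters = fit_latent_classes(cluster_markers, k)
    results = {}
    for c in range(k):
        in_c = clusters == c
        cases = test_genotypes[in_c & (status == 1)]
        controls = test_genotypes[in_c & (status == 0)]
        # Each subject carries two alleles; the genotype code counts minor alleles.
        case_counts = (int(cases.sum()), 2 * len(cases) - int(cases.sum()))
        control_counts = (int(controls.sum()), 2 * len(controls) - int(controls.sum()))
        results[c] = allelic_chi_square(case_counts, control_counts)
    return clusters, results
```

In this sketch, `cluster_markers` plays the role of the markers that drive cluster assignment and `test_genotypes` that of a candidate marker from another gene, so an association present in only one subset can surface even when it would be diluted in the pooled sample. Fixing `k` keeps the example short; a fuller treatment would choose the number of classes from the data rather than in advance.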