UNDERSTANDING THE EVOLUTION AND SEQUENCE ARCHITECTURE OF GENE REGULATORY ELEMENTS THROUGH MACHINE LEARNING
Enhancers are genomic regions distal to promoters that bind transcription factors (TFs) to regulate the dynamic spatiotemporal patterns of gene expression required for proper differentiation and development of multicellular organisms. It is critical to understand the mechanisms underlying enhancer evolution and function, as alterations in enhancer activity influence both speciation and disease. Recent genome-wide profiling of histone modifications associated with enhancer activity revealed that the regulatory landscape changes dramatically between species: regions with enhancer activity are extremely variable across closely related mammals. In this dissertation, I present my work investigating the evolution of enhancers and dissecting their sequence architectures. In Chapter I, I outline the background and motivation for studying these questions. In Chapter II, I show that enhancers conserved across mammalian species are more pleiotropic than species-specific enhancers, suggesting that evolutionary constraint underpins the loss and gain of enhancers. In Chapter III, I use a machine learning-based, cross-species prediction framework to demonstrate that enhancer sequence properties are conserved despite the rapid turnover in the location of active enhancers across mammalian species. In Chapter IV, I investigate the power of deep neural networks, a state-of-the-art approach to enhancer prediction, at modeling enhancer architecture. Finally, in Chapter V, I summarize the conclusions of the preceding chapters and discuss future work that could answer questions raised by the findings in this dissertation.