Genestation: The Genomic Search Engine Toolkit
Recent advances in sequencing technology have greatly reduced the cost and increased the availability of genomic data. With this rich base of information, it has become feasible to study complex diseases such as preterm birth, which involve intricate regulatory networks and bear the traces of our recent evolutionary past. To facilitate the study of preterm birth, I created the GEneSTATION pregnancy research database. Developing this database necessitated the creation of the Genestation Search Engine Toolkit, a novel search solution able to manage over 77 million records. I designed an extensible ontological model for storing biological data in Elasticsearch that runs high-volume queries more than 1000 times faster than pre-existing genome database structures, such as the Generic Model Organism Database Project’s Chado schema. Using this new technology, I created dynamic web-based statistical analysis and visualization tools that enable search and exploration of genomic data in real time. These technologies will help researchers understand their questions in the context of the whole genome and develop a greater intuition for the nature of genomic data.