An automatic machine learning framework for the analysis of microbiome data and robust pipeline identification and evaluation
An increasing body of literature suggests that the human microbiome can be used to identify and predict the development of many diseases. Machine learning algorithms are often used for this disease-classification task; however, current approaches often produce “black boxes,” making it difficult to know how each component of a machine learning pipeline contributes to performance. Without this information, it is difficult to diagnose issues or know how to make improvements. Algorithms that pinpoint which components of the machine learning pipeline affect performance are needed. This dissertation focuses on the development of an automated machine learning framework specialized for 16S metagenomic data that lets users identify efficient machine learning pipelines and interpret which components of the pipeline are most relevant. The framework is available as a Python package and was designed to be modular, so that new machine learning algorithms can easily be added. The tool is applied to both 16S colorectal cancer data and 16S data exploring the role of intra-epithelial lymphocytes and milk-derived osteopontin in the gut microbiome.