A Comparison of State-of-the-Art Algorithms for Learning Bayesian Network Structure from Continuous Data
Fu, Lawrence Dachen
Researchers in biomedical and biological domains typically work with continuous data sets. In these domains, Bayesian network structure learning is an increasingly popular tool for understanding the relationships among variables. There are three general approaches to learning Bayesian network structure from continuous data. The most popular is to discretize the data prior to structure learning. The alternatives are to integrate discretization with structure learning or to learn directly from the continuous data. It is not known which approach is best, because there has been no unified study of the three. The purpose of this work was to perform an extensive experimental evaluation of them. For large data sets consisting of originally discrete variables, discretization-based approaches learned the most accurate structures. With smaller sample sizes, or with data lacking an underlying discrete mechanism, a method that learns directly from continuous data performed best. In addition, for some quality metrics, the integrated methods provided no improvement over simple discretization methods. In terms of time efficiency, the integrated approaches were the most computationally intensive, while methods from the other two categories were the least intensive.
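As a rough illustration of the discretize-first approach mentioned above, the following Python sketch bins a continuous variable before any structure learning takes place. The abstract does not state which discretization schemes were evaluated; equal-width and equal-frequency binning are assumed here only because they are common simple choices, and all function names are hypothetical.

```python
from bisect import bisect_right


def equal_width_cutpoints(values, n_bins):
    """Cut points that split the observed range into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_cutpoints(values, n_bins):
    """Cut points chosen so that each bin holds roughly the same number of samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(values, cutpoints):
    """Map each continuous value to the index of the bin it falls into."""
    return [bisect_right(cutpoints, v) for v in values]


# Hypothetical sample with one outlier, to show how the two schemes differ.
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_bins = discretize(values, equal_width_cutpoints(values, 3))
freq_bins = discretize(values, equal_frequency_cutpoints(values, 3))
print(width_bins)
print(freq_bins)
```

With an outlier present, equal-width binning places almost every sample in the first bin, while equal-frequency binning spreads the samples across bins. Differences of this kind are one reason the choice of discretization can affect the structure that is subsequently learned.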