Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation
Brown, Laura Elizabeth
The focus of my research was to develop several novel computational techniques for discovering informative patterns and complex relationships in biomedical data. First, an efficient, heuristic method was developed to search for the features with the largest absolute weights in a polynomial Support Vector Machine (SVM) model. This algorithm provides a new ability to understand, conceptualize, visualize, and communicate polynomial SVM models. Second, a new variable selection algorithm, called Feature Space Markov Blanket (FSMB), was designed. FSMB combines the advantages of kernel methods and Markov Blanket-based techniques for variable selection. FSMB was evaluated on several simulated, "difficult" distributions, where it identified the Markov Blankets with high sensitivity and specificity. Additionally, it was run on several real-world data sets; the resulting classification models are parsimonious (for two data sets, the models consisted of only 2-3 features). On another data set, a Markov Blanket-based method performed poorly whereas FSMB performed better; this improvement suggests the existence of a complex, multivariate relationship in the underlying domain. Third, a well-cited algorithm for learning Bayesian networks (Max-Min Hill-Climbing, MMHC) was extended to locally learn a region of a Bayesian network. This local method was compared to MMHC in an empirical evaluation. As expected, the local method learned regions in a fraction of the time MMHC required; of particular interest, the regions it learned were of equal or better quality. Finally, an approach using the formalism of causal Bayesian networks was designed to make predictions under manipulations; this approach was used in a submission to the Causality Challenge. The approach combined the three methods from this research with many state-of-the-art techniques to build and evaluate models. 
The results of the competition (the submission performed best on one of the four tasks presented) illustrate some of the strengths and weaknesses of causal discovery methods and point to new directions in the field. The methods explored are first steps along research paths toward understanding SVM models, selecting variables in non-faithful problems, identifying causal relations in large domains, and learning with manipulations.
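
To make the first contribution concrete, the sketch below shows what a "feature weight" in a polynomial SVM can mean. It is not the heuristic search developed in this work: it is a brute-force enumeration for a degree-2 kernel K(x, z) = (x . z + coef0)^2, which is only practical for a handful of input variables. The function names, the choice of kernel form, and the convention of ranking by the coefficient of each monomial in the expanded decision function are all assumptions made for illustration; a differently normalized feature map would rescale the cross terms.

```python
from itertools import combinations


def poly2_monomial_weights(support_vectors, dual_coefs, coef0=1.0):
    """Expand f(z) = sum_i a_i * (x_i . z + coef0)**2 into monomial coefficients.

    dual_coefs holds a_i = alpha_i * y_i for each support vector x_i
    (hypothetical inputs; the bias term is ignored here).
    Keys are index tuples: (j,) for z_j, (j, j) for z_j**2, (j, k) for z_j*z_k.
    """
    n_features = len(support_vectors[0])
    pairs = list(zip(dual_coefs, support_vectors))
    weights = {}
    for j in range(n_features):
        # Linear monomial z_j has coefficient 2 * coef0 * sum_i a_i * x_ij.
        weights[(j,)] = 2.0 * coef0 * sum(a * x[j] for a, x in pairs)
        # Squared monomial z_j**2 has coefficient sum_i a_i * x_ij**2.
        weights[(j, j)] = sum(a * x[j] ** 2 for a, x in pairs)
    for j, k in combinations(range(n_features), 2):
        # Cross monomial z_j * z_k has coefficient 2 * sum_i a_i * x_ij * x_ik.
        weights[(j, k)] = 2.0 * sum(a * x[j] * x[k] for a, x in pairs)
    return weights


def top_features(weights, n=3):
    """Return the n monomials with the largest absolute coefficient."""
    return sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


def kernel_value(support_vectors, dual_coefs, z, coef0=1.0):
    """Decision value (without bias) computed directly from the kernel."""
    return sum(
        a * (sum(xi * zi for xi, zi in zip(x, z)) + coef0) ** 2
        for a, x in zip(dual_coefs, support_vectors)
    )


def expanded_value(weights, dual_coefs, z, coef0=1.0):
    """Same decision value computed from the explicit monomial coefficients."""
    total = coef0 ** 2 * sum(dual_coefs)  # constant term of the expansion
    for index, w in weights.items():
        term = w
        for j in index:
            term *= z[j]
        total += term
    return total
```

Because the two evaluation routes must agree on any input, comparing `kernel_value` with `expanded_value` is a quick sanity check on the expansion. The number of monomials grows rapidly with the kernel degree and the number of variables, which is why an efficient search, rather than full enumeration, is needed in realistic settings.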