Sparse Network-regularized Nonnegative Matrix Factorization and Applications to Tumor Subtyping
Cancers are complex diseases, and identifying clinically important subtypes has the potential to guide prognosis and treatment. Graph-regularized nonnegative matrix factorization (GNMF) has proven useful for tumor subtype identification based on exome-level mutation data. A recent study showed that using a panel of important genes achieved better classification than using the full set of (exome-level) mutations. We hypothesized that combining sparse coding with GNMF would enable automatic selection of important genes, aiding both tumor subtyping and interpretation of the underlying pathways responsible for the subtypes. To test this hypothesis, we proposed a new formulation that incorporates a lasso-like penalty into GNMF to enable variable selection and sparse representation. We evaluated the proposed method across a range of simulated mutation cohorts and further demonstrated its utility on real mutation data from large-scale sequencing studies.
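To make the idea of adding a lasso-like penalty to GNMF concrete, the following is a minimal sketch and not the formulation of this work. It assumes a genes-by-patients mutation matrix X, a symmetric gene-gene network adjacency matrix A, a graph Laplacian penalty on the gene loading matrix W, an L1 penalty on W, and standard multiplicative updates. The function name `sparse_gnmf`, the parameters `lam` and `beta`, and the choice to penalize W rather than H are all illustrative assumptions.

```python
import numpy as np


def sparse_gnmf(X, A, k, lam=1.0, beta=0.1, n_iter=200, eps=1e-9, seed=0):
    """Illustrative sparse graph-regularized NMF (assumed form, not the paper's).

    Minimizes 0.5*||X - W H||_F^2 + 0.5*lam*tr(W^T L W) + beta*sum(W)
    subject to W >= 0 and H >= 0, where L = D - A is the graph Laplacian.
    X: (genes x patients), A: (genes x genes), W: (genes x k), H: (k x patients).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_patients = X.shape
    W = rng.random((n_genes, k))
    H = rng.random((k, n_patients))
    D = np.diag(A.sum(axis=1))  # degree matrix of the gene network
    L = D - A
    history = []
    for _ in range(n_iter):
        # Plain NMF multiplicative update for the patient coefficients.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update for gene loadings: the network term splits into A (numerator)
        # and D (denominator); the L1 penalty adds the constant beta below.
        numer = X @ H.T + lam * (A @ W)
        denom = W @ (H @ H.T) + lam * (D @ W) + beta + eps
        W *= numer / denom
        obj = (0.5 * np.linalg.norm(X - W @ H) ** 2
               + 0.5 * lam * np.trace(W.T @ L @ W)
               + beta * W.sum())
        history.append(obj)
    return W, H, history


# Toy usage with made-up data: 6 genes on a chain network, 5 patients.
toy_rng = np.random.default_rng(1)
X_toy = (toy_rng.random((6, 5)) < 0.3).astype(float)  # binary mutation calls
A_toy = np.zeros((6, 6))
for i in range(5):
    A_toy[i, i + 1] = A_toy[i + 1, i] = 1.0
W_toy, H_toy, hist_toy = sparse_gnmf(X_toy, A_toy, k=2)
print(W_toy.shape, H_toy.shape)
```

In this sketch, larger values of `beta` shrink more gene loadings toward zero, which is the sense in which a lasso-like penalty can act as gene selection. A practical version would also need to fix the scale of H (for example by normalizing its rows), since otherwise the penalty on W can be reduced simply by rescaling the two factors.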