Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining

Date of Submission: November 24, 2009
Biomarker discovery for complex diseases is a challenging problem. Most existing approaches identify individual genes as disease markers and thereby miss the interactions among genes. Moreover, often only a single biological data source is used to discover biomarkers. These factors account for the discovery of inconsistent biomarkers. In this paper, we propose a novel error-tolerant pattern mining approach for the integrated analysis of gene expression and protein interaction data. This approach incorporates constraints from the protein interaction network and efficiently discovers all patterns (groups of genes) in a bottom-up fashion from the gene expression data. We call these patterns active sub-network biomarkers. To illustrate the efficacy of the proposed approach, we used four breast cancer gene expression data sets and a human protein interaction network, and showed that active sub-network biomarkers are more biologically plausible and that the genes discovered are more reproducible across studies. Finally, through pathway analysis, we showed a substantial enrichment for known cancer genes and were thus able to generate relevant hypotheses for understanding the molecular mechanisms of breast cancer metastasis.
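
The general idea of network-constrained, error-tolerant pattern mining can be illustrated with a small sketch. This is not the algorithm from the report, whose details are not given in the abstract. It assumes binarized expression data (each sample maps to the set of genes considered active), an undirected interaction network stored as an adjacency dictionary, and a simple tolerance rule under which a sample supports a gene set if at most a fixed fraction of its genes are inactive. The names `tolerant_support`, `mine_subnetworks`, `tolerance`, `min_support`, and `max_size`, as well as the toy data, are hypothetical.

```python
def tolerant_support(genes, expr, tolerance):
    """Return the samples in which at most int(tolerance * |genes|) genes are inactive."""
    allowed_missing = int(tolerance * len(genes))
    return {s for s, active in expr.items() if len(genes - active) <= allowed_missing}


def mine_subnetworks(expr, ppi, min_support, tolerance=0.34, max_size=3):
    """Grow connected gene sets level by level and keep those with enough support.

    Assumption: no support-based pruning is applied, because error-tolerant
    support is not guaranteed to shrink as a gene set grows.
    """
    level = {frozenset([g]) for g in ppi}  # level 1: single genes
    found = {}
    for _ in range(max_size - 1):
        next_level = set()
        for genes in level:
            # Network constraint: extend only with genes that interact with the set.
            neighbours = set().union(*(ppi[g] for g in genes)) - genes
            for n in neighbours:
                next_level.add(genes | {n})
        for genes in next_level:
            support = tolerant_support(genes, expr, tolerance)
            if len(support) >= min_support:
                found[genes] = support
        level = next_level
    return found


# Toy data (hypothetical): four samples, four genes, a path network A - B - C.
expr = {
    "s1": {"A", "B", "C"},
    "s2": {"A", "B"},
    "s3": {"A", "C"},
    "s4": {"D"},
}
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": set()}

patterns = mine_subnetworks(expr, ppi, min_support=3)
for genes, samples in patterns.items():
    print(sorted(genes), sorted(samples))
```

In this toy example no connected pair of genes is active in three samples, but the connected triple A, B, C is supported by s1, s2, and s3 once one inactive gene per sample is tolerated. A practical implementation would need a pruning strategy to remain efficient on genome-scale networks; the exhaustive growth here is only for illustration.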