
Generalization of the Association Analysis Framework

National Science Foundation Award Number: IIS-0916439 (August 1, 2009 - July 31, 2012)



Personnel:

Vipin Kumar, PI
Department of Computer Science and Engineering
4-192, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 625-0726
E-mail: kumar at cs.umn.edu
URL: http://www.cs.umn.edu/~kumar

Michael Steinbach, co-PI
Department of Computer Science and Engineering
5-225C, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 626-7503
E-mail: steinbac at cs.umn.edu
URL: http://www-users.cs.umn.edu/~steinbac/

List of Supported Students:

Graduate Students:

Undergraduate Students:

Collaborators:


Webpage:

http://www-users.cs.umn.edu/~kumar/iis09.html

Project Activities and Findings:

The area of data mining known as association analysis seeks to find patterns that describe the relationships among the binary attributes (variables) used to characterize a set of objects. The iconic example is market basket data, where the objects are transactions, each consisting of the set of items purchased by a customer, and the attributes are binary variables that indicate whether or not an item was purchased as part of a particular transaction. The patterns are either sets of items that are frequently purchased together (frequent itemset patterns) or rules that capture the fact that the purchase of one set of items often implies the purchase of a second set of items (association rule patterns). A key strength of association pattern mining is that the potentially exponential search can often be made tractable by support-based pruning, i.e., eliminating patterns supported by few transactions. Efforts to date have produced a well-developed conceptual (theoretical) foundation and an efficient set of algorithms, and the resulting framework has been extended well beyond its original application to market basket data to encompass new applications.
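As a rough illustration of support-based pruning, the following minimal Python sketch mines frequent itemsets level by level in the style of the classic Apriori approach. It is not code from this project: the function names `frequent_itemsets` and `rule_confidence` and the toy baskets are hypothetical and chosen only for illustration, and support is assumed to be an absolute transaction count.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in >= min_support transactions.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Build size-k candidates from pairs of frequent (k-1)-itemsets.
        # Support-based pruning: a candidate is kept only if every one of
        # its (k-1)-subsets is frequent, so most of the exponential space
        # is never counted against the data.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the surviving candidates and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def rule_confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    Both the antecedent and the combined itemset must be frequent,
    i.e., present as keys in `supports`.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return supports[both] / supports[a]


if __name__ == "__main__":
    # Toy market basket data (illustrative only).
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    supports = frequent_itemsets(baskets, min_support=3)
    for itemset, count in sorted(supports.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    print(rule_confidence(supports, {"diapers"}, {"beer"}))
```

On this toy data with a minimum support of 3, the itemset {bread, diapers, beer} is never counted against the transactions, because its subset {bread, beer} is already known to be infrequent. This is the sense in which support-based pruning keeps the search tractable.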

Despite the solid foundations of association analysis and the potential economic and intellectual benefits of pattern discovery and its various applications, this group of techniques is not widely used as a data analysis tool in most scientific and commercial domains. The reason is that in many areas where such techniques would be very useful, such as those involving continuous and dense data with labels, they cannot currently be applied easily and effectively. Our work on this project aims to extend association analysis so that it is more widely applicable. Our focus has been on biomedical data, although most of our work could be adapted to non-biological data as well.

Publications:

  1. Gang Fang, Majda Haznadar, Wen Wang, Haoyu Yu, Michael Steinbach, Tim Church, William Oetting, Brian Van Ness and Vipin Kumar, High-order SNP Combinations Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional Interactions, PLoS ONE, 7(4): e33531. doi:10.1371/journal.pone.0033531, 2012
  2. Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, Vipin Kumar, Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), vol 24(2), p 279-294, 2012
  3. Sanjoy Dey, Gowtham Atluri, Michael Steinbach, and Vipin Kumar, A pattern mining based integrative framework for biomarker discovery, ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB 2012), October 7-10, Orlando, FL, 2012 (to appear).
  4. Tracy L. Bergemann, Timothy K. Starr, Haoyu Yu, Michael Steinbach, Jesse Erdmann, Yun Chen, Robert T. Cormier, David A. Largaespada, and Kevin A. T. Silverstein, New methods for finding common insertion sites and co-occurring common insertion sites in transposon- and virus-based genetic screens, Nucleic Acids Res. 2012 May; 40(9): 3822–3833
  5. Pandey G., Manocha S., Atluri G., Kumar V., Enhancing the functional content of protein interaction networks, Technical Report 12-001, Computer Science, University of Minnesota
  6. Gang Fang, Wen Wang, Benjamin Oatley, Brian Van Ness, Michael Steinbach and Vipin Kumar, Characterizing Discriminative Patterns, Manuscript, arXiv: 1102.4104, communicated Feb 2011.
  7. Gang Fang, Wen Wang, Vanja Paunic, Benjamin Oatley, Majda Haznadar, Michael Steinbach, Brian Van Ness, Chad L. Myers and Vipin Kumar, Construction and Functional Analysis of Human Genetic Interaction Networks with Genome-wide Association Data.
  8. Gang Fang, Michael Steinbach, Chad L. Myers and Vipin Kumar, Integration of Differential Gene-combination Search and Gene Set Enrichment Analysis: A General Approach.
  9. Michael Steinbach, Haoyu Yu, Gang Fang, Vipin Kumar, Using Constraints to Generate and Explore Higher Order Discriminative Patterns, 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2011), Shenzhen, China, pp. 338-350, May 24-27, 2011.
  10. Michael Steinbach, Haoyu Yu, and Vipin Kumar, Identification of Co-occurring Insertions in Cancer Genomes Using Association Analysis, International Journal of Data Mining and Bioinformatics special issue for 2nd International Workshop on Data Mining for Biomarker Discovery (DMBD 2010), to appear in 2011.
  11. Bonnie Westra, Sanjoy Dey, Gang Fang, Michael Steinbach, Kay Savik, Cristina Oancea and Vipin Kumar, Interpretable Predictive Models for Knowledge Discovery from Home Care Electronic Health Records, Journal of Healthcare Engineering, pp. 55-74, Volume 2, Number 1 / March 2011.
  12. Gowtham Atluri, Jeremy Bellay, Gaurav Pandey, Chad Myers, Vipin Kumar, Discovering Coherent Value Bicliques In Genetic Interaction Data, In Proceedings of 9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10), held in conjunction with 16th ACM Conference on Knowledge Discovery and Data Mining (KDD), Washington, D.C., July 2010.
  13. Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers, and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing (PSB), 15:145-156, 2010.
  14. Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, Vipin Kumar, Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear. Available in vol. 99, no. PrePrints, 2010.
  15. Rohit Gupta, Smita Agrawal, Navneet Rao, Ze Tian, Rui Kuang, Vipin Kumar, Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining, In Proceedings of the International Conference on Bioinformatics and Computational Biology (BICoB), March 2010
  16. Rohit Gupta, Navneet Rao, Vipin Kumar, Discovery of Error-tolerant Biclusters from Noisy Gene Expression Data, In Proceedings of 9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10), held in conjunction with 16th ACM Conference on Knowledge Discovery and Data Mining (KDD), Washington, D.C., July 2010.