Generalization of the Association Analysis Framework
National Science Foundation Award Number: IIS-0916439 (August 1, 2009 - July 31, 2012)
Personnel:
Vipin Kumar, PI
Department of Computer Science and Engineering
4-192, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 625-0726
E-mail: kumar at cs.umn.edu URL: http://www.cs.umn.edu/~kumar
Michael Steinbach, co-PI
Department of Computer Science and Engineering
5-225C, EE/CSci Building
University of Minnesota
Minneapolis, MN 55455
Phone (612) 626-7503
E-mail: steinbac at cs.umn.edu URL: http://www-users.cs.umn.edu/~steinbac/
List of Supported Students:
Graduate Students:
- Gang Fang
- Gowtham Atluri
- Rohit Gupta
- Sanjoy Dey
- Vanja Paunic
- Wen Wang
Undergraduate Students:
- Benjamin Oatley
- Xiaoye Liu
- Garima Sharma
- Matt Challou
- Jeremy Weed
- Joanna Fakhoury
- Karl Holub
Collaborators:
- Brian Van Ness, Professor, Department of Genetics, Cell Biology, and Development, University of Minnesota.
- Kelvin Lim, Professor, Department of Psychiatry, University of Minnesota.
- Angus MacDonald, Associate Professor of Psychology, University of Minnesota.
Webpage:
http://www-users.cs.umn.edu/~kumar/iis09.html
Project Activities and Findings:
The area of data mining known as association analysis seeks to find patterns that describe the
relationships among the binary attributes (variables) used to characterize a set of objects. The
iconic example is market basket data, where the objects are transactions consisting of sets of
items purchased by a customer, and the attributes are binary variables that indicate whether or not
an item was purchased by a particular customer. The patterns are either sets of items that are
frequently purchased together (frequent itemset patterns) or rules that capture the fact that the
purchase of one set of items often implies the purchase of a second set of items (association rule
patterns). A key strength of association pattern mining is that the potentially exponential
search can often be made tractable by support-based pruning, i.e., by eliminating patterns
supported by too few transactions. Efforts to date have produced a well-developed conceptual
(theoretical) foundation and an efficient set of algorithms. The resulting framework has been
extended well beyond the original market basket application to encompass new applications.
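As a rough illustration of the support-based pruning described above, the following is a minimal sketch of a level-wise (Apriori-style) frequent itemset search, followed by a confidence calculation for one association rule. It is not the project's own software: the function names frequent_itemsets and rule_confidence, the toy baskets, and the threshold of 3 transactions are all assumptions chosen for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions, using a level-wise search (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        # Support-based pruning: keep a candidate only if every one of its
        # (k-1)-subsets is frequent, since a superset can never have higher
        # support than any of its subsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count support only for the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


def rule_confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, i.e.
    support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    return supports[a | frozenset(consequent)] / supports[a]


# Toy market basket data (assumed for illustration only).
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

supports = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(supports.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
print("confidence(diapers -> beer) =", rule_confidence(supports, {"diapers"}, {"beer"}))
```

In this toy example, candidates such as {bread, diapers, beer} are discarded without being counted because one of their subsets ({bread, beer}) is already infrequent, which is the pruning that keeps the search tractable.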
Despite the solid foundations of association analysis and the potential economic and
intellectual benefits of pattern discovery and its various applications, this group of techniques is
not widely used as a data analysis tool in most scientific and commercial domains. In many
areas, such as those involving continuous and dense data with labels, such techniques would be
very useful but cannot currently be applied easily and effectively. Our work on this project
aims to extend association analysis so that it is more widely applicable. Our focus has been on
biomedical data, although most of our work could be adapted to non-biological data as well.
Publications:
- Gang Fang, Majda Haznadar, Wen Wang, Haoyu Yu, Michael Steinbach, Tim Church,
William Oetting, Brian Van Ness and Vipin Kumar, High-order SNP Combinations
Associated with Complex Diseases: Efficient Discovery, Statistical Power and Functional
Interactions, PLoS ONE, 7(4): e33531. doi:10.1371/journal.pone.0033531, 2012
- Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, Vipin Kumar,
Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data,
IEEE Transactions on Knowledge and Data Engineering (TKDE), vol 24(2), p 279-294,
2012
- Sanjoy Dey, Gowtham Atluri, Michael Steinbach, and Vipin Kumar, A pattern mining
based integrative framework for biomarker discovery, ACM Conference on Bioinformatics,
Computational Biology and Biomedicine (ACM-BCB 2012), October 7-10, Orlando, FL,
2012 (to appear).
- Tracy L. Bergemann, Timothy K. Starr, Haoyu Yu, Michael Steinbach, Jesse Erdmann,
Yun Chen, Robert T. Cormier, David A. Largaespada, and Kevin A. T. Silverstein, New
methods for finding common insertion sites and co-occurring common insertion sites in
transposon- and virus-based genetic screens, Nucleic Acids Res. 2012 May; 40(9): 3822–
3833
- Pandey G., Manocha S., Atluri G., Kumar V., Enhancing the functional content of
protein interaction networks, Technical Report 12-001, Computer Science, University of
Minnesota
- Gang Fang, Wen Wang, Benjamin Oatley, Brian Van Ness, Michael Steinbach and Vipin Kumar, Characterizing Discriminative Patterns, Manuscript, arXiv: 1102.4104, communicated Feb 2011.
- Gang Fang, Wen Wang, Vanja Paunic, Benjamin Oatley, Majda Haznadar, Michael Steinbach, Brian Van Ness, Chad L. Myers and Vipin Kumar, Construction and Functional Analysis of Human Genetic Interaction Networks with Genome-wide Association Data.
- Gang Fang, Michael Steinbach, Chad L. Myers and Vipin Kumar, Integration of Differential Gene-combination Search and Gene Set Enrichment Analysis: A General Approach.
- Michael Steinbach, Haoyu Yu, Gang Fang, Vipin Kumar, Using Constraints to Generate and Explore Higher Order Discriminative Patterns, 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2011), Shenzhen, China, pp. 338-350, May 24-27.
- Michael Steinbach, Haoyu Yu, and Vipin Kumar, Identification of Co-occurring Insertions in Cancer Genomes Using Association Analysis, International Journal of Data Mining and Bioinformatics special issue for 2nd International Workshop on Data Mining for Biomarker Discovery (DMBD 2010), to appear in 2011.
- Bonnie Westra, Sanjoy Dey, Gang Fang, Michael Steinbach, Kay Savik, Cristina Oancea and Vipin Kumar, Interpretable Predictive Models for Knowledge Discovery from Home Care Electronic Health Records, Journal of Healthcare Engineering, pp. 55-74, Volume 2, Number 1 / March 2011.
- Gowtham Atluri, Jeremy Bellay, Gaurav Pandey, Chad Myers, Vipin Kumar, Discovering Coherent Value Bicliques In Genetic Interaction Data, In Proceedings of 9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10), held in conjunction with 16th ACM Conference on Knowledge Discovery and Data Mining (KDD), Washington, D.C., July 2010.
- Gang Fang, Rui Kuang, Gaurav Pandey, Michael Steinbach, Chad L. Myers, and Vipin Kumar, Subspace Differential Coexpression Analysis: Problem Definition and a General Approach, Proceedings of the 15th Pacific Symposium on Biocomputing (PSB), 15:145-156, 2010.
- Gang Fang, Gaurav Pandey, Wen Wang, Manish Gupta, Michael Steinbach, Vipin Kumar, Mining Low-Support Discriminative Patterns from Dense and High-Dimensional Data, IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear. Available in vol. 99, no. PrePrints, 2010.
- Rohit Gupta, Smita Agrawal, Navneet Rao, Ze Tian, Rui Kuang, Vipin Kumar, Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining, In Proceedings of the International Conference on Bioinformatics and Computational Biology (BICoB), March 2010
- Rohit Gupta, Navneet Rao, Vipin Kumar, Discovery of Error-tolerant Biclusters from Noisy Gene Expression Data, In Proceedings of 9th International Workshop on Data Mining in Bioinformatics (BIOKDD'10), held in conjunction with 16th ACM Conference on Knowledge Discovery and Data Mining (KDD), Washington, D.C., July 2010.