Characterizing Pattern based Clustering

Date of Submission: April 19, 2005
Report Number: 05-015
Abstract: 
Recently, there has been considerable interest in using association patterns for clustering. Although several interesting algorithms have been developed, further investigation is needed to characterize (1) the benefits of using association patterns and (2) the most effective way of using them for clustering. To that end, we present a new clustering technique, bisecting K-means Clustering with pAttern Preservation (K-CAP), which exploits key properties of the hyperclique association pattern and bisecting k-means. Experimental results on document data show that, in terms of entropy, K-CAP can perform substantially better than the standard bisecting k-means algorithm when data sets contain clusters of widely different sizes, which is the typical situation. Furthermore, because hyperclique patterns can be found much more efficiently than other types of association patterns, K-CAP retains the appealing computational efficiency of bisecting k-means.
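
The abstract compares clusterings "in terms of entropy" without stating the formula. As a rough illustration only, the sketch below computes the commonly used size-weighted cluster entropy, where each cluster's class-label entropy is weighted by the fraction of documents it contains and lower values indicate purer clusters. The function name clustering_entropy, the base-2 logarithm, and the lack of normalization are assumptions made for this example; the report may define the measure differently.

```python
import math
from collections import Counter


def clustering_entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster class entropy (lower is better).

    Hypothetical helper for illustration; base-2 logs and no normalization
    are assumptions, not taken from the report.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    n = len(cluster_ids)
    if n == 0:
        return 0.0

    # Group the true class labels by the cluster each document was assigned to.
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)

    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Entropy of the class distribution inside this cluster.
        h = sum(
            (count / size) * math.log2(size / count)
            for count in Counter(labels).values()
        )
        # Weight the cluster's entropy by its share of all documents.
        total += (size / n) * h
    return total
```

For example, calling clustering_entropy([0, 0, 1, 1], ["a", "a", "b", "b"]) scores a clustering whose clusters each contain a single class, while clustering_entropy([0, 0, 1, 1], ["a", "b", "a", "b"]) scores one whose clusters each mix two classes evenly; the second call yields the larger value.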