Topic-driven Clustering for Document Datasets

Date of Submission: April 22, 2005
Report Number: 05-017
Abstract: 
In this paper, we define the problem of topic-driven clustering, which organizes a document collection according to a given set of topics (supplied either by domain experts or as a requirement reflecting users' needs). We propose three topic-driven schemes that simultaneously consider the similarity between each document and its topic and the relationships among documents within the same cluster and across different clusters. We present experimental results for the proposed schemes on five datasets. The results show that the schemes are efficient and effective with topic prototypes of different levels of specificity.
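
To make the idea concrete, the following is a minimal illustrative sketch and not one of the three schemes proposed in the report, whose exact formulations are not given in this abstract. It assumes documents and topic prototypes are term-weight vectors, and it scores each candidate assignment by a weighted sum of the cosine similarity to the topic prototype and the cosine similarity to the current cluster centroid. The function name `topic_driven_assign`, the weight `alpha`, and the iterative reassignment loop are all assumptions made for illustration; the sketch also covers only within-cluster cohesion, not the relationship between different clusters.

```python
import math


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def topic_driven_assign(docs, topics, alpha=0.5, n_iter=10):
    """Assign each document to one topic-seeded cluster (hypothetical helper).

    Assumed score for document d and cluster r:
        alpha * cos(d, topic_r) + (1 - alpha) * cos(d, centroid_r)
    """
    docs = [normalize(d) for d in docs]
    topics = [normalize(t) for t in topics]
    k = len(topics)

    # Initial assignment: each document goes to its most similar topic.
    labels = [max(range(k), key=lambda r: dot(d, topics[r])) for d in docs]

    for _ in range(n_iter):
        # Recompute unit-length centroids; an empty cluster falls back
        # to its topic prototype.
        centroids = []
        for r in range(k):
            members = [d for d, label in zip(docs, labels) if label == r]
            if members:
                centroids.append(normalize([sum(col) for col in zip(*members)]))
            else:
                centroids.append(topics[r])

        # Reassign using the combined topic and cluster-cohesion score.
        new_labels = [
            max(
                range(k),
                key=lambda r: alpha * dot(d, topics[r])
                + (1 - alpha) * dot(d, centroids[r]),
            )
            for d in docs
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

    return labels


# Toy data: two documents lean toward each of two topics.
docs = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]
topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = topic_driven_assign(docs, topics)
```

Under these assumptions, setting `alpha` to 1.0 reduces the sketch to assigning each document to its nearest topic prototype, while smaller values let the documents already in a cluster influence the assignment.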