Topic-driven Clustering for Document Datasets
Date of Submission: April 22, 2005
In this paper, we define the problem of topic-driven clustering, which organizes a document collection according to a given set of topics (supplied either by domain experts or as requirements that reflect users' needs). We propose three topic-driven schemes that simultaneously consider the similarity between each document and its topic and the relationships among documents within the same cluster and across different clusters. We present experimental results for the proposed schemes on five datasets. The results show that the schemes are efficient and effective with topic prototypes of different levels of specificity.
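To make the idea concrete, the following is a minimal illustrative sketch of how a document-to-topic similarity term and a within-cluster cohesion term might be combined in a single assignment rule. It is not one of the three schemes proposed in this paper. The function names (`cosine`, `centroid`, `topic_driven_cluster`), the weighting parameter `alpha`, the use of cosine similarity, and the toy term-frequency vectors are all assumptions introduced only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors, dim):
    """Component-wise mean of a list of vectors; a zero vector for an empty cluster."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def topic_driven_cluster(docs, topics, alpha=0.5, iterations=10):
    """Assign each document to one cluster per topic (hypothetical scheme).

    Assumption: each document's score for cluster k is
        alpha * cos(doc, topic_k) + (1 - alpha) * cos(doc, centroid_k),
    so alpha trades off topic similarity against cluster cohesion.
    """
    dim = len(topics[0])
    n_topics = len(topics)

    # Start by assigning every document to its most similar topic prototype.
    labels = [max(range(n_topics), key=lambda k: cosine(d, topics[k])) for d in docs]

    for _ in range(iterations):
        # Recompute the centroid of each cluster from its current members.
        centroids = [
            centroid([d for d, lab in zip(docs, labels) if lab == k], dim)
            for k in range(n_topics)
        ]
        # Reassign each document using the combined score.
        new_labels = [
            max(
                range(n_topics),
                key=lambda k: alpha * cosine(d, topics[k])
                + (1 - alpha) * cosine(d, centroids[k]),
            )
            for d in docs
        ]
        if new_labels == labels:  # stop once assignments no longer change
            break
        labels = new_labels
    return labels


# Toy data (fabricated for illustration): four documents over a three-term
# vocabulary and two topic prototypes.
docs = [[3, 0, 0], [2, 1, 0], [0, 0, 4], [0, 1, 3]]
topics = [[1, 0, 0], [0, 0, 1]]
labels = topic_driven_cluster(docs, topics, alpha=0.5)
```

In this sketch, setting `alpha` to 1 reduces the rule to nearest-topic assignment, while setting it to 0 ignores the topics after initialization and behaves like a centroid-based refinement. Inter-cluster separation, which the schemes above also take into account, is not modeled here.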