Concept-Aware Ranking: Teaching an Old Graph New Moves

Date of Submission: 
March 20, 2006
Report Number: 
06-007
Report PDF: 
Abstract: 
Ranking algorithms for web graphs, such as PageRank and HITS, are typically utilized for graphs where a node represents a unique URL (Webpage) and an edge is an explicitly-defined link between two such nodes. In addition to the link structure itself, additional information about the relationship between nodes may be available. For example, anchor text in a Web graph is likely to provide information about the underlying concepts connecting URLs. In this paper, we propose an extension to the Web graph model to take into account conceptual information encoded by links in order to construct a new graph which is sensitive to the conceptual links between nodes. By extracting keywords and recurring phrases from the anchor tag data, a set of concepts is defined. A new definition of a node (one which encodes both an URL and concept) is then leveraged to create an entirely new Web graph, with edges representing both explicit and implicit conceptual links between nodes. In doing so, inter-concept relationships can be modeled and utilized when using graph ranking algorithms. This improves result accuracy by not only retrieving links which are more authoritative given a users' context, but also by utilizing a larger pool of web pages that are limited by concept-space, rather than keyword-space. This is illustrated using webpages from the University of Minnesota's College of Liberal Arts websites.