Usage Meets Link Analysis: Towards Improving Site Specific and Intranet Search via Usage Statistics

Date of Submission: May 24, 2004
Report Number: 04-019
Abstract: 
In this paper, we explore the possibility of incorporating usage statistics to improve ranking quality in site-specific and intranet search engines. We introduce a number of usage-based ranking approaches, including a PageRank extension, Usage aware PageRank (UPR), an extension to HITS (UHITS), and a naive approach that uses the number of visits to a page as a quality measure. We compare these methods against each other and against two major link analysis approaches (PageRank and HITS). We investigate weighting schemes that take into account the probability of visiting a page directly (by typing its address or via bookmarks), as well as the relative probability of following a particular link from a given page. Both of these probabilities can be approximated from usage logs. We developed a site-specific search engine (http://usearch.cs.umn.edu/) and incorporated the above methods. The parameter space for UPR and UHITS is sampled to examine the effects of varying the usage emphasis factors. Experiments are carried out on a medium-size domain, cs.umn.edu, with 20K static web pages. We provide both global and query-dependent comparisons. The experiments suggest that UPR is promising and has a number of desirable properties. It generalizes PageRank and inherits the basic PageRank properties. It is also stable and flexible. The emphasis given to usage information is controlled via two parameters. If the parameters are set to zero, the algorithm reduces to the original PageRank algorithm; if they are set to one, the emphasis shifts to the usage graph; for values in between, both graphs are used with the specified weights. UPR is relatively inexpensive. The usage graph can be updated incrementally and efficiently as new usage information becomes available. A UPR iteration has a space/time complexity similar to that of a PageRank iteration.
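The two-parameter blending described in the abstract can be illustrated with a minimal sketch. The abstract does not state the UPR formula, so everything below is an assumption made for illustration, not the report's actual algorithm: the function name `blended_pagerank`, the parameters `a1` (usage emphasis on direct visits) and `a2` (usage emphasis on link traversals), the damping factor `d`, and the linear interpolation between uniform and usage-derived probabilities are all hypothetical. The sketch only reproduces the stated boundary behavior: with both parameters at zero it is ordinary PageRank, and with both at one the usage counts drive the jump and link-following probabilities.

```python
def blended_pagerank(links, link_hits, direct_hits,
                     a1=0.5, a2=0.5, d=0.85, iters=100):
    """Illustrative usage-blended PageRank (NOT the report's UPR formula).

    links:       dict page -> list of distinct pages it links to
                 (every target must also be a key of `links`)
    link_hits:   dict (src, dst) -> traversal count from a usage log
    direct_hits: dict page -> count of direct visits (typed, bookmarked)
    a1, a2:      assumed usage emphasis factors in [0, 1]
    d:           damping factor, as in standard PageRank
    """
    pages = sorted(links)
    n = len(pages)

    # Jump distribution: blend uniform with observed direct-visit share.
    total_direct = sum(direct_hits.get(p, 0) for p in pages)
    jump = {}
    for p in pages:
        if total_direct:
            usage = direct_hits.get(p, 0) / total_direct
        else:
            usage = 1.0 / n  # no usage data: fall back to uniform
        jump[p] = (1 - a1) / n + a1 * usage

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) * jump[p] for p in pages}
        for p in pages:
            outs = links[p]
            if not outs:
                # Dangling page: redistribute its mass via the jump vector.
                for q in pages:
                    new[q] += d * rank[p] * jump[q]
                continue
            total = sum(link_hits.get((p, q), 0) for q in outs)
            for q in outs:
                if total:
                    usage = link_hits.get((p, q), 0) / total
                else:
                    usage = 1.0 / len(outs)
                # Blend uniform link choice with observed traversal share.
                weight = (1 - a2) / len(outs) + a2 * usage
                new[q] += d * rank[p] * weight
        rank = new
    return rank
```

Because both the jump vector and the per-page link weights sum to one, the ranks remain a probability distribution for any parameter setting in this sketch, and each iteration visits every link once, which matches the abstract's remark that an iteration costs about the same as a PageRank iteration.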