Position Paper for the
CHI 97 Basic Research Symposium
(March 22-23, 1997, Atlanta, GA)
The GroupLens Research Project: Exploring Collaborative Filtering
Joseph A. Konstan, John Riedl, and Bradley N. Miller
University of Minnesota
Department of Computer Science
200 Union Street SE -- Rm 4-192
Minneapolis, MN 55455
E-mail: {konstan, riedl, bmiller}@cs.umn.edu
URL: http://www.cs.umn.edu/Research/GroupLens
also
Net Perceptions, Inc.
11200 West 78th Street, Suite 300
Eden Prairie, MN 55344-3814
E-mail: {konstan, riedl, bmiller}@netperceptions.com
URL: http://www.netperceptions.com
Abstract
Collaborative filtering attempts to address information overload by forming recommendations
based on the opinions of other people who have seen information items. The GroupLens project
provides personalized collaborative filtering for Usenet news. Recommendations are personalized by
giving each user a set of "neighbors" chosen for their prior patterns of agreement with that user.
Initial GroupLens project trials have shown that the system provides useful recommendations and
that it can be implemented efficiently. They have also allowed us to test several hypotheses
about measures of opinion and agreement. We are now beginning a multi-year project to
explore several other research questions.
Keywords
Collaborative filtering, information filtering, information overload.
Position Statement
Information overload is a significant problem for today's consumers of information. The
"computer revolution" has created an enormous wealth of available data, but this volume of
data is often too great for humans to effectively use. There are many different approaches
to sifting through immense data sets, including search and visualization techniques, programmable
or learning agents that detect items of interest, and informal social techniques in which friends
and colleagues recommend items of interest to each other. The term "collaborative filtering"
encompasses a range of formalized social techniques that capture the opinions of individuals
who consume a piece of information and use these opinions to form recommendations for other
information consumers.
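As a rough illustration of the idea, the sketch below predicts one user's rating of an unread
article from the opinions of other users, weighted by past agreement. It assumes Pearson
correlation over co-rated articles as the agreement measure and a 1-5 rating scale; the data, the
function names, and these choices are illustrative assumptions rather than a description of the
GroupLens implementation.

```python
from math import sqrt

# Hypothetical ratings: user -> {article_id: rating on an assumed 1-5 scale}.
ratings = {
    "ann": {"a1": 5, "a2": 1, "a3": 4},
    "bob": {"a1": 4, "a2": 2, "a3": 5, "a4": 5},
    "cat": {"a1": 1, "a2": 5, "a3": 2, "a4": 1},
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def agreement(u, v):
    """Pearson correlation over articles both users rated (0.0 if undefined)."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)
               * sum((ratings[v][i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

def predict(user, article):
    """User's own mean rating plus the neighbors' agreement-weighted deviations."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other in ratings:
        if other == user or article not in ratings[other]:
            continue
        w = agreement(user, other)
        num += w * (ratings[other][article] - mean(ratings[other].values()))
        den += abs(w)
    return base + num / den if den else base

print(round(predict("ann", "a4"), 2))
```

In this toy data "ann" agrees with "bob" and disagrees with "cat", so bob's high rating and cat's
low rating of article a4 both push the prediction above ann's average.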
The GroupLens Project
The GroupLens project, started in 1992 by Paul Resnick and John Riedl, has focused on applying
collaborative filtering to Usenet news, a high-volume, high-noise set of discussion groups
distributed across the Internet. Several characteristics make Usenet an interesting research
area for collaborative filtering:
- The large numbers of users and news postings provide a rich source of data and a challenge
for real-time implementation.
- The short lifetime of articles places greater demands on the speed with which new opinions
are digested and in turn affect recommendations.
- The relative sparseness of the opinion matrix (i.e., the fact that most people read only a small
fraction of the available articles) lets us explore the greater challenge of designing algorithms
that operate on very sparse sets of opinion data.
- The hierarchical categories of newsgroups allow us to test hypotheses about how relevant user
agreement in one category is in making recommendations in related and unrelated categories.
- The dual organizations of Usenet news--loose temporal order and discussion threads--allow us
to explore effective presentation of recommendations to users.
- The diversity of content and subject area allows us to test hypotheses about the value of
collaborative filtering in moderated and unmoderated discussions, question and answer lists,
structured bulletin boards, and other forms of communication.
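To make the sparseness point concrete, the following sketch stores only the ratings that exist and
measures how full the opinion matrix is. The nested-dictionary layout, the user and article
identifiers, and the function names are assumptions made for illustration.

```python
# Hypothetical opinion data: only ratings that exist are stored.
opinions = {
    "u1": {"art1": 4, "art7": 2},
    "u2": {"art7": 5},
    "u3": {"art2": 3, "art7": 1, "art9": 4},
}

def density(opinions):
    """Fraction of (user, article) cells that actually hold a rating."""
    articles = {a for rated in opinions.values() for a in rated}
    filled = sum(len(rated) for rated in opinions.values())
    return filled / (len(opinions) * len(articles))

def co_rated(opinions, u, v):
    """Articles rated by both users: the only evidence of their agreement."""
    return set(opinions[u]) & set(opinions[v])

print(density(opinions))
print(co_rated(opinions, "u1", "u3"))
```

With real Usenet volumes the density would be far lower than in this toy example, and many pairs of
users would share few or no co-rated articles, which is what makes the algorithm design harder.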
GroupLens research is an ongoing project that has already demonstrated some significant results.
Among the achievements of the project so far are the following:
- User trials demonstrating that the system provides useful recommendations.
- Algorithm analysis comparing the effectiveness of several different algorithms for
making recommendations.
- Data analysis showing that time spent reading an article is a useful implicit measure of
a user's opinion of the value of an article.
- Data analysis showing that agreement with users in one newsgroup is not generally predictive
of agreement in other newsgroups.
- Design and evaluation of a robust architecture that can be scaled to serve large user
communities and large sets of items.
In addition, this research has led to the creation of a start-up software company, Net Perceptions,
that is commercializing a collaborative filtering toolkit and server for a wider variety of
applications.
Research Questions
Many key questions remain to be answered, both in the domain of Usenet news and more widely.
We are at the beginning of a new multi-year effort to gather data and study the following
topics:
- Discussion Threads. Are user opinions of articles within threads significantly more
consistent than ratings of articles from different threads but the same newsgroup? If so, how
does thread identity compare with other topic measures (e.g., keyword matches) for creating
domains of consistency? Also, what are effective user interfaces for displaying recommendations
to users who read news by thread? Many newsreaders display only a single line for each thread;
should that line represent the average recommendation, the best, or something else?
- Implicit Measures of Opinion. We have started to examine the value of time spent
reading as an implicit measure of opinion and have been encouraged by our results. There are
many other observable user actions that could correlate well with user opinion, including actions
such as saving, printing, or forwarding an article; replying to an article; and "killing" a discussion
thread. We are interested in learning how effective these implicit measures can be, both in
isolation and together as a replacement for or supplement to explicit ratings.
- Effect of Recommendation Display on Decision-Making Tasks. From discussions with users,
we hypothesize that there are several largely distinct news reading styles that are supported
differently by different news readers. These styles reflect different goals, including the relative
value and cost users associate with reading and missing good and bad articles, and the time users
choose to spend reading. We are investigating how the type of recommendation display affects
performance in different article selection tasks.
- Measures of Confidence. Our algorithms (and other collaborative filtering algorithms)
provide only rough measures of the confidence with which a recommendation is made. We are
interested in studying all aspects of the confidence issue, including investigating better
ways of calculating confidence and exploring the way displays of confidence affect user
behavior and perceptions.
- Additional Tools, Techniques, and Algorithms. We are continuing to investigate a wide
range of systems design issues including algorithms, integration with other recommendation systems,
and general interfaces.
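The thread-display question above can be made concrete with a small sketch that computes two
candidate summaries for a one-line thread display. The thread titles and predicted ratings are
invented, and the two summary rules are simply the options named in the question, not a
recommendation.

```python
from statistics import mean

# Hypothetical predicted ratings (assumed 1-5 scale) for the articles in each thread.
threads = {
    "Re: modem init strings": [4.6, 2.1, 1.8, 2.0],
    "FAQ pointer": [3.1, 3.0, 3.2],
}

def thread_summaries(predictions):
    """Candidate single-line summaries for one thread."""
    return {"average": mean(predictions), "best": max(predictions)}

for title, predictions in threads.items():
    summary = thread_summaries(predictions)
    print(f"{title}: average={summary['average']:.2f} best={summary['best']:.1f}")
```

Here the two rules rank the threads in opposite orders: the first thread contains the single best
article but has the lower average, which is the kind of trade-off a user study would need to
resolve.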
Research Methods
This research employs three primary methods: user tests, controlled trials,
and open trials. We use user tests to evaluate display effectiveness for a task, and for other
research questions that are difficult to evaluate from wider tests without confounding data. We
use controlled trials--trials where users are asked to read and rate a specific set of articles
without seeing recommendations--to create a complete matrix testbed from which controlled
experiments can be run (e.g., evaluating prediction accuracy for different algorithms at different
ratings densities). We use open trials to gather real-world results, including information
about real system usage (from trace logs) and real system performance and accuracy (from
retrospective analyses of trace log data). We also consider the open trial to be a valuable
service to the news reading community.
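A controlled-trial experiment of the kind described above might be sketched as follows: start from
a complete matrix, hold out a fixed test set, thin the remaining ratings to a chosen density, and
measure prediction error. Mean absolute error, the per-article average predictor, the 20% test
split, and the synthetic matrix are all assumptions chosen to keep the example small.

```python
import random

def mean_absolute_error(predict, train, test):
    """Average absolute gap between predicted and held-out ratings."""
    errors = [abs(predict(train, u, a) - r) for (u, a), r in test.items()]
    return sum(errors) / len(errors)

def item_mean(train, user, article):
    """Stand-in predictor: average known rating for the article."""
    known = [r for (u, a), r in train.items() if a == article]
    if not known:
        known = list(train.values())  # fall back to the global average
    return sum(known) / len(known) if known else 3.0  # assumed scale midpoint

def evaluate_at_density(complete, density, predict, seed=0):
    """Hold out a fixed test set, then keep only `density` of the other ratings."""
    rng = random.Random(seed)
    cells = sorted(complete)
    rng.shuffle(cells)
    n_test = max(1, len(cells) // 5)
    test = {c: complete[c] for c in cells[:n_test]}
    pool = cells[n_test:]
    train = {c: complete[c] for c in pool[: int(len(pool) * density)]}
    return mean_absolute_error(predict, train, test)

# Synthetic complete matrix: every user has rated every article (1-5).
complete = {(f"u{u}", f"a{a}"): 1 + (u + 2 * a) % 5
            for u in range(6) for a in range(5)}
for d in (0.25, 0.5, 1.0):
    print(d, round(evaluate_at_density(complete, d, item_mean), 2))
```

Because the same seed yields the same test set at every density, differences in error across
densities come only from how much training data the predictor sees. Any other algorithm with the
same `predict(train, user, article)` signature could be substituted for comparison.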
Why Present This at the Basic Research Symposium
We believe that there are several ways in which this research project can benefit from the
ideas and thoughts of Basic Research Symposium participants. In particular, we hope to obtain
feedback on these issues:
- Identifying specific research questions that are more widely applicable and that merit
specific research focus. The number of research questions available far outstrips our resources
to address them. We have fairly good feedback from the user community and the commercial
community on their priorities, but we seek input from the wider research community.
- Balancing the benefits of controlled and open trials. We are frequently torn between
the experimental "cleanliness" of controlled trials and the real-world data of open trials. In
the Usenet news domain, no controlled trial can accurately reflect the fact that users make real
choices when reading news, and that those choices reflect a wide range of input including time,
subject, mood, and any displayed recommendation.
- Identifying requirements for a useful collaborative filtering corpus. It is our hope
to eventually make a database of trace log data available for other researchers, and we seek
recommendations for how to make this database most useful.
We also hope that this research is of interest to BRS participants. We find collaborative
filtering to be an exciting research area with many challenging research questions and
many exciting applications.
References
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl. "GroupLens:
An Open Architecture for Collaborative Filtering of Netnews," Proceedings
of the 1994 Computer Supported Cooperative Work Conference, ACM, 1994.
J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl.
"GroupLens: Collaborative Filtering for Usenet News," to appear in
Communications of the ACM special issue on collaborative filtering,
March 1997.
B. Miller, J. Riedl, and J. Konstan. "Experiences with GroupLens: Making
Usenet Useful Again," Proceedings of the Usenix 1997 Winter Technical
Conference, Anaheim, CA, January 1997.
GroupLens Research Project Home Page. URL: http://www.cs.umn.edu/Research/GroupLens