Text Retrieval as Risk Minimization

ChengXiang Zhai
School of Computer Science
Carnegie Mellon University

Tuesday, April 2
4:00-4:50pm, 15 SH

Abstract

With the dramatic increase in online information in recent years, text retrieval is becoming increasingly important. It is a significant scientific challenge to develop principled approaches to information retrieval that also perform well empirically. In this talk, I will present a new text retrieval framework based on Bayesian decision theory that unifies several existing retrieval models within a general probabilistic framework, and that facilitates the development of new principled approaches to text retrieval. In this framework, queries and documents are modeled using statistical language models (i.e., probabilistic models of text), and retrieval is cast as a risk minimization problem. While traditional retrieval models rely heavily on ad hoc parameter tuning to achieve satisfactory retrieval performance, the use of language models makes it possible to exploit statistical estimation methods to improve retrieval performance and set retrieval parameters automatically. I will present a two-stage language model that, according to extensive evaluation, achieves excellent retrieval performance without any ad hoc parameter tuning. The risk minimization retrieval framework further allows for incorporating user factors beyond the traditional notion of relevance, as will be demonstrated by language modeling methods that are used to rank documents in terms of both relevance and sub-topic diversity.

 

Thursday, October 07, 2004, 10:21:31.