Text Retrieval as Risk Minimization
ChengXiang Zhai
School of Computer Science
Carnegie Mellon University
Tuesday, April 2
4:00-4:50pm,
15 SH
Abstract
With the dramatic increase in online information in recent years, text
retrieval is becoming increasingly important. It is a significant
scientific challenge to develop principled approaches to information
retrieval that also perform well empirically. In this talk, I will
present a new text retrieval framework based on Bayesian decision
theory that unifies several existing retrieval models within a general
probabilistic framework, and that facilitates the development of new
principled approaches to text retrieval. In this framework, queries
and documents are modeled using statistical language models (i.e.,
probabilistic models of text), and retrieval is cast as a risk
minimization problem. While traditional retrieval models rely heavily
on ad hoc parameter tuning to achieve satisfactory retrieval
performance, the use of language models makes it possible to exploit
statistical estimation methods to improve retrieval performance and
set retrieval parameters automatically. I will present a two-stage
language model that, according to extensive evaluation, achieves
excellent retrieval performance without any ad hoc parameter
tuning. The risk minimization retrieval framework further allows for
incorporating user factors beyond the traditional notion of relevance,
as will be demonstrated by language modeling methods that are used to
rank documents in terms of both relevance and sub-topic diversity.