Semi-Automatic Query Expansion using
Most Discriminant Words
Alessio Signorini
University of Iowa, Computer Science
alessio-signorini@uiowa.edu


Abstract
Most casual users of IR systems type short queries. With current indexing technologies, short queries return an enormous number of results, which may be impossible to examine carefully. If users do not find what they are looking for among the first 10 or 100 results, they may stop searching and miss important results that the search engine ranked poorly. While users generally know what they are looking for, expressing their needs in a compact, precise, written form, i.e. the query, is a real problem. Word usage is in fact both domain and user dependent, and may easily mislead the search engine. In this report I investigate a new method for query expansion that exploits the user's feedback on some discriminant words to sharpen the focus on the domain of the user's query.
Introduction
Recent studies have shown that users tend to write short queries, often of just one or two terms. In addition, users do not like to use Boolean expressions, and they certainly do not want to assign weights to their query terms.

Using a language implies knowing its words. Unfortunately, those words may have more than one meaning, depending on the context in which they are used. While users generally know what they are looking for, the task of expressing their needs in a compact, precise, written form, i.e. the query, becomes a real problem.

Domain-dependent words may confuse the search engine and lead it to return results from the wrong domain. Several methods have been proposed in the literature to expand the query so as to refine the results and focus them on the intended domain. Some methods expand the query with new terms extracted from relevant documents; others replace some of the query terms with domain-specific terms to better specify the target of the query.

In this report I describe a new approach to the query expansion problem. The method uses the user's feedback on some words, identified by the system, to "discriminate" between different domains.
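
To make the idea concrete, here is a minimal sketch of one way such a feedback loop could work. It is not the method of this report, whose details are not given in this section. It assumes that the initial results have already been grouped by candidate domain, and it scores a word by how much more often it appears in one domain than in any other. The function names, the scoring rule, and the toy documents are all hypothetical.

```python
from collections import Counter

def doc_frequency(docs):
    """Fraction of the documents in `docs` that contain each word."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.lower().split()))
    return {word: n / len(docs) for word, n in counts.items()}

def discriminant_words(domains, query_terms, top_k=2):
    """For each domain, return the top_k words that occur much more often
    in that domain than in any other (assumed scoring rule)."""
    freqs = {name: doc_frequency(docs) for name, docs in domains.items()}
    result = {}
    for name, freq in freqs.items():
        scores = {}
        for word, f in freq.items():
            if word in query_terms:
                continue  # query terms cannot discriminate between domains
            other = max(
                (freqs[o].get(word, 0.0) for o in freqs if o != name),
                default=0.0,
            )
            scores[word] = f - other
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = ranked[:top_k]
    return result

def expand_query(query, candidates, accepted):
    """Append to the query the candidate words the user marked as relevant."""
    present = set(query.split())
    extra = [w for w in candidates if w in accepted and w not in present]
    return " ".join([query] + extra)

# Toy results for the ambiguous query "jaguar" (hypothetical data).
domains = {
    "animal": ["jaguar cat habitat jungle", "jaguar cat prey", "jaguar jungle cat"],
    "car": ["jaguar car engine price", "jaguar car dealer", "jaguar engine car"],
}
words = discriminant_words(domains, {"jaguar"})
candidates = [w for ws in words.values() for w in ws]
# Suppose the user marks "car" and "engine" as relevant.
expanded = expand_query("jaguar", candidates, {"car", "engine"})
```

In this toy run the system would show the user the words "cat", "jungle", "car", and "engine"; accepting the last two turns the query into "jaguar car engine", which steers the search toward the automotive domain.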
Download
A downloadable package will be available soon...
Bibliography
Updated soon...