Comprehensive Exam. The Comprehensive Exam in
Computer Science is meant to be a literature review of a research area, and it serves as the first chapter
of my thesis. Mine is about Sentiment Analysis, and the paper can be found
here.
ESSIR 2009 -
European Summer School in Information Retrieval.
This one was a blast: plenty of lectures from well-known (and lesser-known) researchers, and
a ton of gifted graduate students. Afterward I went to Venice and saw the drowning city. The
food was amazing.
TREC 2009 Blog Mining/Retrieval.
Our emphasis in this task has been on determining the
importance of news headlines by looking at the blogosphere. By extracting citations to
news articles from blog posts and analyzing the text around them, we ranked the New York Times headlines for a
given day. Surprisingly, our system also did well at providing a diverse set of blog posts
for each news story (i.e., posts discussing different aspects of the story), even though we did not
explicitly address this issue. More work is needed on the exact definition of the "importance"
of a day's news stories: is it what a newspaper editor would put on the front page, or what
people talk about on the web? [paper]
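A minimal Python sketch of the citation-counting idea, assuming blog posts are available as raw HTML and that a "citation" is an href link to the article's URL. The function names and the 200-character context window are hypothetical, and ranking purely by the number of distinct citing posts is a simplification: the system described above also analyzed the surrounding text, which this sketch only collects for later scoring.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: citations appear as double-quoted href attributes in the post HTML.
LINK_RE = re.compile(r'href="(https?://[^"]+)"')

def normalize(url):
    """Drop query strings and fragments so variants of one article collapse together."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")

def rank_headlines(posts, headline_urls, window=200):
    """posts: dict post_id -> raw HTML; headline_urls: dict article URL -> headline.

    Returns (headline, number of distinct citing posts, context snippets) tuples,
    most-cited first; ties are broken alphabetically by headline.
    """
    targets = {normalize(u): h for u, h in headline_urls.items()}
    citing = defaultdict(set)    # article key -> ids of posts that link to it
    contexts = defaultdict(list) # article key -> text surrounding each link
    for pid, html in posts.items():
        for m in LINK_RE.finditer(html):
            key = normalize(m.group(1))
            if key in targets:
                citing[key].add(pid)
                start = max(0, m.start() - window)
                contexts[key].append(html[start:m.end() + window])
    ranked = sorted(targets, key=lambda k: (-len(citing[k]), targets[k]))
    return [(targets[k], len(citing[k]), contexts[k]) for k in ranked]
```

Counting distinct posts rather than raw links keeps a single post that links to the same article several times from inflating its rank.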
TREC 2009 Chemical Retrieval.
This is a new track in TREC 2009, dealing with the retrieval
of US and European patents related to chemistry. We approached this task from an IR
point of view, not as chemists. By paying special attention to the claims made in a patent
(arguably its most important part), we were able to retrieve relevant patents for the
Prior Art task. A prior art search is what a patent examiner performs to check whether someone has already done
what a new patent claims to do (which would invalidate the new patent).
Using the patent classification hierarchy, we improved the performance of our system dramatically.
[paper]
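One way to picture the two ideas above (emphasizing claims, and using the classification hierarchy) is a toy scoring function. This is a minimal Python sketch, not the system described here: it assumes each patent is a dict with "claims", "description", and "classes" fields, uses a simple log-scaled term overlap in place of a real retrieval model, and assumes IPC-style codes where the first four characters (e.g. "C07D") name the subclass. The field weights and the boost factor are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase alphanumeric tokens; a real system would also handle chemical names."""
    return re.findall(r"[a-z0-9]+", text.lower())

def field_score(query_terms, text):
    """Log-scaled term-frequency overlap between the query terms and one field."""
    counts = Counter(tokenize(text))
    return sum(math.log1p(counts[t]) for t in query_terms)

def score_patent(query, candidate, claims_weight=3.0, desc_weight=1.0, class_boost=2.0):
    # Query terms come from the claims of the patent under examination.
    terms = set(tokenize(query["claims"]))
    # Matches in the candidate's claims count more than matches in its description.
    score = (claims_weight * field_score(terms, candidate["claims"])
             + desc_weight * field_score(terms, candidate["description"]))
    # Compare classification codes at the subclass level (first four characters,
    # e.g. "C07D" in "C07D 401/04"); any shared subclass multiplies the score.
    q_sub = {c[:4] for c in query["classes"]}
    c_sub = {c[:4] for c in candidate["classes"]}
    if q_sub & c_sub:
        score *= class_boost
    return score

def rank_prior_art(query, candidates):
    """Return candidate patents ordered from most to least likely prior art."""
    return sorted(candidates, key=lambda c: score_patent(query, c), reverse=True)
```

Matching at the subclass level rather than on full codes is one possible choice; coarser or finer levels of the hierarchy would trade recall against precision.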
ICWSM 2009 Topic Tracking in Blogs. We used
Viet's relevance-based language modeling approach
to track events in a data collection provided by Spinn3r for ICWSM09. This collection has
60 million blog posts spanning two months in 2008. We also arranged the events into a hierarchy
and illustrated how the relevance model gracefully handles differences in scope
by pushing common terms into the "background". [paper]
TREC 2008 Relevance Feedback Track.
The purpose of this track is to "provide a framework for
exploring the effects of different factors on the success of relevance feedback". The data set
provided for this task was GOV2, a partial crawl of US government (.gov) websites. This
data set contains over 25 million documents and takes up nearly half a terabyte of space. We
used Lucene, a Java indexing library, to index the
documents. Using this index, we were able to run queries and pick the best ones for the task
at hand. We then used a topic model developed by Ha
Thuc Viet to exploit the relevance feedback information for query improvement.
[paper]
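For readers unfamiliar with relevance feedback, the classic Rocchio method shows the basic idea of turning judged documents into a better query. This minimal Python sketch is a stand-in baseline, not the topic model used above: it represents documents as token lists with length-normalized term frequencies, and the weights alpha=1.0, beta=0.75, gamma=0.15 are common textbook defaults rather than values from this work.

```python
from collections import Counter

def centroid(docs):
    """Average of length-normalized term-frequency vectors (empty dict if no docs)."""
    if not docs:
        return {}
    total = Counter()
    for d in docs:
        n = len(d)
        for t, c in Counter(d).items():
            total[t] += c / n
    return {t: v / len(docs) for t, v in total.items()}

def rocchio_expand(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15, k=10):
    """query: list of terms; relevant / nonrelevant: lists of token lists.
    Returns the k highest-weighted (term, weight) pairs of the modified query,
    dropping terms whose weight is not positive.
    """
    weights = Counter({t: alpha * c for t, c in Counter(query).items()})
    for t, v in centroid(relevant).items():
        weights[t] += beta * v   # move toward documents judged relevant
    for t, v in centroid(nonrelevant).items():
        weights[t] -= gamma * v  # move away from documents judged non-relevant
    positive = [(t, w) for t, w in weights.items() if w > 0]
    positive.sort(key=lambda tw: (-tw[1], tw[0]))
    return positive[:k]
```

For example, the query ["solar", "energy"] with relevant posts mentioning "panel" and "subsidy" and a non-relevant post about a "solar flare" expands to energy, solar, panel, subsidy, while "flare" is dropped.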
Last Updated 11|18|2009