Yelena Mejova, Researcher


Comprehensive Exam. The Comprehensive Exam in Computer Science is meant to be a literature review for a research area, and the first chapter in my thesis. Mine is about Sentiment Analysis, and the paper can be found here.

ESSIR 2009 - European Summer School in Information Retrieval. This one was a blast. Plenty of lectures from well-known (and lesser-known) researchers, and a ton of gifted graduate students. Afterward I went to Venice, saw the drowning city. The food was amazing.

TREC 2009 Blog Mining/Retrieval. Our emphasis in this task was on determining the importance of news headlines by looking at the blogosphere. By extracting citations to the news articles and analyzing the text around them, we ranked New York Times headlines for a given day. Surprisingly, our system also did well at providing a diverse set of blog posts for each news story (i.e., posts discussing different aspects of the story), even though we did not explicitly address this issue. More work needs to be done on the exact definition of the "importance" of news stories in a day: is it what a newspaper editor would put on the main page, or what people talk about on the web? [paper]
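
To make the citation idea concrete, here is a minimal sketch of ranking headlines by how many blog posts link to them. It is not our TREC system: it only counts citations and ignores the text around them, and the function name, input format, and example URLs are all hypothetical.

```python
from collections import Counter


def rank_headlines(headlines, blog_posts):
    """Rank headlines by the number of blog posts that cite their URL.

    headlines:  dict mapping article URL -> headline text (assumed format)
    blog_posts: iterable of blog post bodies as plain strings (assumed format)
    """
    counts = Counter()
    for post in blog_posts:
        for url in headlines:
            # Crude substring match; a real system would parse the links.
            if url in post:
                counts[url] += 1
    # Most-cited first; ties are broken alphabetically by headline.
    ranked = sorted(headlines, key=lambda u: (-counts[u], headlines[u]))
    return [(headlines[u], counts[u]) for u in ranked]


# Hypothetical example data.
example_headlines = {
    "http://example.com/a": "Alpha",
    "http://example.com/b": "Beta",
}
example_posts = [
    "see http://example.com/b",
    "also http://example.com/b and more",
    "http://example.com/a",
]
example_ranking = rank_headlines(example_headlines, example_posts)
```

Counting each post at most once per article keeps a single link-heavy post from dominating the ranking.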

TREC 2009 Chemical Retrieval. This is a new track in TREC 2009, dealing with the retrieval of US and European patents related to chemistry. We approached this task from the IR point of view, not as chemists. By paying special attention to the claims made in the patent (arguably the most important part of a patent), we were able to retrieve relevant patents for the Prior Art task. In a prior art check, a patent examiner determines whether somebody has already done what a new patent claims to do (which would invalidate the new patent). Using the classification hierarchy for patents, we improved the performance of our system dramatically. [paper]

ICWSM 2009 Topic Tracking in Blogs. We used Viet's relevance-based language modeling approach to track events in a collection of data provided by Spinn3r for ICWSM09. This collection has 60 million blog posts spanning two months in 2008. We also arranged the events into a hierarchy and illustrated how the relevance model gracefully handles differences in scope by pushing common terms to the "background". [paper]

TREC 2008 Relevance Feedback Track. The purpose of this track is to "provide a framework for exploring the effects of different factors on the success of relevance feedback". The data set provided for this task was GOV2, a partial crawl of the web representing the US government. This data set contains over 25 million documents and takes up nearly half a terabyte of space. We used Lucene, a Java indexing tool, to index the documents. Using this index we were able to run queries and pick the best ones for the task at hand. We then used a topic model developed by Ha Thuc Viet to apply the relevance feedback information to query improvement. [paper]
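
For readers new to relevance feedback, the sketch below shows the general idea with a plain term-frequency expansion. This is a swapped-in, much simpler technique than the topic model we actually used, and it does not involve Lucene; the function name, parameters, and example data are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, relevant_docs, num_new_terms=3):
    """Add the most frequent terms from relevant documents to a query.

    query_terms:   list of original query terms
    relevant_docs: list of document texts judged relevant (the feedback)
    num_new_terms: how many expansion terms to add (assumed parameter)
    """
    counts = Counter()
    for doc in relevant_docs:
        # Naive whitespace tokenization, for illustration only.
        counts.update(doc.lower().split())
    existing = {t.lower() for t in query_terms}
    candidates = [(t, c) for t, c in counts.items() if t not in existing]
    # Most frequent first; ties are broken alphabetically.
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))
    return list(query_terms) + [t for t, _ in candidates[:num_new_terms]]


# Hypothetical example data.
expanded = expand_query(
    ["tax"],
    ["tax refund form", "refund form deadline", "refund status"],
    num_new_terms=2,
)
```

A real system would also remove stop words and weight terms rather than rely on raw counts.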

Last Updated 11|18|2009