Annotated Bibliography


Keywords:
TREC 2008, Sentiment Analysis, LSI, Named Entity Search, Relevance Feedback, Recommender Systems, SVM, Active Learning, Ranking, Statistical Language Models, Fraud Detection, Data Set, Inter-Domain Classification, Association Analysis, Structural Correspondence Learning, Sociology, Gender, Topic Detection, Folksonomy, Social Networks, Graph Models, Question Answering, Blog Mining, Computational Linguistics, Query Composition, Semantic Networks, Privacy, Wikipedia

TIP: To find something in particular, type it into your browser's Find box.


  Mining Correlated Bursty Topic Patterns from Coordinated Text Streams
Wang, Zhai, Hu, Sproat 05/05/2009
KDD 2007 Blog Mining
The authors mine coordinated text streams for "bursty" topics using a generative topic model and an EM algorithm. They have some interesting techniques for correlating the streams and excluding document-specific terms (noise).


  Discovering Hot Topics in the Blogosphere
Platakis, Kotsakos, Gunopulos 04/29/2009
EURECA 2008 Blog Mining
The authors use Kleinberg's algorithm for finding bursty terms in hierarchical data streams. They compare their results to BlogScope, an online system for the analysis of the blogosphere.


  On Ranking Controversies in Wikipedia: Models and Evaluation
Vuong, Lim, Sun, Le, Lauw, Chang 04/29/2009
WSDM 2008 Wikipedia
The paper defines what makes a Wikipedia article controversial, and uses an article's revision history, as well as the revision histories of its authors, to rank articles by degree of controversy.


  Privacy-Enhancing Personalized Web Search
Yabo Xu, Benyu Zhang, Zheng Chen, Ke Wang 04/20/2009
WWW 2007 Privacy
The paper proposes a way to build user profiles such that the user is able to specify the amount of information s/he wants to expose to the search engine.


  Extracting Semantic Networks from Text Via Relational Clustering
Stanley Kok and Pedro Domingos 04/13/2009
ECML PKDD 2008 Semantic Networks
An interesting paper that uses TextRunner to extract facts from text and then extracts semantic networks using Markov logic.


  Understanding the Relationship between Searchers' Queries and Information Goals
Doug Downey, Susan Dumais, Dan Liebling, Eric Horvitz 04/06/2009
CIKM 2008 Query Composition
This paper explores how users behave when they search with queries to satisfy an "information need". The authors find that search engines perform poorly on less frequent queries. They explore different metrics for the difficulty of queries ("information needs").


  A Meta-Learning Approach for Robust Rank Learning
V. Carvalho, J. Elsas, W. Cohen, J. Carbonell 03/30/2009
SIGIR 2008 LR4IR Ranking
The paper proposes a meta-learner that improves ranking results for some algorithms (perceptron, ListNet), and perhaps even for RankSVM. If the (rather simple) meta-learner can in fact make simple algorithms as effective as a complicated RankSVM, it could be of real note.


  Annotating Expressions of Opinions and Emotions in Language
Janyce M Wiebe, Theresa Wilson, Claire Cardie 03/19/2009
Language Resources and Evaluation 2005 Computational Linguistics, Sentiment Analysis
Describes a detailed annotation of a 10,000-sentence corpus of articles drawn from the world press. The annotations mark private states (a term that encompasses opinions, emotions, sentiments, speculations, and evaluations). The annotated data is available online.


  Tracking Point of View in Narrative
Janyce M Wiebe 03/17/2009
Computational Linguistics Computational Linguistics, Sentiment Analysis
Proposes an algorithm for annotating passages for a character's psychological point of view. Plenty of examples are given of the annotations.


  Multidimensional Text Analysis for eRulemaking
N. Kwon, S. Shulman, E. Hovy 03/15/2009
DGSNA 2006 Sentiment Analysis
The proposed system extracts the semantic structure of arguments (as pertaining to a particular rule/law), and assigns each one a sentiment category (supports the regulation, opposes the regulation, or proposes a new idea).


  Mining and Summarizing Customer Reviews
Minqing Hu, Bing Liu 03/14/2009
KDD 2004 Sentiment Analysis
The authors mine product features, as well as the sentiments expressed about them, from reviews. They identify the features using POS tagging together with association mining (CBA). They then use WordNet's synonym and antonym relations to determine the polarity of the adjectives around the features. The result is a nice feature-specific product summary.


  Opinion Retrieval from Blogs
Wei Zhang, Clement Yu, Weiyi Meng 03/17/2009
CIKM 2007 Sentiment Analysis, Blog Mining
The authors use SVMs for sentiment classification.


  Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering
C. Cardie, Wiebe, Wilson, Litman 03/17/2009
AAAI Symposium on New Directions in Question Answering 2003 Sentiment Analysis, Question Answering
The authors approach question answering as opinion-oriented information extraction. They propose an annotation scheme developed for a "low-level representation of opinions". This is an abstract framework, so it still needs a few specific techniques before it becomes an implementable system.


  Graphical Models in a Nutshell
D. Koller, N. Friedman, L. Getoor, B. Taskar 03/24/2009
(book) Introduction to Statistical Relational Learning Graph Models
This chapter introduces graphical models, mainly Bayesian Networks and Markov Models, and some inference algorithms.


  Scalable Community Discovery on Textual Data with Relations
H. Li, Z. Nie, W. Lee, C. Giles, J. Wen 03/23/2009
CIKM 2008 Social Networks
The paper describes a scalable model for extracting social networks from linked collections. It breaks the link graph into neighborhoods, and uses LDA only on those small sub-sets, thus making it more tractable.


  SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining
Andrea Esuli and Fabrizio Sebastiani 03/23/2009
LREC 2006 Sentiment Analysis
This paper describes the process of making a word list by using WordNet. The result of this process is SentiWordNet, a collection of words annotated on positive/negative and objective/subjective scales.


  On the Effect of Group Structures on Ranking Strategies in Folksonomies
Abel, Henze, Krause, Kriesell 02/23/2009
WWW 2008 Folksonomy, Ranking
This paper uses data from the GroupMe! website, where group assignments and group labels enrich the labeling of various resources (web pages, images, video, etc.). Not surprisingly, the additional information helps in ranking the tags (the folksonomy) in order of relevance to a topic.


  Hierarchical Topic Detection in TDT-2004
Ao Feng, James Allan 02/16/2009
CIIR Technical Report Topic Detection
Hierarchical Topic Detection poses interesting questions: what is a topical hierarchy, and how do you define its levels? This paper addresses these questions and proposes a method based on story agglomeration over time, the premise being that stories about the same topic tend to be close in time.


  UMass at TDT 2004
Connell, Feng, Kumaran, Raghavan, Shah, Allan 02/16/2009
TDT 2004 Workshop Topic Detection
This paper describes the UMass submission to TDT 2004. Experiments in Hierarchical Topic Detection, Topic Tracking, New Event Detection, and Link Detection are described.


  Gender and Emotion in the United States: Do Men and Women Differ in Self-Reports of Feelings and Expressive Behavior?
Robin W Simon, Leda E Nath 02/15/2009
American Journal of Sociology Vol 109 No 5 Sociology, Gender
A study of gender-specific emotional behavior that uses a survey to collect data about the subjects' recent emotional experiences. Such a self-reporting mechanism is a secondary analysis of states that are best observed first-hand. I'd say this is a good motivator for emotional analysis of text such as blogs, which I believe is closer to the moment of emoting than a survey.


  Domain Adaptation with Structural Correspondence Learning
John Blitzer, Ryan McDonald, Fernando Pereira 02/04/2009
EMNLP 2006 Structural Correspondence Learning
The authors propose a structural correspondence learning model that has the notion of pivot features at its heart. The task is to transfer knowledge from one domain (labeled) to another (unlabeled). Pivot features are "features which occur frequently in the two domains and behave similarly in both". Testing is done on two collections, one from the Wall Street Journal and one from MEDLINE. The technique seems to be quite general and applicable to all kinds of features.


  Selecting the right objective measure for association analysis
Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava 02/02/2009
Information Systems (2004) Association Analysis
This is a wonderful overview (and resource) of objective measures for association patterns. The authors propose some general properties for the measures. The measure satisfying the most properties should be the most "fair". Yet one must not forget the requirements of the task at hand. Some of the perkier measures can be useful in special cases. The measures are grouped by their properties, and some techniques are proposed for standardizing data to produce better measurements.


  Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
John Blitzer, Mark Dredze, Fernando Pereira 02/02/2009
ACL 2007 Sentiment Analysis, Inter-Domain Classification
The researchers apply their previously developed structural correspondence learning (SCL) algorithm to sentiment classification, showing significant reductions in classification error over a baseline. They also propose a way of selecting unlabeled data for labeling in order to improve classification accuracy.


  BLEWS: Using Blogs to Provide Context for News Articles
Gamon, Basu, Belenko, Fisher, Hurst, Konig 12/04/2008
ICWSM Sentiment Analysis
This paper, by a team from Microsoft Research and Live Labs, describes their system for extracting social context for political news articles from the blogosphere. There is an interesting bit on duplicate article detection that uses n-grams (with n > 10). The team backed away from actually using sentiment polarity as part of the analysis, providing some convincing quotes from their dataset. Instead, they used an "emotional charge" quantity to show how emotionally agitated the blog writers are about the news article.
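To make the duplicate-detection idea concrete, here is a minimal sketch of n-gram overlap between two articles. The annotation only says that long n-grams (n > 10) are used; the choice of word-level n-grams, the Jaccard overlap score, and the function names `word_ngrams` and `ngram_overlap` are all assumptions made for illustration, not the BLEWS implementation.

```python
def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in a text."""
    words = text.lower().split()
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def ngram_overlap(text_a, text_b, n=11):
    """Jaccard overlap of the two texts' n-gram sets (assumed scoring rule)."""
    grams_a = word_ngrams(text_a, n)
    grams_b = word_ngrams(text_b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Two articles that share long verbatim passages get a score near 1, while independently written articles on the same story rarely share any 11-word sequence, which is presumably why long n-grams suit this task.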


  LETOR: A Benchmark Collection for Learning to Rank for Information Retrieval
Microsoft Research Asia 11/05/2008
draft paper Data Set, Ranking
LETOR is a benchmark dataset developed for Learning to Rank for Information Retrieval. This paper talks about the latest release of LETOR - LETOR3.0. What is great about it is that the features are already extracted for the data, which saves a lot of time for us researchers. On the other hand, it is always iffy to have somebody else mess with your work (and do some computing you have no control over). Still, this is a well-respected dataset. Formatted for easy integration with SVMlight, it is a valuable resource for all of us mining enthusiasts.


  A Statistical Language Modeling Approach to Online Deception Detection
Lina Zhou, Yongmei Shi, Dongsong Zhang 10/29/2008
IEEE Transactions on Knowledge and Data Engineering Statistical Language Models, Fraud Detection
The paper describes the use of Statistical Language Models for detecting online deception. Deception is defined here as "information being intentionally transmitted to cause false conclusion", a narrowing of the typical notion of deception. They use an interesting smoothing method in their model - Kneser-Ney, which seems to work like a discounted window. They use SVMlight (with a linear kernel) as a baseline. They show that the language model outperforms the SVM on unigrams and bigrams, though some values in the results section are curious. They also mention type-token ratio as a way of analyzing the data set.
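As a reminder of what the type-token ratio measures, here is a minimal sketch: the number of distinct words (types) divided by the total number of words (tokens). The tokenization rule (lowercased runs of letters and apostrophes) and the function name `type_token_ratio` are assumptions for illustration; the paper may tokenize differently.

```python
import re


def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for empty input."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

A lower ratio indicates a more repetitive vocabulary. The ratio tends to fall as texts get longer, so it is best compared across texts of similar length.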


  Topic Models and a Revisit of Text-related Applications
Viet Ha-Thuc, Padmini Srinivasan 10/22/2008
PIKM '08 LSI
Viet presented this paper, which is to appear in PIKM'08. The paper addresses two limitations of topic models - limited scalability and inability to model relevance. The scalability issue is addressed using a two-phase topic discovery technique. A new topic model is also presented, in which relevance is modeled by separating the relevant topic from the others.


  Learning SVM Ranking Function from User Feedback Using Document Metadata and Active Learning in the Biomedical Domain
Robert Arens 10/15/2008
ECML Workshop SVM, Active Learning, Ranking
Robert did a wonderful job presenting his paper for our reading group. His research concerns learning from explicit feedback using SVMs. One surprising finding was that simple learning techniques outperform the more complex ones (not to mention having a much better run time). Intriguing future work concerns strategies for dealing with bad user feedback and large collections of unlabeled or sparsely labeled data.


  Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai 10/08/2008
WWW 2007 Sentiment Analysis, LSI
The paper proposes a probabilistic model "to capture the mixture of topics and sentiments simultaneously". They use a mixture of a background topic, several subtopics, and positive and negative sentiment topics. They learn the model distributions for these using an EM algorithm. A Hidden Markov Model is then used to derive the temporal development of the subtopics and their sentiments. This model can be used to rank sentences for topics, categorize sentences by sentiment, and reveal the overall opinions for documents or topics. This approach seems quite general, and offers many possible extensions.


  EigenRank: A Ranking-Oriented Approach to Collaborative Filtering
Nathan Liu, Qiang Yang 10/01/2008
SIGIR 2008 Recommender Systems
The paper proposes a collaborative filtering algorithm based on users' rankings of the items (instead of the customary ratings). This way, it is argued, we capture the true preferences of the users. The Kendall rank correlation coefficient is used for determining similarity between the rankings. The new ranking is compiled using two strategies: (1) a greedy order algorithm and (2) a random walk in a graph of neighbors (thus the connection with PageRank). The experimental data doesn't look too convincing (no mention of significance anywhere), and the random walk algorithm seemed too unstable to be tested only once (no folding was done), but the paper has an interesting concept. It would also have been interesting to see what kind of computation time their algorithms had (because it seemed like they may be very involved). A comparison of computation time to improvement would have been great.
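The similarity step can be illustrated with a small sketch of the Kendall rank correlation between two users, computed over the items both have rated. This is the plain tau-a form, in which tied pairs count as neither concordant nor discordant; the paper may use a different variant, and the function name `kendall_tau` and the dict-based input format are assumptions for illustration.

```python
from itertools import combinations


def kendall_tau(ratings_u, ratings_v):
    """Kendall tau-a between two users over their commonly rated items.

    ratings_u, ratings_v: dicts mapping item -> numeric rating.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return 0.0  # no item pairs to compare
    concordant = discordant = 0
    for i, j in combinations(common, 2):
        # The sign of the product says whether both users order i and j alike.
        product = (ratings_u[i] - ratings_u[j]) * (ratings_v[i] - ratings_v[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / n_pairs
```

The value is 1 when two users order every shared pair of items the same way and -1 when they order every pair oppositely, regardless of the rating scale each user prefers. That scale independence is the appeal of a ranking-oriented similarity over a rating-based one.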


  Opinion Mining and Sentiment Analysis
Bo Pang and Lillian Lee 09/27/2008
Foundations and Trends in Information Retrieval 2008 Sentiment Analysis
A recent overview (2008) of Sentiment Analysis research.


  Mining Newsgroups Using Networks Arising From Social Behavior
Rakesh Agrawal, Sridhar Rajagopalan, Ramakrishnan Srikant, Yirong Xu 09/25/2008
WWW 2003 Sentiment Analysis
This team discovers some interesting characteristics of newsgroups (essentially blogs): "The relationship between the two individuals in the newsgroup network is much more likely to be antagonistic than reinforcing." and "Many authors go off-topic and cause the discussion to drift off to an unrelated subject". They represent the blogs using a graph and use an eigenvector and Kernighan-Lin heuristic on top of spectral partitioning to figure out the bipartitions of the graph into groups of users with similar opinions.


  Latent Semantic Indexing: An Overview
Barbara Rosario 09/27/2008
INFOSYS 240 Spring 2000 LSI
An overview of Latent Semantic Indexing.


  Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.
Dmitri Kalashnikov, Rabia Nuray-Turan, Sharad Mehrotra 09/24/2008
SIGIR 2008 Named Entity Search
The paper presents an approach to Web People Search (WePS) involving a skyline-based classifier. They say this classifier is well-studied, but nobody in our research group has ever heard of it. Using co-occurrence measurements, they train their skyline classifier to maximize two measures: F score and B-cubed. They compare this approach to an SVM (perhaps unfairly, since they have only 8 training features). In the end, their approach is horribly inefficient - it requires almost 35000 queries to a search engine for each name.
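For readers unfamiliar with B-cubed, here is a minimal sketch of the standard B-cubed precision and recall for a clustering, as commonly used to score person-name disambiguation. It follows the usual per-item definition and is not taken from the paper; the function name `b_cubed` and the dict-based input format are assumptions for illustration.

```python
def b_cubed(predicted, gold):
    """Return (precision, recall) averaged over all items.

    predicted, gold: dicts mapping each item to a cluster label.
    """
    items = list(gold)
    if not items:
        return 0.0, 0.0
    precision = recall = 0.0
    for x in items:
        # Items sharing x's predicted cluster and x's true cluster, respectively.
        same_pred = {y for y in items if predicted[y] == predicted[x]}
        same_gold = {y for y in items if gold[y] == gold[x]}
        correct = len(same_pred & same_gold)
        precision += correct / len(same_pred)
        recall += correct / len(same_gold)
    return precision / len(items), recall / len(items)
```

Lumping every page for a name into one cluster gives perfect recall but poor precision, while putting every page in its own cluster does the opposite, so the two values are usually combined into a single harmonic mean.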


  Selecting Good Expansion Terms for Pseudo-Relevance Feedback
Guihong Cao, Jian-Yun Nie, Jianfeng Gao, Stephen Robertson 09/17/2008
SIGIR 2008 Relevance Feedback
The authors show that the commonly held belief that the most frequent terms in the top retrieved documents are relevant to the query is not correct. In fact, there are many non-relevant terms among these. They propose a term classification process to predict the usefulness of terms. Term distribution and co-occurrence with the original query terms are some of the features used in training. They achieve a substantial increase in performance using this technique.


  UIC at TREC 2007 Blog Track
Wei Zhang, Clement Yu 09/10/2008
TREC 2007 Sentiment Analysis
The UIC team developed a three-step algorithm for the opinion task of the Blog track. This task requires the teams to retrieve the documents relevant to a query, rank them as opinionated or objective, and then label the opinionated ones as negative, positive, or neutral. They use an SVM classifier for recognizing opinionated terms, thus providing a means to rate each document appropriately.


Last Updated 05|07|2009