|
Keywords:
TREC 2008, Sentiment Analysis, LSI, Named Entity Search, Relevance Feedback, Recommender Systems,
SVM, Active Learning, Ranking, Statistical Language Models, Fraud Detection, Data Set,
Inter-Domain Classification, Association Analysis, Structural Correspondence Learning, Sociology,
Gender, Topic Detection, Folksonomy, Social Networks, Graph Models, Question Answering, Blog Mining,
Computational Linguistics, Query Composition, Semantic Networks, Privacy, Wikipedia
|
TIP: To find something in particular, just type it into your browser's Find box.
|
|
Mining Correlated Bursty Topic Patterns from Coordinated Text Streams
| |
Wang, Zhai, Hu, Sproat
|
05/05/2009
| |
KDD 2007
|
Blog Mining
| |
The authors mine coordinated text streams for "bursty" topics using a generative
topic model and an EM algorithm. They have some interesting techniques for
correlating the streams and for excluding document-specific terms (noise).
|
|
Discovering Hot Topics in the Blogosphere
| |
Platakis, Kotsakos, Gunopulos
|
04/29/2009
| |
EURECA 2008
|
Blog Mining
| |
The authors use Kleinberg's algorithm for finding bursty terms in hierarchical
data streams. They compare their results to BlogScope, an online system for
the analysis of the blogosphere.
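Kleinberg's burst detection treats a term's arrival rate as the output of a hidden automaton that pays a cost each time it moves into a higher-rate state. Below is a minimal two-state sketch for batched counts, following Kleinberg's batched formulation as I understand it. It is not necessarily the exact variant Platakis et al. use. The function name, the parameter names `s` (rate multiplier) and `gamma` (transition cost scale), and the cap on the bursty rate are my own assumptions.

```python
import math

def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection over batches (sketch).

    r[t]: number of documents in batch t that contain the term
    d[t]: total number of documents in batch t
    Returns one state per batch: 0 = baseline, 1 = bursty.
    """
    n = len(r)
    p0 = sum(r) / sum(d)            # baseline rate of the term
    p1 = min(p0 * s, 0.9999)        # elevated rate (capped below 1; an assumption)
    probs = (p0, p1)
    up_cost = gamma * math.log(n)   # cost of moving from state 0 up to state 1

    def emit(state, rt, dt):
        # Negative log-likelihood of the batch; the binomial coefficient
        # is identical for both states, so it is dropped.
        p = probs[state]
        return -(rt * math.log(p) + (dt - rt) * math.log(1 - p))

    # Viterbi over the two states; the automaton starts in state 0.
    cost = [emit(0, r[0], d[0]), up_cost + emit(1, r[0], d[0])]
    back = []
    for t in range(1, n):
        new_cost, ptr = [], []
        for j in (0, 1):
            # Moving down is free; moving up costs up_cost.
            options = [cost[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            best = min((0, 1), key=lambda i: options[i])
            new_cost.append(options[best] + emit(j, r[t], d[t]))
            ptr.append(best)
        cost = new_cost
        back.append(ptr)

    # Trace back the cheapest state sequence.
    state = min((0, 1), key=lambda j: cost[j])
    states = [state]
    for ptr in reversed(back):
        state = ptr[state]
        states.append(state)
    return states[::-1]
```

For example, `kleinberg_bursts([1, 1, 8, 1, 1], [10] * 5)` marks only the third batch as bursty. That spike is large enough to pay for the cost of entering state 1.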
|
|
On Ranking Controversies in Wikipedia: Models and Evaluation
| |
Vuong, Lim, Sun, Le, Lauw, Chang
|
04/29/2009
| |
WSDM 2008
|
Wikipedia
| |
The paper defines what makes a Wikipedia article controversial, and uses the
article's revision history, as well as the revision histories of its authors,
to rank articles by their degree of controversy.
|
|
Privacy-Enhancing Personalized Web Search
| |
Yabo Xu, Benyu Zhang, Zheng Chen, Ke Wang
|
04/20/2009
| |
WWW 2007
|
Privacy
| |
The paper proposes a way to build user profiles such that the user is able to
specify the amount of information s/he wants to expose to the search engine.
|
|
Extracting Semantic Networks from Text Via Relational Clustering
| |
Stanley Kok and Pedro Domingos
|
04/13/2009
| |
ECML PKDD 2008
|
Semantic Networks
| |
An interesting paper that uses TextRunner to extract facts from text and then
extracts semantic networks using Markov logic.
|
|
Understanding the Relationship between Searchers' Queries and Information Goals
| |
Doug Downey, Susan Dumais, Dan Liebling, Eric Horvitz
|
04/06/2009
| |
CIKM 2008
|
Query Composition
| |
This paper explores users' behavior when they search for information to
fill an "information need", that is, how the queries they issue relate to that need.
The authors find that search engines perform poorly on less frequent queries. They
explore different metrics for the difficulty of queries (and of the underlying
information needs).
|
|
A Meta-Learning Approach for Robust Rank Learning
| |
V. Carvalho, J. Elsas, W. Cohen, J. Carbonell
|
03/30/2009
| |
SIGIR 2008 LR4IR
|
Ranking
| |
The paper proposes a meta-learner that improves ranking results for some algorithms
(perceptron, ListNet), and perhaps even for RankSVM. If the (rather simple) meta-learner
can in fact make simple algorithms as effective as a complicated RankSVM, it could be
of real note.
|
|
Annotating Expressions of Opinions and Emotions in Language
| |
Janyce M Wiebe, Theresa Wilson, Claire Cardie
|
03/19/2009
| |
Language Resources and Evaluation 2005
|
Computational Linguistics, Sentiment Analysis
| |
Describes a detailed annotation of a 10,000-sentence corpus of articles drawn from
the world press. The annotations mark private states (a term that encompasses
opinions, emotions, sentiments, speculations, and evaluations). The annotated data
is available online.
|
|
Tracking Point of View in Narrative
| |
Janyce M Wiebe
|
03/17/2009
| |
Computational Linguistics
|
Computational Linguistics, Sentiment Analysis
| |
Proposes an algorithm for annotating narrative passages with the character whose
psychological point of view they express. Many example annotations are given.
|
|
Multidimensional Text Analysis for eRulemaking
| |
N. Kwon, S. Shulman, E. Hovy
|
03/15/2009
| |
DGSNA 2006
|
Sentiment Analysis
| |
The proposed system extracts the semantic structure of arguments (as pertaining to a
particular rule/law), and determines the sentiment category (supporting the regulation,
opposing the regulation, or proposing a new idea).
|
|
Mining and Summarizing Customer Reviews
| |
Minqing Hu, Bing Liu
|
03/14/2009
| |
KDD 2004
|
Sentiment Analysis
| |
The authors mine product features, as well as the sentiments expressed about them, from
reviews. Using POS tagging and association rule mining (CBA), they identify the features.
Then they use WordNet's synonym and antonym relations to determine the polarity of the
adjectives around the features. They produce a nice feature-specific product summary.
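The polarity step can be sketched as seed-based propagation. An unknown adjective takes the orientation of a known synonym, or the opposite orientation of a known antonym, and this repeats until nothing changes. The small dictionaries below are hypothetical stand-ins for WordNet lookups, and the function name is illustrative. This is a sketch of the idea, not the authors' code.

```python
def propagate_orientation(adjectives, seeds, synonyms, antonyms):
    """Assign +1/-1 orientations to adjectives, starting from seed words.

    seeds:    dict word -> +1 (positive) or -1 (negative)
    synonyms: dict word -> set of synonyms  (toy stand-in for WordNet)
    antonyms: dict word -> set of antonyms  (toy stand-in for WordNet)
    Words that never connect to a known word stay unassigned.
    """
    orientation = dict(seeds)
    changed = True
    while changed:
        changed = False
        for adj in adjectives:
            if adj in orientation:
                continue
            # A known synonym passes on its orientation;
            for syn in sorted(synonyms.get(adj, ())):
                if syn in orientation:
                    orientation[adj] = orientation[syn]
                    break
            else:
                # otherwise, a known antonym passes on the opposite orientation.
                for ant in sorted(antonyms.get(adj, ())):
                    if ant in orientation:
                        orientation[adj] = -orientation[ant]
                        break
            if adj in orientation:
                changed = True
    return orientation
```

The loop repeats because an adjective such as "superb" may only become reachable after its synonym "great" has itself been labeled in an earlier pass.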
|
|
Opinion Retrieval from Blogs
| |
Wei Zhang, Clement Yu, Weiyi Meng
|
03/17/2009
| |
CIKM 2007
|
Sentiment Analysis, Blog Mining
| |
The authors use SVMs for sentiment classification.
|
|
Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering
| |
C. Cardie, Wiebe, Wilson, Litman
|
03/17/2009
| |
AAAI Symposium on New Directions in Question Answering 2003
|
Sentiment Analysis, Question Answering
| |
The authors approach question answering as an opinion-oriented information extraction task.
They propose an annotation scheme developed for "low-level representation of opinions". This
is an abstract framework, so a few specific techniques are still needed to turn it into an
implementable system.
|
|
Graphical Models in a Nutshell
| |
D. Koller, N. Friedman, L. Getoor, B. Taskar
|
03/24/2009
| |
(book) Introduction to Statistical Relational Learning
|
Graph Models
| |
This chapter introduces graphical models, mainly Bayesian networks and Markov networks,
and some inference algorithms.
|
|
Scalable Community Discovery on Textual Data with Relations
| |
H. Li, Z. Nie, W. Lee, C. Giles, J. Wen
|
03/23/2009
| |
CIKM 2008
|
Social Networks
| |
The paper describes a scalable model for extracting social networks from linked
collections. It breaks the link graph into neighborhoods and applies LDA only to
those small subsets, which makes the problem more tractable.
|
|
SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining
| |
Andrea Esuli and Fabrizio Sebastiani
|
03/23/2009
| |
LREC 2006
|
Sentiment Analysis
| |
This paper describes the process of building a lexical resource from WordNet. The
result is SentiWordNet, in which each WordNet synset is annotated with positivity,
negativity, and objectivity scores, placing it on positive/negative and
objective/subjective scales.
|
|
On the Effect of Group Structures on Ranking Strategies in Folksonomies
| |
Abel, Henze, Krause, Kriesell
|
02/23/2009
| |
WWW 2008
|
Folksonomy, Ranking
| |
This paper uses data from the GroupMe! website, exploiting group assignments and group
labels to enrich the labeling of various resources (web pages, images, videos, etc.).
Not surprisingly, the additional information helps in ranking the tags (the
folksonomy) in order of relevance to a topic.
|
|
Hierarchical Topic Detection in TDT-2004
| |
Ao Feng, James Allan
|
02/16/2009
| |
CIIR Technical Report
|
Topic Detection
| |
Hierarchical Topic Detection poses an interesting question: what is a topical
hierarchy? How do you define its levels? This paper addresses these questions
and proposes a method based on story agglomeration over time, the premise being
that stories about the same topic tend to be close in time.
|
|
UMass at TDT 2004
| |
Connell, Feng, Kumaran, Raghavan, Shah, Allan
|
02/16/2009
| |
TDT 2004 Workshop
|
Topic Detection
| |
This paper describes the UMass submission to TDT 2004. Experiments in Hierarchical
Topic Detection, Topic Tracking, New Event Detection, and Link Detection are
described.
|
|
Gender and Emotion in the United States: Do Men and Women Differ in
Self-Reports of Feelings and Expressive Behavior?
| |
Robin W Simon, Leda E Nath
|
02/15/2009
| |
American Journal of Sociology Vol 109 No 5
|
Sociology, Gender
| |
A study of gender-specific emotional behavior that uses a survey to collect
data about the subjects' recent emotional experiences. Such self-reporting
is a secondary account of states that are best observed first-hand.
I'd say this is a good motivator for emotional analysis of text such as blogs,
which I believe is closer to the moment of emoting than a survey.
|
|
Domain Adaptation with Structural Correspondence Learning
| |
John Blitzer, Ryan McDonald, Fernando Pereira
|
02/04/2009
| |
EMNLP 2006
|
Structural Correspondence Learning
| |
The writers propose a structural correspondence learning model that has a
notion of pivot features at its heart. The task is to transfer knowledge from
one domain (labeled) to another (unlabeled). Pivot features are "features which
occur frequently in the two domains and behave similarly in both". Testing is
done on two collections, one from the Wall Street Journal and one from MEDLINE.
The technique seems to be quite general and applicable to all kinds of features.
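Here is a minimal sketch of the first SCL steps, assuming documents are bags of features. The first step chooses pivots that are frequent in both domains. The second builds one auxiliary "does this pivot occur?" task per pivot from unlabeled data. The `min_count` threshold and both function names are assumptions. The later steps, training a linear predictor per pivot and taking an SVD of the stacked weight vectors, are omitted.

```python
from collections import Counter

def select_pivots(source_docs, target_docs, min_count=2):
    """Features that occur at least min_count times in BOTH domains."""
    src = Counter(f for doc in source_docs for f in doc)
    tgt = Counter(f for doc in target_docs for f in doc)
    return sorted(f for f in src if src[f] >= min_count and tgt[f] >= min_count)

def pivot_task(unlabeled_docs, pivot):
    """Auxiliary binary task for one pivot: predict whether the pivot occurs
    in a document from its remaining features (the pivot itself is masked)."""
    X = [[f for f in doc if f != pivot] for doc in unlabeled_docs]
    y = [int(pivot in doc) for doc in unlabeled_docs]
    return X, y
```

These auxiliary tasks need no labels, so they can use unlabeled text from both domains. That is what lets SCL relate non-pivot features across domains.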
|
|
Selecting the right objective measure for association analysis
| |
Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava
|
02/02/2009
| |
Information Systems (2004)
|
Association Analysis
| |
This is a wonderful overview (and resource) of objective measures for association
patterns. The authors propose some general properties for the measures. The measure
satisfying the most properties should be the most "fair". Yet one must not forget
the requirements of the task at hand. Some of the more specialized measures can be useful
in special cases. The measures are grouped by their properties, and some techniques
are proposed for standardizing data to produce better measurements.
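Most of the measures surveyed are functions of the 2x2 contingency table for two items A and B. The sketch below computes a handful of common ones. This selection, and the function name, are my own choices; the paper covers many more measures.

```python
import math

def association_measures(f11, f10, f01, f00):
    """A few objective measures for the pattern {A, B}, computed from the 2x2
    contingency table: f11 = both A and B, f10 = A without B,
    f01 = B without A, f00 = neither."""
    n = f11 + f10 + f01 + f00
    a = f11 + f10          # transactions containing A
    b = f11 + f01          # transactions containing B
    not_a = f01 + f00
    not_b = f10 + f00
    return {
        "support": f11 / n,
        "confidence": f11 / a,                     # estimate of P(B | A)
        "lift": n * f11 / (a * b),                 # P(A,B) / (P(A) P(B))
        "phi": (f11 * f00 - f10 * f01) / math.sqrt(a * b * not_a * not_b),
        "odds_ratio": (f11 * f00) / (f10 * f01) if f10 * f01 else math.inf,
        "cosine": f11 / math.sqrt(a * b),
        "jaccard": f11 / (a + b - f11),
    }
```

With `association_measures(20, 5, 5, 70)`, confidence and cosine are both 0.8 while lift is 3.2. This shows how differently scaled the measures are for the same table.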
|
|
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
| |
John Blitzer, Mark Dredze, Fernando Pereira
|
02/02/2009
| |
ACL 2007
|
Sentiment Analysis, Inter-Domain Classification
| |
The researchers in this paper use their previously developed structural
correspondence learning (SCL) algorithm, showing significant reductions in
classification error over a baseline. They also propose a way of selecting
unlabeled data for labeling in order to improve classification accuracy.
|
|
BLEWS: Using Blogs to Provide Context for News Articles
| |
Gamon, Basu, Belenko, Fisher, Hurst, Konig
|
12/04/2008
| |
ICWSM
|
Sentiment Analysis
| |
This paper, by a team from Microsoft Research and Live Labs, describes their
system for extracting social context for political news articles from the
blogosphere. There is an interesting bit on duplicate article detection that
uses n-grams (with n > 10). However, the team backed away from actually using
sentiment polarity in the analysis, citing some convincing quotes from their
dataset. Instead, they used an "emotional charge" measure to show how
emotionally agitated blog writers are about a news article.
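Duplicate detection with long n-grams can be sketched as word shingling. Two articles count as near-duplicates if they share at least one long word sequence. The paper's exact procedure is not described in this note. The functions, the default `n=11`, and the `min_shared` threshold are hypothetical choices consistent with "n > 10".

```python
def word_ngrams(text, n):
    """Set of word n-grams (as tuples) in a lowercased, whitespace-split text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def likely_duplicates(a, b, n=11, min_shared=1):
    """Flag two articles as near-duplicates if they share at least
    min_shared word n-grams of length n. Long n-grams rarely repeat
    by chance, so a shared one is usually a strong signal."""
    return len(word_ngrams(a, n) & word_ngrams(b, n)) >= min_shared
```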
|
|
LETOR: A Benchmark Collection for Learning to Rank for Information Retrieval
| |
Microsoft Research Asia
|
11/05/2008
| |
draft paper
|
Data Set, Ranking
| |
LETOR is a benchmark dataset developed for Learning to Rank for Information
Retrieval. This paper describes the latest release of LETOR, LETOR 3.0.
What is great about it is that the features are already extracted for the
data, which saves us researchers a lot of time. On the other hand, it is
always a bit risky to let somebody else handle part of your work (and do
computations you have no control over). Still, this is a well-respected dataset. Formatted
for easy integration with SVMlight, it is a valuable resource for all of us
mining enthusiasts.
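LETOR's SVMlight-style files have one query-document pair per line. Each line holds a relevance label, a `qid:` field, `index:value` feature pairs, and an optional trailing `#` comment. Below is a small parsing sketch. The helper name is hypothetical, and the sample line is illustrative rather than taken from the dataset.

```python
def parse_letor_line(line):
    """Parse one line of SVMlight-style ranking data:
    <label> qid:<query id> <index>:<value> ... [# comment]"""
    data, _, comment = line.partition("#")
    tokens = data.split()
    label = int(tokens[0])
    qid = tokens[1].split(":", 1)[1]       # "qid:10" -> "10"
    features = {}
    for tok in tokens[2:]:
        index, value = tok.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features, comment.strip()
```

Grouping the parsed lines by `qid` gives the per-query document lists that pairwise and listwise learners consume.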
|
|
A Statistical Language Modeling Approach to Online Deception Detection
| |
Lina Zhou, Yongmei Shi, Dongsong Zhang
|
10/29/2008
| |
IEEE Transactions on Knowledge and Data Engineering
|
Statistical Language Models, Fraud Detection
| |
The paper describes the use of Statistical Language Models for detecting
online deception. Deception is defined here as "information being intentionally
transmitted to cause false conclusion", a narrower notion than the usual one.
They use an interesting smoothing method in their model, Kneser-Ney, which
subtracts a fixed discount from observed n-gram counts and gives the freed
probability mass to a lower-order distribution based on how many distinct
contexts a word appears in. They use SVMlight (with a linear kernel) as a
baseline. They show that the language model outperforms the SVM with both unigram
and bigram features, though some values in the results section are curious.
They also mention the type-token ratio as a way of analyzing the data set.
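The sketch below implements interpolated Kneser-Ney for bigrams, together with the type-token ratio. It is a textbook version with a single discount `D` (0.75 here, an assumed value). It does not handle unknown words, and the paper may use higher orders or different settings. Function names are illustrative.

```python
from collections import Counter, defaultdict

def train_kn_bigram(tokens, discount=0.75):
    """Interpolated Kneser-Ney bigram model; returns prob(w, v) = P(w | v)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter()        # c(v): bigram tokens that start with v
    followers = defaultdict(set)     # distinct words seen after v
    preceders = defaultdict(set)     # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    num_bigram_types = len(bigrams)

    def prob(w, v):
        # Continuation probability: share of bigram types that end in w.
        p_cont = len(preceders[w]) / num_bigram_types
        cv = context_count[v]
        if cv == 0:                  # unseen context: use the lower order only
            return p_cont
        discounted = max(bigrams[(v, w)] - discount, 0) / cv
        backoff_weight = discount * len(followers[v]) / cv
        return discounted + backoff_weight * p_cont

    return prob

def type_token_ratio(tokens):
    """Distinct words divided by total words (0.0 for empty input)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

For a seen context, the discounted counts and the back-off mass together sum to one over the vocabulary. Each discount removed from a bigram count reappears, through `backoff_weight`, as mass spread by `p_cont`.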
|
|
Topic Models and a Revisit of Text-related Applications
| |
Viet Ha-Thuc, Padmini Srinivasan
|
10/22/2008
| |
PIKM '08
|
LSI
| |
Viet presented this paper, which is to appear in PIKM'08. The paper
addresses two limitations of topic models - limited scalability and
inability to model relevance. The scalability issue is addressed using
a two-phase topic discovery technique. A new topic model is also presented,
in which relevance is modeled by separating the relevant topic from the others.
|
|
Learning SVM Ranking Function from User Feedback Using Document
Metadata and Active Learning in the Biomedical Domain
| |
Robert Arens
|
10/15/2008
| |
ECML Workshop
|
SVM, Active Learning, Ranking
| |
Robert did a wonderful job presenting his paper for our reading group. His research
concerns learning from explicit feedback using SVMs. One surprising finding was that
simple learning techniques outperform the more complex ones (not to mention having
much better run times). Intriguing future work concerns strategies for dealing with
bad user feedback and large collections of unlabeled or sparsely labeled data.
|
|
Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs
| |
Qiaozhu Mei, Xu Ling, Matthew Wondra, Hang Su, ChengXiang Zhai
|
10/08/2008
| |
WWW 2007
|
Sentiment Analysis, LSI
| |
The paper proposes a probabilistic model "to capture the mixture of topics and
sentiments simultaneously". They use a mixture of a background topic, several
subtopics, and positive and negative sentiment topics. They learn the model distributions
for these using an EM algorithm. A Hidden Markov Model is then used to model the temporal
development of the subtopics and their sentiments. This model can be used to rank
sentences for topics, categorize sentences by sentiment, and reveal the overall
opinions for documents or topics. This approach seems quite general and is open to
many possible extensions.
|
|
EigenRank: A Ranking-Oriented Approach to Collaborative Filtering
| |
Nathan Liu, Qiang Yang
|
10/01/2008
| |
SIGIR 2008
|
Recommender Systems
| |
The paper proposes a collaborative filtering algorithm based on user ranking
of the items (instead of the customary ratings). This way, it is argued, we capture
the true preferences of the users. The Kendall rank correlation coefficient is used
to determine the similarity between users' rankings. The new ranking is compiled using
two strategies: (1) a greedy order algorithm and (2) a random walk in a graph of
neighbors (hence the connection with PageRank). Although the experimental data doesn't
look too convincing (no mention of significance anywhere), and the random walk algorithm
seemed too unstable to be tested only once (no cross-validation folds were used), the
paper presents an interesting concept. It would also have been interesting to see the
computation times of their algorithms, since they seem computationally involved; a
comparison of computation time against improvement would have been great.
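The user-similarity step can be sketched as Kendall's tau over the items two users have both rated. Each pair of co-rated items counts as concordant if both users order it the same way and discordant otherwise. This version counts tied pairs as neither and normalizes by all pairs (tau-a style). That tie handling is my assumption and may differ from the paper's exact formula.

```python
from itertools import combinations

def kendall_tau(ratings_u, ratings_v):
    """Kendall rank correlation between two users over co-rated items.
    ratings_*: dict item -> rating. Tied pairs count as neither
    concordant nor discordant; returns 0.0 with fewer than 2 common items."""
    common = sorted(set(ratings_u) & set(ratings_v))
    n = len(common)
    if n < 2:
        return 0.0
    concordant = discordant = 0
    for i, j in combinations(common, 2):
        s = (ratings_u[i] - ratings_u[j]) * (ratings_v[i] - ratings_v[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Because only the relative order of items matters, two users who rate on different scales but rank items identically get a similarity of 1.0.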
|
|
Opinion Mining and Sentiment Analysis
| |
Bo Pang and Lillian Lee
|
09/27/2008
| |
Foundations and Trends in Information Retrieval 2008
|
Sentiment Analysis
| |
A recent overview (2008) of Sentiment Analysis research.
|
|
Mining Newsgroups Using Networks Arising From Social Behavior
| |
Rakesh Agrawal, Sridhar Rajagopalan, Ramakrishnan Srikant, Yirong Xu
|
09/25/2008
| |
WWW 2003
|
Sentiment Analysis
| |
The authors discover some interesting characteristics of newsgroups
(essentially blogs): "The relationship between the two individuals
in the newsgroup network is much more likely to be antagonistic than
reinforcing." and "Many authors go off-topic and cause the discussion
to drift off to an unrelated subject". They represent the newsgroup as
a graph and use eigenvector-based spectral partitioning, refined with the
Kernighan-Lin heuristic, to bipartition the graph into groups of users
with similar opinions.
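When most links are antagonistic, the two opinion camps should have most edges crossing between them, which is a max-cut-style objective. The sketch below is a generic spectral heuristic for that setting. It splits nodes by the sign of the eigenvector for the largest Laplacian eigenvalue, computed with plain power iteration. It is not the paper's exact formulation and omits the Kernighan-Lin refinement. The function name and iteration count are assumptions.

```python
def spectral_bipartition(n, edges, iterations=500):
    """Split nodes 0..n-1 into two camps, assuming most edges join opponents.
    Uses the sign pattern of the eigenvector for the LARGEST eigenvalue of
    the graph Laplacian L = D - A, found by power iteration."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    x = [float(i + 1) for i in range(n)]      # arbitrary non-constant start
    for _ in range(iterations):
        # y = L x, where (L x)_i = deg(i) * x_i - sum of x over neighbors of i
        y = [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0:
            break
        x = [v / norm for v in y]
    side_a = {i for i in range(n) if x[i] >= 0}
    return side_a, set(range(n)) - side_a
```

Power iteration works here because the Laplacian is positive semidefinite, so its largest eigenvalue also has the largest magnitude.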
|
|
Latent Semantic Indexing: An Overview
| |
Barbara Rosario
|
09/27/2008
| |
INFOSYS 240 Spring 2000
|
LSI
| |
An overview of Latent Semantic Indexing.
|
|
Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.
| |
Dmitri Kalashnikov, Rabia Nuray-Turan, Sharad Mehrotra
|
09/24/2008
| |
SIGIR 2008
|
Named Entity Search
| |
The paper presents an approach to Web People Search (WePS) involving a skyline-based
classifier. They say this classifier is well-studied, but nobody in our research group
had ever heard of it. Using co-occurrence measurements, they train their skyline classifier
to maximize two measures: F score and B-cubed. They compare this approach to an SVM
(perhaps unfairly, since they have only 8 training features). In the end, their approach
is horribly inefficient: it requires almost 35,000 queries to a search engine for each
name.
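B-cubed scores a clustering one item at a time. Precision for an item is the fraction of its predicted cluster that shares its gold class. Recall is the fraction of its gold class that landed in its predicted cluster. Both are then averaged over items. Below is a minimal sketch using the standard definition; the function name and dict-based input are my own choices.

```python
def bcubed(predicted, gold):
    """B-cubed precision, recall and F for a clustering.
    predicted, gold: dict item -> cluster/class id over the same items."""
    items = list(gold)
    precision = recall = 0.0
    for i in items:
        same_cluster = {j for j in items if predicted[j] == predicted[i]}
        same_class = {j for j in items if gold[j] == gold[i]}
        overlap = len(same_cluster & same_class)
        precision += overlap / len(same_cluster)
        recall += overlap / len(same_class)
    precision /= len(items)
    recall /= len(items)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f
```

Lumping every page for a name into one cluster gives perfect recall but low precision. This is why the two are combined into an F score.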
|
|
Selecting Good Expansion Terms for Pseudo-Relevance Feedback
| |
Guihong Cao, Jian-Yun Nie, Jianfeng Gao, Stephen Robertson
|
09/17/2008
| |
SIGIR 2008
|
Relevance Feedback
| |
The authors show that the commonly held belief that the most frequent terms in the top
retrieved documents are relevant to the query is incorrect. In fact, many of these terms
are not relevant. They propose a term classification process to predict the usefulness
of terms. Term distribution and co-occurrence with the original query terms are among
the features used in training. They achieve a substantial increase in performance using
this technique.
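The contrast can be sketched as follows. The naive baseline just takes the most frequent terms from the pseudo-relevant documents. The paper's approach instead feeds per-term features, such as distribution and co-occurrence with the query, to a trained classifier. The two features below are simplified, hypothetical versions, and the classifier itself is not shown.

```python
from collections import Counter

def candidate_term_features(feedback_docs, query_terms):
    """Two simple features per candidate expansion term (hypothetical):
      tf:   total frequency across the top-ranked (pseudo-relevant) documents
      cooc: fraction of those documents where the term co-occurs with
            at least one original query term
    feedback_docs: list of token lists."""
    query = set(query_terms)
    tf = Counter(t for doc in feedback_docs for t in doc)
    cooc = Counter()
    for doc in feedback_docs:
        words = set(doc)
        if words & query:
            for t in words:
                cooc[t] += 1
    n = len(feedback_docs)
    return {t: {"tf": tf[t], "cooc": cooc[t] / n} for t in tf if t not in query}

def naive_expansion(feedback_docs, query_terms, k=2):
    """The baseline the paper questions: take the k most frequent terms."""
    feats = candidate_term_features(feedback_docs, query_terms)
    return sorted(feats, key=lambda t: -feats[t]["tf"])[:k]
```

In a toy "jaguar" query whose feedback set mixes car and animal pages, the naive method picks "car" purely on frequency. Features like `cooc` give a classifier more evidence to decide whether a term actually relates to the query.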
|
|
UIC at TREC 2007 Blog Track
| |
Wei Zhang, Clement Yu
|
09/10/2008
| |
TREC 2007
|
Sentiment Analysis
| |
The UIC team developed a three-step algorithm for the opinion task of the Blog track.
This task requires the teams to retrieve the documents relevant to a query, rank them
as opinionated or objective, and then label the opinionated ones as negative, positive,
or neutral. They use an SVM classifier for recognizing opinionated terms, thus providing
a means to rate each document appropriately.
|
Last Updated 05/07/2009
|