Crawling the Web for Recent & Relevant Information
Filippo Menczer
Department of Management Sciences
University of Iowa
Monday, April 1
4:30-5:20pm,
105 MLH
Abstract
Focused crawlers are receiving much attention both as a way to address the
scalability limitations of current search engine technology and as a way to
add useful context that drives the crawl on the basis of user interests,
queries, or topical subjects.
In this talk I will describe ongoing research aimed at studying the various
cues available to intelligent Web crawling and searching agents, and at
designing, evaluating, and deploying such agents for a number of
applications. I will present results of topological measurements that draw
connections between Web similarity metrics based on lexical content, link
analysis, and semantic relatedness. I will then describe a class of crawling
algorithms that aim to exploit both lexical and linkage cues to find
relevant and recent pages, and demonstrate one such publicly available
real-time crawler called MySpiders. A number of tasks, collections, and
performance measures are proposed to systematically evaluate the
effectiveness, efficiency, and scalability of crawling algorithms in a fair
way under limited space constraints. After outlining several experimental
results, I will conclude with a look at future opportunities stemming from
distributed peer-to-peer models for crawling and searching.
Part of this work is joint with Padmini Srinivasan and Gautam Pant,
and is supported by an NSF CAREER Award.
Dr. Filippo Menczer is an Assistant Professor in the Department of
Management Sciences at the University of Iowa, where he teaches courses in
information systems. After receiving his Laurea in Physics from the
University of Rome in 1991, he was affiliated with the Italian National
Research Council. In 1998 he received a dual Ph.D. in Computer Science and
Cognitive Science from the University of California at San Diego. Dr.
Menczer has been the recipient of Fulbright, Rotary Foundation, NATO, and
Santa Fe Institute fellowships, among others.
The Adaptive Agents Research Group, led by Dr. Menczer, pursues
interdisciplinary research projects ranging from ecological theory to
distributed information systems; these contribute to artificial life,
agent-based computational economics, evolutionary computation, neural
networks, machine learning, and adaptive intelligent agents for Web, text,
and data mining.