Crawling the Web for Recent & Relevant Information

Filippo Menczer
Department of Management Sciences
University of Iowa

Monday, April 1
4:30-5:20pm, 105 MLH

Abstract

Focused crawlers are receiving much attention, both as a way to address the scalability limitations of current search engine technology and as a way to add useful context that drives the crawl according to user interests, queries, or topics. In this talk I will describe ongoing research aimed at studying the various cues available to intelligent Web crawling and searching agents, and at designing, evaluating, and deploying such agents for a number of applications. I will present results of topological measurements that draw connections between Web similarity metrics based on lexical content, link analysis, and semantic relatedness. I will then describe a class of crawling algorithms that aim to exploit both lexical and linkage cues to find relevant and recent pages, and demonstrate one such publicly available real-time crawler, called MySpiders. A number of tasks, collections, and performance measures are proposed to evaluate the effectiveness, efficiency, and scalability of crawling algorithms systematically and fairly under space constraints. After outlining several experimental results, I will conclude with a look at future opportunities stemming from distributed peer-to-peer models for crawling and searching.
Part of this work is joint with Padmini Srinivasan and Gautam Pant, and is supported by an NSF CAREER Award.

Dr. Filippo Menczer is an Assistant Professor in the Department of Management Sciences at the University of Iowa, where he teaches courses in information systems. After receiving his Laurea in Physics from the University of Rome in 1991, he was affiliated with the Italian National Research Council. In 1998 he received a dual Ph.D. in Computer Science and Cognitive Science from the University of California at San Diego. Dr. Menczer has been the recipient of Fulbright, Rotary Foundation, NATO, and Santa Fe Institute fellowships, among others. The Adaptive Agents Research Group, led by Dr. Menczer, pursues interdisciplinary research projects ranging from ecological theory to distributed information systems; these contribute to artificial life, agent-based computational economics, evolutionary computation, neural networks, machine learning, and adaptive intelligent agents for Web, text, and data mining.
 

Thursday, October 07, 2004, 10:21:31.
 
©2005 The University of Iowa, All Rights Reserved.