Tolerating Software Faults in Distributed Systems
Neeraj Mittal
Department of Computer Science, University of Texas at Austin
Thursday, March 14
Abstract
Recent advances in communication technology have led to a rapid
proliferation of distributed systems. For example, a cluster of
servers provided Web coverage of the Sydney Summer Olympics. As
distributed systems evolve from a special case to the commonplace,
ensuring their reliable operation has emerged as an important and
challenging problem. In spite of extensive testing and debugging,
software faults persist even in commercial-grade software. Many
distributed systems, especially those employed in safety-critical
environments, should be able to operate properly even in the presence
of software faults. One important way to tolerate such faults is to
monitor the execution of a distributed system and, on detecting a
fault, initiate the appropriate corrective action.
| ©2005 The University of Iowa, All Rights Reserved. |