Of the many distributed applications designed for the Internet, the successful ones are those that have paid careful attention to scale and robustness. These applications share several design principles. In this paper, we illustrate the application of these principles to common network monitoring tasks. Specifically, we describe and evaluate 1) a robust distributed topology discovery mechanism and 2) a mechanism for scalable fault isolation in multicast distribution trees. Our mechanisms reveal a different design methodology for network monitoring-one that carefully trades off monitoring fidelity (where necessary) for more graceful degradation in the presence of different kinds of network dynamics.
Read full abstract