Management Overlay Networks (MON)

What’s New

      • May-09-2008: The MON source code is published in the Download section under the Illinois Open Source License.
      • Dec-15-2006: We have designed a new overlay construction algorithm that has better coverage and performance. MON is now listed as one of the infrastructure services running on PlanetLab. We also updated the project webpage and the MON usage webpage.
      • May-16-2006: We have implemented a SQL-like query language that supports more complex queries (e.g., those with a where clause).
      • June-08-2005: A web interface for dynamic PlanetLab status query using MON is now available.
      • June-05-2005: The MON servers are integrated with the CoMon daemons. The query command syntax for avg, top, bottom and histo is also unified.
      • June-03-2005: The MON webpage is created and the first version of monclient is made available online.

People

Introduction

  • Background: Many large-scale applications (e.g., content distribution, name service, publish/subscribe) are being deployed in wide-area environments such as the PlanetLab testbed. To manage such applications effectively, operators need the ability to dynamically query and control the application status, for example, to find all the application nodes that have reported a particular error message in their log files, or to restart all application nodes that are using too much memory. Such dynamic queries focus on unplanned management tasks and provide access to unforeseen data attributes. They are thus complementary to existing continuous monitoring systems.

    Since dynamic status query and control can often be finished in a short time (e.g., minutes to hours, but not years), MON takes a novel on-demand and no-repair approach. An overlay is built from scratch whenever some management tasks need to be executed. The overlay is discarded as soon as the tasks are finished. Since each overlay is only maintained for a short time, there is no need for complex failure repair mechanisms. As a result, MON is extremely simple and unlikely to exhibit unexpected, emergent behaviors.

    In the MON project, we have focused on designing new algorithms for on-demand overlay construction to achieve high coverage, high reliability and low response time. We have also implemented a SQL-like query language and various management commands to make MON practically useful. MON is currently one of the infrastructure services running on PlanetLab, and it provides the ability to dynamically query the PlanetLab status. Below we briefly describe how to build overlays on demand. The detailed usage of MON is described on the MON Usage page. An early version of MON also supported software push. However, we have since removed that component and focused on status query and control.

  • On-demand overlay construction: MON consists of one daemon process running on each distributed node. The daemon process has a three-layer architecture as shown in the figure above. The lowest layer is responsible for (partial) membership maintenance. It enables each node to learn about new nodes and detect node failures. It also allows nodes to measure the network delay between themselves. The middle layer is responsible for on-demand overlay construction. Currently we allow the construction of overlay trees and DAGs (directed acyclic graphs). The top layer is responsible for management command execution.

    On-demand overlay construction is very different from existing P2P overlays, which are built and maintained by handling individual joins and departures. For on-demand overlays, the goal is instead to set up an overlay that connects a set of existing nodes. As a result, membership information is very important to the construction algorithm. Designing an overlay construction algorithm involves the joint design of a membership layer and the overlay construction protocol.

    To achieve high coverage, we have designed several membership layers that self-organize nodes into a loose form of overlay structure. The on-demand protocol can make use of such loose structure to achieve, with high probability, high coverage and low response time. In addition, we have designed several techniques, such as incremental overlay construction and DAG (directed acyclic graph) based opportunistic aggregation, to improve the coverage, reliability and performance of on-demand overlays.

    We note that the separation of overlay construction into a membership layer and an on-demand protocol provides a lot of freedom for novel algorithm design. At one extreme, the membership layer can just use simple gossip protocols for random membership maintenance. This provides little support for the on-demand protocol, so the resulting algorithm may not have good coverage and performance. At the other extreme, the membership layer can maintain a tree structure by itself. This effectively becomes a persistent overlay, and thus may involve complex failure repairs. The best tradeoff is probably somewhere in between, where the membership layer maintains some loosely structured information that is easy to maintain, yet still useful to the on-demand protocol for better overlay construction.
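
The layering described above can be illustrated with a small simulation. The Python sketch below is not the MON implementation or its actual construction algorithm. It assumes the simplest case mentioned in the text (a random partial membership view per node) and a flooding-style protocol in which a node adopts the first sender it hears from as its parent. All function names (build_random_membership, build_tree_on_demand, aggregate_avg) and parameters are hypothetical and chosen only for illustration.

```python
import random
from collections import deque


def build_random_membership(nodes, view_size, rng):
    """Membership layer (assumed): each node knows a random sample of other nodes."""
    return {n: rng.sample([m for m in nodes if m != n], view_size) for n in nodes}


def build_tree_on_demand(root, views, alive):
    """Overlay layer (assumed): flood a session message from the root.

    A live node that hears the message for the first time adopts the sender
    as its parent and forwards the message to the nodes in its own view.
    Failed nodes never answer, so they are simply left out: no repair is done.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in views[node]:
            if peer in alive and peer not in parent:
                parent[peer] = node
                queue.append(peer)
    return parent


def aggregate_avg(parent, values):
    """Command layer (assumed): combine (sum, count) pairs bottom-up along the tree."""
    children = {node: [] for node in parent}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    # Pre-order walk; reversing it visits every child before its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])

    partial = {}
    for node in reversed(order):
        total, count = values[node], 1
        for child in children[node]:
            child_total, child_count = partial[child]
            total += child_total
            count += child_count
        partial[node] = (total, count)

    total, count = partial[root]
    return total / count, count


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = list(range(50))
    views = build_random_membership(nodes, view_size=5, rng=rng)
    alive = set(nodes) - {7, 13}          # two nodes have failed
    tree = build_tree_on_demand(0, views, alive)
    load = {n: float(n % 4) for n in nodes}
    avg, covered = aggregate_avg(tree, load)
    print(f"covered {covered} of {len(alive)} live nodes, average load {avg:.2f}")
    # The tree is discarded here; the next task builds a fresh one.
```

In this sketch, coverage depends entirely on what the membership views contain, which is the reason the text argues for a loosely structured membership layer rather than purely random views. A DAG variant would let a node keep more than one parent, at the cost of having to avoid double counting during aggregation.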

MON Usage

MON provides a web interface to dynamically query the PlanetLab status. A detailed description of the SQL-like query language and a command-line monclient are available at the MON Usage page.

Download

The MON project is available for download under the Illinois Open Source License: Download MON source code

Publications

Jin Liang, Steven Ko, Indranil Gupta and Klara Nahrstedt, “MON: On-demand Overlays for Distributed System Management”, accepted to the 2nd USENIX Workshop on Real, Large Distributed Systems (WORLDS’05).

Jin Liang, Steven Ko, Indranil Gupta and Klara Nahrstedt, “MON: Management Overlay Networks for Distributed Systems”, poster session of SOSP 2005.

Jin Liang, Steven Ko, Indranil Gupta and Klara Nahrstedt, “MON: Design and Implementation of Management Overlay Networks for Distributed Systems”, UIUC Tech Report, April 2005.

Related Projects

Funding Agencies

The MON project is supported by the NSF under Grant number NSF ANI 03-23434 and NSF CAREER grant CNS-0448246. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or US government.