Grid Monitoring Working Group Mandate

The overall goal of this group is to help improve the reliability of the grid infrastructure, and to provide stakeholders with views of the infrastructure that allow them to understand the current and historical status of the service. The group must pull together the existing relevant work on monitoring and provide a coherent path forward, along with a workplan. It should propose and build dashboards that provide customized views for the various stakeholders.

Stakeholders

  • Grid site administrators
  • Grid service managers and grid operators
  • VOs
  • Grid Project management

Scope

  • While there is a vast amount of monitoring data produced, this working group will focus on the data relevant to improving the understanding and reliability of the grid service. Accounting of system resources will be considered out of scope.
  • The group should not cover local fabric monitoring per se, but should work with site admins and the System Management Working Group to propose mechanisms for integrating grid service monitors with local site monitoring, for the benefit of site managers.
  • Monitoring from the point of view of the applications is out of scope for this group and is addressed by the System Analysis WG. Good communication with that group must be maintained, however, as many underlying tools and sources of information will be common. It is important to avoid duplication of effort and multiple monitors of the same data.

Goals

  • Agree on common definitions for sensors and metrics that describe the current state of a grid service.
  • Describe the interface between a site and the grid monitoring fabric, in order to allow sites within different grid infrastructures to publish and consume the monitoring data.
  • Capture the current state of the various monitoring archival repositories, and describe the interactions between them. Provide recommendations on how to improve this data interchange.
  • Provide views of the system (“dashboards”) adapted to each of the stakeholder communities; each should provide an overall status summary, allow drill-down to details, and provide historical summary views.
  • Ensure that data integrity is preserved and access is controlled appropriately by defining security requirements within the monitoring infrastructure.
  • Recommend improvements to the overall monitoring architecture for EGEE (and, where possible, for more general collaborative efforts):
    • Separation of sensors, transport mechanism(s), database and schema, visualization
    • Forge collaborations to provide common sensor repository with agreed interfaces
    • Understand how to provide a reliable (and possibly common) transport mechanism (in collaboration with other WGs)
    • Propose specific dashboard developments to visualize multiple data sources.
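
The separation of sensors, transport, storage and visualization recommended above could be sketched roughly as follows. This is only an illustrative sketch and not a design proposed by the group: every name in it (Metric, InMemoryTransport, MetricStore, summarize), the three-value status vocabulary, and the site and service names in the usage note are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Metric:
    """One sensor reading. Field names and status values are assumptions."""
    site: str
    service: str
    status: str      # assumed vocabulary: "OK", "WARNING", "CRITICAL"
    timestamp: int   # seconds since the epoch


class InMemoryTransport:
    """Transport layer stand-in: hands each published metric to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[Metric], None]] = []

    def subscribe(self, handler: Callable[[Metric], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, metric: Metric) -> None:
        for handler in self._subscribers:
            handler(metric)


class MetricStore:
    """Storage layer: keeps the full history per (site, service) pair."""

    def __init__(self) -> None:
        self._history: Dict[Tuple[str, str], List[Metric]] = {}

    def record(self, metric: Metric) -> None:
        self._history.setdefault((metric.site, metric.service), []).append(metric)

    def history(self, site: str, service: str) -> List[Metric]:
        return list(self._history.get((site, service), []))

    def latest(self) -> Dict[Tuple[str, str], Metric]:
        # Most recent reading for each (site, service) pair.
        return {key: max(readings, key=lambda m: m.timestamp)
                for key, readings in self._history.items()}


def summarize(store: MetricStore) -> Dict[str, str]:
    """Visualization layer: worst current status per site (overall summary)."""
    severity = {"OK": 0, "WARNING": 1, "CRITICAL": 2}
    summary: Dict[str, str] = {}
    for (site, _service), metric in store.latest().items():
        if site not in summary or severity[metric.status] > severity[summary[site]]:
            summary[site] = metric.status
    return summary
```

In this sketch a sensor is anything that produces a Metric and passes it to transport.publish(); a store is attached with transport.subscribe(store.record). The summary view reads only from the store, so each layer could be replaced without touching the others, and store.history() gives the per-service detail that a drill-down or historical view would need.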

Non-Goals

  • It is not a goal to develop more monitoring tools, unless a specific need is identified.
  • It is not a goal to replace existing fabric management systems. It may be a goal to identify, in collaboration with the System Management Working Group, a “default” fabric management system for sites which do not already have one.

Chairs

  • James Casey – CERN
  • Ian Neilson – CERN

Anticipated Participation

  • Grid site representatives, Monitoring tool providers, …
Topic revision: r3 - 2007-02-05 - IanRobertNeilson
 