WLCG Monitoring Working Groups
Background presentation by Ian Bird
Meetings
System Management
One of the problems observed by EGEE and LCG in providing a reliable grid service is the reliability of the local fabric services at participating sites. The System Management Working Group (SMWG) should bring together the existing expertise in different areas of fabric management to build a common repository of tools and knowledge for the benefit of the HEP system managers’ community. The idea is neither to present all possible tools nor to create new ones, but to recommend specific tools for specific problems according to the best practices already in use at sites. Although this group is proposed in order to help improve the reliability of grid sites, the results should be useful to any site running similar local services. The group should improve two areas: tools and documentation.
More information
Grid Service Monitoring
The overall goal of this group is to help improve the reliability of the grid infrastructure, and to provide stakeholders with views of the infrastructure that allow them to understand the current and historical status of the service. The group must pull together the existing relevant work on monitoring and provide a coherent path forward, along with a work plan. It should propose and build dashboards that provide customized views for the various stakeholders.
More information
System Analysis
The overall goal of this working group is to gain understanding of application failures in the grid environment, and to provide an application view of the state of the infrastructure. This view can provide input to grid service monitoring and management, and help gain a better understanding of the behaviour of the system and hence improve reliability and performance.
More information