JRA1.5 - Definition and Implementation of the Infrastructure Area Work Plan - Task Force Service Monitoring
Mandate and expected results
Understand and document the requirements for service monitoring.
Identify where new features are required in the EMI software stack to meet these requirements.
Produce a work plan, with time-line, for adding these new features.
Guidelines / Hints
Contact EGI to obtain the requirements (Nagios, etc.)
Contact established; information about the Nagios probes and the associated effort is available
EGI presentation at the Prague AHM, followed by a discussion of how EMI proceeds
Investigate off-the-shelf or existing solutions that will meet these requirements.
Hand-over discussions with EGI planned for the end of the week of 2011-02-07
EMI Key Objectives in Context
Short-term: Nagios Probes
Long-term: Investigations of service monitoring in EMI (to be refined)
Information about NAGIOS Probes from EGI (Emir)
During the preparation phase of EMI and EGI it was clearly agreed that the development of Nagios probes, since these require expertise about a specific software component, would be the responsibility of EMI. EGI would be responsible for the development and maintenance of the rest of the framework needed to collect and display monitoring results. In addition, I quote the relevant section of the EMI DoW: "EMI will investigate and adapt off-the-shelf solutions and develop sensors to be plugged in industry standard monitoring tools, such as Nagios and standard CIM-based tools."
During EGEE, the org.sam probes were developed and maintained by the SAM team. The funding did not disappear; the responsibility was shifted to the service developers, i.e. EMI. Regarding the developer leaving, we were simply lucky that the person was around for the first 8 months of the EGI project, so he practically volunteered to maintain these probes.
However, this was never supposed to be a long-term solution, as it is not in our work plans.
Effort estimations: During EGEE, the SAM team allocated 1 FTE for the development of probes. However, this covered developing probes for all services (CE, SRM, WMS) from scratch. As the probes now exist, the effort should be smaller, since it will cover only maintenance and the changes caused by changes in the monitored services.
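To give a sense of the size of a single probe when discussing the effort, below is a minimal sketch of a Nagios-style check. It is not one of the org.sam probes; the function names and the plain TCP connect test are assumptions made for illustration only. The only fixed convention it relies on is the standard Nagios plugin interface: one line of status text on standard output and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Hypothetical check: can a TCP connection be opened to host:port?

    Returns a (status, message) pair. A real probe for a CE, SRM or WMS
    would run a service-specific test here instead.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:  # includes timeouts and refused connections
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv):
    """Parse HOST and PORT, print one status line, return the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_service.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"UNKNOWN - invalid port: {argv[2]}")
        return UNKNOWN
    status, message = check_tcp(argv[1], port)
    print(message)
    return status
```

When installed as a script, the entry point would be `sys.exit(main(sys.argv))`, so that Nagios receives the status as the process exit code. Most of the maintenance effort mentioned above would go into the service-specific test inside the check function, which has to follow changes in the monitored service.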