A good monitoring system should have the following characteristics:

Universal Monitoring

We don’t want to implement one monitoring solution for Linux servers, another for network equipment, another for hardware, and so on. We need a single monitoring solution that covers almost all of our IT systems and services. A good monitoring system should provide a plugin framework for monitoring various services and devices. For example, it should be able to monitor operating systems (*nix, Windows, etc.), system resources (CPU, disk, swap, processes, etc.), and network equipment (switches, routers, VPNs, firewalls, UTLs, PLCs, etc.).
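
A plugin framework lets one monitoring core run many independent checks. The following is a minimal sketch of that idea. It assumes the widely used Nagios-style result codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The names `plugin`, `run_check`, `check_ping` and `check_disk` are hypothetical, and the two checks are placeholders that return fixed results instead of querying a real device.

```python
from enum import IntEnum
from typing import Callable, Dict, Tuple


class Status(IntEnum):
    # Values follow the common Nagios-style plugin result codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# A check takes a target (host, switch, PLC, ...) and returns (status, message).
Check = Callable[[str], Tuple[Status, str]]

# Hypothetical registry: one monitoring core, many pluggable checks.
PLUGINS: Dict[str, Check] = {}


def plugin(name: str) -> Callable[[Check], Check]:
    """Decorator that registers a check function under a plugin name."""
    def register(func: Check) -> Check:
        PLUGINS[name] = func
        return func
    return register


@plugin("ping")
def check_ping(target: str) -> Tuple[Status, str]:
    # Placeholder: a real plugin would send ICMP echo requests to the target.
    return Status.OK, f"{target} is reachable"


@plugin("disk")
def check_disk(target: str) -> Tuple[Status, str]:
    # Placeholder: a real plugin would query disk usage on the target.
    return Status.WARNING, f"{target}: disk 85% full"


def run_check(name: str, target: str) -> Tuple[Status, str]:
    """Run a registered plugin; unknown plugins and crashes map to UNKNOWN."""
    check = PLUGINS.get(name)
    if check is None:
        return Status.UNKNOWN, f"no plugin named {name!r}"
    try:
        return check(target)
    except Exception as exc:  # a failing plugin must not take down the core
        return Status.UNKNOWN, f"{name} failed: {exc}"
```

Supporting a new kind of device then only requires registering another check function. The core scheduling and alerting code does not change.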

Efficient Alert Notifications

The system should let us assign individuals (or groups) as owners of a system or service, so that the owners are the ones who get notified and can take action. It should be able to send notifications through various methods (email, pager, SMS, IM, etc.) and let us set warning and critical alert thresholds for the monitored systems and services. It should also offer granular options to specify how often each system is checked, how many retries to attempt on failure, how many failure notifications to send, which notification methods to use, etc.
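
This sketch combines warning/critical thresholds, retries and a notification limit. It is an illustration under assumed semantics, not any particular product's behavior. A problem is only reported after `max_attempts` consecutive non-OK results, and at most `max_notifications` messages are sent per problem. All class, field and function names are hypothetical, and the "send" step just returns message strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServiceConfig:
    # Field names are illustrative, not taken from a specific product.
    warning: float               # value at which the service is WARNING
    critical: float              # value at which the service is CRITICAL
    max_attempts: int = 3        # non-OK results in a row before notifying
    max_notifications: int = 2   # cap on messages per problem
    owners: List[str] = field(default_factory=list)


@dataclass
class ServiceState:
    failures: int = 0
    notifications_sent: int = 0


def classify(value: float, cfg: ServiceConfig) -> str:
    if value >= cfg.critical:
        return "CRITICAL"
    if value >= cfg.warning:
        return "WARNING"
    return "OK"


def process(value: float, cfg: ServiceConfig, state: ServiceState) -> List[str]:
    """Return the notification messages to send for one check result."""
    status = classify(value, cfg)
    if status == "OK":
        # Recovery: reset counters so the next problem starts fresh.
        state.failures = 0
        state.notifications_sent = 0
        return []
    state.failures += 1
    if state.failures < cfg.max_attempts:
        return []  # still retrying; may be a transient spike
    if state.notifications_sent >= cfg.max_notifications:
        return []  # notification limit reached for this problem
    state.notifications_sent += 1
    return [f"to {owner}: {status} ({value})" for owner in cfg.owners]
```

With `max_attempts=2`, a single CPU spike is absorbed silently, and only the second consecutive bad reading notifies the owners.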

Web Dashboard

A web dashboard that shows the overall health, issues, and alerts for all systems across the network, with the ability to drill down to individual hosts (and services).
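
One common way to build such a dashboard is a "worst status wins" roll-up. Each host shows its worst service, and the network shows its worst host. The sketch below assumes only three states (OK < WARNING < CRITICAL) and a nested dictionary of results. Both the data layout and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed severity order for the roll-up.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}

# host -> service -> status
Results = Dict[str, Dict[str, str]]


def worst(statuses: Iterable[str]) -> str:
    """Most severe status in the collection; an empty collection counts as OK."""
    return max(statuses, key=SEVERITY.__getitem__, default="OK")


def host_health(results: Results) -> Dict[str, str]:
    """Drill-down level: one status per host (its worst service)."""
    return {host: worst(services.values()) for host, services in results.items()}


def overview(results: Results) -> Dict[str, object]:
    """Top level: overall status plus the number of hosts in each state."""
    per_host = host_health(results)
    return {
        "overall": worst(per_host.values()),
        "hosts_by_status": dict(Counter(per_host.values())),
    }
```

The top-level view answers "is anything wrong?", and `host_health` (or the raw per-service results) provides the drill-down.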

Issue Escalation

It should be able to notify managers when the owner of a system does not act on an issue within a certain time period. For example, if a PLC crashes and the person responsible doesn’t fix it within a reasonable time, the monitoring system should alert the manager.
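
Time-based escalation can be expressed as an ordered chain of (delay, recipient) pairs. The sketch below assumes that an acknowledgement by the owner stops further escalation. The chain contents, role names and function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Hypothetical escalation chain: (delay after the problem starts, who to notify).
ESCALATION: Sequence[Tuple[timedelta, str]] = (
    (timedelta(0), "plc-owner"),
    (timedelta(minutes=30), "plant-manager"),
    (timedelta(hours=2), "it-director"),
)


@dataclass
class Problem:
    started: datetime
    acknowledged: bool = False  # set when the owner starts working on it


def recipients(
    problem: Problem,
    now: datetime,
    chain: Sequence[Tuple[timedelta, str]] = ESCALATION,
) -> List[str]:
    """Everyone who should have been notified about the problem by `now`."""
    if problem.acknowledged:
        # The owner is acting on it, so don't escalate past the first level.
        return [chain[0][1]]
    elapsed = now - problem.started
    return [who for delay, who in chain if elapsed >= delay]
```

In the PLC example, the owner is notified immediately. The manager is added if the problem is still unacknowledged after 30 minutes.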

Distributed Monitoring and Scalability

It should be capable of monitoring thousands of servers and services without excessive overhead. It should also support distributed monitoring, with multiple monitoring systems across the enterprise reporting to a central monitoring server.
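
In a distributed setup, remote pollers ("satellites") run the checks locally and push batches of results to the central server. The server merges them and notices when a satellite goes quiet. This is a sketch under assumed design choices: the newest timestamp wins, and a satellite counts as stale after a fixed silence. The class and field names are hypothetical, and transport (HTTP, message queue, etc.) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Result:
    satellite: str    # which remote poller produced the result
    host: str
    service: str
    status: str
    timestamp: float  # seconds since the epoch, as reported by the satellite


class CentralServer:
    """Merges check results pushed by many satellite pollers."""

    def __init__(self, stale_after: float = 300.0) -> None:
        self.stale_after = stale_after
        self.latest: Dict[Tuple[str, str], Result] = {}
        self.last_seen: Dict[str, float] = {}

    def receive(self, batch: List[Result]) -> None:
        for r in batch:
            key = (r.host, r.service)
            current = self.latest.get(key)
            # Keep only the newest result; late or duplicate batches are ignored.
            if current is None or r.timestamp > current.timestamp:
                self.latest[key] = r
            previous = self.last_seen.get(r.satellite, 0.0)
            self.last_seen[r.satellite] = max(previous, r.timestamp)

    def stale_satellites(self, now: float) -> List[str]:
        """Satellites that have not reported recently (a possible outage)."""
        return sorted(
            s for s, t in self.last_seen.items() if now - t > self.stale_after
        )
```

Because the satellites do the actual checking, the central server only merges results. This is what keeps its overhead low as the number of monitored hosts grows.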

Reporting

It should generate various monitoring reports for administrators, for example availability, trending, and notification reports. It should also provide daily, weekly, monthly, or custom date range analysis of monitoring statistics.
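
An availability report boils down to measuring how long a service spent in each state during the reporting window. The sketch below computes this from a time-sorted list of state changes. It assumes the state before the first recorded change is UNKNOWN. The function name and input format are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Change = Tuple[datetime, str]  # (time the state changed, new status)


def availability(changes: List[Change], start: datetime, end: datetime) -> Dict[str, float]:
    """Percentage of the window [start, end) spent in each status.

    `changes` must be sorted by time. The status before the first change
    is treated as UNKNOWN (an assumption for this sketch).
    """
    totals: Dict[str, timedelta] = {}
    status = "UNKNOWN"
    cursor = start
    for when, new_status in changes:
        if when <= start:
            status = new_status  # already in effect when the window opens
            continue
        if when >= end:
            break
        totals[status] = totals.get(status, timedelta(0)) + (when - cursor)
        status, cursor = new_status, when
    totals[status] = totals.get(status, timedelta(0)) + (end - cursor)
    window = end - start
    return {s: 100 * d / window for s, d in totals.items()}
```

Daily, weekly, monthly and custom-range reports are the same call with different `start`/`end` values, for example `availability(changes, day, day + timedelta(days=7))` for a week.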

External Application Integration

It should provide a framework (or API) that external applications can use to update the current status of a monitored system or service. It should also be documented in enough detail for external vendors to integrate their solutions with the monitoring software. The more extensible the framework is, the more vendors will provide integrations and the more companies will use the software, which in turn helps make it more robust.
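
An external status API usually amounts to accepting a small, validated message and updating the monitoring core's view of a service. The sketch below assumes a hypothetical JSON schema with `host`, `service`, `status` and optional `message` fields. It keeps state in an in-memory dictionary. In practice this function would sit behind an HTTP endpoint or similar, which is not shown.

```python
import json
from typing import Dict, Tuple

VALID_STATUSES = {"OK", "WARNING", "CRITICAL", "UNKNOWN"}

# Hypothetical in-memory view of current state: (host, service) -> (status, message)
STATE: Dict[Tuple[str, str], Tuple[str, str]] = {}


def submit_status(payload: str) -> Tuple[bool, str]:
    """Accept a JSON status update from an external application.

    Assumed schema: {"host": str, "service": str, "status": str, "message": str}
    """
    try:
        data = json.loads(payload)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "payload must be a JSON object"
    missing = [k for k in ("host", "service", "status") if not data.get(k)]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    status = str(data["status"]).upper()
    if status not in VALID_STATUSES:
        return False, f"unknown status {data['status']!r}"
    STATE[(data["host"], data["service"])] = (status, str(data.get("message", "")))
    return True, "accepted"
```

Strict validation with clear error messages is what makes such an API practical for third-party vendors to integrate against.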

Open Source Solution

Since we’ll be exposing all our mission-critical systems to the monitoring software, we must be sure we can trust it. The source code of open source solutions can be reviewed by the community for potential security issues, and widely used projects are typically well tested. Look at the software’s track record: how many years it has been on the market (the longer, the better) and how many companies are using it (the more, the better).

Community and Commercial Support

When implementing monitoring on a large scale (thousands of servers), we might want a solution that is officially supported and backed by a company. Several open source monitoring solutions are backed by companies that provide commercial support. Even if we don’t use commercial support initially, we might want it when we expand our monitoring footprint.

Easy to Learn and Use

This might be obvious to some of us, but you’d be surprised how many people end up implementing a system that is very hard to learn and use. Don’t overlook this. The monitoring solution should be easy to implement and learn, as simple as that. We should not spend weeks trying to figure out how to get the software implemented and working.

Topic revision: r1 - 2012-12-06 - TonoRiesco
 
