Instant messaging using zephyr

Maria Dimou, CERN/User Services

Last rev: March 13, 2003

About this document

This document is composed of:

The problem

The IT supporter (Manager on Duty, MoD) has to inform users and service managers very quickly in case of operational problems and changes.
By targeting the relevant people only, there is a better chance that the information will be given appropriate attention.
This project is about identifying the classes of information recipients and the method of reaching them effectively.

Project Mandate

Name of the project: Instant messaging using zephyr.
Client of the project: All users and managers of CERN IT services.
Project Background:

The aim is to send information about service problems only to the community affected, e.g. only the users registered on the relevant server. The US group management requested that 'zephyr' be the medium for such announcements.
The situation in August 2002 was that the concept of 'classes' (otherwise called 'groups' or 'lists') had never been implemented on CERN's zephyr server, except for the class operations, which covers everyone. Moreover, even if the classes were defined on the server, every client (user) would have to subscribe explicitly to a given class before (s)he could receive its zephyr messages. This means the users have to be told to subscribe, they have to take action (we can't control this) and they have to know where to subscribe, i.e. they have to know which afs, mail etc. server they belong to.
Therefore, the effort and effectiveness of a 'proper' solution has to be evaluated by the project.
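The subscription burden described above can be illustrated with a minimal sketch. It assumes the conventional zephyr subscription triple (class, instance, recipient) and a hypothetical convention in which each class is simply named after the server; neither the class names nor the function name come from the CERN set-up.

```python
def subscription_lines(servers, instance="*", recipient="*"):
    """Return one 'class,instance,recipient' line per server.

    Assumption: every server has a class of the same name (hypothetical).
    The '*' wildcards mean "any instance" and "messages sent to the class".
    """
    lines = []
    # De-duplicate and sort so the output is stable.
    for server in sorted(set(servers)):
        lines.append(f"{server},{instance},{recipient}")
    return lines


if __name__ == "__main__":
    # A user would first have to know that these are "their" servers.
    for line in subscription_lines(["mail8", "afs51"]):
        print(line)
```

The sketch makes the difficulty visible: the input list is exactly the information (which afs or mail server a user belongs to) that users typically do not know.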

 

Project Definition:

Purpose:

  1. Improve user satisfaction through prompt and relevant announcements of IT service problems and planned short-term changes.
  2. Establish a fast information channel between IT service managers, the team of Managers on Duty (MoD) in US group, the helpdesk members and the operations team by mutual reporting on the status of various critical services.

Scope:

  1. System tolerance evaluation under various scenarios of extended zephyr use.
  2. Evaluation of the user actions necessary to profit from the new scheme.
  3. Consultation with the IT service providers and the experiments' computing supporters about the proposed solution.
  4. Provision of the necessary additional configuration parameters (new classes and/or input files).
  5. Handbook preparation for use by IT service managers, MoDs, the helpdesk members and the operations team.
  6. Agreement on a maintenance procedure, ensuring that the new scheme doesn't fall out of use due to lack of configuration updates.

Objectives:

  1. Select the technical solution that requires the least action from the users for maximum benefit, in terms of useful information received, at an affordable cost.
  2. Implement the solution (configuration, scripts, users' lists) that performs well within the capacity of the present system (zephyr servers).
  3. Disseminate the details of the new medium for targeted announcements in meetings where IT users are present.
  4. Document the implemented set-up and foresee a maintenance method for the future.

Project Status on March 10th, 2003

The following information reflects the status of the project on the date its closure was decided:

The MoD now has a tool (in perl) to send zephyr messages without passing through the operators. A web-viewable archive is also in place containing all zephyr messages sent by this MoD tool.
An alternative tool in tcl is also available for copying/pasting and editing of the template files via a graphical user interface. This tool doesn't archive the zephyr message automatically.
Relevant README files for sending zephyr are up-to-date in ~moduas/zephyr (afs).
It was agreed that, rather than using zephyr, services would inform the targeted audience of planned changes and interruptions by email (e.g. development work on server mailX.cern.ch starting at 18:30 this evening).

Next steps concerning targeting:

Zephyr messages concerning web server problems should not be sent to a limited set of users as the non-availability of a web site concerns not only the site owner but also all possible browsing users. This is why such messages will keep going to the global class operations.
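The targeting rule stated here can be written down as a small sketch. Only the global class name operations comes from this document; the per-server class names are hypothetical, since such classes were never implemented on the CERN zephyr server.

```python
GLOBAL_CLASS = "operations"  # the only class named in this document


def target_class(service_type, server=None):
    """Pick the zephyr class for an announcement.

    Web problems concern every browsing user, so they always go to the
    global class. Other problems could go to a per-server class
    (hypothetical: the class is assumed to carry the server's name).
    """
    if service_type == "web" or server is None:
        return GLOBAL_CLASS
    return server


if __name__ == "__main__":
    print(target_class("web", "web13"))   # global, despite a known server
    print(target_class("mail", "mail8"))  # hypothetical per-server class
```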

An executive summary of the project is now included in this document concerning:

Formal closure of the Project

In a meeting between L.Pregernig (IT/US Group Leader), R.Woolnough (IT/US/UA Section Leader) and M.Dimou (Zephyr Project Leader) the following decision was taken by the IT/US leadership:

Tools for the MoD

Checklist when sending zephyr

A) When an incident occurs or when an intervention is scheduled, please make sure you inform the operators (tel. 75011) about your plan to send a Zephyr message. The message should:

  1. Be in English and French
  2. Give a short description of the problem (e.g., interruptions of the central Web servers) in terms understandable for the general public (e.g. instead of "problem with xchg2.cern.ch" write "problem with one Exchange mail server")
  3. Tell which users are affected
  4. Give a time estimate of the expected duration of the service incident or a time when the next information update will be provided

B) When a problem is resolved send another message to inform users that the service is up and running again.
(checklist written by L.Pregernig, v 0.1, 2002-11-22)
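As an illustration only, the four elements of part A can be checked mechanically before a message goes out. The field names and the function below are assumptions made for this sketch; they are not part of the MoD tools.

```python
from dataclasses import dataclass


@dataclass
class Announcement:
    text_en: str                  # description in plain English
    text_fr: str                  # the same description in French
    affected_users: str           # who is affected
    duration_or_next_update: str  # time estimate or time of next update


def missing_elements(announcement):
    """Return the checklist elements that are still empty."""
    checks = {
        "English text": announcement.text_en,
        "French text": announcement.text_fr,
        "affected users": announcement.affected_users,
        "duration or next update": announcement.duration_or_next_update,
    }
    return [name for name, value in checks.items() if not value.strip()]


if __name__ == "__main__":
    draft = Announcement(
        text_en="Problem with one Exchange mail server",
        text_fr="",
        affected_users="Users of that mail server",
        duration_or_next_update="Next update at 15:00",
    )
    print(missing_elements(draft))
```

Such a check only confirms that each element is present; whether the wording is understandable to the general public still needs a human reader.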

Procedure and templates for sending zephyr

The procedure is described in the (afs) ~moduas/zephyr/README file. Zephyr messages to classes that concern a large number of users can only be sent from the 'moduas' account, owing to a permission restriction at the level of the zephyr servers. Person-to-person zephyr is, of course, possible for everyone.
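For orientation, the sketch below shows how a class-wide message and its archive record might be prepared. It is not the MoD tool, which is written in perl and documented in the README. It assumes the commonly documented zwrite options -c (class), -i (instance) and -m (message text, given last); the instance name and the archive line format are hypothetical. The command is only built, not executed.

```python
from datetime import datetime


def build_zwrite_command(zclass, instance, message):
    """Build (but do not run) a zwrite command line.

    Assumption: zwrite accepts -c CLASS, -i INSTANCE and -m MESSAGE,
    with -m last because the remaining arguments form the message.
    """
    return ["zwrite", "-c", zclass, "-i", instance, "-m", message]


def archive_entry(zclass, message, when):
    """Format one tab-separated archive line (hypothetical format)."""
    return f"{when.isoformat(timespec='seconds')}\t{zclass}\t{message}"


if __name__ == "__main__":
    text = "Problem with one Exchange mail server"
    print(build_zwrite_command("operations", "mail", text))
    print(archive_entry("operations", text, datetime.now()))
```

Keeping the archive record next to the send step mirrors the behaviour reported for the perl tool, where every message sent is also archived.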

The templates can be found in the (afs) ~moduas/zephyr/templates directory. So far, we have ready-to-use templates with text for the following issues:

 

Appendix

Additional requirements and relevant actions:

Discussions with US group and the IT service providers added the following requirements:

Actions on the milestones, alternative scenarios and conclusions:

Discussions in October 2002 within US group and with the Batch & Interactive Service (BIS) managers and Afs experts, together with a CCSR announcement, were the activities undertaken to achieve milestone I1 (IT Services informed, advised and agreed on the Project Plan). The conclusions are:

Summary and Conclusion

Zephyr messages sent from 2002-09-30 to 2003-02-20. Extract from the archive and the MoD logbook (for those sent before archiving):

| Subject | Requestor | Users affected | Date | Frequency (times/5 months) |
|---|---|---|---|---|
| tel. exchange | CS group | All | 2003-02-21 | Type: commodities affecting All / Times: 2 |
| router (reminder) | CS group | All | 2003-02-19 | Type: network affecting All / Times: 3 |
| router upgrade | CS group | All | 2003-02-18 | Type: network affecting All / Times: 3 |
| mail8 | mail-service | mail8 | 2003-02-13 | Type: mail affecting One server / Times: 5 |
| afs51 | afs-service | >1000 homes, atlas, opal | 2003-01-29 | Type: afs affecting One server / Times: 2 |
| lxplus7 | BIS-managers | All | 2003-01-28 | Type: BIS affecting All / Times: 2 |
| Office2000 | winservices | NICE | 2003-01-22 | Type: NICE affecting All / Times: 3 |
| Office2000 | winservices | NICE | 2003-01-21 | Type: NICE affecting All / Times: 3 |
| power cut (local) | ST Division | PS zone buildings | 2003-01-17 | Type: power affecting local / Times: 3 |
| Office XP | winservices | Office XP users | 2003-01-14 | Type: NICE affecting subset / Times: 1 |
| disk server move | FIO & DS groups | CASTOR | 2003-01-13 | Type: affecting Mass Storage / Times: 1 |
| XCHG server | mail-service | CERNXCHG02 | 2003-01-09 | Type: mail affecting One server / Times: 5 |
| power cut test | ST Division | All | 2003-01-06 | Type: power affecting All / Times: 4 |
| shutdown | IT-US-MoD | All | 2002-12-20 | Type: BIS affecting All / Times: 2 |
| nameserver DOS | CS group | All | 2002-12-18 | Type: network affecting All / Times: 3 |
| mail solved | mail-service | All | 2002-12-18 | Type: mail affecting All / Times: 5 |
| mail investigating | mail-service | All | 2002-12-18 | Type: mail affecting All / Times: 5 |
| power cut (local) | CS group | WEST zone buildings | 2002-12-13 | Type: power affecting local / Times: 3 |
| afs21 | afs-service | >1000 homes, atlas et al exper. data | 2002-12-11 | Type: afs affecting One server / Times: 2 |
| power cut (CC) | CS group | All | 2002-12-11 | Type: power affecting All / Times: 4 |
| power cut (CC) | CS group | All | 2002-12-10 | Type: power affecting All / Times: 4 |
| YE heating | ST Division | All | 2002-12-09 | Type: commodities affecting All / Times: 2 |
| mail solved | mail-service | All | 2002-12-09 | Type: mail affecting All / Times: 5 |
| mail investigating | mail-service | All | 2002-12-09 | Type: mail affecting All / Times: 5 |
| web13 down | web-service | Pages on web13 | 2002-12-03 | Type: web affecting One server / Times: 7 |
| power cut (local) | ST Division | Prevessin | 2002-11-27 | Type: power affecting local / Times: 3 |
| simba solved | mail-service | Doing login simba | 2002-11-15 | Type: mail affecting One application / Times: 2 |
| simba investigation | mail-service | Doing login simba | 2002-11-15 | Type: mail affecting One application / Times: 2 |
| virus alert | Security | NICE | 2002-11-13 | Type: NICE affecting All / Times: 3 |
| power cut | ST Division | All | 2002-11-08 | Type: power affecting All / Times: 4 |
| XCHG server | mail-service | CERNXCHG02 | 2002-11-08 | Type: mail affecting One server / Times: 5 |
| mail8 | mail-service | mail8 | 2002-11-08 | Type: mail affecting One server / Times: 5 |
| mail8 | mail-service | mail8 | 2002-11-04 | Type: mail affecting One server / Times: 5 |
| mail | mail-service | All | 2002-10-31 | Type: mail affecting All / Times: 5 |
| ? | EST-SU | ? | 2002-10-28 | Type: special request affecting ? / Times: 1 |
| web13 | web-service | Pages on web13 | 2002-10-25 | Type: web affecting One server / Times: 7 |
| web4, web11 | web-service | Pages on web4,11 | 2002-10-10 | Type: web affecting One server / Times: 7 |
| web7 | web-service | Pages on web7 | 2002-10-07 | Type: web affecting One server / Times: 7 |
| web4 | web-service | Pages on web4 | 2002-10-03 | Type: web affecting One server / Times: 7 |
| web3 | web-service | Pages on web3 | 2002-10-01 | Type: web affecting One server / Times: 7 |
| web3,web4 | web-service | Pages on web3,4 | 2002-09-30 | Type: web affecting One server / Times: 7 |
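The "Times" figures in the last column appear to be plain counts of messages sharing the same type and scope. A minimal sketch of that tally follows, using only the web and network rows of the extract above; the tuple layout and function name are choices made for this illustration.

```python
from collections import Counter

# (subject, type, scope) for the web and network rows of the extract.
MESSAGES = [
    ("web13 down", "web", "One server"),
    ("web13", "web", "One server"),
    ("web4, web11", "web", "One server"),
    ("web7", "web", "One server"),
    ("web4", "web", "One server"),
    ("web3", "web", "One server"),
    ("web3,web4", "web", "One server"),
    ("router (reminder)", "network", "All"),
    ("router upgrade", "network", "All"),
    ("nameserver DOS", "network", "All"),
]


def times_by_type(messages):
    """Count messages per (type, scope) pair."""
    return Counter((mtype, scope) for _subject, mtype, scope in messages)


if __name__ == "__main__":
    for (mtype, scope), times in times_by_type(MESSAGES).items():
        print(f"Type: {mtype} affecting {scope} / Times: {times}")
```

For these rows the tally gives 7 for web (one server) and 3 for network (all), matching the figures in the extract.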

Estimated cost of the project:

| Function | Name | Time |
|---|---|---|
| Unix users' environment expert | T.Smith | 3 hours consultancy; 2 working days implementation (special entries in HEPiX scripts) |
| Afs expert | R.Toebbicke | 2 hours consultancy; 0.5 working day implementation (automatic email from afs server in trouble to the MoD) |
| Mail expert | M.Christaller | 3 hours consultancy; 1 working day implementation (provision of usernames/mailserver files) |
| NICE expert | Ch.Boissat (advisor A.Pace) | 3 hours consultancy; 2 working days implementation (special entries in NICE user profiles) |
| Tool development advisor | B.Pollermann | 3 hours consultancy; 2 working days implementation (MoD scripts) |
| MoD team | All MoDs | 5 hours consultancy |
| Project manager | M.Dimou | 1 day/week coordination & documentation; 2 working days implementation (MoD scripts) |
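The individual estimates above can be totalled as follows. This is a simple sum of the figures listed, not a total stated by the project; the project manager's recurring 1 day/week of coordination and documentation is left out because no project duration is attached to it here.

```python
# (function, consultancy hours, implementation working days), as listed above.
COSTS = [
    ("Unix users' environment expert", 3, 2.0),
    ("Afs expert", 2, 0.5),
    ("Mail expert", 3, 1.0),
    ("NICE expert", 3, 2.0),
    ("Tool development advisor", 3, 2.0),
    ("MoD team", 5, 0.0),
    ("Project manager", 0, 2.0),  # excludes 1 day/week coordination
]


def totals(costs):
    """Return (total consultancy hours, total implementation days)."""
    hours = sum(h for _function, h, _days in costs)
    days = sum(d for _function, _hours, d in costs)
    return hours, days


if __name__ == "__main__":
    hours, days = totals(COSTS)
    print(f"Consultancy: {hours} hours; implementation: {days} working days")
```

With the figures as listed, this gives 19 hours of consultancy and 9.5 working days of implementation.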

Key issues encountered and results:

During the investigation:

During the 5-month operation:

Conclusions:

Acknowledgments:

Thanks to Arash Khodabandeh for offering me the GDPM templates as well as to Christian Boissat, Michel Christaller, Dan Pop, Harry Renshall, Tim Smith, Rainer Toebbicke, the asis team and my peer Managers on Duty for valuable advice at every step of this project.

References:

  1. Zephyr-related documents on the CERN intranet.
  2. The zephyr service writeup http://consult.cern.ch/writeup/zephyr/main.html
  3. The zephyr document for the CERN NICE environment http://consult.cern.ch/cnls/229/art_zephyr.html
  4. Service change Announcements FOCUS October 2000 http://ref.cern.ch/CERN/IT/US/2000/040/