Instant messaging using zephyr
Last rev: March 13, 2003
About this document
This document is composed of:
- The Project Mandate description including a Project
Definition Report, namely the:
- Purpose (why make this project?)
- Scope (activities to achieve the objectives)
- Objectives (results, deliverables)
- A Project Plan [PDF] with:
- Milestone Plan (list of intermediate results, decisions)
- Responsibility Chart (list of milestones and relevant Functional authorities,
not individuals)
- Activity Schedules (list of activities and individuals involved to achieve
a milestone)
written in the GDPM (Goal Directed Project Management) method, which allows one
to plan, organise and control projects.
Notes from the life of the project, requirements added along the way and
actions are all included in the graphs and appended to this document in detail.
- Project Status
- Tools for the Manager on Duty (MoD) [Checklist - Procedure - Templates for sending zephyr messages].
- Summary and Conclusions.
- Formal closure of the Project.
The problem
The IT supporter (MoD) has to inform users and service managers very quickly
in case of operational problems and changes.
By targeting the relevant people only, there is a better chance that the information
will be given appropriate attention.
This project is about identifying the classes of information recipients and
the method of reaching them effectively.
Project Mandate
Name of the project: Instant messaging using zephyr.
Client of the project: All users and managers of CERN IT services.
Project Background:
The aim is to send information about service problems only to the community
affected, e.g. only the users registered on the relevant server. The US
group management requested 'zephyr' to be the medium for such announcements.
The situation in August 2002 was that the concept of 'classes' (otherwise
called 'groups' or 'lists') had never been implemented in CERN's zephyr
server, except for the class operations, which covers everyone.
Moreover, even if the classes were defined on the server, every client
(user) would have to subscribe explicitly to a given class before (s)he could
receive zephyr messages. This means the users have to be told to subscribe,
they have to take action (we can't control this) and they have to know
where to subscribe, i.e. they have to know which afs, mail, etc. server
they belong to.
Therefore, the effort and effectiveness of a 'proper' solution have to
be evaluated by the project.
Project Definition:
Purpose:
1. Improve user satisfaction by prompt and relevant
announcements of IT service problems and planned short-term changes.
2. Establish a fast information channel between IT service managers, the
team of Managers on Duty (MoD) in US group, the helpdesk members and the
operations team by mutual reporting on the status of various critical services.
Scope:
1. System tolerance evaluation under various scenarios
of extended zephyr use.
2. Evaluation of necessary user actions to profit from the new scheme.
3. Consultation with the IT service providers and the experiments' computing
supporters about the proposed solution.
4. Provision of the necessary additional configuration parameters (new classes
and/or input files).
5. Handbook preparation for use by IT service managers, MoDs, the helpdesk
members and the operations team.
6. Agreement on a maintenance procedure, ensuring that the new scheme doesn't
fall out of use due to lack of configuration updates.
Objectives:
1. Select the technical solution that requires
least action from the users for maximum benefit, in terms of useful information
received, at an affordable cost.
2. Implement the solution (configuration, scripts, users' lists) that
performs well for the capacity of the present system (zephyr servers).
3. Inform IT users, in meetings where they are present, of the
details of the new medium for targeted announcements.
4. Document the implemented set-up and foresee a maintenance method
for the future.
Project Status on March 10th, 2003
The following information reflects the status of the project on the date its
closure was decided:
The MoD now has a tool (in perl) to send zephyr messages without passing through
the operators. A web-viewable archive is also in place containing all zephyr
messages sent by this MoD tool.
An alternative tool in tcl is also available for copying/pasting and editing
of the template files via a graphical user interface. This tool doesn't archive
the zephyr message automatically.
Relevant README files for sending zephyr are up-to-date in ~moduas/zephyr (afs).
It was agreed that, rather than using zephyr, services would inform the targeted
audience of planned changes and interruptions by email (e.g. development work
on server mailX.cern.ch starting at 18:30 this evening).
Next steps concerning targeting:
- Declare classes per afs group server. This would target, via the
HEPiX scripts, only users belonging to a certain project/experiment when a
problem occurs on their afs server. After consultation with afs experts,
it was decided not to attempt targeting the users of afs home servers due
to the nomadic nature of these users.
- Target individual mailserver users via a list (provided by the mail service
manager) of registered users per mailserver, regularly updated. Implemented
with a daily-updated list, a perl script, a zephyr-text template and a README file
in the ~moduas/zephyr/mailservers directory (afs).
- Target Windows users only with an approach similar to the Unix ones
(HEPiX scripts) via file ZEPHYR.SUB. As agreed with IT-IS operations on 10
Feb 2003 a Zephyr test class was created (name NICEXP on the zephyr server
side) for users on (advanced) Windows client systems (Windows 2000 or XP).
Following this test, NICE login scripts would, possibly, be enhanced to target
users of various groups (decision on granularity being evaluated with IS group):
- profile server
- homedir server
- mail server
- building (extracted from database information)
- division (extracted from database information as above)
Zephyr messages concerning web server problems should not be sent to a limited
set of users as the non-availability of a web site concerns not only the site
owner but also all possible browsing users. This is why such messages will keep
going to the global class operations.
An executive summary of the project is now included in this document, covering:
- Inventory of zephyr messages sent.
- Estimated cost of the project.
- Issues encountered and Conclusions from this experience.
Formal closure of the Project
In a meeting between L.Pregernig (IT/US Group Leader), R.Woolnough (IT/US/UA
Section Leader) and M.Dimou (Zephyr Project Leader) the following decision was
taken by the IT/US leadership:
- Targeting of the appropriate communities will not be pursued (mailserver
users, afs group servers and NICE servers), as it is considered too difficult
to identify the users who may be interested in knowing about a problem, e.g.
if mailX users only receive zephyr about mailX's trouble, other people who
have sent them email messages may wish to know why these haven't been received
yet.
- Given that the zephyr messages will not be aimed at those concerned, a method
should be added to the existing templates, so that users can determine if
they are affected by the announcement, e.g. the afs vos command or the mmm
web page for the afs and mail-related templates.
- The signature of the templates, already approved by the service providers,
should be changed from "IT Manager on Duty on behalf of XYZ Support"
to "IT Manager on Duty on behalf of the Service Provider" with no
mention of the relevant service.
- The template files should be in Word. After this management decision, extensive
testing has shown that zephyr messages of multiple lines cannot be submitted
to a class of users from a Windows PC and the submission can only be
done from Unix (lxplus). Copying and pasting text from a Word file is not
always possible.
- The automatic web-browsable archiving (requested by BIS managers on
September 20th, 2002) should be stopped and
be replaced by a folder of the NICE moduas account. This is because
the present archive contains the zephyr message with date and requesting authority
as seen by the users but additional "metadata", if any, are missing.
People interested in the new archive folder may be told how to access it by
user.relations@cern.ch.
Tools for the MoD
Checklist when sending zephyr
A) When an incident occurs or when an intervention is scheduled, please make
sure you inform the operators (tel. 75011) about your plan to send a Zephyr
message. The message should:
- Be in English and French
- Give a short description of the problem (e.g., interruptions of the central
Web servers) in terms understandable for the general public (e.g. instead
of "problem with xchg2.cern.ch" write "problem with one Exchange mail server")
- Tell which users are affected
- Give a time estimate of the expected duration of the service incident or
a time when the next information update will be provided
B) When a problem is resolved send another message to inform users that the
service is up and running again.
(checklist written by L.Pregernig, v 0.1, 2002-11-22)
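The elements of part A of the checklist can be sketched as a simple completeness check on a draft announcement. This is only an illustration: the Announcement class and its field names are hypothetical and are not part of the MoD tools described in this document.

```python
from dataclasses import dataclass


@dataclass
class Announcement:
    """Hypothetical container for the checklist elements of one zephyr message."""
    text_en: str = ""                  # short description in English
    text_fr: str = ""                  # the same description in French
    affected_users: str = ""           # which users are affected
    duration_or_next_update: str = ""  # expected duration, or time of next update

    def missing_elements(self):
        """Return the names of the checklist elements that are still empty."""
        checks = {
            "English text": self.text_en,
            "French text": self.text_fr,
            "affected users": self.affected_users,
            "time estimate or next update": self.duration_or_next_update,
        }
        return [name for name, value in checks.items() if not value.strip()]


draft = Announcement(
    text_en="Problem with one Exchange mail server",
    affected_users="Users of one Exchange mail server",
)
print(draft.missing_elements())
```

A draft would be ready to send only when missing_elements() returns an empty list.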
Procedure and templates for sending zephyr
The procedure is described in the (afs) ~moduas/zephyr/README file.
Zephyr to classes that concern a large number of users can only be sent from the
'moduas' account due to permission restrictions at the level of the zephyr
servers. Inter-personal zephyr is, of course, possible for everyone.
The templates can be found in the (afs) ~moduas/zephyr/templates directory.
So far, we have ready-to-use templates with text for the following issues:
- Afs problem
- Web server problem
- Planned power-cut announcement
- Virus alert
- Network interruption in the Computer Centre
- Mail problem
- Simba (listbox) problem
- Announcement related to the heating system.
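As a rough sketch of how a template file could be submitted to a class, the example below assumes the standard zwrite client is installed and uses its -c (class) and -s (signature) options, with the message body read from standard input. It is not the zephyr.pl tool mentioned in this document, whose internals are not described here; the function names and the template path argument are hypothetical.

```python
import subprocess


def build_zwrite_command(zephyr_class, signature):
    """Build the argument list for a class-wide zwrite call (assumed CLI usage)."""
    return ["zwrite", "-c", zephyr_class, "-s", signature]


def send_template(template_path, zephyr_class, signature):
    """Send the content of an ASCII template file to all subscribers of a class."""
    with open(template_path, encoding="ascii") as handle:
        text = handle.read()
    # Without -m, zwrite reads the message body from standard input,
    # which keeps the line breaks of a multi-line template intact.
    command = build_zwrite_command(zephyr_class, signature)
    return subprocess.run(command, input=text, text=True, check=True)


command = build_zwrite_command(
    "operations", "IT Manager on Duty on behalf of the Service Provider"
)
print(command)
```

Only the command construction is exercised here; send_template would have to be run from an account with permission to write to the class.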
Appendix
Additional requirements and relevant actions:
Discussions with US group and the IT service providers added the following
requirements:
- September 20th, 2002: BIS managers' requirement to archive zephyr messages
to refer users back, when necessary.
This action is done.
The answer is:
The MoD tool for sending zephyrs automatically adds every zephyr message to
a web-browsable archive. In addition, the same text of the zephyr message is
also delivered to the moduas INBOX, from where additional archiving is possible
in the moduas 'zephyr-archive' mail folder.
- October 7th, 2002: Include resource estimates in selected solution.
This action is now in hand.
Its status is:
M. Dimou regularly updates the GDPM "Activity Schedule" sheets with the
effort invested so far. She also notes and collects estimated figures from
IT service managers who provide data and advice in the course of the project.
- October 23rd, 2002: Make a console, similar to the one of the operators,
for direct zephyr message submission by the MoD.
This is done.
The answer is:
The script /afs/cern.ch/user/m/moduas/zephyr/zephyr.pl for sending
zephyr messages and automatically archiving is in operation for use by the
MoD.
- October 28th, 2002: Pre-define a set of templates per problem type
to facilitate the MoD's task, when sending zephyr messages.
This action is now in hand.
Its status is:
Templates are present in the (afs) ~moduas/zephyr/templates directory in English
and French. Checked and approved by fellow MoDs.
- November 18th, 2002: Investigate occasional zephyr display problems for
Windows users.
This action is done.
The answer is:
M.Dimou edited http://consult.cern.ch/qa/3090.
NICE experts gave advice and users checked that the procedure works indeed.
- November 18th, 2002: Check if an alternative to zephyr for the Windows
platform exists.
This action is done.
The answer is:
The only alternative would be a tool available in Exchange but it requires
that:
1. All Windows users are migrated
2. They have activated the 'interactive messaging tool'.
The conclusion is to stay with zephyr, which is the only cross-platform tool
available today.
- November 18th, 2002: List the limitations of the proposed solution and
the granularity of the applications concerned.
This action is in hand.
The answer is partially in the action list (below). A consolidated list will
be made after the planned implementation of mid-February 2003 (see Project
Status information).
- January 13th, 2003: Send windows-related zephyr messages to NICE users
only.
This action is in hand.
Discussions are going on with T.Smith and Ch. Boissat for appropriate
updates of the zephyr servers and the NICE user profiles.
- January 13th, 2003: Prevent permanent computing equipment from some (not
IT) public auditoria from receiving zephyr.
This action is pending.
M.Dimou to collect the hosts involved and check with relevant services how
their configuration may be changed.
Actions on the milestones, alternative scenarios and
conclusions:
Discussions, in October 2002, within US group, with Batch&Interactive Service
(BIS) managers, Afs experts and a CCSR announcement were activities undertaken
to achieve milestone I1 (IT Services informed, advised and agreed on the Project
Plan). The conclusions are:
- About the zephyr server:
- Create classes per service category, when there is a way to target the
relevant users. However, IT services should populate the classes, i.e.
expect no explicit user action for subscription.
- Use special hooks in the hepix login scripts for automatic user subscription
to the relevant classes (e.g. enhance the present /etc/hepix/cluster/xclients.m
script with additional zctl commands, which create classes and
perform zephyr subscriptions).
- About 'fixed-population' servers (mail):
Services with relatively stable user population (e.g. users on a given mailserver)
can correspond to classes that either:
- Contain a fixed population, registered on the zephyr server from a users'
list, provided by the service manager (to be checked every few months
for updates) or:
- The special hook in the hepix scripts first discovers, at login time,
the mailserver where the user belongs (e.g. run the command 'nslookup
username.mailbox | grep Name') before deciding which class to subscribe
him/her to.
The 1st option (fixed-list) is chosen. Details are being arranged between
M.Dimou, T.Smith (for zephyr server updates) and M.Christaller (for automatic
transmission of class members) for the new classes:
mail5, mail6, mail7, mail8, cernxchg0[1,2,3,4]
NB! For planned changes on any given mailserver, IS group will be sending
individual email messages to the relevant users a few days and hours
before the intervention in addition to the last notice zephyr. This was agreed
between M.Dimou and M.Christaller on November 18th.
- About 'dynamic-population' servers (afs):
Services which are very dynamic (e.g. users on a given Afs-server, now being
re-located, for load-balancing reasons, at a rate of 5-10 home directories/1000/Afs-server/day)
can correspond to:
- The general, existing, class operations, with some hints, when
possible, for the users to find whether they are affected by this information.
Problems on Afs servers hosting experiment and/or project volumes can
be communicated in this general fashion, because they potentially affect
thousands of users all over the globe, who try to access their files without
logging in to any of the CERN interactive services.
- Classes, specific to the afs server, if the special hook in the hepix
scripts can decide, at login time, where the user belongs.
- Concerning user volumes: by running the command 'fs whereis .'
before subscribing him/her.
- Concerning experiment volumes by doing the subscription in group-specific
profiles, e.g. in /afs/cern.ch/group/zp for Atlas users.
- Windows users should not see this type of message, with the exception
of those who run the afs client on their PC. This corresponds to 30-40
users, in November 2002. The implementation of this requirement has not
yet been discussed in detail.
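The fixed-list option chosen above for the mailserver classes can be sketched as follows. The input format (one "username mailserver" pair per line) is an assumption made for illustration; the actual format of the files provided by the mail service is not described in this document, and the user names below are invented.

```python
from collections import defaultdict


def classes_from_user_list(lines):
    """Map each mailserver class name to the sorted list of its registered users."""
    members = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        username, mailserver = line.split()
        # Class names are taken to be the lower-case mailserver names.
        members[mailserver.lower()].add(username)
    return {cls: sorted(users) for cls, users in members.items()}


# Hypothetical extract of a daily-updated list.
sample = ["alice mail5", "bob mail8", "carol mail5", "", "# comment", "dave CERNXCHG02"]
classes = classes_from_user_list(sample)
print(classes)
```

Regenerating this mapping from the daily list would keep each class in step with the service's own registry, with no action required from the users.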
In the immediate future and, specifically, for afs operational problems, the
experts suggested that:
When an afs server experiences some problem (response delays and whatever
else system control scripts measure), an automatic email to mod@cern.ch will
be issued (no human intervention so far). The MoD of the day should immediately
call the afs experts and report this warning for investigation. They will
tell the MoD if it is a true problem and if we need to notify the users (and
which user community is likely to be affected, to make the zephyr message text
as helpful as possible).
Summary and Conclusion
Zephyr messages sent from 2002-09-30 to 2003-02-20. Extract from the archive
and the MoD logbook (for those sent before archiving):
Subject | Requestor | Users affected | Date | Frequency (times/5 months)
tel. exchange | CS group | All | 2003-02-21 | Type: commodities affecting All / Times: 2
router (reminder) | CS group | All | 2003-02-19 | Type: network affecting All / Times: 3
router upgrade | CS group | All | 2003-02-18 | Type: network affecting All / Times: 3
mail8 | mail-service | mail8 | 2003-02-13 | Type: mail affecting One server / Times: 5
afs51 | afs-service | >1000 homes,atlas,opal | 2003-01-29 | Type: afs affecting One server / Times: 2
lxplus7 | BIS-managers | All | 2003-01-28 | Type: BIS affecting All / Times: 2
Office2000 | winservices | NICE | 2003-01-22 | Type: NICE affecting All / Times: 3
Office2000 | winservices | NICE | 2003-01-21 | Type: NICE affecting All / Times: 3
power cut (local) | ST Division | PS zone buildings | 2003-01-17 | Type: power affecting local / Times: 3
Office XP | winservices | Office XP users | 2003-01-14 | Type: NICE affecting subset / Times: 1
disk server move | FIO & DS groups | CASTOR | 2003-01-13 | Type: affecting Mass Storage / Times: 1
XCHG server | mail-service | CERNXCHG02 | 2003-01-09 | Type: mail affecting One server / Times: 5
power cut test | ST Division | All | 2003-01-06 | Type: power affecting All / Times: 4
shutdown | IT-US-MoD | All | 2002-12-20 | Type: BIS affecting All / Times: 2
nameserver DOS | CS group | All | 2002-12-18 | Type: network affecting All / Times: 3
mail solved | mail-service | All | 2002-12-18 | Type: mail affecting All / Times: 5
mail investigating | mail-service | All | 2002-12-18 | Type: mail affecting All / Times: 5
power cut (local) | CS group | WEST zone build.s | 2002-12-13 | Type: power affecting local / Times: 3
afs21 | afs-service | >1000 homes,atlas et al exper. data | 2002-12-11 | Type: afs affecting One server / Times: 2
power cut (CC) | CS group | All | 2002-12-11 | Type: power affecting All / Times: 4
power cut (CC) | CS group | All | 2002-12-10 | Type: power affecting All / Times: 4
YE heating | ST Division | All | 2002-12-09 | Type: commodities affecting All / Times: 2
mail solved | mail-service | All | 2002-12-09 | Type: mail affecting All / Times: 5
mail investigating | mail-service | All | 2002-12-09 | Type: mail affecting All / Times: 5
web13 down | web-service | Pages on web13 | 2002-12-03 | Type: web affecting One server / Times: 7
power cut (local) | ST Division | Prevessin | 2002-11-27 | Type: power affecting local / Times: 3
simba solved | mail-service | Doing login simba | 2002-11-15 | Type: mail affecting One application / Times: 2
simba investigation | mail-service | Doing login simba | 2002-11-15 | Type: mail affecting One application / Times: 2
virus alert | Security | NICE | 2002-11-13 | Type: NICE affecting All / Times: 3
power cut | ST Division | All | 2002-11-08 | Type: power affecting All / Times: 4
XCHG server | mail-service | CERNXCHG02 | 2002-11-08 | Type: mail affecting One server / Times: 5
mail8 | mail-service | mail8 | 2002-11-08 | Type: mail affecting One server / Times: 5
mail8 | mail-service | mail8 | 2002-11-04 | Type: mail affecting One server / Times: 5
mail | mail-service | All | 2002-10-31 | Type: mail affecting All / Times: 5
? | EST-SU | ? | 2002-10-28 | Type: special request affecting ? / Times: 1
web13 | web-service | Pages on web13 | 2002-10-25 | Type: web affecting One server / Times: 7
web4, web11 | web-service | Pages on web4,11 | 2002-10-10 | Type: web affecting One server / Times: 7
web7 | web-service | Pages on web7 | 2002-10-07 | Type: web affecting One server / Times: 7
web4 | web-service | Pages on web4 | 2002-10-03 | Type: web affecting One server / Times: 7
web3 | web-service | Pages on web3 | 2002-10-01 | Type: web affecting One server / Times: 7
web3,web4 | web-service | Pages on web3,4 | 2002-09-30 | Type: web affecting One server / Times: 7
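The "Times" figure in each row appears to be the number of messages of the same type and scope sent over the period. A minimal sketch of such a tally is shown below, using only the web and afs rows of the table; the (date, type, scope) record layout is an assumption for illustration and not the format of the actual archive.

```python
from collections import Counter

# (date, type, scope) records copied from the web and afs rows of the table.
records = [
    ("2003-01-29", "afs", "One server"),
    ("2002-12-11", "afs", "One server"),
    ("2002-12-03", "web", "One server"),
    ("2002-10-25", "web", "One server"),
    ("2002-10-10", "web", "One server"),
    ("2002-10-07", "web", "One server"),
    ("2002-10-03", "web", "One server"),
    ("2002-10-01", "web", "One server"),
    ("2002-09-30", "web", "One server"),
]

# Count messages per (type, scope) pair, ignoring the date.
times = Counter((kind, scope) for _date, kind, scope in records)

for (kind, scope), count in times.most_common():
    print(f"Type: {kind} affecting {scope} / Times: {count}")
```

For these rows the tally reproduces the figures in the table: seven web messages and two afs messages.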
Estimated cost of the project:
Function | Name | Time
Unix users' environment expert | T.Smith | 3 hours consultancy, 2 working days implementation (special entries in HEPiX scripts)
Afs expert | R.Toebbicke | 2 hours consultancy, 0.5 working day implementation (automatic email from afs server in trouble to the MoD)
Mail expert | M.Christaller | 3 hours consultancy, 1 working day implementation (provision of usernames/mailserver files)
NICE expert | Ch.Boissat (advisor A.Pace) | 3 hours consultancy, 2 working days implementation (special entries in NICE user profiles)
Tool development advisor | B.Pollermann | 3 hours consultancy, 2 working days implementation (MoD scripts)
MoD team | All MoDs | 5 hours consultancy
Project manager | M.Dimou | 1 day/week coordination & documentation, 2 working days implementation (MoD scripts)
Key issues encountered and results:
During the investigation:
- There is no real expertise or support for zephyr. On the other hand, the
tool is limited but light, stable and simple.
- There is no alternative to zephyr which can be used today and works across
platforms.
- It is not appropriate to ask the users to take action to subscribe to classes.
- However, it is possible to create and automatically populate classes without
user action.
- The BIS and NICE services can automatically subscribe users to multiple
classes (server, division, building etc).
During the 5-month operation:
- The MoD possesses an easy tool to send zephyr directly.
- The archive (done automatically) became a useful proof of proper information
procedure and a source of useful statistics.
- The checklist raised our awareness of the importance of recording:
- The nature and expected duration of a problem.
- The text in both languages (English and French).
- The coordination between services, operators, MoD, helpdesk, users.
- Getting the information to the users a.s.a.p.
- Keeping track of the requesting authority.
- The templates we now have (as ASCII files in English and French) help us
include all the essential information in the message quickly.
- Experience showed that users don't complain if they receive a message that
doesn't concern them (i.e. when target was too large).
- Users who are not logged on or don't accept zephyr can't receive the messages
(this is valid for both platforms).
- Users out of CERN (maybe half of the community) don't see the messages.
The other information channels, as agreed at FOCUS (newsgroups, tvscreen,
mailing lists), should be used in addition.
- In the course of the project we agreed to email users affected by planned
mailserver changes individually. This is independent of zephyr but a useful
new procedure.
- In the course of the project we agreed to email the MoD automatically when
an afs server is in trouble. This is independent of zephyr but a useful new
procedure.
Conclusions:
- Zephyr is a good tool to get information to the users fast. However, it
is like the radio news: you have to be switched on and it may say things that
don't concern you or that you already know.
- Now that we know we can relatively easily and with no user action implement
targeted classes, we should create them and use them, however:
- The message type most frequently sent (web server problem) cannot be targeted.
On the other hand,
- Users enjoy being informed, even when not concerned, so, we shouldn't worry
too much when impossible to target.
- Zephyr requires a message text with line breaks such that the window has
a reasonable size (~80 chars per line), and no accents in the French text. ASCII
files are appropriate for this; their Word copies are not.
- Overall, this was not an "expensive" project but brought in useful,
streamlined procedures.
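The formatting constraint noted in the conclusions (lines of about 80 characters and no accents) can be sketched with the Python standard library. This is an illustrative helper, not one of the project's tools; the function name and the small ligature table are assumptions.

```python
import textwrap
import unicodedata

# Ligatures have no Unicode decomposition, so they are replaced explicitly.
_LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"})


def to_zephyr_text(message, width=80):
    """Return ASCII-only text wrapped to at most `width` characters per line."""
    replaced = message.translate(_LIGATURES)
    # Decompose accented letters ("é" becomes "e" plus a combining accent),
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", replaced)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Wrap each paragraph separately so that the blank line between
    # the two language versions survives.
    paragraphs = ascii_text.split("\n\n")
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    return "\n\n".join(wrapped)


text = "Problème réseau: arrêt du cœur du système. " * 5 + "\n\nNetwork problem."
result = to_zephyr_text(text)
print(result)
```

Characters that have neither an ASCII decomposition nor an entry in the ligature table are silently dropped, so the output should still be proofread before sending.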
Acknowledgments:
Thanks to Arash Khodabandeh for offering me the GDPM templates as well as to
Christian Boissat, Michel Christaller, Dan Pop, Harry Renshall, Tim Smith, Rainer
Toebbicke, the asis team and my peer Managers on Duty for valuable advice at
every step of this project.
References:
- Zephyr-related documents on the CERN intranet.
- The zephyr service writeup http://consult.cern.ch/writeup/zephyr/main.html
- The zephyr document for the CERN NICE environment http://consult.cern.ch/cnls/229/art_zephyr.html
- Service change Announcements FOCUS October 2000 http://ref.cern.ch/CERN/IT/US/2000/040/