Harry Renshall, IT/PDP
The Data Management section of PDP group is responsible for four services of IT division. The /afs/cern.ch/ tree runs on 33 servers under different operating systems. There are about 2000 Gigabytes of total disk space on these systems in a wide variety of configurations, and there are hundreds of thousands of successful accesses per day. This has been built up incrementally over many years, but recently various parts of the server infrastructure have not kept up with the ever-increasing number of clients.
One result of this is recurring periods when some servers become temporarily overloaded and user applications lose contact with them. This is notably seen when many clients run their overnight ASIS updates in the same period. A program of modernisation of the infrastructure has already started and we should improve this situation very soon by making multiple duplicate copies of heavily used volumes spread over the network (this is already done to some extent). Note that failure or congestion on the networks can give the same effect on a client as overload of an AFS server.
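The replication described above can be sketched with the standard AFS volume-administration tool vos (its addsite and release subcommands). The server, partition and volume names below are purely illustrative, and echo stands in for the real commands so the sketch runs without an AFS cell:

```shell
#!/bin/sh
# Sketch: creating read-only replicas of a heavily used AFS volume so that
# client fail-over to another copy is transparent. All names are made up;
# 'echo' replaces the real commands so this is runnable anywhere.
VOLUME=asis.linux                      # hypothetical heavily used volume
for SERVER in afsro1.cern.ch afsro2.cern.ch afsro3.cern.ch; do
    # Register a read-only replication site on each server's /vicepa partition
    echo vos addsite "$SERVER" /vicepa "$VOLUME"
done
# Propagate the current read-write contents to all read-only sites
echo vos release "$VOLUME"
```

Once released, clients resolve the volume to any of the read-only sites, which is what makes a server failure invisible to the end user.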
When the AFS service was built up, disks were more expensive than is now the case, and in particular secure disk space, RAID arrays or mirroring (two copies of a disk), could only be afforded for home directory space. Project space, which is divided into scratch (not backed up) and backed up, is on cheaper raw disk. We now intend over the next year to replace the raw disk of backed-up project space (so-called p. volumes) with secure disk, using fewer and more homogeneous servers to reduce complexity. We will move the non-backed-up space (so-called q. volumes) onto separate raw-disk-only servers. The requests from experiments for AFS project space for next year are about 1 Terabyte, with half to be backed up. By using cheap Linux PCs with internal EIDE disks, 100 GB on each PC, for the non-backed-up space, we can afford to put the rest on secure disk. The same PCs will be used for the multiple duplicate copies, as fail-over to another copy is transparent to the end user.
Finally, the heart of the AFS service, the entry point for all file accesses, is the set of 3 Volume Location Data Base (VLDB) servers. These are currently rather old machines on old FDDI networking and will be replaced early in 2000 by new fast, compact machines on Fast Ethernet doing nothing but function as the VLDB servers. A security feature of AFS is that the TCP/IP addresses of these servers are hardwired into each client in a system file called CellServDB. When we replace our machines this file has to be changed, because two of them will have to have new TCP/IP addresses. For machines running the CERN SUE environment this will be transparent, as SUE will do it. Other machines, e.g. those outside CERN, will have to update the file and reboot. We will issue appropriate warnings. Note that failure to update will not be fatal, as one server will keep its address, but users will get error messages from the AFS client software.
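For illustration, each cell's entry in CellServDB consists of a line beginning with '>' naming the cell, followed by one line per database server giving its address. The fragment below uses made-up addresses (the real CERN VLDB addresses are deliberately not reproduced), and shows one way a client's server list for the cern.ch cell could be inspected:

```shell
# Write a sample CellServDB fragment. The addresses are made up for
# illustration; on a real client the file usually lives at
# /usr/vice/etc/CellServDB.
cat > /tmp/CellServDB.sample <<'EOF'
>cern.ch        #CERN, Geneva, Switzerland
192.0.2.10      #vldb1.cern.ch (illustrative address)
192.0.2.11      #vldb2.cern.ch (illustrative address)
192.0.2.12      #vldb3.cern.ch (illustrative address)
EOF
# Print the database-server addresses listed for the cern.ch cell:
awk '/^>cern\.ch/{in_cell=1; next} /^>/{in_cell=0} in_cell{print $1}' /tmp/CellServDB.sample
```

Because the client trusts only the addresses in this file, replacing a VLDB server at a new address requires editing the file on every client, which is why the change has to be announced in advance.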
Transarc have now been fully taken over by IBM and have revised their licensing strategy. They no longer support new site-wide licences, and the site licence CERN has does not include new Linux or NT workstations, though replacement of an existing UNIX client by Linux or NT is covered. They have so far been unable to tell us how this affects the licences for outside sites which, under the deal, became unlimited for more than three clients. We do not know whether the rule on replacing an existing UNIX box by Linux or NT applies to outside sites, despite having asked. We have also been informed that AFS is now marketed through each country's IBM software sales channel, implying that our centralised licensing is no longer available. In addition, some sites have, at our suggestion, asked Transarc for licences for new clients and obtained better terms than the current CERN-arranged licence.
We will continue to handle the existing licences, but any new requests, from new sites or for new licences at existing sites, must be referred to Transarc. We have e-mail addresses of their worldwide sales coordinators, and they have proved responsive. In addition, any site that fails to pay us its annual fee will have its licence arrangement terminated, and Transarc so informed, at the end of the licence year, normally the 9th of March.
We are running version 3.2 of HPSS on a combination of IBM and Compaq disk and tape servers, and this version is not Y2K certified, though it is believed it would work. We have been planning to upgrade to the Y2K-compliant version 4.1, which also brings increased functionality and performance, after the heavy-ion run ends on 1 December.
The update is rather complicated, as there are prerequisite upgrades to the DCE (Distributed Computing Environment) and the Encina transactional database used by HPSS. The upgrade has already been performed on a parallel test HPSS service we run, so we have a good estimate of the time needed. It is divisional policy to make no significant system changes between 16 December and 15 January that might provoke, or be confused with, Y2K problems, so, with apologies, we are scheduling the necessary stoppages to HPSS before then. Note that many IT staff will be working over the holidays to keep the services running on a best-efforts basis and, in particular, to perform the close-down on 29/30 December and the restart on 2 January.
The DCE upgrades will be done from 1 December as these should be transparent to HPSS users. The Encina data base upgrade will take up to one day and we are scheduling this for Monday 6 December.
The HPSS upgrade will take up to two days and we are scheduling this for Monday 13 and Tuesday 14 December 1999. We will issue appropriate news as the work progresses.
There will hence be no HPSS access (stage -M, rf or hsm commands) during the working day of 6 December, or from 08.30 on the 13th till late afternoon on the 14th of December. Users who know they will need a file during these times should copy it out of HPSS before then.
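As a sketch, saving a local copy of an HPSS-resident file before the stoppage might use the rf remote-file commands mentioned above. Every host, directory and file name here is hypothetical, and echo stands in for the real commands so the lines run anywhere:

```shell
# Hypothetical example of copying a file out of HPSS to local scratch disk
# before the scheduled stoppage. All names are made up; 'echo' replaces the
# real commands so the sketch is runnable outside CERN.
echo rfdir hpsssrv:/hpss/cern.ch/myexpt                           # list the remote directory
echo rfcp hpsssrv:/hpss/cern.ch/myexpt/run.dat /scratch/run.dat   # copy the file out
```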
If these stoppages would seriously damage important work for an experiment and need to be rescheduled please contact the author as soon as possible.
For matters related to this article please contact the author.
Cnl.Editor@cern.ch