Data Services Group Activities and Plans

Harry Renshall, IT/Data Services


Abstract

This is an update on the IT Data Services group's activities and plans, covering AFS, backup, CASTOR and the tape services.


The Data Services group is currently responsible for four major areas: the home and project directory infrastructure based on the OpenAFS product; the backup services for computer centre and departmental servers, based on the TSM and Legato products; the managed storage software and services of the CERN Advanced Storage Manager (CASTOR), which includes backwards compatibility with the SHIFT tape access software; and the magnetic tape drive, robotics and disk server infrastructure that underpins all of these. Changes are planned or have recently happened in all of these areas and it is timely to report on them.

AFS Services

During the first quarter of this year most computer centre AFS client machines were changed from the IBM AFS client to OpenAFS. This improved the stability of the client and fixed some of the problems where a client would completely lock up all access to AFS. We were still left with a serious bug in the handling of large directories that were frequently changed from multiple clients (e.g. hundreds of concurrent batch jobs), and this was found to be on the server side. It was fixed in a new release of the OpenAFS server code; we installed this release on a spare machine and moved the busiest directories over to it. Since then we have been systematically migrating our twenty-five servers and have now completed this work. Each server migration took several weeks as live user data was slowly but transparently drained from it to an already migrated server.
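
Draining of this kind is typically done with transparent AFS volume moves. The following is only a minimal sketch of the idea, not the procedure actually used by the group; the server names and partition are hypothetical, and the standard OpenAFS vos commands (which require AFS administrator privileges) are assumed.

    # Sketch: drain all read/write volumes from one AFS file server to another
    # using "vos move", which keeps each volume online for clients while it moves.
    import subprocess

    OLD_SERVER = "afsold.cern.ch"   # server being retired (hypothetical name)
    NEW_SERVER = "afsnew.cern.ch"   # already-migrated OpenAFS server (hypothetical name)
    PARTITION = "a"                 # vice partition /vicepa on both machines

    def list_rw_volumes(server, partition):
        """Return the names of the read/write volumes held on one partition."""
        out = subprocess.run(["vos", "listvol", server, partition],
                             capture_output=True, text=True, check=True).stdout
        volumes = []
        for line in out.splitlines():
            fields = line.split()
            # data lines look like: "<name> <id> RW <size> K On-line"
            if len(fields) >= 3 and fields[2] == "RW":
                volumes.append(fields[0])
        return volumes

    def drain(old, new, partition):
        for volume in list_rw_volumes(old, partition):
            subprocess.run(["vos", "move", volume,
                            old, partition, new, partition], check=True)

    if __name__ == "__main__":
        drain(OLD_SERVER, NEW_SERVER, PARTITION)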

Backup Services

Our major activity in this area is planning the reduction from the three systems we currently use (the internal AFS backup, IBM Tivoli Storage Manager and Legato) to a single system next year. This will probably, though not necessarily, be one of the existing systems. It will save both manpower and licence fees and is made possible by the TSM and Legato products converging in performance and functionality. In December we will present a proposal for implementation next year. The change will be largely transparent to ordinary central computing users.

Managed Storage

The CASTOR system now handles the bulk of CERN's physics data, with a total volume stored on tape of more than 1 Petabyte in over 7 million files. The software and services using it have reached a good level of stability and performance, recording, for example, 2 to 4 TB of data per day for the COMPASS experiment during their data taking this year. An important test will be the Alice Data Challenge which, during November/December, intends to store data to tape at 200 MB/sec continuously for a week.
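
As a rough sanity check (our own back-of-the-envelope figure, not one from the Data Challenge planning), sustaining the target rate for a full week corresponds to some 120 TB written to tape:

    # Volume implied by the Alice Data Challenge target rate.
    rate_mb_per_sec = 200                # target rate to tape
    seconds_per_week = 7 * 24 * 3600     # one continuous week
    total_tb = rate_mb_per_sec * seconds_per_week / 1e6   # using decimal units, 1 TB = 1e6 MB
    print(f"about {total_tb:.0f} TB written in one week")  # roughly 121 TB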

A user meeting was held in March 2002 and a prioritised programme of work was drawn up based on user and operational requirements. User priorities were to support the LHC Data Challenges and to improve reliability, and we are accordingly introducing large file support, tape re-packing, fair-share allocation of tape drives, and improved monitoring and statistics. The main items can be seen under the link 'Development Plans for the rest of 2002' on the CASTOR home page it-div-ds.web.cern.ch/it-div-ds/HSM/CASTOR. An important item is to provide a GridFTP interface to CASTOR, running on a second wacdr.cern.ch gateway server; this should be ready in November.
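
To illustrate what such an interface enables (a hypothetical example, not a documented recipe: the local file name and CASTOR path are invented, and a valid Grid proxy and the standard globus-url-copy client are assumed), a Grid client could then write a file into the CASTOR namespace through the gateway with an ordinary GridFTP transfer:

    # Hypothetical GridFTP transfer into CASTOR through the wacdr.cern.ch gateway.
    import subprocess

    local_url = "file:///tmp/run1234.raw"                                          # invented local file
    castor_url = "gsiftp://wacdr.cern.ch//castor/cern.ch/user/j/jdoe/run1234.raw"  # invented CASTOR path

    subprocess.run(["globus-url-copy", local_url, castor_url], check=True)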

Tapes and Tape Robots

In January of this year we completed the symmetric splitting of our STK tape robots and drives into two physically separate complexes, one on the ground floor of building 513 and the other in a new building 613 some 200 metres away. Each complex consists of 5 silos, each with 5500 storage slots, and each complex has 14 of our mainstream tape drives, the STK 9940A. These drives move data to and from tape at 10 MB/sec and store 60 Gigabytes per cartridge. STK have now announced the 9940B model drive, which moves data at 30 MB/sec and stores 200 GB on the same cartridge as used by the model A drive. The LCG project ordered 20 of these drives to be used for the Alice data challenge and they were delivered to CERN on 13 November. We have already performed an extensive field test and found that they perform reliably and as specified. Next year we hope to upgrade all our 9940A drives to B models and reduce our media storage costs from above FS 2 per GB on tape to below FS 1. The model B drive can read a tape written on the model A but cannot write at the lower density. The upgrades are planned to be ready for the next SPS run from May 2003.
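
For context, the quoted figures imply the following fill times per cartridge and, since the same physical cartridge is reused, the drop in media cost per gigabyte (a rough comparison using only the numbers above):

    # Comparison of the 9940A and 9940B figures quoted above.
    drives = {
        "9940A": {"rate_mb_s": 10, "capacity_gb": 60},
        "9940B": {"rate_mb_s": 30, "capacity_gb": 200},
    }
    for name, d in drives.items():
        minutes = d["capacity_gb"] * 1000 / d["rate_mb_s"] / 60
        print(f"{name}: about {minutes:.0f} minutes to fill one cartridge")

    # The same cartridge holds 200 GB instead of 60 GB, so the media cost per GB
    # falls by a factor of 200/60 (about 3.3), consistent with the move from
    # above FS 2 per GB to below FS 1 quoted above.
    print(f"capacity ratio: {200 / 60:.1f}x")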

We announced last year the ending of the STK Redwood helical scan magnetic tape service at the end of 2002 (these are tapes with visual identifiers beginning with the letter Y) and we are now in the last stages of this operation. We have already copied most required Redwood tapes into the CASTOR system. All Redwood tapes except those of NA48 have been locked in our Tape Management System (TMS). The NA48 experiment have 7000 Redwoods of 50 GB each, representing several years of raw data. We had been waiting for some of the higher density 9940B drives before copying these, but STK kindly allowed us to keep some of the field test drives while our order was on its way, and we have been using those. We are now ejecting the already copied or unwanted Redwood cartridges and putting them in storage. There will be no maintenance possible on Redwood drives from the end of this year and, given their short head life, we cannot guarantee to be able to re-read any Redwoods next year.
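
To give an idea of the scale of the NA48 copy (a rough estimate from the figures above; the drive count is hypothetical and real throughput is lower than nominal because of mounts, positioning and verification):

    # Rough scale of the NA48 Redwood copying exercise.
    tapes = 7000
    gb_per_tape = 50
    total_tb = tapes * gb_per_tape / 1000
    print(f"total volume: about {total_tb:.0f} TB")        # 350 TB

    # Optimistic lower bound assuming a hypothetical four borrowed 9940B drives
    # streaming continuously at the nominal 30 MB/sec.
    borrowed_drives = 4
    rate_mb_s = 30
    days = total_tb * 1e6 / (borrowed_drives * rate_mb_s) / 86400
    print(f"with {borrowed_drives} drives: about {days:.0f} days at best")   # roughly 34 days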

A third major activity will be to provide the resources for the Data Base group to migrate the 2001 and 2002 COMPASS data (over 3000 cartridges) from Objectivity format to DATE format and to create an associated metadata database of a few TB using Oracle. As with the Redwood copies for NA48, this will be done onto the high density 9940B drives and so can only start once the Alice Data Challenge has freed enough drives. We estimate the copying will take 2 to 3 months. Note that the COMPASS and NA48 experiments have had twelve 9940A tape drives dedicated to them during this year's SPS run, and we plan to dedicate eight of those as the input drives for this COMPASS migration exercise. This temporary dedication is not expected to have a serious effect on other users.
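
As a rough cross-check of that estimate (again our own back-of-the-envelope figure, assuming full 60 GB cartridges), reading 3000 cartridges on eight 9940A drives at their nominal 10 MB/sec would take about a month of continuous streaming; mount and positioning overheads, the format conversion and writing the output back to tape can easily stretch this towards the quoted 2 to 3 months.

    # Lower bound on the reading time for the COMPASS migration.
    cartridges = 3000
    gb_per_cartridge = 60     # 9940A cartridge capacity quoted above
    input_drives = 8          # drives planned as dedicated input drives
    rate_mb_s = 10            # nominal 9940A transfer rate

    seconds = cartridges * gb_per_cartridge * 1000 / (input_drives * rate_mb_s)
    print(f"about {seconds / 86400:.0f} days of continuous reading")   # roughly 26 days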

A fourth major activity will be to relocate the tape robots in the building 513 computer room to their final position in the former tape vault of building 513. The vault was emptied at the end of last year and major refurbishment work has been carried out to equip it to take up to 8 tape robots and thousands of PCs. This is part of the long term plan to also equip the ground floor computer room in building 513 to be ready for the LHC computing equipment. The work will be done in three phases, and the relocation of the tape robots is part of the first phase. It must be completed well before the start of the SPS run next year and we plan to do it in January and February. There will be some disruption to tape mounting, but we plan to minimise this by emptying complete silos (made possible by ejecting all the Redwood cartridges) and moving the silos one at a time.

Finally, as well as the tape robots there are some eighty disk servers (out of a total of over 200), many of which also run stagers, that have to be moved early next year. These were the first ones installed and serve nearly all CERN experiments. Among them are the primary CASTOR name server machine and the four new name servers scheduled for next year, one for each LHC experiment. The servers of different experiments are intermixed and are spread over three network services, and their network infrastructure must be dismantled and moved in synchronisation. It is not physically possible to cleanly move one experiment at a time, so we propose to shut down all access to the CASTOR system for the time required to make the moves, which is estimated to be three days. We are proposing the dates of Monday to Wednesday, 13-15 January 2003, as the workload is traditionally lower in early January. We will, of course, try to minimise the actual time and will publish more details later.



For matters related to this article please contact the author.
Cnl.Editor@cern.ch


CERN-CNL-2002-003
Vol. XXXVII, issue no 3


Last Updated on Tue Dec 10 13:41:47 CET 2002.
Copyright © CERN 2002 -- European Organization for Nuclear Research