CERN Breaks Gigabyte/s Storage-To-Tape Barrier With StorageTek

Bernd Panzer-Steindel, IT/ADC


Abstract

This article is an extract from the official CERN Press Release published in May 2003, which is available from the CERN public Web site under the link "Press & Media".


Figure 1: Schematic of the data setup for the storage-to-tape challenge. Data, in this case generated by 40 compute servers, is temporarily stored to disk, then copied to the 45 StorageTek tape servers. Once the Large Hadron Collider is running at CERN, the data will come directly from the experiments and a copy will be distributed onto the DataGrid that CERN and partners are currently developing.

On 28 May 2003, in an official CERN Press Release, CERN announced the successful completion of a major data challenge aimed at pushing the limits of data storage to tape. Using 45 newly installed StorageTek 9940B tape drives, each capable of writing to tape at 30 megabytes/s, the Data Challenges team in IT Division was able to achieve storage-to-tape rates of 1.1 gigabytes/s for periods of several hours, with peaks of 1.2 gigabytes/s, roughly equivalent to storing a whole movie on DVD every four seconds. The average sustained over a three-day period was 920 megabytes/s. Previous best results by other research labs were typically less than 850 megabytes/s.
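The figures quoted above can be cross-checked with a little arithmetic. The following is only a rough sketch: it assumes decimal units (1 gigabyte = 1000 megabytes) and a single-layer DVD capacity of about 4.7 gigabytes, neither of which is stated in the press release, and all variable names are illustrative.

```python
# Figures quoted in the article.
DRIVES = 45
DRIVE_RATE_MB_S = 30      # write rate per tape drive, megabytes/s
SUSTAINED_MB_S = 1100     # 1.1 gigabytes/s, held for several hours
AVERAGE_MB_S = 920        # average over the three-day period

# Assumption (not from the article): a single-layer DVD holds about 4.7 GB.
DVD_MB = 4700

# Nominal aggregate write rate if all 45 drives stream at 30 MB/s.
aggregate_mb_s = DRIVES * DRIVE_RATE_MB_S

# Time needed to write one DVD's worth of data at the sustained rate.
seconds_per_dvd = DVD_MB / SUSTAINED_MB_S

# Total volume written over three days at the average rate, in terabytes.
three_days_s = 3 * 24 * 3600
total_tb = AVERAGE_MB_S * three_days_s / 1e6

print(f"nominal aggregate rate: {aggregate_mb_s} MB/s")
print(f"one DVD every {seconds_per_dvd:.1f} s")
print(f"written in three days: {total_tb:.0f} TB")
```

Under these assumptions the "DVD every four seconds" comparison holds (about 4.3 s per disc), and the three-day average corresponds to roughly 238 terabytes written to tape.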

The significance of this result, and the purpose of the data challenge, was to show that CERN's IT Division is on track to cope with the enormous data rates expected from LHC experiments. These experiments will produce data at rates in excess of 100 megabytes/s, and ALICE alone is expected to produce data at rates of 1.25 gigabytes/s.
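To put the ALICE figure in context, one can estimate how many tape drives would be needed to absorb such a stream. This is a back-of-the-envelope sketch only: it assumes 30 megabytes/s per drive, no compression, and no allowance for tape mounts or other overheads, and the helper name drives_needed is hypothetical.

```python
import math

ALICE_RATE_MB_S = 1250    # 1.25 gigabytes/s, from the article
DRIVE_RATE_MB_S = 30      # per-drive write rate, from the article
INSTALLED_DRIVES = 45     # drives used in the data challenge


def drives_needed(stream_mb_s, drive_mb_s):
    """Smallest whole number of drives whose combined rate covers the stream."""
    return math.ceil(stream_mb_s / drive_mb_s)


alice_drives = drives_needed(ALICE_RATE_MB_S, DRIVE_RATE_MB_S)
spare_drives = INSTALLED_DRIVES - alice_drives

print(f"drives needed for the ALICE rate: {alice_drives}")
print(f"drives left over out of {INSTALLED_DRIVES}: {spare_drives}")
```

On paper, the 45 installed drives would just cover the expected ALICE rate, which gives a sense of why the challenge was run at this scale.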

In all, the LHC experiments are anticipated to spew out over 10 petabytes of data a year, which will be stored on tape as well as being distributed around the world onto disk, for subsequent analysis using advanced "Grid" technologies for distributed computing and data storage. The data will contain information about the results of protons colliding in the accelerator at unprecedented energies, recreating for a brief instant the extreme conditions that existed just after the Big Bang. Scientists will spend years sifting painstakingly through this data, in an effort to better understand the fundamental laws that govern matter in the Universe.
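The yearly volume can be turned into an average rate for comparison with the data challenge. The sketch below assumes decimal units (1 petabyte = 10^9 megabytes) and spreads the data evenly over a 365-day year; real data taking would not be that uniform, so the result is only a yearly average and says nothing about peak rates.

```python
MB_PER_PB = 1e9                       # decimal units: 1 PB = 10^9 MB
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

YEARLY_VOLUME_PB = 10                 # "over 10 petabytes a year", from the article
CHALLENGE_AVERAGE_MB_S = 920          # three-day average from the data challenge

yearly_volume_mb = YEARLY_VOLUME_PB * MB_PER_PB

# Average rate if the yearly volume arrived evenly around the clock.
average_mb_s = yearly_volume_mb / SECONDS_PER_YEAR

# Days needed to write the same volume at the challenge's average rate.
days_at_challenge_rate = yearly_volume_mb / CHALLENGE_AVERAGE_MB_S / SECONDS_PER_DAY

print(f"yearly average rate: {average_mb_s:.0f} MB/s")
print(f"days to write 10 PB at 920 MB/s: {days_at_challenge_rate:.0f}")
```

Under these assumptions, 10 petabytes a year averages out to a little over 300 megabytes/s, and the rate sustained during the challenge would write that volume in about 126 days.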

While waiting for the LHC to be completed, IT/ADC generated an equivalent stream of artificial data, using 40 compute servers. This data was stored temporarily to 60 disk servers before being transferred to the StorageTek tape servers (see Fig. 1). A data compression factor of 1.3 was deliberately chosen during this data challenge, as this is characteristic of the compression that can be achieved with real experimental data.
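As a rough illustration of the load on each stage of this setup, the sustained 1.1 gigabytes/s can be divided across the servers in each stage. This sketch assumes the load is spread perfectly evenly and ignores the compression factor; both are simplifications that the article does not confirm.

```python
TOTAL_MB_S = 1100  # sustained 1.1 gigabytes/s, from the article

# Number of machines in each stage of the data path, from the article.
STAGES = {
    "compute servers": 40,
    "disk servers": 60,
    "tape servers": 45,
}

# Assumption: traffic is balanced evenly across the servers in a stage.
per_node_mb_s = {name: TOTAL_MB_S / count for name, count in STAGES.items()}

for name, rate in per_node_mb_s.items():
    print(f"{name}: {rate:.1f} MB/s per node")
```

With an even spread, each compute server would generate about 27.5 megabytes/s, each disk server would handle about 18 megabytes/s in each direction, and each tape server would write about 24 megabytes/s.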

Besides the StorageTek equipment, a key contributing factor to the success of the data challenge was a high-performance switched network from Enterasys Networks with 10 gigabit/s Ethernet capability, which routed the data from PC to disk and from disk to tape. This switched network is part of the CERN opencluster, an advanced computer cluster also involving technology from HP and Intel.

About the CERN opencluster
CERN opencluster is the first common project in the CERN openlab for DataGrid applications, a partnership with industry. The CERN openlab is a response to the new level of intensive industrial collaboration needed to solve the unprecedented computing challenge of the Large Hadron Collider project, currently under construction at CERN. The current partners in the CERN openlab are Enterasys Networks, HP, IBM and Intel. The CERN opencluster currently involves 64-bit processor technology from Intel, advanced servers from HP, and a 10 gigabit switching environment from Enterasys Networks.

About StorageTek
StorageTek, a worldwide company with headquarters in Louisville, Colo., delivers a broad range of storage solutions for digitized data. StorageTek is a leader in virtual storage solutions for tape automation, disk storage systems and storage networking, and is a voting member of the SNIA.



For matters related to this article please contact the author.


Cnl.Editor@cern.ch
CERN-CNL-2003-002
Vol. XXXVIII, issue no 2


Last Updated on Tue Jul 08 11:24:27 CEST 2003.
Copyright © CERN 2003 -- European Organization for Nuclear Research