


The CERN openlab for DataGrid applications

Sverre Jarp and François Grey, IT / openlab


Abstract

The CERN openlab, a new model for partnership between CERN and industry, is integrating and testing cutting-edge computer technologies that show promise for the LHC Computing Grid (LCG). The Board of Sponsors meeting was held on June 13 at CERN, and this article summarises key achievements that were presented to the sponsors.


The CERN openlab fits into CERN's portfolio of Grid activities by addressing a key issue, namely the impact of cutting-edge IT technologies - which are currently emerging from industry - on the technological roadmap for the LCG. Peering into the technological crystal ball in this way can only be done in close collaboration with leading industrial partners. The benefits are mutual: through generous sponsorship of state-of-the-art equipment from the partners, CERN gets early access to valuable technology that is still several years from the commodity computing market that the LCG will be based on.

In return, CERN provides demanding data challenges which push these new technologies to their limits - this is the "lab" part of the openlab. CERN also provides a neutral environment for integrating solutions from different partners, to test their interoperability. This is a vital role in an age where open standards (the "open" part of openlab) are increasingly guiding the development of the IT industry.

The CERN openlab for DataGrid applications was launched in 2001 by Manuel Delfino, then IT Division Leader at CERN. After a hiatus during which the IT industry was rocked by the telecoms crash, the partnership took off in September 2002, when HP joined founding members Intel and Enterasys Networks, and integration of technologies from all three partners led to the CERN opencluster project. In April of this year, IBM joined, and the Annual Sponsors meeting held at CERN on June 13th was an opportunity to present the results obtained so far to top managers representing the industrial partners.

CERN opencluster

At present, the CERN opencluster consists of 32 Linux-based HP rack-mounted servers, each equipped with two 1 GHz Itanium-2 Intel processors. Itanium-2 uses 64-bit processor technology, which is anticipated to displace today's 32-bit technology over the next few years. As part of the agreement with CERN openlab partners, this cluster is planned to double in size during 2003, and double again in 2004, making it an extremely high-performance computing engine. Very recently, IBM joined the CERN openlab, contributing advanced storage technology that will be combined with the CERN opencluster (see related article).

For high-speed data transfer challenges, Intel has delivered 10 Gbps Ethernet Network Interface Cards (NICs) which have been installed on the HP computers, and Enterasys Networks has delivered three switches equipped to operate at 10 Gbps and with additional port capacity for 1 Gbps.

Over the next few months, the CERN opencluster will be linked to the EDG testbed, to see how these new technologies perform in a Grid environment. The results will be closely monitored by the LCG project, to determine the potential impact of the technologies involved. Already at this stage, however, much has been learned that has implications for LCG.

For example, thanks to the preinstalled management cards in each node of the cluster, automation has been developed to allow remote system restart and remote power control. This development confirmed the notion that - for a modest hardware investment - large clusters can be controlled with no operator present. This result is highly relevant to LCG, which will need to deploy such automation on a large scale.

Several major physics software packages have been successfully ported and tested on the 64-bit environment of the CERN opencluster, in collaboration with the groups responsible for maintaining the various packages. Benchmarking of the physics packages has begun and first results are promising. For example, PROOF (Parallel ROOT Facility) is a version of the popular CERN-developed ROOT software for data analysis, which is being developed for interactive analysis of very large ROOT data files on a cluster of computers. The CERN opencluster has shown that the amount of data that can be handled by PROOF scales linearly with cluster size: on one cluster node, it takes 325s to analyse a certain amount of data, and only 12s when all 32 nodes are used.
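The PROOF timings quoted above can be turned into a speedup and a parallel efficiency. The following is only a back-of-the-envelope sketch in Python: the function names are ours, and the only inputs are the two timings and the node count given in the text.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of the single-node run time to the multi-node run time."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, nodes: int) -> float:
    """Fraction of ideal (perfectly linear) scaling that was achieved."""
    return speedup(t_single, t_parallel) / nodes


# Figures quoted in the article
T_ONE_NODE = 325.0   # seconds on one node
T_ALL_NODES = 12.0   # seconds on all nodes
NODES = 32

if __name__ == "__main__":
    s = speedup(T_ONE_NODE, T_ALL_NODES)
    e = efficiency(T_ONE_NODE, T_ALL_NODES, NODES)
    ideal = T_ONE_NODE / NODES  # run time under perfectly linear scaling
    print(f"speedup: {s:.1f}x, efficiency: {e:.0%}, ideal time: {ideal:.1f}s")
    # prints "speedup: 27.1x, efficiency: 85%, ideal time: 10.2s"
```

On these numbers the 32-node run is about 27 times faster than the single-node run, against an ideal of 32, so the measured scaling is close to linear.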

Data challenges

One of the major challenges of the CERN opencluster project is to take maximum advantage of the partners' 10 Gbps technology. In April, a first series of tests was conducted between two of the nodes in the cluster, which were directly connected ("back-to-back" connection) through 10 Gbps Ethernet Network Interface Cards. The transfer reached a data rate of 755 MB/s, that is, approximately 3/4 of the maximum attainable bit rate of the interfaces. It took place over a 10 km fibre and used very large frames (16 KB) in a single stream, as well as the regular suite of Linux kernel protocols (TCP/IP). The best results were obtained when aggregating the 1 Gbps bi-directional traffic involving 10 nodes in each group. The peak traffic between the switches was then measured to be 8.2 Gbps. The next stages of this data challenge will include evaluating the next version of the Intel processors.

The CERN opencluster equipment also contributed to the data challenge carried out in May, in which Bernd Panzer and his team achieved storage-to-tape rates of 1.1 GB/s for periods of several hours. In order to simulate the LHC data acquisition procedure, an equivalent stream of artificial data was generated using 40 compute servers. This data was stored temporarily on 60 disk servers, which included the CERN opencluster servers, before being transferred to the tape servers. A key contributing factor to the success of the data challenge was a high-performance switched network from Enterasys Networks with 10 Gbps Ethernet capability, which routed the data from PC to disk and from disk to tape.

The technology that IBM brings to the CERN openlab partnership is called Storage Tank®. Conceived in IBM Research, the new technology is designed to provide scalable, high-performance and highly available management of huge amounts of data using a single file namespace, regardless of where or on what operating system the data reside. (Recently, IBM announced that the commercial version will be named IBM TotalStorage® SAN File System.) IBM and CERN will work together to extend Storage Tank's capabilities so it can manage of the order of a petabyte of data and provide access to the data from any location worldwide. As part of their openlab sponsorship, IBM has just installed a 28 TB TotalStorage system in the Computer Centre.
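The transfer rates quoted earlier in this section mix bytes per second and bits per second. The short Python sketch below puts them on a common footing. It assumes decimal units (1 MB = 10^6 bytes, 1 Gbps = 10^9 bits/s) and, for the per-server averages, an even spread of load across servers; neither assumption is stated in the article.

```python
def mbytes_per_s_to_gbps(mb_per_s: float) -> float:
    """Convert MB/s to Gbps, assuming decimal units (1 MB = 10**6 bytes)."""
    return mb_per_s * 8 / 1000


# Figures quoted in the article
BACK_TO_BACK_MB_S = 755.0    # single-stream transfer between two nodes
TAPE_MB_S = 1100.0           # 1.1 GB/s storage-to-tape rate
COMPUTE_SERVERS = 40         # servers generating the artificial data
DISK_SERVERS = 60            # servers buffering data before tape

if __name__ == "__main__":
    b2b_gbps = mbytes_per_s_to_gbps(BACK_TO_BACK_MB_S)
    print(f"back-to-back: {b2b_gbps:.2f} Gbps "
          f"({b2b_gbps / 10:.0%} of the nominal 10 Gbps line rate)")
    print(f"storage-to-tape: {mbytes_per_s_to_gbps(TAPE_MB_S):.1f} Gbps aggregate")
    # Average load per server, assuming an even spread (our assumption)
    print(f"per compute server: {TAPE_MB_S / COMPUTE_SERVERS:.1f} MB/s")
    print(f"per disk server: {TAPE_MB_S / DISK_SERVERS:.1f} MB/s")
```

Under these assumptions 755 MB/s corresponds to about 6 Gbps, so the "approximately 3/4" quoted in the text is presumably measured against an attainable maximum of roughly 8 Gbps rather than the nominal 10 Gbps line rate.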

Storage Tank (ST) goes beyond traditional cluster file systems in two key ways. First, it can support file sharing between heterogeneous computers running different operating systems, whereas typical cluster file systems only support homogeneous (single operating system) sharing. Second, ST employs policy-based storage management to simplify and centralize storage management for all the data in the enterprise. Administrators specify policies for how backup, restore, migration and allocation are to be performed. ST enforces these policies automatically, without intervention.
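To illustrate the general idea of policy-based storage management, here is a minimal Python sketch in which an administrator declares an ordered list of rules and the system picks a storage pool for each file from the first rule that matches. This is not Storage Tank's policy language or API: the rule fields, pool names and paths are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Sequence


@dataclass(frozen=True)
class FileInfo:
    path: str
    days_since_access: int


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str        # glob matched against the full file path
    min_idle_days: int  # rule applies only to files idle at least this long
    target_pool: str    # hypothetical storage pool name


def choose_pool(file: FileInfo, rules: Sequence[Rule], default_pool: str) -> str:
    """Return the pool of the first matching rule, or the default pool."""
    for rule in rules:
        if (fnmatchcase(file.path, rule.pattern)
                and file.days_since_access >= rule.min_idle_days):
            return rule.target_pool
    return default_pool


# Hypothetical policy: order matters, the first matching rule wins.
RULES = [
    Rule("migrate-idle-raw-data", "/data/raw/*", 90, "tape_pool"),
    Rule("recent-raw-data", "/data/raw/*", 0, "fast_disk_pool"),
    Rule("log-files", "*.log", 0, "cheap_disk_pool"),
]

if __name__ == "__main__":
    for f in [FileInfo("/data/raw/run1.root", 120),
              FileInfo("/data/raw/run2.root", 3),
              FileInfo("/home/user/job.log", 10),
              FileInfo("/home/user/notes.txt", 1)]:
        print(f.path, "->", choose_pool(f, RULES, "default_pool"))
```

The point of the design is that the rules are data rather than code, so an administrator can change where files go without touching the applications that read and write them.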

An open approach

While many of the benefits of CERN openlab for the industrial partners are of a technical nature, there is also a strong emphasis in CERN openlab's mission on the opportunities that this novel partnership provides for enhanced communication and cross-fertilisation between CERN and the partners, and between the partners themselves. Top engineers from the partner companies collaborate closely with the CERN openlab Technical Team. This team includes staff from the ADC and CS groups as well as support from the FIO group. As part of the sponsorship, HP is funding two CERN fellows to work on the CERN opencluster. CERN openlab also organises thematic workshops on specific topics of interest, bringing together leading technical experts from the partner companies, as well as public First Tuesday events on general technology issues related to the CERN openlab agenda, which attract hundreds of participants from the industrial and investor communities. A new meeting room next to the User Area of the Computer Centre, nicknamed the openlab openspace, was inaugurated earlier this year. This is being actively used for VIP visits to the Centre.

A CERN openlab student programme has been created, which will bring together teams of students from different European universities in July and August - 11 students in all - to work on applications of Grid technology. CERN openlab is also actively supporting the establishment of a Grid Café for the CERN Microcosm exhibition - a web café for the general public with a focus on Grid technologies, including a dedicated website that will link to instructive Grid demos.

Efforts are ongoing in the CERN openlab to evaluate other possible areas of technological collaboration, with current or future partners. The concept is certainly proving popular, with other major IT companies expressing interest in joining. This could occur by using complementary technologies to provide added functionality and performance to the existing opencluster. Or it could involve launching new projects that deal with other aspects of Grid technology relevant to the LCG, such as Grid security and mobile access to the Grid.

In conclusion, CERN openlab puts a new twist on an activity - collaboration with leading IT companies - which has been going on in the IT Division for decades. Whereas traditionally, such collaboration was bilateral and focused on here-and-now solutions, CERN openlab brings a multilateral long-term perspective into play. Judging by the enthusiastic response of the industrial partners at the Board of Sponsors meeting, we can expect the CERN openlab to develop into a powerful and highly visible component of the Division's strategy in the coming years.


See also the two articles (extracts from the official CERN Press Release):
  1. IBM Joins CERN openlab for DataGrid applications
  2. CERN Breaks Gigabyte/s Storage-To-Tape Barrier With StorageTek


For matters related to this article please contact the author.


Cnl.Editor@cern.ch
CERN-CNL-2003-002
Vol. XXXVIII, issue no 2


Last Updated on Tue Jul 08 11:24:27 CEST 2003.
Copyright © CERN 2003 -- European Organization for Nuclear Research