


CMS Completes Large Scale Objectivity Production

David Stickland, EP


Abstract

CMS, in collaboration with IT, has recently completed a massive data-processing exercise aimed at validating its Higher Level Trigger (HLT). The CMS off-line reconstruction baseline is to use an Object Database Management System for both event data and meta-data, and this scenario has been used to process 2 million LHC events with pile-up corresponding to a luminosity of 10^34 cm^-2 s^-1. This production required the use of some 200 Linux/Intel CPUs, the interfacing of Objectivity to the CERN Mass Storage System (HPSS), where 5 TB of data are stored, and the movement of ~70 TB of pile-up hits. The production was completed in two weeks and is now in use by CMS physicists studying trigger algorithms and their performance.


The Higher Level Trigger (HLT) of CMS will be implemented entirely in software, and the validation of its rejection power and efficiency is a CMS milestone. CMS has decided that this validation should be performed using software that is functionally as close as possible to that which will be used in the final experiment. The Object Reconstruction for CMS Analysis program, ORCA, has been developed with this as its primary short-term goal. It is an Object-Oriented program implemented in C++, based on the design guidelines laid out in the CMS Computing Technical Proposal and built on the foundations of several years' test-beam experience.

The immediate goal of the Spring ORCA/HLT production was to prepare a fully digitized sample of ~2 million events with full pile-up at a simulated luminosity of 10^34 cm^-2 s^-1. This is a multi-stage process taking ~2 million CPU minutes of computing (2×10^9 SI95·sec) and involving the disk and network I/O of some 70 TB of pile-up hits.
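
As a rough back-of-the-envelope check of these figures, the sketch below combines the quoted totals with the ~200 CPUs mentioned in the abstract, assuming perfect parallelism; the per-CPU SI95 rating is derived from the quoted numbers, not an independent measurement.

    // Back-of-the-envelope check of the production scale quoted above.
    // All inputs are figures given in the text.
    #include <cstdio>

    int main() {
        const double events       = 2.0e6;   // signal events to digitize
        const double cpuMinutes   = 2.0e6;   // total CPU time quoted
        const double si95Seconds  = 2.0e9;   // same work expressed in SI95*sec
        const double cpus         = 200.0;   // Linux/Intel CPUs used

        double minutesPerEvent = cpuMinutes / events;               // ~1 min/event
        double wallClockDays   = cpuMinutes / cpus / (60.0 * 24.0); // ~7 days, ideal
        double si95PerCpu      = si95Seconds / (cpuMinutes * 60.0); // ~17 SI95/CPU

        std::printf("CPU time per event : %.1f min\n", minutesPerEvent);
        std::printf("Ideal wall clock   : %.1f days on %.0f CPUs\n", wallClockDays, cpus);
        std::printf("Implied CPU rating : %.0f SI95 per CPU\n", si95PerCpu);
        return 0;
    }

The ideal figure of roughly one week of wall-clock time is consistent with the two weeks actually taken once inefficiencies and the other production stages are included.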

The production starts from ~2 TB of GEANT3 files, which were converted into Objectivity/DB format and stored in the HPSS system. Most of the GEANT3 production was performed off the CERN site, in Finland, Italy, Russia and the USA. Each signal event was then convolved with ~10 bunch crossings of minimum-bias events at a rate of ~20 events/crossing (these are the 70 TB of data that must be transported). The final results are again stored in an Objectivity/DB, where they are used in reconstruction and analysis studies.
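
The following is a conceptual sketch of the event-mixing structure just described, not ORCA code: the Hit and Event types and the minimum-bias pool are hypothetical in-memory stand-ins for the persistent Objectivity/DB objects, and only the superposition loop is shown.

    // Conceptual sketch of the pile-up convolution step: overlay ~20
    // minimum-bias events per crossing, for ~10 crossings, on one signal event.
    #include <random>
    #include <vector>

    struct Hit   { int detectorId; float energy; float time; };
    struct Event { std::vector<Hit> hits; };

    Event mixPileup(const Event& signal,
                    const std::vector<Event>& minimumBiasPool,
                    std::mt19937& rng,
                    int crossings = 10, double meanPerCrossing = 20.0)
    {
        Event out = signal;                       // start from the signal hits (~1 MB)
        std::poisson_distribution<int> nMinBias(meanPerCrossing);
        std::uniform_int_distribution<std::size_t> pick(0, minimumBiasPool.size() - 1);

        for (int bx = 0; bx < crossings; ++bx) {
            const float bxTime = 25.0f * (bx - crossings / 2);  // ns, relative to signal
            const int n = nMinBias(rng);
            for (int i = 0; i < n; ++i) {
                const Event& mb = minimumBiasPool[pick(rng)];
                for (Hit h : mb.hits) {           // superpose, shifted by bunch spacing
                    h.time += bxTime;
                    out.hits.push_back(h);
                }
            }
        }
        return out;                               // fed to the detector digitization
    }

The transported 70 TB of pile-up hits correspond to the minimum-bias pool read repeatedly by this mixing step for every signal event.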

This was achieved using a farm of ~100 dual-CPU high-performance PCs running the Linux operating system, part of the Event Filter Farm shared between experiments for such development tests.

Since the total quantity of data is in excess of the available disk space, an interface between the Objectivity/DB AMS (Advanced Multi-threaded Server) and the CERN HPSS system, developed in collaboration between CERN IT/DB and SLAC, was used. Via this interface, requests for data objects in files that are no longer on disk are automatically staged in. To make maximum use of the available resources, this AMS interface was set up on more than 30 nodes (Linux and Sun/Solaris), of which 8 also accessed HPSS.
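
To illustrate the stage-on-demand idea behind this interface, here is a minimal sketch; it is not the real Objectivity AMS extension API, and the Stager class and its hpssStage() placeholder are hypothetical. It only shows the pattern: a request for a database file that is not on local disk triggers a transparent recall from mass storage before the file is served.

    #include <filesystem>
    #include <stdexcept>
    #include <string>
    #include <utility>

    namespace fs = std::filesystem;

    class Stager {
    public:
        explicit Stager(fs::path diskPool) : pool_(std::move(diskPool)) {}

        // Return a local path for the requested database file,
        // recalling it from mass storage first if it is not already on disk.
        fs::path open(const std::string& dbFile) {
            fs::path local = pool_ / dbFile;
            if (!fs::exists(local)) {
                if (!hpssStage(dbFile, local))      // blocking recall from tape
                    throw std::runtime_error("stage-in failed: " + dbFile);
            }
            return local;                           // the server then reads objects from here
        }

    private:
        // Placeholder for the site-specific recall; the real interface talked
        // to HPSS directly. Here we only check that the disk pool is reachable.
        bool hpssStage(const std::string& /*dbFile*/, const fs::path& target) {
            return fs::exists(target.parent_path());
        }

        fs::path pool_;
    };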

The Central Data Recording (CDR) scripts, previously used heavily in test-beams, were used to migrate completed database files out of the production system and into HPSS. Parts of this work were going on at the same time as the ALICE data challenge reported in the previous CNL, resulting in considerable extra challenges for the personnel in IT/PDP responsible for keeping these systems operational.

Considerable effort went into monitoring the many different systems, all of which had to be fully operational to carry out this production. Information was collected from all nodes in the system using a protocol based on that of the NetLogger project from LBNL. The information was collated and presented in the form of time-dependent histograms. This typically enabled us to understand why a breakdown occurred. In the next production, planned for this autumn, we intend to deploy more active control of running jobs so that we can react before some problems develop into catastrophic ones.
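
A sketch of the kind of timestamped record each node might emit is shown below. The "key=value" layout loosely follows the NetLogger style; the exact field names used in the CMS production are an assumption here. Lines from all nodes are merged centrally and binned into the time-dependent histograms mentioned above.

    #include <chrono>
    #include <cstdio>
    #include <string>

    // Emit one monitoring record per event of interest on a node.
    void logEvent(const std::string& host, const std::string& prog,
                  const std::string& event, double value, const std::string& unit)
    {
        using namespace std::chrono;
        const auto usec = duration_cast<microseconds>(
            system_clock::now().time_since_epoch()).count();
        std::printf("TS=%lld HOST=%s PROG=%s NL.EVNT=%s VAL=%.2f UNIT=%s\n",
                    static_cast<long long>(usec), host.c_str(), prog.c_str(),
                    event.c_str(), value, unit.c_str());
    }

    int main() {
        logEvent("pccms042", "ams",  "disk.read.rate",  2.1, "MB/s");  // hypothetical node names
        logEvent("pccms042", "orca", "event.digitized", 1.0, "count");
        return 0;
    }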

Figure: CPU/Disk/Network activity on each of the nodes running an AMS.

The pile-up operation, whereby we try to simulate the LHC event environment, requires the superposition of ~35 MB of "hits" from some 150 minimum-bias events spread over many beam crossings onto the hits of each signal event (1 MB), followed by simulation of the detector digitization process to yield digits in roughly the same format as will eventually be recorded by the detector. The computing time for this process was about 1-1.5 minutes/event, and with 140 CPUs working in parallel in the production we achieved a steady-state reading rate out of Objectivity/DB of 50 MB/sec. Physically this was achieved by distributing the pile-up "hits" over 24 PCs, each running an AMS. A further 6 PCs with larger disk caches served the signal events, and a Sun server was used to record the output objects into the database files.
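
As a rough consistency check of these quoted figures, ignoring caching and the much smaller signal-event traffic:

    #include <cstdio>

    int main() {
        const double aggregateRead = 50.0;   // MB/s out of Objectivity/DB (quoted)
        const double parallelJobs  = 140.0;  // CPUs digitizing in parallel (quoted)
        const double pileupPerEvt  = 35.0;   // MB of pile-up hits per signal event (quoted)

        double perJobRate  = aggregateRead / parallelJobs;   // ~0.36 MB/s per job
        double secPerEvent = pileupPerEvt / perJobRate;      // ~100 s per event

        std::printf("Per-job read rate : %.2f MB/s\n", perJobRate);
        std::printf("Implied time/event: %.0f s (vs. 60-90 s of CPU quoted)\n", secPerEvent);
        return 0;
    }

The implied ~100 seconds per event is broadly consistent with the quoted 1-1.5 minutes of CPU time once I/O wait is included.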

Features of Objectivity/DB, such as the ability to put different types of data into different containers and then cluster them according to their expected use patterns, were invaluable in allowing us to meet this challenge. For example, when storing the GEANT3 output into Objectivity/DB we ensured that MC truth information, calorimeter and muon hits, and tracker hits went into different files, since for this particular exercise only the calorimeter and muon digits were required in the first production step. Users performing analysis may navigate to the MC truth information. Having selected events satisfying some criteria, users can create either a "shallow copy" (a partial copy of the complex event containing links to the original data objects) or a "deep copy" (new copies of the events), depending on how the collections will be used. In a subsequent user or production step, the Tracker Digitization and Track Finding can be added to the existing digitizations to allow further analysis of this interesting subset of events.
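
The distinction between the two kinds of collection can be sketched as follows; the Event type and in-memory pointers are hypothetical stand-ins, since in the real system these would be persistent Objectivity/DB objects and associations rather than C++ pointers.

    #include <memory>
    #include <vector>

    struct Event { /* digits, hits, links to MC truth, ... */ };

    // Shallow collection: keeps links to the original persistent events;
    // cheap to build, suitable for iterating over a selection.
    using ShallowCollection = std::vector<std::shared_ptr<const Event>>;

    // Deep collection: independent copies of the selected events, which can
    // later be extended (e.g. with tracker digitization) without touching
    // the original production data.
    using DeepCollection = std::vector<Event>;

    template <class Predicate>
    ShallowCollection selectShallow(const std::vector<std::shared_ptr<const Event>>& all,
                                    Predicate pass)
    {
        ShallowCollection out;
        for (const auto& ev : all)
            if (pass(*ev)) out.push_back(ev);   // store a link only
        return out;
    }

    template <class Predicate>
    DeepCollection selectDeep(const std::vector<std::shared_ptr<const Event>>& all,
                              Predicate pass)
    {
        DeepCollection out;
        for (const auto& ev : all)
            if (pass(*ev)) out.push_back(*ev);  // store a full, independent copy
        return out;
    }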

This is only the second in a series of such challenges designed to reach the goal of a factor of 1000 rejection of the LHC Level-1 trigger output before storing the data. Eventually the software to achieve this must be of unprecedentedly high quality, as it will be removing genuine physics events from the data flow at an earlier stage than has been the case in previous experiments. Performing this work with the computing resources available in 2000, instead of in 2005, requires close collaboration between software and computing experts both in CMS and in IT. Many obstacles had to be overcome, and of course many more have still to be discovered as the scale of the productions is increased to reach the final scale of LHC computing. There is, however, no substitute for meeting these challenges one at a time and gaining the experience that will be required to carry out the analysis of the LHC experiments.

The next massive production will take place in Autumn 2000. This time CMS intends to make use of some of its worldwide computing resources, carrying out Objectivity/DB population and analysis at multiple sites and coordinating them with prototype GRID tools derived by CMS from the GLOBUS project.


About the author(s): David Stickland is the CMS Reconstruction Coordinator.


For matters related to this article please contact the author.
Cnl.Editor@cern.ch


CERN-CNL-2000-002
Vol. XXXV, issue no 2


Last Updated on Fri Aug 18 19:49:20 GMT+04:30 2000.
Copyright © CERN 2000 -- European Organization for Nuclear Research