Computing Technical Design Report

6.2 Software Distribution and Deployment

6.2.1 Preparation and Distribution of ATLAS Software

As described in Section 3.14, ATLAS software builds are organized in multiple domains.

For the Computing Operation, the relevant builds are the production releases, which hold the stable code used for production and end-user analysis and normally occur every 4-6 months. The build processes are supervised by the Release Coordinator, who is responsible for ensuring that all software components, including those not produced by ATLAS (external packages), work together coherently.

The Librarian supervises the physical builds and patching of the releases and of the software distribution kits, and takes care of the installation and validation of the software at CERN and of the preparation of the Distribution Kit [6-2].

After sufficient testing, the kit is distributed to the external sites, where the Validation Kit must run successfully. If problems occur anywhere in the procedure, patches can be applied and the full distribution and validation rerun until the validation tests are satisfactory.
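The distribute, validate and patch cycle can be pictured as a simple loop. The following is only a minimal sketch: the function names, the dictionary used to represent a kit, the site names and the max_rounds cap are all hypothetical and do not correspond to any actual ATLAS tool.

```python
def distribute_until_validated(kit, sites, run_validation_kit, apply_patches,
                               max_rounds=5):
    """Repeat distribution and validation until every site passes.

    run_validation_kit(kit, site) -> bool and apply_patches(kit, failed_sites)
    are caller-supplied stand-ins for the real procedures. The max_rounds cap
    is an assumption added here so that the loop always terminates.
    """
    for round_no in range(1, max_rounds + 1):
        failed_sites = [site for site in sites
                        if not run_validation_kit(kit, site)]
        if not failed_sites:
            return kit, round_no
        # Patch the kit, then redistribute and revalidate everywhere.
        kit = apply_patches(kit, failed_sites)
    raise RuntimeError(f"kit still failing validation after {max_rounds} rounds")


# Toy stand-ins: "site_b" fails until the kit has been patched once.
def fake_validation(kit, site):
    return kit["patch_level"] >= 1 or site != "site_b"


def fake_patch(kit, failed_sites):
    return {**kit, "patch_level": kit["patch_level"] + 1}


final_kit, rounds = distribute_until_validated(
    {"release": "10.0.1", "patch_level": 0},
    ["site_a", "site_b"],
    fake_validation,
    fake_patch,
)
```

Note that after a patch the sketch revalidates all sites, not only those that failed, mirroring the statement that the full distribution and validation are rerun.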

6.2.2 ATLAS Software Deployment on the Grid

ATLAS software is installed on the Grid in different ways, depending on the Grid flavour. The differences arise mainly because Grid sites are heterogeneous: they use different operating systems and follow different installation policies. It should also be noted that Grid sites are not necessarily oriented towards High Energy Physics.

Usually the installation of a new version is initiated by one of the Software Group Managers for a particular Grid. For some sites the installation is performed by the site administrator.

6.2.2.1 LCG Grid

The SIT distribution kit is used for the deployment of ATLAS software in LCG Grid sites.

The installation process is executed via special grid jobs, using the standard LCG tools, customized for ATLAS-specific needs. An installation job, when landing on a target machine of a site, downloads and installs the requested release in an area that is shared at the cluster level. This area, the Experiment Software Area, is pointed to at run time by the environment variable VO_ATLAS_SW_DIR. The main directory of the release ($SITEROOT) is set to $VO_ATLAS_SW_DIR/software/<rel_num>, where <rel_num> is the "dots and numbers" release tag (for example "10.0.1"). The standard procedure also installs the correct compiler for the selected release, along with the other binaries in the Experiment Software Area. The software installation is performed using Pacman [6-3].
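As an illustration of the directory convention above, the following minimal sketch derives the release directory from VO_ATLAS_SW_DIR and a release tag. The helper name release_root and the example software-area path are hypothetical; only the variable name and the software/<rel_num> layout come from the text.

```python
import os
import posixpath


def release_root(rel_num, env=None):
    """Return $VO_ATLAS_SW_DIR/software/<rel_num> for a release tag."""
    env = os.environ if env is None else env
    sw_dir = env.get("VO_ATLAS_SW_DIR")
    if not sw_dir:
        raise RuntimeError("VO_ATLAS_SW_DIR is not set on this node")
    # posixpath keeps forward slashes regardless of where the sketch is run.
    return posixpath.join(sw_dir, "software", rel_num)


# The software-area path below is made up for the example.
example_env = {"VO_ATLAS_SW_DIR": "/opt/exp_soft/atlas"}
siteroot = release_root("10.0.1", example_env)
```

On a real worker node the env argument would be omitted, so that the value is read from the job environment.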

After the installation step, the software is validated with the Kit Validation tool. If the validation is also successful, a tag that uniquely identifies the installed release is published in the Information System of the local Computing Element, so that it can be used in the match-making step of job submission. Similar jobs are used to remove an installed release and to remove the release tags from a site.
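The ordering of these steps (install, validate, and publish the tag only if both succeed) can be sketched as follows. The three callables are hypothetical placeholders for the Pacman installation, the Kit Validation run and the Information System update; the actual tag format is not specified in this section, so the sketch simply passes the release number through.

```python
def deploy_release(rel_num, install, validate, publish_tag):
    """Run install -> validate -> publish and stop at the first failure.

    install(rel_num) and validate(rel_num) return True on success;
    publish_tag(rel_num) is called only when both have succeeded.
    """
    if not install(rel_num):
        return "install_failed"
    if not validate(rel_num):
        return "validation_failed"
    publish_tag(rel_num)
    return "published"


# Toy run: record published tags in a list instead of an Information System.
published = []
ok_status = deploy_release(
    "10.0.1",
    install=lambda rel: True,
    validate=lambda rel: True,
    publish_tag=published.append,
)
bad_status = deploy_release(
    "10.0.2",
    install=lambda rel: True,
    validate=lambda rel: False,  # simulated Kit Validation failure
    publish_tag=published.append,
)
```

Because the tag is published last, a release that fails validation never becomes visible to match-making.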

Information about the installation operations and their status is kept centrally in a MySQL database. A web page presenting the installation information is also available [6-4].

6.2.2.2 NorduGrid

The Nordic Computing Grid (NorduGrid [6-5]) comprises many different systems (RedHat, SuSE, Debian, etc.). A full rebuild of the ATLAS software is generally required by NorduGrid sites, in order to make it compatible in native mode with the different architectures. The recompiled code is then packaged as RPMs and distributed to the sites. The installation is performed by the site administrator, as no Nordic site allows non-root users to write into the disk areas used for the run-time libraries of the experiment software or to publish the tags needed to identify the installed release in the Information System.

After the installation step, the software is validated in local mode through a Kit Validation session. If this step is successful, a further check is performed by running another Kit Validation through a grid job. This second check ensures that the release tag is also published correctly and that the script used for setting up the runtime environment is working properly.

All ATLAS software used in NorduGrid up to release 10.0.1 has been fully recompiled from scratch, including the external packages. Starting from release 10.0.1, however, the SIT Pacman distribution has been used, owing to the limited manpower available to perform the full rebuild. This has reduced the number of usable sites to fewer than 6, since the remaining ~14 sites cannot accept Pacman installations for management reasons: Pacman does not produce any record in the system databases, and not all sites accept this.

6.2.2.3 Grid3/OSG

The installation of ATLAS software on US Grid3/OSG sites is done using Pacman and builds upon the Grid3/OSG Information Service infrastructure. The SIT distribution kit is used for software deployment.

The remote install scripts query the sites to find the corresponding remote locations, then transfer the installation scripts and perform a local Pacman installation. At the end of the process, the ATLAS application information is published and/or updated in the site information services and on the installation web page.

6.2.2.4 Future of Software Installation on the Grid

The Software Infrastructure Team (SIT) is working on a new standardized installation procedure, which should be identical for Grid and non-Grid sites. The new installation kit will support two modes: installing the precompiled libraries, or performing a full cycle of compilation, creation of libraries and installation. The latter mode is necessary for sites running unusual operating systems.

As mentioned earlier, the installation of a new version of the software is usually initiated by the Software Group Manager by sending jobs to the Grid. In some cases, however, especially when compilation of the source code is required, the installation is initiated by the local ATLAS site administrator.

The new installation kit will trigger a validation operation, which will consist of running several applications that produce histograms to be compared with reference ones. Depending on the results of the comparison, the installation will be considered validated or not; in both cases the person responsible for the installation will be notified.
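A histogram-based validation of this kind might look like the sketch below. The text does not specify how histograms are compared, so the bin-by-bin relative tolerance used here is an assumption, as are the function names, the histogram names and the representation of a histogram as a list of bin contents.

```python
import math


def histograms_match(produced, reference, rel_tol=0.01):
    """Compare two histograms bin by bin (assumed criterion, not ATLAS's)."""
    if len(produced) != len(reference):
        return False
    return all(
        math.isclose(p, r, rel_tol=rel_tol, abs_tol=1e-12)
        for p, r in zip(produced, reference)
    )


def validate_installation(produced, references, notify, rel_tol=0.01):
    """Check every reference histogram and notify the installer either way."""
    failed = sorted(
        name
        for name, ref in references.items()
        if name not in produced
        or not histograms_match(produced[name], ref, rel_tol)
    )
    validated = not failed
    # The responsible person is notified on success and on failure alike.
    notify(validated, failed)
    return validated


# Toy data: one histogram agrees with its reference, the other does not.
references = {"track_pt": [10, 20, 30], "n_hits": [5, 5, 0]}
produced = {"track_pt": [10, 20, 30], "n_hits": [5, 6, 0]}
messages = []
result = validate_installation(
    produced, references, lambda ok, failed: messages.append((ok, failed))
)
```

A production comparison would more likely use a statistical test than a fixed per-bin tolerance; the tolerance is used here only to keep the sketch short.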



4 July 2005 - WebMaster

Copyright © CERN 2005