Computing Technical Design Report

8.3 Manpower and Hardware Resources

8.3.1 Manpower Resources

Up to and including 2005, all the manpower for the Software & Computing project has been made available through voluntary contributions from the funding agencies that are members of the ATLAS Collaboration. As is always the case with voluntary contributions, the effort tends to concentrate on topics that are intellectually more interesting for the contributors and beneficial to their career development; other vital areas of software and infrastructure development and support are not covered adequately.

As this situation was common to all LHC experiments, in September 2003 the LHCC held a review of the available manpower, concentrating on "Core Software" [8-3]. The definition of "Core Software" used at that time included all developments related to frameworks, databases, infrastructure and services; the review highlighted a considerable lack of manpower for these activities and resulted in a call for help to the Collaboration. Since then, the situation has improved considerably in the field of databases, thanks also to the organization of the ATLAS-wide Database and Data Management project.

One of the recurring problems of the Software & Computing project has been the lack of formal acknowledgement of the effort committed to it. As the development of the software tools and the computing environment is a necessary component of the construction and operation of the ATLAS detector, the Software & Computing project had to be included in a formal Memorandum of Understanding between the members of the Collaboration. The approval of the Computing Addendum to the ATLAS Maintenance and Operations Memorandum of Understanding ("M&O MoU") [8-4] by the Resource Review Board in April 2005 has remedied this situation.

As described in the Computing Addendum, the "Core Computing" part of the ATLAS Software & Computing project covered by the M&O MoU comprises the development and maintenance of the experiment software framework and databases, the web and other documentation, the software infrastructure, visualization and the production tools. It also includes ATLAS software distribution and the provision and support of its interfaces to the Grid and LCG software.

The Core Computing activity concerns computing downstream of the online system; it does not concern computing local to a detector subsystem (including the DAQ), nor computing in the trigger system or event filter. The Core Computing M&O contribution provides common infrastructure to the detector systems, as well as to the rest of the offline computing activities. While it uses some tools provided by CERN-IT and the LCG project, the work is experiment-specific, and hence not covered by those bodies.

Core Computing is divided into two categories (A and B) because of a clear distinction in the nature of the tasks involved and in the rewards, in terms of professional development, associated with them.

Category A tasks are extremely technical in nature. They are in general service tasks requiring skill sets that are not generally compatible with those of a physicist. They also take up a significant fraction of a Full-Time Equivalent (FTE) for the people carrying them out. These tasks are, however, crucial to the functioning of the experiment. Several funding agencies are already providing effort in these 'unglamorous' areas. The M&O mechanism ensures a contribution from all agencies without disadvantaging those who are already contributing. The analogy is with detector construction, where technical effort is brought in to cover many tasks. The M&O Addendum provides such a budget for the infrastructure and services part of Core Computing.

Category B activities are also vital, but the tasks involved have more obvious rewards in terms of status for the physicists fulfilling them, and for their funding agencies.

Both categories of effort are monitored through the project organization, with a resources officer comparing the work reported on a quarterly basis with a defined Work Breakdown Structure. This process has been in place for several years; the allocation of resources and resulting milestones have been systematically monitored and reported. It is this mechanism that has identified the persistent and severe shortfall in category A tasks.

 

8.3.1.1 M&O Category A

The Core Computing project provides a number of services to the entire Collaboration; the cost of these services has to be shared by the Collaboration as contributions to the Category A M&O budget.

Services needed by the Collaboration are listed in Table 8-1, together with the required manpower levels for 2005-2006. So far it has proven impossible to provide the manpower to cover the needed tasks through voluntary contributions only. It is hoped that the inclusion of these tasks in the Category A budget will encourage in-kind (manpower) contributions from the Funding Agencies, thus reducing the need to finance these activities centrally.

As the work items listed in Table 8-1 consist of services to the Collaboration, it is expected that if the required effort is delivered on time, the required manpower in subsequent years will not increase. The present estimate is that the resource needs will continue at the 2006 level up to 2010. A more detailed profile will be provided in 2007, based on the progress made in 2005 and 2006. Even though some tasks will become more efficient and require less effort (such as quality assurance framework development), other activities (such as user support and database services) will grow as the software moves from the hands of the developers to those of the users.

The current planning of core computing (Cat. A) activities assumes that in 2005 the Funding Agencies will make 10.3 FTEs available for computing infrastructure and service-related activities. These people are currently providing partial cover for all tasks listed in Table 8-1. The required effort in 2006 of 16.4 FTEs includes the projected 10 FTEs made available as continuations of existing in-kind contributions. Priorities are set in order to ensure the availability of the computing infrastructure in time for the physics start-up in July 2007. Should the 16.4 FTEs not be available for one reason or another, the level of user services will be most affected, thus resulting in delays in the physics analysis chain.
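The figures quoted above imply a gap between the 2006 requirement and the effort already expected from continuing in-kind contributions. The arithmetic can be sketched as follows; this is only an illustration, the constant and function names are hypothetical, and the interpretation of the remainder as effort still to be found (through new in-kind contributions or central financing) is an assumption drawn from the surrounding text.

```python
# Figures quoted in the text for Category A (all in FTEs).
AVAILABLE_2005_FTE = 10.3        # effort assumed available in 2005
REQUIRED_2006_FTE = 16.4         # effort required in 2006
CONTINUING_IN_KIND_FTE = 10.0    # projected continuation of existing in-kind effort


def uncovered_fte(required, covered):
    """Return the effort (in FTEs) not covered, never negative."""
    return round(max(required - covered, 0.0), 1)


# Effort still to be found for 2006 (assumed: new in-kind or central funding).
gap_2006 = uncovered_fte(REQUIRED_2006_FTE, CONTINUING_IN_KIND_FTE)

# Growth of the 2006 requirement relative to the 2005 assumption.
increase_over_2005 = round(REQUIRED_2006_FTE - AVAILABLE_2005_FTE, 1)

print(f"Uncovered in 2006: {gap_2006} FTEs")
print(f"Increase over 2005: {increase_over_2005} FTEs")
```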

Table 8-1 Description of work items for work packages covered by the M&O-A budget for 2006

8.3.1.2 M&O Category B

The bulk of the Core Computing effort is based on voluntary, but recognized, manpower contributions, and as such can be treated in the M&O-B framework. For information, we give in Table 8-2 the list of activities and current (2005) manpower levels.

It should be noted that the available manpower has been below 70% of the requirement in almost all software development sectors for several years. This situation has generated delays in the preparation of the software infrastructure and framework, as well as in the development of algorithmic code. In order to catch up and be ready for detector commissioning and experiment turn-on, a much-increased participation in Core Computing Cat. B activities for 2006 and beyond is necessary. The inclusion of the tasks into the M&O process allows due credit to be given for the voluntary contributions made.

Table 8-2 Activities covered by M&O Category B and current (2005) manpower levels.

WBS     Activity                                                                   FTEs in 2005
1.1     Computing Coordination and Management                                       3.2
1.2     Software Project
1.2.1     Coordination & Management                                                 3.0
1.2.7     Simulation: Generators, Geant4 framework, Digitization, fast simulation   6.3
1.2.8     Core Services: Athena framework, Databases, Geometry, EDM, Graphics      10.5
1.2.10    Event Selection, Reconstruction and Analysis Tools                        4.3
1.3     Database & Data Management (all activities, including non-offline)         19.5
1.4     Computing Operations
1.4.1     Grid, Data Challenges & World-wide operations                            15.0
1.4.3     Grid Tools and Services development and deployment                       12.0
1.4.4     Operations Management                                                     2.3
        Total                                                                      76.1
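The total in Table 8-2 can be cross-checked against the individual rows. The following is a minimal sketch of such a check; the dictionary and function names are hypothetical, the FTE values are transcribed from the table, and the subtotals for WBS 1.2 and 1.4 are derived here for illustration (they are not quoted in the table).

```python
# FTE values transcribed from Table 8-2 (2005 levels), keyed by WBS code.
# Only rows carrying a value are listed; 1.2 and 1.4 are headings.
TABLE_8_2 = {
    "1.1": 3.2,
    "1.2.1": 3.0,
    "1.2.7": 6.3,
    "1.2.8": 10.5,
    "1.2.10": 4.3,
    "1.3": 19.5,
    "1.4.1": 15.0,
    "1.4.3": 12.0,
    "1.4.4": 2.3,
}


def subtotal(table, prefix):
    """Sum the FTEs of all entries at or below the given WBS code."""
    return round(
        sum(
            fte
            for wbs, fte in table.items()
            if wbs == prefix or wbs.startswith(prefix + ".")
        ),
        1,
    )


def total(table):
    """Sum the FTEs of all entries in the table."""
    return round(sum(table.values()), 1)


print("Software Project (1.2):    ", subtotal(TABLE_8_2, "1.2"))
print("Computing Operations (1.4):", subtotal(TABLE_8_2, "1.4"))
print("Total:                     ", total(TABLE_8_2))
```

Rounding to one decimal place matches the precision of the table and avoids spurious floating-point digits in the sums.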

8.3.1.3 Other Software Developments

The development of software for detector and trigger studies, and of algorithmic code for calibration, reconstruction and analysis, is not covered by the M&O MoU, as these activities are thought to fit naturally into what is expected from all physicists who are members of the ATLAS Collaboration. These activities are nevertheless necessary; they are therefore coordinated by the Software & Computing project and acknowledged internally by the Collaboration.

Approximately 90 FTEs are contributing in 2005 to detector software developments (30 each to the Inner Detector and Muon System, 25 to the Liquid Argon Calorimeters and 5 to the Tile Calorimeter). This level of manpower may seem large, but it has to be compared with the large number of tasks still outstanding in the WBS [8-1]. Our partial success in attracting people to work on core software during the last year resulted in a decrease in the number of expert developers available for detector software.

8.3.2 Hardware Resources

All hardware resources necessary for the operation of the ATLAS Computing Model will be provided by the LCG Collaboration. The LCG Memorandum of Understanding [8-5] defines the collaboration framework and the contributed resources (hardware and service levels). In this document we define only the resource needs as a function of time (see Chapter 7).

A small amount of hardware is required for central build and server machines. These are not covered in the online M&O for ATLAS, being primarily for offline code development and distribution, and for offline documentation services. Hardware (CPUs and disk) replacements and extensions are needed for the build system that is currently hosted by the CERN computer centre. A number of machines need to be configured differently from those offered by the IT Department services, and have to be under the control of ATLAS. This is in order to test new platforms, operating systems and compilers, and to build software to be used by collaborating institutions that have different infrastructure from that provided by CERN-IT. The yearly requests for this hardware are submitted to the RRB as part of the M&O Category A contributions.



4 July 2005 - WebMaster

Copyright © CERN 2005