6.4 Productions

The organization of the "Productions" is still under discussion. The current proposal is to have a "Computing Operations Group" responsible for the overall operations at Tier-0, Tier-1 and Tier-2.

The relative priorities of the various productions and activities will be decided by a "Production Management Board". A "Support Team" will be in charge of ensuring that the necessary infrastructure is in place and of providing support to the users.

All centrally organized operations are expected to be performed using the ATLAS Production System "ProdSys".

6.4.1 Distributed Production and Analysis Operations

While the operations at Tier-0 will be carried out centrally, on a dedicated cluster using the T0 Management System, it is foreseen that the operations at Tier-1s, such as reprocessing, or at Tier-2s, such as Monte Carlo simulation, will be carried out in a distributed way, on the Grid, using the ATLAS Distributed Data Management system (DDM).

As described in the Computing Model, the ProdSys will use the resources available at the Tier-1s and Tier-2s in the most efficient way possible. It is assumed that:

The ATLAS DDM, through its own services, will provide the information necessary to run a given application, such as the location of the input data; this assumes access to the necessary Grid catalogue. It will also provide the services to access the conditions databases holding the conditions data (e.g. geometry, calibrations and alignments).
The Grid information systems will provide information on the availability of resources, such as CPU and data storage, on all sites accepting the ATLAS VO. This assumes that the Tier-1s and Tier-2s have published the relevant information and are able to communicate to the service whether they can accept or refuse to run an ATLAS job.

The ProdSys will use this information to optimize its own work. For example, it will be responsible for splitting a given task into several jobs if necessary and deciding where these jobs should be executed:

Whether the jobs should be run at the sites where the data resides, avoiding data replication.
Whether the jobs should trigger a replication of data through the DDM services in order to run where CPU resources are available.
With what priority the jobs should be executed.

It should be noted that this decision need not be made at the top level of the system; it can be delegated to one of its components, for example a Grid Resource Broker.
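The choice between running where the data resides and replicating data to a site with free CPU can be illustrated with a minimal sketch. This is not the ProdSys implementation: the `Site` record, the `choose_site` function and the "prefer data locality, otherwise pick the least loaded site" rule are all assumptions made for illustration, and job priorities are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical view of what a site publishes to the Grid information system."""
    name: str
    free_cpus: int
    accepts_atlas: bool  # whether the site accepts jobs from the ATLAS VO


def choose_site(sites, replica_sites, min_cpus=1):
    """Pick a site for one job.

    Returns (site_name, needs_replication), or None if no site is usable.
    Assumed rule: prefer a usable site that already holds the input data;
    otherwise take the usable site with the most free CPUs and flag that
    the input data must first be replicated there.
    """
    usable = [s for s in sites if s.accepts_atlas and s.free_cpus >= min_cpus]
    if not usable:
        return None
    local = [s for s in usable if s.name in replica_sites]
    if local:
        best = max(local, key=lambda s: s.free_cpus)
        return best.name, False
    best = max(usable, key=lambda s: s.free_cpus)
    return best.name, True


# Illustrative site names and numbers only.
sites = [
    Site("T1_A", free_cpus=10, accepts_atlas=True),
    Site("T2_B", free_cpus=200, accepts_atlas=True),
    Site("T2_C", free_cpus=500, accepts_atlas=False),
]
decision = choose_site(sites, replica_sites={"T1_A"})
```

A real broker would weigh the cost of replication against the expected queue time rather than always favouring locality; the fixed preference here only keeps the sketch short.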

At the end of a job, it will also be the ProdSys' responsibility to store the produced data at the best place, which could be a requested one, and to ensure that, if requested, the data is stored in a safe way. It is also responsible for ensuring that the relevant information is stored in the bookkeeping and catalogue databases. ProdSys will maintain the complete provenance of all data produced and provide an interface to make this available to the collaboration.
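As an illustration of what maintaining provenance could involve, the sketch below records, for each produced dataset, the datasets it was derived from, and walks that chain on request. The `DatasetRecord` and `Bookkeeping` classes, their fields and the dataset names are hypothetical; the actual bookkeeping schema is not described here.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical bookkeeping entry for one produced dataset."""
    name: str
    transformation: str                         # label of the job type that produced it
    inputs: list = field(default_factory=list)  # names of the parent datasets


class Bookkeeping:
    """Toy in-memory stand-in for the bookkeeping database."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.name] = record

    def provenance(self, name):
        """Return the names of all ancestors of a dataset, nearest first."""
        seen = set()
        order = []
        queue = deque(self._records[name].inputs)
        while queue:
            parent = queue.popleft()
            if parent in seen:
                continue
            seen.add(parent)
            order.append(parent)
            record = self._records.get(parent)
            if record is not None:
                queue.extend(record.inputs)
        return order


# Illustrative chain of three datasets.
book = Bookkeeping()
book.register(DatasetRecord("run1.raw", "data-taking"))
book.register(DatasetRecord("run1.esd", "reconstruction", ["run1.raw"]))
book.register(DatasetRecord("run1.aod", "aod-building", ["run1.esd"]))
ancestors = book.provenance("run1.aod")
```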

6.4.2 Management of the ATLAS Virtual Organization

A Virtual Organization (VO) is a collection of people, resources, policies and agreements belonging to a real organization, such as the ATLAS Experiment. The Virtual Organization mechanism provides a way to authorize the user during task instantiation. User credentials are stored in the VO and organized in groups. Each site, when receiving a request from a user, is then able to decide whether or not (and with which priority) to give access to the underlying resources, on the basis of the information retrieved from the VO to which the user belongs.

Currently the VO is implemented as an LDAP database, hosted at NIKHEF. This database is synchronized with a VOMS (Virtual Organization Membership Service) server, which keeps track of the user roles and groups in a more efficient way. However, on account of some incompatibility between the European VOMS server and the US server, it is currently impossible to switch completely to the VOMS system.

In order to be able to access grid resources, members of the ATLAS Collaboration must also be part of the ATLAS Virtual Organization.

6.4.2.1 The Authorization System on the Grid

From the authorization point of view, a Grid is established by enforcing agreements between resource providers (facilities offering resources to other parties, according to a specific "common understanding") and VOs, where, in general, both parties control resource access.

The authorization mechanism is implemented taking into account two types of information:

General policies, i.e. the relationship of the users with their VO: groups they belong to, roles they are allowed to play and capabilities they are allowed to exercise.
Local policies of the resource provider: what users are allowed to do, ACLs, etc.

Authorization, as stated before, is based on policies written by VOs and their agreements with resource providers. It is the resource providers who enforce the local authorization policy. A VO can have a complex structure with groups and subgroups in order to clearly divide its users according to their tasks. Moreover, a user can be a member of any number of these groups.

A user, both at VO and group level, may be characterized by any number of roles and capabilities. The enforcement of these VO-managed policy attributes (group memberships, roles, capabilities) at the local level descends from the agreement between the VO and the resource provider. The latter can always override the permission granted by the VO, for example to ban unwanted users.
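The interplay between VO-managed attributes and local policy can be sketched as follows. This is an assumption-laden illustration, not the Grid authorization middleware: the `VOAttributes` and `LocalPolicy` types, the `authorize` function and the numeric priorities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VOAttributes:
    """Attributes asserted by the VO for one user (hypothetical layout)."""
    user: str
    groups: frozenset
    roles: frozenset


@dataclass
class LocalPolicy:
    """Policy enforced by the resource provider (hypothetical layout)."""
    banned_users: set
    allowed_groups: set
    role_priority: dict  # role name -> priority, higher is served first


def authorize(attrs, policy):
    """Return the priority granted to the user, or None if access is refused."""
    # The resource provider can always override the VO, e.g. to ban a user.
    if attrs.user in policy.banned_users:
        return None
    # The user must belong to at least one group covered by the agreement.
    if not (attrs.groups & policy.allowed_groups):
        return None
    # The highest priority among the user's roles applies; default is 0.
    return max((policy.role_priority.get(r, 0) for r in attrs.roles), default=0)


# Illustrative users, groups and priorities only.
policy = LocalPolicy(
    banned_users={"mallory"},
    allowed_groups={"physics"},
    role_priority={"production": 10},
)
alice = VOAttributes("alice", frozenset({"physics"}), frozenset({"production"}))
mallory = VOAttributes("mallory", frozenset({"physics"}), frozenset({"production"}))
granted = authorize(alice, policy)
refused = authorize(mallory, policy)
```

Returning `None` for a refusal keeps it distinct from a granted priority of 0, so callers should test the result with `is None` rather than for truthiness.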

In ATLAS we foresee several different groups covering the various activities: software and computing, physics, combined performance and detector. Currently we have identified three different roles:

Grid software administrators, in charge of installing and managing the resources.
Production managers, responsible for official productions.
Normal users.

6.4.2.2 ATLAS VO Structure and Responsibilities

A team of managers operates the ATLAS VO. Currently the people involved are:

One general Manager and Coordinator;
One manager for each of the three Grids: LCG/EGEE, Grid3/OSG and NorduGrid.

The VO managers check the eligibility of users before admitting them to the ATLAS VO. The checks are generally performed using the CERN HR database and/or with the help of known people who may act as guarantors for new users.

Upon completion of the checks, the VO managers usually inform the users whether their request has been accepted or rejected. The VO administrator has the right to ban a user from the VO at any time if the user fails to comply with the usage guidelines.