
4.7 Book-keeping for Production and Analysis

4.7.1 Production Database

The production database is the persistent back-end to the ATLAS production system, which is described in detail in Chapter 5. The present production database was developed as part of the production system development effort for DC2. The database currently holds information about tasks (aggregations of jobs), datasets (aggregations of logical files), task transformations (specifications describing the transformation of datasets into other datasets), job transformations (executables transforming logical files into other logical files), job definitions, and job executions (information about every attempt at executing each job).
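
As an illustration of how these entities relate, the following sketch creates a miniature version of such a schema. SQLite stands in here for the Oracle back-end actually used, and all table and column names are invented for the example; they do not reproduce the real production-database schema.

    # Miniature sketch of the production-database entities described above.
    # SQLite stands in for the Oracle back-end; table and column names are
    # hypothetical and do not reproduce the real ATLAS schema.
    import sqlite3

    SCHEMA = """
    CREATE TABLE job_transformation  (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
    CREATE TABLE task_transformation (id INTEGER PRIMARY KEY, name TEXT, version TEXT);
    CREATE TABLE task (id INTEGER PRIMARY KEY, status TEXT,
                       task_transformation_id INTEGER REFERENCES task_transformation(id));
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT,
                          task_id INTEGER REFERENCES task(id));
    CREATE TABLE logical_file (id INTEGER PRIMARY KEY, lfn TEXT,
                               dataset_id INTEGER REFERENCES dataset(id));
    CREATE TABLE job_definition (id INTEGER PRIMARY KEY, parameters TEXT,
                                 task_id INTEGER REFERENCES task(id),
                                 job_transformation_id INTEGER REFERENCES job_transformation(id));
    -- one row per attempt at executing a job definition
    CREATE TABLE job_execution (id INTEGER PRIMARY KEY, attempt INTEGER, status TEXT,
                                job_definition_id INTEGER REFERENCES job_definition(id));
    """

    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.close()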

After almost one year of operation the production database holds 74 job-transformation records, ~1000 tasks and ~2500 datasets, 1 million job definitions, 1.6 million job executions, and about 2.3 million logical files. The size of the database is less than 4 GB.

The database is implemented with Oracle on the physics DB service of CERN IT. In February 2005, after chronic performance problems, it was moved from the general physics cluster to a dedicated server. Performance with the dedicated server has been adequate, but the server load caused by the production activity has been extremely high (I/O rate from disk greater than 100 MB/s, equivalent to reading in the complete DB every 40 seconds). Work has been under way with the assistance of IT experts to understand the reasons for this load and optimize the database and its client applications. This has led recently to a reduction in the load by more than an order of magnitude. The dedicated server should now be able to sustain a load ten times higher than the current production activity.
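
The quoted figures can be checked with a one-line calculation; the numbers below are simply those given in the text.

    # Back-of-the-envelope check of the I/O figures quoted above.
    db_size_gb = 4          # approximate size of the production database
    io_rate_mb_s = 100      # observed disk I/O rate before optimization
    print(db_size_gb * 1024 / io_rate_mb_s)   # ~41 s to read the complete DB once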

The production system architecture supports the physical distribution and/or replication of the database while presenting to its clients the appearance of a single production database. This keeps the system simple and scalable. It is expected that this capability can be used in a straightforward way to introduce several logically distinct production databases.
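
The sketch below indicates, in purely schematic form, how such a single logical view could be presented: a thin facade routes each request to one of several physical instances, so that clients never deal with more than one 'production database'. The class and routing policy are hypothetical and are not taken from the actual production-system code.

    # Hypothetical facade presenting several physical database instances as a
    # single logical production database; the routing policy shown (reads to
    # any replica, writes to all) is only one of several possible schemes.
    import random

    class ProductionDBFacade:
        def __init__(self, replicas):
            self.replicas = replicas        # e.g. connection strings of physical instances

        def read(self, query):
            # a read may be served by any replica
            return self._execute(random.choice(self.replicas), query)

        def write(self, statement):
            # a write is propagated to every replica
            for replica in self.replicas:
                self._execute(replica, statement)

        def _execute(self, replica, sql):
            print(f"[{replica}] {sql}")     # placeholder for a real database call

    db = ProductionDBFacade(["oracle://dedicated-server", "oracle://remote-replica"])
    db.read("SELECT COUNT(*) FROM job_execution")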

By its nature the production database will grow monotonically. However, the part of it that is in active use (jobs yet to be processed, in processing, or recently processed) will remain relatively constant in size, growing only with the CPU capacity available to the experiment. We expect the database back-end to be able to exploit this 'sliding hot data' pattern, with old records archived so as not to impact hot-data access. Access costs for archived data will be higher, but accesses will be infrequent.
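
A minimal sketch of such an archiving step is given below, again with SQLite standing in for Oracle and with invented table names: records of jobs finished before some cut-off date are moved to an archive table, so that queries on the active ('hot') jobs never have to scan them.

    # Illustrative 'sliding hot data' archiving step (hypothetical tables,
    # SQLite in place of Oracle): executions finished before the cut-off date
    # are moved out of the hot table into an archive table.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE job_execution (id INTEGER PRIMARY KEY, status TEXT, finished TEXT);
    CREATE TABLE job_execution_archive AS SELECT * FROM job_execution WHERE 0;
    INSERT INTO job_execution VALUES (1, 'done', '2004-09-01'),
                                     (2, 'done', '2005-06-30'),
                                     (3, 'running', NULL);
    """)

    cutoff = "2005-01-01"   # anything finished before this date counts as 'cold'
    conn.execute("INSERT INTO job_execution_archive "
                 "SELECT * FROM job_execution WHERE finished < ?", (cutoff,))
    conn.execute("DELETE FROM job_execution WHERE finished < ?", (cutoff,))
    conn.commit()

    print(conn.execute("SELECT COUNT(*) FROM job_execution").fetchone())          # (2,)
    print(conn.execute("SELECT COUNT(*) FROM job_execution_archive").fetchone())  # (1,)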

The production database contains a wealth of information useful for computing derived statistics such as number of jobs executed per day, daily failure rates, specific-failure incidence evolution over time, correlations, and so on. Extraction of such monitoring statistics must not interfere with the ongoing production activity. At present, apart from exceptional incidents, indications are that these two activities can coexist. To ensure this remains the case, we expect that in the near future the derived data will be computed only once and stored in additional tables. These tables could then easily be moved to another server if necessary to exclude the possibility of interference altogether.
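
The sketch below illustrates the idea of pre-computed summary tables, with an invented schema and SQLite in place of Oracle: jobs-per-day and failure counts are aggregated once into a small table, and monitoring queries then read that table rather than the live job records.

    # Sketch of pre-computing monitoring statistics into a summary table
    # (hypothetical schema).  Monitoring queries read the small daily_stats
    # table instead of scanning the live job_execution table.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE job_execution (id INTEGER PRIMARY KEY, day TEXT, status TEXT);
    INSERT INTO job_execution VALUES (1,'2005-06-01','done'), (2,'2005-06-01','failed'),
                                     (3,'2005-06-02','done'), (4,'2005-06-02','done');
    CREATE TABLE daily_stats (day TEXT PRIMARY KEY, n_jobs INTEGER, n_failed INTEGER);
    """)

    conn.execute("""
    INSERT INTO daily_stats
    SELECT day,
           COUNT(*)                                           AS n_jobs,
           SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) AS n_failed
    FROM job_execution GROUP BY day
    """)
    conn.commit()

    for row in conn.execute("SELECT * FROM daily_stats ORDER BY day"):
        print(row)   # ('2005-06-01', 2, 1)  then  ('2005-06-02', 2, 0)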

Further development of the production database will take place in the context of production system development, in close collaboration with the database and data management project. An important development activity will be integration with the DDM system, in particular with the DDM system's management of dataset and logical file information. At the start of production-database development it was not clear what file metadata would be useful, and whether the Grid-native metadata catalogues would allow schema evolution in a reasonable way. Hence it was decided to simply store the file metadata in the production database, under our full control. The file metadata problem is now being addressed in the context of the DDM system, and we expect that the additional copy in the production database will become unnecessary.

4.7.2 Offline Book-keeping

ATLAS has used the AMI database framework for offline book-keeping in DC1, DC2, and for the 2004 combined test beam. AMI is designed to be very flexible; it supports different schemas and different relational database back-ends. This flexibility has been extremely useful as requirements have evolved and as different, complementary tools have become available. AMI-compliant databases are self-describing, which means that the same software can continue to be used as usage and schemas evolve. In particular, a generic and configurable web search interface gives access to all data in AMI-compliant databases.
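
The snippet below illustrates what 'self-describing' means in practice: generic client code discovers the tables and columns from the database itself rather than from a hard-coded schema. SQLite's catalogue tables are used for the illustration; AMI's own schema-description mechanism is not reproduced here.

    # Illustration of a self-describing database: generic code discovers the
    # schema at run time (SQLite catalogue tables stand in for AMI's own
    # schema-description mechanism).
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (name TEXT, n_events INTEGER, geometry TEXT)")

    def describe(conn):
        """Return {table: [columns]} discovered from the database itself."""
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        return {t: [c[1] for c in conn.execute(f"PRAGMA table_info({t})")] for t in tables}

    print(describe(conn))   # {'dataset': ['name', 'n_events', 'geometry']}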

Although MySQL was exclusively used by AMI up to April 2005, an Oracle back-end has been successfully integrated in a development version during May 2005. It is expected to go into production during the summer of 2005.

AMI has been available as a web service since December 2003. Clients have been developed in several languages: Java and Python are the most frequently used, but a C++ client is also available. The Python client is supported by the GANGA (Python-based interactive interface to Grid services) team, and work is under way to use it for the input of book-keeping data from Athena job options files.
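
As a purely hypothetical illustration of this web-service style of access, the sketch below issues a book-keeping query over HTTP; the endpoint URL, command name and parameters are invented and do not correspond to the actual AMI client interface.

    # Hypothetical web-service query; URL, command and parameter names are
    # invented and do not reflect the real AMI interface.
    import urllib.parse
    import urllib.request

    def bookkeeping_search(base_url, query):
        params = urllib.parse.urlencode({"Command": "SearchQuery", "sql": query})
        with urllib.request.urlopen(f"{base_url}?{params}") as reply:
            return reply.read().decode()

    # Example call (requires a running service, so it is left commented out):
    # print(bookkeeping_search("https://ami.example.org/servlet",
    #                          "SELECT datasetName FROM dataset WHERE project='dc2'"))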

AMI implements a fine-grained authorization system which allows users to be mapped to different roles. Integration of VOMS authentication will soon be available.
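
The following sketch shows the general idea of such role-based authorization, with users identified by certificate DN; the role names, permissions and mapping are invented for the example and have no connection to the real AMI configuration.

    # Hypothetical role-based authorization: users (certificate DNs) map to
    # roles, and each role carries a set of allowed operations.
    ROLE_PERMISSIONS = {
        "reader":   {"search"},
        "producer": {"search", "insert_dataset"},
        "admin":    {"search", "insert_dataset", "modify_schema"},
    }

    USER_ROLES = {
        "/DC=ch/DC=cern/CN=alice": {"producer"},
        "/DC=ch/DC=cern/CN=bob":   {"reader"},
    }

    def is_allowed(user_dn, operation):
        roles = USER_ROLES.get(user_dn, set())
        return any(operation in ROLE_PERMISSIONS[r] for r in roles)

    print(is_allowed("/DC=ch/DC=cern/CN=alice", "insert_dataset"))  # True
    print(is_allowed("/DC=ch/DC=cern/CN=bob",   "insert_dataset"))  # False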

ATLAS has chosen to keep the production system database, which describes the production jobs, completely separate from the physics metadata book-keeping. The latter system, for DC2 in particular, suffered from a lack of integration with the production system, impairing the transmission of physics-relevant metadata from the production system to the physics book-keeping database. Considerable work has been done to improve the links between these two systems, and an interface for requesting production tasks has been developed. In the future it should therefore be much easier for non-expert users to submit jobs to the production system, with complete and coherent dataset descriptions propagated to the book-keeping database. We expect this interface to improve further in the next generation of the production system, whose development is now beginning.

In the context of the ATLAS Distributed Data Management architecture now under development, AMI will provide the dataset selection catalogue (Section 4.6.5.1.5). The major project for AMI will be to add a specific production and analysis web-search interface to complement the current generic one. This interface will be designed in consultation with the physicist user community. It is expected that this interface will not only provide searches on the data contained in the AMI databases themselves, but will also serve as a portal to other database services such as DQ2 or COOL.

4.7.3 Provenance and Related Metadata

Provenance information is maintained at many scales. Each event "knows" its parent (the RAW event used to produce the ESD, the ESD event used to produce the AOD, and so on), and retains a pointer to it whenever the parent is stored in object format: this is the means by which back navigation is supported. Similarly, the book-keeping system tracks which input file was used to produce a specific output file, along with data describing the transformation used to accomplish this.
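
The sketch below illustrates these two scales of provenance with invented class and field names: per-event parent pointers that support back navigation, and a per-file record of the input file and transformation used.

    # Illustrative sketch of the two provenance scales described above; all
    # class and field names are hypothetical.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Event:
        level: str                      # 'RAW', 'ESD' or 'AOD'
        parent: Optional["Event"] = None

        def back_navigate(self, target_level):
            """Follow parent pointers until the requested representation is found."""
            event = self
            while event is not None and event.level != target_level:
                event = event.parent
            return event

    @dataclass
    class FileProvenance:
        output_file: str
        input_file: str
        transformation: str             # e.g. name and version of the job transformation

    raw = Event("RAW")
    esd = Event("ESD", parent=raw)
    aod = Event("AOD", parent=esd)
    print(aod.back_navigate("RAW").level)   # RAW

    record = FileProvenance("aod._0001.pool.root", "esd._0001.pool.root", "reco-10.0.1")
    print(record)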

The control framework group will deliver in 2005 a "history object" framework encompassing job, service, algorithm and data object histories that will allow one to record and later discover details about the algorithm that created a given physics data object and the job in which that algorithm ran. Persistence of this history information in the ATLAS event store will provide a means to record and query data provenance at the sub-event ("How were these tracks produced?") level.
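
As a rough indication of the kind of information such history objects might carry (algorithm name, configured properties, and the job context), consider the sketch below; the class names and fields are hypothetical and do not reflect the actual Athena history classes, which are still under development.

    # Hypothetical sketch of history information attached to a data object:
    # which algorithm created it, with what configuration, and in which job.
    from dataclasses import dataclass, field

    @dataclass
    class AlgorithmHistory:
        name: str
        version: str
        properties: dict = field(default_factory=dict)   # configured job options

    @dataclass
    class JobHistory:
        job_id: str
        release: str
        algorithms: list = field(default_factory=list)

    job = JobHistory(job_id="reco-000123", release="10.0.1")
    tracking = AlgorithmHistory("TrackFinder", "1.4", {"minPt": 500.0, "useTRT": True})
    job.algorithms.append(tracking)

    # Answering "How were these tracks produced?" then amounts to looking up
    # the history object stored alongside the track collection:
    print(tracking.name, tracking.properties)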

Other metadata related to provenance include information such as the geometry version used in a simulation, or the calibration tags used in reconstruction. The book-keeping system tracks such things at the dataset level, so one can configure a client job appropriately, though work remains to support dynamic runtime extraction of such information from book-keeping and metadata databases. At the individual event level, an extensible TagInfo object records the current geometry tag and the version tags from the Athena IOVDbSvc, the service that handles access to time-varying data such as conditions and calibrations.
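
The sketch below gives a flavour of such per-event tag metadata as a simple extensible mapping from tag names to values; the tag names and values shown are invented, and the real TagInfo interface is not reproduced here.

    # Illustrative per-event tag metadata in the spirit of the TagInfo object:
    # an extensible mapping from tag names to values (all values invented).
    tag_info = {
        "GeoAtlas": "ATLAS-DC2-Geometry",        # hypothetical geometry tag
        "IOVDbGlobalTag": "DC2-Calib-Default",   # hypothetical conditions tag
    }

    # Being a simple mapping, it can be extended with further tags as needed:
    tag_info["TriggerMenu"] = "dc2-default-menu"

    for name, value in sorted(tag_info.items()):
        print(f"{name:16s} -> {value}")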


