4.4 Conditions and Configuration Data

4.4.1 Introduction

Conditions data refers to nearly all the non-event data produced during the operation of the ATLAS detector, together with that required to perform reconstruction and analysis. Conditions data varies with time, and is usually characterized by an `interval of validity' (IOV), i.e. a period of time for which it is valid, expressed as a range either of absolute times or run and event numbers. Conditions data includes data archived from the ATLAS detector control system (DCS), online book-keeping data, online and offline calibration and alignment data, and monitoring data characterising the performance of the detector and software during any particular period of time.

The basic function of the conditions database is to store the conditions data objects themselves, together with the associated IOVs. In some cases (e.g. DCS data), the data and IOV are tightly coupled, and stored together in the conditions database. In others (e.g. large calibration datasets), the conditions data can be created and validated without reference to any IOV, and only later assigned to one or more IOVs, when it is decided for which period(s) of time this calibration is valid. In this latter case, the conditions data may be stored independently of the relational database holding the IOVs, and the IOVs act as a cataloguing mechanism to index the conditions data objects by time, allowing the selection of subsets of calibration data corresponding to particular time periods. This view allows a distinction between the narrow IOV database, just holding the IOVs cataloguing the conditions data, and the wider conditions database, also including the data objects. When assigned to an IOV, the data objects are sometimes referred to as the payload of the IOV.

Conditions data is closely related to configuration data, needed to set up and run the ATLAS detector hardware and associated online and event selection software. Configuration data is characterized primarily by version or purpose (e.g. physics run, calibration or cosmic run) rather than interval of validity. However, once it is used for a particular data-taking run it becomes conditions data valid for a particular time, and it must be possible to recover the complete configuration in use at the time any particular event was taken. Particularly in cases when this data needs to be accessed from the offline reconstruction framework, this requirement suggests the use of the conditions database to store such configuration data. The ATLAS TDAQ project has developed a dedicated configuration database [4-2] based on the OKS [4-3] persistent in-memory object manager. This is widely used within TDAQ for configuring the online hardware and software systems and the data-flow system for the data transport and high-level trigger infrastructure. The TDAQ configuration database places great emphasis on good read performance, scalable to the thousands of processors in the online system. However, persistent storage is currently provided only by versioning the XML data files in a CVS repository, and there is no link to Athena. Some feasibility studies to improve archiving and implement Athena access are under way, and the possibility of replacing the XML files with a relational database backend is also being explored. However, the baseline plan is now to use the conditions database for sub-detector and trigger configuration data that is naturally stored in a relational database and needs to be accessed offline.

An ATLAS-wide conditions database was deployed for the 2004 combined test beam, and widely used for the storage of DCS data and calibration/alignment information. This database was based on a MySQL implementation of the conditions database interface developed by the RD45 project [4-4], with significant enhancements provided by ATLAS [4-5]. The database provided the association of conditions data objects with IOVs, with the conditions data itself being stored either within the database, or externally as simple structures mapped onto a separate MySQL database [4-6], or POOL objects stored in POOL ROOT files. Considerable experience was gained in both online and offline use of the database, and this has been very valuable input for the design and implementation of the new LCG conditions database product, COOL.

The final ATLAS conditions database will be based on COOL, making extensive use of all the features that this software offers. COOL is being deployed now for sub-detector commissioning in the ATLAS pit, and will also be heavily used in computing system commissioning late this year and in early 2006. Various ATLAS-specific interfaces and utilities will be built on top of COOL, to support all the different types of conditions data in the experiment.

4.4.2 Conditions Database Architecture

Conditions data is usually characterized by an IOV and a data payload, with an optional version tag. The latter is only appropriate for some types of data, e.g. for identifying different sets of calibration data valid for the same IOV but corresponding to different calibration algorithms or reconstruction passes. The COOL conditions database supports both versioned and un-versioned data, and allows several possibilities for the storage of the data payload itself, either within the conditions database or externally in other databases or files. At the most basic level each set of conditions database objects corresponds to a relational table, with columns giving the interval of validity, an optional channel identifier (to allow several related but independent measurements to be stored together) and one or more data payload columns of simple types. The payload can instead be a reference to data stored elsewhere, e.g. a database table foreign key, a POOL token or an external file identifier. In these cases, the conditions database table is being used to catalogue or index data which is stored elsewhere, and may have an independent existence.

ATLAS plans to make use of all of these possibilities to store the various types of conditions data:

DCS data will typically be stored directly in the conditions database tables; here the amount of payload data per IOV is usually small, and the data has no existence outside an IOV (i.e. a DCS measurement is always associated with a particular timestamp). This solution is also appropriate for other types of data where the payload is relatively small and the ability to make relational database queries on both the IOVs and the data payload is important. One typical example is monitoring summary data, where queries can be made to determine all runs where e.g. a particular part of the detector had high occupancy or was inactive.
Payload data can be stored in the conditions database itself, but in separate auxiliary tables without an IOV structure, the IOV tables then holding foreign key references to row(s) in these payload tables. This solution is appropriate when the data payload is naturally represented in relational tables, but does not have a one-to-one correspondence with IOVs (either because one IOV points to several payload rows sharing a key, or the data payload can exist outside of the context of any IOV). Keeping such payload data directly in the conditions database has advantages both for ease of access and data distribution.
Payload data can be stored elsewhere, either using POOL or in external files of arbitrary format. In the case of POOL, the data can be stored using the POOL storage service, using either streamed ROOT files, or the new POOL object-relational storage service. The latter is particularly interesting as it provides a way to map C++ objects into rows in relational database tables, allowing data to be manipulated both as C++ objects by programs using the POOL software, and by traditional relational database tools as might be used by online software or interactive users. In both cases, the conditions-database IOV tables store POOL object tokens (i.e. the string form of a reference to an object in a particular POOL streamed object file or relational database), allowing the POOL objects themselves to be retrieved from external storage. In the case of other file types, the conditions database simply stores the filename or other reference to the file, and external components are responsible for resolving the reference and providing the data to the application. Additional software will be required to manage and retrieve these files on demand. In the 2004 combined test beam, the CERN Castor system was used for this, although its not really suited to storing small files, and alternatives such as the SealZip package are being considered.

Sub-detector configuration data will typically be stored in the conditions database itself, either directly in the IOV tables or in other relational database structure. In the latter case, the data will be managed outside of the context of COOL, perhaps using the POOL RAL component to access the data in a database-backend independent way; this approach is currently being prototyped for the trigger configuration. COOL would then be used just to track which version of the configuration is used for which data-taking run. In all cases, updating the configuration will have to be done by creating a new version or new IOV, such that data which has already been used in previous data-taking runs is preserved in case it is needed offline. The version tagging features of COOL can also be used, to maintain several parallel versions of a configuration to be used for physics, cosmic or calibration runs. At the present time, sub-detectors (including the trigger system) are beginning to explore the possibilities of using the conditions database for configuration, and the first production use will occur in sub-detector commissioning beginning in spring/summer 2005. It is clear that understanding of the best way to use COOL for configuration data will grow as experience is gained.

ATLAS DCS data will come primarily from the distributed PVSS system controlling and monitoring the experiment and infrastructure. Around 100 PCs (running Microsoft Windows) are expected to act as sources of DCS data to be written into the COOL conditions database. These PCs will each run a dedicated application (known as the PVSS manager) responsible for subscribing to a set of PVSS datapoints and writing data updates to the conditions database. A first version of such an application was used extensively in the 2004 combined test beam to archive data to the MySQL-based conditions database, and much useful experience was gained. The total DCS data volume stored was O(10 GB), compared to around O(100 GB) per year being expected in the full ATLAS detector. An enhanced PVSS manager is now being developed, which will interface to COOL, and allow better configurability, both of which PVSS datapoints are stored in which conditions database table structures, and what filtering options (e.g. store on change, store at regular time intervals or on significant events such as start/end of run) are applied. Since around 100 separate PVSS manager applications are expected in the final system, efficient table structures (e.g. storing many independent but similar datapoints together in common tables), and buffered bulk updates (e.g. sending data as bulk inserts only every few minutes), will be very important. This enhanced manager will be used for the first time during ATLAS commissioning in 2005. Monitoring data will come as two basic types - summary tables consisting of numbers characterising parts of the system during e.g. one run, and histogram data. The former will be stored directly in the conditions database, with IOVs corresponding to the associated run, and the latter as histogram raw data e.g. in ROOT files. These files will be catalogued using the conditions database, with tables holding IOVs and references to the data files. As for conditions data POOL ROOT files, a system will be required for cataloguing and managing these external files. Little systematic work has been done on archiving monitoring histograms to date, but this is a priority area for the initial phase of sub-detector commissioning.

Online book-keeping data consists mainly of metadata on the runs and event data files produced during online data-taking. Some of this data will be stored in the conditions database, and some will be fed to the book-keeping database of the ATLAS offline production system. Several tools were used in the 2004 combined test beam to help with this. The online software `conditions database interface' (CDI) allows datapoints of the online IS information system to be monitored and logged to the conditions database [4-7], and this was used to store parameters of each data-taking run and the associated beam settings. This system is now being updated to interface to the COOL conditions database. Data was also transferred to the AMI offline book-keeping database [4-8] to steer the subsequent reconstruction of combined-test-beam data and enable searches of the available datasets to be performed. This area will be revisited in the context of developments in the ATLAS production system and distributed data-management areas, and will have to be greatly expanded in functionality to deal with the multiple data sources in ATLAS, where event data will be written to many event filter sub-farm outputs (SFOs), with multiple streams (primary physics data, express physics stream, calibration and diagnostic streams) being produced in parallel. Close coordination with the production system managing the ATLAS Tier-0 reconstruction and calibration production will also be required.

Calibration and alignment data will utilize several of the possible data-payload storage options, with small data payloads being stored directly in the conditions database tables, and larger ones being written using POOL, either to streamed POOL ROOT files or with the POOL object-relational storage service. Storing data using POOL is particularly appropriate in offline calibration and alignment contexts, where the data is manipulated primarily as C++ objects, and may be produced in two steps; first as `test' calibrations which are verified and perhaps iterated on, before being committed to the conditions database and assigned to intervals of validity. Several sub-detectors have explored POOL ROOT file-based calibration data in the 2004 combined test beam, and the LAr calorimeter has also used similar external storage in the legacy NOVA database system [4-5]. A similar scheme will be used for ATLAS commissioning and beyond, with the additional possibility of using POOL object-relational storage (expected to replace NOVA for LAr) to store calibration data in such a way that they can also be processed by relational database tools; this may be particularly useful for calibration constants that have to be shared between online and offline. The first version of POOL object-relational storage is available in POOL 2.0, and ATLAS integration is now being actively pursued, with a view to using it first for LAr calorimeter commissioning in summer 2005.

4.4.3 Conditions Database Online Model

The COOL conditions database software is written using the POOL RAL (Relational Access Layer) component, making the API and database schema independent of the underlying database backend (Oracle, MySQL and SQLite being currently supported). The large investment and experience of the CERN-IT database groups in Oracle, together with its proven ability to handle complex demanding applications, makes Oracle the natural choice for the conditions database implementation at CERN.

The CERN instance will have several simultaneous roles:

Provide conditions data to the online system, record online book-keeping, DCS and monitoring information, and track online configuration and calibration data changes.
Provide conditions data to the HLT system and Tier-0 processing farm.
Support diagnostics, debugging and monitoring for online data-taking and prompt reconstruction.
Act as the master repository for ATLAS conditions data, distributing data to the Tier-1 sites as needed, processing calibration updates received from the worldwide collaboration, and keeping backups and archives of old data as necessary.
Provide conditions data for the CERN analysis facility, to support local physics analysis of CERN-based users.

It is clear that a single database instance will not serve all these roles, and a system with several replicas will be required. One possibility is to follow the example of the Tevatron experiments, with a master copy serving the online system, and read-only replicas for e.g. the Tier-0 and CERN analysis efforts. Such an approach was used successfully in the 2004 combined-test-beam MySQL-based conditions database, with offline reconstruction and analysis using a read-only replica of the main online database. However, in ATLAS itself, the online system will access only a fraction of the total conditions data, e.g. it will not require old data from previous years or data taking periods, or the most refined offline calibrations. Such considerations suggest that the online database server contain only the data required online, other data being kept on a separate `offline' master server, with updates being fed from the online system as necessary. Prototyping and scale tests with the various Oracle replication and synchronization technologies are required before a final choice can be made, and such tests are planned during 2005. Another important issue is database server availability and fault tolerance. The conditions database will be essential for ATLAS data taking (particularly for online configuration of sub-detectors), so appropriate measures will be required (e.g. a fail-over server or standby replica) to ensure data is not lost due to database server problems. The physical location of the servers may also be important: in case of network problems or malicious hacker activity, ATLAS requires the ability to shutdown external network connectivity (including to the rest of the CERN site) and continue to take data for 24 hours. This suggests at least one standby server should be located at the ATLAS pit, but developments in network technology, firewalls and routing may mean that the server can still be located in the CERN computer centre and benefit from their database and server administration support. Again, more discussion and tests are required before a final decision can be made.

Database best practices suggest separating reading, writing and schema modification using distinct database roles. In the ATLAS conditions database, there will be many distinct writers (e.g. configuration, calibration and DCS for each sub-detector or subsystem), and these will have write privileges only for their own parts of the database to prevent interference. In contrast, a generic read role will probably be sufficient, since many readers (e.g. the HLT and Tier-0 farm) will need to access data from many sub-detectors at the same time.

Sub-detectors will also want to create new tables and modify their database schema, especially for ad-hoc table structures not created through the COOL API. However, this can have data-integrity implications on a production system, so it is anticipated that such operations will only be allowed freely on a development system, and new structures will only be introduced in production with the participation of the database coordinators. A review procedure will be required for all such changes, to ensure that the new structures use the database resources in an efficient manner and will not adversely impact existing users. This approach is supported by experience from the 2004 combined test beam, where free creation of new tables (particularly for DCS data) on the production server led to an unmanageable schema with 1000s of tables, and which proved very difficult to backup and restore.

Proper use of tablespaces and partitioning will also be vital to ensuring a scalable and maintainable conditions database. Table partitioning by IOV will be used to divide the data into time-based `chunks', e.g. by year or smaller data-taking period, with older data being declared read only or removed from the online server. Efficient organization of data into tables, making full use of the `channel ID' feature in COOL to store many related but distinct values into the same table, will also be important. Discussions with CERN-IT have emphasized the importance of good schema design to help ensure efficient use of database resources, and this will clearly be vital for the ATLAS conditions database, with an expected data volume of several 100 GB of new data each year. Initial deployment of the online conditions database to support sub-detector commissioning at the ATLAS pit starts in spring 2005, with the first significant production use from sub-detectors expected in June. Although the data rates will initially be a small fraction of those expected in ATLAS, these rates will increase rapidly with the whole detector being commissioned in global cosmic runs at the end of 2006. Efforts will therefore be made to have as much as possible of the architecture of the final system in place early, with capacity being added to keep ahead of the increasing load from the ATLAS sub-detectors.

4.4.5 Athena Access to Conditions Data

The Athena framework offers access to the conditions database through the interval-of-validity service (IOVSvc), which ensures that the correct calibration objects for the event being processed are always present in the transient detector store (TDS). An Athena job subscribes to a set of conditions database folders, and corresponding objects for the first analysed event are read in before user algorithm processing of this event begins. The service keeps track of the IOV of each object, and checks before processing each new event which objects, if any, are no longer valid and have to be replaced. In accordance with the Athena split of transient objects and their associated persistent representation, the actual reading of conditions objects is performed through conversion services, just as for event data objects. Writing conditions objects from Athena involves two steps: firstly recording the payload objects in the TDS and triggering the associated conversion services to write the objects on an output stream, and secondly registering the references to the written objects with associated intervals of validity in the conditions database. These two steps are currently being performed in the same job, but will eventually be doable in separate jobs (see below). The writing can be triggered from an algorithm execute method (i.e. during event processing), and work is in progress to extend this to allow also data to be written during the job finalization phase, once all events have been processed.

The scheme described above was used for the 2004 combined test beam, with payload object reading and writing through NOVA and POOL ROOT streamed file converters. A dedicated converter was also developed for conditions data payloads stored directly in the IOV tables, and used to access DCS and online book-keeping information directly from Athena. The software is currently being upgraded to interface to the COOL conditions database and the POOL object-relational storage service, to support access to conditions data that will begin to be stored in COOL as part of sub-detector commissioning in spring 2005. Mechanisms are being developed that allow individuals to produce calibration data sets and verify them before registering them with the main conditions database and distributing it for production use. This may be done by writing calibration data and reading them back directly without registration with an IOV, as mentioned above. It may also be useful to write calibration data and register IOVs in a temporary developer conditions database, and in a second step copy the data and IOVs to the main conditions database. Further developments will include support for pre-fetching conditions data in bulk operations, e.g. retrieving all the DCS data for a particular run in one operation, buffering it locally and then making particular values available in the TDS event-by-event as appropriate.

Use of Athena in the online HLT environment brings extra constraints from real-time operation, in particular that no conditions database access is allowed during the processing of a particular run. This implies that online processing (level-2 trigger and event filter) will use only constants that do not vary during a run, downloaded before event processing for that run begins. This also requires that critical data that may vary during the course of a run (e.g. DCS parameters or changes in readout conditions due e.g. to dead or noisy channels, the temporary loss of a data-taking partition or even a complete sub-detector) must be communicated to the HLT processors in another way, e.g. via information included in the event stream. However, such data may also be entered in the conditions database, for use in subsequent offline reconstruction processing.

4.4.6 Performance, Scalability and Distribution

Use of the conditions database online for sub-detector and HLT configuration presents considerable performance challenges. At run start, several 100 MB of constants will have to be downloaded to the various sub-detector controllers and processors in tens of seconds, often with multiple copies of the data being sent to several destinations. Similarly, the approximately 2000 HLT nodes will each require around 100 MB of configuration and calibration data, though in this case the data to be sent to each node will be similar. Offline reconstruction presents similar challenges, with Tier-0 and subsequent reconstruction involving 1000s of processors with the additional load from time-varying conditions data within one run.

It is clear that such parallel read performance is beyond the capacity of one database server, and that replication will have to be used to share the load amongst many slave servers. The task is eased by the fact that most writing will be done from a small number of well-defined sources (e.g. configuration updates from the online control system, calibration updates from sub-detector workstations or offline). The main issue is distributing these updates to all the replica servers. One interesting possibility comes from the Frontier project [4-9], developed to distribute data using a web-caching technology, where database queries are translated into http requests for web-page content, which can be cached using conventional web proxy server technology. This is particularly suitable for distributed read-only access, when updates can be forced by flushing the proxy caches, e.g. before a run start transition. Implementing the read part of the COOL API in terms of http requests would then provide a transparent way of implementing a scalable online database for configuration and conditions data.

Conditions data will also have to be distributed worldwide, for subsequent reconstruction passes, user analysis and sub-detector calibration tasks. The LCG 3D project [4-10] is prototyping the necessary techniques, based on conventional database replication, with an architecture of Oracle servers at Tier-0 (CERN) and Tier-1 centres, and MySQL-based replicas of subsets of the data at Tier-2 sites and beyond. The use of the RAL database backend-independent access library by COOL and other database applications will be particularly important here, to enable such cross-platform replication. It is also clear that conditions data updates (e.g. improved calibration constants) will be generated worldwide, and these will have to be brought back to the central CERN-based conditions database repository, for subsequent distribution to all sites that require them. Conditions data stored in files (e.g. POOL ROOT files and histogram data) will also have to be distributed, but, being file-based, this will be handled using the same data-management tools as for event data (see Section 4.6 ).

The performance, scalability and distribution requirements are extremely challenging, and a series of increasingly complex tests is planned to explore the limits of the planned solutions. Scalable performance has been a key design goal of the COOL API, and multiple-client read and write tests have already been performed in up to around 50 simultaneous sessions. Further tests are planned, in the context of commissioning the increasingly complex online computer systems at the ATLAS pit, and in the context of the calibration and alignment scalability tests during computing system commissioning in late 2005 and early 2006.

4.4.7 Detector Description

The general features of the ATLAS detector-description software are described in Section 3.5 . The detector description is implemented as a series of modules, one per sub-detector, each of which reads the appropriate `primary numbers' describing the detector geometry configuration from a database and builds the corresponding geometry classes.

The original version of the detector description used the NOVA MySQL-based database [4-5] to store the primary numbers, with different structures and table rows being accessed for different versions of the geometry, controlled by a number of ad-hoc mechanisms and switches. The primary numbers have recently been migrated to the CERN Oracle database and a system of version control has been introduced, based on the HVS hierarchical versioning system [4-11] developed specifically for this application. Each version of the geometry is tagged with an identifier (e.g. `AtlasInitial', `CTB04'), and this top-level tag identifies versions of data for each individual sub-detector and piece of sub-detector in a recursive tree-like structure, implemented via auxiliary tables in the relational database. The actual primary numbers are stored in payload data tables, and the tag finally identifies the row of each table that is appropriate for the geometry version being built. Both the hierarchical tagging mechanism and the data payload retrieval are implemented using the RAL database-independent access library, allowing the primary numbers to be stored either in Oracle or MySQL and facilitating worldwide replication and distributed access as discussed for conditions data above.

The detector-description database has been used as a test case for exploring database-resident data management and distribution procedures, in consultation with CERN IT application and database support experts. Sub-detector developers first enter new primary number data in a development database instance using a web-based tool, and can read from this database whilst debugging and validating their software. When a new release of the ATLAS detector description is being prepared, the HVS tag management tools are used to create a new top-level tag, and all the necessary information for this tag is copied from the development to the production database, from where it is never deleted. This process is synchronized with ATLAS software releases, ensuring that production software always reads `published' primary numbers from the production detector description database. This latter database can also be replicated from Oracle to MySQL, using tools based on the Octopus database replication software [4-12]. Tools have been developed to package and deploy such MySQL replica database servers e.g. for worldwide simulation and reconstruction in a Grid environment. All these steps have been successfully demonstrated and used for the Monte Carlo data productions performed for the June 2005 Rome physics workshop.

As with other types of configuration data, the detector description version associated with any set of event data must be stored. This is done using a `RunInfo' object in the event stream, which contains the top-level tag used to produce the data, and will also be extended to include the conditions database tags used at each stage of the data processing. Eventually, this object itself may be stored in the conditions database, with an IOV corresponding to the run/event range in question.