
3.8 Simulation

3.8.1 The Simulation Data Flow

Figure 3-5 shows a simplified view of the processing stages in the simulation data flow. Input for simulation comes from event generators after a particle filtering stage. Data objects representing Monte Carlo truth information from the generators are read by simulation and processed. Hits produced by the simulation can be directly processed by the digitization algorithm and transformed into Raw Data Objects (RDOs). Alternatively, they can first be sent to the pile-up algorithm and then passed to the digitization stage.


Figure 3-5 The simulation data flow. Rectangles represent processing stages and rounded rectangles represent objects within the event data model. Pile-up and ROD emulation are optional processing stages.
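
The essential structure of this pipeline can be summarized in a short C++ sketch. All of the type and function names below are illustrative placeholders rather than the actual ATLAS interfaces; the sketch only shows how the optional pile-up stage fits between simulation and digitization.

    #include <vector>

    struct GenEvent {};       // Monte Carlo truth from the generators
    struct HitCollection {};  // hits produced by the simulation
    struct RDOContainer {};   // Raw Data Objects

    HitCollection simulate(const GenEvent&) { return HitCollection(); }
    HitCollection overlayPileUp(const HitCollection&,
                                const std::vector<HitCollection>&) {
        return HitCollection();
    }
    RDOContainer digitize(const HitCollection&) { return RDOContainer(); }

    RDOContainer processEvent(const GenEvent& truth,
                              const std::vector<HitCollection>& background,
                              bool withPileUp) {
        HitCollection hits = simulate(truth);        // G4ATLAS produces hits
        if (withPileUp)                              // pile-up is optional
            hits = overlayPileUp(hits, background);
        return digitize(hits);                       // RDOs feed reconstruction
    }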

 

RDOs produced by the simulation data-flow pipeline are used directly by the reconstruction processing pipeline described in Section 3.9. The simulation and reconstruction pipelines are thus coupled by the RDOs, which act as the output of the simulation pipeline and the input to the reconstruction pipeline. However, the ATLAS TDAQ produces byte-stream files and, to reproduce this, an optional final stage can be added to the simulation processing chain to generate such files from the RDO files. In this case, the initial stage of the reconstruction pipeline first converts the byte-stream information back into RDO objects, which are then used for subsequent reconstruction processing.

Any Monte Carlo truth information is removed in the conversion from RDOs to byte-stream format such that byte-stream files produced by the simulation pipeline are an accurate representation of the byte-stream files coming from the ATLAS TDAQ. In the context of the High-Level Trigger (HLT), the conversion from byte-stream representation to RDOs is in some cases replaced by a direct conversion to objects (PrepRawData) otherwise created by the next stage in the reconstruction pipeline, but this is a performance optimization only.

The stages in the simulation data-flow pipeline are described in more detail in the following sections. In addition to the full simulation framework, ATLAS has implemented a fast simulation framework that substantially reduces the processing requirements in order to allow larger samples of events to be processed rapidly, albeit with reduced precision. Both these frameworks are described below.

3.8.2 Generators

Event generators are indispensable tools for modelling the complex physics processes that lead to the production of hundreds of particles per event at LHC energies. Generators are used to set detector requirements, to formulate analysis strategies, and to calculate acceptance corrections. They are also used to assess the uncertainties in the physics modelling.

Generators model the physics of hard processes, initial- and final-state radiation, multiple interactions and beam remnants, hadronization and decays, and how these pieces come together.

The individual generators are run from inside Athena and their output is converted into a common format by mapping it into HepMC. A container of these HepMC events is placed into the transient event store under StoreGate and can be made persistent. The event is then presented for downstream use by simulation, for example by the G4ATLAS simulation (based on Geant4) or by the Atlfast fast simulation. These downstream clients are thereby shielded from the inner details of the various event generators.
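
For illustration, the following minimal sketch builds a one-particle event in the common format, assuming the HepMC 2 API (GenEvent, GenVertex, GenParticle); momenta are in MeV, following the ATLAS convention, and the recording of the event container into StoreGate is omitted.

    #include "HepMC/GenEvent.h"
    #include "HepMC/GenVertex.h"
    #include "HepMC/GenParticle.h"

    HepMC::GenEvent* buildEvent() {
        HepMC::GenEvent* evt = new HepMC::GenEvent();
        evt->set_event_number(1);
        // A single production vertex at the origin.
        HepMC::GenVertex* vtx =
            new HepMC::GenVertex(HepMC::FourVector(0., 0., 0., 0.));
        evt->add_vertex(vtx);
        // One stable (status 1) electron with pz = 40 GeV (mass neglected).
        vtx->add_particle_out(
            new HepMC::GenParticle(HepMC::FourVector(0., 0., 40000., 40000.),
                                   11 /* PDG id: e- */, 1 /* status */));
        return evt;
    }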

Each available generator has separate documentation describing its use. Simple filtering algorithms are provided, as well as an example of how to access the events and histogram the data.

The current list of supported generators includes Herwig, Pythia, Isajet, Hijing, AcerMC, CompHep, AlpGen, Tauola, Photos, Phojet and ParticleGenerator. Utility classes are also provided to enable the filtering of events and to facilitate the handling of Monte Carlo truth.
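
Such a filter essentially loops over the stable particles in the HepMC record and applies kinematic cuts. The sketch below is illustrative only, not the actual ATLAS filtering code, and the threshold values are placeholders.

    #include "HepMC/GenEvent.h"
    #include <cmath>
    #include <cstdlib>

    // Accept the event if it contains a stable electron or muon with
    // pT above threshold (in MeV) inside the tracking acceptance.
    bool passesLeptonFilter(const HepMC::GenEvent& evt,
                            double ptMin = 10000., double etaMax = 2.5) {
        for (HepMC::GenEvent::particle_const_iterator p = evt.particles_begin();
             p != evt.particles_end(); ++p) {
            if ((*p)->status() != 1) continue;        // stable particles only
            const int pdg = std::abs((*p)->pdg_id());
            if (pdg != 11 && pdg != 13) continue;     // e or mu
            const HepMC::FourVector& mom = (*p)->momentum();
            if (mom.perp() > ptMin && std::fabs(mom.eta()) < etaMax)
                return true;
        }
        return false;
    }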

The code is organized into a number of separate packages.

Only the interface code is held in the ATLAS CVS repository. The code supplied by the external authors is built and maintained either by the LCG Genser project [3-23] or by ATLAS members in an external area; the long-term goal is for Genser to take over all of this maintenance. The HepMC package was originally an ATLAS product and is now maintained as part of CLHEP.

Most of the event generators are written in FORTRAN; the ATLAS interfaces are in C++. The migration to C++ has been slow, and currently Sherpa is the only fully functional C++ generator. It has been tested and evaluated by ATLAS in the production for the Rome Physics Workshop. It is run as a stand-alone product that writes data files, which are then imported into Athena by an interface algorithm; this arrangement is currently being re-evaluated.

Two other C++ generators, Herwig++ and Pythia7, are not yet fully functional. Herwig++ cannot yet be used for processes with hadrons in the initial state and Pythia7 lacks the functionality of the FORTRAN version. Given the time needed for development, deployment, testing and validation of a new generator, it is clear that FORTRAN support will be required for some considerable time after LHC data is available.

3.8.3 Fast Simulation (Atlfast)

3.8.3.1 Overview

The ATLAS fast simulation program (Atlfast) simulates ATLAS physics events, including the effects due to detector response and the software reconstruction chain. The input to the program is the collection of four-vectors for a physics event, usually provided by a physics event generator. Atlfast examines the event record for stable particles.

Four-vectors corresponding to electrons, photons and muons are passed to the appropriate smearing function, and the resulting four-vectors are output for use by downstream physics analysis. The calorimeter response to the event is calculated by summing the transverse energy deposits of all particles.
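
As an illustration of the smearing step, a Gaussian energy smearing with a sampling term and a constant term added in quadrature might be coded as follows; the parameter values are placeholders, not the ATLAS resolution parameters.

    #include <cmath>
    #include <random>

    // Smear a true electron energy (GeV) with sigma/E = a/sqrt(E) (+) b.
    double smearElectronEnergy(double eTrue, std::mt19937& rng) {
        const double a = 0.10;   // sampling term (illustrative)
        const double b = 0.007;  // constant term (illustrative)
        const double sigma = eTrue * std::sqrt(a * a / eTrue + b * b);
        std::normal_distribution<double> gauss(eTrue, sigma);
        return gauss(rng);       // smeared energy, passed on to analysis
    }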

Smearing and jet finding algorithms are applied to the energy deposits, and the resulting jet objects are output for further physics analysis.

Other quantities calculated by Atlfast are track helix parameters and global event quantities such as the total E_T and the missing transverse momentum.
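
The global quantities follow directly from the tower deposits: the total E_T is the scalar sum, and the missing transverse momentum is the negative vector sum, as in the following sketch (the Deposit structure is a placeholder for the Atlfast cell objects).

    #include <cmath>
    #include <vector>

    struct Deposit { double et, phi; };  // transverse energy and azimuth

    void globalSums(const std::vector<Deposit>& deposits,
                    double& sumEt, double& etMissX, double& etMissY) {
        sumEt = etMissX = etMissY = 0.;
        for (const Deposit& d : deposits) {
            sumEt   += d.et;                   // scalar sum: total E_T
            etMissX -= d.et * std::cos(d.phi); // negative vector sum gives
            etMissY -= d.et * std::sin(d.phi); // the missing momentum
        }
    }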

3.8.3.2 Current Status

The current version of Atlfast is an adaptation of the original stand-alone FORTRAN program [3-24]. It has been rewritten in C++, and its structure has been heavily modified so that it runs within the Athena framework. The physics results are found to be extremely close to those of the original FORTRAN version.

3.8.3.3 Goals for the Eventual Turn-On System

A new development is the work on a comparator between the fast and the full simulation. The idea is to run the full simulation and reconstruction, fit various distributions, and feed the information back into Atlfast, primarily through the smearing functions. The process is to be iterated until an acceptable level of agreement has been obtained. An important aspect of this work will be the documentation and control of the potentially many parameter sets that will become available to Atlfast. Some development of Atlfast itself will be necessary to provide the infrastructure for passing these parameter sets to Atlfast, particularly as many of the original parameters are at present hard-coded.

Further work is expected on the FastShower library that simulates energy deposits in the towers of the ATLAS calorimeters, where the modelling of the deposition process includes two compartments in depth (electromagnetic and hadronic), and transverse shower spread. The correlation of energy deposits among neighbouring towers is included. FastShower is already included in Atlfast, but further effort to validate the calculations, and to deal with calibration issues, is required. The energy flow algorithms require knowledge of the expected total calorimeter energy deposit probability distributions as a function of particle type (e, π, μ) and of the kinematic properties φ, η, and p_T. It is vitally important that this information be at the electromagnetic scale, to avoid any dependence on the hadronic calibration scheme. Further validation and refinement will be obtained using the full simulation comparator.
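
As a rough illustration of such a deposition model, the sketch below splits a particle's energy between the two depth compartments and spreads it over a 3×3 window of towers, which is also what induces correlations between neighbouring towers; all fractions are placeholders, not the FastShower parametrization.

    #include <vector>

    // Tower grids for the two depth compartments.
    struct TowerGrid { std::vector<std::vector<double> > em, had; };

    void depositEnergy(TowerGrid& g, int ie, int ip, double e, bool isHadron) {
        const double eEm  = isHadron ? 0.3 * e : 0.98 * e;  // depth split
        const double eHad = e - eEm;
        // 70% in the seed tower, the rest shared among the 8 neighbours;
        // the sharing correlates deposits in neighbouring towers.
        for (int de = -1; de <= 1; ++de) {
            for (int dp = -1; dp <= 1; ++dp) {
                const double f = (de == 0 && dp == 0) ? 0.70 : 0.30 / 8.0;
                const int je = ie + de, jp = ip + dp;
                if (je < 0 || je >= (int)g.em.size()) continue;      // grid
                if (jp < 0 || jp >= (int)g.em[je].size()) continue;  // edges
                g.em[je][jp]  += f * eEm;
                g.had[je][jp] += f * eHad;
            }
        }
    }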

3.8.4 ATLAS Geant4 Simulation (G4ATLAS)

The ATLAS detector simulation programs have been heavily based on the Geant3 simulation package and infrastructure since the inception of the experiment. These programs were used in the preparation for the ATLAS Letter of Intent [3-25], in detector optimization studies, for the various sub-detector Technical Design Reports, and they were stress-tested in Phase 1 (i.e. the event-generation and simulation phase) of Data Challenge 1 (DC1), which was run during summer 2002. Geant3 has been a powerful, reliable and successful tool within ATLAS for about ten years.

With the development and implementation of the Geant4 (G4) toolkit [3-11][3-26], starting from the year 2000, ATLAS prepared for moving its simulation suite to the object-oriented (OO) paradigm. Geant3 and Geant4 were run alongside each other for a while in order to validate the new suite against the previous one. The switch-over [3-27] finally happened in 2003, in the early preparation phase of the second Data Challenge (DC2). Since then, Geant4 has become the main simulation engine of ATLAS, and all new developments have been carried out in the new environment. Geant3 support within ATLAS is being phased out during 2005.

The Geant4 toolkit provides both a framework and the necessary functionality for running detector simulation in particle physics and other applications. The functionality provided includes optimized solutions for geometry description and navigation through the geometry, the propagation of particles through detectors, the description of materials, the modelling of physics processes (e.g. a huge effort has been invested in recent years in the development and improvement of hadronic-physics models), visualization, and much more. A basic concept is that of Sensitive Detectors, which allow active detector elements to be defined, the corresponding actions to be performed within them, and hits to be written out (carrying information such as position, energy deposit, identifier of the active element, etc.). Geant4 is part of the common LCG application software project, and its development is pursued as a world-wide effort coordinated by a strong development team at CERN.
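
A minimal Sensitive Detector might look as follows; the hit class and its storage are schematic, whereas a real implementation would use the Geant4 hit-collection machinery.

    #include "G4VSensitiveDetector.hh"
    #include "G4Step.hh"
    #include "G4TouchableHistory.hh"
    #include "G4ThreeVector.hh"
    #include <vector>

    struct SimpleHit { double edep; G4ThreeVector pos; int copyNo; };

    class ExampleSD : public G4VSensitiveDetector {
    public:
        explicit ExampleSD(const G4String& name)
            : G4VSensitiveDetector(name) {}

        virtual G4bool ProcessHits(G4Step* step, G4TouchableHistory*) {
            const double edep = step->GetTotalEnergyDeposit();
            if (edep <= 0.) return false;   // nothing was deposited
            SimpleHit hit;
            hit.edep   = edep;              // energy deposit
            hit.pos    = step->GetPreStepPoint()->GetPosition();
            hit.copyNo = step->GetPreStepPoint()
                             ->GetTouchable()->GetCopyNumber();
            m_hits.push_back(hit);          // write out the hit
            return true;
        }
    private:
        std::vector<SimpleHit> m_hits;      // schematic hit storage
    };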

Development activities to make use of Geant4 functionality within the ATLAS-specific set-up and software environment started in 2000, taking into account ATLAS-specific requirements. These activities produced tailored packages for handling geometry, kinematics, materials, physics, fields, sensitive detectors, run-specific issues, visualization, etc., and culminated in 2003 with the Geant4 simulation being embedded in Athena. This migration to Athena was also carried out for the detector simulation packages that had been developed in detail in the stand-alone environment.

In general, the Geant4-based detector simulation programs (G4ATLAS) are built on principles such as dynamic loading and action-on-demand, and all user-requested functionality has been added by means of plug-in modules. Since 2003, extended common functionality and new developments have been implemented only in the Athena-based version; examples are updates to physics processes (e.g. the transition-radiation process for the TRT simulation), the implementation of Monte Carlo truth, the simulation of the ATLAS combined test-beam set-up, the use of POOL to write persistent output, etc. A particularly important new feature is the building of the Geant4 geometry tree from the one implemented in the ATLAS Detector Description package GeoModel (described in Section 3.5), using the Geo2G4 conversion package. This procedure has the clear advantage of avoiding the duplication of effort involved in maintaining and synchronizing two different detector geometry descriptions, one for simulation and one for reconstruction.
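
Schematically, the conversion is a recursive walk of the GeoModel tree that builds a Geant4 volume for each GeoModel volume. In the sketch below, convertShape(), convertMaterial() and childTransform() are hypothetical helpers standing in for the real Geo2G4 converters, and the handling of GeoModel's smart-pointer volume links is simplified.

    #include "GeoModelKernel/GeoVPhysVol.h"
    #include "GeoModelKernel/GeoLogVol.h"
    #include "G4LogicalVolume.hh"
    #include "G4PVPlacement.hh"
    #include "G4Transform3D.hh"

    G4VSolid*     convertShape(const GeoShape*);                    // hypothetical
    G4Material*   convertMaterial(const GeoMaterial*);              // hypothetical
    G4Transform3D childTransform(const GeoVPhysVol*, unsigned int); // hypothetical

    G4LogicalVolume* convertVolume(const GeoVPhysVol* geoVol) {
        const GeoLogVol* lv = geoVol->getLogVol();
        G4LogicalVolume* g4lv = new G4LogicalVolume(
            convertShape(lv->getShape()),
            convertMaterial(lv->getMaterial()),
            lv->getName());
        // Recurse over the daughters so that the whole tree is built once,
        // directly from the GeoModel description.
        for (unsigned int i = 0; i < geoVol->getNChildVols(); ++i) {
            G4LogicalVolume* daughter = convertVolume(&*geoVol->getChildVol(i));
            new G4PVPlacement(childTransform(geoVol, i), daughter,
                              daughter->GetName(), g4lv, false, (G4int)i);
        }
        return g4lv;
    }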

A concise but more detailed overview of the status and functionality of G4ATLAS can be found in A. Rimoldi et al., The Simulation of the ATLAS Experiment: Present Status and Outlook, ATLAS Internal Note, ATL-SOFT-2004-004 (2004).

Since 2001 a rather extensive physics validation programme has been under way to test the physics models implemented in Geant4, to ensure, through comparison with test-beam results where available, that the Geant4 simulation meets the expected precision targets, and to provide feedback to the Geant4 development team. In almost all cases, comparison with experimental data from beam tests gives very good agreement, normally at the level of ~1% or better in predictive power.

In addition to the physics performance, the validation and optimization of the G4ATLAS run-time performance, in particular of its CPU and memory requirements, has also been given high priority. Intensive validation tests started in autumn 2003, soon after a fully-fledged G4ATLAS simulation had become available within Athena. This effort produced a stable, robust and high-performance version, which was then run in the DC2 simulation half a year later. In DC2, which was the first data challenge entirely based on Geant4, more than 12 million full physics events in more than 100k jobs were successfully simulated over four months in a world-wide, distributed way, with only one reported crash due to Geant4. For example, at NorduGrid a total sample of more than 3.5 million events (including 1 million full events) was processed in about 35k jobs without a single reported failure. The validation programme has continued after DC2; a summary of the activities between autumn 2003 and the end of 2004 can be found in D. Costanzo et al., Validation of the Geant4-Based Full Simulation Program for the ATLAS Detector: An Overview of Performance and Robustness, ATLAS Internal Note, ATL-SOFT-PUB-2005-002 (2005).

The G4ATLAS development team is constantly working on improvements to the package and on extensions to its functionality, taking into account user requirements. A recent example is the "pythonization" of the G4ATLAS user interface, which replaces the Geant4 user interface based on macro files. The power of Python, its flexibility, and the possibility of running and configuring a job interactively are clear advantages, in particular when many different configurations have to be handled, as in the combined test-beam simulation. Another recent example is the implementation of a generic utility to enable and facilitate the use of parametrized detector responses, e.g. for the liquid-argon calorimeter, which can be used to speed up simulation jobs.

3.8.4.1 Simulation Performance and Prospects

The ATLAS Computing Model assumes a simulation processing time of approximately 100 kSI2k-sec per event. Recent measurements on a representative set of physics events range from about 250 kSI2k-sec per event for minimum-bias events to 850 kSI2k-sec per event for SUSY/SUGRA events. However, several optimizations are under active development.

These optimization developments are expected to significantly improve the CPU performance such that the performance goal will be met.

3.8.5 Pile-up

G4ATLAS produces hits as output, which are a record of the real interactions of particles in the detector. At higher machine luminosities, however, multiple interactions can occur at each beam crossing (typically one signal event with multiple minimum-bias background events), and in addition other backgrounds (e.g. cavern background) need to be taken into account. As seen in Figure 3-5, pile-up (i.e. the overlaying of signal and background events) is an optional processing stage in the simulation processing pipeline.

For DC2, where pile-up at a luminosity of 10^34 cm^-2 s^-1 was processed, over 800 background events were overlaid on one event of the original physics stream. The main requirement was that the digitization algorithms should run unchanged. Many optimizations of data structures and access patterns were necessary in order to allow pile-up to run on a standard processing node, as the original memory consumption was excessive (as high as 3 GB). The memory requirements for DC2 at a luminosity of 10^34 cm^-2 s^-1 (which excluded cavern background) were less than 1 GB. The inclusion of cavern background and changes in the timing window (to provide a more realistic description for the muon spectrometer) for the Rome Physics Workshop have increased the memory requirement to 1 GB at a luminosity of 2 × 10^33 cm^-2 s^-1, so further development will be needed in order to stay below this memory threshold at higher luminosities.

The Athena-based pile-up application manages multiple input streams. Random permutations of events are selected from a circular buffer of minimum-bias events. Since the various sub-detectors have different data-integration times, they require individual cache-retention policies. By using an event-caching policy that is two-dimensional, depending on both the sub-detector and the time, memory utilization has been significantly reduced.
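
The crossing-by-crossing selection from such a buffer can be sketched as follows; the structure names are illustrative rather than the actual pile-up classes, and the Poisson mean (about 23 interactions per crossing at 10^34 cm^-2 s^-1) is passed in as a parameter.

    #include <cstddef>
    #include <random>
    #include <vector>

    struct MinBiasEvent {};                          // placeholder event type
    struct TimeWindow { int firstXing, lastXing; };  // per sub-detector window

    // Overlay minimum-bias events for every beam crossing inside the
    // sub-detector's integration window; mu is the mean number of
    // interactions per bunch crossing.
    std::vector<const MinBiasEvent*>
    selectPileUp(const std::vector<MinBiasEvent>& buffer,
                 const TimeWindow& win, double mu, std::mt19937& rng) {
        std::poisson_distribution<int> nPerXing(mu);
        std::uniform_int_distribution<std::size_t> pick(0, buffer.size() - 1);
        std::vector<const MinBiasEvent*> overlay;
        for (int xing = win.firstXing; xing <= win.lastXing; ++xing) {
            const int n = nPerXing(rng);       // interactions in this crossing
            for (int i = 0; i < n; ++i)
                overlay.push_back(&buffer[pick(rng)]);  // random draw from the
        }                                               // circular buffer
        return overlay;
    }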

Pile-up is an excellent mechanism for stress-testing the architecture. Small problems that would normally pass unnoticed may be enormously magnified and become visible far sooner. It is also an excellent tool for exposing memory leaks, as they may be magnified by several orders of magnitude (depending on the luminosity).

3.8.6 Digitization

The hits produced either directly by G4ATLAS, or from the merging of pile-up events, need to be translated into the output actually produced by the ATLAS detectors. The propagation of charges (as in the tracking detectors and the liquid argon calorimeter) or of light (as in the case of the tile calorimeter) in the active media has to be considered, as well as the response of the readout electronics. Unlike the previous steps in the simulation chain, this is a very detector-specific task, and the expertise of the people building and testing each of the sub-detectors is essential. The final output of the digitization step is a set of Raw Data Objects (RDOs) that should resemble the real detector data. In addition to RDOs, Simulation Data Objects (SDOs) are created to save simulation information that may be useful to the downstream user. The navigation between SDOs and RDOs is achieved by using identifiers1, and SDOs are otherwise completely decoupled from RDOs, to avoid any dependency on simulation that will not be present when real data are reconstructed.
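
The sketch below illustrates the pattern for a generic tracking-detector element: the charge from all hits on an element is summed, an RDO is created if the readout threshold is passed, and an SDO keyed by the same identifier records which truth particles contributed. All types and the response model are illustrative placeholders.

    #include <map>
    #include <vector>

    typedef unsigned long long Identifier;  // stand-in for the offline identifier

    struct SimHit { Identifier id; double charge; int truthBarcode; };
    struct RDO    { Identifier id; int timeOverThreshold; };
    struct SDO    { std::vector<int> barcodes; };  // contributing truth particles

    void digitizeElement(const std::vector<SimHit>& hits, double threshold,
                         std::vector<RDO>& rdos,
                         std::map<Identifier, SDO>& sdos) {
        std::map<Identifier, double> charge;        // summed charge per element
        for (const SimHit& h : hits) {
            charge[h.id] += h.charge;
            sdos[h.id].barcodes.push_back(h.truthBarcode);  // truth bookkeeping
        }
        for (const auto& c : charge) {
            if (c.second < threshold) continue;     // below readout threshold
            RDO rdo;
            rdo.id = c.first;
            // Toy electronics response: charge expressed in threshold units.
            rdo.timeOverThreshold = static_cast<int>(c.second / threshold);
            rdos.push_back(rdo);
        }
    }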

To implement the modular organization of digitization, a package is created for each of the detector subsystems, and a single point of contact is available for each package. The design and operating conditions of the detectors (such as the magnetic field or voltages) are set using job-option parameters or taken from the conditions or detector-description database. Digitization operates locally at the level of each sub-detector element (e.g. a pixel module or a calorimeter cell), and the same code can be used in the context of the full ATLAS simulation, a test beam, or any other test set-up. It is of key importance that the digitization be tuned by comparing the RDO output to real data in system tests, so as to produce a realistic tuning of the detector response.
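
In a Gaudi/Athena algorithm such conditions are typically exposed as properties that the Python job options can override, as in the following sketch; the property names and default values are illustrative, and in a real job the values would often be taken from the conditions database instead.

    #include "GaudiKernel/Algorithm.h"
    #include "GaudiKernel/ISvcLocator.h"

    class ExampleDigitizationAlg : public Algorithm {
    public:
        ExampleDigitizationAlg(const std::string& name, ISvcLocator* svcLoc)
            : Algorithm(name, svcLoc) {
            // Defaults, overridable from the Python job options.
            declareProperty("BiasVoltage",   m_biasVoltage   = 150.);  // volts
            declareProperty("MagneticField", m_magneticField = 2.);    // tesla
        }
        virtual StatusCode initialize() { return StatusCode::SUCCESS; }
        virtual StatusCode execute() {
            // ... apply m_biasVoltage / m_magneticField in the response model
            return StatusCode::SUCCESS;
        }
        virtual StatusCode finalize()   { return StatusCode::SUCCESS; }
    private:
        double m_biasVoltage;
        double m_magneticField;
    };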

A package is provided to bring together and coordinate the sub-detector efforts. This digitization package contains only Python code, which steers the code provided by the detectors. Detector flags and global flags are used, according to the ATLAS policy, to switch detectors and tasks on and off.

The digitization package was successfully used for DC2 and for subsequent physics productions as a separate step to be run after G4ATLAS. Although digitization has been used only on Geant4-simulated events, there is no dependency between the ATLAS digitization package and Geant4: the decoupling is achieved at the Event Data Model (EDM) level by ensuring that the hits themselves have no dependency on Geant4. An automated digitization test is also maintained in the nightly builds and is used to spot problems that may arise in any of the sub-detector packages, stimulating rapid corrective action.


1. Identifiers are described in Section 3.5.3.


