3.9 Event Selection and Reconstruction

3.9.2 Reconstruction

The role of reconstruction is to derive, from the stored raw data, the relatively few particle parameters and auxiliary information needed for physics analysis: photons, electrons, muons, tau-leptons, K0_S, jets, missing transverse energy and the primary vertex. Information from all detectors is combined so that the four-momentum reconstruction is optimal over the full momentum range, over the full rapidity range and at any luminosity. Particles should also be identified with the least background, with the understanding that the optimal balance between efficiency and background rejection can be analysis-dependent.

A typical reconstruction algorithm takes one or more collections as input, calls a set of modular tools, and typically outputs one collection of reconstructed objects. Common tools are shared among the tracking detectors (inner detector and muon spectrometer) on the one hand, and among the calorimeters (liquid argon electromagnetic calorimeter, hadronic endcap and forward calorimeter, and tile hadronic detector) on the other. Reconstruction tools can share interfaces, for example for different types of calorimeter cluster corrections or for track extrapolation. Abstract interfaces are used to reduce dependencies.
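The algorithm/tool pattern described above can be sketched as follows. An algorithm reads an input collection, applies a configured sequence of tools through an abstract interface, and writes one output collection. This is a simplified Python sketch, not Athena code. The names `Cluster`, `IClusterCorrectionTool`, `ScaleCorrection` and `ClusterCorrectionAlg`, and the dictionary used as an event store, are all hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cluster:
    """Minimal stand-in for a calorimeter cluster (hypothetical)."""
    energy: float
    eta: float
    phi: float


class IClusterCorrectionTool(ABC):
    """Abstract interface: the algorithm depends only on this, not on concrete tools."""

    @abstractmethod
    def correct(self, cluster: Cluster) -> Cluster:
        """Return a corrected copy of the cluster."""


class ScaleCorrection(IClusterCorrectionTool):
    """Hypothetical concrete tool: multiplies the cluster energy by a constant factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def correct(self, cluster: Cluster) -> Cluster:
        return replace(cluster, energy=cluster.energy * self.factor)


class ClusterCorrectionAlg:
    """Reads one input collection, applies a configured tool sequence, writes one output."""

    def __init__(self, input_key: str, output_key: str, tools):
        self.input_key = input_key
        self.output_key = output_key
        self.tools = list(tools)

    def execute(self, store: dict) -> None:
        output = []
        for cluster in store[self.input_key]:
            for tool in self.tools:       # tools run in their configured order
                cluster = tool.correct(cluster)
            output.append(cluster)
        store[self.output_key] = output   # the input collection is left untouched
```

Because the algorithm sees only `IClusterCorrectionTool`, a different correction can be swapped in through configuration without changing the algorithm code.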

A rich set of algorithms is available. A number of them were originally developed in FORTRAN for detector optimization, as documented in the ATLAS Physics TDR of 1999 [3-19]. They were subsequently migrated to C++ and into the Athena framework, and new algorithms were developed alongside them. In most cases, two independent algorithms are available for the same task, allowing in-depth performance comparisons and cross-validation.

Event-generator truth information is used to optimize and validate the reconstruction algorithms. However, the policy is to do this in separate algorithms, so that the reconstruction algorithms themselves do not depend on the truth.

3.9.2.1 Tracking System Reconstruction

The tracking system reconstruction chain is summarized in Figure 3-7.

Figure 3-7 Tracking reconstruction chain. The boxes on the top represent data objects, whilst the boxes on the bottom show the algorithms which work on them. The arrows show the direction of data flow.

A basic requirement for the tracking EDM is to support different tracking devices with shared code. The muon chambers and drift tubes, the inner-detector transition radiation tubes and the silicon detectors must all be handled by common tracking software. The primary outcome of this requirement is a common track class. In addition, the EDM needs standard definitions of:

Track parameters (on all the various surfaces found along the track);
Interfaces to hit-clusters, drift circles, etc.

Tracking must handle many different coordinate frames, because a track can span the entire detector and have measurements on many different types of surface (e.g. discs, planes, cylinders). However, the various tracking tools and algorithms should not need to handle the detector geometry themselves. Generalized tools allow tracking to work on both Inner Detector and Muon Spectrometer tracks. This is best explained with the aid of Figure 3-7, which shows an overview of the tracking reconstruction chain.

Byte-stream converters take the data from the detector and form the raw data objects. These are then used to create "prepared raw data" (PrepRawData), e.g. clusters from the pixel detector or drift circles from the muon monitored drift tubes.

The PrepRawData (along with the SpacePoints) can then be used to find tracks. Finally, the tracks can be used to find vertices, and to create the TrackParticles (for physics analysis at the AOD level).

Clusters are first searched for in the silicon tracker. Tracks are then searched for with two independent pattern-recognition algorithms that share a number of common tools. Silicon tracks are extrapolated into, and validated in, the straw tracker. A dedicated algorithm examines the tracks found and, in case of duplication, keeps the one with the highest number of hits. A primary vertex is computed, and a set of track parameters extrapolated to the primary vertex is prepared. Muon tracks in the Muon System are found by a combinatorial search of the single-station track segments, followed by a fit using the individual clusters. The tracking is performed in a highly inhomogeneous magnetic field and takes into account multiple scattering in the material of the apparatus.
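The duplicate-removal step ("keep the track with the highest number of hits") can be sketched as a greedy pass over candidates ordered by hit count. The text does not define when two tracks count as duplicates. Here it is assumed that two tracks are duplicates when more than a fraction `max_shared_fraction` of the shorter track's hits are shared. `Track`, `resolve_duplicates` and that criterion are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical track: an identifier and the set of hit identifiers it uses."""
    track_id: int
    hits: frozenset


def resolve_duplicates(tracks, max_shared_fraction=0.5):
    """Among duplicate tracks, keep the one with the most hits.

    Duplicate criterion (an assumption for illustration): the shared hits
    exceed max_shared_fraction of the shorter track's hits.
    """
    # Visit candidates from most to fewest hits, so the best one is kept first.
    ordered = sorted(tracks, key=lambda t: len(t.hits), reverse=True)
    accepted = []
    for cand in ordered:
        def is_duplicate_of(kept):
            shared = len(cand.hits & kept.hits)
            shorter = min(len(cand.hits), len(kept.hits))
            return shorter > 0 and shared / shorter > max_shared_fraction

        if not any(is_duplicate_of(kept) for kept in accepted):
            accepted.append(cand)
    return accepted
```

A production ambiguity solver would typically weigh hit quality and fit χ² as well as hit count. This sketch uses hit count only, as the text describes.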

3.9.2.2 Calorimeter Reconstruction

The two types of calorimeter have different data formats at the raw data level. However, the reconstruction EDM uses one common calibrated input object, the CaloCell, which can be generated from either raw data or simulation. Figure 3-8, a schematic representation of the calorimeter reconstruction chain, shows the raw data being fed to CellMaker algorithms, which produce CaloCells. After this reconstruction step, the calorimeters use a common EDM. In particular, all calorimeter data classes inherit from a four-momentum interface, which allows the use of common tools that require only kinematic information.

Figure 3-8 Schematic diagram of calorimeter reconstruction. The top line contains the data objects, whilst the bottom line shows the algorithms used to process them. Data flows from left to right.

Neighbouring CaloCells are used by CaloTowerMaker to produce calorimeter "towers". These towers, as well as cells, are then used by CaloClusterMaker to construct "clusters". A cluster is a collection of calorimeter elements and can itself contain clusters. A navigation scheme allows access to constituent data objects; for example, it is possible to retrieve all the CaloCells used to create an EnergyCluster.
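The navigation idea, where a cluster may contain cells or other clusters and all underlying cells remain reachable, maps naturally onto a composite structure. `CaloCell` and `Cluster` below are simplified, hypothetical Python stand-ins and not the Athena classes. A tower could be modelled as another composite of cells in the same way.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CaloCell:
    """Hypothetical calibrated cell: an identifier and an energy (GeV)."""
    cell_id: int
    energy: float


@dataclass
class Cluster:
    """Hypothetical cluster whose constituents may be cells or other clusters."""
    constituents: list = field(default_factory=list)

    @property
    def energy(self) -> float:
        # Works recursively: nested clusters expose the same 'energy' attribute.
        return sum(c.energy for c in self.constituents)

    def cells(self) -> list:
        """Navigate down to all CaloCells used to build this cluster."""
        found = []
        for c in self.constituents:
            if isinstance(c, CaloCell):
                found.append(c)
            else:
                found.extend(c.cells())
        return found
```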

Cell energy is already available at the electromagnetic scale in the raw data. Refinement with calibration parameters is done as a first step of the calorimeter reconstruction.

Electromagnetic clusters can be reconstructed using different methods.

The sliding-window algorithm searches for the window in which the total energy is maximum. The window size can be adjusted, so that it can be optimized for different particles and energies. The resulting clusters are then corrected for various modulation effects, and longitudinal weights are computed to further optimize resolution and linearity.
The topological clustering algorithm aggregates neighbouring cells with signals above threshold over the complete calorimetry. It first selects individual cells with an energy in excess of four sigma of the expected noise. Neighbouring cells with energies in excess of two sigma are then added iteratively. Finally, a guard ring of neighbouring cells with no energy requirement is added. The proto-clusters obtained in this way are then split around local energy maxima. The intent is to explore the use of topological clustering, possibly with different tuning, for electron identification, jet reconstruction and missing E_t measurement.
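Both clustering methods can be sketched on a simplified two-dimensional grid of cell energies indexed by (η, φ) bin. The sketch makes several assumptions: a single layer, 8-neighbour adjacency, uniform noise `sigma`, and no φ wrap-around. The sliding window returns only the single highest-energy window, and the final splitting of proto-clusters around local maxima is omitted. The function names are hypothetical. The 4/2/0 thresholds follow the text.

```python
from collections import deque


def sliding_window(energy, win_eta=3, win_phi=3):
    """Return (i_eta, i_phi, total) of the fixed-size window with maximum total energy.

    energy is a 2-D list indexed [i_eta][i_phi]; phi wrap-around is ignored.
    """
    best = None
    for i in range(len(energy) - win_eta + 1):
        for j in range(len(energy[0]) - win_phi + 1):
            total = sum(energy[i + di][j + dj]
                        for di in range(win_eta) for dj in range(win_phi))
            if best is None or total > best[2]:
                best = (i, j, total)
    return best


def neighbours(i, j, n_eta, n_phi):
    """The up-to-eight adjacent cells of (i, j) inside the grid."""
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di or dj) and 0 <= i + di < n_eta and 0 <= j + dj < n_phi:
                yield (i + di, j + dj)


def topo_clusters(energy, sigma, seed=4.0, grow=2.0):
    """4/2/0 topological clustering with uniform noise sigma (no splitting step).

    Returns a list of clusters, each a sorted list of (i_eta, i_phi) cells.
    """
    n_eta, n_phi = len(energy), len(energy[0])

    def significance(cell):
        return energy[cell[0]][cell[1]] / sigma

    all_cells = [(i, j) for i in range(n_eta) for j in range(n_phi)]
    seeds = sorted((c for c in all_cells if significance(c) > seed),
                   key=significance, reverse=True)
    owner = {}      # cell -> index of the cluster that owns it
    clusters = []
    for s in seeds:
        if s in owner:
            continue    # already absorbed by a more significant cluster
        index = len(clusters)
        members, queue = [s], deque([s])
        owner[s] = index
        # Grow: iteratively add neighbours above the 2-sigma threshold.
        while queue:
            for nb in neighbours(*queue.popleft(), n_eta, n_phi):
                if nb not in owner and significance(nb) > grow:
                    owner[nb] = index
                    members.append(nb)
                    queue.append(nb)
        # Guard ring: one layer of neighbours with no energy requirement.
        for cell in list(members):
            for nb in neighbours(*cell, n_eta, n_phi):
                if nb not in owner:
                    owner[nb] = index
                    members.append(nb)
        clusters.append(sorted(members))
    return clusters
```

Processing seeds in decreasing significance means that two seeds connected through cells above the 2σ threshold end up in one cluster. This mimics the merging of touching proto-clusters.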

3.9.2.3 Combined Reconstruction

The combined-reconstruction step combines information from the different detectors in an optimal way. The output EDM is designed to support a wealth of tagging variables from different algorithms.

Photon/Electron Identification

Electron reconstruction is performed in two ways. High-p_T electrons are searched for by associating tracks with sliding-window clusters and computing shower-shape variables, track-to-cluster association variables and transition-radiation (TR) hit variables. Dedicated track-fitting procedures for electrons are being developed. High-p_T photons are identified in a similar way, the main difference being that a track veto is applied, except for reconstructed conversions.
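Two typical track-to-cluster association variables are the position differences between the cluster and the extrapolated track, and the ratio E/p of cluster energy to track momentum. The sketch below matches the closest track inside an (η, φ) window and returns these quantities. The window sizes, class names and function names are illustrative assumptions, not ATLAS settings. A failed match corresponds to the photon hypothesis, with conversions ignored here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackAtCalo:
    """Hypothetical track extrapolated to the calorimeter: momentum (GeV) and impact point."""
    p: float
    eta: float
    phi: float


@dataclass(frozen=True)
class EMCluster:
    """Hypothetical sliding-window cluster."""
    energy: float
    eta: float
    phi: float


def delta_phi(a, b):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def match_track(cluster, tracks, max_deta=0.05, max_dphi=0.1):
    """Return (track, E/p, deta, dphi) for the closest track in the window, or None."""
    best = None
    for trk in tracks:
        deta = cluster.eta - trk.eta
        dphi = delta_phi(cluster.phi, trk.phi)
        if abs(deta) < max_deta and abs(dphi) < max_dphi:
            dist = math.hypot(deta, dphi)
            if best is None or dist < best[0]:
                best = (dist, trk, deta, dphi)
    if best is None:
        return None     # no matching track
    _, trk, deta, dphi = best
    return trk, cluster.energy / trk.p, deta, dphi
```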

Soft-electron reconstruction proceeds by extrapolating a charged track to the calorimeter and building a cluster around the track's impact point. This procedure is more efficient for electrons with p_T below 10 GeV and for electrons inside jets, which is relevant for b-tagging.

Muon Identification

Muon measurement and identification are optimized separately for different p_T regimes.

High-p_T muons (above 100 GeV) are measured by extrapolating the track parameters determined in the muon spectrometer inward, through the calorimeters and the inner tracker, to the interaction point. The muon-spectrometer track may also be combined with the best-matching inner-detector track, and methods to do this are being investigated. Such a combination can be particularly effective where there are acceptance gaps in the spectrometer, such as near η = 1. Extrapolating the muon trajectory to the inner-tracker track allows the energy loss in the intervening material to be computed. Energy-loss parametrizations can then be applied to correct the track momentum, as determined at the muon-spectrometer entrance, to the final-state muon momentum at the interaction point. Furthermore, direct measurement of catastrophic energy loss (important at high p_T) can be used to correct the muon momentum.

For muons in the 6-100 GeV p_T range, the momentum is measured by both systems. The muon spectrometer provides a flag that uniquely identifies the muon. Below 30 GeV, the momentum resolution comes mostly from the inner tracker, because the muon-spectrometer resolution is dominated by multiple Coulomb scattering.
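The text does not specify how the two momentum measurements are combined. A textbook way to combine two independent measurements is the inverse-variance weighted mean, sketched below. It shows why the more precise system dominates, as the inner tracker does below 30 GeV. This sketch makes several assumptions: Gaussian, uncorrelated errors; a muon-spectrometer value already corrected for energy loss; and the use of q/p, because track-fit errors are close to Gaussian in curvature. A full treatment would combine the track-parameter vectors with their covariance matrices. `combine_measurements` is a hypothetical name.

```python
import math


def combine_measurements(q_over_p_a, sigma_a, q_over_p_b, sigma_b):
    """Inverse-variance weighted mean of two independent q/p measurements.

    Returns (combined q/p, combined uncertainty).
    """
    w_a = 1.0 / sigma_a ** 2
    w_b = 1.0 / sigma_b ** 2
    mean = (w_a * q_over_p_a + w_b * q_over_p_b) / (w_a + w_b)
    return mean, math.sqrt(1.0 / (w_a + w_b))
```

With equal uncertainties the result is the simple average. With very different uncertainties it approaches the more precise input, and the combined error is always smaller than either input error.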

For p_T between 3 and 6 GeV, muons lose a large fraction, or even most, of their energy in the calorimeters and do not traverse the full muon spectrometer, so they cannot be reconstructed there. In this case, muon tracks are found in the inner detector and extrapolated to hit segments in the spectrometer. Algorithms are being developed that extrapolate inner-detector tracks and associate them with a minimal signal in the inner muon station (e.g. a solitary segment in a multi-layer). The use of signatures in the tile calorimeter to enhance muon identification at low p_T is also being investigated.

Tau Identification

Taus are identified in a similar way to electrons. The preliminary clustering is done with a sliding-window algorithm applied to all calorimeters. A tau appears in the calorimeter as a very narrow jet associated with a small number of charged tracks.

The tau reconstruction can be seeded by a calorimeter cluster or by a charged track depending on the p_T range of interest.

Tau identification is based on calorimeter quantities, such as the electromagnetic radius, the isolation in the calorimeters and the width in the strips. It also uses quantities from the tracker, such as the number of associated tracks, the charge and the impact parameter. Likelihood and multivariate analysis techniques are used to discriminate taus from ordinary jets.
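A simple (projective) likelihood discriminant treats each identification variable as independent. It sums, over variables, the log of the ratio between the tau and jet probability densities, each taken from a normalised histogram. The sketch below shows only this technique. The input variables, binning and density floor are hypothetical, and the independence assumption ignores correlations that the multivariate methods mentioned above can exploit.

```python
import math
from bisect import bisect_right


class BinnedPdf:
    """Normalised 1-D histogram used as a probability density (hypothetical inputs)."""

    def __init__(self, edges, contents):
        total = sum(contents)
        widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
        self.edges = list(edges)
        self.density = [c / (total * w) for c, w in zip(contents, widths)]

    def __call__(self, x, floor=1e-6):
        i = bisect_right(self.edges, x) - 1
        if 0 <= i < len(self.density):
            return max(self.density[i], floor)   # floor avoids log(0)
        return floor                             # outside the histogram range


def log_likelihood_ratio(values, tau_pdfs, jet_pdfs):
    """Sum over variables of log(p_tau(x) / p_jet(x)); positive values favour taus."""
    return sum(math.log(s(x) / b(x)) for x, s, b in zip(values, tau_pdfs, jet_pdfs))
```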

Taus are calibrated using the same cell-weighting scheme as jets.

Jet Reconstruction

Jets can be reconstructed from detector signals and, for Monte Carlo data, from the generated particles. The available algorithms are the seeded cone, the seedless cone and the k_t algorithm. The cone algorithms are implemented natively in the ATLAS software, following the guidelines given in G. Blazey et al., Run II Jet Physics, hep-ex/0005012v2 (2000). The k_t implementation is provided by an external package [3-31], which is wrapped by a specific tool that creates the Jet objects of the ATLAS EDM. The implementation has a single jet-algorithm skeleton, which is configured externally as a sequence of tools implementing a given jet-finder strategy. This algorithm and most of the tools do not depend on any specific feature of the input data objects, so they can be used in exactly the same way for different inputs. The only requirement on the input objects is that they implement the general four-vector and navigation interfaces.
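To show what the k_t algorithm does, and how it needs nothing from its inputs beyond a four-vector, here is a naive O(N³) inclusive k_t clusterer. It uses the standard distances d_iB = p_T,i² and d_ij = min(p_T,i², p_T,j²) ΔR_ij² / R², with E-scheme recombination. This is an illustrative sketch and not the external package used in ATLAS. `FourVector`, `massless` and `kt_jets` are hypothetical names, and R = 0.4 is just a default.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FourVector:
    """Minimal four-vector 'interface': the only thing the jet finder relies on."""
    px: float
    py: float
    pz: float
    e: float

    def __add__(self, other):
        # E-scheme recombination: add the four-vectors component by component.
        return FourVector(self.px + other.px, self.py + other.py,
                          self.pz + other.pz, self.e + other.e)

    @property
    def pt2(self):
        return self.px ** 2 + self.py ** 2

    @property
    def rapidity(self):
        return 0.5 * math.log((self.e + self.pz) / (self.e - self.pz))

    @property
    def phi(self):
        return math.atan2(self.py, self.px)


def massless(pt, eta, phi):
    """Helper: massless four-vector from (pt, eta, phi)."""
    return FourVector(pt * math.cos(phi), pt * math.sin(phi),
                      pt * math.sinh(eta), pt * math.cosh(eta))


def delta_r2(a, b):
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2 * math.pi - dphi
    return (a.rapidity - b.rapidity) ** 2 + dphi ** 2


def kt_jets(inputs, R=0.4):
    """Naive inclusive k_t clustering; returns jets sorted by decreasing pt."""
    objs = list(inputs)
    jets = []
    while objs:
        # Find the smallest of all d_iB and d_ij.
        best_d, best_i, best_j = objs[0].pt2, 0, None
        for i, a in enumerate(objs):
            if a.pt2 < best_d:
                best_d, best_i, best_j = a.pt2, i, None
            for j in range(i + 1, len(objs)):
                d_ij = min(a.pt2, objs[j].pt2) * delta_r2(a, objs[j]) / R ** 2
                if d_ij < best_d:
                    best_d, best_i, best_j = d_ij, i, j
        if best_j is None:
            jets.append(objs.pop(best_i))       # closest to the beam: final jet
        else:
            merged = objs[best_i] + objs[best_j]
            objs.pop(best_j)                    # best_j > best_i, so pop it first
            objs.pop(best_i)
            objs.append(merged)
    return sorted(jets, key=lambda v: v.pt2, reverse=True)
```

Any input type exposing `px`, `py`, `pz` and `e` (towers, clusters or truth particles) can be passed unchanged.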

Calorimeter Jets

The calorimeter system is the principal detector for jet reconstruction. The typically large number of CaloCell objects in an event prohibits using them directly as input to jet finding, especially for the k_t algorithm. The input multiplicity can be reduced by the calorimeter reconstruction, which groups cells into CaloTower and CaloCluster objects. A CaloTower represents a tower of cells on a fixed grid in pseudorapidity and azimuth, typically with a bin size of Δη × Δφ = 0.1 × 0.1 for input to jet finding. A CaloCluster, on the other hand, represents a group of cells with correlated signals, whose location depends only on the cell signals and locations. Both CaloTower and CaloCluster implement the four-momentum and navigation interfaces required by the jet algorithms.
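Tower building on a fixed (η, φ) grid amounts to binning each cell by its position and summing energies per bin. The sketch assumes a grid of 0.1 in η and 64 bins in φ (about 0.1 rad each), with cells given as hypothetical `(energy, eta, phi)` tuples. Each cell is assigned wholly to one tower. A real implementation would also share cells that straddle tower boundaries according to their geometric overlap.

```python
import math
from collections import defaultdict


def build_towers(cells, d_eta=0.1, n_phi=64):
    """Sum cell energies into towers on a fixed (eta, phi) grid.

    cells: iterable of (energy, eta, phi) tuples (hypothetical format).
    Returns {(i_eta, i_phi): summed energy}; energies may be negative (noise).
    """
    d_phi = 2 * math.pi / n_phi
    towers = defaultdict(float)
    for energy, eta, phi in cells:
        i_eta = math.floor(eta / d_eta)
        # Map phi into [0, 2*pi) before binning; the final modulo guards rounding.
        i_phi = math.floor((phi % (2 * math.pi)) / d_phi) % n_phi
        towers[(i_eta, i_phi)] += energy
    return dict(towers)
```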

All jet algorithms combine the input objects into a Jet object following their specific strategies. The total jet kinematics is represented by a four-vector, which is updated when constituents are added or removed. This four-momentum recombination requires all constituents to have meaningful four-vectors themselves, in particular a positive signal amplitude (energy). CaloTower objects, however, can have negative signals, indicating a major noise contribution from the cells in the tower. Such negative-signal towers are combined with neighbouring towers until the combined tower has a small positive signal. This cancels the negative signals before the actual jet finder is applied.
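The negative-tower cancellation can be sketched as follows. Each negative tower, most negative first, absorbs neighbouring towers until its summed energy becomes positive. The neighbour-selection rule, the handling of unrecoverable groups and the absence of φ wrap-around are assumptions made for this sketch. The rule prefers the weakest neighbour that makes the sum positive, which keeps the result "small". Unrecoverable groups are dropped. `cancel_negative_towers` is a hypothetical name.

```python
def cancel_negative_towers(towers):
    """Merge each negative tower with neighbours until its summed energy is positive.

    towers: {(i_eta, i_phi): energy}. Returns a list of (member_keys, energy)
    objects, all with energy > 0, ready for four-momentum recombination.
    """
    free = dict(towers)          # towers not yet assigned to a combined object
    combined = []
    for key in sorted(towers, key=towers.get):      # most negative first
        if key not in free or free[key] >= 0:
            continue
        members, total = [key], free.pop(key)
        while total <= 0:
            candidates = {(i + di, j + dj) for i, j in members
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (i + di, j + dj) in free}
            if not candidates:
                break
            # Prefer the weakest neighbour that makes the sum positive,
            # otherwise absorb the most energetic neighbour and keep going.
            flipping = [c for c in candidates if total + free[c] > 0]
            pick = min(flipping, key=free.get) if flipping else max(candidates, key=free.get)
            total += free.pop(pick)
            members.append(pick)
        if total > 0:
            combined.append((members, total))
        # Otherwise no positive combination is reachable; the group is dropped.
    # Remaining positive towers pass through unchanged.
    combined.extend(([k], e) for k, e in free.items() if e > 0)
    return combined
```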

Calorimeter jets can be calibrated in various ways. The standard calibration for jets built from towers is a cell-signal weighting scheme, in which a weight is applied to the signal contribution of each cell. These weights have been computed such that the jet response is flat over a large energy range, with the additional constraint of optimal energy resolution. Other approaches, for example, apply weights to the sums of the signals in each calorimeter sampling layer of the jet.
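In a cell-signal weighting scheme of the H1 type, each cell's weight depends on the cell energy density E/V and on the calorimeter region. Hadronic showers deposit energy less densely than electromagnetic ones, so low-density cells receive larger weights. The sketch below implements only the lookup and the weighted sum. The weight table, its binning in log10(E/V), the region names and the volume units are made-up assumptions, not ATLAS calibration constants.

```python
import bisect
import math

# Hypothetical weight table (NOT ATLAS values): per calorimeter region, weights
# binned in log10 of the cell energy density E/V. Lower density -> larger weight.
WEIGHTS = {
    "em":  {"edges": [-6.0, -4.0, -2.0], "w": [1.6, 1.3, 1.1, 1.0]},
    "had": {"edges": [-6.0, -4.0, -2.0], "w": [2.0, 1.6, 1.3, 1.1]},
}


def cell_weight(energy, volume, region):
    """Look up the weight for one cell from its energy density and region."""
    table = WEIGHTS[region]
    density = abs(energy) / volume
    if density == 0.0:
        return table["w"][0]
    return table["w"][bisect.bisect_right(table["edges"], math.log10(density))]


def weighted_energy(cells):
    """Calibrated energy: sum of w_i * E_i over cells given as (energy, volume, region)."""
    return sum(cell_weight(e, v, r) * e for e, v, r in cells)
```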

Truth Jets

Jet finding on the generated particles requires retrieving these particles from the truth event associated with the reconstructed event in simulations. The jet reconstruction provides a special tool for this task. After the extraction, all tools used for jet finding in the detector can be used in exactly the same way. In particular, all pre-clustering and pre-sorting, as required for example by the k_t algorithm, are done with identical software.

Missing E_t Reconstruction

Missing E_t is reconstructed from the energy deposited in all calorimeter cells and from the reconstructed muons. A correction is applied for the energy lost in the cryostat between the electromagnetic and hadronic calorimeters.

Calorimeter cell energies are weighted with the same H1-style weights as used for jets, which depend on the cell energy density (E/V) and on the calorimeter region. For muons, only the energy reconstructed in the muon chambers is used, to avoid double-counting the energy deposited in the calorimeters. The correction for the energy lost in the cryostat is calculated from the energy deposited in the cryostat by jets.

To suppress the effect of calorimeter noise, a cell-energy threshold, expressed as a number of standard deviations (sigma) of the noise, is applied.
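The cell-based missing E_t can be written as the negative vector sum of the transverse energies of the noise-suppressed, weighted cells, the muon transverse momenta from the muon chambers, and a cryostat correction term. The sketch below takes already-weighted cell energies and a precomputed cryostat vector as inputs. It applies a symmetric |E| > n_sigma · sigma cut, with n_sigma = 2.0 as an assumed default, and treats cells as massless so that E_T = E / cosh η. `Cell` and `missing_et` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell: weighted energy (GeV), expected noise sigma (GeV), position."""
    energy: float
    sigma: float
    eta: float
    phi: float


def missing_et(cells, muons, cryostat_ex_ey=(0.0, 0.0), n_sigma=2.0):
    """Return (MET_x, MET_y, MET).

    muons: iterable of (px, py) measured in the muon chambers only.
    Cells with |E| <= n_sigma * sigma are treated as noise and skipped
    (symmetric cut and default threshold are assumptions).
    """
    sum_x = sum_y = 0.0
    for c in cells:
        if abs(c.energy) > n_sigma * c.sigma:
            et = c.energy / math.cosh(c.eta)    # massless: E_T = E / cosh(eta)
            sum_x += et * math.cos(c.phi)
            sum_y += et * math.sin(c.phi)
    for px, py in muons:
        sum_x += px
        sum_y += py
    sum_x += cryostat_ex_ey[0]
    sum_y += cryostat_ex_ey[1]
    met_x, met_y = -sum_x, -sum_y                 # missing = minus visible sum
    return met_x, met_y, math.hypot(met_x, met_y)
```

For the topological-cluster variant, the loop over cells would simply run over the clustered cells, with no extra threshold.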

Missing E_t can alternatively be reconstructed from the energy measured in the topologically clustered calorimeter cells. In this case the noise suppression is given by the thresholds applied in the topological clustering reconstruction.

3.9.2.4 Reconstruction Performance and Prospects

The ATLAS Computing Model described in Chapter 2 assumes that reconstruction (creation of the ESD) requires approximately 15 kSI2k-sec of processing time per event. In the absence of pile-up, the current measurement is approximately 22 kSI2k-sec. However, not all of the algorithms have been optimized, and in several cases multiple versions of some algorithms are run (e.g. tracking, calorimeter cluster finding, jet reconstruction), so the design goal is expected to be achievable. The situation in the presence of pile-up is worse, but is again expected to be amenable to significant optimization prior to deployment.
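For scale, going from the current measurement without pile-up (22 kSI2k-sec) to the Computing Model figure (15 kSI2k-sec) requires reducing the per-event reconstruction time by about one third:

```latex
\frac{22 - 15}{22} = \frac{7}{22} \approx 0.32
```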

Continued development of reconstruction algorithms prior to data taking focuses on the following aspects:

optimize performance;
render algorithms robust with respect to real data-taking conditions: varying calibrations and alignments, noisy/dead channels, etc. To this end, reconstruction algorithms have been adapted to run on the combined test-beam data taken in 2004 (analysis is under way), and commissioning data will be used as soon as they are available (autumn 2005). In addition, the simulation is being made increasingly realistic by including the degradations expected in real data;
adapt algorithms to work in the HLT context;
continue validation and tuning of existing algorithms for varying conditions (in particular low-luminosity pile-up) and for different analyses;
develop new algorithms to extract the maximum amount of information from the data, for example for low-p_T particle identification and multiple interactions in the tracker.

3.9.3 Analysis Preparation

3.9.3.1 AOD Building

The analysis preparation consists of the production of AOD and the collection of interesting events for analysis.

The Event Summary Data (ESD) contains the persistable output of the combined reconstruction described above. The Analysis Object Data (AOD) is produced from the ESD by applying very loose selection criteria to objects such as the reconstructed photons or electrons. In addition, new AOD-specific objects, e.g. the JetTag, are created. The objects in the AOD can be redundant, can overlap and can be ambiguous. Some of these overlaps are removed during AOD making, but not all, as the details of the removal may depend on the physics analysis.

Event collections consist of pointers to events in persistent storage, together with event-level metadata (the tags) used for event selection. Collections are arbitrary: they may span many streams, and a given event may appear in several collections. Physics groups and users may extract copies of their events of interest by querying the tag database and filling their dedicated AOD samples. They can also build new collections (probably ROOT-based) containing only the results of the selection.
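The collection concept can be sketched as follows. A collection holds only event references plus their tags, and a selection on the tags produces a new collection without copying any event data. The same event can therefore appear in several collections. `EventRef`, `EventCollection`, the tag names and the file names in the usage are hypothetical and are not the ATLAS tag-database schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EventRef:
    """Pointer to an event in persistent storage (file and entry number)."""
    file_name: str
    entry: int


@dataclass
class EventCollection:
    """References plus event-level tags; the events themselves are not copied."""
    name: str
    entries: list = field(default_factory=list)   # (EventRef, tags) pairs

    def select(self, name, predicate):
        """New collection holding the entries whose tags satisfy predicate."""
        return EventCollection(name, [(ref, tags) for ref, tags in self.entries
                                      if predicate(tags)])

    def refs(self):
        """The set of event references in this collection."""
        return {ref for ref, _ in self.entries}
```

For example, selecting `lambda t: t["met"] > 50.0` on a stream collection yields a high-missing-E_t collection whose references can then be used to copy only those events into a dedicated AOD sample.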

For fast simulation, tools exist to produce the AOD from generated events, simulated events, digitized events or the ESD. Each of these data formats contains the full record of the event generation, which is the input to the standard fast simulation.

Figure 3-9 shows a generic analysis object. Since it represents a physical object, it inherits from four-momentum, navigation and basic particle classes. As for the calorimeter objects, the navigation interface allows navigation between constituent objects. Pointers to ESD objects are saved, allowing direct navigation to detailed information in the ESD if necessary.

Tools which require only kinematic information will just use the four-momentum interface, whilst other analyses might need more detailed information. In any case, the use of common interfaces dramatically simplifies the analysis code.
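The benefit of common interfaces can be sketched with a small class hierarchy. An analysis object implements a four-momentum interface and a navigation interface and keeps a link back to its ESD object. A kinematics-only tool such as an invariant-mass calculator then accepts any object implementing the four-momentum interface. The Python names (`I4Momentum`, `INavigable`, `ParticleBase`, `invariant_mass`) are hypothetical stand-ins and do not reproduce the C++ EDM classes.

```python
import math
from abc import ABC, abstractmethod


class I4Momentum(ABC):
    """Four-momentum interface: all that a kinematics-only tool needs."""

    @abstractmethod
    def four_momentum(self):
        """Return (px, py, pz, e)."""


class INavigable(ABC):
    """Navigation interface: access to constituent objects."""

    @abstractmethod
    def constituents(self):
        """Return the list of constituent objects."""


class ParticleBase(I4Momentum, INavigable):
    """Hypothetical analysis object: kinematics, constituents and an ESD link."""

    def __init__(self, p4, constituents=(), esd_link=None):
        self._p4 = tuple(p4)
        self._constituents = list(constituents)
        self.esd_link = esd_link    # e.g. a key to the detailed ESD object

    def four_momentum(self):
        return self._p4

    def constituents(self):
        return list(self._constituents)


def invariant_mass(objects):
    """Kinematics-only tool: works on anything implementing I4Momentum."""
    px, py, pz, e = (sum(c) for c in zip(*(o.four_momentum() for o in objects)))
    return math.sqrt(max(e ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0))
```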

Electrons, Photons and Muons in AOD

The electrons and photons selected for the AOD are required to have a hadronic-energy fraction of less than 20%. Electrons and photons in the AOD can never overlap, because an electron requires a matching track and a photon requires the absence of one. The overlaps between the collections of high-p_T and low-p_T combined-reconstruction electrons and muons are also removed when making the AOD: when a low-p_T candidate shares a track with a high-p_T one, the high-p_T candidate is kept. The development of the tools needed to deal with the redundancies, overlaps and ambiguities in the AOD is discussed in Section 3.10.
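The AOD selection for electrons and photons described above can be sketched in three steps. First, apply the 20% hadronic-fraction cut. Second, split candidates by the presence of a matching track. Third, drop low-p_T electron candidates that share a track with a high-p_T one. The sketch makes several assumptions: the hadronic fraction is taken as E_had / (E_em + E_had); conversions are ignored, so every trackless candidate is a photon; and muon overlaps are not treated. `EgammaCandidate` and `select_for_aod` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EgammaCandidate:
    """Hypothetical combined-reconstruction candidate."""
    pt: float
    e_em: float
    e_had: float
    track_id: Optional[int]     # None if no matching track
    high_pt_algo: bool          # True for the high-p_T (cluster-seeded) algorithm


def select_for_aod(candidates, max_had_fraction=0.2):
    """Return (electrons, photons) after the hadronic cut and e-e overlap removal."""
    def had_fraction(c):
        total = c.e_em + c.e_had
        return c.e_had / total if total > 0 else 1.0   # reject empty candidates

    passing = [c for c in candidates if had_fraction(c) < max_had_fraction]
    photons = [c for c in passing if c.track_id is None]
    electrons = [c for c in passing if c.track_id is not None]
    high_pt_tracks = {c.track_id for c in electrons if c.high_pt_algo}
    # A low-p_T candidate sharing its track with a high-p_T one is removed.
    electrons = [c for c in electrons
                 if c.high_pt_algo or c.track_id not in high_pt_tracks]
    return electrons, photons
```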

Taus in AOD

The tauObjects reconstructed by the tauRec package and stored in the ESD are converted 1-to-1 into the AOD TauJet container. The TauJet object keeps only the variables most important for refining the pre-selection (likelihood, number of tracks, hadronic and electromagnetic energy, etc.).

Jets in AOD

Jet objects in the ESD are converted 1-to-1 into the AOD ParticleJets. This is done for cone jets with R = 0.7 and R = 0.4 as well as for k_t jets. In addition, some calorimeter information is computed from the ESD jets and stored in the AOD. The ESD jets are also converted into the JetTag objects described in the next section. In the future, the two classes will be merged into a single jet class for the AOD.

b-Tagging

Identification of jets containing decay products of bottom-flavoured hadrons, or b-tagging, requires jets with associated tracks. Calorimeter jets therefore cannot be used directly: a new jet object has to be constructed using track objects as well as calorimetric information. The b-tagging output object, the JetTag, is stored in the AOD, so users can re-run the tagging on the AOD without navigating back to information stored in the ESD files.

The relatively long lifetimes of b-hadrons can give rise to displaced vertices. These "secondary" vertices can be tagged by examining the impact parameters of the tracks in the jet. B-jets have a characteristic long, positive tail in the distribution of signed impact parameters, whereas a symmetric distribution is expected for "light" jets (from light quarks). Another method is to reconstruct the secondary vertex explicitly using vertex-finding algorithms. If there are two or more tracks in the jet with a significant impact parameter, a secondary vertex can be searched for exclusively. Properties of the secondary vertex, for example the fraction of the jet energy it carries and the reconstructed vertex mass, can be used to discriminate b-jets from light jets. In addition, "soft" leptons from semi-leptonic decays of b-hadrons can provide a limited but valuable complement to the spatial tagging (see above for "soft" electron and "soft" muon identification). The results of all methods can be combined into a single discriminating variable in different ways, using different test statistics and combination methods.
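The sign of the impact parameter is what produces the positive tail for b-jets. Tracks from a decay vertex displaced along the jet direction tend to have their point of closest approach on the jet side of the primary vertex. The sketch computes a transverse signed impact parameter in a straight-line approximation that ignores track curvature. It uses one common sign convention: the sign of the projection of the closest-approach vector onto the jet direction. `signed_d0` and its argument layout are hypothetical.

```python
import math


def signed_d0(track_x, track_y, track_phi, pv_x, pv_y, jet_phi):
    """Signed transverse impact parameter of a straight-line track w.r.t. the primary vertex.

    (track_x, track_y): any point on the track; track_phi: its direction.
    Magnitude: transverse distance of closest approach to the primary vertex.
    Sign: positive if the closest-approach point lies on the jet side of the vertex.
    """
    ux, uy = math.cos(track_phi), math.sin(track_phi)
    dx, dy = track_x - pv_x, track_y - pv_y
    t = dx * ux + dy * uy                 # position along the track of closest approach
    cx, cy = dx - t * ux, dy - t * uy     # closest-approach vector from the vertex
    d0 = math.hypot(cx, cy)
    side = cx * math.cos(jet_phi) + cy * math.sin(jet_phi)
    return d0 if side >= 0 else -d0
```

Tracks from the primary vertex scatter symmetrically around zero, because of resolution. Tracks from a displaced decay along the jet axis populate mainly the positive side, giving the asymmetry described above.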

3.9.3.2 Size of the ESD and AOD files

One critical issue is the actual size on disk of the EDM objects. The goals for the size of the ESD (500 kB/event), AOD (100 kB/event) and data tags (1 kB/event) have been established based on experience with earlier n-tuple-based reconstruction data and on storage-cost considerations. The size and content of the ESD/AOD are currently evolving quite rapidly. This reflects increasing knowledge of what is actually needed for analysis and increasingly efficient storage of information. In addition, the size depends heavily on the physics process and the LHC luminosity. Finally, events produced from simulated datasets have an additional component, the Monte Carlo truth, which allows a detailed comparison between the results of reconstruction and the original event. This additional information significantly increases the event size, but is crucial for arriving at a detailed understanding of the performance of the reconstruction software.

At the time of writing, the 500 kB target size for the ESD has not been reached; the current size is approximately 1.2 MB/event. However, the ESD content has intentionally been kept as inclusive as possible so as not to restrict any physics analysis. Feedback from the physics community based on operational experience, together with further work on storage techniques, is expected to bring the size down to the target before data taking starts.

Figure 3-10 Simplified ESD and AOD content. The solid lines indicate direct navigation possibilities. The dotted lines indicate duplication of objects.

The ESD and AOD content is shown, in simplified form, in Figure 3-10.