Computing Technical Design Report
3.10 Physics Analysis Tools
The Reconstruction Task Force (RTF) examined the architectural aspects of the ATLAS reconstruction software, and arrived at a set of suggestions/proposals which have come to be known as the RTF recommendations [3-32]. The RTF covered the preparation of data for physics analysis, but its mandate did not extend into physics analysis in detail. The scope of Physics Analysis Tools (PAT) is to span the gap between the combined reconstruction and the analysis on n-tuples. The objectives of PAT include:
-
Development, within the Athena framework, of an environment where reconstruction tools are available, while at the same time taking advantage of n-tuple analysis tools such as ROOT [3-7], PAW [3-33] and JAS [3-34].
-
Support for both batch and interactive analyses.
The activities carried out within the PAT group are not about ROOT, PAW, JAS, etc. as stand-alone analysis tools, nor about distributed physics analysis. Rather, the idea is to propose a unified, baseline framework for analysis, to explore and propose various options for interactive analysis, and to interact with combined performance and physics groups so as to develop tools satisfying user requirements. The unified, baseline framework for analysis consists of the following:
-
Common classes in the analysis domain, such as the AOD (Analysis Object Data).
-
Common tools to build these objects, such as the AOD builder algorithms.
-
General and common tools for analysis.
-
Navigation and association tools.
-
Tools for overlap checking, redundancy removal, and ambiguity resolution.
-
Tools for event views.
-
AOD streaming, event tag and event collection tools.
-
Tools for interactive analysis and event display.
-
Documentation.
A dedicated workshop was held in April 2004 to make a first attempt at a baseline, unified, common framework for analysis, thus extending the work of the RTF. A detailed summary of the workshop can be found at reference [3-35]. A follow-up workshop was held in May 2005.
3.10.1 Current Status
Baseline implementations of all the AOD classes exist. The contents of the ESD, AOD and event tags follow the recommendations of the AOD/ESD definition task force [3-36] and of the combined performance groups. The PAT group provides a set of tools useful in the analysis environment. They consist of the following tools:
-
Combination, permutation with or without selection criteria - for example, making jet-jet combinations selecting only the combinations that pass pre-defined criteria.
-
Sorting - to sort any user collection of objects.
-
Filtering according to different criteria - for example filtering the MC event collection to search for a particular decay pattern.
-
Constituent navigation. The original motivation for object navigation came out of jet reconstruction. Jet constituents are of generic type; their concrete type is not exposed to the jet itself. Clients need to retrieve objects of a specific concrete type at any node of the relational tree behind a jet, so a navigation system is needed. Constituent objects in the tree can themselves be composites, so navigation must be possible to any given level in the tree. Constituent objects can contribute their kinematics to the composite object with a weight, so the weights must be retrievable and propagated correctly.
-
Back navigation. Not all the objects that the user might need at the analysis stage are available in the AOD. When the requested object is not found in the AOD, the process which searches for the object in ESD or even in the raw data is known as back navigation.
-
Composite particles. For example, the Z-boson as a composite of an electron-positron pair with all the constituent navigation features from the Z-boson to the calorimeter clusters or cells of the electrons.
-
Association tools, non-constituent associations. For example, a muon can be associated with a jet without belonging to the jet. One may wish to associate a muon to a jet for the purpose of b-tagging but also associate a further muon to the same jet as a candidate for the decay of a top quark.
-
The user analysis package. To help users get started quickly with their analysis code, a user analysis package is provided in the CVS repository under PhysicsAnalysis/AnalysisCommon/UserAnalysis/. It contains a skeleton analysis algorithm and sets up the CMT environment, so that novice users can begin developing their own analysis code with minimal set-up.
-
Interactive analysis in Athena. The PAT group also provides tools for interactive analysis. It is possible to browse the content of the AOD and make plots of the raw AOD data. It is also possible to examine the raw AOD data interactively without writing a single piece of code. The user may define, fill and manipulate histograms and n-tuples, and it is possible to access the histograms and n-tuples that are defined in the analysis algorithms.
-
Analysis in Python. Tools exist for the user to write a complete analysis in Python.
-
In collaboration with the database group, the PAT group provides tools for the event tag definition, the AOD streaming, and for collections of interesting physics events.
-
Special utilities. The PAT group provides tools to address specific classes of problems, for example solving for the neutrinos in X → ττ using the collinear approximation, or in W → lν using the W-mass constraint.
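The constituent navigation requirement described above (retrieval by concrete type at any depth, with weights propagated correctly) can be illustrated with a minimal Python sketch. It is not the PAT navigation code: the class names Navigable, Jet, Cluster and Cell and the function navigate are hypothetical, and the sketch assumes that the weight of a nested constituent is the product of the weights along the path from the composite.

```python
from dataclasses import dataclass, field


@dataclass
class Navigable:
    """Hypothetical node in the relational tree behind a composite object."""
    name: str
    constituents: list = field(default_factory=list)  # (child, weight) pairs

    def add(self, child, weight=1.0):
        self.constituents.append((child, weight))


class Cell(Navigable):
    pass


class Cluster(Navigable):
    pass


class Jet(Navigable):
    pass


def navigate(obj, wanted, weight=1.0):
    """Yield (constituent, cumulative weight) for every constituent of
    type `wanted`, at any depth below `obj`."""
    for child, w in obj.constituents:
        total = weight * w  # assumption: weights multiply along the path
        if isinstance(child, wanted):
            yield child, total
        # Constituents can be composites themselves, so keep descending.
        yield from navigate(child, wanted, total)


# A jet built from two clusters, each built from cells.
c1, c2, c3 = Cell("c1"), Cell("c2"), Cell("c3")
cl1, cl2 = Cluster("cl1"), Cluster("cl2")
cl1.add(c1, 1.0)
cl1.add(c2, 0.5)
cl2.add(c3, 0.5)
jet = Jet("jet")
jet.add(cl1, 1.0)
jet.add(cl2, 0.5)

cell_weights = {cell.name: w for cell, w in navigate(jet, Cell)}
cluster_names = [cl.name for cl, _ in navigate(jet, Cluster)]
```

The jet itself never needs to know whether a constituent is a cluster or a cell; the client names the concrete type it wants at the point of retrieval.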
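As an illustration of the collinear approximation mentioned under special utilities, the following Python sketch shows the textbook form of the calculation; it is not the PAT utility itself, and the function names are hypothetical. It assumes that each neutrino system is parallel to the visible decay products of its parent tau, so that the missing transverse momentum is a linear combination of the two visible transverse momenta, and that the tau mass is negligible.

```python
import math


def collinear_neutrinos(vis1, vis2, met):
    """Solve met = a1 * vis1 + a2 * vis2 in the transverse plane.

    vis1, vis2 and met are (px, py) tuples. The returned a_i is the ratio of
    the neutrino momentum to the visible momentum for tau i.
    """
    (p1x, p1y), (p2x, p2y), (mx, my) = vis1, vis2, met
    det = p1x * p2y - p1y * p2x
    # Assumption: an absolute tolerance is adequate for momenta in GeV.
    if abs(det) < 1e-9:
        raise ValueError("visible systems are parallel or back-to-back; "
                         "the collinear approximation has no unique solution")
    a1 = (mx * p2y - my * p2x) / det  # Cramer's rule
    a2 = (p1x * my - p1y * mx) / det
    return a1, a2


def collinear_mass(m_vis, a1, a2):
    """Invariant mass of the tau pair from the visible mass m_vis."""
    if a1 < 0 or a2 < 0:
        raise ValueError("unphysical solution: negative neutrino momentum")
    x1 = 1.0 / (1.0 + a1)  # visible momentum fraction of tau 1
    x2 = 1.0 / (1.0 + a2)  # visible momentum fraction of tau 2
    return m_vis / math.sqrt(x1 * x2)


# Example with perpendicular visible systems (values in GeV, illustrative only).
a1, a2 = collinear_neutrinos((30.0, 0.0), (0.0, 40.0), (15.0, 20.0))
m_tautau = collinear_mass(60.0, a1, a2)
```

The determinant vanishes when the two visible systems are parallel or back-to-back in the transverse plane, which is why the approximation is not usable for such topologies.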
3.10.2 Short-Term Objectives
Some of the tools described above are being improved with added functionality, following requests from individual users or groups. Concurrently, the design and implementation of other tools are being discussed. These consist of the following:
-
Overlap/redundancy/ambiguity. A set of tools to check for overlaps between different objects in the analysis domain; to remove redundancies (for example, the collection of jets includes the jets that are already tagged as b-jets and put in separate b-jet collections); and to resolve ambiguities (for example, an object reconstructed as both a high-pT and a low-pT muon, which thus appears in both collections).
-
SymLink is a tool which allows one to record a container of objects as one type and retrieve it as a container of a different type. For example, through SymLink, one can record a container of electrons and later retrieve the same container as a container of IParticles. The Electron and Muon classes derive from the IParticle class, but the Electron and Muon container classes do not derive from the IParticle container class; SymLink supplies the relation between the containers. Many analysis use-cases of SymLink exist: for example, when one would like to treat the containers of Electrons and Muons as containers of IParticles, with no regard to the detailed differences between the Electron and Muon implementations. Although the current implementation of SymLink works for most use-cases, it fails in certain circumstances and is neither compiler-safe nor portable.
-
A fast-simulation tool, known as the Atlfast comparator, is being developed. The objective is to tune the fast-simulation parameters by making detailed comparisons with fully simulated or real data. This is described in Section 3.8.3.3 above.
-
One needed feature of interactive analysis is the ability to find and read in an arbitrary event from the input data stream and, more generally, to re-initialize the event loop without exiting the interactive session. A prototype tool for this, PyPoolSeek, exists. Two extensions of this tool would be valuable and should be investigated: seeking by run number in addition to event number, i.e. seek(run number, event number), and suppressing the processing of intermediate events when jumping to a specific event. Two other features are needed to make interactive analysis truly useful. One is adequate processing speed, i.e. the ability to run over a moderately sized sample and make plots in a few seconds. At the moment this takes a few minutes; a caching scheme should be considered to reduce the processing time. The other is the ability to read multiple samples in a single interactive session, e.g. to open both signal and background samples, make histograms from each and overlay them.
-
Interactive analysis and event display. Atlantis is moving towards interactive analysis, with the aim of being able to do everything available in interactive Athena. The development involves three related technologies: interactive analysis in Athena, the XML RPC server, and Atlantis. One can run a remote interactive Athena session and steer it with XML RPC: in that set-up, one asks the server to execute an interactive Athena command on the remote session. Essentially, this supplies the reverse direction of the communication. The plan for Atlantis is to be able to transmit information back to the interactive prompt.
-
Event view. An event view is a coherent and exhaustive list of mutually exclusive physics objects. The user may wish to consider different views of the same event. Coherent means that the user does not need to carry out additional checks or call additional tools to guarantee the self-consistency of the view. Exhaustive means that the sum of the energies of the objects in the view comes out as the total energy in the event, and that the vector sum of the transverse momenta, including the missing transverse momentum, is zero. Mutually exclusive means that no object is listed twice; e.g. a jet should not also be listed as an electron. Thus, the overlap checking, redundancy removal and ambiguity resolution tools are integral parts of the tools for the event views.
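The event view idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the design under discussion: the names PhysObj, build_view and is_balanced are hypothetical, overlaps are assumed to be defined by a cone of ΔR < 0.4, and ambiguities are assumed to be resolved by a user-supplied priority list of object kinds. Only the transverse-momentum balance is checked here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysObj:
    """Hypothetical physics object described by kind, pT, eta and phi."""
    kind: str
    pt: float
    eta: float
    phi: float

    @property
    def px(self):
        return self.pt * math.cos(self.phi)

    @property
    def py(self):
        return self.pt * math.sin(self.phi)


def delta_r(a, b):
    # math.remainder wraps the azimuthal difference into [-pi, pi].
    dphi = math.remainder(a.phi - b.phi, 2.0 * math.pi)
    return math.hypot(a.eta - b.eta, dphi)


def build_view(candidates, priority, cone=0.4):
    """Return mutually exclusive objects: a candidate is kept only if it does
    not overlap an already kept object of equal or higher priority."""
    view = []
    for obj in sorted(candidates, key=lambda o: priority.index(o.kind)):
        if all(delta_r(obj, kept) >= cone for kept in view):
            view.append(obj)
    return view


def is_balanced(view, met_x, met_y, tol=1e-6):
    """True if the transverse momenta plus the missing pT sum to zero."""
    sum_x = sum(o.px for o in view) + met_x
    sum_y = sum(o.py for o in view) + met_y
    return math.hypot(sum_x, sum_y) <= tol


# The same energy deposit reconstructed both as an electron and as a jet.
ele = PhysObj("electron", 40.0, 0.10, 0.0)
jet_a = PhysObj("jet", 42.0, 0.12, 0.05)
jet_b = PhysObj("jet", 40.0, -0.50, math.pi)
candidates = [ele, jet_a, jet_b]

lepton_view = build_view(candidates, ["electron", "muon", "jet"])
jet_view = build_view(candidates, ["jet", "electron", "muon"])
```

Changing the priority list yields a different view of the same event: with electrons preferred, the overlapping jet is dropped, and with jets preferred, the electron is dropped instead.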
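The proposed seek(run number, event number) extension, with no processing of intermediate events, can be pictured as a lookup in a pre-built index. The Python sketch below is purely illustrative and does not reflect the PyPoolSeek interface: the class SeekableStream and its attributes are hypothetical, and the input stream is stood in for by an in-memory list of records.

```python
class SeekableStream:
    """Hypothetical event stream with direct access by (run, event)."""

    def __init__(self, events):
        # events: sequence of dicts, each with "run" and "event" keys.
        self._events = list(events)
        # Build the index once so that seeking does not scan the stream.
        self._index = {(e["run"], e["event"]): i
                       for i, e in enumerate(self._events)}
        self.processed = 0  # number of events actually handed to the user

    def seek(self, run, event):
        """Return the requested event without touching the ones before it."""
        try:
            position = self._index[(run, event)]
        except KeyError:
            raise LookupError(f"run {run}, event {event} not in stream") from None
        self.processed += 1
        return self._events[position]


stream = SeekableStream(
    {"run": run, "event": evt, "payload": f"r{run}e{evt}"}
    for run in (1, 2) for evt in range(1, 6)
)
found = stream.seek(2, 4)
```

Because the index is built once per input sample, the event loop can be repositioned repeatedly within one interactive session at the cost of a dictionary lookup.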
The analysis tools [3-37] described here are already quite well documented. Web and wiki documentation is available on the PAT web page [3-38]. This page is linked from the ATLAS main page and also from the ATLAS computing page.
4 July 2005 - WebMaster Copyright © CERN 2005