The GAUSS Project

Automatic generation of Generator (and Simulation) Statistics Tables for MC09 production

Introduction

The purpose of this project is to produce statistics over several production jobs belonging to a specific simulation condition. It was used to calculate the generator (and simulation) statistics associated with a given EventType and simulation condition (see the examples in the next section); in each case, several production jobs belonging to different Gauss versions or PRODIDs are processed together. It can be run over a single log file or over a directory containing a set of log files.
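Processing several jobs together means combining their counters before computing any ratio. The sketch below is only an illustration, assuming each job provides a pair of counters (events passing a cut, events generated); pooled_efficiency and the numbers are hypothetical and not taken from GaussStat.py:

```python
from math import sqrt


def pooled_efficiency(jobs):
    """Combine (passed, total) counter pairs from several jobs into one efficiency.

    Hypothetical helper: counters are summed first, so every event carries
    equal weight regardless of how many events each job produced.
    """
    passed = sum(p for p, _ in jobs)
    total = sum(t for _, t in jobs)
    if total == 0:
        raise ValueError("no events in the selected jobs")
    eff = passed / total
    # Binomial uncertainty on the pooled efficiency
    err = sqrt(eff * (1.0 - eff) / total)
    return eff, err


# Three hypothetical jobs: (events passing the cut, events generated)
jobs = [(180, 1000), (95, 500), (410, 2000)]
eff, err = pooled_efficiency(jobs)
print(f"efficiency = {eff:.4f} +- {err:.4f}")
```

Summing the counters weights every event equally; averaging the three per-job ratios instead would give 0.1917 rather than 0.1957.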

How to use it

The GaussStat.py script (also available from CVS in Sim/Gauss/, under the "scripts" directory) prints a usage description with --help:

$ python GaussStat.py --help
Usage: python GaussStat.py -e EVENTTYPE -f filename.log.gz
Usage: python GaussStat.py -e EVENTTYPE -v GaussVersion -f filename.log.gz
Usage: python GaussStat.py -e EVENTTYPE -v GaussVersion --path directory_with_logs
[--simulation/--generation]
[-h/--help]
[-d/--debug]
[-i/--install]
[-a/--addToIndex]

The EventType parameter is mandatory. The script can process a single log file (-f/--file option) or a set of log files stored in a directory (--path option).
The output is an HTML page containing statistics tables for the generators only (--generator option), for the simulation only (--simulation option) or for both (--both option, the default). The output HTML files are named after the mode (Generation or Simulation) followed by the name of the simulation condition (example: Generation_MC09-b5TeV-md100.html).
By default the output is produced in the local directory where the script is run; the user can choose to move it to the official public area ($LHCBDOC/STATISTICS/MC09STAT/) with the -i/--install option, and to link it from the main summary page with the -a/--addToIndex option if the processed simulation condition is not yet present in the summary table.
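The naming rule for the output pages can be summarised in a short sketch. It assumes mode keys "generation", "simulation" and "both" (hypothetical, mirroring the options above) and that $LHCBDOC is set when installing; output_pages is not a function of GaussStat.py:

```python
import os

# Hypothetical mapping from the selected mode to the page prefixes produced
MODES = {
    "generation": ["Generation"],
    "simulation": ["Simulation"],
    "both": ["Generation", "Simulation"],
}


def output_pages(sim_condition, mode="both", install=False):
    """Return the HTML file paths produced for one simulation condition."""
    if install:
        # Official public area; raises KeyError if $LHCBDOC is not defined
        target_dir = os.path.join(os.environ["LHCBDOC"], "STATISTICS", "MC09STAT")
    else:
        # Default: the local directory where the script is run
        target_dir = os.getcwd()
    return [os.path.join(target_dir, f"{prefix}_{sim_condition}.html")
            for prefix in MODES[mode]]


for page in output_pages("MC09-b5TeV-md100"):
    print(page)
```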

Other files used

The idea is to run the GaussStat.py script over a list of production log files classified according to their EventType and simulation condition. The following utility scripts help you prepare them:

Downloading log files from production

  • Through the bkk scripts (NOTE: "SetupProject LHCbDirac" and "lhcb-proxy-init" must be run before executing the scripts):

    The DIRAC script dirac-production-dowloads-logs.py (new version: dirac-production-dowloads-logs-withoutcheck.py) downloads the log files corresponding to a given PRODID. If you need the log files corresponding to a given Gauss version and EventType, you can use a script like access_bkk.py, which uses the bookkeeping scripts to obtain the complete list of PRODIDs for that Gauss version/EventType. This script then calls dirac-production-download-logs.py.
    Example usage:

    $ python access_bkk.py Gauss v37r2 12143001

    will download all the corresponding log files into directories named Gauss_EventType_PRODID:


    Gauss_12143001_00004990/
        Gauss_00004990_00000041_1.log
        Gauss_00004990_00000041_2.log
        ...
  • Copying them from CASTOR:

    Most of the output of the workflow jobs has been stored on CASTOR (e.g. /castor/cern.ch/grid/lhcb/backup/log/00005117_0000.tgz contains the complete output, i.e. all logs, HTML files etc., for PRODID 5117). You can use the Python script Untar_Castor.py to retrieve the Gauss logs you need, giving the EventType and the PRODID:

    $ python Untar_Castor.py 12135010 5117

    The script downloads all the corresponding log files into directories named Gauss_EventType_PRODID, as in the previous bullet.
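    The CASTOR step can be sketched with Python's tarfile module, assuming the .tgz archive has already been copied to a local disk and that the Gauss logs follow the Gauss_PRODID_JOB_N.log naming seen above (optionally gzipped). extract_gauss_logs is a hypothetical helper, not the code of Untar_Castor.py:

```python
import os
import re
import shutil
import tarfile

# Hypothetical pattern for Gauss job logs, modelled on names such as
# Gauss_00004990_00000041_1.log (a trailing .gz is also accepted)
GAUSS_LOG = re.compile(r"Gauss_\d{8}_\d{8}_\d+\.log(?:\.gz)?")


def extract_gauss_logs(tarball, event_type, prodid, dest="."):
    """Copy the Gauss logs found in a local copy of a production tarball
    into a directory named Gauss_<EventType>_<PRODID>."""
    outdir = os.path.join(dest, f"Gauss_{event_type}_{int(prodid):08d}")
    os.makedirs(outdir, exist_ok=True)
    extracted = []
    with tarfile.open(tarball, "r:*") as tar:
        for member in tar.getmembers():
            name = os.path.basename(member.name)
            if member.isfile() and GAUSS_LOG.fullmatch(name):
                # Write only the file content; the archive's paths are not reused
                with tar.extractfile(member) as src, \
                        open(os.path.join(outdir, name), "wb") as out:
                    shutil.copyfileobj(src, out)
                extracted.append(name)
    return outdir, sorted(extracted)


# Example: extract_gauss_logs("00005117_0000.tgz", "12135010", 5117)
# would fill the directory Gauss_12135010_00005117
```

    Only the basename of each member is used, so the archive's internal directory layout is flattened into the Gauss_EventType_PRODID directory.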


    Re-ordering the log files according to the Simulation Conditions

    The script order_simcond.py re-classifies the log files into a set of directories named after the simulation condition (the APPCONFIG conditions name). In this way each directory can contain logs from more than one PRODID. Referring to the previous example:


    Gauss_12143001_MC09-b5TeV-md100/
        Gauss_00004990_00000041_1.log
        Gauss_00004886_00000001_1.log
        ...
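    A possible sketch of the re-classification, assuming (hypothetically) that each log mentions the APPCONFIG conditions options file with a path ending in Conditions/<name>.py; the regular expression and order_by_simcond are illustrations, not the implementation of order_simcond.py:

```python
import os
import re
import shutil

# Hypothetical: assumes each Gauss log mentions the conditions options file,
# e.g. ".../APPCONFIG/.../Conditions/MC09-b5TeV-md100.py"
CONDITION = re.compile(r"Conditions/([\w.-]+?)\.py")


def sim_condition(log_path):
    """Return the first simulation condition name found in a log, or None."""
    with open(log_path, errors="replace") as f:
        for line in f:
            m = CONDITION.search(line)
            if m:
                return m.group(1)
    return None


def order_by_simcond(src_dir, event_type, dest="."):
    """Move logs from a Gauss_<EventType>_<PRODID> directory into
    Gauss_<EventType>_<SimCond> directories; returns {condition: [logs]}."""
    moved = {}
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not name.endswith(".log"):
            continue
        cond = sim_condition(path)
        if cond is None:
            continue  # leave logs without a recognisable condition in place
        target = os.path.join(dest, f"Gauss_{event_type}_{cond}")
        os.makedirs(target, exist_ok=True)
        shutil.move(path, os.path.join(target, name))
        moved.setdefault(cond, []).append(name)
    return moved
```

    Calling order_by_simcond on every Gauss_EventType_PRODID directory of the same EventType collects logs from different PRODIDs into a single per-condition directory.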

    Run the statistics script over a set of directories at once

    The script wrap_statistics.py runs the statistics.py script over all the sub-directories of a main directory:

    $ python wrap_statistics.py Gauss_50_logs/

    where the main directory Gauss_50_logs/ contains a set of dirs like:

    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_13442001_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_13144400_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_13152400_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_18112001_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_11102201_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_12145004_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_42100000_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_13102201_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_15144103_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_20000000_MC09-b5TeV-md100
    ...
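    The wrapping loop can be sketched as follows. It assumes each sub-directory is named Gauss_<EventType>_<SimCond> and builds GaussStat.py command lines with the -e, -v and --path options from the usage message; statistics_commands and run_all are hypothetical names, and dry_run=True only prints the commands:

```python
import os
import re
import subprocess

# Sub-directory names of the form Gauss_<EventType>_<SimCond>
SUBDIR = re.compile(r"Gauss_(\d{8})_(.+)")


def statistics_commands(main_dir, script="GaussStat.py", version=None):
    """Build one command line per Gauss_<EventType>_<SimCond> sub-directory."""
    commands = []
    for name in sorted(os.listdir(main_dir)):
        path = os.path.join(main_dir, name)
        m = SUBDIR.fullmatch(name)
        if not (m and os.path.isdir(path)):
            continue  # skip plain files and unrelated directories
        cmd = ["python", script, "-e", m.group(1)]
        if version:
            cmd += ["-v", version]
        cmd += ["--path", path]
        commands.append(cmd)
    return commands


def run_all(main_dir, dry_run=True, **kwargs):
    """Print every command and, unless dry_run is set, execute it."""
    for cmd in statistics_commands(main_dir, **kwargs):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```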

    The output HTML page is still unique (e.g. Generation_MC09-b5TeV-md100.html), but its header contains a list of links to the statistics table of each EventType.
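    The header with one link per EventType could be built along these lines, assuming (hypothetically) that each EventType table carries an HTML anchor named after the EventType; header_links is an illustrative helper, not part of the scripts above:

```python
from html import escape


def header_links(sim_condition, event_types, mode="Generation"):
    """Return an HTML header listing one anchor link per distinct EventType."""
    items = "\n".join(
        f'  <li><a href="#{escape(evt)}">{escape(evt)}</a></li>'
        for evt in sorted(set(event_types)))
    return (f"<h1>{escape(mode)} statistics: {escape(sim_condition)}</h1>\n"
            f"<ul>\n{items}\n</ul>")


print(header_links("MC09-b5TeV-md100", ["13442001", "11102201"]))
```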