The GAUSS Project

Automatic generation of Generator (and Simulation) Statistics Tables for 2010 production

Introduction

The purpose of this project is to produce statistics over several production jobs belonging to a specific simulation condition. It is used to calculate the generator (and simulation) statistics associated with a given EventType and simulation condition (see the examples in the next section). It can be run over a single log file or over a directory containing a set of log files.
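To make "statistics over several production jobs" concrete, here is a minimal sketch, assuming each job's log yields two hypothetical counters (events generated, and events accepted by a generator-level cut). It sums the counters over all jobs before dividing and attaches the binomial uncertainty sqrt(eps(1 - eps)/N). The JobCounters class and its fields are illustrative assumptions, not the actual quantities or code of GaussStat.py.

```python
import math
from dataclasses import dataclass


@dataclass
class JobCounters:
    """Hypothetical per-job counters parsed from one Gauss log."""
    generated: int  # events generated before the generator-level cut
    accepted: int   # events passing the generator-level cut


def combined_efficiency(jobs):
    """Sum the counters over all jobs and return (efficiency, binomial error)."""
    n_gen = sum(job.generated for job in jobs)
    n_acc = sum(job.accepted for job in jobs)
    if n_gen == 0:
        raise ValueError("no generated events in the selected jobs")
    eff = n_acc / n_gen
    err = math.sqrt(eff * (1.0 - eff) / n_gen)
    return eff, err


jobs = [JobCounters(generated=1000, accepted=200), JobCounters(generated=3000, accepted=600)]
eff, err = combined_efficiency(jobs)
print(f"efficiency = {eff:.4f} +- {err:.4f}")  # efficiency = 0.2000 +- 0.0063
```

Summing the raw counts first weights each job by its number of events, which a plain average of per-job efficiencies would not do.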

How to use it

The GaussStat.py script (also available from CVS in Sim/Gauss/, in the "scripts" directory) prints a usage description with the --help option:

$ python GaussStat.py --help
Usage: python GaussStat.py -e EVENTTYPE -f filename.log.gz
Usage: python GaussStat.py -e EVENTTYPE -v GaussVersion -f filename.log.gz
Usage: python GaussStat.py -e EVENTTYPE -v GaussVersion --path directory_with_logs
[--simulation/--generation]
[-h/--help]
[-d/--debug]
[-i/--install] *only for MC09*
[-a/--addToIndex] *only for MC09*

The EventType parameter is mandatory. The script can process a single log file (-f/--file option) or a set of log files stored in a directory (--path option).
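A minimal sketch of the two input modes, assuming the logs are named *.log or *.log.gz (as in the file names used on this page). The helper names collect_logs and read_lines are hypothetical and do not come from GaussStat.py; gzip from the standard library lets compressed logs be read without unpacking them first.

```python
import glob
import gzip
import os


def collect_logs(file=None, path=None):
    """Return the logs to process: one file (like -f) or every log in a directory (like --path)."""
    if (file is None) == (path is None):
        raise ValueError("specify exactly one of file or path")
    if file is not None:
        return [file]
    # Assumption: logs are named *.log or *.log.gz.
    found = []
    for pattern in ("*.log", "*.log.gz"):
        found.extend(glob.glob(os.path.join(path, pattern)))
    return sorted(found)


def read_lines(filename):
    """Yield the lines of a plain or gzip-compressed log file as text."""
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "rt", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```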
The output is an HTML page containing statistics tables for the generators only (--generator option), for the simulation only (--simulation option) or for both (--both option, the default). Each output HTML file is named after the mode (Generation or Simulation) followed by the name of the simulation condition (example: Generation_MC09-b5TeV-md100.html).
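The naming rule above can be sketched as follows; output_html_name and output_pages are hypothetical helpers written for illustration, not functions of GaussStat.py.

```python
def output_html_name(mode, sim_condition):
    """Build the page name, e.g. Generation_MC09-b5TeV-md100.html."""
    if mode not in ("Generation", "Simulation"):
        raise ValueError(f"unknown mode: {mode!r}")
    return f"{mode}_{sim_condition}.html"


def output_pages(sim_condition, generation=True, simulation=True):
    """Return the page names produced for the selected modes (both by default)."""
    modes = []
    if generation:
        modes.append("Generation")
    if simulation:
        modes.append("Simulation")
    return [output_html_name(mode, sim_condition) for mode in modes]


print(output_pages("MC09-b5TeV-md100"))
# ['Generation_MC09-b5TeV-md100.html', 'Simulation_MC09-b5TeV-md100.html']
```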
By default the output is produced in the local directory where the script is run. Once you have produced the generator statistics HTML page, send it to Gloria.Corti@cern.ch or Silvia.Miglioranzi@cern.ch so that it can be linked from the official summary table page. The script also has options to move the page directly to the official public area ($LHCBDOC/STATISTICS/MC10STAT/) (-i/--install) and to link it from the main summary page (-a/--addToIndex) if the simulation condition processed is not yet present there. However, this procedure was valid only for MC09: now please send us the local page instead.

How to obtain the production log files

The idea is to run the GaussStat.py script over a set of production log files classified by EventType and simulation condition. The following utility scripts help you do this:

Downloading log files from production

  • Through the DIRAC script (NOTE: before executing the script you need to run "SetupProject LHCbDirac" and "lhcb-proxy-init"):

    The DIRAC script dirac-production-download-logs-withoutcheck.py downloads the log files (100 by default) corresponding to a given PRODID:
    $ ./dirac-production-download-logs-withoutcheck.py PRODID [Njobs]
    Example:
    $ python dirac-production-download-logs-withoutcheck.py 8559
    This script searches the log web server and downloads the Gauss log files corresponding to the specified PRODID. Only recent productions are kept there, so please check that the PRODID you need appears in that web list. For older productions the logs have been moved to CASTOR; you can retrieve them by following the instructions in the next bullet.

  • Copying them from CASTOR:

    The output of most workflow jobs has been stored on CASTOR (e.g. /castor/cern.ch/grid/lhcb/backup/log/00005117_0000.tgz contains the complete output, i.e. all logs, HTML files, etc., of PRODID 5117). You can use the Python script Untar_Castor.py to retrieve the Gauss logs you need:

    $ python Untar_Castor.py EventType PRODID [Njobs]
    Example:
    $ python Untar_Castor.py 12135010 5117

    The script downloads all the corresponding log files into directories named Gauss_EventType_PRODID. By default Njobs (the number of workflow job outputs the script searches) is 100. A job output subdirectory may contain no Gauss log if the job did not end successfully; in this case the script prints a message saying that it cannot find the log, then proceeds to inspect the following subdirectories and downloads the Gauss logs wherever they are available. N.B.: the script needs to copy the PRODID tarball from CASTOR and untar it, so it can take up to a few minutes to run.
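    The CASTOR route can be sketched with the standard tarfile module. This is a minimal sketch under several assumptions: the tarball name pattern is inferred from the single example 00005117_0000.tgz, the tarball has already been copied to local disk (the copy from CASTOR is not shown), each job has its own subdirectory inside the tarball, and a Gauss log is any file whose name starts with "Gauss" and ends with ".log". All function names are hypothetical; this is not the code of Untar_Castor.py.

```python
import os
import tarfile

CASTOR_LOG_DIR = "/castor/cern.ch/grid/lhcb/backup/log"


def castor_tarball(prodid):
    """CASTOR path of the tarball for a PRODID (pattern inferred from 00005117_0000.tgz)."""
    return f"{CASTOR_LOG_DIR}/{prodid:08d}_0000.tgz"


def output_dir(event_type, prodid):
    """Gauss_EventType_PRODID, with the PRODID zero-padded as in Gauss_12143001_00004990."""
    return f"Gauss_{event_type}_{prodid:08d}"


def is_gauss_log(member):
    """Assumption: a Gauss log is a regular file named Gauss*.log."""
    name = os.path.basename(member.name)
    return member.isfile() and name.startswith("Gauss") and name.endswith(".log")


def extract_gauss_logs(tarball, dest, njobs=100):
    """Copy the Gauss logs of at most njobs job subdirectories of a local tarball into dest."""
    os.makedirs(dest, exist_ok=True)
    extracted = []
    with tarfile.open(tarball, "r:*") as tar:
        # Group the regular files by the subdirectory (job) that contains them.
        jobs = {}
        for member in tar.getmembers():
            if member.isfile():
                jobs.setdefault(os.path.dirname(member.name), []).append(member)
        for job_dir in sorted(jobs)[:njobs]:
            logs = [m for m in jobs[job_dir] if is_gauss_log(m)]
            if not logs:
                # Probably a failed job: report it and go on with the next subdirectory.
                print(f"cannot find a Gauss log in {job_dir or '.'}")
                continue
            for member in logs:
                # Write under the base name only, so nothing lands outside dest.
                target = os.path.join(dest, os.path.basename(member.name))
                with tar.extractfile(member) as src, open(target, "wb") as out:
                    out.write(src.read())
                extracted.append(target)
    return extracted
```

    For example, extract_gauss_logs("00005117_0000.tgz", output_dir(12135010, 5117)) would fill Gauss_12135010_00005117/ with the Gauss logs found in a local copy of the tarball.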


    Utilities

    Re-ordering the log files according to the Simulation Conditions

    In some cases several PRODIDs in the same production are associated with one EventType; it is then useful to run the statistics script per simulation condition, grouping together the logs that correspond to the same EventType and condition. Suppose you have several directories containing logs (classified by EventType and PRODID):
    Gauss_12143001_00004990/
    Gauss_12143001_00004886/
    Gauss_33102100_.../
    ...
    The script order_simcond.py reclassifies the log files into a set of directories named after the simulation condition (APPCONFIG/conditions). In this way each directory can contain logs from more than one PRODID. With the previous example, this gives:


    Gauss_12143001_MC09-b5TeV-md100/
        Gauss_00004990_00000041_1.log
        Gauss_00004886_00000001_1.log
        ...
    Usage: $ python order_simcond.py dir_containing_all_the_log_dirs
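    The regrouping step can be sketched as below. It assumes, hypothetically, that each Gauss log mentions the AppConfig conditions options file as a path ending in Conditions/<name>.py, and it takes the EventType from the Gauss_<EventType>_<PRODID> directory name. The sketch only computes the new grouping (with a made-up "UnknownCondition" fallback); moving the files, for example with shutil.move, is left out. The function names are illustrative and not those of order_simcond.py.

```python
import os
import re
from collections import defaultdict

# Hypothetical log pattern: ".../Conditions/MC09-b5TeV-md100.py" names the condition.
CONDITION_RE = re.compile(r"Conditions/([\w.-]+?)\.py")


def event_type_from_dir(log_dir):
    """Extract the EventType from a Gauss_<EventType>_<PRODID> directory name."""
    name = os.path.basename(os.path.normpath(log_dir))
    return name.split("_")[1]


def sim_condition(lines):
    """Return the first simulation condition name found in the log lines, or None."""
    for line in lines:
        match = CONDITION_RE.search(line)
        if match:
            return match.group(1)
    return None


def group_by_condition(logs):
    """Map Gauss_<EventType>_<SimCondition> to the log files that belong in it.

    logs: iterable of (log_dir, log_name, lines) tuples, one per log file.
    """
    groups = defaultdict(list)
    for log_dir, log_name, lines in logs:
        condition = sim_condition(lines) or "UnknownCondition"
        groups[f"Gauss_{event_type_from_dir(log_dir)}_{condition}"].append(log_name)
    return dict(groups)
```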

    Run the statistics script over a set of directories at once

    The script wrap_statistics.py lets you run the statistics.py script over a list of directories at once, by specifying the main directory that contains them:

    $ python wrap_statistics.py Gauss_50_logs/

    where the main directory Gauss_50_logs/ contains a set of directories such as:

    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_13442001_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_13144400_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_13152400_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_18112001_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_11102201_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_12145004_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_42100000_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 8192 Sep 11 11:18 Gauss_13102201_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_15144103_MC09-b5TeV-md100
    drwxr-xr-x 2 silviam z5 4096 Sep 11 11:18 Gauss_20000000_MC09-b5TeV-md100
    ...

    There is still a single output HTML page (e.g. Generation_MC09-b5TeV-md100.html), but its header contains the list of links to the statistics table of each EventType. See an example here.
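    A minimal sketch of what wrap_statistics.py does, assuming subdirectories named Gauss_<EventType>_<SimCondition> as in the listing above: it builds one GaussStat.py command per subdirectory, using the -e, -v and --path options from the usage text. The header_links helper, with one in-page anchor per EventType, is a hypothetical stand-in for the header of the combined page. Neither function comes from wrap_statistics.py.

```python
import html
import os
import re

# Subdirectories are named Gauss_<EventType>_<SimCondition>.
DIR_RE = re.compile(r"^Gauss_(\d+)_(.+)$")


def gaussstat_commands(main_dir, gauss_version=None):
    """Build one GaussStat.py command line per Gauss_<EventType>_<SimCondition> subdirectory."""
    commands = []
    for name in sorted(os.listdir(main_dir)):
        path = os.path.join(main_dir, name)
        match = DIR_RE.match(name)
        if not (match and os.path.isdir(path)):
            continue  # skip files and directories that do not follow the naming scheme
        cmd = ["python", "GaussStat.py", "-e", match.group(1)]
        if gauss_version:
            cmd += ["-v", gauss_version]
        cmd += ["--path", path]
        commands.append(cmd)
    return commands


def header_links(event_types):
    """Hypothetical page header: one link per EventType to an in-page anchor."""
    items = "".join(
        f'<li><a href="#{html.escape(et)}">{html.escape(et)}</a></li>' for et in event_types
    )
    return f"<ul>{items}</ul>"
```

    To actually run the commands, each list can be passed to subprocess.run(cmd, check=True) from the directory containing GaussStat.py; building the argument lists first makes it easy to inspect them before launching anything.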