The GAUSS Project

Automatic generation of Statistics and Simulation Tables

Introduction

The purpose of this project is to generate statistics over a set of jobs. These jobs belong to a production, which is generated for a given workflow version together with other configuration parameters ( Gauss version, Dec Files version, ParamsFiles version, ... ).

Normally, the script calculates the statistics associated with an Event Type and a version of Gauss, but it is also possible to generate them for a given production and luminosity. These values are passed as arguments to the script.

For each workflow ( DC04, Higgs, RTTC, DC06 ) there are different versions ( v1, v2, v2r3, v1-lumi2, etc. ), each of which has several associated productions referenced by a production identifier ( prodID ). Each production contains jobs referenced by a job identifier ( jobID ).

How to use it

The script has to be used as follows:

$ python script.py -e EVENTTYPE -g GaussVersion
[--simulation/--generation]
[-p PRODUCTION_IDENTIFIER][-l Luminosity]
[--publish][-h/--help]

The parameters EventType and GaussVersion are mandatory. Many productions may be associated with one Event Type, possibly generated with different Gauss versions. Given the Event Type and the Gauss version, the script finds the productions created with this configuration and computes reliable statistics from them.

If we want the statistics for a specific production, it can be indicated with the -p parameter. The script will then generate the statistics for that production, even if there are not enough jobs for a reliable result.

The workflow version or luminosity can also be specified, in which case the statistics are generated only for the productions associated with it. Otherwise, the statistics are created for all workflow versions that have productions for the given Event Type, with separate results files for each one.
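The options described above could be declared as in the following minimal sketch. It uses the standard argparse module, which is an assumption: the original script may parse its arguments differently. The destination names ( event_type, prod_id, ... ) are hypothetical, and treating --simulation and --generation as mutually exclusive is also an assumption based on the usage text.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the usage text."""
    parser = argparse.ArgumentParser(prog="script.py")
    # -e and -g are the two mandatory parameters.
    parser.add_argument("-e", dest="event_type", required=True,
                        help="Event Type, e.g. 10000000")
    parser.add_argument("-g", dest="gauss_version", required=True,
                        help="Gauss version, e.g. v25r0")
    # Assumption: the two modes are alternatives, so at most one is given.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--simulation", action="store_true")
    mode.add_argument("--generation", action="store_true")
    # Optional restrictions on the productions that are used.
    parser.add_argument("-p", dest="prod_id", default=None,
                        help="production identifier")
    parser.add_argument("-l", dest="luminosity", default=None,
                        help="luminosity")
    parser.add_argument("--publish", action="store_true",
                        help="publish the results on the Gauss web")
    # -h/--help is added automatically by argparse.
    return parser


# Parse an example command line instead of the real sys.argv.
args = build_parser().parse_args(["-e", "10000000", "-g", "v25r0", "-l", "2"])
print(args.event_type, args.gauss_version, args.luminosity, args.publish)
```

With this layout, omitting -e or -g makes the parser stop with a usage message, which matches the statement that both parameters are mandatory.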

The results are stored in separate files, one for the generation and one for the simulation. In these files the results are kept per Event Type.

There are results files for each Gauss version and for each luminosity. This means that the results for all the Event Types generated for luminosity 2 with Gauss version v25r3 are stored in the corresponding files Simulation_v1-lumi2_v25r3.html and Generation_v1-lumi2_v25r3.html.

The results are kept locally, but it is possible to publish them on the Gauss web with the --publish argument.

Other files used

Configuration File

Several settings are taken from a configuration file, config_file, stored in the same directory as the main script. The settings follow a few rules: each one has a keyword that allows the script to retrieve it. Below is the list of these keywords.

PATH

The configuration file stores the main web path where the log files are kept.

PATH = http://lhcb-logs.cern.ch/storage/lhcb/production/

Notice that the keyword is in capital letters, followed by the sign "=" and then by the complete value ( http://.../ ).
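Reading keyword lines of this form can be sketched as follows. This is only an illustration under stated assumptions: the function name parse_keywords is hypothetical, and the real script may read config_file in another way. Lines whose left-hand side is not a keyword in capital letters are skipped.

```python
def parse_keywords(lines):
    """Collect KEYWORD = value entries from configuration lines."""
    settings = {}
    for line in lines:
        # Split on the first "=" only, so the value may contain anything.
        key, sep, value = line.partition("=")
        key = key.strip()
        # Keep only lines whose left side is a keyword in capital letters.
        if sep and key.isalpha() and key.isupper():
            settings[key] = value.strip()
    return settings


example = [
    "PATH = http://lhcb-logs.cern.ch/storage/lhcb/production/",
    "",
]
settings = parse_keywords(example)
print(settings["PATH"])
```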

HTMLFILES

Another useful path in this file is the location where the statistics and the simulation tables are stored. These results are in HTML format, and the keyword giving the location is HTMLFILES. An index listing all the available results files can also be found there.

HTMLFILES = $LHCBRELEASES/DOC/

SIMULATION/GENERATION

Both keywords give the base name of the results files for the corresponding execution.

SIMULATION = Simulation_DC06
GENERATION = Generation_DC06

For a simulation, the results file will be Simulation_DC06_<workflow version>_<gauss version>.html, and it will be kept in the HTMLFILES path.
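The naming rule can be sketched as below. The function name results_filename is hypothetical, and the example values ( v1-lumi2, v25r3 ) are taken from elsewhere on this page purely for illustration. The HTMLFILES value is joined as written, without expanding $LHCBRELEASES.

```python
import os


def results_filename(prefix, workflow_version, gauss_version):
    """Join the configured prefix with the workflow and Gauss versions."""
    return "%s_%s_%s.html" % (prefix, workflow_version, gauss_version)


# The prefix comes from the SIMULATION keyword of the configuration file.
name = results_filename("Simulation_DC06", "v1-lumi2", "v25r3")
print(name)

# Location inside the HTMLFILES directory (environment variable left as is).
print(os.path.join("$LHCBRELEASES/DOC/", name))
```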

PRODUCTIONS

During execution, the script looks for the production information corresponding to the parameters given as arguments. This data is found in several files in another directory, which is given by the PRODUCTIONS keyword.

PRODUCTIONS = $LHCBRELEASES/DOC/productions

Number of jobs for each Event Type

To obtain reliable statistics we need to check a minimum number of log files. In other words, a minimum number of jobs is necessary to validate the execution and obtain good results. This number varies between Event Types, so it can also be specified in the configuration file.

As not all the Event Types can be written in the configuration file, there is also a default entry that covers the majority of Event Types. If no number of jobs is found for an Event Type, the default value is taken. For example:

default = 300
( '10000000' ) = 500
( '30000000', '12345678' ) = 400
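The lookup with a fallback to the default entry can be sketched as follows. The function names are hypothetical, and the parsing of the parenthesised Event Type lists with ast.literal_eval is an assumption about the file syntax shown in the example, not necessarily how the script reads it.

```python
import ast


def parse_job_thresholds(lines):
    """Return (default, {event_type: minimum number of jobs})."""
    default = None
    per_event_type = {}
    for line in lines:
        left, sep, right = line.partition("=")
        left, right = left.strip(), right.strip()
        if not sep or not right.isdigit():
            continue
        if left == "default":
            default = int(right)
        elif left.startswith("("):
            event_types = ast.literal_eval(left)
            # "( '10000000' )" evaluates to a plain string, not a tuple.
            if isinstance(event_types, str):
                event_types = (event_types,)
            for event_type in event_types:
                per_event_type[event_type] = int(right)
    return default, per_event_type


def min_jobs(event_type, default, per_event_type):
    """Use the specific entry if present, otherwise the default."""
    return per_event_type.get(event_type, default)


example = [
    "default = 300",
    "( '10000000' ) = 500",
    "( '30000000', '12345678' ) = 400",
]
default, per_event_type = parse_job_thresholds(example)
print(min_jobs("10000000", default, per_event_type))
print(min_jobs("11154100", default, per_event_type))
```

Here the Event Type 11154100 has no entry of its own, so the default value is used for it.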

Functions File

Some functions are common to the generation process and to the creation of the simulation tables; they are stored in a separate file, as a library. This makes the main file easier to read, understand and, if needed, modify. The main script only has to include:

from functions import *

Getting Productions

In the productions directory there should be several files, one for each Gauss version, containing information about the productions generated with the corresponding Gauss version. These files are generated by a script called getproductions.py, which queries the bookkeeping database for this information.

Sometimes the script cannot find information for the given arguments. This may be because no such information exists or because the production files are out of date. To solve this problem, the script uses getproductions.py to update these files. It first asks whether we want to do so, because the operation can take several minutes.

getproductions.py can also be executed by hand, to make the generation of statistics easier and faster. It takes one argument, the Gauss version, creates a file Productions_<gauss version>.txt, and stores it in the productions directory.

$ python getproductions.py -g GAUSSVERSION

Each production file has the following format:

#EventType Description Workflow ProdID GaussVersion
10000000 incl_b v1-lumi5 00001325 v25r0
11154100 Bd_JpsiKS,ee v1-lumi2 00001329 v25r0
11144100 Bd_JpsiKS,mm v1-lumi2 00001330 v25r0
11102200 Bd_Kstgamma v1-lumi2 00001331 v25r0
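Reading a production file of this format and selecting the productions for one Event Type can be sketched as below. The names Production, parse_productions and select are hypothetical. The sketch assumes that the five columns are separated by whitespace and that descriptions contain no spaces, as in the sample above.

```python
from collections import namedtuple

Production = namedtuple(
    "Production", "event_type description workflow prod_id gauss_version")


def parse_productions(lines):
    """Turn the lines of a production file into Production records."""
    productions = []
    for line in lines:
        line = line.strip()
        # Skip empty lines and the "#EventType ..." header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore lines that do not have the five columns
        productions.append(Production(*fields))
    return productions


def select(productions, event_type, workflow=None):
    """Keep the productions for an Event Type, optionally one workflow."""
    return [p for p in productions
            if p.event_type == event_type
            and (workflow is None or p.workflow == workflow)]


sample = [
    "#EventType Description Workflow ProdID GaussVersion",
    "10000000 incl_b v1-lumi5 00001325 v25r0",
    "11154100 Bd_JpsiKS,ee v1-lumi2 00001329 v25r0",
    "11144100 Bd_JpsiKS,mm v1-lumi2 00001330 v25r0",
    "11102200 Bd_Kstgamma v1-lumi2 00001331 v25r0",
]
productions = parse_productions(sample)
matches = select(productions, "11154100")
print([p.prod_id for p in matches])
```

All fields are kept as strings, so production identifiers such as 00001329 keep their leading zeros.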

This page last edited by M Barbera on September 21, 2006