Starting point
- login on lcgui003
- tcsh
- cd /afs/cern.ch/sw/arda/install/ITU
- source env.csh
- grid-proxy-init should not be needed if you created a long proxy for a few weeks
Operations
CELIST and WNLIST are created and maintained by CERN (in the install/ITU dir).
Software installation (Patricia)
Distribute the software to the sites in the CE list (ITU-ALL-0 is the name of the ITU sw tgz file)
./submitter_itu_v2.pl -tag ITU-ALL-0
How to query?
./query_itu.pl -tag ITU-ALL-0
DIANE recommended operation for the 4th week of ITU production
We are using two masters on lcgui003 and lxarda01. Each master may connect a maximum of 350 workers.
Split the full production into two equal parts, one job file per master. Each job file should run the different executables in order,
from the slowest to the fastest. It is best to split each executable into two parts as well, so that each job file contains the same number of executables
but each executable covers only half of the total number of requirements.
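The split described above can be sketched in Python. This is only an illustration under stated assumptions: the executable names, per-requirement costs, and requirement counts are hypothetical inputs, and the sketch does not read or write real DIANE .job files, whose format is not shown on this page.

```python
def split_production(executables):
    """Split a production into two halves, one per master.

    `executables` is a list of (name, est_cost, n_requirements) tuples.
    All three fields are hypothetical inputs supplied by the operator.
    Returns two lists of (name, n_requirements), ordered slowest first.
    """
    # Order from the slowest executable to the fastest.
    ordered = sorted(executables, key=lambda e: e[1], reverse=True)
    first, second = [], []
    for name, _cost, n_req in ordered:
        half = n_req // 2
        first.append((name, n_req - half))  # first half takes the odd one
        second.append((name, half))
    return first, second


if __name__ == "__main__":
    # Hypothetical production: names and numbers are placeholders.
    production = [("exe_a", 1.0, 10), ("exe_b", 5.0, 7)]
    first_half, second_half = split_production(production)
    print(first_half)
    print(second_half)
```

Both halves list the same executables in the same order, so the two masters should progress through comparable work.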
The MonaLisa monitoring identifies all masters which belong to the same production using the file run_seqno.
Before beginning the production you should run
update_runseqno.csh
This will create a new run number. If you forget to do this, your
production will be accounted to the old (previous) production.
Minimal granularity supported by the master is 2 for d2d and 50 for o2d and d2o. Below these numbers efficiency problems start.
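A quick sanity check of a planned granularity against these minimums could look like the sketch below. The function name is hypothetical, and how the granularity is actually set in a .job file is not shown on this page.

```python
# Minimum granularity per job type, as quoted in this guide.
MIN_GRANULARITY = {"d2d": 2, "o2d": 50, "d2o": 50}


def granularity_ok(kind, granularity):
    """Return True if `granularity` is not below the minimum for `kind`."""
    return granularity >= MIN_GRANULARITY[kind]


if __name__ == "__main__":
    print(granularity_ok("d2d", 2))
    print(granularity_ok("o2d", 49))
```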
Open 4 windows.
Window 1 : lcgui003
start diane master
diane.startjob2 -j ITU-1st.job --inactive >& /dev/null &
Monitor what happens with the master:
diane.master.command ping
Create HTML report:
diane.report diane.workspace/jobs/XXX
Full monitoring is available only when the master is activated (i.e. the production is started)!
Window 2 : lxarda01
Start another diane master:
diane.startjob2 -j ITU-2nd.job --inactive >& /dev/null &
Now there are two masters running in the inactive mode. You will have to use their identifiers in order to distinguish them.
You can get the identifiers from the log files (master*.log).
Monitoring information about specific masters.
diane.master.command --master-file diane.workspace/jobs/XXX/MasterOID ping
diane.plotprofile diane.workspace/jobs/XXX - cumulative
diane.plotprofile diane.workspace/jobs/XXX - power
Window 3 : lxplus
You should do one step after the other - wait until the previous step completes!
submit the CERN workers to the master number XXX
./submit_to_lsf ITU.job 150 XXX
Window 4 : lcgui003
then submit to other GRID sites - to the master number XXX
./submit_to_grid ITU.job 150 XXX
finally you can submit to the DESY site
./submit_to_desy ITU.job 20 XXX
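The worker counts used in the three submission steps can be checked against the limit of 350 workers per master quoted in this guide. A minimal sketch, with a hypothetical helper name:

```python
MAX_WORKERS_PER_MASTER = 350  # limit quoted in this guide


def total_workers(submissions):
    """Sum the planned worker counts and reject plans above the limit."""
    total = sum(submissions.values())
    if total > MAX_WORKERS_PER_MASTER:
        raise ValueError(
            "%d workers planned, limit is %d" % (total, MAX_WORKERS_PER_MASTER)
        )
    return total


if __name__ == "__main__":
    # Counts taken from the three submit commands for one master.
    planned = {"lsf": 150, "grid": 150, "desy": 20}
    print(total_workers(planned))  # 320
```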
Activation of the masters
When you are ready with the tarball and you want to activate the masters, you should issue one command for each master:
diane.startclient --job ITU-1st.job --jobid XXX &
The job will start.
DIANE recommended operation for the 3rd week of ITU production
Open 3 windows.
Window 1 : lcgui003
start diane master
diane.startjob2 -j ITU.job >& /dev/null &
Monitor what happens with the master:
diane.master.command ping
Create HTML report:
diane.report diane.workspace/jobs/XXX
Window 2 : lxplus
You should do one step after the other - wait until the previous step completes!
submit the CERN workers
./submit_to_lsf ITU.job 150
Window 3 : lcgui003
then submit to other GRID sites
./submit_to_grid ITU.job 150
finally you can submit to the DESY site
./submit_to_desy ITU.job 20
Using DIANE - general information
There are two alternatives:
- when the ITU tarball is available immediately, start DIANE in active mode (preferred solution now)
- OR start DIANE in the inactive mode a few hours before the software tarball is available (may have some problems)
In both cases Ganga is used to submit the worker agents. You will get the stderr and stdout from Ganga and also the worker status updates.
The submission to Ganga is finished if in the logfile of the master you can see a string:
submission of worker agents through GANGA finished!
**************************************************
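Checking the master log for that string can be automated. The sketch below assumes the master logs match master*.log in the current directory (the file name pattern is mentioned on this page, the location is an assumption); the helper name is hypothetical.

```python
import glob

FINISHED_MARKER = "submission of worker agents through GANGA finished!"


def submission_finished(log_paths):
    """Return True if any of the given log files contains the marker."""
    for path in log_paths:
        with open(path, errors="replace") as log:
            for line in log:
                if FINISHED_MARKER in line:
                    return True
    return False


if __name__ == "__main__":
    # Assumption: run from the directory that holds the master logs.
    print(submission_finished(glob.glob("master*.log")))
```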
You can also start Ganga with monitoring disabled, which means that it is safe to run it while not all of the worker jobs are fully submitted.
This option can only be used to look into the current status of the jobs, NOT for submission.
ganga -o'[PollThread]autostart=0'
It is better to wait until the submission finishes before starting another ganga session.
Starting DIANE in active mode
diane.startjob2 -j ITU.job -w300@LCG --wms=$PWD/WNLIST.txt --ganga
Starting DIANE in inactive mode
Start the master and submit workers. They are not activated yet.
diane.startjob -j ITU.job -w300@LCG --wms=$PWD/WNLIST.txt --ganga --inactive
Activate the job: workers will start initializing i.e. waiting for the tarball in the sw area of the site (OK file) and once
it arrives they start the computation.
ITU.job file defines the tag, the number of requirements, executables, ...
diane.startclient --job=ITU.job --jobid=AUTO
Submitting more workers later.
You may submit more workers later if you need more CPUs. Make sure that the initial submission has been finished and also that you do not have
other ganga sessions running at the same time.
Submitting more workers in the gear VO:
diane.ganga.submitworkers --job=ITU-patricia.job --nw=1 --bk=lcg
Submitting to DESY is done via another script because the VO is Geant4. The script temporarily changes the ~/.gangarc file, so be careful NOT to use it
at the same time as the script above. Also, if you kill the DESY script, make sure that your ~/.gangarc is copied back from the backup (~/.gangarc-BACKUP).
Also make sure that the submit_to_desy script uses the correct .job and WNLIST files.
./submit_to_desy 104 3 # 104 - master id, 3 - number of new workers
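If the DESY script was killed, the copy back from the backup can be done by hand or with a small helper like the sketch below. The function is hypothetical and not part of the DIANE or Ganga tools. Whether the script removes ~/.gangarc-BACKUP after a clean run is not stated here, so the presence of the backup alone does not prove that a restore is needed; check before running it.

```python
import os
import shutil


def restore_gangarc(home=None):
    """Copy .gangarc-BACKUP back over .gangarc if the backup exists.

    Returns True if a restore happened, False if no backup was found.
    """
    home = home or os.path.expanduser("~")
    backup = os.path.join(home, ".gangarc-BACKUP")
    target = os.path.join(home, ".gangarc")
    if not os.path.exists(backup):
        return False
    shutil.copy2(backup, target)  # overwrite the temporary config
    return True
```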
Submitting workers to LSF at CERN (on lxplus):
diane.startjob ... -w20@lsf --wms '-q itu'
ganga --config kuba_test/gangarc-lsf `which diane.ganga.submitworkers` --job ITU-manara2.job --nw=10 --bk=lsf --wopts 'itu'
<!-- diane.ganga.submitworkers --job=ITU-patricia.job --nw=20 --bk=lsf --wopts '-q itu' -->
Killing the system.
Kill master:
diane.master.command --master-file ~/diane.workspace/jobs/105/MasterOID kill
Kill workers from Ganga:
for j in jobs['DIANE_104']:
    j.kill()
First-time Setup
# Login on lcgui003
tcsh
# ITU working area
cd /afs/cern.ch/sw/arda/install/ITU
# Get the environment right
source env.csh
# This creates a config file: ~/.gangarc
#--> ganga -g <--
# Then you should specify your Virtual Organisation in the [LCG] section in the ~/.gangarc
# [VirtualOrganisation] = gear
# If you ran ganga before you may want to delete old jobs:
#--> rm -rf ~/gangadir <--
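The Virtual Organisation setting can also be edited programmatically. The sketch below is an assumption-laden illustration: it treats ~/.gangarc as a standard INI file and assumes the option is named VirtualOrganisation inside the [LCG] section. Rewriting the file with configparser drops any comments it contains, so try it on a copy first.

```python
import configparser


def set_virtual_organisation(path, vo="gear"):
    """Set VirtualOrganisation in the [LCG] section of an INI-style file."""
    config = configparser.RawConfigParser()
    config.optionxform = str  # keep option names case-sensitive
    config.read(path)
    if not config.has_section("LCG"):
        config.add_section("LCG")
    config.set("LCG", "VirtualOrganisation", vo)
    with open(path, "w") as handle:
        config.write(handle)
```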