Starting point

  • login on lcgui003
  • tcsh
  • cd /afs/cern.ch/sw/arda/install/ITU
  • source env.csh
  • grid-proxy-init (should not be needed if you have already created a long-lived proxy valid for a few weeks)

Operations

CELIST and WNLIST are created and maintained by CERN (in the install/ITU directory).

Software installation (Patricia)

Distribute the software to the sites in the CE list (ITU-ALL-0 is the name of the ITU software tgz file):
./submitter_itu_v2.pl -tag ITU-ALL-0

How to query?

./query_itu.pl -tag ITU-ALL-0

DIANE recommended operation for the 4th week of ITU production

We use two masters, one on lcgui003 and one on lxarda01. Each master can serve a maximum of 350 connected workers.

You should split the full production into two equal parts, one job file per master. Each job file should run the different executables in order from the slowest to the fastest. It is best to split each executable in two parts as well, so that each job file contains the same number of executables but each executable covers only half of the total number of requirements.
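
The split described above can be sketched in Python. This is only an illustration under assumptions: the executable names, the relative cost estimates and the representation of requirements as plain lists are hypothetical, and the real ITU .job file format is not shown here.

```python
def split_in_two(requirements):
    """Return (first_half, second_half); the first half gets the extra item."""
    middle = (len(requirements) + 1) // 2
    return requirements[:middle], requirements[middle:]


def build_job_parts(executables):
    """Split a production into two parts, one per master.

    executables: list of (name, estimated_cost, requirements) tuples.
    Returns two lists of (name, requirements), both ordered slowest first.
    """
    ordered = sorted(executables, key=lambda exe: exe[1], reverse=True)
    first_part, second_part = [], []
    for name, _cost, requirements in ordered:
        first_half, second_half = split_in_two(requirements)
        first_part.append((name, first_half))
        second_part.append((name, second_half))
    return first_part, second_part


# Hypothetical production: names, costs and requirement counts are made up.
executables = [
    ("exe_fast", 1.0, list(range(100))),
    ("exe_slow", 5.0, list(range(8))),
    ("exe_mid", 2.0, list(range(51))),
]
first_part, second_part = build_job_parts(executables)
```

Both parts list the same executables in the same order, so the two masters should finish at roughly the same time if the cost estimates are reasonable.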

The MonaLisa monitoring identifies all masters which belong to the same production using the file run_seqno.

Before beginning the production you should run update_runseqno.csh. This will create a new run number. If you forget to do this, your production will be accounted to the previous production.

The minimal granularity supported by the master is 2 for d2d and 50 for o2d and d2o. Below these numbers efficiency problems start.
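
A small pre-flight check of these limits could look like the following sketch. The function name and the idea of checking before starting the master are assumptions. Only the three limits are taken from the text above.

```python
# Minimal granularity per executable, as quoted above.
MIN_GRANULARITY = {"d2d": 2, "o2d": 50, "d2o": 50}


def granularity_ok(executable, granularity):
    """Return True if the granularity is at or above the quoted minimum."""
    return granularity >= MIN_GRANULARITY[executable]
```

For example, granularity_ok("o2d", 49) is False, so such a job file would be expected to run into the efficiency problems mentioned above.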

Open 4 windows.

Window 1 : lcgui003

Start the DIANE master:

diane.startjob2 -j ITU-1st.job --inactive >& /dev/null &

Monitor what happens with the master:

diane.master.command ping

Create HTML report:

diane.report diane.workspace/jobs/XXX

Full monitoring is available only when the master is activated (i.e. the production is started)!

Window 2 : lxarda01

Start another diane master:

diane.startjob2 -j ITU-2nd.job --inactive >& /dev/null &

Now there are two masters running in the inactive mode. You will have to use their identifiers in order to distinguish them. You can get the identifiers from the log files (master*.log).

Monitoring information about specific masters.

diane.master.command --master-file diane.workspace/jobs/XXX/MasterOID ping
diane.plotprofile diane.workspace/jobs/XXX - cumulative
diane.plotprofile diane.workspace/jobs/XXX - power

Window 3 : lxplus

You should do one step after the other - wait until the previous step completes!

submit the CERN workers to the master number XXX

./submit_to_lsf ITU.job 150 XXX

Window 4 : lcgui003

then submit to other GRID sites - to the master number XXX

./submit_to_grid ITU.job 150 XXX

Finally, you can submit to the DESY site:

./submit_to_desy ITU.job 20 XXX
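
As a sanity check on the worker counts, the sketch below adds up the three submissions above and compares the total with the limit of 350 workers per master. It assumes that all three submissions go to the same master (the same XXX), which this page does not state explicitly. The dictionary keys are illustrative labels only.

```python
MAX_WORKERS_PER_MASTER = 350  # limit quoted earlier on this page


def total_workers(submissions):
    """Sum the number of workers requested across all submissions."""
    return sum(submissions.values())


def fits_on_master(submissions, limit=MAX_WORKERS_PER_MASTER):
    """Return True if the planned workers stay within the master's limit."""
    return total_workers(submissions) <= limit


# Worker counts taken from the three submit commands above.
planned = {"lsf": 150, "grid": 150, "desy": 20}
```

With these numbers the total is 320 workers, which leaves a margin of 30 below the limit.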

Activation of the masters

When you are ready with the tarball and you want to activate the masters, you should issue one command for each master:

diane.startclient --job ITU-1st.job --jobid XXX &

The job will start.

DIANE recommended operation for the 3rd week of ITU production

Open 3 windows.

Window 1 : lcgui003

Start the DIANE master:

diane.startjob2 -j ITU.job >& /dev/null &

Monitor what happens with the master:

diane.master.command ping

Create HTML report:

diane.report diane.workspace/jobs/XXX

Window 2 : lxplus

You should do one step after the other - wait until the previous step completes!

submit the CERN workers

./submit_to_lsf ITU.job 150

Window 3 : lcgui003

then submit to other GRID sites

./submit_to_grid ITU.job 150

Finally, you can submit to the DESY site:

./submit_to_desy ITU.job 20

Using DIANE - general information

There are two alternatives:

  • when the ITU tarball is available immediately, start DIANE in active mode (preferred solution now)
  • OR start DIANE in the inactive mode a few hours before the software tarball is available (may have some problems)

In both cases Ganga is used to submit the worker agents. You will get the stderr and stdout from Ganga and also the worker status updates. The submission through Ganga is finished when the following string appears in the log file of the master:

submission of worker agents through GANGA finished!
**************************************************
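
Instead of reading the log by eye, the marker string can be searched for with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the log file path must be supplied by the caller (the page only says that the master logs match master*.log).

```python
FINISHED_MARKER = "submission of worker agents through GANGA finished!"


def submission_finished(lines):
    """Return True if any of the given lines contains the marker string."""
    return any(FINISHED_MARKER in line for line in lines)


def log_file_finished(log_path):
    """Check a master log file on disk for the marker string."""
    with open(log_path, errors="replace") as log:
        return submission_finished(log)
```

A call such as log_file_finished("master.log") (file name assumed) returns True once the submission has completed, which is the point at which it is safe to start another Ganga session.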

You can also start Ganga with monitoring disabled, which means that it is safe to run it while not all of the worker jobs are fully submitted. This option can only be used to look into the current status of the jobs, NOT for submission.

ganga -o'[PollThread]autostart=0'

It is better to wait until the submission finishes before starting another ganga session.

Starting DIANE in active mode

diane.startjob2 -j ITU.job -w300@LCG --wms=$PWD/WNLIST.txt --ganga

Starting DIANE in inactive mode

Start the master and submit workers. They are not activated yet.

diane.startjob -j ITU.job -w300@LCG --wms=$PWD/WNLIST.txt --ganga --inactive

Activate the job: workers will start initializing i.e. waiting for the tarball in the sw area of the site (OK file) and once it arrives they start the computation.

ITU.job file defines the tag, the number of requirements, executables, ...

diane.startclient --job=ITU.job --jobid=AUTO

Submitting more workers later.

You may submit more workers later if you need more CPUs. Make sure that the initial submission has been finished and also that you do not have other ganga sessions running at the same time.

Submitting more workers in the gear VO:

diane.ganga.submitworkers --job=ITU-patricia.job --nw=1 --bk=lcg

Submitting to DESY is done via another script because the VO is Geant4. The script temporarily changes the ~/.gangarc file, so be careful NOT to use it at the same time as the script above. Also, if you kill the DESY script, make sure that your ~/.gangarc is copied back from the backup (~/.gangarc-BACKUP). Also make sure that the submit_to_desy script uses the correct .job and WNLIST files.

./submit_to_desy 104 3  # 104 - master id, 3 - number of new workers
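
If the DESY script was killed, the manual restore of ~/.gangarc described above could be scripted roughly as follows. This is a sketch under assumptions: it only relies on the two file names given above, the function name is hypothetical, and it presumes the backup is the file you actually want back (an old backup left over from an earlier run would overwrite a newer config).

```python
import os
import shutil


def restore_gangarc(home=None):
    """Copy .gangarc-BACKUP over .gangarc in the given home directory.

    Returns False (and changes nothing) if no backup file exists.
    """
    home = home or os.path.expanduser("~")
    backup = os.path.join(home, ".gangarc-BACKUP")
    target = os.path.join(home, ".gangarc")
    if not os.path.exists(backup):
        return False
    shutil.copy2(backup, target)  # keeps the backup file in place
    return True
```

Calling restore_gangarc() with no argument acts on your own home directory. Do this only when no Ganga session or submission script is running.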

Submitting workers to LSF at CERN (on lxplus):

diane.startjob ... -w20@lsf --wms '-q itu'

ganga --config kuba_test/gangarc-lsf `which diane.ganga.submitworkers` --job ITU-manara2.job --nw=10 --bk=lsf --wopts 'itu'

Killing the system.

Kill master:

diane.master.command --master-file ~/diane.workspace/jobs/105/MasterOID kill

Kill workers from Ganga:

for j in jobs['DIANE_104']:
  j.kill()

First-time Setup

Login on lcgui003 and start tcsh:

tcsh

1 ITU working area

cd /afs/cern.ch/sw/arda/install/ITU

2 Get the environment right

source env.csh

3 Create the config file ~/.gangarc by running:

ganga -g

4 Then specify your Virtual Organisation in the [LCG] section of ~/.gangarc:

VirtualOrganisation = gear

5 If you ran ganga before, you may want to delete old jobs:

rm -rf ~/gangadir
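
To verify the Virtual Organisation setting without opening an editor, the config file can be read with the Python standard library. This is a sketch assuming that ~/.gangarc is a plain INI file that configparser can parse. The function name is hypothetical, and the code only reads the file; it never rewrites it, because rewriting with configparser would drop the comments.

```python
import configparser


def read_virtual_organisation(config_text):
    """Return the VirtualOrganisation value of the [LCG] section, or None."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    return parser.get("LCG", "VirtualOrganisation", fallback=None)


# Example config text, matching the setting described above.
sample = "[LCG]\nVirtualOrganisation = gear\n"
vo = read_virtual_organisation(sample)
```

To check the real file, pass its contents to the function, for example by reading ~/.gangarc with open() after expanding the path with os.path.expanduser.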
Topic revision: r9 - 2006-06-08 - JakubMoscicki
 