Latest batch system ...
31/7/98
CERN is now using a thing called LSF.
The primer seems to be the most useful reference.
An email about batch server from Hans Grote, concerning our local machines.
Subject: batch server
Date: Fri, 27 Mar 1998 11:34:39 +0100 (MET)
From: Hans Grote <Hans.Grote@cern.ch>
To: all hp users
--
Our batch server slapb01 is now in operation. The batch submission
system used is LSF both for this server, and for shiftnap. The job
submission is done with "bsub". The documentation can be found in
http://wwwinfo.cern.ch/pdp/lsf/
Jobs submitted on slap will go by default to the HP (slapb01). If you
want to use the DEC machines, when submitting specify the option
'-R dux', or '-R type==DigitalUNIX'. If you want to use the HP specify
'-R hpux' or '-R type==HPPA'. If you do not care specify '-R shift' or
'-R type==any'.
Hans 27.3.98
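The resource options in the email can be collected in a small helper. This is only a sketch: the function names resource_for and bsub_command and the script name myjob.csh are made up for illustration, and the helper prints the bsub command line instead of running it, so it can be checked before submitting.

```sh
#!/bin/sh
# Map a short platform name to the LSF resource string quoted in the email.
# The names dec/hp/any are hypothetical labels chosen for this sketch.
resource_for() {
  case "$1" in
    dec) echo "dux" ;;     # DEC machines (the email also gives type==DigitalUNIX)
    hp)  echo "hpux" ;;    # HP machines (the email also gives type==HPPA)
    any) echo "shift" ;;   # no preference (the email also gives type==any)
    *)   echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Print the bsub command line rather than running it.
bsub_command() {
  res=$(resource_for "$1") || return 1
  echo "bsub -R $res $2"
}

bsub_command dec myjob.csh
```

Once the printed line looks right, it can be pasted at the prompt on the submitting machine.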
Commands I like:
bsub scriptname
to submit a job in a file and
bjobs -l
to see complete information about my jobs. There is a more
useful graphical interface in the command
xlsbatch
With my setup, a bsub command like the one above, executed on a slap
machine, runs the job on the NAP machine by default. However, you must
first log in to slapb01 so that a klog is done there and the machine has
an AFS token with which to save files.
Joys of Unix.
Another thing is that the job script must contain
the option line
#BSUB -R shift
Another thing to watch: MAD pool files created
on HP or IBM do not work on NAP (Digital) machines!
To run on RSPLUS, there is probably some option but the easiest thing
is just to submit the job from an rsplus terminal window. The klog
nonsense does not apply.
Traps discovered to date
11/1/1999 To run jobs on the NAP (or whatever they're called) machines,
like slapb01, I seem to have to execute the bsub command on slapb01 itself.
One way is to rlogin slapb01 but I can also do things like
rsh slapb01 bsub public/scripts/BasicBatchJob.csh
to submit a job.
-
Watch how you specify the path to a batch job script. E.g. if you are
currently in your AFS home directory, the command
bsub public/scripts/BasicBatchJob.csh
will submit a job which will not find the script. You have to
type something more explicit, like
bsub ~/public/scripts/BasicBatchJob.csh
to get the desired job to run.
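One way to avoid this trap is to turn the script path into an absolute one before handing it to bsub. The helper below is a sketch only: abs_path is a made-up name, not an LSF command, and it assumes the batch machine sees the script under the same absolute path as the submitting machine.

```sh
#!/bin/sh
# Hypothetical helper: print an absolute version of the path given as $1.
abs_path() {
  case "$1" in
    /*) echo "$1" ;;                      # already absolute, leave alone
    *)  # resolve the directory part relative to the current directory
        dir=$(cd "$(dirname "$1")" 2>/dev/null && pwd) || return 1
        echo "$dir/$(basename "$1")" ;;
  esac
}

# Intended use (not run here):
#   bsub "$(abs_path public/scripts/BasicBatchJob.csh)"
```

The function fails (returns 1) if the directory part of the path does not exist, which catches typing mistakes before the job is submitted.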
Some basic example batch jobs follow. When the job has run, the results
will generally be found in a sub-directory of the directory in which
the bsub command was run. It will be called
LSFJOB_<some job numbers>
Usually, it will be more convenient to manage the file locations explicitly.
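A sketch of explicit file management, assuming the standard bsub options -o and -e (files for standard output and standard error) behave at CERN as in stock LSF, where %J in a file name is replaced by the job number. The directory batchout is hypothetical. The command is only printed here, not executed.

```sh
#!/bin/sh
# Hypothetical output directory; it must exist and be writable by the job.
outdir=$HOME/batchout
job=$HOME/public/scripts/BasicBatchJob.csh

# -o names the file for standard output, -e the file for standard error.
# %J is expanded by LSF to the job number, so successive jobs do not collide.
cmd="bsub -o $outdir/job.%J.out -e $outdir/job.%J.err $job"
echo "$cmd"
```

Whether the LSFJOB_ directory is still created when these options are given is something I have not checked.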
Generic batch job
Here is a trivial basic job, based on an example from the primer mentioned
above, showing how to run some simple Unix commands.
/afs/cern.ch/user/j/jowett/public/scripts/BasicBatchJob.csh
Mathematica batch jobs
Here is a basic job, showing how to do some typical things.
This is meant to be used as a model and is quite heavily commented.
I should add any new tricks to it.
/afs/cern.ch/user/j/jowett/public/scripts/MathematicaBatch.csh
18/3/1999. Improvement on above: I just wrote a function
that will prepare a batch script to run a Mathematica job at CERN.
It is part of my BumpEtc package (loaded by default in my setup and that
of SLAP Unix users). The usage message is:
makeBatchScript[mathFile,batchFile] creates a batch job script
for the CERN batch system that will run the commands in the file mathFile.
Some points about how this works:
-
A handy way to use this is by developing a procedure in a notebook, making
each step an initialisation cell and saving the auto-generated package.
This .m file is the Mathematica input to be given as the mathFile
argument.
-
For more on this, see a useful reference on how to prepare Mathematica
jobs for running without the Front End:
DN - Re: running calculations in a batch mode
-
Unless the Mathematica input file explicitly sets a current directory,
the working directory will be the batch job's working directory (usually
some temporary directory on the batch machine).
-
The script includes a list of patterns for file types that will be returned
to the directory LSFJOB... created for the output files. This
works by creating a file RETURN in the batch job's working directory, as
explained under the heading "Other files" in A
quick introduction to LSF at CERN.
-
The Mathematica input (.m) file is actually copied into the batch script.
The reason for this is to allow the output of each step to appear.
Otherwise only the value of the last expression would appear.
-
The main output is in the stdout file and uses settings for $Echo
and "stdout" taken from R. Maeder, Programming in Mathematica, 3rd edn.,
p. 250.
-
The function makeBatchScript works by Splicing a template
header file, copying the Mathematica input file after it and then adding
a template tail file to produce the batch script batchFile.
-
These template files are in my /public/scripts/ directory under AFS.
(MathBatchHead.csh and MathBatchTail.csh).
-
You may wish to modify some parts of the batch file, e.g. the BSUB options.
-
Note that any '$' characters appearing in the input file will be translated
to "\$" in the script. This is a consequence of having to use the
primitive Bourne shell.
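The assembly described in the points above (header template, then the Mathematica input with '$' escaped, then tail template) can be imitated in a few lines of shell. This is a sketch of the idea only, not the makeBatchScript code, which is a Mathematica function using Splice. The name make_batch_script and the argument order are made up for illustration.

```sh
#!/bin/sh
# Shell analogue of the head + input + tail assembly (hypothetical helper).
# Arguments: Mathematica input file, batch script to write,
#            header template, tail template.
make_batch_script() {
  math_file=$1; batch_file=$2; head_file=$3; tail_file=$4
  {
    cat "$head_file"
    # Translate every '$' in the Mathematica input to '\$',
    # as described for the generated script.
    sed 's/\$/\\$/g' "$math_file"
    cat "$tail_file"
  } > "$batch_file"
}

# Intended use (file names as in the notes, not run here):
#   make_batch_script myjob.m myjob.csh MathBatchHead.csh MathBatchTail.csh
```

The header and tail files are copied unchanged, so any BSUB options in the header can still be edited in the resulting script.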
MAD batch job example
There seems to be no LSF version of the old madbatch command (??) with
a standard method for dealing with a certain standard set of output files
from MAD. Meanwhile, here is a current basic MAD job, showing how to do
some trivial things. In general it will
require explicit paths to any files that it uses.
I should add any new tricks to it.
/afs/cern.ch/user/j/jowett/public/scripts/MADBatch.csh
This job contains the MAD commands. Obviously, it may often be more
convenient to use a command like
mad -new < file.mad
instead.
I should really make something similar to the Mathematica batch job
preparer above.