



Changes to the Public Unix Batch Environment on CERNSP

  Harry Renshall CN/PDP

The replacement for the public CERNVM Batch services is provided on the CERNSP service using the IBM Loadleveler product. There is a chapter on this in the CERNSP Introductory User Guide available from the UCO or at http://consult.cern.ch/writeups/cernspintro. We will be making three significant changes to this environment shortly after the appearance of this CNL.

Change 1 --

To submit a file as a batch job, use the command llsubmit filename. Loadleveler assigns the resulting job a unique job identifier of the form sp008.1234.0. Currently this identifier is displayed in the Id column of the llq command, which shows the current job queue, and it can be used as an argument to various Loadleveler commands, such as llcancel to cancel a job.
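As an illustration only, the sketch below splits an identifier such as sp008.1234.0 into its three dot-separated parts. It assumes the usual LoadLeveler layout of submitting host, job number and step number (the article itself does not spell this out); the helper split_job_id is hypothetical and is not a Loadleveler command.

```python
from typing import NamedTuple


class JobId(NamedTuple):
    """Parts of a Loadleveler job identifier (ASSUMED layout: host.job.step)."""
    host: str
    job: int
    step: int


def split_job_id(job_id: str) -> JobId:
    """Split an identifier such as 'sp008.1234.0' into host, job and step.

    rsplit keeps a host name that itself contains dots intact; only the
    last two fields are treated as the job and step numbers.
    """
    host, job, step = job_id.rsplit(".", 2)
    return JobId(host, int(job), int(step))


if __name__ == "__main__":
    print(split_job_id("sp008.1234.0"))
```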

Jobs are also assigned job names, either through a Loadleveler statement in the submitted script (the job_name parameter) or by default by the llsubmit command. llsubmit builds a default job name from the string it finds in the $HOME/.lljobcount file. It looks for a 3-digit integer at the end of this string, adds one to it and saves the result back. If there is no such integer, it will create 001. If the file does not exist, llsubmit creates it containing uuu001, where uuu is the part of your unix account field before the $ sign (you can see yours via 'grep your-loginid /etc/account'). You can edit the .lljobcount file at any time. The first change users will see is that llq will by default display the job name instead of the jobid. Loadleveler commands that accept a jobid can also take a job name as argument, and the jobid will still be available through a special option (-r jobid) to the llq command.
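The default naming rule can be sketched in a few lines. This is a minimal illustration, not llsubmit's code: it assumes that "create 001" means appending 001 to a string with no trailing 3-digit number, and next_job_name and bump_job_count are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Optional


def next_job_name(current: Optional[str], account_field: str) -> str:
    """Compute the next default job name from the .lljobcount contents.

    current:       text of $HOME/.lljobcount, or None if the file is missing.
    account_field: the user's unix account field; only the part before the
                   '$' sign is used, and only when the file is missing.
    """
    if current is None:
        # Missing file: seed it with uuu001, uuu = account field before '$'.
        return account_field.split("$", 1)[0] + "001"
    text = current.strip()
    match = re.search(r"\d{3}$", text)
    if match is None:
        # No trailing 3-digit integer: ASSUMPTION, append 001 to the string.
        return text + "001"
    # Add one to the trailing 3-digit counter, keeping the zero padding.
    return text[: match.start()] + f"{int(match.group()) + 1:03d}"


def bump_job_count(path: Path, account_field: str) -> str:
    """Read the counter file (if any), save the next name back and return it."""
    current = path.read_text() if path.exists() else None
    name = next_job_name(current, account_field)
    path.write_text(name + "\n")
    return name
```

For example, a .lljobcount containing run099 yields run100, and a missing file for a (made-up) account field xy$abc starts the sequence at xy001.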

Change 2 --

Loadleveler currently creates a new directory (referred to below as the "return" directory) in the directory of the submitted file, plus the two job files for standard error and standard output. Hence the command:

llsubmit mydir/myjob

which results in job sp008.1234.0, will create:

At job end, Loadleveler puts into the "return" directory all the files that the batch job leaves in its $WORK scratch directory (there are parameters to control this). We are going to change our system so that the .err and .out files are written into the local $WORK directory and then, at job end, copied back to the "return" directory.

This way:

a) jobs will be more immune to network problems;
b) all files returned by the batch job will be clustered inside the same subdirectory.
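A minimal sketch of the new behaviour, assuming a Python stand-in for the batch system: a temporary directory plays the role of the node-local $WORK, the job's standard output and error are written there, and at job end every file left behind is copied into the return directory. The function run_staged and its arguments are hypothetical; on CERNSP the copy-back is done by the batch system, not by user code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List, TextIO


def run_staged(job_name: str, return_dir: Path,
               job: Callable[[Path, TextIO, TextIO], None]) -> List[Path]:
    """Run `job` with its .out/.err files on local scratch, then copy back.

    A temporary directory stands in for the node-local $WORK. The job gets
    that directory plus open handles for standard output and error; at job
    end every regular file left in it is copied into `return_dir`.
    """
    return_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as work:
        work_dir = Path(work)
        # While the job runs, its output only touches local disk.
        with open(work_dir / f"{job_name}.out", "w") as out, \
             open(work_dir / f"{job_name}.err", "w") as err:
            job(work_dir, out, err)
        # Job end: gather all files, including .out and .err, in one place.
        copied = []
        for item in sorted(work_dir.iterdir()):
            if item.is_file():
                copied.append(Path(shutil.copy2(item, return_dir / item.name)))
        return copied
```

In this sketch, network problems during the run do not affect the .out and .err files, because they live only on local disk until the job finishes (point a), and they end up next to the job's other output files (point b).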

Change 3 --

Currently users can log in to batch nodes to inspect the progress of their batch jobs, although this is tolerated only when the jobs are misbehaving. We have now introduced a set of commands that allow remote inspection of running batch jobs, so we will be disallowing login to the batch and parallel nodes. The command names all begin with ll; you can see a summary of them by typing man batch (from an SP2 node), and details on individual commands through man commandname as usual.






Michel Goossens
CN Division
Tel. 3363
Tue Nov 28 18:14:41 MET 1995