
Next: Migration of 3480s to Redwood: Progress Satisfactory Up: cnl230.html Previous: The RSPLUS Public Login Service

The Public Batch Service: An Upgrade and the Migration to LSF

  Tony Cass, Pietro Martucci and Alessandro Miotto, IT/PDP


With the end of maintenance for most of the SP2 nodes, the public batch service will be entirely based on IBM PowerPC workstations. 18 PowerPC nodes are in use today and a further 15 systems were purchased at the end of last year. These new machines are equipped with 332 MHz PowerPC 604e CPUs, each rated at 100 CERN Units, and thus provide a significant increase in capacity. The total PowerPC based capacity will be 2040 CERN Units, compared to 990 CERN Units for the current configuration (and 450 units for the original SP2 batch service in 1995).

However, as we move away from the SP2 we will also change batch scheduler, replacing LoadLeveler with LSF. LSF is already in wide use for the SHIFT batch services and will replace both NQS and LoadLeveler to become the single batch scheduling system at CERN by the end of 1998.

LSF has already been installed on the 15 new machines and on the RSPLUS nodes. Adventurous users can look at the existing LSF Web pages (http://wwwinfo.cern.ch/pdp/lsf) and try submitting LSF jobs now. We will, however, be providing more documentation in the coming weeks. A brief introduction to the LSF queues and a sample LSF job are included below.

The key step in the introduction of the LSF based public batch service will be the enforcement of CPU quotas based on the COCOTIME allocations, expected at the beginning of April. We will then start to migrate the existing PowerPC nodes to LSF, leaving only the SP2 nodes running LoadLeveler. The LoadLeveler batch service is currently scheduled to end on June 28th.

LSF queues

The batch queues on virtually all of the batch clusters running LSF are named for their queue length in "Normalised" (or "New") CERN Units. One "NCU" is simply 100 of the existing CERN Units. Conveniently, therefore, since the new Public Batch machines are rated at 100 CERN Units each, the queue definitions show directly how much real CPU time a job can use. The four queues defined on the new cluster are as follows:
8nm	8 New Minutes, equivalent to about 13.3 CERN Unit hours
1nh	1 New Hour, equivalent to 100 CERN Unit hours
8nh	8 New Hours, equivalent to 800 CERN Unit hours
1nd	1 New Day, equivalent to 2400 CERN Unit hours
We will be reviewing this set of queues in the light of experience.
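The conversion between a queue limit and CERN Unit hours is simple arithmetic: since 1 NCU equals 100 CERN Units, a limit of H NCU hours corresponds to 100*H CERN Unit hours. The figures in the table above can be checked with a one-line awk sketch:

```shell
# 1 NCU = 100 old CERN Units, so a queue limit of H NCU hours
# of CPU time corresponds to 100*H CERN Unit hours.
# Check the four queue limits from the table above:
awk 'BEGIN {
    print "8nm:", (8/60)*100, "CERN Unit hours"   # 8 NCU minutes
    print "1nh:",  1*100,     "CERN Unit hours"
    print "8nh:",  8*100,     "CERN Unit hours"
    print "1nd:", 24*100,     "CERN Unit hours"   # 24 NCU hours
}'
```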

A Sample LSF Job

Finally, we include here a sample script ("example2" from the LSF examples under http://wwwinfo.cern.ch/pdp/lsf) that will run a Fortran job under LSF. If this script is saved as "myjob", the command to submit and run the job is simply "bsub < myjob".

As you can see, LSF directives are given in lines starting "#BSUB", e.g. "#BSUB -J example2" or "#BSUB -q 8nm"; the "-q" and "-J" options are exactly those that can be given on the bsub command line. A full list of bsub options is available in the bsub man page (type "man bsub").

#!/usr/local/bin/zsh
#
# As usual, the first line defines the shell
# Lines beginning '#BSUB' are LSF directives
#
# Give the job a name
#
#BSUB -J example2
#
# We don't need much time so the 8 normalised minute queue is fine
#
#BSUB -q 8nm
#
# Copy the in-line Fortran to a work file.
#
cat > temp.f <<EOF
      PROGRAM TIME
*     Initialise the accumulator explicitly
      K = 0
      Do M = 1,100000
         If ( 2*(K/2) .eq. K ) K = 0
         Do I = 1,100
            Do J = 1,100
               K = K + I*J
            Enddo
         Enddo
      Enddo
      call timel(tleft)
      call timex(tused)
      write (6,'("The answer:",I9)') k
      write (6,'("Time used =",F9.2/"Time left =",F9.2)') tused,tleft
      END
EOF
#
# Compile and link with CERNLIB
#
hepf77 temp.f `cernlib`
#
# Execute the program
#
./a.out
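Because the "#BSUB" lines are just embedded bsub options, you can inspect what a script will request before submitting it. A minimal sketch (the myjob file and its contents here are illustrative, not part of LSF):

```shell
# Create a toy job script (illustrative contents only)
cat > myjob <<'EOF'
#!/usr/local/bin/zsh
#BSUB -J example2
#BSUB -q 8nm
echo hello
EOF

# Strip the '#BSUB ' prefix to list the embedded options;
# these are exactly the flags one could instead pass directly:
#   bsub -J example2 -q 8nm < myjob
sed -n 's/^#BSUB //p' myjob

rm -f myjob
```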



Cnl.Editor@cern.ch