



Changes to the CERN Batch System (LSF)

Tim Smith and Ulrich Fuchs , IT/FIO


The manner in which CPU allocations are managed on LXBATCH has been changed in order to optimise overall resource usage. The change, the introduction of "LSF FairShare", took effect on Monday March 18th at 08:00 hrs; the details are described in this article.

Global Allocations

Previously, CPU resources were allocated in a rather static manner: the majority of LXBATCH was partitioned, each experiment having an allocation of fixed machines with dedicated queues for submitting to them, plus a public partition for general use. There was no dynamic sharing of resources between experiment partitions, nor any automatic way to use the spare cycles of under-utilised partitions. We have therefore removed the static partitions, returned all CPUs to a common pool, and let LSF share these resources using the LSF FairShare algorithm. The initial configuration allocates shares to each experiment in direct proportion to the resources it previously had statically available. Where an experiment partition had a special configuration, such as extra memory, this is now specified as an extra resource on those nodes, which can be selected in the job specification to ensure that jobs go to nodes with the necessary configuration.
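As a rough sketch of how such a pool-wide FairShare allocation looks in LSF, a host partition in lsb.hosts assigns shares to user groups over a set of hosts. The group names and share values below are purely illustrative, not the actual CERN configuration:

```
# lsb.hosts -- illustrative FairShare host partition (names and
# numbers are hypothetical, not the production CERN settings)
Begin HostPartition
HPART_NAME  = lxbatch
HOSTS       = all
USER_SHARES = [atlas_grp, 30] [cms_grp, 30] [lhcb_grp, 20] [others, 20]
End HostPartition
```

A job that needs the extra-memory nodes would then select them with a resource requirement at submission time, for example (the boolean resource name `bigmem` is a hypothetical stand-in for whatever resource the administrators define):

```
bsub -q 1nw -R "bigmem" myjob
```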

Allocations within an experiment

CPU resources within an experiment were typically allocated to its different groups using queues of varying priorities. In the new scheme this is handled by the Multi-Level FairShare capabilities of LSF: within the experiment's share, further sharing can be defined between the various experiment groups, without the need for dedicated queues.
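Such a hierarchy can be expressed as nested user groups carrying their own share assignments. The sketch below assumes an lsb.users UserGroup section with a USER_SHARES column; the experiment, group, and share values are all hypothetical:

```
# lsb.users -- illustrative share tree: the experiment group's
# pool-level share is subdivided between its sub-groups
# (all names and numbers are hypothetical)
Begin UserGroup
GROUP_NAME   GROUP_MEMBER              USER_SHARES
atlas_prod   (user1 user2)             ()
atlas_ana    (user3 user4 user5)       ()
atlas_grp    (atlas_prod atlas_ana)    ([atlas_prod, 70] [atlas_ana, 30])
End UserGroup
```

With such a tree, whatever fraction of the pool atlas_grp is entitled to is itself split 70/30 between production and analysis, with no per-group queues needed.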

Instant Response

It is often argued that experiment-dedicated resources are kept idle precisely to guarantee an instant response when it is needed. We wish both to avoid leaving nodes completely idle and to provide instant response. To this end we reserve a fraction of the nodes to run only certain queues, with restrictions on the number of jobs per user. The size of this fraction will be tuned, but one could imagine, for example, guaranteeing in this way that 20 independent users get instant access to the 8nm queue. Tuning the number of reserved nodes according to the load on the queues should ensure instant access for the majority.
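A reserved short-job queue of this kind could be sketched in lsb.queues as follows; the host group name and the limit values are illustrative assumptions, not the production settings:

```
# lsb.queues -- illustrative short-job queue confined to reserved
# hosts, with a per-user running-job cap (values are hypothetical)
Begin Queue
QUEUE_NAME = 8nm
PRIORITY   = 60
HOSTS      = reserved_hosts    # host group defined in lsb.hosts
UJOB_LIMIT = 2                 # at most 2 running jobs per user
CPULIMIT   = 8                 # CPU limit in minutes, matching the queue name
End Queue
```

With, say, 40 reserved job slots and a cap of 2 running jobs per user, roughly 20 independent users can be guaranteed immediate dispatch, which is the kind of tuning described above.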

Transition measures

Initially the dedicated experiment queue names (e.g. zz_1nw) have not been destroyed, but are channelled into the common queues, in order to give continuity during the transition phase. Subsequently we will remove all these dedicated queue names, to simplify the cluster from both the user's and the administrator's point of view.

Assessment of fairness of sharing

The LSF FairShare algorithms rely on historical information about resource usage. The reference period of this history can (and will) be tuned, but it obviously takes several days for the system to stabilise into giving correct allocations. We therefore asked users to be patient in the initial days after the introduction, and not to flood the lists with questions about not getting the right response or not being treated fairly: this can only truly be assessed once the system has stabilised and been tuned a little.

Exceptions

We wish to keep the exceptions to a minimum, but acknowledge that some resources may have to be treated specially. All COCOTIME-controlled resources went straight into the common pool, but some LXBATCH nodes were bought from private experiment funds. Ideally these too would simply be contributed to the common pool, with the share of that experiment boosted by the appropriate fraction. Where this is unacceptable, special partitions have to be maintained, but no attempt is made to balance between the experiment's share of the common pool and the private resource. It is then the responsibility of the users to submit their jobs to the public or the private resource so as to keep both busy, which is obviously less convenient than letting the system do this scheduling.
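In practice this means a user in such an experiment has to choose the destination explicitly at submission time, for example (the private queue name is hypothetical):

```
# Submit to the common pool, scheduled by FairShare:
bsub -q 1nw myjob

# Or submit explicitly to the experiment's private partition:
bsub -q myexp_private myjob
```

Under full FairShare this choice would disappear, since the scheduler itself would balance the load.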

Future Developments

The next major development stage will be the installation of a new version of LSF (v4.x) in mid May. This version brings increased stability, additional functionality, and fixes for bugs we currently encounter.

Training

A series of LSF training courses will be organized during the summer. Please watch the Bulletin for announcements.

Considering the sub-optimal overall CPU usage of our clusters and the constant demand for more capacity, moving towards a policy of sharing all available resources across experiment boundaries is not only a necessary step for successful resource management, but a must in times of financial restrictions. In the long term, the increased efficiency and available capacity of the installation will more than repay any difficulties during the transition from the present setup to a fair-shared common cluster.



For matters related to this article please contact the author.
Cnl.Editor@cern.ch


CERN-CNL-2002-001
Vol. XXXVII, issue no 1


Last Updated on Thu Mar 28 16:36:23 CET 2002.
Copyright © CERN 2002 -- European Organization for Nuclear Research