Generic Installation and Configuration Guide for gLite 3.0

Important note: gLite 3.1 should now be used. Please update your nodes and check the 3.1 Generic Installation and Configuration guide. This guide is no longer maintained and may be out of date.

This document is addressed to Site Administrators in charge of middleware installation and configuration. It is a generic guide to manual installation and configuration for any supported node types. Links to the latest configuration tools (like YAIM) and to their release independent descriptions are provided in-line, where necessary.

Introduction to Manual Installation and Configuration

This guide provides a fast method to install and configure the gLite middleware on the various node types (WN, UI, CE, SE, ...) on top of the following Linux distributions:

  • Scientific Linux 3.0
  • Scientific Linux 4.0 (only for UI and WN)
  • Debian (only for the so called TAR_UIWN)

The proposed installation and configuration method for SL3 is based on the Debian apt-get tool and on a set of shell scripts built within the YAIM framework. For a description of YAIM, see the guide for the version to be used: YAIM guide

The provided scripts can be used by Site Administrators with no need for in-depth knowledge of specific middleware configuration details. They only need to adapt three configuration files according to the provided examples. The resulting configuration is a default site configuration. Local customizations and tuning of the middleware, if needed, can then be done manually.

New versions of this document will be distributed synchronously with the middleware releases and will describe the current state of the art of the installation and configuration procedures. A companion document with the upgrade procedures to manually update the configuration of the nodes from the previous LCG/gLite version to the current one is also part of the release.

The OS Installation

The current version of the gLite middleware runs on Scientific Linux 3 (SL3). All the needed information is available on the following web page:
http://www.scientificlinux.org
The sources and the ISO images needed to create the CDs can be found at
ftp://ftp.scientificlinux.org/linux/scientific/30x/iso/
Most middleware testing has been carried out on CERN Scientific Linux 3 (SLC3)
http://linuxsoft.cern.ch/
but the middleware should run on any binary compatible distribution.

Java Installation

You should install the Java SDK (1.4) on your system before installing the middleware. Download it from the Sun Java web site (1.4.2 or greater is required - http://java.sun.com/j2se/1.4.2/download.html ). You must install the J2SDK as an rpm package; if you do not install it in RPM format you will not be able to install the middleware. On the Sun Java web page, follow the link RPM in a self extracting file, then follow the instructions provided by Sun. In your site-info.def (the YAIM configuration file), set the variable JAVA_LOCATION to your Java installation directory.

Node synchronization, NTP installation and configuration

A general requirement for the gLite nodes is that they are synchronized. This requirement may be fulfilled in several ways. If your nodes run under AFS most likely they are already synchronized. Otherwise, you can use the NTP protocol with a time server.

Instructions and examples for an NTP client configuration are provided in this section. If you are not planning to use a time server on your machine you can skip it and jump to the next section.

Use the latest ntp version available for your system. If you are using APT, an apt-get install ntp will do the work.

  • Configure the file /etc/ntp.conf by adding the lines dealing with your time server configuration such as, for instance:
           restrict <time_server_IP_address> mask 255.255.255.255 nomodify notrap noquery
           server <time_server_name>
       
    Additional time servers can be added for better performance results. For each server, the hostname and IP address are required. Then, for each time-server you are using, add a couple of lines similar to the ones shown above into the file /etc/ntp.conf.

  • Edit the file /etc/ntp/step-tickers adding a list of your time server(s) hostname(s), as in the following example:
          137.138.16.69
          137.138.17.69
       
  • If you are running a kernel firewall, you will have to allow inbound communication on the NTP port. If you are using iptables, you can add the following to /etc/sysconfig/iptables
          -A INPUT -s NTP-serverIP-1 -p udp --dport 123 -j ACCEPT 
          -A INPUT -s NTP-serverIP-2 -p udp --dport 123 -j ACCEPT
       
    Remember that, in the provided examples, rules are parsed in order, so ensure that there are no matching REJECT lines preceding those that you add. You can then reload the firewall
          # /etc/init.d/iptables restart
       
  • Activate the ntpd service with the following commands:
          # ntpdate <your ntp server name>
          # service ntpd start
          # chkconfig ntpd on
       
  • You can check ntpd's status by running the following command
          # ntpq -p
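
The per-server lines from the steps above can also be generated rather than typed by hand. The following is only a minimal sketch: the function names ntp_conf_lines and ntp_iptables_rule are hypothetical helpers introduced here, and the address 192.0.2.10 and host name ntp1.example.org used in the usage note are placeholders, not real time servers.

```shell
# Print the two /etc/ntp.conf lines for one time server.
# $1 = server IP address, $2 = server host name (both supplied by the caller).
ntp_conf_lines() {
    printf 'restrict %s mask 255.255.255.255 nomodify notrap noquery\n' "$1"
    printf 'server %s\n' "$2"
}

# Print the matching rule for /etc/sysconfig/iptables.
# $1 = server IP address.
ntp_iptables_rule() {
    printf '%s\n' "-A INPUT -s $1 -p udp --dport 123 -j ACCEPT"
}
```

For example, ntp_conf_lines 192.0.2.10 ntp1.example.org >> /etc/ntp.conf would append the two lines for that placeholder server. Review the resulting files before restarting ntpd or the firewall.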
       

The rpm installation tools: apt-get, yum

Before you proceed further, please make sure that Java is installed on your system.

The apt package manager

  • Download the latest version of the apt tool (if it is not already installed). The example below fetches the apt rpm for Scientific Linux; use the one appropriate to your OS.
                # wget ftp://ftp.scientificlinux.org/linux/scientific/30x/i386/SL/RPMS/apt-XXX.i386.rpm
       
  • Install apt. On Scientific Linux:
                # rpm -ivh apt-XXX.i386.rpm
      
  • Configure apt. In order to perform the middleware and CA installation with the methods described in this guide, you just need to set the variables LCG_REPOSITORY and CA_REPOSITORY in the Site Configuration File (site-info.def) as follows:
          LCG_REPOSITORY="rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates"
          CA_REPOSITORY="rpm http://linuxsoft.cern.ch/ LCG-CAs/current production"
       
    Please note that for the dependencies of the middleware to be met, you'll have to make sure that apt can find and download your OS rpms. This typically means you'll have to install an rpm called 'apt-sourceslist', or else create an appropriate file in your /etc/apt/sources.list.d directory.

The yum package manager

TO BE COMPLETED

Important note on automatic updates

Several sites use automatic update mechanisms. Sometimes a middleware update or upgrade requires non-trivial configuration changes or simply a reconfiguration of the service. This could involve database schema changes, service restarts, the appearance of new configuration files, etc., which makes it difficult to ensure that automatic updates will not break a service. Thus

WE STRONGLY RECOMMEND NOT TO USE AUTOMATIC UPDATE PROCEDURE OF ANY KIND

which uses the gLite middleware repositories. Do the upgrade manually when an update has been released!

About platforms and OSes.

Using RHEL3 compatible distributions other than CERN Scientific Linux

If you are not using SLC3 but another binary compatible distribution, it is highly recommended that you configure apt-get to give priority, during the installation, to the packages provided by your distribution.

In order to have all the known dependencies possibly solved by apt-get you should have at least the following lists in your /etc/apt/sources.list.d/:

  • lcg.list
  • lcg-ca.list
  • your-os.list

The first two are distributed by the 'apt-sourceslist' rpm, the third one is your local one.

Since the deployment team is based at CERN and it uses the local installation, it is still possible that with this bare configuration, some dependencies, though dealt with, cannot be solved because the binary compatible distribution you use does not provide the entire set of packages which CERN SL3 does.

If you prefer not to handle these issues manually you could add in the /etc/apt/sources.list.d/ another list (e.g. cern.list)

### List of available apt repositories available from linuxsoft.cern.ch
### suitable for your system.
###
### See http://cern.ch/linux/updates/ for a list of other repositories and mirrors.
### 09.06.2004
###

# THE default
rpm http://linuxsoft.cern.ch  cern/slc30X/i386/apt  os updates extras
rpm-src http://linuxsoft.cern.ch  cern/slc30X/i386/apt  os updates extras

Then you have to configure your apt-get preferences in order to give priority to your OS and not to CERN SLC3.

An /etc/apt/preferences file like the following one will give priority to your OS in every case, except when the package that you need is not present in the your-os repository:

Package: *
Pin: release o=your-os.your-domain.org
Pin-Priority: 980

Package: *
Pin: release o=linux.cern.ch
Pin-Priority: 970

If you are not using apt to install, you can pull the packages directly from SLC3's repository using wget. The address is http://linuxsoft.cern.ch/cern/slc305/i386/apt/.

You can use the apt-cache policy command to verify that the preferences are properly configured.
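
If you maintain several nodes, the preferences file shown above can be written by a small helper. This is a minimal sketch only: write_apt_preferences is a hypothetical function name introduced here, and the origin string passed as the second argument must match what your own repository actually publishes.

```shell
# Write an apt preferences file that ranks the local OS repository
# above the CERN SLC3 one, using the priorities from the example above.
# $1 = output file, $2 = origin (o=...) of the local OS repository.
write_apt_preferences() {
    cat > "$1" <<EOF
Package: *
Pin: release o=$2
Pin-Priority: 980

Package: *
Pin: release o=linux.cern.ch
Pin-Priority: 970
EOF
}
```

A typical call would be write_apt_preferences /etc/apt/preferences your-os.your-domain.org, followed by apt-cache policy to check the result.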

Configuration Tool: YAIM

To find out which version of YAIM is currently running in production, you can check the YAIM planning page, where each yaim module is listed.

Note on YAIM and gLite nodes

This release of gLite contains components from earlier versions where all configuration was done through XML files. When configuring these components, yaim populates the appropriate XML files and runs their config scripts. Please note that any direct modifications you make in the XML files to parameters not managed by yaim will be preserved after a reconfiguration by YAIM. Parameters managed by yaim will be clearly marked in the XML after it has been run. The intention is that yaim offers a simple interface if preferred, while the ability to use the more powerful native mechanism is retained.

Please use yaim to configure pool accounts. Yaim allows non-contiguous ranges of uids which some sites require and is therefore the default user configuration mechanism.

Installing YAIM

The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware. However, if you want to install YAIM rpms separately, you can run apt-get install glite-yaim-node-type after configuring properly the APT string mentioned in the APT package manager section.

This will automatically install the YAIM module you are interested in together with yaim core, which contains the core functions and utilities used by all the YAIM modules.

For a list of available YAIM modules please check this list.

For a detailed description on how to configure the middleware with YAIM, please check the YAIM guide.

Middleware installation, configuration in general

Consult the Yaim Guide for details on how to install the middleware.

Certification Authorities

The installation of the up-to-date version of the Certification Authorities (CA) is automatically done by the Middleware Installation described in 8. Anyway, as the list and structure of Certification Authorities (CA) accepted by the LCG project can change independently of the middleware releases, the rpm list related to the CAs certificates and URLs has been decoupled from the standard gLite/LCG release procedure. You should consult the page

http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html

in order to ascertain what the version number of the latest set of CA rpms is. In order to upgrade the CA list of your node to the latest version, you can simply run on the node the command:

# apt-get update && apt-get -y install lcg-CA
In order to keep the CA configuration up-to-date on your node we strongly recommend Site Administrators to program a periodic upgrade procedure of the CA on the installed node (e.g. running the above command via a daily cron job).
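
As an illustration of the daily cron job suggested above, the sketch below prints an entry in /etc/cron.d format (minute, hour, day, month, weekday, user, command). The function name ca_update_cron_entry and the file name used in the usage note are hypothetical; the time of day is a site choice.

```shell
# Print an /etc/cron.d entry that refreshes the CA rpms once a day.
# $1 = hour (0-23), $2 = minute (0-59).
ca_update_cron_entry() {
    printf '%s %s * * * root apt-get update && apt-get -y install lcg-CA\n' "$2" "$1"
}
```

For example, ca_update_cron_entry 3 15 > /etc/cron.d/lcg-ca-update would schedule the update at 03:15 every day. This applies to the CA packages only; see the note on automatic updates for the middleware itself.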

Host Certificates

All nodes except UI, WN and BDII require the host certificate/key files before you start their installation. Contact your national Certification Authority (CA) to find out how to obtain a host certificate if you do not have one already. Instructions for obtaining a CA list can be found at http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html

From the CA list so obtained you should choose a CA close to you.

Once you have obtained a valid certificate, i.e. a file

  • hostcert.pem containing the machine public key and a file
  • hostkey.pem containing the machine private key

make sure to place the two files on the target node in the directory

/etc/grid-security

and check the access rights: hostkey.pem must be readable only by root, and the certificate must be readable by everybody.
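
The placement and access rights described here can be sketched as a small shell function. This is an illustration under stated assumptions, not part of the release: install_host_cert is a hypothetical helper, and the modes 644 and 400 are one common way to express "readable by everybody" and "readable only by the owner". Run it as root so that the owner of hostkey.pem is root.

```shell
# Copy a host certificate and key into place and set their access rights.
# $1 = certificate file, $2 = private key file,
# $3 = target directory (defaults to /etc/grid-security).
install_host_cert() {
    target="${3:-/etc/grid-security}"
    mkdir -p "$target" || return 1
    cp "$1" "$target/hostcert.pem" || return 1
    cp "$2" "$target/hostkey.pem" || return 1
    chmod 644 "$target/hostcert.pem" || return 1   # readable by everybody
    chmod 400 "$target/hostkey.pem"                # readable only by the owner
}
```

The third argument exists only so the function can be tried out in a scratch directory before it is used on /etc/grid-security.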

Middleware configuration for this release, node-specific installation advice

In this section we list the configuration steps that are needed to complete the configuration of a given node but are not supported by the automatic configuration scripts. If a node type does not appear in this section, its configuration is already complete.

The gLite WMS and LB service

To install the glite WMS + glite LB (recommended deployment scenario)
./yaim -i -s site-info.def -m glite-WMSLB
./yaim -c -s site-info.def -n glite-WMS -n glite-LB
For the installation you have to have the following repository added:
rpm  http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals.condor

The gLite CE service

You don't want to install a glite-CE !

The MON and E2EMONIT service

You can add E2EMONIT to your MON box like this

yaim -i -s site-info.def -m glite-MON_e2emonit
yaim -c -s site-info.def -n MON E2EMONIT

The WN service without batch system

To install the glite-WN without a batch system client:

yaim -i -s site-info.def -m glite-WN 
yaim -c -s site-info.def -n glite-WN

The FTS service

There is still a manual step required in configuring FTS

https://uimon.cern.ch/twiki/bin/view/LCG/FtsServerInstall15

At the present time, the FTS requires a different proxy server to that used by the broker. Please ensure this restriction is respected in the site-info.def file you use to configure the File Transfer Server.

Please see the FTS install guides for more information

https://uimon.cern.ch/twiki/bin/view/LCG/FtsRelease15

https://uimon.cern.ch/twiki/bin/view/LCG/FtsServerInstall15

The Classic SE

You should not use or install the Classic SE service any more. Have a look at the DPM service instead.

The Site BDII

The following steps are needed to configure a site BDII:
./yaim -i -s site-info.def -m glite-BDII
apt-get install lcg-info-templates
./yaim -c -s site-info.def -n BDII_site

If you want your site-BDII on the lcg-CE, you have to run the configuration in one go:

./yaim -c -s site-info.def -n lcg-CE -n BDII_site

The Top-Level BDII

There are no special steps. Simply:
./yaim -i -s site-info.def -m glite-BDII
./yaim -c -s site-info.def -n BDII_top

The dCache service

The complete description of dCache installation via YAIM is available here.

The LFC service

  • When installing an LFC_oracle, it is the site admin who has to configure and ensure the correct Oracle environment settings. Namely, the /home/$LFCUSER/.tnsadmin file should contain the correct settings or be a symlink to your configuration files.

Otherwise just run the following, choosing either the mysql or the oracle variant:

./yaim -i -s site-info.def -m glite-LFC_mysql/oracle
./yaim -c -s site-info.def -n glite-LFC_mysql/oracle

The DPM service

There are no special steps. Simply:
./yaim -i -s site-info.def -m glite-SE_dpm_mysql
./yaim -c -s site-info.def -n glite-SE_dpm_mysql

The VOBOX service

Site admins must ensure that the experiment software installation area is accessible (i.e. mounted) from the VOBOX. In the VOBOX installation it is crucial to have the $MYPROXY_SERVER env variable (PX_HOST in the yaim site-info.def) set to the CERN myproxy server (myproxy.cern.ch). Even if you have a private myproxy server in your site, configure the VOBOX to point to the CERN one. The site administrator must communicate the name of the VOBOX to the myproxy.cern.ch service administrator (email both hep-project-grid-cern-testbed-managers@cernNOSPAMPLEASE.ch and support-eis@cernNOSPAMPLEASE.ch ) so that it is included in the list of authorized renewers. If this is not done, the renewal agent of the VOBOX will not work.

The relocatable distribution, the TAR UI and TAR WN

Introduction

We are now supplying a tarred distribution of the middleware which can be used to install a UI or a WN. It can be used on Debian as well as SL3. You can untar the distribution somewhere on a local disk, or replicate it across a number of nodes via a network share. You can also use this distribution to install a UI without root privileges - there is a quick guide here to do that.

Once you have the middleware directory available, you must edit the site-info.def file as usual, putting the location of the middleware into the variable INSTALL_ROOT.

If you are sharing the distribution to a number of nodes, commonly WNs, then they should all mount the tree at INSTALL_ROOT. You should configure the middleware on one node (remember you'll need to mount with appropriate privileges) and then it should work for all the others if you set up your batch system and the CA certificates in the usual way. If you'd rather have the CAs on your share, the yaim function install_certs_userland may be of interest. You may want to mount your share ro after the configuration has been done.

Getting the software

You can download the latest gliteUI_WN-3.x.y-z.tar.gz and gliteUI_WN-3.x.y-z-userdeps.tar.gz tar files from

http://grid-deployment.web.cern.ch/grid-deployment/download/relocatable/

Dependencies

The middleware in the relocatable distribution has certain dependencies.

We've made this software available as a second tar file which you can download and untar under $INSTALL_ROOT. This means that if you untarred the main distribution under /opt/LCG, you must untar the supplementary files in the same place. Please note that in earlier distributions the deps were untarred elsewhere.
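
The requirement that both tar files end up under the same directory can be sketched as follows. This is a minimal illustration: unpack_relocatable is a hypothetical helper name, and the tarball file names are whatever versions you downloaded (shown in this guide only as gliteUI_WN-3.x.y-z.tar.gz and gliteUI_WN-3.x.y-z-userdeps.tar.gz).

```shell
# Unpack the main tarball and the dependency tarball under the same root.
# $1 = INSTALL_ROOT, $2 = main tarball, $3 = userdeps tarball.
unpack_relocatable() {
    mkdir -p "$1" || return 1
    tar -xzf "$2" -C "$1" || return 1
    tar -xzf "$3" -C "$1"
}
```

For example, with INSTALL_ROOT set to /opt/LCG you would call unpack_relocatable /opt/LCG followed by the paths of the two downloaded files.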

If you have administrative access to the nodes, you could alternatively use the TAR dependencies rpm.

 /opt/glite/yaim/scripts/install_node site-info.def glite-TAR
For Debian, here is a list of packages which are required for the tarball to work:
perl-modules python2.2 libexpat1 libx11-6 libglib2.0-0 libldap2 libstdc++2.10-glibc2.2 tcl8.3-dev 
libxml2 termcap-compat libssl0.9.7 tcsh rpm rsync cpp gawk openssl wget

To configure a UI or WN

Run the configure_node script, adding the type of node as an argument;

 /opt/glite/yaim/bin/yaim -c -s site-info.def -n [ TAR_WN | TAR_UI ]

Note that the script will not configure any LRMS. If you're configuring torque for the first time, you may find the config_users and config_torque_client yaim functions useful. These can be invoked like this

# ${INSTALL_ROOT}/glite/yaim/bin/yaim -r -s site-info.def -f config_users
# ${INSTALL_ROOT}/glite/yaim/bin/yaim -r -s site-info.def -f config_torque_client

Installing a UI as a non-root user

You can find a quick guide to this here.

If you don't have root access, you can use the supplementary tarball mentioned above to ensure that the dependencies of the middleware are satisfied. The middleware requires java (see 3.), which you can install in your home directory if it's not already available. Please make sure you set the JAVA_LOCATION variable in your site-info.def. You'll probably want to alter the OUTPUT_STORAGE variable there too, as it's set to /tmp/jobOutput by default and it may be better pointing at your home directory somewhere.

Once the software is all unpacked, you should run

# $INSTALL_ROOT/glite/yaim/bin/yaim -c -s site-info.def -n TAR_UI
to configure it.

Finally, you'll have to set up some way of sourcing the environment necessary to run the grid software. A script will be available under $INSTALL_ROOT/etc/profile.d for this purpose. Source grid_env.sh or grid_env.csh depending upon your choice of shell.
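
For Bourne-style shells, one possible way of sourcing the environment is a small function in your shell startup file. This is a sketch only: source_grid_env is a hypothetical name, and it assumes the script is called grid_env.sh and lives under $INSTALL_ROOT/etc/profile.d as described above.

```shell
# Source the grid environment of a relocatable installation, if present.
# $1 = INSTALL_ROOT of the tarball installation.
source_grid_env() {
    env_script="$1/etc/profile.d/grid_env.sh"
    if [ -r "$env_script" ]; then
        . "$env_script"
    else
        echo "grid environment script not found: $env_script" >&2
        return 1
    fi
}
```

Calling source_grid_env "$INSTALL_ROOT" from ~/.bashrc or a similar file makes the grid commands available in every new login shell. Users of csh-style shells would source grid_env.csh directly instead.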

Installing a UI this way puts all the CA certificates under $INSTALL_ROOT/etc/grid-security and adds a user cron job to download the crls. However, please note that you'll need to keep the CA certificates up to date yourself. You can do this by running

# $INSTALL_ROOT/glite/yaim/bin/yaim -r -s site-info.def -f install_certs_userland

Batch system specific issues

Important Note on batch systems

Important: note that the level of support and documentation varies considerably between the different batch systems. It is provided on a 'best effort' basis. If you feel that anything is missing from this guide, feel free to send the appropriate information to the maintainer of this guide.

The Torque/PBS batch system

For information about Torque itself, see the Torque home page. A version of Torque is included in the release and is installable from the gLite repository.

The gLite CE for Torque batch system

In the site-info.def you now have to use:

JOB_MANAGER=pbs
instead of the old lcgpbs value. If you set it to torque the CE won't work properly.

Running the Torque-server on the glite-CE

yaim -i -s site-info.def -m glite-CE -m glite-torque-server-config
yaim -c -s site-info.def -n gliteCE -n TORQUE_server

TORQUE_server is a configuration target provided to help configure Torque with the gliteCE or on a separate machine. There is no directly associated meta-rpm, but please use glite-torque-server-config to combine with the gliteCE (as illustrated above).

Running a separate Torque server

  • Run the following on the torque server
          /opt/glite/bin/BLParserPBS -p 33332 -s /var/spool/pbs
       
    BLParserPBS is from the glite-ce-blahp package.
  • Insert gLite CE hostname into /etc/hosts.equiv on torque server
  • On WNs put gLite CE hostname into /opt/edg/etc/edg-pbs-knownhosts.conf and run /opt/edg/sbin/edg-pbs-knownhosts (it will run as cron later anyway)

Note that the log-parser daemon must be started on whichever node is running the batch system. If your CE node is also the batch system head node, you have to run the log-parser here.

If you are running two CEs (typically LCG and gLite versions) please take care to ensure no collisions of pool account mapping. This is typically achieved either by allocating separate pool account ranges to each CE or by allowing them to share a gridmapdir.
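
If you choose separate pool account ranges, a quick check that two ranges do not collide can be sketched as below. This assumes, for illustration only, that the ranges are expressed as inclusive numeric UID ranges; uid_ranges_overlap is a hypothetical helper and the numbers in the usage note are made up.

```shell
# Succeed (exit status 0) if the inclusive ranges [$1,$2] and [$3,$4] overlap.
uid_ranges_overlap() {
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, uid_ranges_overlap 18000 18099 18050 18149 succeeds, signalling a collision, while uid_ranges_overlap 18000 18099 18100 18199 fails, meaning the two ranges are disjoint.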

The LCG CE for Torque batch system

Running Torque-server on lcg-CE

yaim -i -s site-info.def -m lcg-CE_torque 
yaim -c -s site-info.def -n lcg-CE_torque

In the CE configuration context (and also in the 'torque' LRMS one), a file with a list of managed nodes needs to be compiled. An example of this configuration file is given in /opt/glite/yaim/examples/wn-list.conf. The path of the file must then be set in the variable WN_LIST in the Site Configuration File.
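
For sites with regularly named worker nodes, the node list can be generated instead of written by hand. The sketch below assumes the file holds one host name per line (check this against the example in /opt/glite/yaim/examples/wn-list.conf) and uses a hypothetical naming scheme wnNNN.<domain>; make_wn_list is not part of YAIM.

```shell
# Print one worker-node host name per line, e.g. wn001.example.org.
# $1 = number of nodes, $2 = DNS domain of the site.
make_wn_list() {
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'wn%03d.%s\n' "$i" "$2"
        i=$((i + 1))
    done
}
```

For example, make_wn_list 3 example.org > wn-list.conf writes three host names, and WN_LIST in site-info.def would then point to that file.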

The Maui scheduler configuration provided with the script is currently very basic.

The WN for Torque batch system

WN with batch client configuration

yaim -i -s site-info.def -m glite-WN -m glite-torque-client-config
yaim -c -s site-info.def -n WN_torque

The LSF batch system

You have to make sure that the necessary packages for submitting jobs to your LSF batch system are installed on your CE. By default, the packages come as tar balls. At CERN they are converted into rpms so that they can be automatically rolled out and installed in a clean way (in this case using Quattor).

Since LSF is commercial software, it is not distributed together with the gLite middleware. Visit Platform's LSF home page for further information. You'll also need to buy an appropriate number of license keys before you can use the product.

The documentation for LSF is available on the Platform Manuals web page. You have to register in order to be able to access it.

For questions related to LSF and LCG/gLite interaction, you can use the project-eu-egee-batchsystem-lsf@cernNOSPAMPLEASE.ch mailing list.

The CEs for LSF batch system

There are some special configuration settings you need to apply when configuring your LSF batch system for the Grid. The most important parameters to set in YAIM's site-info.def file are shown below (the values are examples only):

JOB_MANAGER="lcglsf"
TORQUE_SERVER="machine where the gLite LSF log file parser runs"
BATCH_LOG_DIR="/path/to/where/the/lsf/accounting/and/event/files/are"
BATCH_BIN_DIR="/path/to/where/the/lsf/executables/are"
BATCH_VERSION="LSF_6.1"  
CE_BATCH_SYS="lsf"
For gLite installations you may use the gLite LSF log file parser daemon to access LSF accounting data over the network. The daemon needs access to the LSF event log files, which you can find on the master or on some common file system which you may use for failover. By default, yaim assumes that the daemon runs on the CE, in which case you have to make sure that the event log files are readable from the CE. The above TORQUE_SERVER setting is only needed if you run the log file parser daemon on a node other than the CE. Note that it is not a good idea to run the LSF master service on the CE.

Make sure that you are using lcg-info-dynamic-lsf-2.0.36 or newer.

To configure your CE, use the

./yaim -i -s site-info.def -m glite-CE
./yaim -c -s site-info.def -n glite-CE

commands.

The WNs for LSF batch system

Apart from the LSF-specific configuration settings, there is nothing special to do on the worker nodes. Just use the plain WN configuration target.
./yaim -i -s site-info.def -m glite-WN
./yaim -c -s site-info.def -n glite-WN

Note on site-BDII for LSF batch system

When you configure your site-BDII you have to populate the [vomap] section of the /opt/lcg/etc/lcg-info-dynamic-scheduler.conf file yourself. This is because LSF's internal group mapping is hard to figure out automatically from yaim, and to be on the safe side the site admin has to crosscheck it. Yaim configures the lcg-info-dynamic-scheduler to use the LSF info provider plugin, which comes with meaningful default values. If you would like or need to change them, edit the /opt/glite/etc/lcg-info-dynamic-lsf.conf file. After YAIM's configuration, list the LSF group to VOMS FQAN mappings in the [vomap] section. As an example, here is an extract from CERN's config file:
.
.
.
vomap :
   grid_ATLAS:atlas
   grid_ATLASSGM:/atlas/Role=lcgadmin
   grid_ATLASPRD:/atlas/Role=production
   grid_ALICE:alice
   grid_ALICESGM:/alice/Role=lcgadmin
   grid_ALICEPRD:/alice/Role=production
   grid_CMS:cms
   grid_CMSSGM:/cms/Role=lcgadmin
   grid_CMSPRD:/cms/Role=production
   grid_LHCB:lhcb
   grid_LHCBSGM:/lhcb/Role=lcgadmin
   grid_LHCBPRD:/lhcb/Role=production
   grid_GEAR:gear
   grid_GEARSGM:/gear/Role=lcgadmin
   grid_GEANT4:geant4
   grid_GEANT4SGM:/geant4/Role=lcgadmin
   grid_UNOSAT:unosat
   grid_UNOSAT:/unosat/Role=lcgadmin
   grid_SIXT:sixt
   grid_SIXTSGM:/sixt/Role=lcgadmin
   grid_EELA:eela
   grid_EELASGM:/eela/Role=lcgadmin
   grid_DTEAM:dteam
   grid_DTEAMSGM:/dteam/Role=lcgadmin
   grid_DTEAMPRD:/dteam/Role=production
   grid_OPS:ops
   grid_OPSSGM:/ops/Role=lcgadmin
module_search_path : ../lrms:../ett
For further details see the /opt/glite/share/doc/lcg-info-dynamic-lsf file.

The Condor batch system

To get the Condor middleware, go to the Condor home page. You have to ensure that the necessary Condor packages are installed on the CEs and on the WNs. On the site-BDII, YAIM configures the lcg-info-dynamic-scheduler to use the Condor info provider plugin.

You can use the project-eu-egee-batchsystem-condor@cernNOSPAMPLEASE.ch mailing list if you have problems concerning the interaction between gLite and Condor, rather than Condor alone.

IMPORTANT Please be careful when setting up and configuring your local Condor batch system. Read the following advice carefully: http://www.cs.wisc.edu/condor/osg_security_recommendations.html

The gLite CE for Condor batch system

https://twiki.cern.ch/twiki/bin/view/EGEE/InstallationInstructionsForCondorOnTheGLite-CE

The LCG CE for Condor batch system (paragraph under construction)

https://twiki.cern.ch/twiki/bin/view/EGEE/InstallationInstructionsForCondorOnTheLcg-CE

The SGE batch system

The integration of SGE in gLite is still work in progress. The sites using SGE have specified their local configurations in here and are now working together to provide a common way to deploy and install SGE using standard EGEE tools. This part of the guide will contain the common steps to be performed during the installation of an SGE site. For questions related to SGE and LCG/gLite interaction, you can use the project-eu-egee-batchsystem-sge@cernNOSPAMPLEASE.ch mailing list.

The gLite CE for SGE batch system

SGE support for the gLite CE is still under development. We expect to fill this gap soon...

The LCG CE for SGE batch system

WARNING: The software distributed here is still considered as beta. You use it at your own risk. It may be not fully optimized or correct and therefore, should be considered as experimental. There is no guarantee that it is compatible with the way in which your site is configured.

We will assume that the standard lcg-CE meta-package is already installed (but not configured) on the proper machines. The installation should have been performed using the instructions given in the previous sections of this manual. Follow the instructions below right before you reach the Middleware Configuration section.

SGE installation and configuration

  • Install the following SGE rpms (require openmotif >= 2.2.3-5 which can be installed from the SLC3 repository):
sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-qmon-V60u7_1-3.i386.rpm
sge-ckpt-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm
  • Install lcgCE-yaimtosge-0.0.0-2.i386.rpm which includes the modifications to the standard yaim tool allowing the SGE scheduler configuration. This rpm will require perl-XML-Simple >= 2.14-2.2 package which you can download from here. It also requires glite-yaim >= 3.0.0-34.
  • Add the following values to your site-info.def file:
SGE_QMASTER=$CE_HOST
DEFAULT_DOMAIN=$MY_DOMAIN
ADMIN_MAIL=<your_admin_email>
  • Configure the CE running SGE using the CE_sge node definition
[root@<your_ce> ~]# /opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> CE_sge

Notes
  • The SGE rpms will install a Qmaster service which, for now, we assume will be deployed on the CE. This SGE package set was built under SLC4 with the additional packaging of the libdb-4.2.so library in order for it to work in SLC3.
  • Check that the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables are properly defined in your site-info.def file. The content of these variables will be used to build the SGE exec node list, the SGE user sets and the SGE local queues. For the time being, VO users in the USERS_CONF file have to be defined following the same order as the QUEUES definition. Otherwise, the VO SGE userset will not correspond to the correct VO QUEUE. This will be fixed in the future...
  • The CE configuration must be always run before the WN configurations, otherwise the SGE daemons in the WNs will not be started since there is no Qmaster host associated to them.
  • SGE prompt commands will be accessible after a new login (to source the /etc/profile.d/ scripts).
  • To start the SGE GUI with the "qmon" command, you need to install xorg-x11-xauth >= 6.8.2-1. Unfortunately, this package is not available in the SLC3 repository and you have to download it from here in the SLC4 repository.
  • If you have configured your CE with wrong values for the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables, an easy way to fix this is to delete the /usr/local/sge/pro/default directory and run the CE configuration again.

RPMS Description:
  • lcgCE-yaimtosge-0.0.0-2.i386.rpm: Modification to standard glite yaim tool for lcg-CE integration using SGE as scheduler system. It will install:
/etc/profile.d/sge.sh (csh): To set the proper environment;
/opt/glite/yaim/scripts/configure_sgeserver.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE nodes functions definition;
/opt/glite/yaim/functions/config_sge_server: Configures SGE QMASTER;
/opt/globus/lib/perl/Globus/GRAM/JobManager/lcgsge.pm: The SGE jobmanager;
/opt/lcg/libexec/lcg-info-dynamic-sge: The SGE CE GRIS/GIIS perl script.
  • sge-V60u7_1-3.i386.rpm: Contains the binaries and libraries needed to run SGE commands;
  • sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities;
  • sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons;
  • sge-ckpt-V60u7_1-3.i386.rpm: For checkpointing purposes;
  • sge-parallel-V60u7_1-3.i386.rpm: For running parallel environments, such as OpenMpi, Mpich, etc;
  • sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples;
  • sge-qmon-V60u7_1-3.i386.rpm: The SGE GUI.

RPMS Download:

The WNs for SGE batch system

  • Please install the following sge packages:
sge-V60u7_1-3.i386.rpm 
sge-utils-V60u7_1-3.i386.rpm 
sge-daemons-V60u7_1-3.i386.rpm 
sge-parallel-V60u7_1-3.i386.rpm 
sge-docs-V60u7_1-3.i386.rpm
  • Install gliteWN-yaimtosge-0.0.0-2.i386.rpm, which contains the modifications to the standard yaim tool that enable the SGE client configuration.
  • Use the same site-info.def file as in the CE Gatekeeper case. This file should already include definitions for the "SGE_QMASTER", "DEFAULT_DOMAIN" and "ADMIN_MAIL" variables.
  • Configure the WN using the "WN_sge" node definition:
[root@<your_wn> ~]# /opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> WN_sge
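
To verify that the packages listed above are installed before configuring the WN, a check along the following lines can be used. It is a sketch only: rpm_name and missing_rpms are hypothetical helpers, and they rely on the usual rpm file naming scheme name-version-release.arch.rpm to derive the package name that is passed to rpm -q.

```shell
# Sketch only: rpm_name and missing_rpms are hypothetical helpers.
# rpm_name derives the package name from a file named
# name-version-release.arch.rpm (the usual rpm naming scheme).
rpm_name() {
    base=${1##*/}       # drop any leading directory
    base=${base%.rpm}   # drop the .rpm suffix
    base=${base%.*}     # drop the architecture
    base=${base%-*}     # drop the release
    base=${base%-*}     # drop the version
    echo "$base"
}

# Print every rpm file whose package is not reported as installed by rpm -q.
missing_rpms() {
    for f in "$@"; do
        rpm -q "$(rpm_name "$f")" >/dev/null 2>&1 || echo "$f"
    done
}
```

For example, missing_rpms sge-V60u7_1-3.i386.rpm sge-utils-V60u7_1-3.i386.rpm sge-daemons-V60u7_1-3.i386.rpm sge-parallel-V60u7_1-3.i386.rpm sge-docs-V60u7_1-3.i386.rpm gliteWN-yaimtosge-0.0.0-2.i386.rpm lists the files that still have to be installed. Note that only the package name is checked, not the installed version.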

RPMS Description:
  • gliteWN-yaimtosge-0.0.0-2.i386.rpm: Modification to standard glite yaim tool for glite-WN integration using SGE as scheduler system. It will install:
/etc/profile.d/sge.sh (csh): To set the proper environment;
/opt/glite/yaim/scripts/configure_sgeclient.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE nodes functions definition;
/opt/glite/yaim/functions/config_sge_client: Configures SGE exec host;
  • sge-V60u7_1-3.i386.rpm: Contains the binaries and libraries needed to run SGE commands;
  • sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities;
  • sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons;
  • sge-parallel-V60u7_1-3.i386.rpm: For running parallel environments, such as OpenMpi, Mpich, etc;
  • sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples.

RPMS Download:

How to do an upgrade

There is no general upgrade procedure. An upgrade usually consists of a repository update and a package upgrade, followed by a reconfiguration. Whenever a reconfiguration of any kind is necessary, this is explicitly mentioned in the description provided with the patch/update.
apt-get update
apt-get dist-upgrade
/opt/glite/yaim/bin/yaim -c -s site-info.def -n (nodetypetoupgrade)
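
The three commands above can be wrapped so that each step runs only if the previous one succeeded. This is a minimal sketch: upgrade_node is a hypothetical helper, and the APT_GET and YAIM variables are assumptions introduced here so that the commands can be replaced for a trial run.

```shell
# Sketch only: upgrade_node is a hypothetical wrapper around the commands above.
# APT_GET and YAIM are assumed override variables, not part of yaim.
upgrade_node() {
    site_info=${1:-}
    node_type=${2:-}
    if [ -z "$site_info" ] || [ -z "$node_type" ]; then
        echo "usage: upgrade_node <site-info.def> <node_type>" >&2
        return 2
    fi
    # Each step runs only if the previous one succeeded.
    "${APT_GET:-apt-get}" update &&
    "${APT_GET:-apt-get}" dist-upgrade &&
    "${YAIM:-/opt/glite/yaim/bin/yaim}" -c -s "$site_info" -n "$node_type"
}
```

For example, upgrade_node site-info.def <node_type> runs the update, the upgrade and the reconfiguration in sequence. If the description of the patch/update does not ask for a reconfiguration, running only the two apt-get commands is sufficient.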

Firewalls

No automatic firewall configuration is provided by this version of the configuration scripts. If your nodes are behind a firewall, you will have to ask your network manager to open a few "holes" to allow external access to some service nodes. A complete map of the ports that have to be accessible for each service node is maintained in CVS: http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.site-info.ports/doc/?only_with_tag=HEAD , or you can have a look at its HTML version.

Docs, support

For further documentation you can visit:

Your contact point for support is your ROC. http://egee-sa1.web.cern.ch/egee-sa1/roc.html

Gergely Debreczeni Gergely.Debreczeni@cernNOSPAMPLEASE.ch

Topic revision: r42 - 2008-07-23 - MariaALANDESPRADILLO
 