Generic Installation and Configuration Guide for gLite 3.0
Important note: gLite 3.1 should now be used. Please update your nodes and check the 3.1 Generic Installation and Configuration Guide. This guide is no longer maintained and may be out of date.
This document is addressed to Site Administrators in charge of middleware
installation and configuration. It is a generic guide to manual installation and configuration for
any supported node types. Links to the latest configuration tools (like
YAIM) and to their release independent
descriptions are provided in-line, where necessary.
Introduction to Manual Installation and Configuration
This guide provides a fast method to install and configure the gLite middleware on the
various node types (WN, UI, CE, SE ...) on top of the following Linux distributions:
- Scientific Linux 3.0
- Scientific Linux 4.0 (only for UI and WN)
- Debian (only for the so called TAR_UIWN)
The proposed installation and configuration method for SL3 is based on the Debian
apt-get
tool and on a set of shell scripts built within the YAIM framework. For a description of YAIM, see the web page for the version being used:
YAIM guide
The provided scripts can be used by Site Administrators with no need for in-depth knowledge of specific middleware configuration details: they only have to edit
three configuration files, according to the provided examples.
The resulting configuration is a default site configuration. Local customizations and tuning of the middleware, if needed, can then be done manually.
New versions of this document will be distributed synchronously with the middleware releases and will describe the current state of the art of the installation and configuration procedures.
A companion document with the upgrade procedures to manually update the configuration of the nodes from the previous LCG/gLite version to the current one is also part of the release.
The OS Installation
The current version of the gLite Middleware runs on Scientific Linux 3 (SL3).
The web page with all the needed information is the following:
http://www.scientificlinux.org
The site where the sources and the images (iso) to create the CDs can be found is
ftp://ftp.scientificlinux.org/linux/scientific/30x/iso/
Most middleware testing has been carried out on CERN Scientific Linux 3 (SLC3)
http://linuxsoft.cern.ch/
but the middleware should run on any binary compatible distribution.
Java Installation
You should install the Java SDK (1.4.2 or greater is required) on your system before installing the middleware. Download it from the SUN Java web site:
http://java.sun.com/j2se/1.4.2/download.html . You must install the
J2SDK as an rpm package (if you do not install it in RPM format you will not be able to install the middleware): on the SUN Java web page follow the link "RPM in self-extracting file", then follow the instructions provided by SUN.
Set the variable JAVA_LOCATION in your
site-info.def
(YAIM configuration file) to your Java installation directory.
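For example (the path below is only an illustrative value; adjust it to wherever your Java rpm actually installed):

```shell
# Example site-info.def entry; the directory is an assumption, use your own
JAVA_LOCATION="/usr/java/j2sdk1.4.2_12"
```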
Node synchronization, NTP installation and configuration
A general requirement for the gLite nodes is that they are synchronized. This requirement may be fulfilled in several ways. If your nodes run under AFS most likely they are already synchronized. Otherwise, you can use the NTP protocol with a time server.
Instructions and examples for a NTP client configuration are provided in this section. If you are not planning to use a time server on your machine you can just
skip it and jump to the
next section.
Use the latest ntp version available for your system. If you are using APT, an apt-get install ntp will do the work.
- Edit the file /etc/ntp/step-tickers adding a list of your time server(s) hostname(s), as in the following example:
137.138.16.69
137.138.17.69
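Besides step-tickers, the ntp daemon itself reads its servers from /etc/ntp.conf. A minimal client configuration could look like this (the server addresses repeat the example above; the driftfile path is the usual Red Hat default, adjust if your system differs):

```
server 137.138.16.69
server 137.138.17.69
driftfile /etc/ntp/drift
```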
- If you are running a kernel firewall, you will have to allow inbound communication on the NTP port. If you are using iptables, you can add the following to /etc/sysconfig/iptables
-A INPUT -s NTP-serverIP-1 -p udp --dport 123 -j ACCEPT
-A INPUT -s NTP-serverIP-2 -p udp --dport 123 -j ACCEPT
Remember that, in the provided examples, rules are parsed in order, so ensure that there are no matching REJECT lines preceding those that you add. You can then reload the firewall
# /etc/init.d/iptables restart
- Activate the ntpd service with the following commands:
# ntpdate <your ntp server name>
# service ntpd start
# chkconfig ntpd on
- You can check ntpd's status by running the following command
# ntpq -p
The rpm installation tools: apt-get, yum
Before you proceed further, please
make sure that Java is installed on your system.
The apt package manager
- Download the latest version of the
apt
tool, if it is not already installed. The rpm below is for Scientific Linux (this is an example, use the one appropriate to your OS).
# wget ftp://ftp.scientificlinux.org/linux/scientific/30x/i386/SL/RPMS/apt-XXX.i386.rpm
- Install apt On Scientific Linux:
# rpm -ivh apt-XXX.i386.rpm
- Configure apt In order to perform the Middleware and CA installation with the methods described in this guide, you just need to configure in the Site Configuration File (
site-info.def
) the variables LCG_REPOSITORY
and CA_REPOSITORY
as follows:
LCG_REPOSITORY="rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates"
CA_REPOSITORY="rpm http://linuxsoft.cern.ch/ LCG-CAs/current production"
Please note that for the dependencies of the middleware to be met, you'll have to make sure that apt can find and download your OS rpms. This typically means you'll have to install an rpm called 'apt-sourceslist', or else create an appropriate file in your /etc/apt/sources.list.d directory.
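For illustration, the gLite repositories themselves can be handed to apt through a file such as /etc/apt/sources.list.d/glite.list (the file name is an assumption; the repository lines mirror the LCG_REPOSITORY and CA_REPOSITORY strings above):

```
rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals Release3.0 updates
rpm http://linuxsoft.cern.ch/ LCG-CAs/current production
```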
The yum package manager
TO BE COMPLETED
Important note on automatic updates
Several sites use an automatic update mechanism. Sometimes a middleware update or upgrade requires non-trivial configuration changes or simply a reconfiguration of the service.
This could involve database schema changes, service restarts, the appearance of new configuration files, etc., which makes it difficult to ensure that automatic updates
will not break a service. Thus
WE STRONGLY RECOMMEND NOT TO USE ANY KIND OF AUTOMATIC UPDATE PROCEDURE
against the gLite middleware repositories; instead, do the upgrade manually when an Update has been released!
About platforms and OSes.
Using RHEL3 compatible distributions other than CERN Scientific Linux
If you are not using SLC3 but another binary compatible distribution, it is highly recommended that you configure apt-get to give priority, during the installation, to the packages listed within your distribution.
In order to have all the known dependencies possibly solved by apt-get you should have at least the following lists in your
/etc/apt/sources.list.d/
:
- lcg.list
- lcg-ca.list
- your-os.list
The first two are distributed by the 'apt-sourceslist' rpm, the third one is your local one.
Since the deployment team is based at CERN and uses the local installation, it is still possible that, with this bare configuration, some dependencies, though dealt with, cannot be solved because the binary compatible distribution you use does not provide the entire set of packages that CERN SL3 does.
If you prefer not to handle these issues manually, you can add another list (e.g. cern.list) in the /etc/apt/sources.list.d/ directory:
### List of available apt repositories available from linuxsoft.cern.ch
### suitable for your system.
###
### See http://cern.ch/linux/updates/ for a list of other repositories and mirrors.
### 09.06.2004
###
# THE default
rpm http://linuxsoft.cern.ch cern/slc30X/i386/apt os updates extras
rpm-src http://linuxsoft.cern.ch cern/slc30X/i386/apt os updates extras
Then you have to configure your apt-get preferences in order to give priority to your OS and not to CERN SLC3.
A
/etc/apt/preferences
file like the following will give priority to your OS in any case, except when the package that you need is not present in your OS repository:
Package: *
Pin: release o=your-os.your-domain.org
Pin-Priority: 980
Package: *
Pin: release o=linux.cern.ch
Pin-Priority: 970
If you are not using apt to install, you can pull the packages directly from SLC3's repository using wget. The address is
http://linuxsoft.cern.ch/cern/slc305/i386/apt/
.
You can use the
apt-cache policy
command to verify that the preferences are properly configured.
Configuration Tool: YAIM
In order to know the latest version of YAIM running in production, you can check the
YAIM planning page, where each yaim module is listed.
Note on YAIM and gLite nodes
This release of gLite contains components from earlier versions where all configuration was done through XML files. When configuring these components, yaim populates the appropriate XML files and runs their config scripts. Please note that any direct modifications you make to the XML files, for parameters not managed by yaim, will be preserved after a reconfiguration by YAIM. Parameters managed by yaim will be clearly marked in the XML after it has been run. The intention is that yaim offers a simple interface if preferred, but the ability to use the more powerful native mechanism is retained.
Please use yaim to configure pool accounts. Yaim allows non-contiguous ranges of uids which some sites require and is therefore the default user configuration mechanism.
Installing YAIM
The necessary YAIM modules needed to configure a certain node type are automatically installed with the middleware. However, if you want to install YAIM rpms separately, you can run
apt-get install glite-yaim-node-type
after configuring properly the APT string mentioned in the
APT package manager section.
This will automatically install the YAIM module you are interested in together with yaim core, which contains the core functions and utilities used by all the YAIM modules.
For a list of available YAIM modules please check
this list.
For a detailed description on how to configure the middleware with YAIM, please check the
YAIM guide.
Middleware installation, configuration in general
Consult the
Yaim Guide for details on how to install the middleware.
Certification Authorities
The installation of the up-to-date version of the Certification Authorities (CA) is automatically done by the middleware installation described above.
However, as the list and structure of Certification Authorities (CA) accepted by the LCG project can change independently of the middleware releases, the rpm list related to the CA certificates and URLs has been decoupled from the standard gLite/LCG release procedure. You should consult the page
http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html
in order to ascertain what the version number of the latest set of CA rpms is. In order to upgrade the CA list of your node to the latest version, you can simply run on the node the command:
# apt-get update && apt-get -y install lcg-CA
In order to keep the CA configuration up-to-date on your node we strongly recommend Site Administrators to program a periodic upgrade procedure of the CA on the installed node (e.g. running the above command via a daily cron job).
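For instance, a cron entry along the following lines runs the update nightly (the file name and schedule are just an example; the command is the one given above):

```
# /etc/cron.d/lcg-CA-update -- example file name; runs daily at 04:15
15 4 * * * root apt-get update && apt-get -y install lcg-CA
```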
Host Certificates
All nodes except UI, WN and BDII require the host certificate/key files before you start their installation.
Contact your national Certification Authority (CA) to understand how to obtain a host certificate if you do not have one already.
Instructions to obtain a CA list can be found at
http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html
From the CA list so obtained you should choose a CA close to you.
Once you have obtained a valid certificate, i.e. a file
- hostcert.pem containing the machine public key and a file
- hostkey.pem containing the machine private key
place the two files on the target node in the directory
/etc/grid-security
and check the access rights: hostkey.pem must be readable only by root, while the certificate must be readable by everybody.
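The placement and permission steps above can be sketched as a small helper (a sketch, not part of the release; the function name is hypothetical and the target directory defaults to /etc/grid-security as described):

```shell
#!/bin/sh
# install_host_cert CERT KEY [DIR] -- copy the host credentials into place
# and set the access rights required by the middleware (sketch).
install_host_cert() {
    cert="$1"; key="$2"; dir="${3:-/etc/grid-security}"
    mkdir -p "$dir"
    cp "$cert" "$dir/hostcert.pem"
    cp "$key"  "$dir/hostkey.pem"
    chmod 644 "$dir/hostcert.pem"   # certificate: readable by everybody
    chmod 400 "$dir/hostkey.pem"    # private key: readable only by root
    # ownership step needs root; ignore failure when testing as a plain user
    chown root:root "$dir/hostcert.pem" "$dir/hostkey.pem" 2>/dev/null || true
}
```

Usage: install_host_cert /path/to/hostcert.pem /path/to/hostkey.pem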
Middleware configuration for this release, node-specific installation advice
In this section we list the configuration steps actually needed to complete the configuration of the desired node but not covered by the automatic configuration scripts.
If a given node type does not appear in this section, it means that its configuration is complete.
The gLite WMS and LB service
To install the glite WMS + glite LB (recommended deployment scenario)
./yaim -i -s site-info.def -m glite-WMSLB
./yaim -c -s site-info.def -n glite-WMS -n glite-LB
For the installation you have to have the following repository added:
rpm http://glitesoft.cern.ch/EGEE/gLite/APT/R3.0/ rhel30 externals.condor
The gLite CE service
You don't want to install a glite-CE!
The MON and E2EMONIT service
You can add
E2EMONIT to your MON box like this
yaim -i -s site-info.def -m glite-MON_e2emonit
yaim -c -s site-info.def -n MON E2EMONIT
The WN service without batch system
To install the plain glite-WN (without any batch system client):
yaim -i -s site-info.def -m glite-WN
yaim -c -s site-info.def -n glite-WN
The FTS service
There is still a manual step required in configuring FTS
https://uimon.cern.ch/twiki/bin/view/LCG/FtsServerInstall15
At the present time, the FTS requires a different proxy server to that used by the broker. Please ensure this restriction is respected in the site-info.def file you use to configure the File Transfer Server.
Please see the FTS install guides for more information
https://uimon.cern.ch/twiki/bin/view/LCG/FtsRelease15
https://uimon.cern.ch/twiki/bin/view/LCG/FtsServerInstall15
The Classic SE
You should not use/install the Classic SE service any more. Just forget it and have a look at the DPM service.
The Site BDII
The following steps are needed to configure a site BDII:
./yaim -i -s site-info.def -m glite-BDII
apt-get install lcg-info-templates
./yaim -c -s site-info.def -n BDII_site
If you want your site-BDII on the lcg-CE, you have to run the configuration in one shot:
./yaim -c -s site-info.def -n lcg-CE -n BDII_site
The Top-Level BDII
There are no special steps. Simply:
./yaim -i -s site-info.def -m glite-BDII
./yaim -c -s site-info.def -n BDII_top
The dCache service
The complete description of dCache installation via YAIM is available
here.
The LFC service
- When installing an LFC_oracle, it is the site admin who has to configure and ensure the correct Oracle environment settings. Namely, the
/home/$LFCUSER/.tnsadmin
file should contain the correct settings or be a symlink to your configuration files.
Otherwise just do
./yaim -i -s site-info.def -m glite-LFC_mysql/oracle
./yaim -c -s site-info.def -n glite-LFC_mysql/oracle
The DPM service
There are no special steps. Simply:
./yaim -i -s site-info.def -m glite-SE_dpm_mysql
./yaim -c -s site-info.def -n glite-SE_dpm_mysql
The VOBOX service
Site admins must ensure that the experiment software installation area is accessible (i.e. mounted) from the VOBOX.
In the VOBOX installation it is crucial to have the
$MYPROXY_SERVER
env variable (
PX_HOST
in the yaim site-info.def) set to the CERN myproxy server (myproxy.cern.ch). Even if you have a private myproxy server in your site, configure the VOBOX to point to the CERN one.
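In site-info.def this corresponds to the following setting (the value is the CERN server stated above):

```shell
# VOBOX: point proxy renewal at the CERN myproxy server, as required above
PX_HOST=myproxy.cern.ch
```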
The site administrator must communicate the name of the VOBOX to the myproxy.cern.ch service administrator (email both
hep-project-grid-cern-testbed-managers@cernNOSPAMPLEASE.ch and
support-eis@cernNOSPAMPLEASE.ch ) so that it is included in the list of authorized renewers. If this is not done, the renewal agent of the VOBOX will not work.
The relocatable distribution, the TAR UI and TAR WN
Introduction
We are now supplying a tarred distribution of the middleware which can be used to install a UI or a WN. It can be used on Debian as well as SL3. You can untar the distribution somewhere on a local disk, or replicate it across a number of nodes via a network share. You can also use this distribution to install a UI without root privileges - there is a quick guide
here to do that.
Once you have the middleware directory available, you must edit the site-info.def file as usual, putting the location of the middleware into the variable INSTALL_ROOT.
If you are sharing the distribution to a number of nodes, commonly WNs, then they should all mount the tree at INSTALL_ROOT. You should configure the middleware on one node (remember you'll need to mount with appropriate privileges) and then it should work for all the others if you set up your batch system and the CA certificates in the usual way. If you'd rather have the CAs on your share, the yaim function install_certs_userland may be of interest. You may want to mount your share ro after the configuration has been done.
Getting the software
You can download the latest gliteUI_WN-3.x.y-z.tar.gz and gliteUI_WN-3.x.y-z-userdeps.tar.gz tar files from
http://grid-deployment.web.cern.ch/grid-deployment/download/relocatable/
Dependencies
The middleware in the relocatable distribution has certain dependencies.
We've made this software available as a second tar file which you can download and untar under $INSTALL_ROOT. This means that if you untarred the main distribution under /opt/LCG, you must untar the supplementary files in the same place. Please note that in earlier distributions the deps were untarred elsewhere.
If you have administrative access to the nodes, you could alternatively use the TAR dependencies rpm.
/opt/glite/yaim/scripts/install_node site-info.def glite-TAR
For Debian, here is a list of packages which are required for the tarball to work
perl-modules python2.2 libexpat1 libx11-6 libglib2.0-0 libldap2 libstdc++2.10-glibc2.2 tcl8.3-dev
libxml2 termcap-compat libssl0.9.7 tcsh rpm rsync cpp gawk openssl wget
To configure a UI or WN
Run the yaim configuration, adding the type of node as an argument:
/opt/glite/yaim/bin/yaim -c -s site-info.def -n [ TAR_WN | TAR_UI ]
Note that the script will not configure any LRMS. If you're configuring torque for the first time, you may find the config_users and config_torque_client yaim functions useful. These can be invoked like this
# ${INSTALL_ROOT}/glite/yaim/bin/yaim -r -s site-info.def -f config_users
# ${INSTALL_ROOT}/glite/yaim/bin/yaim -r -s site-info.def -f config_torque_client
Installing a UI as a non-root user
You can find a quick guide to this
here.
If you don't have root access, you can use the supplementary tarball mentioned above to ensure that the dependencies of the middleware are satisfied. The middleware requires java (see the Java Installation section), which you can install in your home directory if it's not already available. Please make sure you set the JAVA_LOCATION variable in your site-info.def. You'll probably want to alter the OUTPUT_STORAGE variable there too, as it's set to /tmp/jobOutput by default and it may be better pointing somewhere in your home directory.
Once the software is all unpacked, you should run
# $INSTALL_ROOT/glite/yaim/bin/yaim -c -s site-info.def -n TAR_UI
to configure it.
Finally, you'll have to set up some way of sourcing the environment necessary to run the grid software. A script will be available under $INSTALL_ROOT/etc/profile.d for this purpose. Source grid_env.sh or grid_env.csh depending upon your choice of shell.
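For example, a bash user could add a line like this to their shell startup file (the path assumes the default layout described above):

```shell
. $INSTALL_ROOT/etc/profile.d/grid_env.sh
```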
Installing a UI this way puts all the CA certificates under $INSTALL_ROOT/etc/grid-security and adds a user cron job to download the crls. However, please note that you'll need to keep the CA certificates up to date yourself. You can do this by running
# /opt/glite/yaim/bin/yaim -r -s site-info.def -f install_certs_userland
Batch system specific issues
Important Note on batch systems
Important: Note that the support and documentation of the different batch systems vary considerably. They are provided on a 'best effort' basis. If you feel
that anything is missing from this guide, feel free to send the appropriate info to the maintainer of this guide.
The Torque/PBS batch system
To find some info about Torque itself see:
the Torque home page.
An instance of Torque is included in the release and is installable from the gLite repository.
The gLite CE for Torque batch system
In the
site-info.def
now you have to use:
JOB_MANAGER=pbs
instead of the old
lcgpbs
value. If you set it to
torque
the CE won't work properly.
Running the Torque-server on the glite-CE
yaim -i -s site-info.def -m glite-CE -m glite-torque-server-config
yaim -c -s site-info.def -n gliteCE -n TORQUE_server
TORQUE_server is a configuration target provided to help configure Torque with the gliteCE or on a separate machine. There is no directly associated meta-rpm, but please use
glite-torque-server-config
to combine with the gliteCE (as illustrated above).
Running a separate Torque server
Note that the log-parser daemon must be started on whichever node is running the batch system. If your CE node is also the batch system head node, you have to run the log-parser here.
If you are running two CEs (typically LCG and gLite versions) please take care to ensure no collisions of pool account mapping. This is typically achieved either by allocating separate pool account ranges to each CE or by allowing them to share a gridmapdir.
The LCG CE for Torque batch system
Running Torque-server on lcg-CE
yaim -i -s site-info.def -m lcg-CE_torque
yaim -c -s site-info.def -n lcg-CE_torque
In the CE configuration context (and also in the 'torque' LRMS one), a file with a list of managed nodes needs to be compiled. An example of this configuration file is given in
/opt/glite/yaim/examples/wn-list.conf
The file path then needs to be referenced by the variable
WN_LIST
in the Site Configuration File.
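The file itself is simply a list of worker node hostnames, one per line; for example (the hostnames are of course hypothetical):

```
wn001.your-domain.org
wn002.your-domain.org
wn003.your-domain.org
```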
The Maui scheduler configuration provided with the script is currently very basic.
The WN for Torque batch system
WN with batch client configuration
yaim -i -s site-info.def -m glite-WN -m glite-torque-client-config
yaim -c -s site-info.def -n WN_torque
The LSF batch system
You have to make sure that the necessary packages for submitting jobs to your
LSF batch system are installed on your CE. By default, the packages come as tarballs. At
CERN they are converted into rpms so that they can be automatically rolled out and installed in a clean way (in this case using Quattor).
Since
LSF is commercial software, it is not distributed together with the gLite middleware. Visit
Platform's LSF home page for further information. You'll also need to buy an appropriate number of license keys before you can use the product.
The documentation for
LSF is available on the
Platform Manuals web page. You have to register in order to be able to access it.
For questions related to
LSF and LCG/gLite interaction, you can use the
project-eu-egee-batchsystem-lsf@cernNOSPAMPLEASE.ch mailing list.
The CEs for LSF batch system
There are some special configuration settings you need to apply when configuring your
LSF batch system for the Grid. The most important parameters to
set in YAIM's
site-info.def
file are the following (example values only):
JOB_MANAGER="lcglsf"
TORQUE_SERVER="machine where the gLite LSF log file parser runs"
BATCH_LOG_DIR="/path/to/where/the/lsf/accounting/and/event/files/are"
BATCH_BIN_DIR="/path/to/where/the/lsf/executables/are"
BATCH_VERSION="LSF_6.1"
CE_BATCH_SYS="lsf"
For gLite installations you may use the gLite
LSF log file parser daemon to access
LSF accounting data over the network. The daemon needs access to the
LSF event log files, which you can find on the master or on some common file system which you may use for failover. By default, yaim assumes that the daemon runs on the CE, in which case you have to make sure that the event log files are readable from the CE. The above setting for TORQUE_SERVER is only needed if you run the log file parser daemon on a different node than the CE. Note that it is not a good idea to run the
LSF master service on the CE.
Make sure that you are using lcg-info-dynamic-lsf-2.0.36 or newer.
To configure your CE, use the
./yaim -i -s site-info.def -m glite-CE
./yaim -c -s site-info.def -n glite-CE
commands.
The WNs for LSF batch system
Apart from the
LSF specific configuration settings there is nothing special to do on the worker nodes. Just use the plain WN configuration target.
./yaim -i -s site-info.def -m glite-WN
./yaim -c -s site-info.def -n glite-WN
Note on site-BDII for LSF batch system
When you configure your site-BDII you have to populate the [vomap] section of the
/opt/lcg/etc/lcg-info-dynamic-scheduler.conf
file
yourself. This is because
LSF's internal group mapping is hard to figure out automatically from yaim, and to be on the safe side the site admin
has to crosscheck. Yaim configures the lcg-info-dynamic-scheduler to use the
LSF info provider plugin, which comes with meaningful default values.
If you would like / need to change it edit the
/opt/glite/etc/lcg-info-dynamic-lsf.conf
file. After YAIM's configuration you have to list the
LSF group -
VOMS FQAN - mappings in the [vomap] section of the
/opt/lcg/etc/lcg-info-dynamic-scheduler.conf
file.
As an example you see here an extract from CERN's config file:
.
.
.
vomap :
grid_ATLAS:atlas
grid_ATLASSGM:/atlas/Role=lcgadmin
grid_ATLASPRD:/atlas/Role=production
grid_ALICE:alice
grid_ALICESGM:/alice/Role=lcgadmin
grid_ALICEPRD:/alice/Role=production
grid_CMS:cms
grid_CMSSGM:/cms/Role=lcgadmin
grid_CMSPRD:/cms/Role=production
grid_LHCB:lhcb
grid_LHCBSGM:/lhcb/Role=lcgadmin
grid_LHCBPRD:/lhcb/Role=production
grid_GEAR:gear
grid_GEARSGM:/gear/Role=lcgadmin
grid_GEANT4:geant4
grid_GEANT4SGM:/geant4/Role=lcgadmin
grid_UNOSAT:unosat
grid_UNOSAT:/unosat/Role=lcgadmin
grid_SIXT:sixt
grid_SIXTSGM:/sixt/Role=lcgadmin
grid_EELA:eela
grid_EELASGM:/eela/Role=lcgadmin
grid_DTEAM:dteam
grid_DTEAMSGM:/dteam/Role=lcgadmin
grid_DTEAMPRD:/dteam/Role=production
grid_OPS:ops
grid_OPSSGM:/ops/Role=lcgadmin
module_search_path : ../lrms:../ett
For further details see the
/opt/glite/share/doc/lcg-info-dynamic-lsf
file.
The Condor batch system
To get the condor middleware go to the
Condor home page.
You have to ensure yourself that the necessary condor packages are installed on the CEs and on the WNs.
On the site-BDII YAIM configures the
lcg-info-dynamic-scheduler
to use the condor info provider plugin.
You can use the
project-eu-egee-batchsystem-condor@cernNOSPAMPLEASE.ch mailing list if you have problems concerning
the gLite and Condor interaction, not only Condor itself.
IMPORTANT Please be careful when setting up and configuring your local Condor batch system. Read carefully the following advice:
http://www.cs.wisc.edu/condor/osg_security_recommendations.html
The gLite CE for Condor batch system
https://twiki.cern.ch/twiki/bin/view/EGEE/InstallationInstructionsForCondorOnTheGLite-CE
The LCG CE for Condor batch system (paragraph under construction)
https://twiki.cern.ch/twiki/bin/view/EGEE/InstallationInstructionsForCondorOnTheLcg-CE
The SGE batch system
The integration of
SGE in gLite is still work in progress. The sites using
SGE have specified their local configurations
here and are now working together to provide a common way to deploy and install
SGE using standard EGEE tools.
This part of the guide will contain the common steps to be performed during the installation of an
SGE site. For questions related to
SGE and LCG/gLite interaction, you can use the
project-eu-egee-batchsystem-sge@cernNOSPAMPLEASE.ch mailing list.
The gLite CE for SGE batch system
SGE support for the gLite CE is still under development. We expect to fill this gap soon...
The LCG CE for SGE batch system
WARNING: The software distributed here is still considered as
beta. You use it at your own risk. It may not be fully optimized or correct and should therefore be considered experimental. There is no guarantee that it is compatible with the way in which your site is configured.
We will assume that the standard lcg-CE meta-package is already installed (but not configured) in the proper machines. The installation should have been performed using the instructions proposed in the previous sections of this manual. You should start to follow the following instructions right before you reach the Middleware Configuration section.
SGE installation and configuration
- Install the following SGE rpms (require openmotif >= 2.2.3-5 which can be installed from the SLC3 repository):
sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-qmon-V60u7_1-3.i386.rpm
sge-ckpt-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm
- Install lcgCE-yaimtosge-0.0.0-2.i386.rpm which includes the modifications to the standard yaim tool allowing the SGE scheduler configuration. This rpm will require perl-XML-Simple >= 2.14-2.2 package which you can download from here. It also requires glite-yaim >= 3.0.0-34.
- Add the following values to your site-info.def file:
SGE_QMASTER=$CE_HOST
DEFAULT_DOMAIN=$MY_DOMAIN
ADMIN_MAIL=<your_admin_email>
- Configure the CE running SGE using the CE_sge node definition
[root@<your_ce> ~]#/opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> CE_sge
Notes
- The SGE rpms will install a Qmaster service which, for now, we assume will be deployed on the CE. This SGE package set was built under SLC4, with the additional packaging of the libdb-4.2.so library so that it works on SLC3.
- Check that the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables are properly defined in your site-info.def file. The content of these variables will be used to build the SGE exec node list, the SGE user sets and the SGE local queues. For the time being, VO users in the USERS_CONF file have to be defined following the same order as the QUEUES definition. Otherwise, the VO SGE userset will not correspond to the correct VO QUEUE. This will be fixed in the future...
- The CE configuration must always be run before the WN configurations, otherwise the SGE daemons on the WNs will not be started, since there is no Qmaster host associated with them.
- SGE command-line tools will be accessible after a new login (which sources the /etc/profile.d/ scripts).
- To start the SGE GUI, use the "qmon" command. You need to install xorg-x11-xauth >= 6.8.2-1. Unfortunately, this package is not available in the SLC3 repository and you have to download it from here, in the SLC4 repository.
- If you have configured your CE with wrong values for the "WN_LIST", "USERS_CONF", "VOS" and "QUEUES" variables, an easy way to fix this is to delete the /usr/local/sge/pro/default directory and run the CE configuration again.
RPMS Description:
- lcgCE-yaimtosge-0.0.0-2.i386.rpm: Modification to standard glite yaim tool for lcg-CE integration using SGE as scheduler system. It will install:
/etc/profile.d/sge.sh (csh): To set the proper environment;
/opt/glite/yaim/scripts/configure_sgeserver.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE nodes functions definition;
/opt/glite/yaim/functions/config_sge_server: Configures SGE QMASTER
/opt/globus/lib/perl/Globus/GRAM/JobManager/lcgsge.pm: The SGE jobmanager;
/opt/lcg/libexec/lcg-info-dynamic-sge: The SGE CE GRIS/GIIS perl script.
- sge-V60u7_1-3.i386.rpm: Contains the binaries and libraries needed to run sge commands;
- sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities;
- sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons;
- sge-ckpt-V60u7_1-3.i386.rpm: For checkpointing purposes;
- sge-parallel-V60u7_1-3.i386.rpm: For running parallel environments, such as OpenMPI, MPICH, etc.;
- sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples;
- sge-qmon-V60u7_1-3.i386.rpm: The SGE GUI interface;
RPMS Download:
The WNs for SGE batch system
- Please install the following sge packages:
sge-V60u7_1-3.i386.rpm
sge-utils-V60u7_1-3.i386.rpm
sge-daemons-V60u7_1-3.i386.rpm
sge-parallel-V60u7_1-3.i386.rpm
sge-docs-V60u7_1-3.i386.rpm
- Install gliteWN-yaimtosge-0.0.0-2.i386.rpm which includes the modifications to the standard yaim tool allowing the SGE client configuration.
- Use the same site-info.def file as in the CE Gatekeeper case. This file should already include definitions for the "SGE_QMASTER", "DEFAULT_DOMAIN" and "ADMIN_MAIL" variables.
- Configure the WN using the "WN_sge" node definition.
[root@<your_wn> ~]# /opt/glite/yaim/scripts/configure_node <path_to_your_site-info.def_file> WN_sge
RPMS Description:
- gliteWN-yaimtosge-0.0.0-2.i386.rpm: Modification to standard glite yaim tool for glite-WN integration using SGE as scheduler system. It will install:
/etc/profile.d/sge.sh (csh): To set the proper environment;
/opt/glite/yaim/scripts/configure_sgeclient.pm: SGE installation directories;
/opt/glite/yaim/scripts/nodesge-info.def: SGE nodes functions definition;
/opt/glite/yaim/functions/config_sge_client: Configures SGE exec host;
- sge-V60u7_1-3.i386.rpm: Contains the binaries and libraries needed to run sge commands;
- sge-utils-V60u7_1-3.i386.rpm: Installation scripts and SGE utilities;
- sge-daemons-V60u7_1-3.i386.rpm: The SGE daemons;
- sge-parallel-V60u7_1-3.i386.rpm: For running parallel environments, such as OpenMPI, MPICH, etc.;
- sge-docs-V60u7_1-3.i386.rpm: Documentation, manuals and examples;
RPMS Download:
How to do an upgrade
There is no general description of an upgrade procedure. Usually it consists of a repository update, an upgrade and then a reconfiguration.
Whenever reconfiguration of any kind is necessary, it will be explicitly mentioned in the description provided by the patch/update.
apt-get update
apt-get dist-upgrade
/opt/glite/yaim/bin/yaim -c -s site-info.def -n (nodetypetoupgrade)
Firewalls
No automatic firewall configuration is provided by this version of the configuration scripts.
If your nodes are behind a firewall, you will have to ask your network manager to open a few "holes" to allow external access to some service nodes.
A complete map of which ports have to be accessible for each service node is maintained in CVS:
http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.site-info.ports/doc/?only_with_tag=HEAD , or
you can have a look at its
html version.
Docs, support
For further documentation you can visit:
Your contact point for support is your ROC.
http://egee-sa1.web.cern.ch/egee-sa1/roc.html
Gergely Debreczeni Gergely.Debreczeni@cernNOSPAMPLEASE.ch