Notes on the gLite 3.0 RC2 release to pre-production.
The gLite 3.0 PPS release is now available. It is based on LCG-2_7_0 with the addition of
gLite WMS/LB
gLite CE
Combined gLite/LCG WN
Combined gLite/LCG UI
FTS server
FTA
There is an apt-get repository for PPS;
rpm http://lxb2042.cern.ch/gLite/APT/R3.0-pps rhel30 externals Release3.0 updates
The CAs have been decoupled from the release - further info on how to install them can be found here
http://grid-deployment.web.cern.ch/grid-deployment/lcg2CAlist.html
Result of Certification
glite 3.0 RC2 has been evaluated on the Certification Testbed.
The glite WMS failed stress testing as the network server failed due to bug #15761. The cron job that restarts the network server also failed (see note).
The glite bulk submission was not tested due to the above failure.
The FTS also failed as the configuration is still incomplete.
Note: The new cron does not run jobs in cron.d if the file has executable permission, so the following cron jobs will fail if it is set. After an install, please make sure they will run by executing "chmod a-x /etc/cron.d/*"
UI
-rwxr-xr-x 1 root root 267 Mar 24 15:09 glite-fetch-crl.cron
WMS
-rwxr-xr-x 1 root root 268 Apr 6 15:03 glite-fetch-crl.cron
-rwxr-xr-x 1 root root 160 Apr 6 15:04 glite-wms-check-daemons.cron
-rwxr-xr-x 1 root root 158 Apr 6 15:02 glite-wms-ns-proxy.cron
-rwxr-xr-x 1 root root 680 Apr 6 15:02 glite-wms-purger.cron
-rwxr-xr-x 1 root root 241 Apr 6 15:02 glite-wms-wmproxy-purge-proxycache.cron
MON
-rwxr-xr-x 1 root root 211 Mar 27 19:45 glite-iperf-check
-rwxr-xr-x 1 root root 207 Mar 27 19:21 glite-udpmon-check
CE glite
-rwxr-xr-x 1 root root 267 Apr 6 12:26 glite-fetch-crl.cron
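As a minimal sketch of the chmod step above, the following function reports which files in a cron.d directory still carry an executable bit and clears it. The function name fix_cron_perms is only illustrative; the directory defaults to /etc/cron.d but can be overridden to try it out on a copy first.

```shell
#!/bin/sh
# Report cron.d entries that still have an executable bit set and clear it.
# The directory defaults to /etc/cron.d; pass another one to test safely.
fix_cron_perms() {
    cron_dir="${1:-/etc/cron.d}"
    for f in "$cron_dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        if [ -x "$f" ]; then
            echo "clearing executable bit: $f"
            chmod a-x "$f"
        fi
    done
}
```

Run as root with no argument, "fix_cron_perms" has the same effect as the chmod command in the note, but also lists the files it changed.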
List of targets;
Please use yaim's install_node script for fresh installs. For upgrades from RC1, use apt-get dist-upgrade.
The repository and yaim now support yum. If you use yaim for installation, set REPOSITORY_TYPE="yum" in site-info.def before running install_node. This will configure yum for you.
Many meta-rpm names have now been changed to rationalise the naming (lcg-* -> glite-*). For upgrading a node whose name has changed, please do the following (for example)
rpm -e lcg-WN
apt-get install glite-WN
apt-get dist-upgrade
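The three steps above can be wrapped so that a failure in one step stops the sequence. This is only a sketch: the function name upgrade_renamed_node and the DRY_RUN variable are illustrative and not part of the release. With DRY_RUN set, the commands are printed instead of executed.

```shell
#!/bin/sh
# Replace an old lcg-* metapackage with its renamed glite-* counterpart.
# Usage: upgrade_renamed_node OLD_META NEW_META
# Set DRY_RUN to any non-empty value to print the commands instead of running them.
upgrade_renamed_node() {
    if [ $# -ne 2 ]; then
        echo "usage: upgrade_renamed_node OLD_META NEW_META" >&2
        return 2
    fi
    old="$1"
    new="$2"
    # Expands to "echo" when DRY_RUN is set, otherwise to nothing.
    run="${DRY_RUN:+echo}"
    $run rpm -e "$old" &&
        $run apt-get install "$new" &&
        $run apt-get dist-upgrade
}
```

For the example above this would be called as "upgrade_renamed_node lcg-WN glite-WN".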
The metapackages available are;
glite-UI (a combined LCG/gLite UI)
glite-WN (a combined LCG/gLite WN)
glite-FTS (FTS server plus related services)
glite-CE (the gLite CE)
glite-WMSLB (WMS and LB, recommended deployment of the WMS)
glite-BDII
glite-LFC_mysql
glite-LFC_oracle
glite-MON
glite-PX
glite-SE_classic
glite-SE_dpm_mysql
glite-SE_dpm_oracle
glite-SE_dpm_disk
glite-SE_dcache
glite-SE_dcache_gdbm
glite-VOBOX
glite-VOMS_mysql
glite-VOMS_oracle
lcg-RB
lcg-CE
lcg-CE_torque
glite-FTA
Many of these node types are described in the LCG Manual Install Guide
http://grid-deployment.web.cern.ch/grid-deployment/documentation/LCG2-Manual-Install/
Configuration
Configuration for all above components is now supported via yaim (FTS still requires a manual step). Note that the configuration targets have not yet been fully synchronised with the installation targets and some names are different.
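Because the installation and configuration target names differ, a small lookup can help when scripting installs. The sketch below only records the install/configure pairs that appear in the node-type notes of this document; the function name configure_target_for is illustrative, and any node type not listed in those notes is deliberately left out rather than guessed.

```shell
#!/bin/sh
# Map an install_node metapackage name to the configure_node target
# used for it in these release notes.
configure_target_for() {
    case "$1" in
        glite-WMSLB) echo WMSLB ;;
        glite-UI)    echo UI_combined ;;
        glite-WN)    echo WN_combined ;;
        glite-FTS)   echo FTS ;;
        glite-CE)    echo gliteCE ;;
        glite-file-transfer-agents-config) echo FTA ;;
        *)
            echo "no configure target recorded for $1" >&2
            return 1
            ;;
    esac
}
```

A wrapper could then run "install_node site-info.def glite-UI" followed by "configure_node site-info.def $(configure_target_for glite-UI)".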
Yaim has been renamed glite-yaim and has been relocated to /opt/glite/yaim. Please
- Ensure any customised files are moved from /opt/lcg/yaim
- Ensure your site-info.def references the new location for FUNCTIONS_DIR and perhaps others (eg USERS_CONF)
- Put configuration files in /opt/glite/yaim/etc
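To check the second point, a site-info.def can be scanned for leftover references to the old location. This is a rough sketch; the function name check_yaim_paths is illustrative and the check is a plain text search, so it will also flag the old path inside comments.

```shell
#!/bin/sh
# Print every line of a site-info.def that still mentions /opt/lcg/yaim.
# Returns 1 if any such line is found, 0 otherwise.
check_yaim_paths() {
    site_info="$1"
    if grep -n '/opt/lcg/yaim' "$site_info"; then
        echo "WARNING: $site_info still references /opt/lcg/yaim" >&2
        return 1
    fi
    return 0
}
```

Any line it prints (for example a FUNCTIONS_DIR or USERS_CONF setting) should be changed to point under /opt/glite/yaim.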
Configuration for all 'gLite' components is also supported via the native (XML) system.
Where yaim is configuring a gLite node type, it populates the XML files and runs the gLite config scripts. Please note that any modifications you make to the XML files, to parameters not managed by yaim, should be preserved. Parameters managed by yaim will be clearly marked in the XML after it has been run. The intention is that yaim offers a simple interface if preferred, but the ability to use the more powerful native mechanism is retained.
Please use yaim to configure pool accounts. Yaim allows non contiguous ranges of uids which some sites require and is therefore the default user configuration mechanism.
Yaim is in the apt-get repository.
New Yaim parameters;
WMS_HOST - gLite WMS + LB
FTS_HOST - for building an FTS server
REPOSITORY_TYPE - defaults to apt, but yum can be used.
BATCH_BIN_DIR - The path of the lrms commands, eg /usr/pbs/bin
BATCH_VERSION - The version of the Local Resource Management System, eg OpenPBS_2.3
LFC_DB_HOST - Set this to use a separate db server for LFC
LFC_DB - Set this to define the name of LFC's db
Some parameters have changed for the DPM
DPM_FILESYSTEMS - The filesystems/partitions parts of the pool
DPM_DB_USER - The database user (was DPMMGR)
DPM_DB_PASSWORD - The database user password (was DPMUSER_PWD)
so the following are no longer used
DPMMGR
DPMUSER_PWD
DPMPOOL_NODES
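A quick way to see whether a site-info.def has been moved over to the new DPM parameters is to look for the old names and check that the new ones are set. This is a sketch only; the function name check_dpm_params is illustrative, and it assumes parameters are set as NAME=value at the start of a line.

```shell
#!/bin/sh
# Report obsolete DPM parameters that are still set, and new ones that are missing.
# Returns 0 if the file looks migrated, 1 otherwise.
check_dpm_params() {
    site_info="$1"
    status=0
    for var in DPMMGR DPMUSER_PWD DPMPOOL_NODES; do
        if grep -q "^[[:space:]]*${var}=" "$site_info"; then
            echo "obsolete parameter still set: $var"
            status=1
        fi
    done
    for var in DPM_FILESYSTEMS DPM_DB_USER DPM_DB_PASSWORD; do
        if ! grep -q "^[[:space:]]*${var}=" "$site_info"; then
            echo "new parameter not set: $var"
            status=1
        fi
    done
    return $status
}
```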
There is more information in the example site-info.def file
Notes on particular node types;
lcg-RB
Condor is upgraded to 6.7.10. There is a new condor-lcg package which provides LCG modifications to the gahp_server and grid_monitor. Configuration of these is handled by yaim.
glite WMS + LB
To install the glite WMS + glite LB (recommended deployment scenario)
install_node site-info.def glite-WMSLB
configure_node site-info.def WMSLB
Combined UI
The gLite 3.0 UI is a 'combined' UI, incorporating LCG and gLite components.
On the combined node, please watch out for glite commands which are symlinked to edg commands and may appear earlier in the PATH than their edg counterparts. The extent to which the glite symlinks can provide the functionality of the edg commands they replace is untested. These symlinks will be removed in future releases.
The RPM based userland installation finished without conflicts but there are lots of warnings and errors due to install scripts which require root privilege.
install_node site-info.def glite-UI
configure_node site-info.def UI_combined
WN
The gLite WN has combined gLite and LCG components
install_node site-info.def glite-WN
configure_node site-info.def WN_combined
glite-WN + Torque client
install_node site-info.def glite-WN glite-torque-client-config
configure_node site-info.def WN_combined_torque
FTS
In the case of the FTS yaim will configure all related services such as crl downloads, info provider etc but the FTS server itself must be configured using the usual gLite system. A yaim component will follow.
install_node site-info.def glite-FTS
configure_node site-info.def FTS
gLite CE
The gLite CE is configured to support only VOMS proxies.
install_node site-info.def glite-CE
configure_node site-info.def gliteCE
If you want your gliteCE to run the site BDII;
configure_node site-info.def gliteCE BDII_site
The glite-CE configuration also configures the software and scheduler GIP plugins. Due to a bug in the /opt/lcg/libexec/lcg-info-dynamic-scheduler file, the following command must be run to get correct functionality:
# sed -i '{s/jobmanager/blah/}' /opt/lcg/libexec/lcg-info-dynamic-scheduler
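Since the command above edits the plugin in place, it may be worth keeping a copy of the original. The sketch below applies the same sed expression, but saves the unmodified file first and does nothing if 'jobmanager' no longer appears. The function name patch_scheduler_plugin and the .orig suffix are illustrative; like the command above, it relies on a sed that supports -i.

```shell
#!/bin/sh
# Apply the jobmanager -> blah workaround to the dynamic scheduler plugin,
# keeping a one-time copy of the unmodified file as FILE.orig.
patch_scheduler_plugin() {
    plugin="${1:-/opt/lcg/libexec/lcg-info-dynamic-scheduler}"
    if [ ! -f "$plugin" ]; then
        echo "not found: $plugin" >&2
        return 1
    fi
    if ! grep -q 'jobmanager' "$plugin"; then
        echo "nothing to do: no 'jobmanager' in $plugin"
        return 0
    fi
    # Never overwrite an existing backup with an already patched file.
    [ -e "$plugin.orig" ] || cp -p "$plugin" "$plugin.orig" || return 1
    # Same expression as in the note: first match on each line only.
    sed -i '{s/jobmanager/blah/}' "$plugin"
}
```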
Batch systems and the gLite CE
If you are installing your batch system server on the same node as the CE, and you want to use yaim or gLite to configure it, please choose one or the other and stick to it. If you use yaim and then make modifications via the gLite system, any rerun of yaim will reset the configuration. The same advice applies to management of WNs. If yaim fulfils your needs, this is the recommended route.
glite-CE + Torque server
install_node site-info.def glite-CE glite-torque-server-config
configure_node site-info.def gliteCE TORQUE_server
Note that the log-parser daemon must be started on whichever node is running the batch system. If your CE node is also the batch system head node, you have to run the log-parser there.
If you are running two CEs (typically LCG and gLite versions) please take care to ensure no collisions of pool account mapping. This is typically achieved either by allocating separate pool account ranges to each CE or by allowing them to share a gridmapdir.
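If each CE is given its own pool account list, one way to look for collisions is to compare the two lists for account names that appear in both. The sketch below assumes, and this is an assumption rather than something stated in these notes, that each list is a colon-separated file with the login name in the second field (as in a yaim users.conf). The function name shared_pool_accounts is illustrative.

```shell
#!/bin/sh
# Print login names that occur in both pool account files.
# ASSUMPTION: colon-separated lines with the login name in field 2.
shared_pool_accounts() {
    awk -F: '
        NR == FNR { if ($2 != "") seen[$2] = 1; next }
        ($2 != "") && ($2 in seen) { print $2 }
    ' "$1" "$2"
}
```

No output means the two lists do not share any account name. Sites that share a gridmapdir between the CEs instead would expect the lists to overlap.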
DPM
A VOMS enabled DPM (1.5.5) is now available. Upgrade from LCG-2_7_0 is supported.
install_node site-info.def glite-SE_dpm_mysql
configure_node site-info.def [SE_dpm_mysql|SE_dpm_disk]
dCache
The yaim script for configuring dCache has received many updates from GridPP. It offers extended functionality but is backward compatible.
Note that dcache may show errors if you have more than around 56 CAs. If this is the case, currently the only fix is to identify CAs you do not need to support and remove them.
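To see how close a node is to that limit, the installed CAs can be counted. The sketch below assumes the usual grid layout, which is not stated in these notes: CA certificates installed under /etc/grid-security/certificates as files named with a hash and a ".0" suffix. The function name count_cas is illustrative.

```shell
#!/bin/sh
# Count installed CA certificates.
# ASSUMPTION: one <hash>.0 file per CA in the given directory,
# defaulting to /etc/grid-security/certificates.
count_cas() {
    ca_dir="${1:-/etc/grid-security/certificates}"
    n=0
    for f in "$ca_dir"/*.0; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

If the number printed is well above 50, it is worth reviewing which CAs the site actually needs before installing dCache.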
Yaim does not yet support d-Cache with a postgresql based pnfs. To
accommodate sites who have already upgraded to this version of pnfs,
we now have two types of d-Cache SE.
glite-SE_dcache
This has no dependency on pnfs at all, so upgrades of either type
(postgresql or gdbm) should work at the rpm level.
glite-SE_dcache_gdbm
This has a dependency on pnfs (ie the gdbm version) and is necessary for a
new install. Please note however that pnfs_postgresql is the preferred
implementation and migration is non trivial.
FTA
New yaim configuration for FTA. Please take the fta-info.def file from yaim's examples directory and append it to your site-info file before configuring.
install_node site-info.def glite-file-transfer-agents-config
configure_node site-info.def FTA
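The append step, which has to happen before configure_node is run, can be made repeatable so that the FTA settings are not added twice. This is a sketch only: the function name append_fta_info, the marker comment and the .bak backup are illustrative, and the location of fta-info.def within yaim's examples directory has to be supplied by the caller.

```shell
#!/bin/sh
# Append fta-info.def to site-info.def once, keeping a backup of the original.
# Usage: append_fta_info /path/to/fta-info.def /path/to/site-info.def
append_fta_info() {
    fta_info="$1"
    site_info="$2"
    marker="# appended from fta-info.def"
    if grep -qF -- "$marker" "$site_info"; then
        echo "already appended, nothing to do"
        return 0
    fi
    cp -p "$site_info" "$site_info.bak" || return 1
    {
        echo "$marker"
        cat "$fta_info"
    } >> "$site_info"
}
```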
Fixes with respect to RC1
The following most recent critical bug fixes are contained in the new release candidate 2:
Bug 15330: glite-wms-ui-cli-python masks commands from LCG UI
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15330
Bug 15642: When mapping all the VOs to one queue on a glite CE with LSF the ...
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15642
TO BE CONFIRMED BY DEVELOPER - INCONSISTENT STATE IN SAVANNAH
Bug 15674: Blah submission from a glite 3.0 CE (glite flavour) to an LSF queue does not work
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15674
Bug 15710: gLite 3.0 job wrapper has bad kill usage
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15710
Bug 15769: large job collection submission and cancel through WMproxy didn't work
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15769
Bug 15806: matchmaking slow for bulk submission
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15806
Bug 15874: FTS - Can't configure the http timeout in the ChannelAgent
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15874
Bug 15934: Blah submission from a glite 3.0...
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15934
In addition, the following bug fixes in yaim have been included
Bug 15101: LFC : central LFC configured for all the VOs supported by a site
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15101
Bug 15131: Wrong permissions in LFC catalog when VO name = local group name
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15131
Bug 15484: DPM and LFC config does not allow for alternative database name and server
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15484
Bug 15622: Request for optional LFC_DB_HOST variable in yaim.
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15622
Bug 15764: GLITE_TMP is set but directory is not created
https://savannah.cern.ch/bugs/?func=detailitem&item_id=15764
Middleware components
The gLite 3.0 issue tracking page has information on what has been fixed in RC2
https://uimon.cern.ch/twiki/bin/view/LCG/Glite30IssueTracking
Yaim and configuration
- Yaim support for new gLite services (combined UI, combined WN, TORQUE_server)
- Support for VOs without VOMS (for gLite services)
- A missing WMS_HOST switches off the configuration of the gLite UI part of the combined UI
- Return value of gLite configuration scripts is checked bug #15543
- GIP configuration fixed on glite CE bug #15434
- ACL publication fixed on gLite CE bug #15424
- rationalisation of DPM configuration
- LFC now supports a remote DB
- FTA now yaim configurable
- BDII - allow site BDII on gliteCE
- Condor config for lcg-RB
- No longer mandate home dir under /home for edginfo and edguser
- Bogus 'requires' removed from config_gip
- ERT plugin and software plugin for gliteCE (still requires manual step as plugin expects 'jobmanager')
- config_mkgridmap - support new VOMS capability syntax
- RGMA - set dir perms on /etc/tomcat5 and new CATALINA_OPTS
- dcache - new native info provider
Outstanding bugs
During the integration and testing process a list of outstanding issues was maintained. Here is a summary of the issues which have not yet been addressed and were considered important;
savannah issue 15050 - this has NOT been fixed. The impact is of the order of a few jobs (<5) per thousand.
savannah issue 15189 - status not updated for nodes of a large collection. Now fixed, but the fix missed the cut for RC2. Now fine for 400 jobs, but doesn't work for 1000 jobs in a collection.
savannah issue 15894 - dynamic scheduler plugin on glite-CE doesn't provide correct information. Temporary fix:
# sed -i '{s/jobmanager/blah/}' /opt/lcg/libexec/lcg-info-dynamic-scheduler
savannah issue 15643 - proxy renewal works, job aborts after renewal. Voms credentials are dropped.
savannah issue 15688 - Jobs stay in ready state. The situation is still not entirely clear; it may just be a configuration problem.
Publishing software tags by user. Not solved yet, we will add a gridFTP server later.
In configuring a UI you may see complaints about the absence of files in vomsdir. Please ignore this as the script is making an invalid assumption about the naming convention of files in there.
Notes
Other issues to remain aware of;
Between LCG-2_7_0 and gLite 3.0, MySQL has been upgraded from 4.0 to 4.1. There has been a change in the password encryption; please keep this in mind.
Pointers to documentation on the components of this release are being compiled here
http://www.grid.kfki.hu/afs/gdebrecz/web/LCG/the-LCG-directory.html