gLite CLUSTER
Introduction
glite-CLUSTER is a node type that can publish information about the clusters and subclusters at a site, referenced by any number of compute elements.
To understand in detail why this node type is needed and the advantages of using it, please check the Technical Note by Stephen Burke, Flavia Donno and Maarten Litmaath.
Deployment Scenarios
glite-CLUSTER can be deployed on the same host as the lcg-CE or on a different one. Check the sections below to learn more about each deployment scenario.
glite-CLUSTER and lcg-CE
non-cluster mode
The lcg-CE can be configured as usual, without worrying about the glite-CLUSTER node. This can be useful for small sites that have a very simple setup and don't want to deal with cluster/subcluster configuration. In this case the lcg-CE will publish a single cluster/subcluster.
cluster mode
The lcg-CE can work in cluster mode with the glite-CLUSTER node type by defining LCGCE_CLUSTER_MODE=yes. The lcg-CE can be on the same host as the glite-CLUSTER node or on a different one.
For the same host, please run:
yum install lcg-CE
yum install glite-CLUSTER
yum install glite-LRMS_utils, where LRMS is TORQUE, LSF, SGE or CONDOR
yaim -c -s site-info.def -n lcg-CE glite-CLUSTER glite-LRMS_utils
For different hosts, please run:
yum install lcg-CE
yum install glite-LRMS_utils, where LRMS is TORQUE, LSF, SGE or CONDOR
yaim -c -s site-info.def -n lcg-CE glite-LRMS_utils
yum install glite-CLUSTER
yaim -c -s site-info.def -n glite-CLUSTER
In cluster mode there are new lcg-CE YAIM configuration variables which must be set. Check the lcg-CE configuration variables twiki for more details.
In order to configure glite-CLUSTER, please check the glite-CLUSTER configuration variables twiki.
Note on sw tags and WN configuration
If a glite-CLUSTER node is to be used with the lcg-CE on a separate machine, VO managers who want to set their application tags can do so per subcluster, using the --sc option of the lcg-tags or lcg-ManageVOTag commands.
This also requires that a user can discover the relevant subcluster name on a given WN. The glite-wn-info command does this using the configuration file ${GLITE_LOCATION}/etc/glite-wn-info.conf, where the subcluster ID is set. YAIM can configure glite-wn-info.conf automatically if the WN_LIST file is properly configured, as explained in the WN_list section of the YAIM configuration guide.
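As an illustration of the WN side, the snippet below reads a subcluster ID out of a glite-wn-info.conf-style file. This is only a minimal sketch: the `SUBCLUSTER_ID=<id>` key name is an assumption for the demo, so check the file shipped with your installation for the real format.

```shell
# Minimal sketch: read the subcluster ID a WN is represented under.
# ASSUMPTION: the conf file holds a "SUBCLUSTER_ID=<id>" line; the real
# file lives at ${GLITE_LOCATION}/etc/glite-wn-info.conf.
conf=./glite-wn-info.conf
printf 'SUBCLUSTER_ID=gergosubcluster\n' > "$conf"   # sample content for the demo
subcluster=$(sed -n 's/^SUBCLUSTER_ID=//p' "$conf")
echo "$subcluster"
rm -f "$conf"
```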
Known issues
For glite-CLUSTER 3.1.4, when installing glite-CLUSTER and lcg-CE on the same machine: if your new or reconfigured subclusters are named differently than before, the old directory under /opt/glite/var/info/ should be deleted, otherwise details of the old subcluster keep being published.
The cluster unique ID (i.e. the value set with CE_HOST_<host-name>_CLUSTER_UniqueID in cluster mode) must not contain upper case letters: it may contain only lower case alphanumeric characters and the three characters '.', '_' and '-'.
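The character restriction can be checked with a small shell function before running YAIM; a sketch, where the pattern encodes exactly the characters the rule permits:

```shell
# Sketch: validate a cluster UniqueID -- only lower case letters, digits,
# '.', '_' and '-' are allowed.
valid_cluster_id() {
    case "$1" in
        ''|*[!a-z0-9._-]*) return 1 ;;   # reject empty IDs or any other character
        *) return 0 ;;
    esac
}
valid_cluster_id "my-yaim" && echo "my-yaim: ok"
valid_cluster_id "My-Yaim" || echo "My-Yaim: rejected (upper case)"
```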
When setting up the lcg-CE with a glite-CLUSTER node on a separate machine, the VO application tag directories at lcgce:$EDG_LOCATION/var/info/ should be shared with cluster:$EDG_LOCATION/var/info/.
glite-CLUSTER and CREAM
There are instructions in the 3.2 glite-CLUSTER release notes for modifying an existing, already configured CREAM CE to make use of a glite-CLUSTER node at the site. Currently YAIM cannot set up the CREAM CE to do this automatically.
Note that it is not possible to co-locate a CREAM CE and a glite-CLUSTER on the same node. They have to be installed on separate hosts.
glite-CLUSTER check
You can check whether glite-CLUSTER is properly configured by querying the information system. If you query the resource BDII of the glite-CLUSTER node, you should see something like the output below. This is basically the same as you have with the existing configuration, but the details should obviously reflect what you configured in yaim.
In particular, check that the references to GlueCEUniqueIDs in the GlueCluster object(s) correspond to the right queues. Also, check by querying the CE information that the GlueCE objects have the right reverse reference (GlueForeignKey) to the Cluster.
Note the following scenarios when querying the resource BDII:
- if a box hosts only a glite-CLUSTER, its resource BDII should publish GlueCluster + GlueSubCluster (but not GlueCE).
- if a box hosts only a CE configured in cluster mode, its resource BDII should publish GlueCE (but not GlueCluster + GlueSubCluster).
Note that if you query the site BDII, the results should be the same: some number of GlueCE objects, each linked to a single GlueCluster (many-to-one), and each GlueCluster linked to one GlueSubCluster (one-to-one).
ldapsearch -x -h localhost -p 2170 -b "mds-vo-name=resource,o=grid"
# extended LDIF
#
# LDAPv3
# base <mds-vo-name=resource,o=grid> with scope sub
# filter: (objectclass=*)
# requesting: ALL
#
# resource, grid
dn: Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: Mds
Mds-Vo-name: resource
# vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528, resource, grid
dn: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublisher_28559765
28,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceUniqueID: vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528
GlueServiceName: BUDAPEST-RTEPublisher
GlueServiceType: org.glite.RTEPublisher
GlueServiceVersion: 1.0.0
GlueServiceEndpoint: gsiftp://vtb-generic-21.cern.ch:2811/opt/glite/var/info
GlueServiceStatus: OK
GlueServiceStatusInfo: globus-gridftp-server (pid 10588) is running...
GlueServiceSemantics: http://grid-deployment.web.cern.ch/grid-deployment/eis/d
ocs/ExpSwInstall/sw-install.html
GlueServiceStartTime: 2010-11-22T12:21:55+01:00
GlueServiceOwner: dteam
GlueServiceAccessControlBaseRule: VOMS:/dteam/Role=lcgadmin
GlueForeignKey: GlueSiteUniqueID=BUDAPEST
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# GlueSubClusterUniqueID:gergosubcluster, vtb-generic-21.cern.ch_org.glite.RT
EPublisher_2855976528, resource, grid
dn: GlueServiceDataKey=GlueSubClusterUniqueID:gergosubcluster,GlueServiceUniqu
eID=vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=reso
urce,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: GlueSubClusterUniqueID:gergosubcluster
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# gergo.clus-ter, resource, grid
dn: GlueClusterUniqueID=gergo.clus-ter,Mds-Vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueClusterName: GergoCluster human readable
GlueClusterService: vtb-generic-12.cern.ch:2119/lcgpbs-dteam-jobmanager-dteam
GlueClusterUniqueID: gergo.clus-ter
GlueForeignKey: GlueSiteUniqueID=Budapest
GlueForeignKey: GlueCEUniqueID=vtb-generic-12.cern.ch:2119/lcgpbs-jobmanager-d
team
GlueInformationServiceURL: ldap://vtb-generic-21.cern.ch:2170/mds-vo-name=reso
urce,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# glite-info-service_version, vtb-generic-21.cern.ch_org.glite.RTEPublisher_2
855976528, resource, grid
dn: GlueServiceDataKey=glite-info-service_version,GlueServiceUniqueID=vtb-gene
ric-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: glite-info-service_version
GlueServiceDataValue: 1.5
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# glite-info-service_hostname, vtb-generic-21.cern.ch_org.glite.RTEPublisher_
2855976528, resource, grid
dn: GlueServiceDataKey=glite-info-service_hostname,GlueServiceUniqueID=vtb-gen
eric-21.cern.ch_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueServiceDataKey: glite-info-service_hostname
GlueServiceDataValue: vtb-generic-21.cern.ch
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
r_2855976528
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# gergosubcluster, gergo.clus-ter, resource, grid
dn: GlueSubClusterUniqueID=gergosubcluster,GlueClusterUniqueID=gergo.clus-ter,
Mds-Vo-name=resource,o=grid
objectClass: GlueClusterTop
objectClass: GlueSubCluster
objectClass: GlueHostApplicationSoftware
objectClass: GlueHostArchitecture
objectClass: GlueHostBenchmark
objectClass: GlueHostMainMemory
objectClass: GlueHostNetworkAdapter
objectClass: GlueHostOperatingSystem
objectClass: GlueHostProcessor
objectClass: GlueInformationService
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueChunkKey: GlueClusterUniqueID=gergo.clus-ter
GlueHostApplicationSoftwareRunTimeEnvironment: GPU
GlueHostApplicationSoftwareRunTimeEnvironment: GPU-TEST-2
GlueHostArchitectureSMPSize: 12
GlueHostArchitecturePlatformType: intel
GlueHostBenchmarkSF00: 100
GlueHostBenchmarkSI00: 100
GlueHostMainMemoryRAMSize: 100
GlueHostMainMemoryVirtualSize: 100
GlueHostNetworkAdapterInboundIP: TRUE
GlueHostNetworkAdapterOutboundIP: TRUE
GlueHostOperatingSystemName: linux
GlueHostOperatingSystemRelease: gekko
GlueHostOperatingSystemVersion: 3.4
GlueHostProcessorClockSpeed: 100
GlueHostProcessorModel: 200
GlueHostProcessorVendor: 300
GlueHostProcessorOtherDescription: mydescription
GlueSubClusterName: GergoSubcluster human readable
GlueSubClusterUniqueID: gergosubcluster
GlueSubClusterPhysicalCPUs: 100
GlueSubClusterLogicalCPUs: 200
GlueSubClusterTmpDir: /tmp
GlueSubClusterWNTmpDir: /tmp
GlueInformationServiceURL: ldap://vtb-generic-21.cern.ch:2170/mds-vo-name=reso
urce,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 3
# glite-version, vtb-generic-21.cern.ch_org.glite.RTEPublisher_2855976528, re
source, grid
dn: GlueServiceDataKey=glite-version,GlueServiceUniqueID=vtb-generic-21.cern.c
h_org.glite.RTEPublisher_2855976528,Mds-Vo-name=resource,o=grid
objectClass: GlueTop
objectClass: GlueServiceData
objectClass: GlueKey
GlueServiceDataKey: glite-version
GlueServiceDataValue: 3.1.0
GlueChunkKey: GlueServiceUniqueID=vtb-generic-21.cern.ch_org.glite.RTEPublishe
r_2855976528
# search result
search: 2
result: 0 Success
# numResponses: 9
# numEntries: 8
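The cross-reference check described above can also be scripted against saved ldapsearch output. The sketch below extracts the GlueCEUniqueID foreign keys of a GlueCluster object from an LDIF file, so they can be compared with the GlueForeignKey values published by the GlueCE objects; the sample LDIF is embedded here only to make the demo self-contained.

```shell
# Sketch: pull the GlueCEUniqueID references out of a GlueCluster entry.
# In real use, save the output of the ldapsearch above to cluster.ldif first.
ldif=./cluster.ldif
cat > "$ldif" <<'EOF'
dn: GlueClusterUniqueID=gergo.clus-ter,Mds-Vo-name=resource,o=grid
GlueClusterUniqueID: gergo.clus-ter
GlueForeignKey: GlueSiteUniqueID=Budapest
GlueForeignKey: GlueCEUniqueID=vtb-generic-12.cern.ch:2119/lcgpbs-jobmanager-dteam
EOF
# Keep only the CE foreign keys (the site reference is filtered out).
ces=$(sed -n 's/^GlueForeignKey: GlueCEUniqueID=//p' "$ldif")
echo "$ces"
rm -f "$ldif"
```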
Use Cases
The following use cases represent the most common scenarios. Check the definitions below in order to understand the diagrams.
- RTE Publisher: Run Time Environment Service Publisher. It publishes information about the glite-CLUSTER service in the information system.
- GlueCluster: A GlueCluster in the Glue Schema gives a representation of a set of physical resources (hosts or Worker Nodes or computers) behind a CE.
- GlueSubcluster: A GlueSubCluster refers to a homogeneous set of hosts with respect to the selected attributes. This entity provides details of the machines that offer execution environments to jobs.
- gridftp server: needed so that the WNs can copy to the glite-CLUSTER node the information about which software is installed.
- /opt/glite/var/info/SubCluster1/VO1: Location on the glite-CLUSTER node where the information about the software installed on the WNs is copied.
- Head Node: It can be an lcg-CE or a CREAM CE.
- GlueCE: A GlueCE entry in the Glue Schema represents a Computing Element which is an abstraction for an entity managing computing resources exposed to the Grid.
- Head Node Service Publisher: It publishes information about the lcg-CE or CREAM CE service in the information system.
- lcg-info-dynamic-software: plugin that publishes information about the software installed on the WNs in the GlueSubCluster.
- glite-info-service: plugin that actually publishes the service information on the resource BDII.
- glite-info-dynamic-<lrms>: plugin that actually publishes the information relevant to the batch system queues in the GlueCE.
- LRMS: Local Resource Management System, that is the batch system.
- glite-wn-info: command used by the WNs to find out under which GlueSubcluster they are represented.
- lcg-tags/lcg-ManageVOTag --sc: command used by the WNs to copy information about the software they have installed to the glite-CLUSTER node.
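Putting the last two definitions together, a VO software manager session on a WN could look like the dry-run sketch below. The lcg-tags command is only echoed, since running it needs real grid credentials and middleware; the CE hostname and tag name are made-up examples, and the option names other than --sc are recalled from typical lcg-tags usage, so check the man page before relying on them.

```shell
# Dry-run sketch: discover the subcluster on a WN, then publish a tag for it.
# SC would normally come from running glite-wn-info; hard-coded for the demo.
SC="gergosubcluster"
CE="vtb-generic-12.cern.ch"    # example CE hostname taken from this page
# Build the command instead of executing it (no grid middleware needed here).
cmd="lcg-tags --ce $CE --vo dteam --sc $SC --add --tags VO-dteam-EXAMPLE-1.0"
echo "$cmd"
```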
One Cluster/SubCluster, One Head Node, One GlueCE, One LRMS queue
Configuration variables
For the scenario above, the following configuration variables are needed (the values are only examples):
- glite-CLUSTER
# The cluster variable names should contain the cluster name in upper case
CLUSTER_HOST="vtb-generic-74.cern.ch"
CLUSTERS="yaim"
CLUSTER_YAIM_CLUSTER_UniqueID=my-yaim
CLUSTER_YAIM_CLUSTER_Name="this is the yaim cluster"
CLUSTER_YAIM_SITE_UniqueID=yaim
CLUSTER_YAIM_CE_TYPE="jobmanager"
CLUSTER_YAIM_INFO_PORT=2170
CLUSTER_YAIM_INFO_TYPE=resource
# The CE host variable names should contain the CE hostname in lower case, with '.' and '-' replaced by '_'
CLUSTER_YAIM_CE_HOSTS="vtb-generic-64.cern.ch"
CE_HOST_vtb_generic_64_cern_ch_CE_TYPE="jobmanager"
CE_HOST_vtb_generic_64_cern_ch_QUEUES="dteam"
CE_HOST_vtb_generic_64_cern_ch_CE_InfoJobManager="lcgpbs"
# The subcluster variable names should contain the subcluster name in upper case
SUBCLUSTER_SLC4_SUBCLUSTER_UniqueID=slc4
SUBCLUSTER_SLC4_HOST_ApplicationSoftwareRunTimeEnvironment="LCG-2|LCG-2_1_0|LCG-2_1_1|LCG-2_2_0" # CE_RUNTIMEENV
SUBCLUSTER_SLC4_HOST_ArchitectureSMPSize=2 # CE_SMPSIZE
SUBCLUSTER_SLC4_HOST_ArchitecturePlatformType=i686 # CE_OS_ARCH
SUBCLUSTER_SLC4_HOST_BenchmarkSF00=0 # CE_SF00
SUBCLUSTER_SLC4_HOST_BenchmarkSI00=381 # CE_SI00
SUBCLUSTER_SLC4_HOST_MainMemoryRAMSize=513 # CE_MINPHYSMEM
SUBCLUSTER_SLC4_HOST_MainMemoryVirtualSize=1025 # CE_MINVIRTMEM
SUBCLUSTER_SLC4_HOST_NetworkAdapterInboundIP=FALSE # CE_INBOUNDIP
SUBCLUSTER_SLC4_HOST_NetworkAdapterOutboundIP=TRUE # CE_OUTBOUNDIP
SUBCLUSTER_SLC4_HOST_OperatingSystemName="Scientific Linux" # CE_OS
SUBCLUSTER_SLC4_HOST_OperatingSystemRelease=3.0.6 # CE_OS_RELEASE
SUBCLUSTER_SLC4_HOST_OperatingSystemVersion="SL" # CE_OS_VERSION
SUBCLUSTER_SLC4_HOST_ProcessorClockSpeed=1001 # CE_CPU_SPEED
SUBCLUSTER_SLC4_HOST_ProcessorModel=PIII # CE_CPU_MODEL
SUBCLUSTER_SLC4_HOST_ProcessorVendor=intel # CE_CPU_VENDOR
SUBCLUSTER_SLC4_SUBCLUSTER_Name="my subcluster YAIM"
SUBCLUSTER_SLC4_SUBCLUSTER_PhysicalCPUs=1 # CE_PHYSCPU
SUBCLUSTER_SLC4_SUBCLUSTER_LogicalCPUs=1 # CE_LOGCPU
SUBCLUSTER_SLC4_SUBCLUSTER_TmpDir=/tmp
SUBCLUSTER_SLC4_SUBCLUSTER_WNTmpDir=/tmp
- lcg-CE
CE_HOST=vtb-generic-64.cern.ch
CE_HOST_vtb_generic_64_cern_ch_CLUSTER_UniqueID=my-yaim
CE_HOST_vtb_generic_64_cern_ch_CE_InfoApplicationDir=/sw_dir
CE_HOST_vtb_generic_64_cern_ch_CE_TYPE=jobmanager
# Distributed in site-info.def
CE_HOST_vtb_generic_64_cern_ch_CE_InfoJobManager=lcgpbs
CE_HOST_vtb_generic_64_cern_ch_QUEUE_DTEAM_VOVIEW_DTEAM_CE_StateWaitingJobs=666666
CE_HOST_vtb_generic_64_cern_ch_QUEUES="dteam"
CE_HOST_vtb_generic_64_cern_ch_QUEUE_DTEAM_CE_AccessControlBaseRule="dteam"
- If you use glite-TORQUE_server
# The following "old variables" still need to be defined for the TORQUE server.
QUEUES="dteam"
DTEAM_GROUP_ENABLE="dteam"
CE_SMPSIZE=2
FAQ
- What are the implications/advantages of using the CLUSTER node type at our site? Any disadvantages? The advantages are described in the Technical Note. The risk is that if you publish the wrong thing it may affect job submission and/or installed capacity publication, but that's also true with the current system. It is possible to migrate gradually, i.e. you can have a mixture of CEs which are connected to the cluster node and others which keep the existing setup.
- How do we build the CLUSTER node type? Please check the gLite web pages for how to install glite-CLUSTER. Then check the YAIM configuration variables twiki for how to configure glite-CLUSTER.
- What do I need to do to my current nodes (SEs, CEs, BDII, ... ) to make them interact with the new CLUSTER node type? The cluster node has a resource BDII like any other node, which allows the published information to be collected by the site BDII. It doesn't interact with the SEs, but it has a rather intimate connection with the CEs, because the GlueCE objects link to the GlueCluster objects and vice versa.
- If I deploy the glite-CLUSTER on a node with no CREAM CE installed do I need to setup the batch system specific support on that node? Yes, the information providers used by the glite-CLUSTER require the batch system software in order to query the local resource management system.
- Is there anything else I need to do to make any other site aware of and/or interact with the CLUSTER node type? Cluster publication doesn't change anything about the way the glue schema works, it's just about configuration, so if it's configured correctly nothing external to the site will notice.