How to set up an HTCondor CE service on a single host
The host should have at least 4 GB of RAM and 10 GB of disk space for simple tests;
realistic jobs may need more memory and/or disk.
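The sizing above can be checked before anything is installed. The following is a minimal sketch, assuming a Linux host with /proc/meminfo and GNU df; the function names and the choice of / as the file system to inspect are illustrative and not part of the guide.

```shell
#!/bin/bash
# Minimum resources suggested above for simple tests.
MIN_RAM_GB=4
MIN_DISK_GB=10

# meets_minimum HAVE NEED: succeed if HAVE >= NEED (both integers).
meets_minimum() {
    [ "$1" -ge "$2" ]
}

# Total RAM in GB, rounded up: MemTotal is given in kB and is usually
# slightly below the nominal size of the installed memory.
ram_gb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 1048575) / 1048576 }' /proc/meminfo
}

# Free space in GB on the file system that holds the given path (GNU df).
disk_free_gb() {
    df -BG --output=avail "$1" | tail -n 1 | tr -dc '0-9'
}

if [ -r /proc/meminfo ]; then
    meets_minimum "$(ram_gb)" "$MIN_RAM_GB" \
        || echo "WARNING: less than ${MIN_RAM_GB} GB of RAM"
    meets_minimum "$(disk_free_gb /)" "$MIN_DISK_GB" \
        || echo "WARNING: less than ${MIN_DISK_GB} GB of free disk on /"
fi
```

No output means both minimums are met.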
First set up a mini HTCondor service following the Admin Quick Start Guide:
https://research.cs.wisc.edu/htcondor/htcondor/documentation/
The Long Term Support (LTS) Channel (see below) provides v9.0.x, which reaches
end of life in Feb 2023. It supports X509 proxies for both authentication and
delegation, whereas the releases in the Feature Channel support only the latter
purpose, i.e. equipping jobs with such proxies.
Both channels support SciTokens for authentication. In the course of 2022 we will
need to make job submission with tokens work for HTCondor CEs across the infrastructure.
Example configurations are shown below.
Notes on using the Long Term Support Channel
Note: the Admin Quick Start Guide defaults to the Feature Channel.
To deploy the LTS (a.k.a. stable) release, follow these steps,
which also avoid a fatal error encountered on CC7 hosts:
----------------------------------------------------------------------
yum remove epel-release-7
----------------------------------------------------------------------
curl -fsSL https://get.htcondor.org | \
GET_HTCONDOR_PASSWORD=<pick-a-password> \
/bin/bash -s -- --no-dry-run --channel stable
----------------------------------------------------------------------
If that worked as expected, your host is now running an HTCondor
batch service.
Setting up the CE interface
For the CE interface, ensure the host has a host certificate, the CA certificates
and the desired VOMS configuration details.
The IGTF Certificate Authorities can be installed from the ca-policy-egi-core rpm
available from the EGI CA repository.
The VOMS details for WLCG VOs can be installed from wlcg-voms-* rpms
available from the WLCG rpm repository.
For example:
----------------------------------------------------------------------
(cd /etc/yum.repos.d/ && curl -O https://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo)
----------------------------------------------------------------------
yum install ca-policy-egi-core
----------------------------------------------------------------------
yum install http://linuxsoft.cern.ch/wlcg/centos7/x86_64/wlcg-repo-1.0.0-1.el7.noarch.rpm
----------------------------------------------------------------------
yum install wlcg-voms-{alice,lhcb,dteam}
----------------------------------------------------------------------
Ensure a valid host certificate has been installed:
----------------------------------------------------------------------
openssl x509 -noout -dates -in /etc/grid-security/hostcert.pem
----------------------------------------------------------------------
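The dates printed by the command above still have to be compared by eye. As a hedged sketch, the check can be automated with the -checkend option of openssl x509, which exits non-zero if the certificate expires within the given number of seconds. The function name and the one-week threshold are illustrative choices.

```shell
#!/bin/bash
# cert_valid_for FILE SECONDS: succeed if the certificate in FILE is still
# valid SECONDS from now; "openssl x509 -checkend" exits non-zero otherwise.
cert_valid_for() {
    openssl x509 -noout -checkend "$2" -in "$1" >/dev/null 2>&1
}

HOSTCERT=/etc/grid-security/hostcert.pem
WEEK=$((7 * 24 * 3600))   # illustrative threshold

if [ -r "$HOSTCERT" ]; then
    if cert_valid_for "$HOSTCERT" "$WEEK"; then
        echo "host certificate is valid for at least one more week"
    else
        echo "host certificate has expired or expires within a week"
    fi
fi
```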
The directories in question should resemble what is shown here:
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/grid-security/
total 76
drwxr-xr-x. 2 root root 40960 May 18 18:48 certificates
-rw-r--r--. 1 root root 3198 Mar 13 04:07 gsi.conf
-r--r--r--. 1 root root 3060 May 18 15:18 hostcert.pem
-r--------. 1 root root 1828 May 18 15:18 hostkey.pem
drwxr-xr-x. 5 root root 44 May 18 15:08 vomsdir
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/grid-security/vomsdir/
total 0
drwxr-xr-x. 2 root root 60 May 18 15:08 alice
drwxr-xr-x. 2 root root 37 May 18 15:08 dteam
drwxr-xr-x. 2 root root 60 May 18 15:08 lhcb
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/grid-security/vomsdir/alice/
total 8
-rw-r--r--. 1 root root 101 Feb 11 2014 lcg-voms2.cern.ch.lsc
-rw-r--r--. 1 root root 97 Feb 11 2014 voms2.cern.ch.lsc
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/grid-security/vomsdir/lhcb/
total 8
-rw-r--r--. 1 root root 101 Feb 11 2014 lcg-voms2.cern.ch.lsc
-rw-r--r--. 1 root root 97 Feb 11 2014 voms2.cern.ch.lsc
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/grid-security/vomsdir/dteam/
total 4
-rw-r--r--. 1 root root 129 Jan 19 2017 voms2.hellasgrid.gr.lsc
----------------------------------------------------------------------
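Instead of listing each directory by hand, the presence of the .lsc files can be checked in one go. This is a minimal sketch: the function name is hypothetical, and the VO list simply repeats the three VOs installed in the example above.

```shell
#!/bin/bash
# missing_lsc VOMSDIR VO...: print each VO that has no *.lsc file
# in its subdirectory of VOMSDIR.
missing_lsc() {
    local vomsdir=$1 vo
    shift
    for vo in "$@"; do
        # ls fails if the glob matches nothing or the directory is absent
        ls "$vomsdir/$vo"/*.lsc >/dev/null 2>&1 || echo "$vo"
    done
}

# Prints nothing if every VO has at least one .lsc file.
missing_lsc /etc/grid-security/vomsdir alice dteam lhcb
```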
Ensure the CRLs are up to date:
----------------------------------------------------------------------
yum install fetch-crl
----------------------------------------------------------------------
systemctl enable fetch-crl-cron
----------------------------------------------------------------------
systemctl start fetch-crl-cron
----------------------------------------------------------------------
fetch-crl > /tmp/crl-$$.log 2>&1 < /dev/null &
----------------------------------------------------------------------
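Once the background fetch-crl run has finished, its result can be verified. The sketch below assumes that the CRLs are stored as *.r0 files in /etc/grid-security/certificates and that GNU find is available; the function name and the two-day threshold are illustrative.

```shell
#!/bin/bash
# stale_crls DIR DAYS: list CRL files in DIR whose age, counted in whole
# days, exceeds DAYS (the semantics of "find -mtime +DAYS").
stale_crls() {
    find "$1" -maxdepth 1 -name '*.r0' -mtime +"$2" 2>/dev/null
}

CERTDIR=/etc/grid-security/certificates

if [ -d "$CERTDIR" ]; then
    echo "CRL files present: $(find "$CERTDIR" -maxdepth 1 -name '*.r0' | wc -l)"
    # Any file listed here has not been refreshed recently.
    stale_crls "$CERTDIR" 2
fi
```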
We will now set up the HTCondor CE following these steps:
https://htcondor.com/htcondor-ce/v5/installation/htcondor-ce/
First install the package:
----------------------------------------------------------------------
yum install htcondor-ce-condor
----------------------------------------------------------------------
Copy the pool password:
----------------------------------------------------------------------
cp /etc/condor/passwords.d/POOL /etc/condor-ce/passwords.d/
----------------------------------------------------------------------
Open the HTCondor CE port:
----------------------------------------------------------------------
firewall-cmd --permanent --zone=public --add-port=9619/tcp
----------------------------------------------------------------------
firewall-cmd --reload
----------------------------------------------------------------------
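To confirm that the port really was opened, the output of firewall-cmd --list-ports can be inspected. The following sketch assumes the public zone used above; the helper name is hypothetical, and it only matches an exact entry such as 9619/tcp, not a port range that happens to include it.

```shell
#!/bin/bash
# port_listed PORT LIST: succeed if PORT (e.g. 9619/tcp) appears as a
# separate word in the space-separated LIST.
port_listed() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

if command -v firewall-cmd >/dev/null 2>&1; then
    ports=$(firewall-cmd --zone=public --list-ports 2>/dev/null)
    if port_listed 9619/tcp "$ports"; then
        echo "port 9619/tcp is open in zone public"
    else
        echo "port 9619/tcp is NOT listed in zone public"
    fi
fi
```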
The HTCondor CE daemon configuration should resemble the following (edit the files as indicated):
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/condor-ce/config.d/
total 24
-rw-r--r--. 1 root root 1321 May 29 02:05 01-ce-auth.conf
-rw-r--r--. 1 root root 1714 Dec 21 22:11 01-ce-router.conf
-rw-r--r--. 1 root root 1362 Dec 21 22:11 01-pilot-env.conf
-rw-r--r--. 1 root root 1444 Dec 21 22:11 02-ce-condor.conf
-rw-r--r--. 1 root root 500 Dec 21 22:11 03-managed-fork.conf
-rw-r--r--. 1 root root 41 May 29 02:17 50-schedd2.conf
----------------------------------------------------------------------
[root@mini-htc ~]# grep ^AUTH /etc/condor-ce/config.d/01-ce-auth.conf
AUTH_SSL_SERVER_CERTFILE = /etc/grid-security/hostcert.pem
AUTH_SSL_SERVER_KEYFILE = /etc/grid-security/hostkey.pem
AUTH_SSL_SERVER_CADIR = /etc/grid-security/certificates
AUTH_SSL_SERVER_CAFILE =
AUTH_SSL_CLIENT_CERTFILE = /etc/grid-security/hostcert.pem
AUTH_SSL_CLIENT_KEYFILE = /etc/grid-security/hostkey.pem
AUTH_SSL_CLIENT_CADIR = /etc/grid-security/certificates
AUTH_SSL_CLIENT_CAFILE =
----------------------------------------------------------------------
[root@mini-htc ~]# cat /etc/condor-ce/config.d/50-schedd2.conf
JOB_ROUTER_SCHEDD2_POOL = localhost:9618
----------------------------------------------------------------------
User mapping details and examples:
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/condor-ce/mapfiles.d/
total 20
-rw-r--r--. 1 root root 1305 Dec 21 22:11 10-gsi.conf
-rw-r--r--. 1 root root 1095 Dec 21 22:11 10-scitokens.conf
-rw-r--r--. 1 root root 78 May 29 02:07 11-gsi.conf
-rw-r--r--. 1 root root 99 May 29 02:07 11-scitokens.conf
-rw-r--r--. 1 root root 540 May 29 02:06 50-gsi-callout.conf
----------------------------------------------------------------------
[root@mini-htc ~]# cat /etc/condor-ce/mapfiles.d/11-gsi.conf
GSI /.*,\/alice\/Role=lcgadmin/ alicesgm
GSI /.*,\/alice\/Role=NULL/ alice001
----------------------------------------------------------------------
[root@mini-htc ~]# cat /etc/condor-ce/mapfiles.d/11-scitokens.conf
SCITOKENS /^https:\/\/wlcg\.cloud\.cnaf\.infn\.it\/,8c3c01a9-ee96-4f6e-989c-ad1e279244ae$/ wlcg001
----------------------------------------------------------------------
[root@mini-htc ~]# grep GSI /etc/condor-ce/mapfiles.d/50-gsi-callout.conf | tail -n 1
#GSI /(.*)/ GSS_ASSIST_GRIDMAP
----------------------------------------------------------------------
NOTE: the GSS_ASSIST_GRIDMAP line must be commented out or removed!
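Whether the callout line is still active is easy to overlook, so it may be worth checking it with a script. This is a minimal sketch; the function name is hypothetical and the pattern only looks for an uncommented GSI line that mentions GSS_ASSIST_GRIDMAP.

```shell
#!/bin/bash
# callout_active FILE: succeed if FILE still contains an uncommented
# GSI mapping line that refers to GSS_ASSIST_GRIDMAP.
callout_active() {
    grep -Eq '^[[:space:]]*GSI[[:space:]].*GSS_ASSIST_GRIDMAP' "$1"
}

CALLOUT=/etc/condor-ce/mapfiles.d/50-gsi-callout.conf

if [ -r "$CALLOUT" ]; then
    if callout_active "$CALLOUT"; then
        echo "GSS_ASSIST_GRIDMAP is still active in $CALLOUT: comment it out"
    else
        echo "GSS_ASSIST_GRIDMAP is disabled, as required"
    fi
fi
```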
Add the necessary grid job accounts:
----------------------------------------------------------------------
adduser alicesgm
----------------------------------------------------------------------
adduser alice001
----------------------------------------------------------------------
adduser wlcg001
----------------------------------------------------------------------
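When these steps are repeated on a host that is already partly configured, it helps to know which accounts are still missing before calling adduser. The sketch below only reports them; the function name is hypothetical and the account names are the ones used in the mapfiles above.

```shell
#!/bin/bash
# missing_accounts USER...: print each USER that does not exist on this host.
missing_accounts() {
    local user
    for user in "$@"; do
        id -u "$user" >/dev/null 2>&1 || echo "$user"
    done
}

# Report the accounts that still need to be created (run adduser as root).
for user in $(missing_accounts alicesgm alice001 wlcg001); do
    echo "account missing, create it with: adduser $user"
done
```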
Result:
----------------------------------------------------------------------
[root@mini-htc ~]# tail -n 3 /etc/passwd
alicesgm:x:19984:19984::/home/alicesgm:/bin/bash
alice001:x:19985:19985::/home/alice001:/bin/bash
wlcg001:x:19986:19986::/home/wlcg001:/bin/bash
----------------------------------------------------------------------
We need to set an extra parameter for the HTCondor batch service as well:
----------------------------------------------------------------------
[root@mini-htc ~]# ll /etc/condor/config.d/
total 16
-rw-r--r--. 1 root root 1004 May 26 21:46 00-htcondor-9.0.config
-rw-r--r--. 1 root root 2501 May 26 21:46 00-minicondor
-rw-r--r--. 1 root root 451 Dec 21 22:11 50-condor-ce-defaults.conf
-rw-r--r--. 1 root root 39 May 29 02:26 99-extra.conf
----------------------------------------------------------------------
[root@mini-htc ~]# cat /etc/condor/config.d/99-extra.conf
QUEUE_SUPER_USER_MAY_IMPERSONATE = .*
----------------------------------------------------------------------
To ensure the services run with the specified configuration, restart them. First stop both:
----------------------------------------------------------------------
systemctl stop condor-ce
----------------------------------------------------------------------
systemctl stop condor
----------------------------------------------------------------------
Ensure all related processes are gone, then start the services and check that the processes are all back:
----------------------------------------------------------------------
ps afuxwww | grep -o '.*condor[_][^ ]*'
----------------------------------------------------------------------
systemctl start condor
----------------------------------------------------------------------
systemctl start condor-ce
----------------------------------------------------------------------
The list of processes should look as shown here:
----------------------------------------------------------------------
[root@mini-htc ~]# ps afuxwww | grep -o '.*condor[_][^ ]*'
condor 95485 0.0 0.0 71632 7040 ? Ss 16:23 0:00 /usr/sbin/condor_master
root 95528 0.0 0.0 23464 3996 ? S 16:23 0:00 \_ condor_procd
condor 95530 0.0 0.0 44660 5952 ? Ss 16:23 0:00 \_ condor_shared_port
condor 95531 0.0 0.0 45688 6720 ? Ss 16:23 0:00 \_ condor_collector
condor 95532 0.0 0.0 45448 6664 ? Ss 16:23 0:00 \_ condor_negotiator
condor 95533 0.0 0.1 46864 7632 ? Ss 16:23 0:00 \_ condor_schedd
condor 95534 0.0 0.0 45960 7044 ? Ss 16:23 0:00 \_ condor_startd
condor 95573 0.0 0.0 71672 5640 ? Ss 16:23 0:00 condor_master
root 95622 0.0 0.0 23600 4036 ? S 16:23 0:00 \_ condor_procd
condor 95623 0.0 0.0 44788 5956 ? Ss 16:23 0:00 \_ condor_shared_port
condor 95625 0.0 0.2 188472 17568 ? Ss 16:23 0:00 \_ condor_collector
condor 95628 0.0 0.1 46788 7496 ? Ss 16:23 0:00 \_ condor_schedd
condor 95629 0.0 0.0 45156 6484 ? Ss 16:23 0:00 \_ condor_job_router
----------------------------------------------------------------------
The host should now be ready to run test jobs submitted with X509 / VOMS proxies
and/or SciTokens, according to the implemented configuration.
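A first test job can be described with a grid-universe submit file. The following is a hedged sketch for submission with an X509 / VOMS proxy only: the host name mini-htc.example.org, the file names and the helper function are placeholders, and it assumes the submitting host runs an HTCondor submit service and holds a valid proxy for a VO mapped above.

```shell
#!/bin/bash
# write_submit_file CE_HOST FILE: write a grid-universe submit description
# that targets the HTCondor CE on CE_HOST (port 9619, as opened above).
write_submit_file() {
    cat > "$2" <<EOF
universe = grid
grid_resource = condor $1 $1:9619
use_x509userproxy = true
executable = /bin/hostname
transfer_executable = false
output = ce_test.out
error = ce_test.err
log = ce_test.log
queue
EOF
}

# Placeholder values; override them via the environment.
CE_HOST=${CE_HOST:-mini-htc.example.org}
SUBMIT_FILE=${SUBMIT_FILE:-ce_test.sub}

write_submit_file "$CE_HOST" "$SUBMIT_FILE"
echo "submit with: condor_submit $SUBMIT_FILE"
```

If the job runs, ce_test.out should contain the name of the host that executed it.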