LCG Web>DPMupgrade>FirstUpgradeCircle (2020-08-25, JuliaAndreeva)

DPM upgrade task force

Introduction

WLCG currently lacks a complete description of the storage services required for LHC computing activities. Another important functionality still to be implemented is storage space accounting, which would provide the usage and capacity of all WLCG storage resources and would work across LHC experiments and GRID middleware platforms.

Both the WLCG storage topology description and WLCG storage space accounting depend on the ability of WLCG sites to publish their description (storage shares/space quotas and the protocols which enable access to these storage shares) and their storage accounting information. The requirements for the sites are summarized in the Storage Resource Reporting (SRR) proposal document, which was discussed with the experiments and storage providers and agreed for implementation at the GDB in October 2017.

Among storage implementations, DPM is the most advanced in implementing SRR. The new core of DPM, the Disk Operations Management Engine (DOME), enables SRR publishing of both storage description and accounting information, and runs much more smoothly for HTTP, xrootd and gridftp. Enabling DOME requires a simple package upgrade to version 1.10.3 or higher, followed by a reconfiguration that involves a scheduled downtime.

Mandate of the task force

Coordinate the upgrade of the DPM sites to DPM version 1.10.3 or higher and the reconfiguration required to enable DOME and, correspondingly, SRR.
Provide guidance and support to sites for the upgrade and reconfiguration
Validate SRRs published by DPM sites and make sure that they can be integrated with CRIC and the WLCG Storage Space Accounting system

Upgrade plan

The task can be addressed in two phases

Phase 1 - "early adopter"

A small number of early adopter sites plan and perform the upgrade/reconfiguration along with the DPM team. This is to gain and document experience and handle any issues which arise.

Phase 1 to be accomplished by the end of 2018

Phase 2 - "general transition"

Sites perform the necessary upgrades and reconfigurations, supported by the WG.

By summer 2019 80% of DPM storage (in terms of capacity) to be upgraded and reconfigured
By the end of 2019 80% of DPM sites to be upgraded and reconfigured
As previous experience shows, the tail of small sites might take longer
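
The two Phase 2 targets measure different things: the first is weighted by capacity, the second counts sites. The following minimal Python sketch illustrates how the two fractions can diverge when a few large sites upgrade first. The site names and capacities are entirely hypothetical and are not taken from CRIC.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str           # hypothetical site label
    capacity_tb: float  # hypothetical capacity, for illustration only
    upgraded: bool      # upgraded and reconfigured for DOME


def upgrade_fractions(sites):
    """Return (fraction of capacity upgraded, fraction of sites upgraded)."""
    total_capacity = sum(s.capacity_tb for s in sites)
    done = [s for s in sites if s.upgraded]
    by_capacity = sum(s.capacity_tb for s in done) / total_capacity
    by_sites = len(done) / len(sites)
    return by_capacity, by_sites


# Made-up numbers: two large sites are done, three smaller ones are not.
sites = [
    Site("large-A", 4000, True),
    Site("large-B", 4000, True),
    Site("small-C", 500, False),
    Site("small-D", 500, False),
    Site("small-E", 1000, False),
]
by_capacity, by_sites = upgrade_fractions(sites)
print(f"capacity: {by_capacity:.0%}, sites: {by_sites:.0%}")
```

With these made-up inputs the capacity target is met while the site-count target is far from it, which is why the plan gives the second target a later deadline.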

First phase activities

DPM sites used by the LHC VOs listed in CRIC

Click to get information for the DPM sites used by the LHC experiments in CRIC
Sites upgraded with DOME configuration

Phase 2 - General transition

Status of upgrade as of the 28th of August

Out of 55 DPM sites used by the LHC VOs, 29 have upgraded to a version higher than 1.10
Out of the 29 sites which upgraded to a version higher than 1.10, 15 have been reconfigured for DOME
The 14 sites which, according to CRIC, have been upgraded but not yet reconfigured for DOME should be reconfigured. If they have already been reconfigured but CRIC is lacking this information, CRIC has to be updated
26 sites need an upgrade and reconfiguration. We create GGUS tickets for all those sites
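
The figures in this status list are mutually consistent, which can be checked with a few lines of Python. This is only a sanity check of the arithmetic quoted above; the variable names are chosen for illustration.

```python
# Figures quoted in the status list above.
total_sites = 55
upgraded = 29      # running a version higher than 1.10
reconfigured = 15  # upgraded and reconfigured for DOME

upgraded_not_reconfigured = upgraded - reconfigured  # sites that still need DOME
need_upgrade = total_sites - upgraded                # sites that need both steps

# Every site falls into exactly one of the three groups.
assert reconfigured + upgraded_not_reconfigured + need_upgrade == total_sites

print(f"upgraded, not reconfigured: {upgraded_not_reconfigured}")
print(f"need upgrade and reconfiguration: {need_upgrade}")
```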

Sites requiring upgrade and reconfiguration

Site	DPM Version (28.08.2019)	Upgrade is planned (date)	Comments	GGUS ticket	Contacts
GRIF-LPNHE	[u'1.10.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143067	grid.admin@grifNOSPAMPLEASE.fr
IN2P3 -IRES	[u'1.10.0']	Upgraded and reconfigured with DOME	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143070	grid.admin@iphcNOSPAMPLEASE.cnrs.fr
UKI-SCOTGRID-GLASGOW	[u'1.8.10', u'1.8.10']		planning to move ~90% of our capacity off DPM to a Ceph-based solution, and would rather not change our DPM configuration until after that work is complete. Plan to accomplish by the end of the year	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143076	uki-scotgrid-glasgow@physicsNOSPAMPLEASE.gla.ac.uk
UKI-SCOTGRID-ECDF	[u'1.10.0', u'1.10.0', u'1.10.0']	30th October	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143077	wlcg-support-ecdf@mlistNOSPAMPLEASE.is.ed.ac.uk
TW-NTU-HEP	[u'1.10.0']		DOME is enabled, waiting for SRR and CRIC update	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143078	sysadmin@hep1NOSPAMPLEASE.phys.ntu.edu.tw
UKI-NORTHGRID-SHEF-HEP	[u'1.8.10']	DONE	DPM decommissioned, no storage, use RAL storage instead	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143079	edg-site-admin@sheffieldNOSPAMPLEASE.ac.uk
TW-FTT	No DPM	DONE	DPM info was not up to date. The site is running an EOS SE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143080	ops@listsNOSPAMPLEASE.grid.sinica.edu.tw
Kharkov-KIPT-LCG2	[u'1.13.0']	DONE	Upgraded and reconfigured with DOME	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143081	grid_support@kiptNOSPAMPLEASE.kharkov.ua
IN2P3 -IPNL	[u'1.8.10']		Migrating to EOS. Should finish by mid of 2020	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143082	gridsupport@ipnlNOSPAMPLEASE.in2p3.fr
UKI-SOUTHGRID-BRIS-HEP	[u'1.9.0']		Run DMLite + HDFS plugin which does not support DOME. Plan to migrate to XRootD	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143083	lcg-site-admin@bristolNOSPAMPLEASE.ac.uk
IN2P3 -LPSC	[u'1.9.0']	planned before the end of November	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143084	grid.admin@lpscNOSPAMPLEASE.in2p3.fr
IR-IPM-HEP	[u'1.8.11']	DONE	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143088	grid-hep@ipmNOSPAMPLEASE.ir
GR-12-TEIKAV	[u'1.8.9']		Site is suspended	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143089	admingrid@teiemtNOSPAMPLEASE.gr
ICM	[u'1.10.0']	Upgrade is performed , re-configuration in progress		https://ggus.eu/index.php?mode=ticket_info&ticket_id=143091	plgrid-admins@icmNOSPAMPLEASE.edu.pl
Ru-Troitsk-INR-LCG2	[u'1.9.0']	DONE 27.09	DONE. Upgraded and reconfigured with DOME	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143092	sli@inrNOSPAMPLEASE.ru
Hephy-Vienna	[u'1.10.0']		Will migrate to EOS in Q1 of 2020, DPM will be decommissioned	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143277	hephy-grid-admin@oeawNOSPAMPLEASE.ac.at
TR-10-ULAKBIM	[u'1.13.0']	Done 14.01.2020	Upgraded and reconfigured for DOME support	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143278	grid@ulakbimNOSPAMPLEASE.gov.tr
INFN-FRASCATI	[u'1.9.0']	planned before the end of December	DONE (dpm-1.13.0-1)	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143280	grid-prod@lnfNOSPAMPLEASE.infn.it
INFN-ROMA1	[u'1.13.0']	DONE		https://ggus.eu/index.php?mode=ticket_info&ticket_id=143276	grid-prod@roma1NOSPAMPLEASE.infn.it
ru-PNPI	[u'1.9.0', u'1.9.0']	consider to migrate to EOS since only ALICE storage is supported		https://ggus.eu/index.php?mode=ticket_info&ticket_id=143281	globus@pnpiNOSPAMPLEASE.nw.ru
IN2P3 -LAPP	[u'1.9.0']	13/11/2019	DONE (1.13.1)	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143282	support-grid@lappNOSPAMPLEASE.in2p3.fr
UKI-SOUTHGRID-CAM-HEP	No DPM any more	DONE	SE is decommissioned, ticket is closed. Site is running xCache.	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143283	lcg-admin@hepNOSPAMPLEASE.phy.cam.ac.uk
CYFRONET-LCG2	[u'1.13.2']	DONE	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143284	lcg-admin@cyf-krNOSPAMPLEASE.edu.pl
Australia-ATLAS	[u'1.9.0']	DONE	DONE with Dome	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143285	coepp-sysadmin@listsNOSPAMPLEASE.unimelb.edu.au
NIKHEF-ELPROD	[u'1.9.0']	Migrated to dCache	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143286	grid.sysadmin@nikhefNOSPAMPLEASE.nl
RO-07-NIPNE	[u'1.1.0']	by 01.11.2019	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143287	ciubancan@nipneNOSPAMPLEASE.ro
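
The "DPM Version" column above holds list-like strings such as `[u'1.8.10', u'1.8.10']` (one entry per endpoint) or free text such as "No DPM". A small Python sketch, assuming only this cell format, shows one way to compare such cells against the 1.10.3 threshold from the mandate. The function names are hypothetical. Versions are compared as integer tuples because a plain string comparison would wrongly rank "1.8.10" above "1.10.3".

```python
import ast

MIN_VERSION = (1, 10, 3)  # threshold stated in the task force mandate


def parse_versions(cell):
    """Parse a 'DPM Version' cell such as "[u'1.8.10', u'1.8.10']".

    Returns a list of integer tuples, or an empty list for free-text
    cells such as "No DPM".
    """
    try:
        values = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(values, list):
        return []
    return [
        tuple(int(part) for part in value.split("."))
        for value in values
        if isinstance(value, str)
    ]


def needs_upgrade(cell):
    """True if any reported endpoint is below MIN_VERSION."""
    return any(version < MIN_VERSION for version in parse_versions(cell))


for cell in ["[u'1.8.10', u'1.8.10']", "[u'1.13.2']", "No DPM"]:
    print(cell, "->", needs_upgrade(cell))
```

Note that the column is a snapshot from 28.08.2019, so a site flagged here may since have been upgraded, as the Comments column indicates.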

Sites which according to CRIC did perform an upgrade but require reconfiguration for DOME and SRR

Site	DPM Version (28.08.2019)	Reconfiguration is planned (date)	Comments	GGUS ticket	Contacts
UNIBE-LHEP	[u'1.13.2']	DOME+legacy	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143464	it-ops@lhepNOSPAMPLEASE.unibe.ch
PSNC	[u'1.13.0']			https://ggus.eu/index.php?mode=ticket_info&ticket_id=143474	egee@manNOSPAMPLEASE.poznan.pl
NCP-LCG2	[u'1.13.0']	Infrastructure problems on the site. Configuration work started but delayed unless the problem is fixed		https://ggus.eu/index.php?mode=ticket_info&ticket_id=143476	fsaeed@cernNOSPAMPLEASE.ch
BEIJING-LCG2	[u'1.12.0', u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143463	lcg-admin@ihepNOSPAMPLEASE.ac.cn
UKI-SCOTGRID-DURHAM	[u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143465	oper.ip3@durhamNOSPAMPLEASE.ac.uk
FMPhI -UNIBA	[u'1.13.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143477	gridmaster@dnpNOSPAMPLEASE.fmph.uniba.sk
UKI-NORTHGRID-LIV-HEP	[u'1.13.2', u'1.13.2']	DOME Configured with legacy mode still on	DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143466	gridteam@hepNOSPAMPLEASE.ph.liv.ac.uk
GR-07-UOI-HEPLAB	[u'1.13.0']	4 Oct 2019 : first re-configuration attempt to DOME failed...	Postponed unless the site migrates to Centos7, probably end of the year	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143467	grid@alphaNOSPAMPLEASE.physics.uoi.gr
TOKYO-LCG2	[u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143468	lcg-admin@iceppNOSPAMPLEASE.s.u-tokyo.ac.jp
TW-NCHC	[u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143470	lincy@nchcNOSPAMPLEASE.org.tw
HK-LCG2	[u'1.12.0']			https://ggus.eu/index.php?mode=ticket_info&ticket_id=143471	grid-prod@atlasNOSPAMPLEASE.cuhk.edu.hk
ZA-WITS-CORE	[u'1.13.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143478	scott.hazelhurst@witsNOSPAMPLEASE.ac.za
Taiwan-LCG2	[u'1.12.1']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143472	ops@listsNOSPAMPLEASE.grid.sinica.edu.tw
NCBJ-CIS	[u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143473	admins@cisNOSPAMPLEASE.gov.pl
BUDAPEST	[u'1.13.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143657	gridadm@rmkiNOSPAMPLEASE.kfki.hu
UKI-SOUTHGRID-OX-HEP	[u'1.12.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=143656	lcg_manager@physicsNOSPAMPLEASE.ox.ac.uk
UKI-LT2-Brunel	[u'1.13.0']		DONE	https://ggus.eu/index.php?mode=ticket_info&ticket_id=144007	lcg-admin@brunelNOSPAMPLEASE.ac.uk

Recommended configuration

Official documentation for the upgrade

ATLAS (Rucio)
- Use at least DOME DPM 1.12.1 + XRootD 4.9.0 + davix 0.7.3 ... latest (stable) versions from EPEL recommended
  - dmlite is linked with the xrootd packages available at its release date, so moving to the latest dmlite requires the most recent xrootd packages
  - enable GridFTP redirection: puppet head+disknode configuration option gridftp_redirect
  - enable XRootD checksums: puppet head+disknode configuration option configure_dpm_xrootd_checksum (enabled by default since DPM 1.13)
  - optionally enable TPC XRootD delegation: puppet disknode configuration option configure_dpm_xrootd_delegation (enabled by default since DPM 1.13)
  - on a dual-stack DPM, epsv_match must be enabled to support IPv4-only clients that have IPv6 enabled in the GridFTP plugin (the default in gfal-2.17 and Dirac middleware), see LCGDM-2817
- AGIS configuration (example for SE, panda)
  - GridFTP preferred protocol with priority 0 for tpc activities (requires GridFTP redirection)
  - XRootD for lan and wan read+write (write works only with XRootD checksums enabled)
  - rucio mover for panda queues (the rucio mover uses storage protocols according to the preferences defined in AGIS)
  - each protocol in AGIS should have monitoring enabled to be part of ATLAS SAM tests
  - EGI sites should also register each SE protocol with GOCDB (example: SRM, GridFTP, XRootD, WebDAV)
  - fully SRM-less operation requires additional configuration of the Storage Resource Reporting (SRR)
    - use cron to generate at least hourly storagesummary.json with SRR info by dpm-storage-summary.py script
    - since DOME DPM 1.13.2 SRR info automatically available via HTTP CGI and cron config mentioned above is no longer necessary
    - modify "Space method" and "Space Usage" for each DDM endpoint in AGIS ( example)
      - Space method: storage
      - Space Usage: URL of your storagesummary.json
- Argus blacklisting implemented in DOME DPM 1.13.3
- used in production since February 26 2019 at PRAGUELCG2
  - DOME enabled in June 2018, but without GridFTP redirection
  - troubles with DOME DPM 1.11 fixed in 1.12 - stable since March 5 (including GridFTP redirection)
    - 1.12 had still some non-critical issues with known workarounds (see "Detected problems" section)
    - fixed in DOME DPM 1.13 and on our production DPM since July 12
  - monitoring
    - SAM, check_mk
    - source / destination transfers
CMS (PhEDEx)
Dirac users
- internally use GFAL for transfers (unless you still use deprecated protocols)
- if your DPM supports IPv4 + IPv6, be aware that IPv4-only clients can't access data using the gsiftp protocol unless you follow the instructions in LCGDM-2817
- LFC catalog
  - deprecated & EOL - you should think about migration
  - full file URL stored in catalog - can't easily switch from SRM protocol
- DFC catalog
  - possible to configure non-SRM transfer protocols
  - with GridFTP redirection enabled in DPM it should be almost transparent switching from GFAL2_SRM2 to GFAL2_GSIFTP

After reconfiguration for DOME make sure that SRR is enabled

How to enable SRR
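
Once SRR is enabled, a quick look at the published storagesummary.json can catch obvious problems before the URL is registered in CRIC. The following Python sketch works on an embedded, hypothetical sample document. The key names (`storageservice`, `storageshares`, `totalsize`, `usedsize`) are assumptions based on the SRR proposal and should be compared with what your DPM actually publishes. The check itself is illustrative and is not part of DPM or CRIC.

```python
import json

# Hypothetical, trimmed storagesummary.json. The key names are assumptions
# based on the SRR proposal; compare with the file your DPM publishes.
SAMPLE = """
{
  "storageservice": {
    "name": "se.example.org",
    "implementation": "DPM",
    "implementationversion": "1.13.2",
    "storageshares": [
      {"name": "ATLASDATADISK", "totalsize": 1000, "usedsize": 400, "vos": ["atlas"]},
      {"name": "ATLASSCRATCHDISK", "totalsize": 200, "usedsize": 250, "vos": ["atlas"]}
    ]
  }
}
"""


def check_srr(text):
    """Return a list of human-readable findings for an SRR document."""
    findings = []
    service = json.loads(text).get("storageservice", {})
    shares = service.get("storageshares", [])
    if not shares:
        findings.append("no storage shares published")
    for share in shares:
        name = share.get("name", "<unnamed>")
        total, used = share.get("totalsize"), share.get("usedsize")
        if total is None or used is None:
            findings.append(f"{name}: missing totalsize or usedsize")
        elif used > total:
            # Worth a look, not necessarily an error.
            findings.append(f"{name}: usedsize {used} exceeds totalsize {total}")
    return findings


for finding in check_srr(SAMPLE):
    print(finding)
```

In practice the text would come from the SRR URL of the site instead of the embedded sample.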

After changes performed on your service, please, update information in CRIC

Authentication & authorization step

Go to the WLCG CRIC server and click Core (menu at the top of the page) -> Services. Enable filtering by clicking on the 'Filter' button and select your site. By default, you won't see the implementation and implementation version columns in the table. To see this information, click on 'Columns' and then select the corresponding columns in the drop-down list.

You should be able to list all CRIC entities (sites (GocDB /OIM and experiment-specific ones), federations, pledges, services, storage protocols and queues) without authentication. However, to see the details of any particular entity you will be asked to log in.

Those who are registered in the CERN DB should use SSO authentication. Authentication with a certificate is not yet enabled on this instance, but will come soon.

Those who are not registered in the CERN DB need to ask for a CRIC local account. Please send a mail to cric-devs@cernNOSPAMPLEASE.ch with your name, family name and the mail address to be used by CRIC to communicate with you.

As soon as you are logged in, you will be able to see the details of any CRIC entity; however, in order to edit information you need specific privileges. Once you are authenticated, you will see 'Request privileges' at the top of the page next to your login name. Please click on it and follow the request procedure, which allows you to request global admin, site admin or federation admin privileges. Ask for site admin privileges for your site. You will shortly be informed that your privileges are enabled. Please log in again.

Editing storage info

Once you log in with the appropriate privileges, you should be able to edit information about your site. At the moment we are particularly interested in the storage information for your site, namely its implementation, implementation version and SRR URL once it is enabled.

CRIC creates a virtual storage service per site, per VO, per media and per implementation. By default it creates one disk and one tape virtual storage for every VO served by a given T1 site. However, if for a given VO there are storage instances for the same media but with different implementations (for example EOS and dCache instances for ATLAS disk storage), CRIC should create two different disk virtual storage instances for this VO. Unfortunately, for the moment there is no reliable primary source for this kind of information, so it is highly likely that CRIC will create only a single virtual storage in such cases. Please correct this using the CRIC UI by adding the other virtual storage instances for your site, with their implementation, implementation version and SRR URL once it is enabled. In the future we hope to get this information through SRR (Storage Resource Reporting).

In the service table view, click on a particular service name
You get a form with detailed information about service
Click on the 'Edit' button under the first block of information
You get another form. Please correct the 'Version' of your DPM implementation. If DOME is enabled, please append 'with DOME' to the version number and provide the "Resource Reporting URL" value
Click on 'Check input data' and save info

Creating a new virtual storage instance in CRIC

Starting from the entry page: https://wlcg-cric.cern.ch/
- in the horizontal menu on the top of the page, select 'Core' -> 'Create Storage Service'. You get a form to fill in
Keep service name field empty as the form suggests
Select your site from the drop down menu
Service type (SE) should not be touched
Select Disk or Tape media in the "Architecture" field drop down menu
Provide value for implementation (EOS, Castor, Xrootd, dCache, DPM)
Provide value for implementation version
You can provide a value in the endpoint field or leave it empty if it does not make sense
Please, select value for the VO name. As mentioned above, the virtual storage in CRIC is created for a single VO even though several VOs can share the same physical storage service of the site
Leave 'ACTIVE' object state
All other attributes are optional, you can leave them empty

Creating a new protocol for a given virtual storage in CRIC

Currently, even if a single protocol is shared by several virtual storage instances in CRIC, a new protocol instance has to be created for each virtual storage instance.

To create new protocol for a given virtual storage instance, select corresponding service in the service list and click on the name to get a detailed description of the virtual storage service
Below the table with the list of protocols, click on the 'Add protocol' button. You will get a form to fill.
Leave the name of the protocol empty, the system will generate it for you
"Flavour" and "endpoint" are mandatory attributes, other fields could be empty

Deleting virtual storage instance from CRIC

For the time being, deletion from the UI is not allowed. Change the object state to "Disabled" to make the instance disappear from the listing

Deleting protocols from CRIC

The protocol attached to a particular virtual storage can be deleted from the protocol list from the detailed page describing the virtual storage service.

DPM service monitoring for EGI.

In order to enable DPM service monitoring for EGI one needs to configure webdav (HTTPS) for the ops VO and register the endpoint on GOCDB

Registration of webdav service endpoint in GocDB

To register the webdav service endpoint in GOC-DB, follow HOWTO21 to fill in the proper information.

In particular:

register a new service endpoint, separated from the SRM one and with the "production" flag disabled;
on GOC-DB fill in the webdav URL containing also the VO ops folder. Example
it corresponds to the value of GLUE2 attribute GLUE2EndpointURL (containing the used port and without the VO folder);
verify that the webdav url is properly accessible.
check if the tests are ok (https://argo-mon.egi.eu/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_webdav&style=detail ) and then switch the production flag to "yes"
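
The relation between the two URLs mentioned above (a base URL with the port but without the VO folder, and the GOC-DB URL that also contains the ops folder) can be sketched in Python as follows. The helper name and the host are hypothetical; the folder follows the /dpm/<domain>/home/ops pattern used in the SE_PATH example on this page. This only composes and sanity-checks the string and does not contact the server.

```python
from urllib.parse import urlsplit, urlunsplit


def gocdb_webdav_url(base_url, vo_folder):
    """Append the VO folder to a base URL that has a port but no VO folder.

    base_url: scheme, host and port, e.g. the GLUE2EndpointURL value.
    vo_folder: absolute path of the ops VO folder on the SE.
    """
    parts = urlsplit(base_url)
    if parts.scheme != "https" or parts.port is None:
        raise ValueError("expected an https URL with an explicit port")
    if not vo_folder.startswith("/"):
        raise ValueError("VO folder must be an absolute path")
    path = parts.path.rstrip("/") + vo_folder
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


# Hypothetical host and domain, for illustration only.
print(gocdb_webdav_url("https://se.example.org:443", "/dpm/example.org/home/ops"))
```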

Enable gridftp monitoring for ops VO (if you provide such protocol)

register a new service endpoint, associating the storage element hostname to the service type “globus-GRIDFTP”, with the "production" flag disabled;
in the “Extension Properties” section of the service endpoint page, fill in the following fields:
Name: SE_PATH
Value: /dpm/ui.savba.sk/home/ops #this is an example, set the proper path
check if the tests are ok (it might take some hours for detecting the new service endpoint) and then switch the production flag to "yes"

Detected problems

PRAGUELCG2
- forgot to tune open file handles limit on dedicated DB machine (during DPM upgrade we moved DB from SLC6 to CC7)
- 1.12 - dmlite-mysql-dirspaces.py doesn't fix data assigned to wrong (empty) spacetoken which can/will cause problems for SRM transfers
- 1.13 - Slowly diverging space counters (size(parentdir) != sum(size(childdir))) e.g. for ATLASDATADISK using pydmlite API or direct DB query - LCGDM-2801
- GridFTP transfers confirmed OK to client before processed by DPM (e.g. DB update for filesize, replica status, ...) LCGDM-2818, may be related to LCGDM-1961
  - GridFTP dsi plugin bug and ugly workaround in DPM doesn't always work
  - more details in dpm-devel thread "GridFTP redirection & Globus race condition avoided"
  - lead to lost files at our site when replica status was not updated by DPM (reason not fully understood), but for FTS/Rucio this was successful transfer
  - diverging directory space usage for concurrent GridFTP uploads LCGDM-2730 (may be related to this issue)
  - DPM developers suggest checksum query after upload as a workaround - code updated in 1.13
- unable to access dualstack DPM with gsiftp from IPv4 only machines with IPv6 enabled in GFAL ("GRIDFTP PLUGIN:IPV6=true") LCGDM-2817
  - dpm-devel thread "DPM GSIFTP IPv6 Behaviour"
  - IPv6 GFAL config hardcoded in Dirac sources (discussed in diracgrid-forum)
- 1.11 + xrootd 4.9.x - xrootd checksums not implemented LCGDM-2726
- 1.12 - Unable to correctly disable gridftp redirection with puppet
- 1.12 - Problems with default non-optimal configuration options LCGDM-2743, LCGDM-2745, GGUS:139803
- - SRM upload/deletion doesn't immediately update directory size LCGDM-2731 - db updated => space accounting should work fine
- - Removing directory with SRM not seen by DAVS LCGDM-2732
  - can't be fixed (don't mix SRM with DOME protocols at least within one directory / VO)
  - this sequence of commands also fails: gfal-stat root://host//file; gfal-copy file:///file srm://host/file; gfal-stat root://host//file
    - used during rucio upload with SRM preferred for writing and XRootD for reading
- 1.12 - Directory size not correctly updated with gfal-rename LCGDM-2733
- 1.12 - Unable to disable bad filesystem with DOME LCGDM-2740
- 1.13 SRR script robustness LCGDM-2714, LCGDM-2744 (mitigated by separate quotatoken for SRR recommended in documentation)
- 1.12 - DPM hammers itself with checksum requests LCGDM-2747
- 1.13 - Checksum issues (deadlock, popen3 blocking read) LCGDM-2791
- - Confusing usage of the head.db.poolsz configuration option LCGDM-2749 (I'm not really happy with the answer)
- 1.12 - Database connections are not reused - individual connection for each query LCGDM-2754
- 1.13 - Disabled DPM disknodes can be selected for GridFTP transfers LCGDM-2748
- 1.13 - Complain when setting quotatokens at a level deeper than the one where space is calculated LCGDM-2727
- 1.13 Problematic graceful restart of httpd LCGDM-2699, LCGDM-2707 + "DPM http graceful restart" discussion in dpm-upgrade mailing list
  - problems caused by sub-optimal default Apache configuration (ServerStart == ServerLimit can by design cause troubles during graceful restart)
    - processes not terminated till at least one thread deals with existing transfer
    - some transfers can hang forever LCGDB-2787 (rare but happens, probably fixed in lcgdm-dav 0.23)
    - apache is already at the ServerLimit and hesitates to fork new processes
      - quite old Apache bug makes this situation worse (no plans to fix it in CentOS7)
      - several graceful restarts within a short time lead to zero apache children accepting new connections
  - testing configuration with ServerLimit = 2.5*ServerStart and MaxRequestWorkers = ServerLimit*ThreadLimit
- lcgdm-dav 0.23 graceful restart can get stuck forever in case curl transfer hangs LCGDB-2787
- 1.12.1 - Failing WLCG / Argo webdav tests LCGDM-2751
- CentOS7 Apache 2.4.6 kills transfers from DPM to local filesystem during graceful restart
  - works fine with Apache 2.4.35 from RHEL8 recompiled for CentOS7
- apache memory usage on SLC6 LCGDM-2783 (workaround would be CentOS7 upgrade of all disknodes)
- davix 0.7.3 - libdavix used internally by DPM for communication with DOME can cause daemon crashes with same backtrace as grid-hammer
  - happens only in case something gets stuck inside DPM and there are hundreds of DOME API requests
- installing legacy dmlite config files for DOME DPM LCGDM-2781
- file access failing after headnode restart & before enabling disknodes LCGDM-2792
- stale gridftp processes when gridftp redirection is enabled LCGDM-1988, LCGDM-2826
- cached results concurrency issue with delete / copy / stat LCGDM-2828
- atomic size updates for all parent directories LCGDM-2878
- file rename transaction can fail, but client still receive success (silent data loss with Rucio) LCGDM-2869
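
The diverging space counters reported above (LCGDM-2801) amount to a recorded directory size that no longer equals the sum of its direct children. A minimal Python sketch of such a consistency check is shown below. It works on a hypothetical in-memory mapping of paths to recorded sizes and does not use the pydmlite API or the DPM database schema; feeding it real data is left to the site.

```python
import posixpath
from collections import defaultdict


def diverging_dirs(entries):
    """Find directories whose recorded size differs from the sum of their children.

    entries maps an absolute path to (recorded_size, is_directory).
    Returns {directory: (recorded_size, sum_of_direct_children)}.
    """
    child_sum = defaultdict(int)
    for path, (size, _is_dir) in entries.items():
        if path != "/":
            child_sum[posixpath.dirname(path)] += size
    return {
        path: (size, child_sum[path])
        for path, (size, is_dir) in entries.items()
        if is_dir and size != child_sum[path]
    }


# Hypothetical namespace: /dpm/data records 290 but its files add up to 300.
entries = {
    "/dpm": (290, True),
    "/dpm/data": (290, True),
    "/dpm/data/a": (100, False),
    "/dpm/data/b": (200, False),
}
print(diverging_dirs(entries))
```

Only direct children are summed, so a wrong counter deep in the tree is reported at the directory where the mismatch first appears.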
INFN-NAPOLI-ATLAS
- 1.12 - spacetoken vs. quotatoken size caused problems with SRM transfers
- DPM 1.12 - SRM reads physical disk empty space only during startup and is not aware of DOME transfers => failing SRM transfers that tried to use full disk LCGDM-2752
  - ~~this can be solved for ATLAS only DPM by moving to pure GridFTP~~
  - ~~sites where one VO needs SRM and the other uses DOME - can be solved only by separating SRM VO vs. DOME VO to different pools~~
  - ~~for sites that use both SRM and non-SRM transfers this issue currently has no reliable solution except for regular restarts of the legacy DPM~~
- 1.11 - DomeUserInfo::userid & DomeUserInfo::groupid should be a larger int type LCGDM-2717, LCGDM-2718
- 1.12 - Allow file deletion on readonly filesystems LCGDM-2734
Brunel
- stability / performance issues solved only with DPM 1.12 + XRootD 4.9.0 (details in dpm-upgrade mailing list)
Lancaster
- crashing xrootd daemon with DOME DPM 1.11 (1.12pre)
  - old SLC6 disknodes not managed by puppet
  - loading adapter plugin (part of legacy DPM)
- later stability issues with 1.12.1 (may be LCGDM-2791) solved by upgrade to 1.13.0
Beijing
- missing quotatoken definition for dteam (problems with TPC)
IRFU (GRIF)
- Files left on filesystem after draining LCGDM-2777, LCGDM-2778, LCGDM-2779 (Understood and more friendly in the next release 1.13)
KEK
- Host DN is not authorized for certificates with subject not matching "CN=headnode" and "CN=disknode" LCGDM-2790
  - Japanese CA issues host certificates with subject that ends with "CN=host/fqdn"
  - workaround is a DN whitelist with the glb.auth.authorizeDN[] DOME configuration option
  - 1.13 - added configuration options glb.auth.dnmatch-cnprefix and glb.auth.dnmatch-cnsuffix
Manchester
- troubles with authorization that seems to be related to the machine aliases, authorizedDN and certificates
AUVERGRID
- dpm-tester.py doesn't work correctly for GridFTP in case DPM headnode is also used as disknode
Oxford
- failing SRM after enabling gridftp redirection
Cosenza
- skipped adding quotatoken path before running dmlite-mysql-dirspaces (next version of this script provides better error message)
IN2P3 -LPC
- problem with t_space size for ATLASDATADISK (not sure what happened, but it was fixed after restoring database dump)

Other notes

IN2P3 -CPPM was using DOME with DPM 1.9.2

Meetings

DPM upgrade task force for WLCG Storage Reporting
March 7, 2019 - WLCG Operations Coordination with status update from DPM developers

Useful links

Participants

Fabrizio Furano (DPM)
Oliver Keeble (DPM and WLCG Steering Group)
Dimitrios Christidis (WLCG Storage Space Accounting)
Julia Andreeva (WLCG Operations Coordination)

GOCDB DPM reachable by srmPing on February 26

Site	Headnode	Size	DPM Version	XRootD protocol	GridFTP
TOKYO-LCG2	lcg-se01.icepp.jp	10559536	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.5
ICN-UNAM	tlapiacalli.nucleares.unam.mx	5248671	N/A	N/A	GridFTP Server 11.1
GRIF	node12.datagrid.cea.fr	4554739	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
UKI-NORTHGRID-MAN-HEP	bohr3226.tier2.hep.manchester.ac.uk	4537796	N/A	xrootd/0x30000	GridFTP Server 9.1
praguelcg2	golias100.farm.particle.cz	4456037	DPM/1.10.0-1	xrootd/0x40000	GridFTP Server 13.9
UKI-SCOTGRID-GLASGOW	svr018.gla.scotgrid.ac.uk	3816622	DPM/1.8.10-1	xrootd/0x10030000	GridFTP Server 12.4
Taiwan-LCG2	f-dpm001.grid.sinica.edu.tw	3269468	DPM/1.8.11-1	N/A	N/A
TR-10-ULAKBIM	torik1.ulakbim.gov.tr	3161019	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 11.3
INDIACMS-TIFR	se01.indiacms.res.in	3107872	DPM/1.9.0-1	xrootd/0x10030000	N/A
UKI-NORTHGRID-LANCS-HEP	fal-pygrid-30.lancs.ac.uk	3074570	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
IN2P3-CPPM	marsedpm.in2p3.fr	2642392	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
INFN-NAPOLI-ATLAS	t2-dpm-01.na.infn.it	2399002	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
RO-07-NIPNE	tbit00.nipne.ro	2367783	N/A	xrootd/0x30000	GridFTP Server 7.26
NIKHEF-ELPROD	tbn18.nikhef.nl	2353756	DPM/1.9.0-1	xrootd/0x10030000	N/A
IN2P3-LAPP	lapp-se01.in2p3.fr	2187150	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.5
GRIF	lpnse1.in2p3.fr	2113058	DPM/1.10.0-1	xrootd/0x10030000	N/A
INFN-FRASCATI	atlasse.lnf.infn.it	2111236	N/A	xrootd/0x10030000	GridFTP Server 11.8
IN2P3-IRES	sbgse1.in2p3.fr	2032105	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
IN2P3-LPC	clrlcgse01.in2p3.fr	1887617	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.5
INFN-ROMA1	grid-cert-03.roma1.infn.it	1701923	DPM/1.13.0-1	xrootd/0x10030000	GridFTP Server 13.20
GRIF	polgrid4.in2p3.fr	1682184	DPM/1.10.0-1	N/A	GridFTP Server 13.9
NCBJ-CIS	se.cis.gov.pl	1520624	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 13.9
UKI-LT2-RHUL	se2.ppgrid1.rhul.ac.uk	1460022	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 11.8
Australia-ATLAS	agh3.atlas.unimelb.edu.au	1433885	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
UKI-NORTHGRID-LIV-HEP	hepgrid11.ph.liv.ac.uk	1425584	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
UKI-LT2-Brunel	dc2-grid-64.brunel.ac.uk	1377142	DPM/1.10.4-1	xrootd/0x40000	GridFTP Server 13.9
GRIF	grid05.lal.in2p3.fr	1303158	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
HK-LCG2	se01.atlas.cuhk.edu.hk	1191172	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
IN2P3-LPSC	lpsc-se-dpm-server.in2p3.fr	1164763	N/A	xrootd/0x30000	GridFTP Server 11.3
UKI-SCOTGRID-ECDF	srm.glite.ecdf.ed.ac.uk	1141829	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
BUDAPEST	grid143.kfki.hu	1105939	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
TW-NCUHEP	grid71.phy.ncu.edu.tw	1099803	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 11.8
UNIBE-LHEP	dpm.lhep.unibe.ch	1070870	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.5
UKI-SCOTGRID-ECDF	srm-rdf.gridpp.ecdf.ed.ac.uk	1066397	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
Hephy-Vienna	hephyse.oeaw.ac.at	988060	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
Kharkov-KIPT-LCG2	cms-se0.kipt.kharkov.ua	955312	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 13.8
UKI-SOUTHGRID-OX-HEP	t2se01.physics.ox.ac.uk	939542	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
IN2P3-IPNL	lyogrid06.in2p3.fr	936017	N/A	xrootd/0x30000	GridFTP Server 9.1
BEIJING-LCG2	ccsrm.ihep.ac.cn	775972	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
N/A	lcgse01.phy.bris.ac.uk	728091	N/A	xrootd/0x10030000	N/A
UKI-SCOTGRID-DURHAM	se01.dur.scotgrid.ac.uk	645619	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
CYFRONET-LCG2	se01.grid.cyfronet.pl	627881	N/A	xrootd/0x30000	GridFTP Server 10.4
FMPhI-UNIBA	lcgdpmse.dnp.fmph.uniba.sk	624016	N/A	xrootd/0x10030000	GridFTP Server 13.8
TW-NCHC	se01.grid.nchc.org.tw	575234	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
TR-03-METU	eymir.grid.metu.edu.tr	537945	DPM/1.8.11-1	xrootd/0x10030000	GridFTP Server 11.3
UKI-NORTHGRID-SHEF-HEP	lcgse0.shef.ac.uk	531351	N/A	xrootd/0x30000	GridFTP Server 9.4
INFN-COSENZA	recas-se-01.cs.infn.it	447993	N/A	xrootd/0x10030000	GridFTP Server 13.8
CAMK	se.cta.camk.edu.pl	408004	N/A	N/A	GridFTP Server 9.1
PSNC	se.reef.man.poznan.pl	407575	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
RO-02-NIPNE	baaf02.nipne.ro	404017	N/A	xrootd/0x10030000	GridFTP Server 13.9
RECAS-NAPOLI	belle-dpm-01.na.infn.it	399981	DPM/1.8.11-1	xrootd/0x30000	GridFTP Server 10.4
Ru-Troitsk-INR-LCG2	grse001.inr.troitsk.ru	330473	N/A	xrootd/0x10030000	GridFTP Server 11.8
RECAS-NAPOLI	recas-km3netse01.na.infn.it	319985	N/A	xrootd/0x30000	GridFTP Server 9.4
prague_cesnet_lcg2	dpm1.egee.cesnet.cz	306985	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 12.8
CBPF	se.cat.cbpf.br	283972	DPM/1.8.9-1	xrootd/0x97020000	GridFTP Server 7.25
NCP-LCG2	pcncp22.ncp.edu.pk	266159	N/A	xrootd/0x10030000	GridFTP Server 13.9
ICM	se.grid.icm.edu.pl	251036	N/A	xrootd/0x10030000	N/A
UKI-SOUTHGRID-CAM-HEP	serv02.hep.phy.cam.ac.uk	248775	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
GR-07-UOI-HEPLAB	grid02.physics.uoi.gr	206898	N/A	xrootd/0x10030000	GridFTP Server 13.9
CBPF	se02.cat.cbpf.br	192017	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
TW-NTU-HEP	ntugrid6.phys.ntu.edu.tw	144006	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 12.6
UA_ICYB_ARC	se.uagrid.org.ua	135734	N/A	N/A	GridFTP Server 11.8
RO-13-ISS	seau.spacescience.ro	119522	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
GR-12-TEIKAV	se.grid.teiemt.gr	110972	DPM/1.8.9-1	xrootd/0x30000	GridFTP Server 7.18
HEPHY-UIBK	grid01.uibk.ac.at	99012	N/A	xrootd/0x10030000	GridFTP Server 12.5
N/A	lapp-testse01.in2p3.fr	59624	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.5
ru-PNPI	cluster.pnpi.nw.ru	56006	DPM/1.9.0-1	N/A	N/A
Australia-T2	coepp-dpm-01.ersa.edu.au	54973	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
UA-BITP	se.bitp.kiev.ua	41985	N/A	xrootd/0x30000	GridFTP Server 12.4
CYFRONET-LCG2	se03.grid.cyfronet.pl	31394	N/A	xrootd/0x10030000	GridFTP Server 12.5
UA-NSCMBR	se.biomed.kiev.ua	29770	DPM/1.9.0-1	N/A	GridFTP Server 11.8
AUVERGRID	cirigridse01.univ-bpclermont.fr	19790	DPM/1.10.0-1	N/A	GridFTP Server 13.9
GRIF	ipnsedpm.in2p3.fr	18704	DPM/1.9.0-1	xrootd/0x10030000	N/A
UNINA-EGEE	se.scope.unina.it	17307	DPM/1.8.8-1	N/A	GridFTP Server 6.38
HG-02-IASA	se01.marie.hellasgrid.gr	9998	DPM/1.8.10-1	N/A	GridFTP Server 12.5
UMB-BB	se.grid.umb.sk	8793	N/A	N/A	GridFTP Server 12.4
UPorto	hades.up.pt	8356	N/A	N/A	GridFTP Server 13.9
MA-01-CNRST	se1.cnrst.magrid.ma	7749	N/A	N/A	GridFTP Server 9.1
GRIDIFIN	segi.nipne.ro	7497	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.8
MA-04-CNRST-ATLAS	atlas-se1.cnrst.magrid.ma	6780	N/A	N/A	GridFTP Server 9.1
OBSPM	se-dpm-server-grid.obspm.fr	6594	DPM/1.9.0-1	N/A	GridFTP Server 11.8
CESGA	se2.egi.cesga.es	6214	DPM/1.8.10-1	N/A	GridFTP Server 9.1
AEGIS03-ELEF-LEDA	grid02.elfak.ni.ac.rs	5946	DPM/1.8.10-1	xrootd/0x30000	GridFTP Server 9.1
TASK	se.grid.task.gda.pl	2197	N/A	N/A	GridFTP Server 9.1
GRISU-UNINA	grisuse.scope.unina.it	2163	DPM/1.8.8-1	N/A	GridFTP Server 6.38
DZ-01-ARN	se01.grid.arn.dz	2063	DPM/1.10.0-1	N/A	GridFTP Server 13.9
RO-13-ISS	grid02.spacescience.ro	1969	DPM/1.10.0-1	N/A	GridFTP Server 13.8
CIRMMP	se-enmr.cerm.unifi.it	1475	DPM/1.8.10-1	N/A	GridFTP Server 11.1
GARR-01-DIR	gridsrv3-4.dir.garr.it	1268	DPM/1.8.10-1	N/A	GridFTP Server 9.1
UA-ISMA	gl-dpm.isma.kharkov.ua	1098	N/A	N/A	GridFTP Server 12.2
HK-HKU-CC-01	glite01.grid.hku.hk	1082	N/A	N/A	GridFTP Server 13.8
RO-03-UPB	se01.grid.pub.ro	1082	N/A	N/A	GridFTP Server 9.1
N/A	prod-se-03.ct.infn.it	1056	N/A	N/A	GridFTP Server 9.4
WCSS64	darkmass.wcss.wroc.pl	579	N/A	N/A	GridFTP Server 12.4
CNR-ILC-PISA	gridse.ilc.cnr.it	436	DPM/1.9.0-1	N/A	GridFTP Server 12.5
IR-IPM-HEP	se1.particles.ipm.ac.ir	301	DPM/1.8.11-1	xrootd/0x30000	GridFTP Server 11.3
USC-LCG2	se-emi.igfae.usc.es	298	DPM/1.10.0-1	N/A	GridFTP Server 13.8
WUT	alix.if.pw.edu.pl	169	N/A	N/A	GridFTP Server 6.38
RECAS-NAPOLI	recasna-se01.unina.it	150	DPM/1.8.7-3	N/A	GridFTP Server 6.38
NCP-LCG2	se02.ncp.edu.pk	134	N/A	xrootd/0x10030000	GridFTP Server 12.2
TU-Kosice	dpm.grid.tuke.sk	107	N/A	N/A	N/A
IISAS-Bratislava	se-sivvp.ui.savba.sk	107	N/A	N/A	GridFTP Server 13.8
NIHAM	alice003.nipne.ro	105	DPM/1.8.10-1	N/A	GridFTP Server 9.1
AEGIS02-RCUB	grid15.rcub.bg.ac.rs	102	N/A	N/A	GridFTP Server 6.38
NCP-LCG2	pcncp23.ncp.edu.pk	84	N/A	N/A	GridFTP Server 13.9
AstrogridPUC	astrose.astro.puc.cl	53	N/A	xrootd/0x10030000	GridFTP Server 13.9
UA-KNU	se.univ.kiev.ua	52	N/A	N/A	GridFTP Server 13.8
RO-11-NIPNE	lhcb-se.nipne.ro	52	N/A	N/A	GridFTP Server 13.8
USC-LCG2	se.igfae.usc.es	52	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
SUPERCOMPUTO-UNAM	se.grid.unam.mx	26	N/A	N/A	GridFTP Server 7.26
GRID-UNAM	dpm.grid.unam.mx	8	N/A	N/A	GridFTP Server 7.25
CYFRONET-LCG2	dpm.cyf-kr.edu.pl	N/A	DPM/1.8.9-1	xrootd/0x97020000	GridFTP Server 7.18
Australia-T2	b2se.mel.coepp.org.au	N/A	DPM/1.9.0-1	xrootd/0x10030000	GridFTP Server 12.4
AEGIS01-IPB-SCL	dpm.ipb.ac.rs	0	DPM/1.8.10-1	N/A	N/A
GRIF	grid03.lal.in2p3.fr	N/A	DPM/1.10.0-1	xrootd/0x10030000	GridFTP Server 13.9
HG-05-FORTH	se01.ariagni.hellasgrid.gr	N/A	DPM/1.8.10-1	N/A	GridFTP Server 9.1
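
As an illustration of how the table above could be checked against the task force target (DPM 1.10.3 or higher), here is a minimal Python sketch. It assumes each row is tab-separated and that the fourth field holds the DPM version string (for example DPM/1.10.0-1). The function names are hypothetical and are not part of any DPM tooling. Rows with N/A in the version field are reported as unknown rather than as needing an upgrade.

```python
import re

# Target taken from the task force mandate: DPM version 1.10.3 or higher.
MIN_VERSION = (1, 10, 3)


def parse_dpm_version(field):
    """Return (major, minor, patch) for a string like 'DPM/1.10.0-1', or None for 'N/A'."""
    match = re.fullmatch(r"DPM/(\d+)\.(\d+)\.(\d+)(?:-\d+)?", field.strip())
    return tuple(int(g) for g in match.groups()) if match else None


def needs_upgrade(row):
    """True if the row's DPM version is known and below MIN_VERSION.

    Assumption: fields are tab-separated and the version is the fourth field.
    """
    version = parse_dpm_version(row.split("\t")[3])
    # Tuples compare numerically, so (1, 9, 0) < (1, 10, 3) holds,
    # which a plain string comparison of "1.9.0" and "1.10.3" would get wrong.
    return version is not None and version < MIN_VERSION


# Three rows copied from the table above.
rows = [
    "UKI-LT2-Brunel\tdc2-grid-64.brunel.ac.uk\t1377142\tDPM/1.10.4-1\txrootd/0x40000\tGridFTP Server 13.9",
    "GRIF\tgrid05.lal.in2p3.fr\t1303158\tDPM/1.10.0-1\txrootd/0x10030000\tGridFTP Server 13.9",
    "IN2P3-LPSC\tlpsc-se-dpm-server.in2p3.fr\t1164763\tN/A\txrootd/0x30000\tGridFTP Server 11.3",
]

for row in rows:
    hostname = row.split("\t")[1]
    print(hostname, "needs upgrade" if needs_upgrade(row) else "ok or unknown")
```

Comparing versions as integer tuples rather than strings matters here because the table mixes 1.8.x, 1.9.x and 1.10.x releases.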

-- JuliaAndreeva - 2018-09-25 -- JuliaAndreeva - 2020-08-25

Topic revision: r1 - 2020-08-25 - JuliaAndreeva
