DPM upgrade task force

Introduction

WLCG currently lacks a complete description of the storage services required for LHC computing activities. Another important piece of functionality still to be implemented is storage space accounting, which would report the usage and capacity of all WLCG storage resources and would work across LHC experiments and Grid middleware platforms.

Both the WLCG storage topology description and WLCG storage space accounting depend on the ability of WLCG sites to publish their storage description (storage shares/space quotas and the protocols which give access to these shares) and storage accounting information. The requirements for sites are summarized in the Storage Resource Reporting (SRR) proposal document, which was discussed with the experiments and storage providers and was agreed for implementation at the GDB in October 2017.

Among storage implementations, DPM is the most advanced in providing an implementation of SRR. The new core of DPM, the Disk Operations Management Engine (DOME), enables SRR publishing of both storage description and accounting information, and runs much more smoothly for HTTP, XRootD and GridFTP. Enabling DOME requires a reconfiguration, involving scheduled downtime, after a simple package upgrade to version 1.10.3 or higher.

Mandate of the task force

  • Coordinate the upgrade of the DPM sites to DPM version 1.10.3 or higher and the reconfiguration required to enable DOME and, correspondingly, SRR.
  • Provide guidance and support to sites for the upgrade and reconfiguration.
  • Validate the SRRs published by DPM sites and make sure that they can be integrated with CRIC and the WLCG Storage Space Accounting system.

Upgrade plan

The task can be addressed in two phases

  • Phase 1 - "early adopter"
A small number of early adopter sites plan and perform the upgrade/reconfiguration along with the DPM team. This is to gain and document experience and handle any issues which arise.

Phase 1 to be accomplished by the end of 2018

  • Phase 2 - "general transition"
Sites perform the necessary upgrades and reconfigurations, supported by the WG.

  • By summer 2019, 80% of DPM storage (in terms of capacity) to be upgraded and reconfigured
  • By the end of 2019, 80% of DPM sites to be upgraded and reconfigured
  • As previous experience shows, the tail of small sites might take longer

First phase activities

DPM sites used by the LHC VOs listed in CRIC

Phase 2 - General transition

Status of upgrade as of the 28th of August

  • Out of 55 DPM sites used by the LHC VOs, 29 have upgraded to a version higher than 1.10
  • Out of the 29 sites which upgraded to a version higher than 1.10, 15 have been reconfigured for DOME
  • The 14 sites which, according to CRIC, have been upgraded but not yet reconfigured for DOME should be reconfigured. If they have already been reconfigured but CRIC lacks this information, CRIC has to be updated
  • 26 sites need an upgrade and reconfiguration. GGUS tickets have been created for all of these sites

Sites requiring upgrade and reconfiguration

Site DPM Version (28.08.2019) Upgrade is planned (date) Comments GGUS ticket Contacts
GRIF-LPNHE [u'1.10.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143067 grid.admin@grifNOSPAMPLEASE.fr
IN2P3 -IRES [u'1.10.0'] Upgraded and reconfigured with DOME DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143070 grid.admin@iphcNOSPAMPLEASE.cnrs.fr
UKI-SCOTGRID-GLASGOW [u'1.8.10', u'1.8.10']   planning to move ~90% of our capacity off DPM to a Ceph-based solution, and would rather not change our DPM configuration until after that work is complete. Plan to accomplish by the end of the year https://ggus.eu/index.php?mode=ticket_info&ticket_id=143076 uki-scotgrid-glasgow@physicsNOSPAMPLEASE.gla.ac.uk
UKI-SCOTGRID-ECDF [u'1.10.0', u'1.10.0', u'1.10.0'] 30th October DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143077 wlcg-support-ecdf@mlistNOSPAMPLEASE.is.ed.ac.uk
TW-NTU-HEP [u'1.10.0']   DOME is enabled, waiting for SRR and CRIC update https://ggus.eu/index.php?mode=ticket_info&ticket_id=143078 sysadmin@hep1NOSPAMPLEASE.phys.ntu.edu.tw
UKI-NORTHGRID-SHEF-HEP [u'1.8.10'] DONE DPM decommissioned, no storage, use RAL storage instead https://ggus.eu/index.php?mode=ticket_info&ticket_id=143079 edg-site-admin@sheffieldNOSPAMPLEASE.ac.uk
TW-FTT No DPM DONE DPM info was not up to date. The site is running EOS SE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143080 ops@listsNOSPAMPLEASE.grid.sinica.edu.tw
Kharkov-KIPT-LCG2 [u'1.13.0'] DONE Upgraded and reconfigured with DOME https://ggus.eu/index.php?mode=ticket_info&ticket_id=143081 grid_support@kiptNOSPAMPLEASE.kharkov.ua
IN2P3 -IPNL [u'1.8.10']   Migrating to EOS. Should finish by mid of 2020 https://ggus.eu/index.php?mode=ticket_info&ticket_id=143082 gridsupport@ipnlNOSPAMPLEASE.in2p3.fr
UKI-SOUTHGRID-BRIS-HEP [u'1.9.0']   Run DMLite + HDFS plugin which does not support DOME. Plan to migrate to XRootD https://ggus.eu/index.php?mode=ticket_info&ticket_id=143083 lcg-site-admin@bristolNOSPAMPLEASE.ac.uk
IN2P3 -LPSC [u'1.9.0'] planned before the end of November DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143084 grid.admin@lpscNOSPAMPLEASE.in2p3.fr
IR-IPM-HEP [u'1.8.11'] DONE DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143088 grid-hep@ipmNOSPAMPLEASE.ir
GR-12-TEIKAV [u'1.8.9']   Site is suspended https://ggus.eu/index.php?mode=ticket_info&ticket_id=143089 admingrid@teiemtNOSPAMPLEASE.gr
ICM [u'1.10.0'] Upgrade is performed , re-configuration in progress   https://ggus.eu/index.php?mode=ticket_info&ticket_id=143091 plgrid-admins@icmNOSPAMPLEASE.edu.pl
Ru-Troitsk-INR-LCG2 [u'1.9.0'] DONE 27.09 DONE. Upgraded and reconfigured with DOME https://ggus.eu/index.php?mode=ticket_info&ticket_id=143092 sli@inrNOSPAMPLEASE.ru
Hephy-Vienna [u'1.10.0']   Will migrate to EOS in Q1 of 2020, DPM will be decommissioned https://ggus.eu/index.php?mode=ticket_info&ticket_id=143277 hephy-grid-admin@oeawNOSPAMPLEASE.ac.at
TR-10-ULAKBIM [u'1.13.0'] Done 14.01.2020 Upgraded and reconfigured for DOME support https://ggus.eu/index.php?mode=ticket_info&ticket_id=143278 grid@ulakbimNOSPAMPLEASE.gov.tr
INFN-FRASCATI [u'1.9.0'] planned before the end of December DONE (dpm-1.13.0-1) https://ggus.eu/index.php?mode=ticket_info&ticket_id=143280 grid-prod@lnfNOSPAMPLEASE.infn.it
INFN-ROMA1 [u'1.13.0'] DONE   https://ggus.eu/index.php?mode=ticket_info&ticket_id=143276 grid-prod@roma1NOSPAMPLEASE.infn.it
ru-PNPI [u'1.9.0', u'1.9.0'] consider to migrate to EOS since only ALICE storage is supported   https://ggus.eu/index.php?mode=ticket_info&ticket_id=143281 globus@pnpiNOSPAMPLEASE.nw.ru
IN2P3 -LAPP [u'1.9.0'] 13/11/2019 DONE (1.13.1) https://ggus.eu/index.php?mode=ticket_info&ticket_id=143282 support-grid@lappNOSPAMPLEASE.in2p3.fr
UKI-SOUTHGRID-CAM-HEP No DPM any more DONE SE is decommissioned, ticket is closed. Site is running xCache. https://ggus.eu/index.php?mode=ticket_info&ticket_id=143283 lcg-admin@hepNOSPAMPLEASE.phy.cam.ac.uk
CYFRONET-LCG2 [u'1.13.2'] DONE DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143284 lcg-admin@cyf-krNOSPAMPLEASE.edu.pl
Australia-ATLAS [u'1.9.0'] DONE DONE with Dome https://ggus.eu/index.php?mode=ticket_info&ticket_id=143285 coepp-sysadmin@listsNOSPAMPLEASE.unimelb.edu.au
NIKHEF-ELPROD [u'1.9.0'] Migrated to dCache DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143286 grid.sysadmin@nikhefNOSPAMPLEASE.nl
RO-07-NIPNE [u'1.1.0'] by 01.11.2019 DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143287 ciubancan@nipneNOSPAMPLEASE.ro
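
The version column above is a Python 2 style list representation (one entry per head node). As a minimal sketch, assuming that the 1.10.3 threshold from the introduction is the one to check against, such cells can be parsed and compared as follows. The helper names `parse_versions` and `needs_upgrade` are hypothetical, and cells without a version list (for example "No DPM") are not handled.

```python
import ast

# Minimum DPM version for DOME, as stated in the introduction.
MIN_DOME_VERSION = (1, 10, 3)

def parse_versions(cell):
    """Turn a cell such as "[u'1.8.10', u'1.8.10']" into a list of int tuples."""
    # ast.literal_eval accepts the u'' prefix, so the Python 2 repr parses as is.
    return [tuple(int(part) for part in v.split(".")) for v in ast.literal_eval(cell)]

def needs_upgrade(cell, minimum=MIN_DOME_VERSION):
    """True if any reported head node version is below the minimum."""
    return any(v < minimum for v in parse_versions(cell))

# A few rows copied from the table above (site name -> version cell).
rows = {
    "GRIF-LPNHE": "[u'1.10.0']",
    "UKI-SCOTGRID-GLASGOW": "[u'1.8.10', u'1.8.10']",
    "Kharkov-KIPT-LCG2": "[u'1.13.0']",
}
for site, cell in rows.items():
    print(site, "needs upgrade" if needs_upgrade(cell) else "version is sufficient")
```

Versions are compared as integer tuples rather than strings, because a string comparison would rank 1.8.10 below 1.8.9 and 1.10.0 below 1.9.0.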

Sites which according to CRIC did perform an upgrade but require reconfiguration for DOME and SRR

Site DPM Version (28.08.2019) Reconfiguration is planned (date) Comments GGUS ticket Contacts
UNIBE-LHEP [u'1.13.2'] DOME+legacy DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143464 it-ops@lhepNOSPAMPLEASE.unibe.ch
PSNC [u'1.13.0']     https://ggus.eu/index.php?mode=ticket_info&ticket_id=143474 egee@manNOSPAMPLEASE.poznan.pl
NCP-LCG2 [u'1.13.0'] Infrastructure problems on the site. Configuration work started but delayed until the problem is fixed   https://ggus.eu/index.php?mode=ticket_info&ticket_id=143476 fsaeed@cernNOSPAMPLEASE.ch
BEIJING-LCG2 [u'1.12.0', u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143463 lcg-admin@ihepNOSPAMPLEASE.ac.cn
UKI-SCOTGRID-DURHAM [u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143465 oper.ip3@durhamNOSPAMPLEASE.ac.uk
FMPhI -UNIBA [u'1.13.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143477 gridmaster@dnpNOSPAMPLEASE.fmph.uniba.sk
UKI-NORTHGRID-LIV-HEP [u'1.13.2', u'1.13.2'] DOME Configured with legacy mode still on DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143466 gridteam@hepNOSPAMPLEASE.ph.liv.ac.uk
GR-07-UOI-HEPLAB [u'1.13.0'] 4 Oct 2019: first reconfiguration attempt to DOME failed... Postponed until the site migrates to CentOS7, probably end of the year https://ggus.eu/index.php?mode=ticket_info&ticket_id=143467 grid@alphaNOSPAMPLEASE.physics.uoi.gr
TOKYO-LCG2 [u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143468 lcg-admin@iceppNOSPAMPLEASE.s.u-tokyo.ac.jp
TW-NCHC [u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143470 lincy@nchcNOSPAMPLEASE.org.tw
HK-LCG2 [u'1.12.0']     https://ggus.eu/index.php?mode=ticket_info&ticket_id=143471 grid-prod@atlasNOSPAMPLEASE.cuhk.edu.hk
ZA-WITS-CORE [u'1.13.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143478 scott.hazelhurst@witsNOSPAMPLEASE.ac.za
Taiwan-LCG2 [u'1.12.1']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143472 ops@listsNOSPAMPLEASE.grid.sinica.edu.tw
NCBJ-CIS [u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143473 admins@cisNOSPAMPLEASE.gov.pl
BUDAPEST [u'1.13.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143657 gridadm@rmkiNOSPAMPLEASE.kfki.hu
UKI-SOUTHGRID-OX-HEP [u'1.12.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=143656 lcg_manager@physicsNOSPAMPLEASE.ox.ac.uk
UKI-LT2-Brunel [u'1.13.0']   DONE https://ggus.eu/index.php?mode=ticket_info&ticket_id=144007 lcg-admin@brunelNOSPAMPLEASE.ac.uk

Recommended configuration

  • ATLAS (Rucio)
    • Use at least DOME DPM 1.12.1 + XRootD 4.9.0 + davix 0.7.3 ... latest (stable) versions from EPEL recommended
      • dmlite is linked against the xrootd packages available at its release date, so moving to the latest dmlite requires the most recent xrootd packages
      • enable GridFTP redirection: puppet head+disknode configuration option gridftp_redirect
      • enable XRootD checksums: puppet head+disknode configuration option configure_dpm_xrootd_checksum (enabled by default since DPM 1.13)
      • optionally enable TPC XRootD delegation: puppet disknode configuration option configure_dpm_xrootd_delegation (enabled by default since DPM 1.13)
      • to support IPv4-only clients that have IPv6 enabled in the GridFTP plugin (the default in gfal-2.17 and in Dirac middleware) on a dual-stack DPM, epsv_match must be enabled, see LCGDM-2817
    • AGIS configuration (example for SE, panda)
      • GridFTP preferred protocol with priority 0 for tpc activities (requires GridFTP redirection)
      • XRootD for lan and wan read+write (write works only with XRootD checksums enabled)
      • rucio mover for panda queues (the rucio mover uses storage protocols according to the preferences defined in AGIS)
      • each protocol in AGIS should have monitoring enabled to be part of ATLAS SAM tests
      • EGI sites should also register each SE protocol with GOCDB (example: SRM, GridFTP, XRootD, WebDAV)
      • fully SRM-less operation requires additional configuration of the Storage Resource Reporting (SRR)
        • use cron to generate storagesummary.json with SRR info at least hourly, using the dpm-storage-summary.py script
        • since DOME DPM 1.13.2, SRR info is automatically available via HTTP CGI and the cron configuration mentioned above is no longer necessary
        • modify "Space method" and "Space Usage" for each DDM endpoint in AGIS (example)
          • Space method: storage
          • Space Usage: URL of your storagesummary.json
    • Argus blacklisting implemented in DOME DPM 1.13.3
    • used in production since February 26 2019 at PRAGUELCG2
      • DOME enabled in June 2018, but without GridFTP redirection
      • troubles with DOME DPM 1.11 fixed in 1.12 - stable since March 5 (including GridFTP redirection)
        • 1.12 had still some non-critical issues with known workarounds (see "Detected problems" section)
        • fixed in DOME DPM 1.13 and on our production DPM since July 12
      • monitoring
  • CMS (PhEDEx)
  • Dirac users
    • internally use GFAL for transfers (unless you still use deprecated protocols)
    • if your DPM supports IPv4 + IPv6, be aware that IPv4-only clients can't access data using the gsiftp protocol unless you follow the instructions in LCGDM-2817
    • LFC catalog
      • deprecated & EOL - you should think about migration
      • full file URL stored in catalog - can't easily switch from SRM protocol
    • DFC catalog
      • possible to configure non-SRM transfer protocols
      • with GridFTP redirection enabled in DPM it should be almost transparent switching from GFAL2_SRM2 to GFAL2_GSIFTP
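
Once storagesummary.json is being published, a quick sanity check can catch an empty or stale document before it is registered in AGIS or CRIC. The sketch below is illustrative only: the field names (`storageservice`, `storageshares`, `totalsize`, `usedsize`, `timestamp`) are an assumption based on the SRR proposal and should be compared with what your DPM actually publishes, and the two-hour staleness threshold is an arbitrary choice made with the "at least hourly" cron recommendation in mind.

```python
import json

# Illustrative document only; field names and values are assumptions, not real site data.
SAMPLE = """
{"storageservice": {
    "name": "example-dpm",
    "implementation": "DPM",
    "implementationversion": "1.13.2",
    "storageshares": [
        {"name": "ATLASDATADISK", "totalsize": 1000, "usedsize": 400,
         "timestamp": 1567000000, "vos": ["atlas"]}
    ]
}}
"""

def check_srr(document, now, max_age_seconds=2 * 3600):
    """Return a list of human-readable problems found in a parsed SRR document."""
    service = document.get("storageservice")
    if not isinstance(service, dict):
        return ["missing 'storageservice' object"]
    problems = []
    shares = service.get("storageshares") or []
    if not shares:
        problems.append("no storage shares published")
    for share in shares:
        name = share.get("name", "<unnamed>")
        total, used = share.get("totalsize"), share.get("usedsize")
        if not isinstance(total, int) or not isinstance(used, int):
            problems.append(f"{name}: totalsize/usedsize missing or not integers")
        stamp = share.get("timestamp")
        if not isinstance(stamp, int) or now - stamp > max_age_seconds:
            problems.append(f"{name}: timestamp missing or older than {max_age_seconds}s")
    return problems

# 'now' is passed explicitly so the check is reproducible; use time.time() in practice.
print(check_srr(json.loads(SAMPLE), now=1567000000 + 600))
```

The function works on an already parsed document, so it can be fed from a local file written by the cron job or from the URL configured as "Space Usage".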

After reconfiguration for DOME make sure that SRR is enabled

How to enable SRR

After making changes to your service, please update the information in CRIC

Authentication & authorization step

  • Go to the WLCG CRIC server and click Core (menu at the top of the page) -> Services. Enable filtering by clicking on the 'Filter' button and select your site. By default, you won't see the implementation and implementation version columns in the table. To see this info, click on 'Columns' and then select the corresponding columns in the drop-down list.

  • You should be able to list all CRIC entities (sites (GocDB/OIM and experiment-specific ones), federations, pledges, services, storage protocols and queues) without authentication. However, when you want to see the details of a particular entity, you will be asked to log in.

  • Those who are registered in the CERN DB should use SSO authentication. Authentication with a certificate is not yet enabled on this instance, but will come soon.

  • Those who are not registered in the CERN DB need to ask for a CRIC local account. Please send a mail to cric-devs@cernNOSPAMPLEASE.ch with your name, family name and the mail address to be used by CRIC to communicate with you.

  • As soon as you are logged in, you will be able to see the details of any CRIC entity; however, in order to edit information you need specific privileges.

  • As soon as you are authenticated, you will see 'Request privileges' at the top of the page next to your login name. Please click on it and follow the request procedure, which allows you to request global admin, site admin or federation admin privileges. Ask for site admin privileges for your site. You will shortly be informed that your privileges are enabled. Please log in again.

Editing storage info

  • Once you log in with appropriate privileges, you should be able to edit information about your site. At the moment we are particularly interested in the storage info at your site, namely its implementation, implementation version and SRR URL when it is enabled.

  • CRIC creates a virtual storage service per site/per VO/per media/per implementation. By default it creates 1 disk and 1 tape virtual storage for every VO which is served by a given T1 site. However, if for a given VO there are storage instances for the same media but with different implementations (for example EOS and dCache instances for disk storage for ATLAS), CRIC should create two different disk virtual storage instances for this VO. Unfortunately, for the moment there is no reliable primary source for this kind of information, so it is highly likely that only a single virtual storage will be created by CRIC in such cases. It would be great if you could correct this using the CRIC UI and add the other virtual storage instances for your site, with their implementation, implementation version and SRR URL when it is enabled. In the future we hope to get this information through SRR (Storage Resource Reporting).

  • In the service table view, click on a particular service name
  • You get a form with detailed information about service
  • Click on the 'Edit' button under the first block of information
  • You get another form. Please correct the 'Version' of your DPM implementation. If DOME is enabled, please append 'with DOME' to the version number and provide the "Resource Reporting URL" value
  • Click on 'Check input data' and save info

Creating a new virtual storage instance in CRIC

  • Starting from the entry page: https://wlcg-cric.cern.ch/
    • in the horizontal menu on the top of the page, select 'Core' -> 'Create Storage Service'. You get a form to fill in
  • Keep service name field empty as the form suggests
  • Select your site from the drop-down menu
  • Service type (SE) should not be touched
  • Select Disk or Tape media in the "Architecture" field drop-down menu
  • Provide value for implementation (EOS, Castor, Xrootd, dCache, DPM)
  • Provide value for implementation version
  • You can provide a value in the endpoint field or leave it empty if it does not make sense
  • Please, select value for the VO name. As mentioned above, the virtual storage in CRIC is created for a single VO even though several VOs can share the same physical storage service of the site
  • Leave 'ACTIVE' object state
  • All other attributes are optional, you can leave them empty

Creating a new protocol for a given virtual storage in CRIC

  • Currently, even if a single protocol is shared by several virtual storage instances in CRIC, a new protocol instance has to be created for each virtual storage instance
  • To create new protocol for a given virtual storage instance, select corresponding service in the service list and click on the name to get a detailed description of the virtual storage service
  • Below the table with the list of protocols, click on the 'Add protocol' button. You will get a form to fill.
  • Leave the name of the protocol empty, the system will generate it for you
  • "Flavour" and "endpoint" are mandatory attributes, other fields could be empty

Deleting virtual storage instance from CRIC

For the time being, deletion from the UI is not allowed. Change the object state to "Disabled" in order to make it disappear from the listing

Deleting protocols from CRIC

The protocol attached to a particular virtual storage can be deleted from the protocol list from the detailed page describing the virtual storage service.

DPM service monitoring for EGI.

In order to enable DPM service monitoring for EGI one needs to configure webdav (HTTPS) for the ops VO and register the endpoint on GOCDB

Registration of webdav service endpoint in GocDB

To register the webdav service endpoint in GOCDB, follow HOWTO21 in order to fill in the proper information.

In particular:

Enable gridftp monitoring for ops VO (if you provide such protocol)

  • register a new service endpoint, associating the storage element hostname to the service type “globus-GRIDFTP”, with the "production" flag disabled;
  • in the “Extension Properties” section of the service endpoint page, fill in the following fields:
  • Name: SE_PATH
  • Value: /dpm/ui.savba.sk/home/ops #this is an example, set the proper path
  • check if the tests are ok (it might take some hours for detecting the new service endpoint) and then switch the production flag to "yes"

Detected problems

  • PRAGUELCG2
    • forgot to tune open file handles limit on dedicated DB machine (during DPM upgrade we moved DB from SLC6 to CC7)
    • DONE 1.12 - dmlite-mysql-dirspaces.py doesn't fix data assigned to wrong (empty) spacetoken which can/will cause problems for SRM transfers
    • DONE 1.13 - Slowly diverging space counters (size(parentdir) != sum(size(childdir))) e.g. for ATLASDATADISK using pydmlite API or direct DB query - LCGDM-2801
    • CLOSED GridFTP transfers confirmed OK to client before processed by DPM (e.g. DB update for filesize, replica status, ...) LCGDM-2818, may be related to LCGDM-1961
      • GridFTP dsi plugin bug and ugly workaround in DPM doesn't always work
      • more details in dpm-devel thread "GridFTP redirection & Globus race condition avoided"
      • led to lost files at our site when the replica status was not updated by DPM (reason not fully understood), while for FTS/Rucio this was a successful transfer
      • diverging directory space usage for concurrent GridFTP uploads LCGDM-2730 (may be related to this issue)
      • DPM developers suggest checksum query after upload as a workaround - code updated in 1.13
    • DONE unable to access dualstack DPM with gsiftp from IPv4 only machines with IPv6 enabled in GFAL ("GRIDFTP PLUGIN:IPV6=true") LCGDM-2817
      • dpm-devel thread "DPM GSIFTP IPv6 Behaviour"
      • IPv6 GFAL config hardcoded in Dirac sources (discussed in diracgrid-forum)
    • DONE 1.11 + xrootd 4.9.x - xrootd checksums not implemented LCGDM-2726
    • DONE 1.12 - Unable to correctly disable gridftp redirection with puppet
    • DONE 1.12 - Problems with default non-optimal configuration options LCGDM-2743, LCGDM-2745, GGUS:139803
    • CLOSED - SRM upload/deletion doesn't immediately update directory size LCGDM-2731 - db updated => space accounting should work fine
    • CLOSED - Removing directory with SRM not seen by DAVS LCGDM-2732
      • can't be fixed (don't mix SRM with DOME protocols at least within one directory / VO)
      • this sequence of commands also fails: gfal-stat root://host//file; gfal-copy file:///file srm://host/file; gfal-stat root://host//file
        • used during rucio upload with SRM preferred for writing and XRootD for reading
    • DONE 1.12 - Directory size not correctly updated with gfal-rename LCGDM-2733
    • DONE 1.12 - Unable to disable bad filesystem with DOME LCGDM-2740
    • DONE 1.13 SRR script robustness LCGDM-2714, LCGDM-2744 (mitigated by separate quotatoken for SRR recommended in documentation)
    • DONE 1.12 - DPM hammers itself with checksum requests LCGDM-2747
    • DONE 1.13 - Checksum issues (deadlock, popen3 blocking read) LCGDM-2791
    • CLOSED - Confusing usage of the head.db.poolsz configuration option LCGDM-2749 (I'm not really happy with the answer)
    • DONE 1.12 - Database connections are not reused - individual connection for each query LCGDM-2754
    • DONE 1.13 - Disabled DPM disknodes can be selected for GridFTP transfers LCGDM-2748
    • DONE 1.13 - Complain when setting quotatokens at a level deeper than the one where space is calculated LCGDM-2727
    • DONE 1.13 Problematic graceful restart of httpd LCGDM-2699, LCGDM-2707 + "DPM http graceful restart" discussion in dpm-upgrade mailing list
      • problems caused by sub-optimal default Apache configuration (ServerStart == ServerLimit can by design cause troubles during graceful restart)
        • processes are not terminated as long as at least one thread still deals with an existing transfer
        • some transfers can hang forever LCGDB-2787 (rare but happens, probably fixed in lcgdm-dav 0.23)
        • apache is already at the ServerLimit and hesitates to fork new processes
          • quite old Apache bug makes this situation worse (no plans to fix it in CentOS7)
          • several graceful restarts within a short time range lead to zero apache children accepting new connections
      • testing configuration with ServerLimit = 2.5*ServerStart and MaxRequestWorkers = ServerLimit*ThreadLimit
    • DONE lcgdm-dav 0.23 graceful restart can get stuck forever in case a curl transfer hangs LCGDB-2787
    • DONE 1.12.1 - Failing WLCG / Argo webdav tests LCGDM-2751
    • CentOS7 Apache 2.4.6 kills transfers from DPM to local filesystem during graceful restart
    • CLOSED apache memory usage on SLC6 LCGDM-2783 (workaround would be a CentOS7 upgrade of all disknodes)
    • DONE davix 0.7.3 - libdavix used internally by DPM for communication with DOME can cause daemon crashes with same backtrace as grid-hammer
      • happens only in case something gets stuck inside DPM and there are hundreds of DOME API requests
    • installing legacy dmlite config files for DOME DPM LCGDM-2781
    • file access failing after headnode restart & before enabling disknodes LCGDM-2792
    • stale gridftp processes when gridftp redirection is enabled LCGDM-1988, LCGDM-2826
    • cached results concurrency issue with delete / copy / stat LCGDM-2828
    • atomic size updates for all parent directories LCGDM-2878
    • DONE file rename transaction can fail, but the client still receives success (silent data loss with Rucio) LCGDM-2869
  • INFN-NAPOLI-ATLAS
    • DONE 1.12 - spacetoken vs. quotatoken size caused problems with SRM transfers
    • DONE DPM 1.12 - SRM reads physical disk empty space only during startup and is not aware of DOME transfers => failing SRM transfers that tried to use full disk LCGDM-2752
      • for an ATLAS-only DPM this can be solved by moving to pure GridFTP
      • for sites where one VO needs SRM and the other uses DOME, this can be solved only by separating the SRM VO and the DOME VO into different pools
      • for sites that use both SRM and non-SRM transfers, this issue currently has no reliable solution except for regular restarts of the legacy DPM
    • DONE 1.11 - DomeUserInfo::userid & DomeUserInfo::groupid should be a larger int type LCGDM-2717, LCGDM-2718
    • DONE 1.12 - Allow file deletion on readonly filesystems LCGDM-2734
  • Brunel
    • DONE stability / performance issues solved only with DPM 1.12 + XRootD 4.9.0 (details in dpm-upgrade mailing list)
  • Lancaster
    • DONE crashing xrootd daemon with DOME DPM 1.11 (1.12pre)
      • old SLC6 disknodes not managed by puppet
      • loading adapter plugin (part of legacy DPM)
    • later stability issues with 1.12.1 (may be LCGDM-2791) solved by upgrade to 1.13.0
  • Beijing
    • DONE missing quotatoken definition for dteam (problems with TPC)
  • IRFU (GRIF)
  • KEK
    • DONE Host DN is not authorized for certificates with subject not matching "CN=headnode" and "CN=disknode" LCGDM-2790
      • Japanese CA issues host certificates with a subject that ends with "CN=host/fqdn"
      • workaround is a DN whitelist with the glb.auth.authorizeDN[] DOME configuration option
      • DONE 1.13 - added configuration options glb.auth.dnmatch-cnprefix and glb.auth.dnmatch-cnsuffix
  • Manchester
    • DONE troubles with authorization that seem to be related to the machine aliases, authorizedDN and certificates
  • AUVERGRID
    • DONE dpm-tester.py doesn't work correctly for GridFTP in case DPM headnode is also used as disknode
  • Oxford
    • failing SRM after enabling gridftp redirection
  • Cosenza
    • DONE skipped adding quotatoken path before running dmlite-mysql-dirspaces (next version of this script provides better error message)
  • IN2P3 -LPC
    • DONE problem with t_space size for ATLASDATADISK (not sure what happened, but it was fixed after restoring database dump)
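
Several of the issues above concern diverging directory space counters, i.e. size(parentdir) != sum(size(childdir)). A minimal sketch of such a consistency check on already extracted data is shown below. How the sizes are obtained (pydmlite API or a direct DB query) is deliberately left out, the paths and numbers are made up, and the toy model assumes that a checked directory contains only subdirectories.

```python
def find_divergent_dirs(sizes, children):
    """Return (dir, recorded, expected) for directories whose recorded size
    differs from the sum of the recorded sizes of their child directories."""
    divergent = []
    for parent, kids in children.items():
        if not kids:
            continue  # leaf directories have nothing to compare against
        expected = sum(sizes[kid] for kid in kids)
        if sizes[parent] != expected:
            divergent.append((parent, sizes[parent], expected))
    return divergent

# Hypothetical directory tree: recorded size per directory and child lists.
sizes = {
    "/home/atlas": 350,
    "/home/atlas/atlasdatadisk": 200,
    "/home/atlas/atlasscratchdisk": 100,
}
children = {
    "/home/atlas": ["/home/atlas/atlasdatadisk", "/home/atlas/atlasscratchdisk"],
}

for directory, recorded, expected in find_divergent_dirs(sizes, children):
    print(f"{directory}: recorded {recorded}, children sum to {expected}")
```

Running such a check periodically makes slow drift visible before it affects quota or accounting figures.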

Other notes

  • IN2P3 -CPPM was using DOME with DPM 1.9.2

Meetings

Useful links

Participants

  • Fabrizio Furano (DPM)
  • Oliver Keeble (DPM and WLCG Steering Group)
  • Dimitrios Christidis (WLCG Storage Space Accounting)
  • Julia Andreeva (WLCG Operations Coordination)

GOCDB DPM reachable by srmPing on February 26

Site Headnode Size DPM Version XRootD protocol GridFTP
TOKYO-LCG2 lcg-se01.icepp.jp 10559536 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.5
ICN-UNAM tlapiacalli.nucleares.unam.mx 5248671 N/A N/A GridFTP Server 11.1
GRIF node12.datagrid.cea.fr 4554739 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
UKI-NORTHGRID-MAN-HEP bohr3226.tier2.hep.manchester.ac.uk 4537796 N/A xrootd/0x30000 GridFTP Server 9.1
praguelcg2 golias100.farm.particle.cz 4456037 DPM/1.10.0-1 xrootd/0x40000 GridFTP Server 13.9
UKI-SCOTGRID-GLASGOW svr018.gla.scotgrid.ac.uk 3816622 DPM/1.8.10-1 xrootd/0x10030000 GridFTP Server 12.4
Taiwan-LCG2 f-dpm001.grid.sinica.edu.tw 3269468 DPM/1.8.11-1 N/A N/A
TR-10-ULAKBIM torik1.ulakbim.gov.tr 3161019 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 11.3
INDIACMS-TIFR se01.indiacms.res.in 3107872 DPM/1.9.0-1 xrootd/0x10030000 N/A
UKI-NORTHGRID-LANCS-HEP fal-pygrid-30.lancs.ac.uk 3074570 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
IN2P3-CPPM marsedpm.in2p3.fr 2642392 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
INFN-NAPOLI-ATLAS t2-dpm-01.na.infn.it 2399002 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
RO-07-NIPNE tbit00.nipne.ro 2367783 N/A xrootd/0x30000 GridFTP Server 7.26
NIKHEF-ELPROD tbn18.nikhef.nl 2353756 DPM/1.9.0-1 xrootd/0x10030000 N/A
IN2P3-LAPP lapp-se01.in2p3.fr 2187150 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.5
GRIF lpnse1.in2p3.fr 2113058 DPM/1.10.0-1 xrootd/0x10030000 N/A
INFN-FRASCATI atlasse.lnf.infn.it 2111236 N/A xrootd/0x10030000 GridFTP Server 11.8
IN2P3-IRES sbgse1.in2p3.fr 2032105 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
IN2P3-LPC clrlcgse01.in2p3.fr 1887617 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.5
INFN-ROMA1 grid-cert-03.roma1.infn.it 1701923 DPM/1.13.0-1 xrootd/0x10030000 GridFTP Server 13.20
GRIF polgrid4.in2p3.fr 1682184 DPM/1.10.0-1 N/A GridFTP Server 13.9
NCBJ-CIS se.cis.gov.pl 1520624 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 13.9
UKI-LT2-RHUL se2.ppgrid1.rhul.ac.uk 1460022 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 11.8
Australia-ATLAS agh3.atlas.unimelb.edu.au 1433885 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
UKI-NORTHGRID-LIV-HEP hepgrid11.ph.liv.ac.uk 1425584 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
UKI-LT2-Brunel dc2-grid-64.brunel.ac.uk 1377142 DPM/1.10.4-1 xrootd/0x40000 GridFTP Server 13.9
GRIF grid05.lal.in2p3.fr 1303158 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
HK-LCG2 se01.atlas.cuhk.edu.hk 1191172 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
IN2P3-LPSC lpsc-se-dpm-server.in2p3.fr 1164763 N/A xrootd/0x30000 GridFTP Server 11.3
UKI-SCOTGRID-ECDF srm.glite.ecdf.ed.ac.uk 1141829 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
BUDAPEST grid143.kfki.hu 1105939 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
TW-NCUHEP grid71.phy.ncu.edu.tw 1099803 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 11.8
UNIBE-LHEP dpm.lhep.unibe.ch 1070870 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.5
UKI-SCOTGRID-ECDF srm-rdf.gridpp.ecdf.ed.ac.uk 1066397 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
Hephy-Vienna hephyse.oeaw.ac.at 988060 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
Kharkov-KIPT-LCG2 cms-se0.kipt.kharkov.ua 955312 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 13.8
UKI-SOUTHGRID-OX-HEP t2se01.physics.ox.ac.uk 939542 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
IN2P3-IPNL lyogrid06.in2p3.fr 936017 N/A xrootd/0x30000 GridFTP Server 9.1
BEIJING-LCG2 ccsrm.ihep.ac.cn 775972 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
N/A lcgse01.phy.bris.ac.uk 728091 N/A xrootd/0x10030000 N/A
UKI-SCOTGRID-DURHAM se01.dur.scotgrid.ac.uk 645619 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
CYFRONET-LCG2 se01.grid.cyfronet.pl 627881 N/A xrootd/0x30000 GridFTP Server 10.4
FMPhI-UNIBA lcgdpmse.dnp.fmph.uniba.sk 624016 N/A xrootd/0x10030000 GridFTP Server 13.8
TW-NCHC se01.grid.nchc.org.tw 575234 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
TR-03-METU eymir.grid.metu.edu.tr 537945 DPM/1.8.11-1 xrootd/0x10030000 GridFTP Server 11.3
UKI-NORTHGRID-SHEF-HEP lcgse0.shef.ac.uk 531351 N/A xrootd/0x30000 GridFTP Server 9.4
INFN-COSENZA recas-se-01.cs.infn.it 447993 N/A xrootd/0x10030000 GridFTP Server 13.8
CAMK se.cta.camk.edu.pl 408004 N/A N/A GridFTP Server 9.1
PSNC se.reef.man.poznan.pl 407575 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
RO-02-NIPNE baaf02.nipne.ro 404017 N/A xrootd/0x10030000 GridFTP Server 13.9
RECAS-NAPOLI belle-dpm-01.na.infn.it 399981 DPM/1.8.11-1 xrootd/0x30000 GridFTP Server 10.4
Ru-Troitsk-INR-LCG2 grse001.inr.troitsk.ru 330473 N/A xrootd/0x10030000 GridFTP Server 11.8
RECAS-NAPOLI recas-km3netse01.na.infn.it 319985 N/A xrootd/0x30000 GridFTP Server 9.4
prague_cesnet_lcg2 dpm1.egee.cesnet.cz 306985 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 12.8
CBPF se.cat.cbpf.br 283972 DPM/1.8.9-1 xrootd/0x97020000 GridFTP Server 7.25
NCP-LCG2 pcncp22.ncp.edu.pk 266159 N/A xrootd/0x10030000 GridFTP Server 13.9
ICM se.grid.icm.edu.pl 251036 N/A xrootd/0x10030000 N/A
UKI-SOUTHGRID-CAM-HEP serv02.hep.phy.cam.ac.uk 248775 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
GR-07-UOI-HEPLAB grid02.physics.uoi.gr 206898 N/A xrootd/0x10030000 GridFTP Server 13.9
CBPF se02.cat.cbpf.br 192017 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
TW-NTU-HEP ntugrid6.phys.ntu.edu.tw 144006 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 12.6
UA_ICYB_ARC se.uagrid.org.ua 135734 N/A N/A GridFTP Server 11.8
RO-13-ISS seau.spacescience.ro 119522 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
GR-12-TEIKAV se.grid.teiemt.gr 110972 DPM/1.8.9-1 xrootd/0x30000 GridFTP Server 7.18
HEPHY-UIBK grid01.uibk.ac.at 99012 N/A xrootd/0x10030000 GridFTP Server 12.5
N/A lapp-testse01.in2p3.fr 59624 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.5
ru-PNPI cluster.pnpi.nw.ru 56006 DPM/1.9.0-1 N/A N/A
Australia-T2 coepp-dpm-01.ersa.edu.au 54973 DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
UA-BITP se.bitp.kiev.ua 41985 N/A xrootd/0x30000 GridFTP Server 12.4
CYFRONET-LCG2 se03.grid.cyfronet.pl 31394 N/A xrootd/0x10030000 GridFTP Server 12.5
UA-NSCMBR se.biomed.kiev.ua 29770 DPM/1.9.0-1 N/A GridFTP Server 11.8
AUVERGRID cirigridse01.univ-bpclermont.fr 19790 DPM/1.10.0-1 N/A GridFTP Server 13.9
GRIF ipnsedpm.in2p3.fr 18704 DPM/1.9.0-1 xrootd/0x10030000 N/A
UNINA-EGEE se.scope.unina.it 17307 DPM/1.8.8-1 N/A GridFTP Server 6.38
HG-02-IASA se01.marie.hellasgrid.gr 9998 DPM/1.8.10-1 N/A GridFTP Server 12.5
UMB-BB se.grid.umb.sk 8793 N/A N/A GridFTP Server 12.4
UPorto hades.up.pt 8356 N/A N/A GridFTP Server 13.9
MA-01-CNRST se1.cnrst.magrid.ma 7749 N/A N/A GridFTP Server 9.1
GRIDIFIN segi.nipne.ro 7497 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.8
MA-04-CNRST-ATLAS atlas-se1.cnrst.magrid.ma 6780 N/A N/A GridFTP Server 9.1
OBSPM se-dpm-server-grid.obspm.fr 6594 DPM/1.9.0-1 N/A GridFTP Server 11.8
CESGA se2.egi.cesga.es 6214 DPM/1.8.10-1 N/A GridFTP Server 9.1
AEGIS03-ELEF-LEDA grid02.elfak.ni.ac.rs 5946 DPM/1.8.10-1 xrootd/0x30000 GridFTP Server 9.1
TASK se.grid.task.gda.pl 2197 N/A N/A GridFTP Server 9.1
GRISU-UNINA grisuse.scope.unina.it 2163 DPM/1.8.8-1 N/A GridFTP Server 6.38
DZ-01-ARN se01.grid.arn.dz 2063 DPM/1.10.0-1 N/A GridFTP Server 13.9
RO-13-ISS grid02.spacescience.ro 1969 DPM/1.10.0-1 N/A GridFTP Server 13.8
CIRMMP se-enmr.cerm.unifi.it 1475 DPM/1.8.10-1 N/A GridFTP Server 11.1
GARR-01-DIR gridsrv3-4.dir.garr.it 1268 DPM/1.8.10-1 N/A GridFTP Server 9.1
UA-ISMA gl-dpm.isma.kharkov.ua 1098 N/A N/A GridFTP Server 12.2
HK-HKU-CC-01 glite01.grid.hku.hk 1082 N/A N/A GridFTP Server 13.8
RO-03-UPB se01.grid.pub.ro 1082 N/A N/A GridFTP Server 9.1
N/A prod-se-03.ct.infn.it 1056 N/A N/A GridFTP Server 9.4
WCSS64 darkmass.wcss.wroc.pl 579 N/A N/A GridFTP Server 12.4
CNR-ILC-PISA gridse.ilc.cnr.it 436 DPM/1.9.0-1 N/A GridFTP Server 12.5
IR-IPM-HEP se1.particles.ipm.ac.ir 301 DPM/1.8.11-1 xrootd/0x30000 GridFTP Server 11.3
USC-LCG2 se-emi.igfae.usc.es 298 DPM/1.10.0-1 N/A GridFTP Server 13.8
WUT alix.if.pw.edu.pl 169 N/A N/A GridFTP Server 6.38
RECAS-NAPOLI recasna-se01.unina.it 150 DPM/1.8.7-3 N/A GridFTP Server 6.38
NCP-LCG2 se02.ncp.edu.pk 134 N/A xrootd/0x10030000 GridFTP Server 12.2
TU-Kosice dpm.grid.tuke.sk 107 N/A N/A N/A
IISAS-Bratislava se-sivvp.ui.savba.sk 107 N/A N/A GridFTP Server 13.8
NIHAM alice003.nipne.ro 105 DPM/1.8.10-1 N/A GridFTP Server 9.1
AEGIS02-RCUB grid15.rcub.bg.ac.rs 102 N/A N/A GridFTP Server 6.38
NCP-LCG2 pcncp23.ncp.edu.pk 84 N/A N/A GridFTP Server 13.9
AstrogridPUC astrose.astro.puc.cl 53 N/A xrootd/0x10030000 GridFTP Server 13.9
UA-KNU se.univ.kiev.ua 52 N/A N/A GridFTP Server 13.8
RO-11-NIPNE lhcb-se.nipne.ro 52 N/A N/A GridFTP Server 13.8
USC-LCG2 se.igfae.usc.es 52 DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
SUPERCOMPUTO-UNAM se.grid.unam.mx 26 N/A N/A GridFTP Server 7.26
GRID-UNAM dpm.grid.unam.mx 8 N/A N/A GridFTP Server 7.25
CYFRONET-LCG2 dpm.cyf-kr.edu.pl N/A DPM/1.8.9-1 xrootd/0x97020000 GridFTP Server 7.18
Australia-T2 b2se.mel.coepp.org.au N/A DPM/1.9.0-1 xrootd/0x10030000 GridFTP Server 12.4
AEGIS01-IPB-SCL dpm.ipb.ac.rs 0 DPM/1.8.10-1 N/A N/A
GRIF grid03.lal.in2p3.fr N/A DPM/1.10.0-1 xrootd/0x10030000 GridFTP Server 13.9
HG-05-FORTH se01.ariagni.hellasgrid.gr N/A DPM/1.8.10-1 N/A GridFTP Server 9.1
-- JuliaAndreeva - 2018-09-25 -- JuliaAndreeva - 2020-08-25
Topic revision: r1 - 2020-08-25 - JuliaAndreeva
 