WLCG Operations Coordination Minutes, May 2, 2024

Highlights

Agenda

https://indico.cern.ch/event/1411424/

Attendance

  • local: Lorenzo (HammerCloud), Maarten (ALICE + WLCG), Steve (HammerCloud)
  • remote: Alexander (ATLAS), Andrea (CMS + WLCG), Benjamin (ATLAS), Borja (monitoring), Christoph (CMS), Dave D (FNAL), David B (IN2P3-CC), Eva (CERN IT-DA), Federica (IAM devs), Frédérique (LAPP), Jan (LHCb), Mario (ATLAS), Panos (CMS + WLCG), Petr (ATLAS + Prague), Stephan (CMS), Thomas (DESY)
  • apologies:

Operations News

  • the next meeting is planned for June 13!

Special topics

HammerCloud status and plans

see the presentation

  • Steve also introduces Lorenzo, who has just joined the team
    and will be working on HammerCloud as well as the ETF

  • Lorenzo studied computational physics in Padova and
    previously worked on ILC-DIRAC for FCC studies

Discussion

  • Stephan:
    • what is the plan for the back-ends?
    • will you work with Andrea on those?
  • Steve:
    • presuming you meant the submit nodes: we will work on those with Andrea
  • Stephan:
    • do you have an estimate for the interface with Python-3 job submission tools?
  • Steve:
    • no
  • Andrea:
    • the amount of work is difficult to estimate
    • worked on it 2 years ago, but had to abandon it
    • Python-3 packages were uploaded, but untested
    • Puppet manifests were far from finished
  • Stephan:
    • will this work get higher priority from now on?
  • Steve:
    • yes

htgettoken + HashiCorp Vault as a Service for Managing Grid Tokens

see the presentation

Discussion

  • Petr:
    • as HashiCorp will be bought by IBM,
      could that lead to a licensing problem for Vault?
  • Dave:
    • we could fork the code if needed
    • as there will not be many instances, we might even pay
  • Petr:
    • you might be charged per token
  • Thomas:

  • Petr:
    • does CMS plan to integrate Vault + htgettoken into their SW?
  • Dave:
    • for some use cases
    • for CRAB it does not seem to be needed

  • Petr:
    • would users actually use the htgettoken command?
  • Dave:
    • it would be called through wrappers
  • Petr:
    • users would have to know about access tokens for specific roles?
  • Dave:
    • there could be scripts, which can be common for popular use cases
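
A wrapper of the kind Dave describes could look roughly like the sketch below, which maps a user-facing task name to a role and builds the htgettoken command line. The task names, roles, Vault host and issuer are hypothetical placeholders, and the short options (-a for the Vault server, -i for the issuer, -r for the role) are assumed from the htgettoken documentation; check them against the installed version before relying on this.

```python
import shlex

# Hypothetical mapping from a user-facing task to the token role it needs.
# Real role names are defined per issuer in the Vault configuration.
TASK_ROLES = {
    "analysis": "default",
    "production": "production",
}


def build_htgettoken_command(task, vault_server, issuer):
    """Return the htgettoken argument list for a named task."""
    if task not in TASK_ROLES:
        raise ValueError(f"unknown task: {task!r}")
    return [
        "htgettoken",
        "-a", vault_server,       # Vault server host name (assumed option)
        "-i", issuer,             # token issuer configured in Vault (assumed option)
        "-r", TASK_ROLES[task],   # role within that issuer (assumed option)
    ]


if __name__ == "__main__":
    # Placeholder host and issuer, for illustration only.
    cmd = build_htgettoken_command("analysis", "vault.example.org", "examplevo")
    print(shlex.join(cmd))
    # To actually request a token: subprocess.run(cmd, check=True)
```

With such a script the user only picks a task; the role and server details stay in one place that the experiment can maintain.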

  • Maarten:
    • one of the ideas behind the token transition is that ordinary users
      should not have to know anything about tokens
    • it is good to see various auxiliary services already in production,
      that we can take advantage of for LHC experiments and related VOs
    • the token transition timeline allows about 2 years still for
      the user experience to get sorted out in each experiment

Middleware News

  • Useful Links
  • Baselines/News
    • While the UMD-5 for EL9 is not ready yet, the BDII auxiliary rpms
      have been made available from the WLCG repository
      • For EL9 and EL8
      • The BDII itself is available from EPEL

  • EL9 vs. SHA-1
    • the situation is summarized here
    • further discussion in this fetch-crl ticket
      • the problem is with OpenSSL instead
    • DigiCert are not going to update their root CA (presumably OK in browsers)
      • might be removed from IGTF if nothing important depends on it
        • today there are 84 WLCG users with such certificates
          • 74 in ATLAS
      • is also the issuer of 2 TERENA CAs
        • today there are 418 WLCG users with such certificates
          • 178 in ATLAS
          • 149 in CMS
      • LHC experiment users could switch to CERN Grid CA certificates
      • but there could also be many services with affected certificates
    • to be followed up further
      • hopefully we can at least push our classic CAs faster toward SHA-2
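
The user counts quoted above can be turned into per-experiment shares with a few lines of arithmetic. This is only a sketch using the figures in these minutes; the dictionary layout and function name are illustrative, and the two groups are kept separate because the minutes do not say whether they overlap.

```python
# Affected WLCG users per issuing CA, as quoted in the minutes.
AFFECTED = {
    "DigiCert root CA": {"total": 84, "ATLAS": 74},
    "TERENA CAs": {"total": 418, "ATLAS": 178, "CMS": 149},
}


def experiment_shares(counts):
    """Return each experiment's percentage of the group's total users."""
    total = counts["total"]
    return {
        vo: round(100 * n / total, 1)
        for vo, n in counts.items()
        if vo != "total"
    }


if __name__ == "__main__":
    for ca, counts in AFFECTED.items():
        print(ca, experiment_shares(counts))
```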

Discussion

  • Petr:
    • should we still rely on EGI for the UMD?
  • Maarten:
    • there have been unfortunate delays due to various causes
    • the UMD has several important advantages over EPEL etc.
    • hopefully we will soon be able to start profiting from UMD-5
    • meanwhile we can e.g. use the WLCG repository as a stopgap

  • Thomas:
    • could multi-hop transfers be considered to reduce the number
      of hosts that need to have SHA-1 configured?
  • Petr:
    • we only do multi-hop transfers for special cases
  • Maarten:
    • on a large scale it would imply bottlenecks

  • a discussion then followed about what we can do to make progress

  • Jan:
    • can we do a campaign to move users and services off SHA-1 CAs?
    • find out how much each VO is affected?

  • Maarten:
    • will follow up with IGTF to see if classic CAs can be pushed
    • will try to get an idea of the number of services per VO
    • intend to provide updates regularly

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • normal to high activity on average in the last weeks
  • an XRootD server I/O performance problem was badly affecting analysis trains at KIT
    • thanks very much to the KIT experts for resolving that with urgency!

ATLAS

  • Everything going smoothly
    • Good job mix with some larger reconstruction and group production campaigns
    • 650k slots on average, with several 750k peaks
    • 2M-3.5M file transfers / 3-6PB volume per day
    • Tape consolidation campaign slowly easing off, 2-3 weeks tail left
  • ~4k slots on SWT2 now extended by Google
  • Increase of ARM resources, total ~50k slots now
    • CERN, GLASGOW, INFN-T1, SWT2_GOOGLE extension
  • ALMA9 migration going well and almost finished

CMS

  • overall smooth data taking and computing operations
    • core usage between 310k and 580k cores
    • due to HPC/opportunistic contributions
    • almost all production activities now Run 3
    • back to a more usual production/analysis split of about 3:1
    • various monitoring outages in the last weeks
  • work on SRM to REST migration for tape endpoints continues
    • four done / four remaining
  • phasing out SRMv2/GSIftp/gridFTP at sites
    • 20 done / 20 remaining
  • three remaining DPM sites to migrate
  • token migration progressing steadily
  • waiting on the Python-3 port of HammerCloud
  • we will ask sites to remove x86-64-v1 microarchitecture worker nodes with the end of SL7, i.e. June 30, for CMS (five sites, fewer than 100 worker nodes)
  • online Oracle DB support issue ten days ago
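
For sites wanting to find worker nodes that are limited to x86-64-v1, one possible check is to look for the CPU flags that the x86-64-v2 feature level adds. The sketch below assumes a Linux node and the flag names used in /proc/cpuinfo (where SSE3 appears as "pni"); the function names are illustrative and this is not a CMS-provided tool.

```python
# CPU flags, as named in Linux /proc/cpuinfo, required for x86-64-v2.
# A node missing any of them only supports the x86-64-v1 baseline.
V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}


def missing_v2_flags(cpuinfo_text):
    """Return the sorted x86-64-v2 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return sorted(V2_FLAGS - set(value.split()))
    # No flags line found: treat every v2 flag as missing.
    return sorted(V2_FLAGS)


def supports_x86_64_v2(cpuinfo_text):
    """True if all x86-64-v2 flags are present."""
    return not missing_v2_flags(cpuinfo_text)


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print("x86-64-v2 capable:", supports_x86_64_v2(text))
    print("missing flags:", missing_v2_flags(text))
```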

LHCb

  • Jan will report items as needed next time

Task Forces and Working Groups

Accounting TF

Migration from DPM to the alternative solutions

Information System Evolution TF

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

Network Throughput WG


WG for Transition to Tokens and Globus Retirement

  • timeline for the transition from VOMS-Admin to IAM services for VO management
    • the legacy VOMS servers are being removed from the
      vomses configuration files used to create VOMS proxies
      • on QA hosts at CERN since today
      • on production hosts at CERN planned for Tuesday May 7
      • wlcg-voms rpms will be updated accordingly
    • only the VOMS endpoints of the IAM services on OpenShift will be used
      • they have been used in production for 2 years by ATLAS and CMS,
        and for ~2 months by ALICE and LHCb
      • all corresponding LSC files have been in production for 2 years
    • switches from VOMS-Admin to IAM are planned for this month
      • per experiment, when it gives the green light
    • ultimate deadline: end of June!

Action list

Creation date | Description | Responsible | Status | Comments

Specific actions for experiments

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

Specific actions for sites

Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments

AOB

Topic revision: r8 - 2024-05-03 - MaartenLitmaath
 