WLCG Operations Coordination Minutes, Dec 2, 2021

Highlights

Agenda

https://indico.cern.ch/event/1101195/

Attendance

  • local:
  • remote: Alastair (RAL), Alberto (monitoring), Alessandra D (Napoli), Alessandra F (ATLAS + Manchester + WLCG), Alexey (CRIC), Andrew (TRIUMF), Borja (monitoring), Brian (RAL), Christoph (CMS), Darren (RAL), Dave M (FNAL), David Cameron (ATLAS + ARC), David Cohen (Technion), Edoardo (networks), Eric (IN2P3), Giuseppe (CMS), Henryk (LHCb + NCBJ-CIS), Julia (WLCG), Maarten (ALICE + WLCG), Marian (networks + monitoring), Masahiko (Tokyo), Matt D (Lancaster), Nikolay (monitoring), Panos (CRIC), Pedro (monitoring), Riccardo (WLCG), Rizart (WLCG), Shawn (networks + MWT2), Stephan (CMS), Thomas (DESY)
  • apologies:

Operations News

  • the next meeting is planned for Jan 27

Special topics

Network topology in CRIC. Information per site to be provided.

see the presentation

Discussion

  • Stephan:
    • the plans look geared towards standard grid sites,
      whereas we also have HPC, cloud and opportunistic resources
    • on the one hand the requested configuration details seem complex,
      on the other hand they may be too simplistic for all use cases
    • for example, let's assume MWT2 uses Google resources that were
      already used by another site: how might that be handled?
  • Shawn:
    • the information in CRIC is meant for static sites
    • cloud resources are often hidden behind what sites expose
    • support of dynamic sites is challenging, it could come in version 2
  • Stephan:
    • we already declare temporary RC sites in CRIC
    • it would be good to allow those network details to be exported in JSON format
    • the association of specific networks to specific experiments could
      be incompatible with opportunistic use of computing resources
  • Shawn:
    • such associations are not meant to limit the use of resources
    • they allow resources to be matched with their main customers
  • Stephan:
    • can the first match be made to have the highest priority?
    • we want to avoid that many subnets might need to be listed
  • Shawn:
    • we can consider that indeed
    • such matters are exactly what we wanted to discuss

  • Julia:
    • there will be a convenient API for the use cases we want to support
  • Alexey:
    • JSON exports of those details are already possible now
    • opportunistic resources for ATLAS are always declared in CRIC
  • Shawn:
    • static sites are OK, dynamic resources may be tricky

  • Alastair:
    • the requirements on page 2 look fine, but then the presentation
      shifts more and more toward "give us all your information"
    • CRIC will never be the source of truth
    • it will at most have a copy of a site's own configuration details
    • the focus should be on the LHCONE use cases
  • Shawn:
    • the CRIC network information is about well-defined sites
    • it will help us with several important use cases
    • it also presents us with opportunities to identify issues with
      the quality of the provided data
  • Alastair:
    • sites not on LHCONE are less likely to be dedicated to WLCG and
      it may be more difficult to get such information from them
  • Julia:
    • we can at least implement consistency and sanity checks

  • Maarten:
    • the proposed changes need to be implemented and realistic example
      configurations should be provided before the campaign gets launched
  • Julia:
    • we will fix ambiguities and do tests with a few sites, there is no rush
    • a further presentation will be given in the Dec 8 GDB
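
Illustration of the "first match" idea raised in the discussion above: a minimal sketch, assuming the network details are available as an ordered JSON list of subnets with a site name each. The JSON layout, the field names `subnet` and `site`, the site names and the helper functions are all hypothetical and do not reflect the actual CRIC export format; the addresses are documentation ranges.

```python
import ipaddress
import json

# Hypothetical JSON export: an ordered list, earlier entries take priority.
# The real CRIC layout may differ; this is only an assumed shape.
EXPORT = """
[
  {"subnet": "192.0.2.0/25",  "site": "SITE_A"},
  {"subnet": "192.0.2.0/24",  "site": "SITE_B"},
  {"subnet": "2001:db8::/32", "site": "SITE_C"}
]
"""

def load_rules(text):
    """Parse the export into an ordered list of (network, site) pairs."""
    return [(ipaddress.ip_network(entry["subnet"]), entry["site"])
            for entry in json.loads(text)]

def first_match(address, rules):
    """Return the site of the first subnet containing the address, else None."""
    ip = ipaddress.ip_address(address)
    for net, site in rules:
        # Only compare addresses and networks of the same IP version.
        if ip.version == net.version and ip in net:
            return site
    return None

RULES = load_rules(EXPORT)
```

With these example rules, an address in the narrower /25 range resolves to the first entry even though the broader /24 also contains it, so a site would not need to list every subnet separately as long as the order expresses the priority.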

WLCG Monitoring Task Force

see the presentation

Discussion

  • Alessandra F:
    • the presence of the experiments in the TF is fundamental
  • Julia:
    • we may also work with some experiments just on specific topics

Middleware News

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Mostly business as usual
  • Thanks to all sites and best wishes for 2022

ATLAS

  • Running 600-700k cores with up to 300k from opportunistic EuroHPC
    • Mainly Run 2 reprocessing and multi-core event generation
  • SAM/ETF tests and Harvester can now submit with tokens to OSG 3.6 CEs
  • No issues seen so far with Frontier test (redirecting all traffic to CERN since 15 Nov)
    • Artificial stress now being added to find the limits
  • SRM+HTTPS: fts3-atlas was upgraded on Tuesday and gfal configured to allow the use of HTTPS in srm <-> srm transfers. All smooth. Today Rucio was configured to allow HTTPS also in mixed-protocol transfers (https <-> srm).
  • Happy end of year holidays to all!

CMS

  • running smoothly at around 320k cores
    • usual production/analysis split of 3:1
    • HPC allocations contributing between 5k and 60k cores
    • main production activity Run 2 ultra-legacy Monte Carlo
    • processing of parked B data progressing well
  • successful operation of the detector/DAQ during pilot beam test and CRUZET
  • Identity and Access Management (IAM) server moved into production
  • WebDAV SAM test made mandatory, commissioning at a few sites ongoing
  • slow local/remote data access investigation continues at RAL
  • coordinating xrootd v5 upgrade with sites

LHCb

  • Smooth running at 140k cores
  • Due to issues, reprocessing (stripping) of 2016 data is waiting for the second request for validation
  • Best wishes for 2022

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

  • A meeting to discuss integration of the new benchmark in the accounting workflow was held on 25 November

Archival Storage WG

Containers WG

CREAM migration TF

This TF has been closed as of Dec 1.

Details here

Final summary:

  • 90 tickets
  • 84 done: 39 ARC, 40 HTCondor, 1 both, 1 K8s, 3 none
  • 6 unsolved

dCache upgrade TF

  • A campaign to enable SRR at the dCache sites will be launched at the beginning of next year

Information System Evolution TF

  • Progress in enabling network topology in CRIC. See presentation

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

Network Throughput WG


  • perfSONAR infrastructure - 4.4.1 is the latest release (please update, we also recommend rebooting all nodes after update)
  • WLCG/OSG Network Monitoring Platform
    • Work is on-going to resolve issues reported to the perfSONAR team
    • Issue causing perfSONARs to hit resource limits (number of threads) will be fixed in the next release
  • Meeting with CRIC team took place last week to discuss use cases for the perfSONAR topology in CRIC
  • Recent and upcoming WG updates:
  • WLCG Network Throughput Support Unit: see twiki for summary of recent activities.

Traceability WG

Transition to Tokens and Globus Retirement WG

  • Progressing via Authorization WG meetings
    • Example: design of token exchange workflows involving FTS and Rucio

Discussion

  • Stephan:
    • what is the status of token support in ARC?
    • we may need to be concerned about discontinuation of
      X509 support in the US pilot factories
  • David Cameron:
    • the latest version of ARC supports tokens for jobs that
      do not require the CE to do any data handling
    • the majority of sites can thus take advantage already
  • Maarten:
    • we foresee CE upgrade campaigns early next year,
      but they may take many months, as usual
    • it would be good for experiments to allow X509 still to
      be used for ARC CEs for the time being
      • and for HTCondor CEs in EGI
    • HTCondor-G (sic) should be fine with that
    • as of ~Feb, HTCondor CEs in OSG will only support tokens
    • the last HTCondor CE version featuring X509 job submission
      reaches its EOL in the autumn of next year
      • EGI sites should have upgraded by that time

  • Julia:
    • do all experiments know what they need to do for tokens?
  • Maarten:
    • they are all represented in the Authorization WG
    • ATLAS and CMS already prepared for the changes in OSG
    • for ALICE and LHCb these matters were less urgent
    • DIRAC developers started implementing token support months ago
    • for ALICE it is on the JAliEn roadmap

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

AOB

  • THANKS for your help in making 2021 another successful year for WLCG !
    • Further challenges and opportunities await us in 2022...
Topic revision: r9 - 2021-12-03 - MaartenLitmaath