WLCG Operations Coordination Minutes, June 2, 2022

Highlights

Agenda

https://indico.cern.ch/event/1166315/

Attendance

  • local:
  • remote: Christoph (CMS), David B (IN2P3-CC), David C (ATLAS), David M (FNAL), Edoardo (networks), Eric (IN2P3), Julia (WLCG), Maarten (ALICE + WLCG), Masahiko (Tokyo), Miltiadis (WLCG), Nikolay (monitoring), Petr (Prague + ATLAS + WLCG), Raffaele, Renato (EGI), Romain, Sebastian (GoeGrid), Shawn (AGLT2 + networks), Stephan (CMS), Thomas (DESY), Xin (BNL)
  • apologies:

Operations News

  • the next meeting is planned for July 7

Special topics

WLCG site network monitoring

see the presentation

Discussion

  • Edoardo:
    • what is requested looks feasible, at least the documentation
    • is the total traffic to be reported, or should it be separated?
  • Shawn:
    • initially we would just like to see the totals, as that is easier
    • in the future we could e.g. have errors and discards added,
      differentiate between experiments, network types etc.
    • CRIC can be made to support further substructures

  • Petr:
    • might a push model be considered instead of the pull model?
  • Shawn:
    • the pull model looks easier to implement at this time
    • a push model can be considered for the future

  • Maarten:
    • some site admins may feel uncomfortable about providing some of
      the information you are requesting, because of security reasons
    • they may not want to expose network details, equipment used etc.
    • though the info should never be world-readable in CRIC,
      it might get exposed due to a configuration mistake or bug
    • you would need to open a ticket per site and see how far
      they are willing to go in these matters
  • Shawn:
    • most of the information is optional
    • the more info is provided, the more we can help with network issues
    • sensitive information will be protected
    • mandatory information is not sensitive
    • we can make adjustments depending on the feedback from sites

  • Thomas:
    • DESY-ZN is connected to the grid via DESY-HH:
      can such an arrangement be handled?
  • Shawn:
    • yes: DESY-ZN would just report the traffic numbers they see,
      whereas DESY-HH reports the totals for both sites together
    • AGLT2 is a similar case: each of its 2 constituent sites reports
      external traffic totals that include the inter-site traffic

  • Edoardo:
    • do you have plans to show the in/out counters somewhere?
  • Shawn:
    • the plan is to get them into MONIT
    • the displays would somewhat resemble the ESnet dashboard,
      as shown on page 15 of the attached presentation

  • Edoardo:
    • will there be links to perfSONAR instances in CRIC?
  • Shawn:
    • the CRIC devs are augmenting the schema to allow perfSONAR
      instances to be registered
    • we would like CRIC to become the source of truth for perfSONAR
  • Julia:
    • it is WIP at the prototype stage

  • Julia:
    • Victoria provided feedback on this meeting's Twiki page:
      what do people think of their suggestions?
  • Shawn:
    • a central repository at CERN has been considered,
      but the difficulty is in the authN/authZ of site admins
    • we may look further into that approach later
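
As an illustration of the pull model and the in/out totals discussed above, the following is a minimal sketch of how a collector might turn two successive readings of a site's cumulative traffic counters into average rates. It is not the format or code agreed by the working group: the `CounterSample` layout, its field names and the `average_rates` helper are all hypothetical, and it assumes that sites expose cumulative byte counters rather than precomputed rates.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CounterSample:
    """One reading of a site's cumulative traffic counters (hypothetical layout)."""
    timestamp: float  # seconds since the epoch
    bytes_in: int     # cumulative bytes received by the site
    bytes_out: int    # cumulative bytes sent by the site


def average_rates(prev: CounterSample,
                  curr: CounterSample) -> Optional[Tuple[float, float]]:
    """Return (in, out) average rates in bit/s between two samples.

    Returns None when no meaningful rate can be derived: timestamps that
    do not increase, or a counter that went backwards (reset or wrap).
    """
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        return None
    delta_in = curr.bytes_in - prev.bytes_in
    delta_out = curr.bytes_out - prev.bytes_out
    if delta_in < 0 or delta_out < 0:
        return None
    return (8 * delta_in / elapsed, 8 * delta_out / elapsed)


# Two readings taken 60 s apart by the (hypothetical) collector.
first = CounterSample(timestamp=0.0, bytes_in=1_000, bytes_out=2_000)
second = CounterSample(timestamp=60.0,
                       bytes_in=1_000 + 750_000_000,
                       bytes_out=2_000 + 1_500_000_000)
rates = average_rates(first, second)
```

In this sketch a sample with a counter that went backwards is simply skipped, so a device reboot at a site would show up as a gap rather than as a spurious traffic spike.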

Middleware News

  • Useful Links
  • Baselines/News
    • A CE token support campaign was launched yesterday
      • All EGI sites supporting at least one of ALICE, ATLAS or LHCb were ticketed
        • CMS Operations had already launched such a campaign for all CMS sites
      • The sites were asked:
        • to upgrade their CEs to versions compatible with the use of tokens,
        • to configure the tokens of ATLAS and/or CMS as needed, and
        • for ARC CEs, to enable the REST interface
      • Each experiment needs to check for itself whether the CEs at its sites actually work
        • The tickets can be used to convey observations or requests
      • The WLCGBaselineTable has been updated accordingly

Discussion

  • Julia:
    • what about ATLAS sites on OSG?
  • Maarten:
    • OSG is taking care of them
  • Shawn:
    • I can confirm that

Tier 0 News

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • Mostly business as usual
  • CVMFS problem affected 2 sites in Japan and Thailand
    • For a few weeks they were often stuck at the same obsolete revision
    • The Stratum-1 service at ASGC had run out of disk space (fixed)
  • Only a few NorduGrid sites still run the AliEn legacy MW
    • To be switched to the JAliEn stack in the coming weeks

ATLAS

  • Smooth running with 500-700k cores including 200k from HPC
  • Ran a lot of multicore event generation in preparation for Run 3 simulation, which should start soon
  • Some issues with ADC Oracle databases: incidents on 4 May and 27 May, and the read-only replica was out of sync for 10 days
  • Deletion campaigns on disk and tape have freed ~30 PB from each
  • Next week we plan to remove SRM and GridFTP disk endpoints for TPC from CRIC and Rucio (SRM will be kept for tape and may need to be kept for space reporting for some sites)
    • Only one US T3 site still relies on GridFTP and will be cut off

CMS

  • Overall rather smooth operations
  • Good CPU usage, ~370k cores (usual split: 75% production, 25% user), with a significant HPC contribution of 20-50k cores
  • WebDAV transition: all Tier-1 and Tier-2 sites are OK. Most Tier-3 sites are ready or in production; we expect three more sites to be ready very soon
  • Still waiting on a Python3 version of HammerCloud
  • First round of integrating 2022 pledges completed

LHCb

Site input for Network monitoring discussion

CA-VICTORIA-WESTGRID-T2

Suggest using git, twiki, or some other tool suitable for documentation, to store, serve and manage the site network information templates in one place, instead of standalone webservers/webpages for each site.

Task Forces and Working Groups

GDPR and WLCG services

  • Julia:
    • the latest version of the WLCG Privacy Notice can be customized
    • it is available from the WLCG document repository: WLCGPrivacyNotice2022.pdf
    • sites will be asked to publish either a locally defined privacy notice
      or the WLCG Privacy Notice, possibly customized as needed

Accounting TF

  • NTR

dCache upgrade TF

Progress is tracked via a Twiki page

Information System Evolution TF

  • NTR

IPv6 Validation and Deployment TF

Detailed status here.

Monitoring

  • the May GDB had 2 talks featuring monitoring:
    • WLCG Monitoring Task Force update
    • Tape Challenge Results (conclusions and future directions)

Network Throughput WG


WG for Transition to Tokens and Globus Retirement

  • A CE token support campaign was launched, as reported under the Middleware News

Action list

| Creation date | Description | Responsible | Status | Comments |

Specific actions for experiments

| Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments |

Specific actions for sites

| Creation date | Description | Affected VO | Affected TF/WG | Deadline | Completion | Comments |

AOB

Topic revision: r10 - 2022-06-03 - MaartenLitmaath
 