Week of 240122

WLCG Operations Call details

  • The connection details for remote participation are provided on this agenda page.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to the wlcg-scod list (at cern.ch). This allows the SCOD to make sure that the relevant parties have time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Borja (Monitoring), Guillermo (Monitoring), Julia (WLCG), Laurence (Computing Services), Maarten (ALICE + WLCG), Panos (CMS + WLCG + Chair)
  • remote: Andrew (TRIUMF), Carmen (CNAF), Christoph (CMS), Daren (RAL), David M (FNAL), Federico (LHCb), Henryk (NCBJ), Ivan (BNL), Onno (NL-T1), Peter (ATLAS + Lancaster), Xavier (KIT)

Experiments round table:

  • ATLAS reports ( raw view) -
    • Token support has been enabled for the following ATLAS storage endpoints (RSEs):
      • AGLT2_DATADISK
      • BNL-OSG2_DATADISK
      • CERN-PROD_DATADISK
      • DESY-ZN_DATADISK
      • IN2P3-CC_DATADISK
      • INFN-T1_DATADISK
      • PIC_DATADISK
      • PRAGUELCG2_DATADISK
      • SARA-MATRIX_DATADISK
      • RAL-LCG2-ECHO_DATADISK
      • TOKYO-LCG2_DATADISK
      • UAM-LCG2_DATADISK
      • UKI-NORTHGRID-LANCS-HEP-CEPH_DATADISK
      • UKI-SOUTHGRID-RALPP_DATADISK
      • RAL-LCG2-ECHO_SCRATCHDISK
      • SARA-MATRIX_SCRATCHDISK
      • TOKYO-LCG2_SCRATCHDISK
      • UAM-LCG2_SCRATCHDISK
        • Christoph: Are tokens actually used in production?
        • Ivan/Peter: Yes, a few DATADISK RSEs are using tokens for production transfers, others will follow.

  • CMS reports ( raw view) -
    • No major operational issues
    • Another DC pre-challenge scheduled for this week: Common US-ATLAS and US-CMS transfer test on 24th Jan

  • ALICE
    • NTR

Sites / Services round table:

  • BNL:
    • Downtime on 22.01.2024 (today) for dCache upgrade (15:00 to 19:00 CEST)
      • Upgrade to the 9.2.6 release. Preliminary tests with the most recent release, 9.2.9, show unusual latency with WebDAV listings
    • Storage:
      • SCRATCHDISK transfers are already done with tokens. Tokens are already enabled for DATADISK, but will only start being used after the downtime
    • pre-DC24 tests
      • Preliminary tests from CERN to BNL have saturated the current WAN capacity, a 100 Gb/s link
        • The 400G TA circuit is currently down, with an Estimated Time of Repair of 2/16
      • Within USATLAS: ~200 Gb/s simultaneously to two T2s (AGLT2 and MWT2)
  • CNAF:
    • On 17/01 we experienced connectivity issues between CNAF and CINECA, the facility where ~1/2 of our worker nodes are located. We had to drain those nodes to prevent jobs from failing, which reduced our computing resources by ~60%. The issue has been addressed today and we are enabling the worker nodes to accept new jobs.
    • Maintenance on LHCOPN link at INFN-T1 GOCDB:34762 today from 4PM to 7PM UTC
      • Maarten: This downtime is recorded in the preprod instance of GOCDB. The production instance should be used, since no one should actually be checking preprod. Maarten will also check with the GOCDB developers whether a banner can be added to the preprod site, which currently looks exactly like the production instance and is confusing people.
  • EGI:
  • FNAL:
    • Julia: When will we be able to enable xrootd monitoring at FNAL?
    • David: Both dCache instances have been upgraded to 9.2 so FNAL should be technically ready to enable that. David will follow up with the responsible team.
  • IN2P3:
  • JINR: NTR
  • KISTI:
  • KIT:
    • On Tuesday, network maintenance on a switch caused several machines to lose network connectivity (GOCDB:34962, GOCDB:34963). A reboot was required to restore full functionality.
    • On Thursday our LHCOPN link to CERN went down; it was restored on Friday. As far as we can tell, alternative routes kicked in transparently.
    • On Friday we started a reboot of the server farm because of critical security patches. GOCDB:34967
    • Thanks for the new GOCDB: macro!
  • NCBJ:
    • dCache has been updated to 9.2
    • IAM token support has been enabled for CMS and WLCG
      • Maarten: The WLCG VO should only be used for development and should not be enabled at production sites, since virtually everyone can get permissions in this instance. The DTEAM VO should be used for testing instead.
      • Christoph: This should be made clearer in the documentation, which currently mentions that the WLCG VO can be used for testing.
      • Maarten will follow up and send a clarification to the wlcg ops mailing list.
      • Federico: Are DTEAM users being synced from VOMS to IAM?
      • Maarten: Yes, they are
      • Xavier: DTEAM is not only for testing; it is also used by service providers for monitoring, e.g. FTS is using it.
      • Xavier: How can people get the necessary scopes for their clients?
      • Maarten: For the time being, one can email Maarten and he will assign the scopes, but in the future all DTEAM members should assign all scopes.
  • NDGF:
  • NL-T1: dCache upgrade on Wednesday to patch a WebDAV directory listing bug. GOCDB:34976
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL: Antares will be unavailable on Tuesday morning (09:00-14:00 UTC) to allow CTA and EOS5 upgrade work. GOC:34955
  • TRIUMF: NTR

  • CERN computing services:
  • CERN storage services:
  • CERN databases:
  • GGUS: NTR
  • Monitoring:
    • WLCG SiteMon final reports for December sent around
  • Middleware: NTR
  • Networks:
  • Security:

AOB:

Topic revision: r23 - 2024-01-23 - NikolayVoytishin