Week of 211206

WLCG Operations Call details

  • For remote participation we use Zoom: https://cern.zoom.us/j/99591482961
    • The pass code is provided on the wlcg-operations list.
    • You can contact the wlcg-ops-coord-chairpeople list (at cern.ch) if you do not manage to subscribe.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a particular topic needs to be discussed at the operations meeting requiring information from sites or experiments, it is highly recommended to announce it by email to wlcg-scod@cern.ch to allow the SCOD to make sure that the relevant parties have the time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • remote: Kate (chair, DB), Maarten (WLCG, ALICE), Darren (NDGF), Joao (storage), Andrew (NL-T1), Michal (ATLAS), Pablo (computing), Christoph (CMS), Andrew (TRIUMF), Henryk (LHCb), Borja (monitoring), Francesco (CNAF), Chien-De Li (ASGC), Xavier (KIT), DaveM (FNAL)

Experiments round table:

  • ATLAS reports (raw view) -
    • Issues:
      • "Error recalling file from tape" staging errors at INFN-T1 (GGUS:155200)
      • "Server Error" transfer failures to INFN-T1 (GGUS:155012)
        • the tape storage areas were not configured in StoRM WebDAV endpoints
      • "Authentication Error" transfer failures to SARA-MATRIX (GGUS:155119)
        • there seem to be dCache doors that cannot issue an SE token when FTS/gfal uses the HTTPS TURL returned by the SRM interface
      • job submission affected by CephFS partially down (OTG:0067893)
      • dCache SRR - implementation is fragile and breaks at many sites
Maarten assured that the Accounting and dCache Upgrade task forces are working on SRR improvements. Fixes for the remaining issues are expected early next year. Michal reported that even sites running the latest dCache version experience issues.

  • CMS reports (raw view) -
    • WebDAV test in SAM for CMS
      • Added to CMS_CRITICAL_FULL metric last week (used for CMS internal site evaluation)
      • Planned to be added to CMS_CRITICAL this week (used by WLCG for site availability)

  • ALICE
    • NTR

Sites / Services round table:

  • ASGC: NTR
  • BNL: Nothing to report
  • CNAF: NTR
  • EGI:
  • FNAL: NTR
  • IN2P3: Site in downtime tomorrow for quarterly maintenance.
  • JINR:
  • KISTI:
  • KIT: Migration to tape for ATLAS has been unreliable for almost two weeks, which is why their tape buffer ran full on 29 November. As of today the situation should steadily improve.
  • NDGF: NTR
  • NL-T1: Nikhef: we have ongoing periodic problems with third-party copies to and from our dCache system. Investigations are ongoing, but this is proving a difficult issue to debug.
  • NRC-KI:
  • OSG:
  • PIC:
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services:
    • Please note OTG:0067855: Point 8 power intervention on 16/12, affecting a large proportion of lxbatch.
  • CERN storage services:
    • Gfal2 v2.20.2 was deployed last week on FTS3-Atlas. Rucio switched SRM+HTTP-TPC transfers on Wednesday and Thursday. No problems identified. The Gfal2 release is ready to be deployed on other production instances.
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring:
    • Distributed draft SiteMon availability/reliability reports for Nov 2021
  • Middleware: NTR
  • Networks: NTR
  • Security:

AOB:

Topic revision: r20 - 2021-12-06 - MaartenLitmaath
 