Week of 231120

WLCG Operations Call details

  • The connection details for remote participation are provided on this agenda page.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a topic requiring information from sites or experiments needs to be discussed at the operations meeting, please announce it in advance by email to the wlcg-scod list (at cern.ch). This allows the SCOD to make sure that the relevant parties have time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Julia (WLCG), Maarten (ALICE + WLCG), Steve M (storage)
  • remote: Andrew (TRIUMF), Brian (RAL), Christoph (CMS), David B (IN2P3-CC), David M (FNAL), Doug (BNL), Jens (NDGF), Onno (NL-T1), Pepe (PIC), Steve T (computing), Xavier (KIT)

Experiments round table:

  • ALICE
    • Normal activity on average
    • KISTI has been unusable since Friday evening CET because of an OPN problem that is being looked into
      • It appears to have been fixed during the meeting!

Sites / Services round table:

  • ASGC:
  • BNL: NTR
  • CNAF:
  • EGI:
  • FNAL:
    • David:
      • starting in the first week of the new year,
        we intend to upgrade our production dCache to 9.2
    • Julia:
      • might you upgrade a dev or test instance already this year?
      • that would allow the new XRootD monitoring workflow to be tested sooner
    • David:
      • will follow up on that
    • Doug:
      • would it help if BNL upgraded with a similar timeline?
    • Julia:
      • FNAL are the highest priority because of the pile-up data
        being served to CMS jobs that may run anywhere on WLCG
      • in general, any such upgrade would be welcome, though
    • Doug:
      • might a message about this be sent to the WLCG ops list?
    • Julia:
      • we will need to discuss with DOMA colleagues what we
        would like to ask for between now and DC24
      • we do not want to rock the boat too much
  • IN2P3: dCache will be updated to 9.2 during the next quarterly maintenance on Tuesday, December 12.
  • JINR: NTR
  • KISTI:
  • KIT:
    • Xavier:
      • just a reminder of the site downtime planned for Dec 6-7
  • NDGF:
    • Jens:
      • the dCache upgrade to 9.2 had to be postponed
      • we now expect to do that early Dec
  • NL-T1: NTR
  • NRC-KI:
  • OSG:
  • PIC:
    • We were affected by the SSL/GSI authentication problems with HTCondor-CE 9.0.19 (GGUS:164007).
      Today we updated to 9.0.20 and everything seems OK.
    • Problems with LHCb pilots: on HTCondor/Alma9 WNs the memory used is overestimated, so the jobs are
      put on hold by a PIC rule. We will not migrate the farm to Alma9 until this is understood.
    • LHCb is also doing a massive recall from tape, which is causing staging performance issues.
      The number of active requests is being reduced to 7k (GGUS:164197).
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services: NTR
  • CERN storage services:
    • Steve M:
      • Petr Vokac helped get the ATLAS FTS registered in the ATLAS IAM
    • Maarten:
      • that is good news for the DC24 preparations
  • CERN databases:
  • GGUS: NTR
  • Monitoring:
  • Middleware:
    • The HTCondor-CE campaign on EGI is continuing with HTCondor v9.0.20
      • Many thanks to Jaime Frey of the HTCondor team for implementing
        a very non-trivial fix for the issue encountered earlier!
      • Clients still presenting VOMS proxies can be configured to have
        the SSL method tried first, with fallback to GSI still working
      • When all clients of a given CE are mapped through tokens or SSL,
        the CE can be upgraded to HTCondor v23 (which does not support GSI)
        • Example configurations will be published in the coming months
        • Another campaign is foreseen in the early months of next year
    • Doug:
      • Would that fix have to be ported to HTCondor v23?
    • Maarten:
      • The fix had to be made specifically for the 9.0.20 release, which
        is seen as a stepping stone toward HTCondor v23.
      • The problem arose because SSL and GSI both use
        the same (VOMS) credential for authentication, which is
        not an issue for later releases that do not support GSI.
  • Networks:
  • Security:

AOB:

Topic revision: r13 - 2023-11-20 - MaartenLitmaath