Week of 230515

WLCG Operations Call details

  • The connection details for remote participation are provided on this agenda page.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a topic to be discussed at the operations meeting requires information from sites or experiments, it is highly recommended to announce it by email to the wlcg-scod list (at cern.ch), so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Julia(WLCG), Maarten(WLCG + Alice), Panos (WLCG + CMS + Chair)
  • remote: Andrew W. (TRIUMF), David (FNAL), Douglas (BNL), Federico (LHCb), Jens (NDGF), Jose (PIC), Maria (Computing Services), Marian (Networks), Onno (NL-T1), Vincenzo (CNAF), Xavier (KIT)

Experiments round table:

  • ALICE
    • NTR

Sites / Services round table:

  • ASGC:
  • BNL: Two network interventions are required to upgrade the BNL site WAN perimeter network system to 400GbE.
    • June 14, 2023 21:30 UTC - June 15, 2023 01:30 UTC: degraded WAN. Several 30-120 sec disruptions of WAN connectivity for the BNL site (including LHCOPN and LHCONE) may occur as routes reconverge.
    • June 22, 2023 21:30 UTC - June 23, 2023 01:30 UTC: degraded WAN. Same kind of disruptions as in the first intervention.
  • CNAF: NTR
  • EGI:
  • FNAL:
  • IN2P3: Site will be in scheduled downtime on June 16th for quarterly maintenance. Details will be available the week before.
  • JINR:
  • KISTI:
  • KIT:
    • Downtime last week for LHCb went well. As far as we know, there were no major issues. Even the RHEL-8 crypto-policy work-around was not needed.
    • Tomorrow there will be another downtime for the ATLAS dCache SE, where we want to carry out the same adaptations as for LHCb.
    • In a second downtime tomorrow, our network experts want to upgrade NX-OS on many switches.
  • NDGF:
  • NL-T1: NTR
  • NRC-KI:
  • OSG:
  • PIC: PIC will be in OUTAGE from 23-05-2023 08:00 (PIC local time) until 23-05-2023 15:00 (PIC local time) for a dCache upgrade.
  • RAL: NTR
  • TRIUMF: Scheduled outage on Tuesday May 16, 17:30 - 22:30 UTC, for a dCache update and network maintenance. GOCDB downtime created: https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=33972
  • CERN computing services: NTR
  • CERN storage services:
  • CERN databases:
  • GGUS: NTR
  • Monitoring:
    • Distributed final SiteMon availability/reliability reports for April 2023
      • Marian: On Saturday ETF tests stopped working, most probably due to the IPv6 outage OTG:0077391.
      • Maarten: How will we find out what happened?
      • Marian: An OTG will be opened and Marian will also send a mail.
      • Maarten: The SciTokens configuration might have an issue, since this problem should have been avoided: HTCondor-CE should have used a cached version of the key. Maarten will investigate.
      • Panos (offline update after the meeting): Newer versions of HTCondor-CE are configured to cache and reuse the previously fetched public keys. Sites that failed were probably running an older version.
        • 9.0.x still needed for GSI support
        • P.V.: token issuer public key caching is implemented in the scitokens-cpp library used by HTCondor to process tokens => correct caching behavior is independent of the HTCondor(-CE) release
        • P.V.: we should really move to a new curl version (RHEL8+) that implements Happy Eyeballs (RFC 6555), which could limit the impact of broken IPv6
  • Middleware: NTR
  • Networks:
    • OTG:0077391 - Incoming IPv6 Internet connections not working (outgoing worked fine)
  • Security:
    • ALICE IAM database upgrade from MySQL 5 to 8, May 16th at 14:00 (OTG:0077319)
      • Marian: What network does IAM use? Is it LHCONE? Will investigate.
        • added after the meeting: it is the GPN and that cannot easily be changed
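
Illustration of the key caching discussed under Monitoring: a verifier that keeps the last successfully fetched issuer public key can keep validating tokens while the issuer is unreachable (for example during an IPv6 outage). The Python below is only a minimal sketch of that idea under stated assumptions. The class name KeyCache, the fetch callback, and the ttl and max_stale parameters are hypothetical, and the code does not reproduce how scitokens-cpp or HTCondor-CE actually implement caching.

```python
import time
from typing import Callable, Dict, Tuple


class KeyCache:
    """Keep the last good key per issuer and reuse it when a refresh fails.

    Hypothetical illustration only; not the scitokens-cpp implementation.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],      # assumed: downloads the issuer key
        ttl: float = 600.0,               # seconds a key counts as fresh
        max_stale: float = 4 * 86400.0,   # seconds a stale key may be reused
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._max_stale = max_stale
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, issuer: str) -> str:
        now = self._clock()
        entry = self._entries.get(issuer)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]  # still fresh, no network access needed
        try:
            key = self._fetch(issuer)
        except OSError:
            # Refresh failed (network errors derive from OSError):
            # fall back to the cached key unless it is too old.
            if entry is not None and now - entry[1] < self._max_stale:
                return entry[0]
            raise
        self._entries[issuer] = (key, now)
        return key
```

With such a fallback, a short outage of the path to the token issuer only matters for verifiers that have never fetched the key or whose cached copy is older than max_stale.
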
AOB:
Topic revision: r21 - 2023-05-16 - PetrVokacSecondary
 