Week of 211101

WLCG Operations Call details

  • For remote participation we use Zoom: https://cern.zoom.us/j/99591482961
    • The pass code is provided on the wlcg-operations list.
    • You can contact the wlcg-ops-coord-chairpeople list (at cern.ch) if you do not manage to subscribe.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod (at cern.ch). This allows the SCOD to make sure that the relevant parties have time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • remote: Andrew (TRIUMF), Borja (Chair, Monitoring), Chien-De (ASGC), Christoph (CMS), David (FNAL), Doug (BNL), Jens (NDGF), Jose E (ATLAS), Julia (WLCG), Laurence (Computing), Maarten (ALICE + WLCG), Mihai (Storage), Onno (NL-T1), Pinja (Security), Steven (Storage)

Experiments round table:

  • ATLAS reports
    • Problems at some sites due to an update of the ANSPGrid root CA certificate. What is the procedure for sites when a root CA certificate is updated?
      • Maarten:
        • This was discussed a few weeks ago and also got mentioned in the Operations Coordination meeting.
        • This is an exceptional incident; CA renewals are normally transparent. In this case the CA was extended in a peculiar way, which revealed a bug in CAnL, the Common Authentication Library used at least by dCache, StoRM and VOMS-Admin. Because of this bug, the old version of the CA hides the new one, and the only fix is to restart the affected service, which is an expensive operation for big SE installations.
        • The latest version of the library has the issue fixed, but older versions are still in widespread use.
        • We do not want to proactively try to avoid such issues, as they happen rarely (a few times over a 10-year period) and preventing them is expensive. The price to pay is that affected VOs will have to open tickets for sites to restart the affected services.

  • ALICE
    • NTR

Sites / Services round table:

  • ASGC: We will upgrade the DPM storage instances to version 1.15 during the downtime on Nov 7th.
  • BNL: BNL FTS instance downtime 2-Nov-21 13:00 (UTC) - 16:00 (UTC), work will be transferred to CERN instances ahead of downtime and back after the downtime.
  • CNAF: NC
  • EGI: NC
  • FNAL: (Downtime to be updated)
  • IN2P3: NC
  • JINR: NTR
  • KISTI: NC
  • KIT: We have announced a downtime for all storage elements on Nov 9th, during which we need to reboot a storage switch. The work will very likely finish much sooner than scheduled, but we defined a large buffer in case catastrophe strikes.
  • NDGF: NTR
  • NL-T1: We've scheduled maintenance for the Sara-matrix dCache for this Thursday: https://goc.egi.eu/portal/index.php?Page_Type=Downtime&id=31343.
  • NRC-KI: NC
  • OSG: NC
  • PIC: NC
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services: NTR
  • CERN storage services:
    • FTS configuration for SRM+HTTP-TPC was not working:
      • Configuration rolled back on FTS3-Atlas. Problem investigated and hotfix deployed on FTS3-Pilot last week.
      • New Gfal2 version (v2.20.2) to be released after a test on FTS3-Pilot with ATLAS SRM+HTTP-TPC.
  • CERN databases:
    • The DEVDB11 and DEVDB18 development databases have been stopped. They will be kept for a short time in case someone wants to extract any data, and then deleted. Please contact the CERN Oracle Database Service if needed.
  • GGUS:
    • Last week's release ran into a few issues that could all be fixed quickly.
  • Monitoring: Draft availability reports for October 2021 sent around
  • Middleware: NTR
  • Networks: NTR
  • Security: An advisory about a high-risk vulnerability was sent today.
    • Doug: Was this also announced to OSG sites?
    • Maarten: Already announced to the EGI sites, and communicated to the OSG security team, so they can evaluate it.

AOB:

Topic revision: r21 - 2021-11-03 - NikolayVoytishin