Week of 210308

WLCG Operations Call details

  • For remote participation we use Zoom: https://cern.zoom.us/j/99591482961
    • The pass code is provided on the wlcg-operations list.
    • You can contact the wlcg-ops-coord-chairpeople list (at cern.ch) if you do not manage to subscribe.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to wlcg-scod@cern.ch, so that the SCOD can make sure the relevant parties have time to collect the required information, or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • remote: Kate (chair, DB), Julia (WLCG), Maarten (ALICE, WLCG), Borja (monitoring), Darren (NDGF), Pavlo (ATLAS), Ivan (ATLAS), Vladimir (LHCb), Alberto (monitoring), Xavier (KIT), Vincenzo (CNAF), David B (IN2P3), Daniel (security), Marian (monitoring), Andrew (TRIUMF), Dave M (FNAL), Pepe (PIC), Gavin (computing)

Experiments round table:

  • ALICE
    • NTR

  • LHCb reports
    • smooth running, no major issues
    • migration to CTA ongoing, according to schedule. CTA expected to be in production next Monday.

Sites / Services round table:

  • ASGC:
  • BNL: NTR
  • CNAF: NTR
  • EGI: nc
  • FNAL: NTR
  • IN2P3: The tape system at the site will be in downtime on Tuesday 16th -> the SRM nearline service of ccsrm.in2p3.fr (including the alias cclhcbtape.in2p3.fr) will be off for the whole day. There will probably be some perturbation on the XRootD redirector for ALICE (ccxrdralice.in2p3.fr).
  • JINR: NTR
  • KISTI: nc
  • KIT:
    • On Monday we unintentionally updated our Condor-CEs to a development version of HTCondor. That was fixed on Tuesday.
    • Some CRLs expired over the weekend on htcondor-ce-1-kit; this has been fixed.
  • NDGF: NTR
  • NL-T1: Follow-up on the certificate issue reported last week. It turns out that 1024-bit proxies were refused by CentOS 8. We tried to enable them by changing java.security, but that did not work: CentOS 8 has a new feature, crypto-policies, that silently overrules (some) settings in java.security. A workaround to accept 1024-bit proxies by configuring crypto-policies is described in GGUS:150822. In addition, FTS creates derived proxies with a 1024-bit key length. This is reported in the same GGUS ticket, and the FTS developers will look into it. Until FTS is adapted, sites considering RHEL 8 based systems should be aware of crypto-policies.
    • Maarten remarked that CentOS 8 is not yet certified for production use: some issues may be expected and sites should be careful.
  • NRC-KI: nc
  • OSG: nc
  • PIC: On Tuesday March 30th (Tuesday in Holy Week), PIC will be in complete scheduled downtime due to the building's yearly electrical maintenance. Correspondingly, PIC will be in OUTAGE from 05:00 to 24:00 [CERN and PIC local time].
  • RAL: NTR
  • TRIUMF: NTR
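
Regarding the NL-T1 item above: the following is a minimal sketch, not the workaround from GGUS:150822, of how a site might check whether the Java settings in effect would reject a 1024-bit RSA proxy. It assumes the effective constraint appears on a single `jdk.certpath.disabledAlgorithms` line in the standard Java security property syntax (for example `RSA keySize < 2048`), and that on RHEL 8 based systems this line typically comes from the crypto-policies Java back-end file rather than from java.security. The function names and the two sample strings are hypothetical and purely illustrative; they are not taken from a real system.

```python
import re

# Illustrative samples only (assumed, not copied from a real host).
SAMPLE_STRICT = "jdk.certpath.disabledAlgorithms=MD2, MD5, DSA, RSA keySize < 2048\n"
SAMPLE_RELAXED = "jdk.certpath.disabledAlgorithms=MD2, MD5, DSA, RSA keySize < 1024\n"


def min_rsa_key_size(config_text, prop="jdk.certpath.disabledAlgorithms"):
    """Return the smallest RSA key size (bits) still allowed by `prop`.

    Returns None if the property or an RSA keySize constraint is absent.
    Assumes the property sits on one line (no backslash continuations).
    """
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() != prop:
            continue
        for entry in value.split(","):
            m = re.fullmatch(r"\s*RSA\s+keySize\s*(<=|<)\s*(\d+)\s*", entry)
            if m:
                bound = int(m.group(2))
                # "< N" allows N and above; "<= N" allows N+1 and above.
                return bound if m.group(1) == "<" else bound + 1
    return None


def accepts_rsa_key(config_text, bits):
    """True if an RSA key of `bits` bits passes the keySize constraint."""
    limit = min_rsa_key_size(config_text)
    return limit is None or bits >= limit
```

With the samples above, `accepts_rsa_key(SAMPLE_STRICT, 1024)` is false while `accepts_rsa_key(SAMPLE_RELAXED, 1024)` is true. To check a real host, the text of the relevant configuration file would be read and passed in instead of the samples.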

  • CERN computing services: NTR
  • CERN storage services:
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring:
    • Testing new CRIC functionality towards the SiteMon migration
    • Draft reports for February delayed due to ARC-CE issues reported
      • We have already contacted the experiment representatives who should issue the corrections
    • ETF - CREAM-CE testing retirement postponed as it depends on SiteMon migrating to CRIC first
  • Middleware:
    • ARC CE plans:
      • As of March 9, update HTCondor-G for SAM ETF and in CMS pilot factories
        • ATLAS pilot factories were already updated with a pre-release
      • When things look OK there, sites will be asked to update to 6.10.1
        • And undo any privacy-related LDAP patches
  • Networks: NTR
  • Security: A security incident affected the system providing the web frontend for repositories.egi.eu but there is no evidence that any packages were compromised. The repository frontend has been rebuilt from scratch on a new system.

AOB: EGI repository is not reachable by IPv6. A ticket will be opened.

Topic revision: r22 - 2021-03-15 - NikolayVoytishin
 