Week of 221219

WLCG Operations Call details

  • The connection details for remote participation are provided on this agenda page.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • Whenever a particular topic requiring information from sites or experiments needs to be discussed at the operations meeting, it is highly recommended to announce it by email to the wlcg-scod list (at cern.ch), so that the SCOD can make sure the relevant parties have time to collect the required information or can invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • remote: Kate (chair, DB), Julia (WLCG), Maarten (ALICE, WLCG), Xavier (KIT), Michal (ATLAS), Dave (FNAL), Onno (NL-T1), Henryk (LHCb), Andrew (TRIUMF), Doug (BNL), Pepe (PIC), Vincenzo (CNAF)

Experiments round table:

  • ATLAS reports
    • Issues:
      • 166 corrupted files on IN2P3-CC tapes
        • declared by DDM ops
      • "All pools are full" transfer failures to BNL (GGUS:159898)
        • The number of dCache movers was increased
      • Jobs at BNL failed with "Failed to execute payload" (GGUS:159917)
        • WNs ran out of file descriptors (some demanding jobs were set to run as single-core (score) instead of multicore)
      • Transfers to CERN-PROD_DATADISK fail with "Server Error" (GGUS:159914)

  • CMS reports
    • Opened a number of GGUS tickets to sites asking for adjustment of data access permissions
      • Several sites allowed unauthenticated read access to CMS data
      • Some adjustments needed for a new version of a CMS SAM test
      • Thanks to all sites that responded already
    • Best wishes for a nice winter/Xmas holiday period and a good start in 2023

  • ALICE
    • Mostly business as usual
    • Thanks to all sites and experts!
    • Best wishes for 2023 !

Xavier explained that the GridKa downtime was partial. Both downtimes had to be extended due to issues. Sites were encouraged to follow the best practices linked on this page.

Sites / Services round table:

  • ASGC:
  • BNL: Tape downtime continuing until tomorrow; dCache downtime postponed to a later date.
  • CNAF: NTR
  • EGI:
  • FNAL: NTR
  • IN2P3:
  • JINR: NTR
  • KISTI:
  • KIT: GridKa tape downtime had to be extended due to DB issues requiring external support.
  • NDGF:
  • NL-T1:
    • Last week's Surf(Sara) downtime took longer than expected, for both the tape library replacement and the dCache upgrade from 7.2 to 8.2. With dCache, the problem was an overloaded core domain. The dCache core domain is the communications hub between all dCache components. Once we moved the core domain from a 48-core machine to a 128-core machine, dCache finally became stable. We're afraid this scalability problem may bite us again in the future, so we have a support ticket open with the dCache developers.
    • The issues arise at the startup of components, which needs to be done gradually while verifying the load on the core domain. The problem started with the upgrade of dCache to 6.1 and the Java version upgrade. To be verified with the dCache developers.

  • NRC-KI:
  • OSG:
  • PIC: Downtime Dec 27th - Dec 30th due to cooling maintenance in the PIC data centre
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services:
  • CERN storage services:
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring:
  • Middleware: NTR
  • Networks: NTR
  • Security:

AOB:

Season's Greetings!

  • THANKS for your help in making 2022 a successful year for WLCG !
    • Best wishes for the New Year !

  • Next meeting: Mon Jan 9
Topic revision: r14 - 2022-12-19 - NikolayVoytishin
 