Week of 190415

WLCG Operations Call details

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 Geneva time until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Portal
  • Whenever a topic that requires information from sites or experiments needs to be discussed at the operations meeting, please announce it by email to wlcg-scod@cern.ch. This allows the SCOD to make sure that the relevant parties have time to collect the required information, or to invite the right people to the meeting.

Best practices for scheduled downtimes

Monday

Attendance:

  • local: Ivan (ATLAS), Borja (Monitoring), Miroslav (Chair), Maarten (ALICE), Gavin (Compute), Vincent (Security), Andrei (DB), Enrico (ST)
  • remote: Andrew (NIKHEF), Marcelo (INFN), Sang Un (KISTI), Raja (LHCb), Dave (FNAL), Di (TRIUMF), Darren (RAL), Jens (NDGF), David (IN2P3)

Experiments round table:

  • ALICE
    • NTR

  • LHCb reports
    • Activity
      • User jobs, MC productions, staging and some reprocessing this week.
    • Issues
      • RAL:
        • Continuing migration from Castor to ECHO
        • A disk server (gdss811) is down, causing hold-ups and slowdowns in the various productions and in the migration
      • PIC: A machine ran out of disk space (GGUS:140715); now fixed, thanks!
      • IN2P3: Batch system issues (GGUS:140652), possibly ongoing

Sites / Services round table:

  • ASGC: NC
  • BNL: NTR
  • CNAF: NTR
  • EGI: NC
  • FNAL: NTR
  • IN2P3: Several batch system issues last week, caused by separate incidents on the NFS storage used by the batch system. Resource-sharing instabilities affecting LHCb are still under investigation, and a workaround has been put in place to stabilize the situation. Apologies for all these instabilities.
  • JINR: NTR
  • KISTI: Planned downtime today for a storage layer upgrade; all OK afterwards.
  • KIT: NC
  • NDGF: NTR
  • NL-T1: A router firmware upgrade was done at Nikhef on Saturday 13th April. It was relatively trouble-free, with the exception of one storage node where the dpm-gridftp service failed and had to be restarted.
  • NRC-KI: NC
  • OSG: NC
  • PIC: NC
  • RAL: NTR
  • TRIUMF: NTR

  • CERN computing services: NTR
  • CERN storage services:
    • EOSATLAS crash and software update: OTG0049876
    • EOSCMS software update: OTG0049776
    • The certificate for s3.cern.ch is not trusted by IGTF, i.e. it works for web browsers but not for grid sites. Solutions are still being investigated.
  • CERN databases: NTR
  • GGUS: NTR
  • Monitoring: NTR
  • MW Officer: NTR
  • Networks: NTR
  • Security: Several compromises of Jenkins and Confluence servers are being reported globally (not within sites). Please make sure such servers are up to date and secured.

AOB:

  • NOTE: the operations meeting next Monday will be virtual.
  • You may provide relevant incidents, announcements etc. for the operations record.
  • Have a good Easter break!
Topic revision: r17 - 2019-04-16 - EnricoBocchi